Is it possible to wait until `.persist()` finishes caching in Dask?

Suggestion : 1

Yes, the functions you're looking for are aptly named wait and progress.

from dask.distributed import wait, progress

The progress function takes any Dask object (a future or a persisted collection) and renders a progress bar:

>>> progress(x)
[XXXXXXX................] 5.2 seconds

If you do not want a progress bar, or if you are in a Jupyter notebook (where progress renders a non-blocking widget), you may want to use the separate wait function, which blocks until the computations finish.

wait(x)
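
Putting the two together for the original question, here is a minimal sketch, assuming a local distributed Client and an arbitrary dask.array workload:

import dask.array as da
from dask.distributed import Client, progress, wait

client = Client()  # local cluster

# Build a lazy collection and start caching it in cluster memory.
x = da.random.random((10000, 10000), chunks=(1000, 1000))
x = x.persist()    # returns immediately; computation runs in the background

progress(x)        # optional: renders a progress bar for the pending chunks
wait(x)            # blocks until every chunk of x is computed and cached

Once wait returns, follow-up operations such as x.sum().compute() should run against the cached chunks instead of recomputing them.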

Suggestion : 2

You can submit individual tasks to the cluster using the submit method. Calling result() on the returned future waits until the computation completes and gathers the result to the local process. Sometimes situations arise where tasks, workers, or clients need to coordinate with each other in ways beyond normal task scheduling with futures; in these cases Dask provides additional primitives to help.

from dask.distributed import Client

client = Client()                 # start local workers as processes
# or
client = Client(processes=False)  # start local workers as threads

def inc(x):
    return x + 1

def add(x, y):
    return x + y

a = client.submit(inc, 10)  # calls inc(10) in background thread or process
b = client.submit(inc, 20)  # calls inc(20) in background thread or process
>>> a
<Future: status: pending, key: inc-b8aaf26b99466a7a1980efa1ade6701d>
>>> a
<Future: status: finished, type: int, key: inc-b8aaf26b99466a7a1980efa1ade6701d>

>>> a.result()  # blocks until task completes and data arrives
11

c = client.submit(add, a, b)            # calls add on the results of a and b
futures = client.map(inc, range(1000))  # send many tasks at once
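
To block on many futures at once, one option (a small sketch, reusing the client and inc defined above) is client.gather, which waits for all futures and pulls their results back; as_completed instead yields futures in the order they finish:

from dask.distributed import as_completed

futures = client.map(inc, range(1000))  # send many tasks

results = client.gather(futures)        # block until all finish, collect results

# Or handle each result as soon as its task completes:
for future in as_completed(futures):
    print(future.result())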

Suggestion : 3

You can turn any Dask collection into a concrete value by calling the .compute() method or the dask.compute(...) function. This blocks until the computation is finished, going straight from a lazy Dask collection to a concrete value in local memory, such as the integer 1 or a NumPy array in the local process. client.compute instead returns a single future, but because it is a single future its result must still fit on a single machine; like dask.compute, it is only appropriate when results are small, and computing a full large dataframe (the "Blows up memory" line below) would likely fail. client.persist is different: the collection is returned immediately and the computation happens in the background on the cluster. Eventually all of the futures backing the collection will complete, at which point further queries on it will likely be very fast.

>>> df = dd.read_csv('s3://...')
>>> df.value.sum().compute()
100000000
>>> df.compute()
MemoryError(...)

>>> df = dd.read_csv('s3://...')
>>> total = client.compute(df.sum())  # Return a single future
>>> total
Future(..., status='pending')
>>> total.result()                    # Block until finished
100000000

>>> future = client.compute(df)  # Blows up memory
>>> df = dd.read_csv('s3://...')
>>> df.dask  # Recipe to compute df in chunks
{('read', 0): (load_s3_bytes, ...),
 ('parse', 0): (pd.read_csv, ('read', 0)),
 ('read', 1): (load_s3_bytes, ...),
 ('parse', 1): (pd.read_csv, ('read', 1)),
 ...}

>>> df = client.persist(df)  # Start computation
>>> df.dask  # Now points to running futures
{('parse', 0): Future(..., status='finished'),
 ('parse', 1): Future(..., status='pending'),
 ...}
futures = client.scatter(args)                      # Send data
future = client.submit(function, *args, **kwargs)   # Send single task
futures = client.map(function, sequence, **kwargs)  # Send many tasks
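
Tying this back to the original question, here is a short sketch of the compute-versus-persist distinction; the local CSV glob and the column name x are hypothetical stand-ins for the elided s3 path:

import dask.dataframe as dd
from dask.distributed import Client, wait

client = Client()
df = dd.read_csv('data/*.csv')      # hypothetical local files

total = client.compute(df.x.sum())  # small scalar result: one Future
print(total.result())               # block until the reduction finishes

df = client.persist(df)             # cache the whole collection on the cluster
wait(df)                            # block until persist finishes caching

Here wait(df) is exactly the "block until .persist() finishes caching" behavior the question asks about.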

Suggestion : 4

Source: H2O.ai Driverless AI documentation. © Copyright 2017-2022 H2O.ai. Last updated on Jun 14, 2022.

# Make a config directory
mkdir config

# Copy the config.toml file to the new config directory.
docker run --runtime=nvidia \
   --pid=host \
   --rm \
   --init \
   -u `id -u`:`id -g` \
   -v `pwd`/config:/config \
   --entrypoint bash \
   h2oai/dai-ubi8-x86_64:1.10.3.1-cuda11.2.2.xx \
   -c "cp /etc/dai/config.toml /config"

# Start Driverless AI with the DRIVERLESS_AI_CONFIG_FILE environment
# variable pointing at the copied config file.
docker run --runtime=nvidia \
   --pid=host \
   --init \
   --rm \
   --shm-size=256m \
   -u `id -u`:`id -g` \
   -p 12345:12345 \
   -e DRIVERLESS_AI_CONFIG_FILE="/config/config.toml" \
   -v `pwd`/config:/config \
   -v `pwd`/data:/data \
   -v `pwd`/log:/log \
   -v `pwd`/license:/license \
   -v `pwd`/tmp:/tmp \
   h2oai/dai-ubi8-x86_64:1.10.3.1-cuda11.2.2.xx

export DRIVERLESS_AI_CONFIG_FILE="/config/config.toml"