This article highlights some ongoing Dask work that I personally find interesting and that I think you might enjoy too :)
Also, if this is the sort of work that appeals to you, consider joining us at Coiled (we're hiring OSS engineers).
https://github.com/dask/distributed/pull/4857 by Jacob Tomlinson (NVIDIA)
Jacob is adding more information to our HTML representations of Clients and clusters.
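To see the repr in question, you only need to display a client object in a notebook. A minimal sketch (assuming a local in-process cluster; the exact fields shown depend on your distributed version):

```python
from dask.distributed import Client

# A small in-process client. In Jupyter, displaying `client` renders
# the HTML repr that this PR enriches (scheduler address, dashboard
# link, worker/core/memory summary).
client = Client(n_workers=1, threads_per_worker=1, processes=False)
html = client._repr_html_()  # the same HTML that Jupyter displays

print("Client" in html)
client.close()
```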
https://github.com/dask/dask/pull/7763 by Genevieve Buckley (NumFOCUS/Dask)
Similarly, we’ve been providing more visual context around Dask’s internal graph structure.
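Every Dask collection carries a `HighLevelGraph` made of layers, and it's the HTML repr of that structure that this work improves. A small sketch of how to get at it (in a notebook, displaying `hlg` shows the visual repr):

```python
import dask.array as da

# Build a small computation: an elementwise op followed by a reduction
x = da.ones((100, 100), chunks=(50, 50))
y = (x + 1).sum()

# The collection's task graph is a HighLevelGraph of named layers;
# its notebook repr visualizes the layer structure.
hlg = y.__dask_graph__()
print(type(hlg).__name__)  # HighLevelGraph
print(len(hlg.layers))     # one entry per layer (ones, add, sum, ...)
```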
https://github.com/dask/distributed/pull/4784 (and many others) by Florian Jetter (Coiled)
About six months ago, we started making larger changes to Dask internals. Unfortunately, this resulted in instabilities in larger and more complex workloads. See Stability of the Dask library for discussion on this topic.
Over the last couple of months, Florian has been resolving many small issues with the scheduler. That work culminated this week in the smooth functioning of some of the largest and nastiest workloads we have access to (thank you, hedge funds).
No fancy picture on this one, but it’s exciting news :)
https://distributed.dask.org/en/latest/worker.html#memory-not-released-back-to-the-os and https://github.com/dask/distributed/pull/4874 by Guido Imperiale (Coiled)
A common cause of running out of memory in Dask workloads is unmanaged memory. There are several things to do to solve this problem, but a big one is visualization (making people aware of the problem) and providing strategies for remediation. Guido is working on automating solutions for the full problem, but in the meantime, we’ve enhanced the memory usage dashboard plot, and written up documentation that explains what’s going on and how to remediate it if desired.
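The linked documentation describes one manual remediation on Linux: asking glibc to hand unused allocator memory back to the OS. A sketch of that approach (Linux/glibc only; on a real cluster you'd run it on every worker):

```python
import ctypes

def trim_memory() -> int:
    """Ask glibc to return unused arena memory to the OS (Linux-only).

    malloc_trim returns 1 if memory was actually released, 0 otherwise.
    """
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)

# On a live cluster, run it on every worker:
# client.run(trim_memory)
```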
On a personal note, it’s been surprising how many times Coiled engineers have looked at an old issue and said, “You know, we should just run this again and look at the memory plot. I’ll bet this is unmanaged memory.” It comes up remarkably often.
Some workloads that should flow through Dask easily without taking up much memory don’t. Instead, tasks and their data end up placed poorly across workers, which results in excess communication and slows things down.
This is due to a subtle decision in task scheduling that is good in many, but not all, situations. It tends to hit large embarrassingly parallel computations followed by simple reductions, which are especially common among earth science workloads.
Gabe has been working on cheap heuristics that can identify and correct for this situation in particular, while also applying to Dask’s scheduling more generally.
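For context, the workload shape that triggers this is easy to write down. A sketch of the pattern (small sizes here; in practice the arrays would be far larger than memory): with good root-task placement, each worker keeps its own chunks and only small partial results travel over the network.

```python
import dask.array as da

# Embarrassingly parallel work (random generation, elementwise op)
# followed by a simple tree reduction -- the pattern described above.
x = da.random.random((2_000, 2_000), chunks=(500, 500))
result = (x + 1).mean()

print(float(result.compute()))  # ~1.5
```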
https://github.com/gjoseph92/dask-profiling-coiled/tree/main/results#cython-shuffle-nogc-20ms-batched-send-2 and https://github.com/dask/distributed/issues/4881#issuecomment-859208156 by Gabe Joseph (Coiled)
As part of the larger scheduler performance effort, we’ve taken a deeper look at our use of Tornado for communication. After accelerating the Dask scheduler with high-level graphs and Cythonization, we increasingly suspect that communication accounts for a large share of the remaining overhead (as everything else gets faster, previously unimportant things become important).
It was hard to be sure, though, because profiling network stacks is difficult: they are highly concurrent and they engage system calls outside of Python, both of which screw with traditional performance measurement techniques.
Fortunately, this week we were able to confirm that communication overhead matters by increasing the batching time window in Dask workers. This resulted in non-trivial performance gains.
This allows us to now develop on the networking stack with greater confidence of returns. We’re looking at several options, including dropping from Tornado down to Asyncio, which lets us better leverage uvloop/libuv, and also dynamically adjusting batching time windows up from our current default of two milliseconds.
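To illustrate the batching idea itself, here is a toy sketch (a hypothetical class, not Dask's actual `BatchedSend`): outgoing messages are buffered cheaply and flushed together on a fixed interval, so many messages cost one network write.

```python
import asyncio

class BatchedSend:
    """Toy sketch: buffer messages and flush them on a time interval."""

    def __init__(self, interval=0.002):   # Dask's current default: 2 ms
        self.interval = interval
        self.buffer = []
        self.flushes = []                 # batches actually "sent"

    def send(self, msg):
        self.buffer.append(msg)           # cheap: no network call per message

    async def run(self, duration):
        loop = asyncio.get_running_loop()
        end = loop.time() + duration
        while loop.time() < end:
            await asyncio.sleep(self.interval)
            if self.buffer:
                self.flushes.append(self.buffer)  # one write for many messages
                self.buffer = []

async def main():
    bs = BatchedSend(interval=0.002)
    task = asyncio.create_task(bs.run(duration=0.05))
    for i in range(100):                  # 100 messages in quick succession
        bs.send(i)
    await task
    return bs

bs = asyncio.run(main())
print(len(bs.flushes))                    # far fewer flushes than messages
```

A larger interval means fewer, bigger writes (less per-message overhead) at the cost of added latency, which is why adjusting the window dynamically is attractive.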
https://github.com/dask/distributed/pull/4886 by Naty Clementi (Coiled)
Finally, we’ve been looking at a more scalable version of the beloved task stream plot, the most commonly used dashboard plot. It shows the activity of every core in the cluster over time.
However, this plot doesn’t scale well. As your cores reach into the hundreds, it starts making less sense to visualize every one of them. Additionally, as you start processing thousands of tasks per second, it stops making sense to plot every one onto the screen. Our eyes just can’t make sense of this information.
But how do we replace this beloved plot? How do we answer the same questions that plot answers with different, aggregated charts?
A big step towards that is to visualize Task Groups. Task Groups are collections of related tasks. They scale with the complexity of a computation, not with the size of the data. They also store a ton of information about what’s going on. We’re now figuring out how best to convey this information visually in a way that is tasteful and pragmatic.
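The aggregation idea can be sketched in a few lines. Dask task keys share a name prefix, which is what defines their group; here we roll up per-task timings into per-group summaries (the task records below are made up for illustration):

```python
from collections import defaultdict

# Hypothetical records of finished tasks: (key, duration in seconds).
tasks = [
    ("random_sample-0", 0.10), ("random_sample-1", 0.12),
    ("add-0", 0.02), ("add-1", 0.03),
    ("mean_agg-0", 0.05),
]

durations = defaultdict(float)
counts = defaultdict(int)
for key, dur in tasks:
    group = key.rsplit("-", 1)[0]   # group tasks by their name prefix
    durations[group] += dur
    counts[group] += 1

# A few group-level rows replace thousands of per-task rectangles.
for group in durations:
    print(group, counts[group], round(durations[group], 2))
```

The point is that the number of groups tracks the complexity of the computation, not the number of tasks, so a group-level chart stays readable at any scale.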