The Dask community is highly distributed, with different teams working independently. This is powerful, but it sometimes makes it hard for people within the community to see everything that is going on. The Dask Heartbeat by Coiled is a monthly publication intended to centralize and broadcast Dask news from the previous month.
Freyam Mehta, Genevieve Buckley, Jacob Tomlinson, and others are doing exciting work around making task scheduling faster using high-level graphs. You can read more about the overall objectives in Faster Scheduling. As Genevieve writes in High Level Graphs update, there is ongoing work to use a Blockwise high-level graph layer wherever possible, investigate a high-level graph for Dask’s `map_overlap`, and visualize high-level graphs in Jupyter Notebooks.
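For readers who have not looked at a high-level graph before, here is a minimal sketch of how to inspect one. The array shape, chunk sizes, and arithmetic are arbitrary choices for illustration, and the exact layer names and the number of layers will vary between Dask versions.

```python
import dask.array as da
from dask.blockwise import Blockwise
from dask.highlevelgraph import HighLevelGraph

# A small chunked array and a couple of elementwise operations
# (shape and chunks are illustrative only).
x = da.ones((8, 8), chunks=(4, 4))
y = (x + 1) * 2

# Every Dask array carries its task graph as a HighLevelGraph,
# which groups tasks into named layers.
graph = y.dask
layer_types = {name: type(layer).__name__ for name, layer in graph.layers.items()}
for name, kind in layer_types.items():
    print(f"{kind:<24} {name}")

# Elementwise operations such as `x + 1` are represented as Blockwise layers.
has_blockwise = any(isinstance(layer, Blockwise) for layer in graph.layers.values())
print("contains a Blockwise layer:", has_blockwise)
```

Because a Blockwise layer describes a whole operation symbolically rather than task by task, it is cheaper to build and ship to the scheduler, which is the motivation for using it wherever possible.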
Doug Davis helped add support for Dask Array equivalents of NumPy’s `histogram2d` and `histogramdd` functions. This feature is available in Dask version 2021.07.1 and above.
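As a quick illustration, the sketch below calls `da.histogram2d` on two chunked arrays and compares the result against NumPy. The sample data, chunk size, bin counts, and range are made up for the example, and it assumes Dask 2021.07.1 or later is installed.

```python
import numpy as np
import dask.array as da

# Made-up sample data: 1000 points drawn uniformly from [0, 1).
rng = np.random.default_rng(0)
x_np = rng.random(1000)
y_np = rng.random(1000)

# Both coordinate arrays are given the same chunking.
x = da.from_array(x_np, chunks=250)
y = da.from_array(y_np, chunks=250)

# With integer bin counts, an explicit range is passed so the bin
# edges are known without first computing the data.
bins = (4, 5)
value_range = ((0.0, 1.0), (0.0, 1.0))

h, xedges, yedges = da.histogram2d(x, y, bins=bins, range=value_range)
counts = h.compute()  # the histogram is lazy until computed

# NumPy's result on the same data, for comparison.
expected, _, _ = np.histogram2d(x_np, y_np, bins=bins, range=value_range)
print(counts.shape, np.array_equal(counts, expected))
```

`da.histogramdd` follows the same pattern for more than two dimensions, taking the sample as a sequence of arrays or a single two-dimensional array.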
Guido Imperiale has continued working on active memory management. As of version 2021.07.2, the `MALLOC_TRIM_THRESHOLD_` environment variable is set automatically on workers. Gabe Joseph from Coiled also continued improving Dask’s memory scheduling by short-circuiting root-ish checks for some group dependencies.
In July, versions 2021.07.0, 2021.07.1, and 2021.07.2 of both Dask and Distributed were released.
Some highlights from the July Dask community meeting:
Full meeting notes are available here.
That’s it. Thanks for reading.