The Dask community is highly distributed with different teams working independently. This is powerful but sometimes makes it hard for people within the community to see everything that is going on. The Dask Heartbeat by Coiled is a monthly publication intended to centralize and broadcast Dask news over the previous month.
There is an ongoing effort to refactor Dask’s documentation, led by Jacob Tomlinson and Julia Signell. As a first step, the documentation’s Sphinx theme has been migrated to the Executable Book Theme, and the overall structure and navigation have been improved.
Gabe Joseph has been working on a promising new approach to Dask DataFrame shuffling. Shuffling means redistributing rows across partitions so that related rows end up together, which operations such as sorting, setting an index, or joining require. Because each output partition may need data from every input partition, shuffling gets more expensive as the dataset gets larger. This experimental approach is performing well on terabyte-scale, larger-than-memory datasets. To learn more and try the feature yourself, check out Better shuffling in Dask: a proof-of-concept.
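To make the idea of a shuffle concrete, here is a minimal pure-Python sketch of hash-based repartitioning. It is only an illustration of the concept, not Dask's implementation or the new proof-of-concept; the `shuffle` function, the `user` column, and the sample rows are all made up for this example.

```python
def shuffle(partitions, key, n_output):
    """Regroup rows so that all rows sharing a key land in the same output partition.

    Hypothetical helper for illustration only; this is not a Dask API.
    """
    outputs = [[] for _ in range(n_output)]
    for partition in partitions:
        for row in partition:
            # Every input partition may send rows to every output partition,
            # which is why a shuffle touches the whole dataset.
            outputs[hash(row[key]) % n_output].append(row)
    return outputs


# Three input partitions with the same users scattered across them.
partitions = [
    [{"user": 1, "amount": 10}, {"user": 2, "amount": 5}],
    [{"user": 2, "amount": 7}, {"user": 3, "amount": 1}],
    [{"user": 1, "amount": 2}, {"user": 3, "amount": 4}],
]

shuffled = shuffle(partitions, key="user", n_output=2)

for i, part in enumerate(shuffled):
    print(i, part)
```

After the shuffle, each user's rows sit in exactly one output partition, so a per-user operation can then run on each partition independently. The cost is the all-to-all data movement in the inner loop, which grows with the size of the dataset.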
Thanks to Erik Welch, dask.order now eagerly computes dependent tasks to allow the parent tasks to be released from memory. This update improves Dask’s memory usage. Learn more about the work and see the graph optimizations in this PR.
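A toy model can show why execution order matters for memory. The sketch below is not the `dask.order` algorithm; it is a small simulation under assumed result sizes, with a hypothetical `peak_memory` helper and made-up task names, that compares running all loads first against finishing each chain before starting the next.

```python
def peak_memory(graph, sizes, order):
    """Simulate running tasks in `order` and return the peak total size held.

    `graph` maps each task to the list of tasks it depends on. A result is
    released as soon as every task that depends on it has run.
    Hypothetical helper for illustration; not part of Dask.
    """
    # Count how many dependents still need each task's result.
    remaining = {task: 0 for task in graph}
    for deps in graph.values():
        for dep in deps:
            remaining[dep] += 1

    held = 0
    peak = 0
    done = set()
    for task in order:
        assert all(dep in done for dep in graph[task]), "order violates dependencies"
        held += sizes[task]
        peak = max(peak, held)
        done.add(task)
        for dep in graph[task]:
            remaining[dep] -= 1
            if remaining[dep] == 0:
                held -= sizes[dep]  # no one else needs it: release
    return peak


# Four load -> process chains feeding one final task (assumed sizes).
graph = {"total": [f"process-{i}" for i in range(4)]}
sizes = {"total": 1}
for i in range(4):
    graph[f"load-{i}"] = []
    graph[f"process-{i}"] = [f"load-{i}"]
    sizes[f"load-{i}"] = 10    # large raw input
    sizes[f"process-{i}"] = 1  # small processed result

# Run every load first, then every process step.
breadth_first = (
    [f"load-{i}" for i in range(4)] + [f"process-{i}" for i in range(4)] + ["total"]
)
# Run each dependent right after its parent so the parent can be released.
depth_first = [t for i in range(4) for t in (f"load-{i}", f"process-{i}")] + ["total"]

print(peak_memory(graph, sizes, breadth_first))  # 41
print(peak_memory(graph, sizes, depth_first))    # 14
```

In this model, running each dependent task right after its parent keeps at most one large input in memory at a time, which is the intuition behind eagerly computing dependents.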
Guido Imperiale has continued to improve the Dask distributed scheduler’s memory management. It can now run multiple active memory managers in parallel and automatically purge any replicated tasks.
Jacob Tomlinson is working to improve the lifecycle of Dask clusters, which involves creating, listing, scaling, and deleting clusters. A key part of this effort is to improve how Dask manages and stores the state of a cluster. Jacob added this functionality in a recent PR.
Dask is a very active project with a lot of work happening simultaneously, so maintaining the project and keeping it stable is essential. We’d like to thank all the Dask maintainers who help review PRs, stay on top of CI, release Dask on a timely cadence, and build the awesome Dask community. :)
Some highlights from the October Dask community meeting:
Full meeting notes are available here.
That’s it. Thanks for reading.
If you’re interested in taking Coiled Cloud for a spin, it provides hosted Dask clusters, docker-less managed software, and one-click deployments, and you can try it for free today.