The Dask community is highly distributed with different teams working independently. This is powerful but sometimes makes it hard for people within the community to see everything that is going on. The Dask Heartbeat by Coiled is a bi-weekly publication intended to centralize and broadcast Dask news over the previous two weeks.
Dask and Distributed version 2020.12.0 was released last week. This release contains many updates (it’s the first release in two months). Some highlights include:
The newly released version 1.3.0 of XGBoost contains several updates that improve XGBoost + Dask integration. This is part of the larger effort to migrate the functionality of Dask-XGBoost into the mainline XGBoost codebase.
NVTabular, a library for processing tabular data needed to train and deploy recommender-systems models on GPUs, introduced a new Dask-CuDF backend to support scalable preprocessing. Rick Zamora (NVIDIA) outlines some recent NVTabular developments in this blogpost https://medium.com/rapids-ai/nvtabular-all-in-on-dask-6241b4e9ca19.
Nils Braun (Bosch) shares a blogpost on using SQL to drive Dask on Kubernetes.
Also, in other fun Dask/Pandas/SQL news, we discover that the Dask-SQL project also magically works on Pandas.
This is one nice side effect of the close partnership between the two projects.
The STUMPY library for time series analysis improved its Dask support in its recent release.
Maintainers of the popular yt framework for computation and visualization of volumetric data are busy implementing Dask support. A slide deck on their recent progress is below.
Ben Zaitlen (NVIDIA) presented Dask to OSS maintainers and Life Science practitioners at the Chan Zuckerberg Initiative’s Essential Open Source Software for Science gathering.
Video available here (Dask was on Day Three at the end)
We’re also glad to announce that Genevieve Buckley will be joining full time in February as the Dask Life Science fellow (generously funded by the CZI EOSS program). We’ll have a more detailed announcement next month, and are very excited. Genevieve will be the first employee of Dask itself as an organization, rather than one of the supporting companies.
Holden Karau walks through how to deploy Jupyter Lab/Notebook on ARM on Kubernetes with Dask support in this blogpost https://scalingpythonml.com/2020/12/12/deploying-jupyter-lab-notebook-for-dask-on-arm-on-k8s.html
The American Geophysical Union runs an annual conference. Dask took this community by storm a couple of years ago with the Pangeo project. This year is no different.
For reference, CMIP is the Coupled Model Intercomparison Project. It’s the standard multi-institutional effort for comparing climate models and one of the grander humanity-focused projects we see today.
There are many other happenings at this conference, including this announcement from the climpred project
2i2c is looking to hire an open-source infrastructure engineer to work on cloud infrastructure for research and education using projects like JupyterHub and Dask. For more information, see their job posting at https://2i2c.org/job/osie-pangeo.
John Kirkham (NVIDIA) has continued making micro-optimizations to the scheduler as part of a larger effort to boost performance:
He has also recently begun to decouple the scheduler's state machine from its network communication.
See the Dask-Jobqueue changelog at https://jobqueue.dask.org/en/latest/changelog.html for the full list of changes.
That’s it. Thanks for reading, all.