The Dask community is highly distributed, with different teams working independently. This is powerful, but it sometimes makes it hard for people within the community to see everything that is going on. The Dask Heartbeat by Coiled is a bi-weekly publication intended to centralize and broadcast Dask news from the previous two weeks.
If you want something added to this list, either send an e-mail to info@coiled.io or tweet and tag @dask_dev, and we’ll try to include it.
Dask and Distributed version 2020.12.0 was released last week. This release contains many updates (it’s the first release in two months). Some highlights include:
The newly released version 1.3.0 of XGBoost contains several updates that improve XGBoost + Dask integration. This is part of the larger effort to migrate the functionality of Dask-XGBoost into the mainline XGBoost codebase.
NVTabular, a library for processing the tabular data needed to train and deploy recommender system models on GPUs, introduced a new Dask-CuDF backend to support scalable preprocessing. Rick Zamora (NVIDIA) outlines some recent NVTabular developments in this blog post: https://medium.com/rapids-ai/nvtabular-all-in-on-dask-6241b4e9ca19
https://twitter.com/RAPIDSai/status/1337460938404810752
Nils Braun (Bosch) shares his blog post on using SQL to drive Dask on Kubernetes:
https://twitter.com/_nilsbraun/status/1336433868749103105
In other fun Dask/Pandas/SQL news, we discovered that the Dask-SQL project also magically works on Pandas.
https://twitter.com/_nilsbraun/status/1334942010835234816
This is one nice side effect of the close partnership between the two projects.
The STUMPY library for time series analysis improved its Dask support in its recent release.
https://twitter.com/stumpy_dev/status/1339287323796791298
Maintainers of the popular yt framework for computation and visualization of volumetric data are busy implementing Dask support. A slide deck on their recent progress is below:
https://twitter.com/s_i_r_h_c/status/1337082862768689154
Dhavide Aruliah (Quansight) https://twitter.com/quansightai/status/1334161550504968192
Ben Zaitlen (NVIDIA) presented Dask to OSS maintainers and Life Science practitioners at the Chan Zuckerberg Initiative’s Essential Open Source Software for Science gathering.
Video available here (Dask was on Day Three at the end)
We’re also glad to announce that Genevieve Buckley will be joining full time in February as the Dask Life Science fellow (generously funded by the CZI EOSS program). We’ll have a more detailed announcement next month, and are very excited. Genevieve will be the first employee of Dask itself as an organization, rather than one of the supporting companies.
Holden Karau walks through how to deploy Jupyter Lab/Notebook on ARM on Kubernetes with Dask support in this blog post: https://scalingpythonml.com/2020/12/12/deploying-jupyter-lab-notebook-for-dask-on-arm-on-k8s.html
https://twitter.com/RAPIDSai/status/1336740763397308417
The American Geophysical Union runs an annual conference. Dask took this community by storm a couple of years ago with the Pangeo project. This year is no different:
https://twitter.com/rabernat/status/1337476101719920641
For reference, CMIP is the Coupled Model Intercomparison Project. It’s the standard multi-institutional framework for comparing climate models and one of the grander humanity-focused projects we see today.
There are many other happenings at this conference, including this announcement from the climpred project:
https://twitter.com/realaaronspring/status/1337003047562735617
2i2c is looking to hire an open-source infrastructure engineer to work on cloud infrastructure for research and education using projects like JupyterHub and Dask. For more information, see their job posting at https://2i2c.org/job/osie-pangeo.
John Kirkham (NVIDIA) has continued making micro-optimizations to the scheduler as part of a larger effort to boost performance, and has recently begun to decouple the state machine and networking communication parts of the scheduler.
See https://jobqueue.dask.org/en/latest/changelog.html for the full list of changes.
That’s all. Thanks for reading.