Science Thursday: Scalable Machine Learning in Python

Hugo Bowne-Anderson
August 10, 2020

Tom Augspurger, who works at Anaconda maintaining libraries like pandas, Dask, and Dask-ML, joins Matt Rocklin and Hugo Bowne-Anderson to discuss scalable machine learning in Python.

Dask-ML provides tools for scalable machine learning. It works with libraries like scikit-learn and XGBoost to scale out to larger datasets or larger problems.

We’re fortunate to have high-performance libraries like NumPy, SciPy, and scikit-learn for machine learning. They work well for problems that fit on a single machine. For larger problems, however, you’ll run into compute or memory constraints that slow down the iterative process of developing a machine learning model. Dask-ML lets you restore that rapid cycle by scaling your familiar machine learning workflow with Dask.

After attending, you’ll know:

  • When (and when not!) to reach for distributed machine learning tools
  • The different types of scaling challenges you might run into
  • How to distribute a hyperparameter search on a Dask cluster
  • How to scale out to larger-than-memory datasets
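For the hyperparameter-search case, one common approach (assumed here, not necessarily the one shown in the session) is to keep scikit-learn's `GridSearchCV` and send its parallel fits to a Dask cluster through joblib's `"dask"` backend. `Client(processes=False)` starts a small in-process local cluster purely for demonstration; the dataset and grid values are illustrative. Dask-ML also provides drop-in alternatives such as `dask_ml.model_selection.GridSearchCV`.

```python
import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small synthetic dataset: here the search space, not the data, is the bottleneck.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Illustrative grid: 3 values of C x 2 values of gamma = 6 candidates, x 3 folds.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.01]}
search = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=-1)

# In-process local cluster for demonstration; on a real cluster you would
# connect with Client("<scheduler-address>") instead (address is a placeholder).
with Client(processes=False) as client:
    # Inside this context, joblib hands each (candidate, fold) fit to Dask.
    with joblib.parallel_backend("dask"):
        search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Because only the joblib backend changes, the estimator and search code stay identical to a single-machine scikit-learn workflow.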
