Science Thursday: Scalable Machine Learning with Dask

Hugo Bowne-Anderson
July 13, 2020

Join us for Coiled’s first live stream on machine learning at scale!

Eric Ma, notorious Pythonista and data scientist at Novartis, joins regulars Matt Rocklin and Hugo Bowne-Anderson to show how Dask and distributed compute allow him to accelerate his science and machine learning by several orders of magnitude.

We’ll cover both the opportunities and the challenges of scaling data science workloads with Dask using the real-world example of building a machine learning model of protein melting points:

  1. Check out how to pass protein sequences through a recurrent neural network (RNN) for machine learning feature engineering (really, we’re not just using buzz terms);
  2. Fit a machine learning model using Dask-accelerated random forests;
  3. See the types of pain points that arise in the process (more on those here).
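
To give a flavor of step 2 above, here is a minimal sketch of fitting a scikit-learn random forest with Dask as the joblib backend. It is not the code from the live stream: the random feature vectors merely stand in for RNN-derived protein embeddings, the melting points are synthetic, and the helper names `make_synthetic_embeddings` and `fit_forest_on_dask` are hypothetical. It assumes `dask[distributed]`, `joblib`, `scikit-learn`, and `numpy` are installed.

```python
import numpy as np
from dask.distributed import Client
from joblib import parallel_backend
from sklearn.ensemble import RandomForestRegressor


def make_synthetic_embeddings(n_proteins=200, n_features=16, seed=0):
    """Return random features and synthetic melting points (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Stand-in for per-protein feature vectors produced by an RNN.
    features = rng.normal(size=(n_proteins, n_features))
    weights = rng.normal(size=n_features)
    # Made-up targets: a linear signal plus a little noise, centered near 60.
    melting_points = 60.0 + features @ weights + rng.normal(scale=0.1, size=n_proteins)
    return features, melting_points


def fit_forest_on_dask(features, targets, n_estimators=50, seed=0):
    """Fit a random forest, letting joblib dispatch tree fitting to Dask workers."""
    # A thread-based local cluster; pointing the Client at a remote scheduler
    # would send the same work to a larger cluster instead.
    client = Client(processes=False, dashboard_address=None)
    try:
        model = RandomForestRegressor(
            n_estimators=n_estimators, random_state=seed, n_jobs=-1
        )
        # Inside this block, scikit-learn's joblib calls run on the Dask cluster.
        with parallel_backend("dask"):
            model.fit(features, targets)
    finally:
        client.close()
    return model
```

Calling `fit_forest_on_dask(*make_synthetic_embeddings())` returns a fitted `RandomForestRegressor`. The appeal of this pattern is that the modeling code stays plain scikit-learn, and only the `Client` decides where the trees are trained.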

If you know a bit of machine learning, you’ll learn how to scale up your data work to larger datasets with Dask.

If you’re comfortable with Dask, you’ll see how to seamlessly move from local data analysis to operating in the cloud at scale in a few minutes, making it easier to iterate and innovate: science!
