Parallelize your custom Python workflows by scheduling tasks with Dask Futures. Learn to scale for-loop-y code that scrapes, parses, and cleans data from Stack Overflow.
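The for-loop-to-Futures pattern described above can be sketched roughly as follows. This is a minimal illustration, assuming a local in-process cluster and a hypothetical `clean` function standing in for the scrape/parse/clean work:

```python
from dask.distributed import Client

def clean(record):
    # Hypothetical per-item work: parse and clean one scraped record.
    return record.strip().lower()

client = Client(processes=False)  # in-process cluster; point at a real scheduler to scale out
records = ["  Alpha ", "BETA", " Gamma  "]
futures = client.map(clean, records)  # schedule one task per record, like a parallel for-loop
results = client.gather(futures)      # block until all tasks finish and collect results
print(results)                        # ['alpha', 'beta', 'gamma']
client.close()
```

`client.map` returns immediately with one future per input, so independent iterations run concurrently instead of one after another.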
Learn best practices for working with larger-than-memory DataFrames. Tune Parquet storage, navigate inconvenient file sizes and data types, optimize queries, build features, and explore the challenging NYC Uber/Lyft dataset with pandas and Dask.
This 6-module tutorial walks through the basics of Dask and shows how PyData users can unlock the power of parallel computing.
Dask and pandas work together to provide intuitive data processing at very large scale. This video loads a few hundred gigabytes of Parquet data from Amazon S3 and then runs some basic analysis. It gives a sense of how easy Dask and pandas are to use together.
You have a Python script. It works. You just need to run it on 100x more data.