Launch clusters from anywhere you run Python: local Jupyter notebooks, CI jobs, or automation tools like Prefect and Airflow. It's straightforward to scale existing systems.
With its DataFrame interface, Dask can replace Spark for large-scale data engineering. Equivalent workloads are easier to develop and debug thanks to Dask's all-Python core and rich dashboard. They're also typically cheaper to run.
Submit arbitrary functions for computation in a parallelized, eager, and non-blocking way.
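A minimal sketch of this futures interface, assuming a local in-process cluster; the function `inc` is illustrative:

```python
from dask.distributed import Client

def inc(x):
    return x + 1

client = Client(processes=False)        # threaded, in-process cluster for the demo
futures = client.map(inc, range(4))     # eager: work starts immediately, the call returns at once
total = client.submit(sum, futures)     # futures can be passed straight into further submits
result = total.result()                 # block only when the value is actually needed
client.close()
```

Because `submit` and `map` return immediately, you can keep building up work while earlier tasks run in the background.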
Dask Bag implements operations like map, filter, groupby, and aggregations on collections of generic Python objects.
What if you don’t have an array or DataFrame? Instead of relying on a collection type that applies a function to each block, you can decorate your own functions with @delayed and make the functions themselves lazy.
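A minimal sketch of building a lazy task graph with @delayed; the load/process/combine functions are stand-ins for your own code:

```python
from dask import delayed

@delayed
def load(i):            # stand-in for loading one chunk of data
    return list(range(i))

@delayed
def process(chunk):     # stand-in for a per-chunk transformation
    return sum(chunk)

@delayed
def combine(parts):     # stand-in for a final reduction
    return sum(parts)

parts = [process(load(i)) for i in range(1, 4)]
total = combine(parts)       # nothing has run yet: `total` is a lazy task graph
result = total.compute()     # execute the whole graph in parallel
```

Calling a decorated function records a task instead of running it, so ordinary-looking Python builds a graph that Dask can schedule across workers.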
Advanced techniques to bring messy datasets into a Dask DataFrame.
A large parallel DataFrame which mirrors the pandas API, internally composed of many smaller pandas DataFrames.