Use Case

Data Pipelines

Dask scales data workflows with libraries you already know and love.

Launch clusters from anywhere you run Python, from local Jupyter notebooks to CI jobs to automation tools like Prefect and Airflow. It's straightforward to scale existing systems.

With its DataFrame interface, Dask can replace Spark for large-scale data engineering. Equivalent workloads are easier to develop and debug thanks to Dask's all-Python core and rich dashboard. They're also typically cheaper to run.

For complex workflows and evolving demands, Dask offers more flexibility than traditional data-engineering tools like Spark or SQL.

You can parallelize your own custom Python code and drop down to lower-level abstractions like delayed tasks. For non-tabular data, Dask supports distributed NumPy Arrays, Bags, and Futures.

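For example, a pipeline might read raw records from S3, clean and enrich them with your own functions, then score the results and write them back: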

import dask.dataframe as dd
import coiled
from local_tools import clean, score  # your own cleaning/scoring functions

# Lazily define the pipeline: read, clean, enrich, score
records = dd.read_csv("s3://bucket/*.csv")
products = dd.read_parquet("s3://bucket/products.parq")

cleaned = records.map_partitions(clean)
enriched = cleaned.merge(products, on="pid")
scored = enriched.map_partitions(score)

# Run on a 50-worker cloud cluster and write the results back to S3
with coiled.Cluster(n_workers=50).get_client() as client:
    scored.to_parquet("s3://bucket/scored.parq")
Dask + Coiled

Docs


Submit arbitrary functions for computation in a parallelized, eager, and non-blocking way.
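
A minimal sketch of the Futures interface, assuming a Coiled cluster like the one above; inc is a stand-in for your own function:

from coiled import Cluster

def inc(x):
    return x + 1

with Cluster(n_workers=10).get_client() as client:
    # map() returns Futures immediately; work runs in the background
    futures = client.map(inc, range(100))
    results = client.gather(futures)  # block until all results arrive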

Dask Bag implements operations like map, filter, groupby and aggregations on collections of Python objects.
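
A minimal sketch of the Bag interface; the records here are made-up examples:

import dask.bag as db

records = db.from_sequence([
    {"name": "Alice", "amount": 100},
    {"name": "Bob", "amount": -1},
    {"name": "Alice", "amount": 50},
])

# filter and map work on plain Python objects; compute() triggers execution
total = (records.filter(lambda r: r["amount"] > 0)
                .map(lambda r: r["amount"])
                .sum()
                .compute())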

What if you don’t have an array or dataframe? Instead of having blocks where the function is applied to each block, you can decorate functions with @delayed and have the functions themselves be lazy.
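
A minimal sketch with @delayed; double and add are placeholder functions:

from dask import delayed

@delayed
def double(x):
    return 2 * x

@delayed
def add(a, b):
    return a + b

# Calling delayed functions builds a lazy task graph; nothing runs yet
total = add(double(2), double(3))
print(total.compute())  # executes the graph: 10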

Get Started

Sign up with GitHub, Google, or email.

Use your AWS or GCP account.

Start scaling.

$ pip install coiled
$ coiled setup     # connect Coiled to your cloud account
$ ipython
>>> import coiled
>>> cluster = coiled.Cluster(n_workers=500)