Dask is a Python library for parallel computing. It can be used on its own, where it’s kinda like multiprocessing on steroids, or it can be used with other PyData libraries. Dask + pandas is big pandas, like Spark. Dask + NumPy is big NumPy. And so on with PyTorch, XGBoost, Prefect, Airflow, etc.
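For example, switching a pandas workflow over to Dask usually just means swapping the import and calling .compute() at the end. A rough sketch (the file path and column names below are made up):

import dask.dataframe as dd

# Read many CSV files as one logical dataframe (hypothetical path).
df = dd.read_csv("s3://my-bucket/records-*.csv")

# Same pandas-style operations: filter, groupby, aggregate...
result = df[df.amount > 0].groupby("account").amount.sum()

# ...but nothing runs until you ask for it, and then it runs in parallel
# across your cores or your cluster.
print(result.compute())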
You can deploy Dask anywhere with technologies like HPC job schedulers, Kubernetes, or cloud deployment APIs. See the Dask documentation. Coiled manages Dask clusters in the cloud, deploying within your own cloud provider account (we never see your data).
If you are truly on-prem (on OpenStack, say), we can connect you with companies and contractors that can set things up for you. Send us a note and we’ll introduce you to groups we trust to do excellent work.
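For teams running their own hardware, the HPC route above is often the most direct: the dask-jobqueue project launches Dask workers as batch jobs on schedulers like SLURM. A minimal sketch, where the queue name and resource sizes are placeholders for your site’s values:

from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Each Dask worker runs inside a SLURM batch job with these resources.
cluster = SLURMCluster(queue="regular", cores=8, memory="16GB")

# Ask SLURM for ten such jobs; workers join the cluster as the jobs start.
cluster.scale(jobs=10)

client = Client(cluster)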
Actually! That’s what we do! Dask is fully open source and you can deploy it yourself with any technology. Check out the Dask docs to get started.
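The simplest self-managed deployment is a cluster on a single machine, which is roughly how most people start. A minimal sketch using the LocalCluster that ships with dask.distributed:

from dask.distributed import Client, LocalCluster

# Start a scheduler and workers sized to this machine's cores.
cluster = LocalCluster()
client = Client(cluster)

# Anything submitted through this client now runs on the local cluster.
print(client.dashboard_link)

The same pattern scales out by swapping LocalCluster for a cluster class from dask-kubernetes, dask-jobqueue, or dask-cloudprovider.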
This ends up being easy to start, but kinda hard to actually use seriously in a corporate setting. Coiled makes this super easy on the cloud. See our Build vs. Buy page for more details.
Coiled manages resources in your cloud account, but most of our customers don’t trust us with their sensitive data. We work hard to manage those cloud resources for you without ever having direct access to that data. For more information on our security posture, see our Coiled Security page.
Coiled makes it easy to set up and use Dask in the cloud. This isn’t one big thing; it’s dozens of small things: managing clusters, tracking usage, limiting costs, and so on.
Dask is an open source project, like pandas or Jupyter. Coiled is a for-profit company around Dask. We work on Dask and we also sell a cloud platform to make it easy to deploy Dask in the cloud. Do you know Databricks? We're like that. Request a demo to give it a try.
Nothing until you use 10,000 CPU hours per month. After that we charge usage-based pricing. If we’re managing $1M of spend on AWS/GCP then we expect to receive hundreds of thousands of dollars. If we’re managing hundreds of dollars then we don’t care and are happy to give away the product for free. In general we find that our cost-saving measures save customers more money than we charge. More details are on our Pricing and Build vs. Buy pages.
Yes! You don’t even have to ask:
pip install coiled
coiled setup
Coiled is free to use for the first 10,000 CPU hours per month (you still have to pay your cloud provider). You don’t need to give us a credit card or anything. See our documentation to get started, or reach out to us; we’d love to chat and are happy to help.
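Once coiled setup has connected your cloud account, launching a cluster from Python looks roughly like this (a sketch; the worker count is arbitrary and the software environment is whatever you configure):

import coiled

# Launch a small cluster of VMs in your own cloud account.
cluster = coiled.Cluster(n_workers=10)

# Connect Dask to it; work now runs on those cloud workers.
client = cluster.get_client()

# ...run your Dask code...

# Shut the VMs down when you're finished.
cluster.close()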
If you’re using Coiled, you’ll find that our engineers reach out frequently. We’re constantly tracking failures and engaging with users to see how we can make Dask and Coiled better. If you want to ask questions, we’re happy to help.
If you’re not using Coiled, these engagements are less efficient, so it’s harder to help. For a few large partner organizations with heavy use of Dask on-prem we do sell Enterprise Dask Support.
Spark is great. If Spark does what you want, use Spark.
Dask is lower level and lighter weight than Spark. It is more flexible and can do more things. If you’re doing primarily SQL queries or common dataframe operations, Spark will likely outperform Dask. People tend to choose Dask for the following two reasons:
They like Python.
Dask is more Python native. For example, Dask dataframes use the same pandas API, integration with the rest of the Python ecosystem is smoother, and debugging is easier.
They need more flexibility.
Many people using Python need to do things that are more complicated than SQL. This includes complex workflows, ML, etc. Python teams tend to do a lot of crazy unstructured stuff.
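One way to see the flexibility point: dask.delayed turns plain Python functions into parallel task graphs, with no requirement that the work look like a dataframe or a SQL query. A toy sketch, where the functions stand in for whatever custom steps a real pipeline has:

from dask import delayed

# Ordinary Python functions; these stand in for custom pipeline steps.
@delayed
def load(path):
    return {"path": path, "value": len(path)}

@delayed
def process(record):
    return record["value"] * 2

@delayed
def combine(values):
    return sum(values)

# Compose them like normal function calls to build an arbitrary task graph...
pieces = [process(load(p)) for p in ["a.json", "b.json", "c.json"]]
total = combine(pieces)

# ...then run the whole graph in parallel.
print(total.compute())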
Check out our blog post Dask vs. Spark for more details.