People new to Dask often ask “Who uses Dask?”
They typically mean one of two questions:
The answer to both questions is “definitely”. Dask users are numerous and varied. The user community spans a wide range of applications and professions. This post tries to paint a very rough sketch of Dask’s current user base.
The majority of Dask users are professionals who …
Dask can be a good fit for anyone who meets these criteria. As we’ll see, this includes data folk working in finance, geosciences, biomedical research, urban planning, and astrophysics, among many other verticals.
Yes. Dask has grown in an organic, grass-roots way over the last five years. As a result, the Dask community is far-reaching and has roots in most data communities today. Quantifying the adoption of a community-driven open source project is notoriously hard. Last February we looked at download counts and page views and decided that even though Dask has 100k daily downloads, the truer user count is probably closer to 15,000 (honest metrics are hard to do well).
Judging from questions on Stack Overflow and issues posted on GitHub issue trackers, these users come from every conceivable industry and scientific discipline where Python is heavily used.
Additionally, about 5% of Python users use Dask (according to the Python user survey). This is less than a project like Apache Spark, but more than a project like Apache Hive.
If your field solves data-intensive problems and prefers Python, then the answer is “almost surely”.
The easiest way to break down who uses Dask is to talk about how people use Dask, and to list sectors and companies within each common usage pattern.
Many traditional data science sectors use Dask with data science libraries like Pandas, Scikit-Learn, NLTK, spaCy, and others. These include companies in banking and finance (like Capital One and JP Morgan Chase), insurance (like Allstate and State Farm), and government and city planning (like the US Government and the NYPD).
Other groups in finance (like BlackRock and TD Ameritrade), heavy industry (like Caterpillar), automotive (like Tesla), and logistics (like JDA/Blue Yonder) use Dask to analyze large volumes of telemetry to reduce costs and plan daily activities.
Dask is unique among data analytics tooling in that it can operate at scale on gridded data structures: 2D images (like satellite or microscope images), 3D volumes (like fluid dynamics simulations or MRI scans), and 4D volumes (like climate simulations).
This has brought a huge set of users in
And many, many more.
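To make the gridded-data use case above a little more concrete, here is a minimal sketch using `dask.array`. The array shape, chunk sizes, and axis names (time, level, lat, lon) are made up for illustration and do not come from any particular project; real workloads would typically load data from files rather than generate it.

```python
import dask.array as da

# Hypothetical 4D grid with axes (time, level, lat, lon).
# The data is split into chunks so each piece can be processed independently.
volume = da.ones((8, 4, 100, 100), chunks=(2, 4, 50, 50))

# Operations are lazy: this only records the work to be done.
time_mean = volume.mean(axis=0)

# compute() runs the chunked computation and returns a NumPy array.
result = time_mean.compute()
print(result.shape)
```

The same NumPy-style code applies whether the array fits in memory or is spread over many chunks, which is a large part of the appeal for imaging and simulation work.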
Through a multi-year collaboration with the Scikit-Learn developers at Columbia University, Inria, and other institutions, as well as substantial backing from NVIDIA (a sizable number of Dask developers are employed by NVIDIA), Dask has established itself as a pragmatic choice for traditional machine learning. In practice, scalable machine learning tools like these are only really necessary for larger institutions with very large datasets, like Capital One and Walmart, for whom the native integration with XGBoost has been particularly helpful.
The fact that Dask exposes its internal task scheduler to users has made it incredibly valuable for institutions with highly complex workloads, especially those that change rapidly. This makes Dask a good fit for quantitative trading firms, as well as credit lenders like Capital One and Barclays.
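As a rough illustration of what using the task scheduler directly can look like, here is a small sketch built on `dask.delayed`. The functions `load`, `clean`, and `combine` are hypothetical stand-ins for real pipeline steps, not part of Dask or of any firm's workflow.

```python
from dask import delayed


@delayed
def load(source_id):
    # Hypothetical stand-in for reading one input (a file, a table, a feed).
    return list(range(source_id, source_id + 3))


@delayed
def clean(records):
    # Keep only the even values.
    return [r for r in records if r % 2 == 0]


@delayed
def combine(parts):
    # Merge the cleaned parts into a single total.
    return sum(sum(p) for p in parts)


# Building the graph is lazy: nothing runs until compute() is called.
cleaned = [clean(load(i)) for i in range(4)]
total = combine(cleaned)

# The scheduler runs independent load/clean tasks in parallel where it can.
result = total.compute(scheduler="threads")
print(result)
```

Because the graph is built from ordinary Python function calls, it can take whatever shape the workload needs and can be changed as quickly as the code itself.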
Bespoke parallelism also enables a number of other projects that have spun up around Dask, each of which brings in its own set of users. So to answer the question of “who uses Dask?” we also have to ask “who uses …”
This is the really exciting direction we see Dask going in today. Increasingly, we see people build more domain-specific systems on top of Dask, and that brings new verticals into the fold. Practitioners from a specific domain are able to use Dask to build scalability directly into a software solution that is just right for their audience. They understand how their users think and what they want much better than the Dask core team ever could.
I’ll be speaking more about this at AnacondaCon in June.