Data scientists increasingly solve large machine learning and data problems with Python. Historically, though, Python has struggled with parallel computing. This led many of us in the community to build Dask, a Python library for parallel computing and data science.
Dask has been a go-to solution for scalability in the Python data science stack for years, with deep integrations to dozens of the most commonly used libraries. However, while some groups get great results from Dask, others struggle, typically with deployment challenges. Setting up distributed computing systems within an organizational environment is hard.
That’s why we created Coiled: to make computing more accessible to everyone. Today we’re happy to announce Coiled Cloud, a new product that addresses this problem, as well as our recent funding.
Python drives some of the most exciting research today across several verticals:
These domains derive scientific and business value from massive amounts of data, showing what is possible when we apply community-based open source software to the world's toughest problems.
Unfortunately, many organizations have trouble adopting OSS institution-wide. PyData at scale is great, but only if you are able to
These problems are critical to get right, yet they fall outside the experience of most data science and machine learning practitioners. These devops challenges are the primary bottleneck to adoption of data tooling that we see today.
To address this, we’re launching a product, Coiled Cloud, which manages Dask across diverse contexts. Coiled provides everything that we’ve seen groups need in our long history of deploying Dask in different institutions.
https://youtu.be/aM6RfUhmqVw
We could talk about Coiled features for several blog posts, but for now we’re going to point you to the Coiled product page and the docs, and suggest that you give it a try (spinning up a cluster following the quickstart takes about two minutes). Instead we’re going to talk a little bit about our objectives, and some recent funding news.
Dask is used both by some of the largest companies on Wall Street and by countless individual researchers and students around the world. At Coiled we strive to support all of these stakeholders as they in turn strive to impact their world.
This accessibility design constraint forced us to rethink how we architect hosted systems, focusing on accelerating the individual user experience. The result is, we believe, the smoothest way for anyone to scale computation today.
If you’re a data scientist and want to try it out, then try the following:
$ pip install coiled
>>> import coiled
>>> cluster = coiled.Cluster()
You’ll be up and running in about a minute.
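Once the cluster is up, a natural next step is to attach a Dask client to it and submit some work. The sketch below is illustrative rather than official Coiled documentation. It assumes the standard `dask.distributed.Client` interface, and the helper names `square`, `sum_of_squares`, and `run_on_coiled` are hypothetical, introduced only for this example. When no client is passed, the function falls back to plain Python so you can check the logic without a cluster.

```python
def square(i):
    """Square a single number; this runs on a worker when a client is used."""
    return i * i


def sum_of_squares(n, client=None):
    """Return 0^2 + 1^2 + ... + (n-1)^2.

    With a Dask client (assumed to be a dask.distributed.Client), the
    squares are computed as tasks on the cluster. Without one, the same
    result is computed locally.
    """
    if client is None:
        return sum(square(i) for i in range(n))
    futures = client.map(square, range(n))       # one task per input
    return client.submit(sum, futures).result()  # reduce on the cluster


def run_on_coiled(n=100):
    """Hypothetical helper: start a Coiled cluster and run the computation.

    Assumes `coiled` and `dask.distributed` are installed and that you
    have a Coiled account configured. Not called automatically.
    """
    import coiled
    from dask.distributed import Client

    cluster = coiled.Cluster()
    client = Client(cluster)
    try:
        return sum_of_squares(n, client)
    finally:
        client.close()
        cluster.close()
```

Calling `sum_of_squares(4)` runs locally, while `run_on_coiled()` sends the same work to a hosted cluster and shuts the cluster down afterwards.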
There are many things that you will want to change over time. You’ll want to create your own software environments. If you work for a company you’ll want to provide cloud credentials so that computations are run in your private account. If you live outside of North America you’ll want to specify different regions to run in. If you have students you’ll want to manage teams, craft notebooks, and share them.
Coiled supports all of these features, as you would expect, but it doesn’t force them on you. It starts simple, with room to grow. The result is a product that is both easy to get started with and incredibly nimble.
While Coiled hosts Jupyter notebooks, it doesn’t force you into them. You can connect to Coiled from your laptop or other cloud services. This opens up Coiled to a whole host of other applications outside of the typical cloud hosted data science flow.
As an example, this makes it easy to couple Coiled to scientific imaging applications like Napari that operate on the desktop, or combine with other cloud services, like Prefect, without Coiled having to make explicit integrations. Like Dask itself, Coiled was designed for integration.
https://www.youtube.com/watch?v=KG_ye5qzFmk&feature=youtu.be
Coiled live stream on Napari with Nick Sofroniew and Talley Lambert
To get things started, we took on some investment. We’re happy with how this turned out.
Earlier this year, Coiled raised $5M in seed funding in a round co-led by Costanoa and IA Ventures, with individual investments by Kaggle co-founders Anthony Goldbloom and Ben Hamner; Techammer, spearheaded by Cloudera co-founder Jeff Hammerbacher; and early Mesosphere employee Tim Chen.
Their help and advice have been instrumental so far in setting up the company, and we’re excited about the capacity that this gives us to explore this space quickly.
We hope that, along with contemporary advances in algorithms, user interfaces, and datasets, Coiled’s focus on accessible scalable computing improves the ability of society at large to benefit from modern at-scale data science and machine learning.
Building a product that solves the challenges of modern data professionals means we’re taking every opportunity we can to speak with them. If you’d like to chat with us about the challenges you face in scaling your Pythonic data work, we’d love to hear from you.
If you’re doing data science and/or machine learning at scale and like to break things, we’d love for you to take Coiled for a test drive.
--matt, hugo, and the entire Coiled team