AWS computation costs roughly the following today:
| | On Demand | Spot |
|---|---|---|
| CPU hour | $0.04 | $0.0125 |
| GiB hour | $0.0045 | $0.0015 |
On top of that different services charge a premium:
| | Premium |
|---|---|
| AWS EMR | 40% |
| AWS SageMaker | 40% |
| Databricks | 100% |
However, when you pre-commit to a large allocation, you can usually negotiate this premium down and get extra perks like consulting hours.
So if you rent 100 4-core / 16 GiB machines for an eight-hour workday and run Databricks on them, then at the on-demand rates above your costs come to roughly $46 per hour, or about $370 per day.
This is roughly the cost of a decently well paid data scientist, about $150k / year.
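The arithmetic behind this estimate can be sketched in a few lines of Python. This is only a rough reconstruction from the on-demand prices and the Databricks premium quoted above; the function name is made up for illustration, and the number of billed days per year is an assumption, since the text does not state it.

```python
# Approximate on-demand prices quoted above.
CPU_HOUR = 0.04      # dollars per CPU hour
GIB_HOUR = 0.0045    # dollars per GiB hour
PREMIUM = 1.00       # Databricks premium (100%)


def cluster_cost_per_hour(machines, cores, gib, premium=PREMIUM):
    """Hourly cost of a cluster, including the service premium."""
    infrastructure = machines * (cores * CPU_HOUR + gib * GIB_HOUR)
    return infrastructure * (1 + premium)


hourly = cluster_cost_per_hour(machines=100, cores=4, gib=16)
daily = hourly * 8  # eight-hour workday

print(f"hourly: ${hourly:,.2f}")
print(f"daily:  ${daily:,.2f}")

# Assumption: billed days per year is not given in the text,
# so both weekdays only and every calendar day are shown.
for days_per_year in (260, 365):
    print(f"{days_per_year} days: ${daily * days_per_year:,.0f} / year")
```

With these inputs the cluster costs about $46 per hour and $371 per day. That works out to roughly $97k per year over 260 weekdays, or about $135k per year if billed every calendar day, which is the same order of magnitude as the $150k figure.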
There are many things you could do to reduce these costs, such as releasing idle machines or using spot pricing.
And yet the practice of leaving the cluster on all day is still common in many large companies today. This is because it’s often awkward and slow for data scientists to release old resources and request new ones frequently during their analysis. They also aren’t usually individually responsible for bearing costs.
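To put a rough number on one of those savings, the sketch below prices a single 4-core / 16 GiB machine at the on-demand and spot rates from the price table above. The helper name and dictionary layout are illustrative only, and the comparison ignores any service premium and the fact that spot machines can be interrupted.

```python
# Approximate prices from the table above, in dollars per hour.
PRICES = {
    "on_demand": {"cpu": 0.04, "gib": 0.0045},
    "spot": {"cpu": 0.0125, "gib": 0.0015},
}


def machine_hour(kind, cores=4, gib=16):
    """Infrastructure cost of one machine for one hour, before any premium."""
    price = PRICES[kind]
    return cores * price["cpu"] + gib * price["gib"]


on_demand = machine_hour("on_demand")
spot = machine_hour("spot")
savings = 1 - spot / on_demand

print(f"on-demand: ${on_demand:.3f} / machine hour")
print(f"spot:      ${spot:.3f} / machine hour")
print(f"savings:   {savings:.0%}")
```

At these rates spot pricing alone cuts the infrastructure bill by roughly two thirds, before any savings from turning machines off when they are idle.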
So why haven’t companies made it easier to adaptively scale up and down and also use spot pricing by default?
The standard way to price cloud compute today is a percentage tax on top of infrastructure costs, like the 100% premium from Databricks, or the 40% premium from SageMaker. This premium model incentivizes cloud-based companies to keep you running infrastructure, even when you don’t need it.
This is especially concerning for data science workloads, which are typically bursty: an analyst might use a large cluster for two minutes to load a dataset and produce a plot, and then stare at that plot for twenty minutes while the machines idle. This pattern is common and wastes roughly 90% of the machine time paid for, and yet no one here is incentivized to reduce that waste.
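A small sketch makes the incentive problem concrete. It computes the idle fraction for the two-minutes-on, twenty-minutes-idle pattern, then compares what a vendor charging a percentage premium earns when the cluster is left on versus when the user pays only for active time. The $100 daily infrastructure bill and the `vendor_revenue` helper are hypothetical, chosen only to make the proportions easy to read.

```python
active_minutes = 2    # loading data and producing a plot
idle_minutes = 20     # looking at the plot while the machines idle

idle_fraction = idle_minutes / (active_minutes + idle_minutes)
print(f"idle fraction: {idle_fraction:.0%}")


def vendor_revenue(infrastructure_cost, premium):
    """Revenue for a vendor charging a percentage on top of infrastructure."""
    return infrastructure_cost * premium


# Hypothetical numbers: $100 of infrastructure per day when the cluster
# is left on, with a 100% premium on top.
always_on = vendor_revenue(100.0, premium=1.0)
scaled_down = vendor_revenue(100.0 * (1 - idle_fraction), premium=1.0)

print(f"vendor revenue, cluster left on: ${always_on:.2f}")
print(f"vendor revenue, idle time cut:   ${scaled_down:.2f}")
```

Under a percentage premium, every dollar of idle infrastructure the user eliminates also removes a proportional amount of vendor revenue, so the vendor loses about as large a share of income as the user saves.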
There are many alternatives out there, all of which induce pathological behaviors.
We don’t have a good solution to this problem. We’re still working through it ourselves, trying to find a solution that both helps us earn money and aligns our incentives with our users, rather than with the commercial cloud providers.
If you have thoughts, we’d love to hear them. Feel free to get in touch.