Science Thursday: Scalable Computing in Oceanography

Hugo Bowne-Anderson
September 13, 2020

Deepak Cherian, a physical oceanographer and project scientist at the National Center for Atmospheric Research, joins Matt Rocklin and Hugo Bowne-Anderson to discuss scalable computing in oceanography and how he leverages Dask, Xarray, and terabyte-scale datasets to study the physics of oceans.

At the National Center for Atmospheric Research, Deepak Cherian studies the physics of the equatorial Pacific Ocean using large (approximately 1 TB), dense, multi-dimensional array datasets generated by running FORTRAN-based numerical ocean models.

Deepak analyzes these large datasets, together with their associated metadata, using Xarray and Dask:

  • Xarray provides coordinate labels and named dimensions on top of nD arrays such as NumPy or Dask arrays,
  • Combining Xarray and Dask allows Deepak to analyze large arrays with both Xarray's expressiveness and Dask's easy parallel scaling on NCAR's Cheyenne cluster.
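
To make the two points above concrete, here is a minimal sketch, not Deepak's actual workflow. The variable name `temp`, the synthetic random data, the grid sizes, and the chunk size are all assumptions made for illustration; a real ocean-model dataset would typically be opened from files rather than generated in memory.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for model output: a small synthetic temperature
# field with dimensions (time, lat, lon). Real datasets are far larger.
rng = np.random.default_rng(0)
temp = xr.DataArray(
    rng.normal(25.0, 1.0, size=(12, 5, 8)),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.arange(12),
        "lat": np.linspace(-2.0, 2.0, 5),
        "lon": np.linspace(180.0, 250.0, 8),
    },
    name="temp",
)

# Named dimensions and coordinate labels: select the equator by its
# latitude value instead of by integer position.
equator = temp.sel(lat=0.0)

# Back the same array with Dask by splitting it into chunks along time
# (chunk size of 4 is an arbitrary choice for this sketch).
temp_dask = temp.chunk({"time": 4})

# Operations on the chunked array are lazy; compute() triggers the
# (potentially parallel) evaluation.
time_mean = temp_dask.mean(dim="time").compute()
```

The same labeled operations (`sel`, `mean(dim=...)`) apply whether the underlying data is a NumPy array or a chunked Dask array, which is what lets an analysis written on a small sample scale up to a cluster.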

After attending, you’ll know:

  • How to leverage Xarray to build on top of NumPy for data with coordinates and named dimensions, such as in oceanography,
  • How to harness the power of Dask and Xarray to analyze terabytes of physical data,
  • The importance of software tools (such as Dask & Xarray) that enable easy, intuitive, convenient and scalable analysis of datasets, both big and small.
