Scalable Hyperparameter Optimization with Optuna and Dask

We were recently joined by Crissman Loomis, AI Engineer at Preferred Networks, and James Bourbeau, Lead OSS Engineer at Coiled, for a webinar on Training ML Models Faster: Scalable Hyperparameter Optimization with Optuna and Dask.

Optuna is a framework that automates hyperparameter optimization and Dask is a library for scaling Python. In the webinar, Crissman introduces hyperparameter optimization, demonstrates Optuna code, and talks in-depth about how Optuna works internally to make the process efficient. Then, James walks us through Dask’s integration with Optuna and how it can be used to scale hyperparameter optimization.

In this post, we will cover:

  • What are hyperparameters?
  • Evolution of hyperparameter optimization
  • A brief look at Optuna (more about Optuna in the webinar!)
  • Key takeaways about hyperparameter optimization and Optuna

What are Hyperparameters?

Hyperparameters are attributes that control the behavior of machine learning algorithms, and they have a direct influence on an algorithm's performance. Often, these attributes are predefined or set manually. In a typical neural network, for example, the number of layers and the number of nodes in each layer are set by hand, and both are hyperparameters.

We also find hyperparameters outside of machine learning. Wherever there is an objective function to optimize, you can expect hyperparameters: LINPACK benchmark parameters, database performance settings, and so on.

Finding the right set of hyperparameters, widely known as hyperparameter optimization, is important because it can have a significant impact on your application or model. In one object-detection example, Crissman and his team were tuning the threshold that decides when to display a bounding box. The following slide shows the results before and after hyperparameter tuning.

Bad threshold hyperparameter creates multiple overlapping bounding boxes around each object. Good threshold hyperparameter creates an accurate single bounding box.
Bad and good threshold hyperparameter comparison

The difference is stark!

Moreover, the threshold was only one hyperparameter in the entire process. Crissman describes how hyperparameters can be found everywhere, from the ML models to even the chips at a hardware-level.

Hyperparameters at different points in the workflow - network trainer, detector model, suppression, chip.
Hyperparameters at different sections of the workflow

Traditionally, hyperparameter optimization is done manually. You start with some random values and get an accuracy reading. You then continue tweaking the hyperparameter values by hand, finding the best accuracy through trial and error.

Ideally, we want to automate this process and that’s where Optuna comes in. As we see in the webinar, Optuna not only makes automation easy, but also helps find the right hyperparameters to adjust and provides a multitude of other helpful features!

Evolution of Hyperparameter Optimization

It’s interesting to look at the evolution people go through while working with hyperparameters.

Case 1: Not tuning hyperparameters

A significant number of people do not optimize hyperparameters (as found in a recent survey). Researchers who are replicating papers tend to use the same default hyperparameter values or use the baseline parameters.

Case 2: Manually fidgeting with hyperparameters

In the next stage, they realize the importance of hyperparameters. They fidget with the hyperparameters manually to find a satisfactory accuracy value.

Case 3: Grid search

After working with random values, the next step is making the process more systematic. They build a complete grid, using tools like an Excel spreadsheet, to make sure the entire hyperparameter space is searched.
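The spreadsheet-driven grid search described above can be sketched in a few lines of plain Python. The evaluate function and the parameter values below are hypothetical stand-ins for real model training:

```python
# A plain-Python sketch of the grid-search stage: enumerate every
# combination of hyperparameter values and keep the best score.
from itertools import product

def evaluate(lr, batch_size):
    # Stand-in for "train a model and return its validation accuracy".
    return 1.0 - abs(lr - 0.01) - abs(batch_size - 64) / 1000

grid = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}

best_score, best_params = float("-inf"), None
for lr, batch_size in product(grid["lr"], grid["batch_size"]):
    score = evaluate(lr, batch_size)
    if score > best_score:
        best_score, best_params = score, {"lr": lr, "batch_size": batch_size}

print(best_params)  # → {'lr': 0.01, 'batch_size': 64}
```

The exhaustive loop always visits all nine combinations, which is exactly why this stage stops scaling as the number of hyperparameters grows.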

Case 4: Using Optuna

Finally, they consider automating the process using a framework like Optuna.

A Brief Look at Optuna

Optuna is a very powerful open source framework that helps automate hyperparameter search. It is easy to implement and uses state-of-the-art algorithms to maximize efficiency. You can introduce Optuna into your workflow without making any major changes to your original code!

```python
import optuna

def objective(trial):
    # (your code here!)
    return evaluation_score

study = optuna.create_study()
study.optimize(objective, n_trials=(number of trials))
```

Optuna comes with a unique set of advantages over other hyperparameter optimization tools and methods. For instance, some existing frameworks require you to define the search space before optimization using the library's own syntax, whereas Optuna defines the search space during optimization, in ordinary Python. This "define-by-run" style means the search space can branch and adapt based on earlier suggestions, which makes Optuna incredibly flexible.

Internally, Optuna has a sampling strategy and a pruning strategy. Sampling refers to the process of choosing which hyperparameter values to try next, and pruning involves stopping unpromising trials early. Learn more about the gears of the machine in the webinar recording!

Key Takeaways about Hyperparameter Optimization and Optuna

Crissman talks more about how Optuna works, different types of samplers within Optuna, pruning strategies, and some bonus benefits of using Optuna in the webinar. Some key takeaways include:

  1. Bayesian and evolutionary strategies used in Optuna help determine the best points to sample, unlike random search, where trial points are randomly distributed across the search space. The following slide shows how random search selects points randomly along the curve, while Optuna concentrates points near the global minimum.
Two line graphs showing the points selected by random search vs Optuna.
  2. Optuna provides a variety of samplers like TPE (Tree-structured Parzen Estimator), GP (Gaussian Process), and CMA-ES (Covariance Matrix Adaptation Evolution Strategy), each with specific advantages. For example, GP performs better with correlated hyperparameters, while CMA-ES is strong when you have a large number of trials.
Algorithm cheat sheet:
  • Fewer than 1,000 trials, or using categorical parameters?
      • Yes: Are the parameters correlated?
          • Yes: Gaussian Processes
          • No: TPE
      • No: CMA-ES
Algorithm cheat sheet to select the right sampler
  3. Pruning can make optimization almost twice as fast! Optuna looks at the learning curve, compares performance with previous trials, and stops unpromising trials early, saving you a lot of compute time.
Graph showing some trials terminated early
  4. Automating hyperparameter optimization allows you to use a framework like Dask to scale up and run trials on more nodes in parallel. Crissman demonstrates this parallel execution in the webinar!
6 parallel processes running Optuna
  5. Visualizations can help answer questions like: What are the most important hyperparameters? What is the contribution of each parameter to the overall performance of your algorithm? How do particular variables change over time?
optuna.visualization.plot_param_importances(study) showing a horizontal bar graph of the most important hyperparameters

Distributed Hyperparameter Optimization

In the second part of the webinar, James demonstrates Dask-Optuna, a library for integrating Dask and Optuna. Dask-Optuna allows you to run optimization trials in parallel on a Dask cluster. James walks through an example: optimizing several hyperparameters for an XGBoost classifier trained on the breast cancer dataset, using Coiled to create a remote Dask cluster on AWS for the demonstration.

Check out the webinar recording and follow along in the demo notebook!

Level up your Dask using Coiled

Coiled makes it easy to scale Dask in the cloud