We were recently joined by Crissman Loomis, AI Engineer at Preferred Networks, and James Bourbeau, Lead OSS Engineer at Coiled for a webinar on Training ML Models Faster: Scalable Hyperparameter Optimization with Optuna and Dask.
Optuna is a framework that automates hyperparameter optimization and Dask is a library for scaling Python. In the webinar, Crissman introduces hyperparameter optimization, demonstrates Optuna code, and talks in-depth about how Optuna works internally to make the process efficient. Then, James walks us through Dask’s integration with Optuna and how it can be used to scale hyperparameter optimization.
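In this post, we cover what hyperparameters are, why optimizing them matters, how Optuna automates the search, and how Dask-Optuna scales it out.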
Hyperparameters are attributes that control the behavior of machine learning algorithms, and they directly influence the algorithm’s performance. Unlike model parameters such as neural network weights, which are learned during training, hyperparameters are usually predefined or set manually. In a typical neural network, for example, the number of layers and the number of nodes in each layer are both hyperparameters.
Hyperparameters also appear outside of machine learning. Wherever there is an objective function to optimize, you can expect hyperparameters: LINPACK parameters and database performance settings are two examples.
Finding the right set of hyperparameters, a process known as hyperparameter optimization, matters because it can have a significant impact on your application or model. In one object-detection example, Crissman and his team were trying to find the threshold at which to display a bounding box. The following slide shows the results before and after hyperparameter tuning.
The difference is stark!
Moreover, the threshold was only one hyperparameter in the entire process. Crissman describes how hyperparameters can be found everywhere, from ML models down to the chips at the hardware level.
Traditionally, hyperparameter optimization is done manually. You start with some random values and note the resulting accuracy. You then keep tweaking the values by hand, settling on the best accuracy through trial and error.
Ideally, we want to automate this process, and that’s where Optuna comes in. As the webinar shows, Optuna not only makes automation easy but also helps find the right hyperparameters to adjust, and it provides a multitude of other helpful features!
It’s interesting to look at the evolution people go through while working with hyperparameters.
Case 1: Not tuning hyperparameters
A significant number of people do not optimize hyperparameters at all, according to a recent survey. Researchers replicating papers, for example, tend to reuse the default or baseline hyperparameter values.
Case 2: Manually fidgeting with hyperparameters
In the next stage, people realize how much hyperparameters matter. They adjust them by hand until they reach a satisfactory accuracy.
Case 3: Grid search
After experimenting with ad hoc values, the next step is to make the process systematic. People lay out a complete grid of candidate values, often in a tool like an Excel spreadsheet, so that every combination on the grid gets evaluated.
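The grid-search stage can be sketched in a few lines of plain Python. This is an illustration rather than anything from the webinar: `toy_validation_error` is a hypothetical stand-in for training a model and measuring its validation error, and the two hyperparameters and their candidate values are arbitrary.

```python
import itertools


def toy_validation_error(learning_rate: float, n_layers: int) -> float:
    """Hypothetical stand-in for training a model and returning validation error."""
    # Minimised at learning_rate = 0.01, n_layers = 3
    return (learning_rate - 0.01) ** 2 * 1e4 + (n_layers - 3) ** 2


# Candidate values for each hyperparameter; every combination will be tried
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "n_layers": [1, 2, 3, 4],
}

results = []
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    results.append((toy_validation_error(**params), params))

# Keep the combination with the lowest error
best_error, best_params = min(results, key=lambda r: r[0])
print(f"evaluated {len(results)} combinations; best = {best_params}")
```

The number of evaluations is the product of the list lengths (3 × 4 = 12 here), so each extra hyperparameter multiplies the cost. This is a big part of why exhaustive grids become expensive.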
Case 4: Using Optuna
Finally, they consider automating the process using a framework like Optuna.
Optuna is a powerful open source framework that automates hyperparameter search. It is easy to adopt and uses state-of-the-art algorithms to make the search efficient. You can introduce Optuna into your workflow without making any major changes to your original code!
Optuna has several advantages over other tools and methods for hyperparameter optimization. For instance, some existing frameworks require you to define the search space before optimization, using the library’s own syntax. Optuna instead defines the search space during optimization, using plain Python. This "define-by-run" style lets you build search spaces with ordinary conditionals and loops, for example sampling a hyperparameter only when a particular model type has been chosen.
Internally, Optuna has a sampling strategy and a pruning strategy. Sampling decides which hyperparameter values to try in each trial, and pruning stops unpromising trials early. Learn more about the gears of the machine in the webinar recording!
In the webinar, Crissman goes into more detail on how Optuna works, the different samplers it offers, its pruning strategies, and some bonus benefits of using Optuna.
In the second part of the webinar, James demonstrates Dask-Optuna, a library for integrating Dask and Optuna. Dask-Optuna lets you run optimization trials in parallel on a Dask cluster. James walks through an example that optimizes several hyperparameters of an XGBoost classifier trained on the breast cancer dataset. For the demonstration, he uses Coiled to create a remote Dask cluster on AWS.
Check out the webinar recording and follow along in the demo notebook!