Hyperparameter Tuning
Random search within a bounding box is a good baseline method (Bergstra 2012). Bayesian optimization methods can also be applied; see the Software section below for implementations. See also Wikipedia - Hyperparameter Optimization. When publishing, it is recommended to report the hyperparameter tuning method, the bounding box, and the number of hyperparameter evaluations (Dodge 2019).
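A minimal sketch of random search within a bounding box. The hyperparameter names, bounds, and evaluate() function below are illustrative placeholders, not from any cited paper; in practice evaluate() would train a model and return its validation loss.

  import numpy as np

  rng = np.random.default_rng(0)

  # Bounding box: (low, high) for each hyperparameter; log-scale
  # parameters such as the learning rate are sampled in log space.
  bounds = {"log10_lr": (-5.0, -1.0), "dropout": (0.0, 0.5)}

  def evaluate(params):
      # Placeholder: train a model with these hyperparameters and
      # return its validation loss.
      return (params["log10_lr"] + 3.0) ** 2 + params["dropout"]

  best_params, best_loss = None, float("inf")
  for _ in range(50):  # number of evaluations -- report this when publishing
      params = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
      loss = evaluate(params)
      if loss < best_loss:
          best_params, best_loss = params, loss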
Overviews
Papers
- Bergstra & Bengio 2012 - Random Search for Hyper-Parameter Optimization. Shows that random search outperforms grid search, largely because only a few hyperparameters matter for a given problem and grid search wastes evaluations on the unimportant ones.
- Chen 2017 - Learning to Learn without Gradient Descent by Gradient Descent. Learns a black-box (gradient-free) optimizer, which can be applied to hyperparameter tuning.
- Golovin et al 2017 - Google Vizier: A Service for Black-Box Optimization. Was, at least as of 2017, “the de facto parameter tuning engine at Google.”
- Franceschi et al 2017 - Forward and Reverse Gradient-Based Hyperparameter Optimization. Computes hypergradients with forward-mode (and reverse-mode) differentiation for gradient-based hyperparameter tuning.
- Melis et al 2017 - On the State of the Art of Evaluation in Neural Language Models. Uses Google Vizier for large-scale automatic black-box hyperparameter tuning.
- Li et al 2018 - A System for Massively Parallel Hyperparameter Tuning. Introduces ASHA (asynchronous successive halving), a strong and simple method. Ray Tune has an implementation; see the sketch after this list.
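A rough sketch of running ASHA through Ray Tune. This assumes the classic tune.run / tune.report interface, which has changed across Ray versions; train_fn and its loss are placeholders, not a real training loop.

  from ray import tune
  from ray.tune.schedulers import ASHAScheduler

  def train_fn(config):
      # Placeholder training loop: report a metric each epoch so the
      # ASHA scheduler can stop poorly performing trials early.
      for epoch in range(100):
          loss = (config["lr"] - 0.01) ** 2 / (epoch + 1)
          tune.report(loss=loss)

  asha = ASHAScheduler(metric="loss", mode="min", max_t=100,
                       grace_period=5, reduction_factor=2)

  analysis = tune.run(
      train_fn,
      config={"lr": tune.loguniform(1e-5, 1e-1)},
      num_samples=20,  # number of trials launched
      scheduler=asha,
  )
  print(analysis.get_best_config(metric="loss", mode="min"))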
Software
See also the list of software in Ch. 10, p. 322 (p. 51 in the PDF) of HOML.
- Optuna. Nicer interface than Ray Tune; see the sketch after this list.
- Scikit-Optimize (skopt)
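A minimal Optuna sketch. The quadratic objective here is a stand-in for a real train-and-validate function; the hyperparameter names and bounds are illustrative.

  import optuna

  def objective(trial):
      # Suggest hyperparameters from the bounding box; Optuna's default
      # TPE sampler concentrates later trials on promising regions.
      lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
      dropout = trial.suggest_float("dropout", 0.0, 0.5)
      # Placeholder for the validation loss of a model trained with
      # these hyperparameters.
      return (lr - 0.01) ** 2 + dropout

  study = optuna.create_study(direction="minimize")
  study.optimize(objective, n_trials=100)
  print(study.best_params, study.best_value)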