Hyperparameter Tuning
Random search within a bounding box is a good baseline method (Bergstra 2012). Bayesian optimization methods can also be applied; see the Software section below for implementations. See also Wikipedia - Hyperparameter Optimization. When publishing, it is recommended to report the hyperparameter tuning method, the bounding box, and the number of hyperparameter evaluations (Dodge 2019).
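A minimal sketch of random search within a bounding box. The hyperparameter names, bounds, and evaluate() function below are illustrative placeholders, not from any cited paper; in practice evaluate() would train a model and return its validation loss.

  import numpy as np

  rng = np.random.default_rng(0)

  # Bounding box: (low, high) for each hyperparameter; log-scale
  # parameters such as the learning rate are sampled in log space.
  bounds = {"log10_lr": (-5.0, -1.0), "dropout": (0.0, 0.5)}

  def evaluate(params):
      # Placeholder: train a model with these hyperparameters and
      # return its validation loss.
      return (params["log10_lr"] + 3.0) ** 2 + params["dropout"]

  best_params, best_loss = None, float("inf")
  for _ in range(50):  # number of evaluations -- report this when publishing
      params = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
      loss = evaluate(params)
      if loss < best_loss:
          best_params, best_loss = params, loss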
Overviews
Papers
- Bergstra & Bengio 2012 - Random Search for Hyper-Parameter Optimization. Shows that random search outperforms grid search, largely because only a few hyperparameters matter for a given problem and grid search wastes evaluations on the unimportant ones.
- Chen 2017 - Learning to Learn without Gradient Descent by Gradient Descent. Learns a black-box (gradient-free) optimizer, which can be applied to hyperparameter tuning.
- Golovin et al 2017 - Google Vizier: A Service for Black-Box Optimization. Was, at least as of 2017, “the de facto parameter tuning engine at Google.”
- Franceschi et al 2017 - Forward and Reverse Gradient-Based Hyperparameter Optimization. Computes hypergradients with forward-mode (and reverse-mode) differentiation for gradient-based hyperparameter tuning.
- Melis et al 2017 - On the State of the Art of Evaluation in Neural Language Models. Uses Google Vizier for large-scale automatic black-box hyperparameter tuning.
- Li et al 2018 - A System for Massively Parallel Hyperparameter Tuning. Introduces ASHA (asynchronous successive halving), a strong and simple method. Ray Tune has an implementation; see the sketch after this list.
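A rough sketch of running ASHA through Ray Tune. This assumes the classic tune.run / tune.report interface, which has changed across Ray versions; train_fn and its loss are placeholders, not a real training loop.

  from ray import tune
  from ray.tune.schedulers import ASHAScheduler

  def train_fn(config):
      # Placeholder training loop: report a metric each epoch so the
      # ASHA scheduler can stop poorly performing trials early.
      for epoch in range(100):
          loss = (config["lr"] - 0.01) ** 2 / (epoch + 1)
          tune.report(loss=loss)

  asha = ASHAScheduler(metric="loss", mode="min", max_t=100,
                       grace_period=5, reduction_factor=2)

  analysis = tune.run(
      train_fn,
      config={"lr": tune.loguniform(1e-5, 1e-1)},
      num_samples=20,  # number of trials launched
      scheduler=asha,
  )
  print(analysis.get_best_config(metric="loss", mode="min"))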
Software
See also the list of software in Ch. 10, p. 322 (p. 51 in the PDF) of HOML.
- Optuna. Nicer interface than Ray Tune; see the sketch after this list.
- Scikit-Optimize (skopt)
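A minimal Optuna sketch. The quadratic objective here is a stand-in for a real train-and-validate function; the hyperparameter names and bounds are illustrative.

  import optuna

  def objective(trial):
      # Suggest hyperparameters from the bounding box; Optuna's default
      # TPE sampler concentrates later trials on promising regions.
      lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
      dropout = trial.suggest_float("dropout", 0.0, 0.5)
      # Placeholder for the validation loss of a model trained with
      # these hyperparameters.
      return (lr - 0.01) ** 2 + dropout

  study = optuna.create_study(direction="minimize")
  study.optimize(objective, n_trials=100)
  print(study.best_params, study.best_value)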