====== Hyperparameter Tuning ======

Random search within a bounding box is a good baseline method ([[https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf|Bergstra & Bengio 2012]]). Bayesian optimization methods can also be applied; see [[https://en.wikipedia.org/wiki/Hyperparameter_optimization#Bayesian|here]] for software implementations. See also [[https://en.wikipedia.org/wiki/Hyperparameter_optimization|Wikipedia - Hyperparameter Optimization]].

When publishing, it is recommended to report the tuning method, the bounding box, and the number of hyperparameter evaluations ([[https://arxiv.org/pdf/1909.03004.pdf|Dodge et al 2019]]).

===== Overviews =====
  * [[https://en.wikipedia.org/wiki/Hyperparameter_optimization|Wikipedia - Hyperparameter Optimization]]
  * [[https://arxiv.org/pdf/2007.15745|Yang & Shami 2020 - On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice]]

===== Papers =====
  * **[[https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf|Bergstra & Bengio 2012 - Random Search for Hyper-Parameter Optimization]]** Shows that random search finds good configurations more efficiently than grid search, because usually only a few hyperparameters matter.
  * [[https://papers.nips.cc/paper/2012/file/05311655a15b75fab86956663e1819cd-Paper.pdf|Snoek et al 2012 - Practical Bayesian Optimization of Machine Learning Algorithms]]
  * [[https://arxiv.org/pdf/1508.05051.pdf|Murray & Chiang 2015 - Auto-Sizing Neural Networks: With Applications to n-gram Language Models]]
  * [[http://proceedings.mlr.press/v70/chen17e/chen17e.pdf|Chen et al 2017 - Learning to Learn without Gradient Descent by Gradient Descent]] Learns a black-box (gradient-free) optimizer. Can be applied to hyperparameter tuning.
  * [[https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf|Golovin et al 2017 - Google Vizier: A Service for Black-Box Optimization]] Described in the paper as "the de facto parameter tuning engine at Google."
  * [[https://arxiv.org/pdf/1703.01785.pdf|Franceschi et al 2017 - Forward and Reverse Gradient-Based Hyperparameter Optimization]] Computes gradients with respect to hyperparameters in forward or reverse mode and uses them for hyperparameter tuning.
  * [[https://arxiv.org/pdf/1707.05589.pdf|Melis et al 2017 - On the State of the Art of Evaluation in Neural Language Models]] Uses Google Vizier for large-scale automatic black-box hyperparameter tuning.
  * ASHA: **[[https://arxiv.org/pdf/1810.05934.pdf|Li et al 2018 - A System for Massively Parallel Hyperparameter Tuning]]** A good method. Ray Tune has an [[https://docs.ray.io/en/latest/tune/api_docs/schedulers.html|implementation]].
  * **[[https://arxiv.org/pdf/1909.03004.pdf|Dodge et al 2019 - Show Your Work: Improved Reporting of Experimental Results]]**

===== Software =====
See also the list of software in [[https://drive.google.com/uc?export=view&id=1UBPdRsJIy494_Go6KzCKLdHRYLbfMfMF#page=51|Ch 10, p. 322 (p. 51 in pdf)]] of [[book:HOML]].
  * [[https://docs.ray.io/en/latest/tune/index.html|Ray Tune]] For PyTorch, see this [[https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html|tutorial]].
  * [[https://optuna.org/|Optuna]] Nicer interface than Ray Tune.
  * Scikit-Optimize (skopt)

===== Related Pages =====
  * [[nlp:Experimental Method]]
  * [[ml:optimizers#Gradient-Free Optimizers]]
  * [[Scaling Laws]]
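
===== Example: Random Search Baseline =====
The random search baseline and the reporting recommendation at the top of this page can be sketched in a few lines of plain Python. This is a minimal illustration, not code from any of the papers or libraries listed here: the bounding box, the hyperparameter names, and ''toy_validation_loss'' are all hypothetical stand-ins, and in practice the objective would be a full train-and-validate run.

```python
import math
import random

# Hypothetical bounding box: name -> (low, high, scale).
# Learning rates are usually sampled on a log scale.
BOUNDING_BOX = {
    "learning_rate": (1e-5, 1e-1, "log"),
    "dropout": (0.0, 0.5, "linear"),
}


def sample_config(box, rng):
    """Draw one configuration uniformly (or log-uniformly) from the box."""
    config = {}
    for name, (low, high, scale) in box.items():
        if scale == "log":
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            config[name] = rng.uniform(low, high)
    return config


def toy_validation_loss(config):
    """Stand-in objective (an assumption); replace with a real training run."""
    lr_term = (math.log10(config["learning_rate"]) + 3.0) ** 2
    dropout_term = (config["dropout"] - 0.2) ** 2
    return lr_term + dropout_term


def random_search(objective, box, num_evals, seed=0):
    """Evaluate num_evals random configurations and keep the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_evals):
        config = sample_config(box, rng)
        trials.append((objective(config), config))
    best_loss, best_config = min(trials, key=lambda trial: trial[0])
    # Return what should be reported alongside the result:
    # the method, the bounding box, and the number of evaluations.
    return {
        "method": "random search",
        "bounding_box": box,
        "num_evals": num_evals,
        "best_loss": best_loss,
        "best_config": best_config,
    }


result = random_search(toy_validation_loss, BOUNDING_BOX, num_evals=50)
print(result)
```

The returned dictionary bundles the best configuration with the tuning method, the bounding box, and the evaluation budget, so those details are at hand when writing up results. Fixing the seed makes the search reproducible.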