====== Normalization ======
Normalization can improve the optimizer's ability to train a neural network. There are two main categories of normalization procedures: activation normalization and weight normalization ([[https://arxiv.org/pdf/2003.07845.pdf|Shen 2020]]).
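
To make the distinction concrete, the sketch below contrasts the two categories (a minimal NumPy illustration; the toy shapes and epsilon are assumptions, not taken from the linked papers): activation normalization transforms the values flowing through the network, while weight normalization transforms the parameters before they are applied.

<code python>
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 64))   # toy batch: 32 examples, 64 features
W = rng.normal(size=(64, 16))   # toy weight matrix of one linear layer

# Activation normalization: standardize the activations themselves,
# here per feature over the batch (as batch normalization does).
x_norm = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + 1e-5)

# Weight normalization: rescale the parameters instead, here each
# column of W to unit norm, and only then compute the activations.
# (Full weight-normalization schemes also learn a per-column scale;
# see the sketch under Weight Normalization Schemes below.)
W_norm = W / np.linalg.norm(W, axis=0)
h = x @ W_norm
</code>
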
===== Overviews =====
  * Blog post: [[https://medium.com/techspace-usict/normalization-techniques-in-deep-neural-networks-9121bf100d8|2019 - Normalization Techniques in Deep Neural Networks]]
  
===== Activation Normalization Schemes =====
==== Batch Normalization ====
  * [[https://arxiv.org/pdf/1510.01378.pdf|Laurent et al 2015 - Batch Normalized Recurrent Neural Networks]]
  * [[https://arxiv.org/pdf/1603.09025.pdf|Cooijmans et al 2016 - Recurrent Batch Normalization]]
  * [[https://arxiv.org/pdf/1806.02375.pdf|Bjorck et al 2018 - Understanding Batch Normalization]]. See also section 3 of [[https://arxiv.org/pdf/2002.10444.pdf|De & Smith 2020 - Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks]] for a different perspective.
  * [[https://arxiv.org/pdf/2003.07845.pdf|Shen et al 2020 - PowerNorm: Rethinking Batch Normalization in Transformers]]
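
As a common reference point for the batch-normalization papers above, here is a minimal sketch of the training-time forward pass (NumPy; the epsilon and the learnable gamma/beta follow the usual convention rather than any one paper):

<code python>
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization over the batch axis.

    x: (batch, features). At test time, running averages of mean and
    variance replace the per-batch statistics computed here.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardize each feature
    return gamma * x_hat + beta              # learnable scale and shift

x = np.random.default_rng(1).normal(size=(8, 4))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
</code>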
  
==== Layer Normalization ====
  * [[https://arxiv.org/pdf/1607.06450.pdf|Layer Normalization]]
  * [[https://arxiv.org/pdf/1910.07467.pdf|RMSNorm]]. An improvement to layer normalization: computationally more efficient, with improved invariance properties. Shown to work well for Transformers by [[https://arxiv.org/pdf/2102.11972.pdf|Narang et al 2021]].
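
To see where RMSNorm saves work, the sketch below puts the two per-example computations side by side (NumPy; shapes and epsilon are illustrative assumptions): RMSNorm drops layer normalization's mean-centering and bias, dividing by the root mean square alone, which also makes it invariant to rescaling of its input.

<code python>
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Center and scale each example over its feature axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def rms_norm(x, gamma, eps=1e-5):
    # No mean subtraction, no bias: divide by the root mean square only.
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return gamma * x / rms

x = np.random.default_rng(2).normal(size=(8, 4))
g = np.ones(4)
y_ln = layer_norm(x, g, beta=np.zeros(4))
y_rms = rms_norm(x, g)
# Rescaling the input leaves the RMSNorm output (nearly) unchanged;
# the tiny discrepancy comes from the eps term.
assert np.allclose(rms_norm(10.0 * x, g), y_rms, atol=1e-3)
</code>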
  
===== Weight Normalization Schemes =====
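A canonical example in this category is the reparameterization of [[https://arxiv.org/pdf/1602.07868.pdf|Salimans & Kingma 2016 - Weight Normalization]], which writes each weight vector as a learned scale times a learned direction, w = g * v / ||v||, so length and direction are optimized separately. A minimal sketch (NumPy; the toy shapes are assumptions):

<code python>
import numpy as np

def weight_norm(v, g):
    """Weight-normalized weights: w = g * v / ||v||.

    v: (in_features, out_features) direction parameters;
    g: (out_features,) per-unit scale; norms are taken per column.
    """
    return g * v / np.linalg.norm(v, axis=0)

rng = np.random.default_rng(3)
v = rng.normal(size=(64, 16))
g = np.ones(16)
x = rng.normal(size=(32, 64))
h = x @ weight_norm(v, g)  # one weight-normalized linear layer
</code>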