====== Neural Network Tricks ======

    * [[Curriculum Learning]]
    * Overcoming [[Catastrophic Forgetting]]
    * Adjust the batch size, or use gradient accumulation (see [[https://kozodoi.me/blog/20210219/gradient-accumulation|this blog]], for example) to simulate larger batch sizes; a sketch appears after this list
    * Try a different [[optimizers#modern_deep_learning_optimizers|optimizer]], such as [[https://arxiv.org/pdf/1908.03265.pdf|RAdam]] (second sketch after this list)
    * Adjust [[https://arxiv.org/pdf/2011.02150.pdf|epsilon]] in Adam (also shown in the second sketch)
  * Fine-tuning Specific Tricks
    * [[https://aclanthology.org/2022.acl-short.76/|NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better]]: before fine-tuning, adding a small amount of uniform noise to each weight matrix, scaled by that matrix's standard deviation, can improve performance (third sketch after this list)
  * Regularization Tricks (see [[Regularization]])
    * [[Regularization#Dropout]]
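
A minimal sketch of gradient accumulation in PyTorch, assuming a standard classification training loop; ''model'', ''loader'', and ''optimizer'' are placeholders, and ''accum_steps'' controls how many micro-batches are combined into one effective batch.

<code python>
import torch

def train_epoch(model, loader, optimizer, accum_steps=4):
    """Simulate an effective batch size of accum_steps * loader.batch_size."""
    criterion = torch.nn.CrossEntropyLoss()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        loss = criterion(model(inputs), targets)
        # Scale the loss so the accumulated gradient matches what one
        # large batch would have produced.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()       # one update per accum_steps micro-batches
            optimizer.zero_grad()
</code>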
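
Changing the optimizer or Adam's epsilon is a one-line swap in PyTorch; a sketch, assuming ''model'' is already defined (''torch.optim.RAdam'' has been built in since PyTorch 1.10, and the learning rates here are illustrative).

<code python>
import torch

# Adam with a larger epsilon than the 1e-8 default; a small epsilon can
# produce very large updates when the second-moment estimate is tiny.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-6)

# Or swap in a different optimizer entirely, e.g. RAdam:
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3)
</code>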
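
A sketch of the NoisyTune perturbation, assuming a PyTorch model; ''noise_lambda'' corresponds to the paper's scaling hyperparameter, though the function name and default value here are illustrative.

<code python>
import torch

def noisytune(model, noise_lambda=0.15):
    """Add uniform noise, scaled by each parameter matrix's standard
    deviation, to a pretrained model before fine-tuning."""
    with torch.no_grad():
        for param in model.parameters():
            if param.numel() <= 1:
                continue  # std() is undefined for single-element tensors
            # U(-noise_lambda/2, noise_lambda/2), scaled by the matrix's std
            noise = (torch.rand_like(param) - 0.5) * noise_lambda
            param.add_(noise * param.std())
</code>

Call ''noisytune(model)'' once, after loading the pretrained checkpoint and before the fine-tuning loop begins.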