nlp:experimental_method

For an overview of applying tests of statistical significance to NLP, see:
  * **NLP 203 slides on statistical significance**: [[https://drive.google.com/file/d/1e4qtAgF_xAtMUSR7xLKzEdaRT9tOzizo/view|Spring 2021]]
  * Section 11.3 from [[http://www.phontron.com/class/mtandseq2seq2018/assets/slides/mt-fall2018.chapter11.pdf|here]] (applied to MT, but the same techniques are used elsewhere in NLP)
  * [[https://cs.stanford.edu/people/wmorgan/sigtest.pdf|Slides from Stanford NLP Group]]
  * [[https://aclanthology.org/P19-1266.pdf|Dror et al 2019 - Deep Dominance - How to Properly Compare Deep Neural Models]] Caveat: some researchers have advocated tuning the random seed as a hyperparameter; see [[nlp:experimental_method#Effects of the Random Seed]]
  * [[https://arxiv.org/pdf/2204.06815.pdf|Ulmer et al 2022 - Deep-Significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks]]
  * [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html|Wilcoxon Signed-Rank Test Docs in SciPy]]. An issue to consider is how to handle outcomes in which systems A and B produce the same prediction/score; see the ''zero_method'' parameter and its associated links.
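A minimal sketch of running this test on paired per-example scores with ''scipy.stats.wilcoxon'': the score values below are invented for illustration, not results from any real system, and the choice of ''zero_method'' is exactly the tie-handling decision mentioned above.

```python
# Hedged sketch: comparing two systems' per-example scores with the
# Wilcoxon signed-rank test. All score values are made-up illustration data.
from scipy.stats import wilcoxon

# Hypothetical per-example scores for systems A and B on the same test set.
scores_a = [0.71, 0.64, 0.80, 0.55, 0.90, 0.62, 0.75, 0.68, 0.73, 0.59,
            0.66, 0.81, 0.70, 0.58, 0.77, 0.69, 0.74, 0.63, 0.85, 0.60]
scores_b = [0.69, 0.64, 0.76, 0.50, 0.88, 0.62, 0.70, 0.66, 0.71, 0.55,
            0.61, 0.79, 0.72, 0.54, 0.75, 0.69, 0.70, 0.61, 0.82, 0.57]

# zero_method controls how zero differences (ties between A and B) count:
# "wilcox" discards them, "pratt" keeps them in the ranking,
# "zsplit" splits their ranks between the positive and negative sums.
stat, p = wilcoxon(scores_a, scores_b, zero_method="pratt")
print(f"W={stat}, p={p:.4f}")
```

Running with ''zero_method="wilcox"'' instead would drop the tied examples entirely, which can change the p-value when many examples are scored identically by both systems.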
  
==== Bootstrap Resampling and Permutation Tests ====
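As a concrete illustration of the paired bootstrap idea, here is a short sketch in plain Python; the correctness vectors and resample count are made-up assumptions, not data from any of the linked papers.

```python
# Hedged sketch of paired bootstrap resampling for comparing two systems.
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical per-example correctness (1 = correct) for systems A and B
# on the same 20-example test set.
correct_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
correct_b = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0]

n = len(correct_a)
observed_delta = (sum(correct_a) - sum(correct_b)) / n  # accuracy gap A - B

# Resample test-set indices with replacement and count how often
# A's observed accuracy advantage disappears in the resampled set.
num_resamples = 10_000
wins_for_b = 0
for _ in range(num_resamples):
    idx = [random.randrange(n) for _ in range(n)]
    delta = sum(correct_a[i] - correct_b[i] for i in idx) / n
    if delta <= 0:
        wins_for_b += 1

p = wins_for_b / num_resamples  # rough one-sided bootstrap p-value
print(f"delta={observed_delta:.3f}, p={p:.4f}")
```

A permutation test differs in that, instead of resampling examples, it randomly swaps the A/B labels within each pair and asks how often the shuffled gap exceeds the observed one.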
  * Model cards
    * [[https://arxiv.org/pdf/1810.03993.pdf|Mitchell et al 2018 - Model Cards for Model Reporting]]
    * Examples: An AllenNLP [[https://demo.allennlp.org/reading-comprehension/transformer-qa|model card]], InstructGPT [[https://github.com/openai/following-instructions-human-feedback/blob/main/model-card.md|model card]]
  
===== Other Topics in Experimental Design =====