nlp:experimental_method

For an overview of applying tests of statistical significance to NLP, see:
  * **NLP 203 slides on statistical significance**: [[https://drive.google.com/file/d/1e4qtAgF_xAtMUSR7xLKzEdaRT9tOzizo/view|Spring 2021]]
  * Section 11.3 from [[http://www.phontron.com/class/mtandseq2seq2018/assets/slides/mt-fall2018.chapter11.pdf|here]] (applied to MT, but the same techniques are used elsewhere in NLP)
  * [[https://cs.stanford.edu/people/wmorgan/sigtest.pdf|Slides from Stanford NLP Group]]
  * [[https://aclanthology.org/P19-1266.pdf|Dror et al 2019 - Deep Dominance - How to Properly Compare Deep Neural Models]] Caveat: some researchers have advocated tuning the random seed as a hyperparameter; see [[nlp:experimental_method#Effects of the Random Seed]]
  * [[https://arxiv.org/pdf/2204.06815.pdf|Ulmer et al 2022 - Deep-Significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks]]
  * [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html|Wilcoxon Signed-Rank Test Docs in SciPy]]. An issue to consider is how to handle outcomes in which systems A and B produce the same prediction/score; see the ''zero_method'' parameter and its associated links.
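A minimal sketch of running this test on paired per-example scores with ''scipy.stats.wilcoxon'': the score values below are invented for illustration, not results from any real system, and the choice of ''zero_method'' is exactly the tie-handling decision mentioned above.

```python
# Hedged sketch: comparing two systems' per-example scores with the
# Wilcoxon signed-rank test. All score values are made-up illustration data.
from scipy.stats import wilcoxon

# Hypothetical per-example scores for systems A and B on the same test set.
scores_a = [0.71, 0.64, 0.80, 0.55, 0.90, 0.62, 0.75, 0.68, 0.73, 0.59,
            0.66, 0.81, 0.70, 0.58, 0.77, 0.69, 0.74, 0.63, 0.85, 0.60]
scores_b = [0.69, 0.64, 0.76, 0.50, 0.88, 0.62, 0.70, 0.66, 0.71, 0.55,
            0.61, 0.79, 0.72, 0.54, 0.75, 0.69, 0.70, 0.61, 0.82, 0.57]

# zero_method controls how zero differences (ties between A and B) count:
# "wilcox" discards them, "pratt" keeps them in the ranking,
# "zsplit" splits their ranks between the positive and negative sums.
stat, p = wilcoxon(scores_a, scores_b, zero_method="pratt")
print(f"W={stat}, p={p:.4f}")
```

Running with ''zero_method="wilcox"'' instead would drop the tied examples entirely, which can change the p-value when many examples are scored identically by both systems.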
  
==== Bootstrap Resampling and Permutation Tests ====
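As a concrete illustration of the paired bootstrap idea, here is a short sketch in plain Python; the correctness vectors and resample count are made-up assumptions, not data from any of the linked papers.

```python
# Hedged sketch of paired bootstrap resampling for comparing two systems.
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical per-example correctness (1 = correct) for systems A and B
# on the same 20-example test set.
correct_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
correct_b = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0]

n = len(correct_a)
observed_delta = (sum(correct_a) - sum(correct_b)) / n  # accuracy gap A - B

# Resample test-set indices with replacement and count how often
# A's observed accuracy advantage disappears in the resampled set.
num_resamples = 10_000
wins_for_b = 0
for _ in range(num_resamples):
    idx = [random.randrange(n) for _ in range(n)]
    delta = sum(correct_a[i] - correct_b[i] for i in idx) / n
    if delta <= 0:
        wins_for_b += 1

p = wins_for_b / num_resamples  # rough one-sided bootstrap p-value
print(f"delta={observed_delta:.3f}, p={p:.4f}")
```

A permutation test differs in that, instead of resampling examples, it randomly swaps the A/B labels within each pair and asks how often the shuffled gap exceeds the observed one.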
  * Model cards
    * [[https://arxiv.org/pdf/1810.03993.pdf|Mitchell et al 2018 - Model Cards for Model Reporting]]
    * Examples: An AllenNLP [[https://demo.allennlp.org/reading-comprehension/transformer-qa|model card]], InstructGPT [[https://github.com/openai/following-instructions-human-feedback/blob/main/model-card.md|model card]]
  
===== Other Topics in Experimental Design =====