
Experimental Method and Reproducibility

Reproducibility

Statistical Significance

See also Statistical Tests.

For an overview of applying tests of statistical significance to NLP, see:

Bootstrap Resampling and Permutation Tests

Papers

Software

Below is from an email I sent to a student on Jan 20, 2019:

It is recommended to use a non-parametric test, such as the permutation test or the paired bootstrap, rather than a t-test, since non-parametric tests make no assumptions about the distribution of the scores. An example of how to do this (use the R package mentioned at the end of the tutorial):

https://thomasleeper.com/Rcourse/Tutorials/permutationtests.html

Other references: https://cs.stanford.edu/people/wmorgan/sigtest.pdf http://www.aclweb.org/anthology/D/D12/D12-1091.pdf
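
As a rough illustration of the permutation test recommended above, here is a minimal Python sketch (not the R code from the tutorial). It assumes each system has a handful of scores, one per run, and that the test statistic is the difference in mean score. The function name `permutation_test` and the example scores are hypothetical. Because there are only a few runs, the sketch enumerates every way of splitting the pooled scores into two groups instead of sampling random permutations.

```python
from itertools import combinations
from statistics import mean


def permutation_test(scores_a, scores_b):
    """Exact two-sided permutation test on the difference in mean score.

    scores_a and scores_b hold one score per run for systems A and B.
    Returns the fraction of all relabelings of the pooled scores whose
    absolute mean difference is at least as large as the observed one.
    """
    pooled = list(scores_a) + list(scores_b)
    n_a, n_b = len(scores_a), len(scores_b)
    observed = abs(mean(scores_a) - mean(scores_b))
    total = sum(pooled)

    at_least_as_extreme = 0
    n_partitions = 0
    # Every choice of n_a indices is one way to relabel the runs as "A".
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        n_partitions += 1
        # Small tolerance so floating-point noise does not drop exact ties.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_partitions


if __name__ == "__main__":
    # Hypothetical accuracies from five runs of each system.
    system_a = [71.2, 71.5, 70.9, 71.8, 71.1]
    system_b = [70.1, 70.4, 69.8, 70.3, 70.0]
    print(f"p = {permutation_test(system_a, system_b):.4f}")
```

One consequence of using so few runs: with n runs per system there are only C(2n, n) relabelings, so the smallest two-sided p-value this test can return is 2 / C(2n, n), which is 0.1 for 3 runs per system and about 0.008 for 5 runs per system.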

There are other tests which also re-sample the test data, which is necessary if the test data is small. A script to do all this is:

https://github.com/mgormley/sigtest
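
To make the idea of re-sampling the test data concrete, here is a minimal Python sketch of a paired bootstrap. It is not the linked script; the function name `paired_bootstrap` and its parameters are hypothetical. It assumes the metric is an average of per-example scores (for example, accuracy with 0/1 scores) and that both systems were scored on the same test examples in the same order.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=10000, seed=0):
    """Paired bootstrap over test examples.

    scores_a[i] and scores_b[i] are the scores of systems A and B on the
    same test example i. Returns the fraction of resampled test sets on
    which A does NOT score higher than B; a small value suggests that
    A's advantage is unlikely to be an artifact of this particular test set.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same examples")

    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    n = len(scores_a)
    # Pairing: keep each example's two scores together as one difference.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]

    not_better = 0
    for _ in range(n_resamples):
        # Draw a test set of the same size, with replacement.
        sample = rng.choices(diffs, k=n)
        if sum(sample) <= 0:
            not_better += 1
    return not_better / n_resamples


if __name__ == "__main__":
    # Hypothetical 0/1 correctness on 100 test examples.
    system_a = [1] * 80 + [0] * 20
    system_b = [1] * 60 + [0] * 40
    print(f"p = {paired_bootstrap(system_a, system_b):.4f}")
```

For corpus-level metrics such as BLEU, which are not a simple average over examples, the metric would have to be recomputed on each resampled test set rather than summing per-example differences.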

You only need 3-5 different runs for each experiment. If the difference is not significant and you want to show that it is, you can do more runs.

Significance testing can be daunting since there are so many methods. To keep it simple, I recommend just doing 3-5 runs for each experiment and using the permutation test in the first link. You can also report the sample standard deviation as error bars in the table (you can do this with just 3-5 samples).
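
For the error bars, a minimal Python sketch using only the standard library is shown here. The helper names `summarize_runs` and `format_cell` are hypothetical, and the scores are made up. Note that `statistics.stdev` computes the sample standard deviation (dividing by n - 1), which is the one meant above.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) over repeated runs."""
    return mean(scores), stdev(scores)


def format_cell(scores, digits=1):
    """Format the runs of one experiment as a 'mean +/- std' table cell."""
    m, s = summarize_runs(scores)
    return f"{m:.{digits}f} +/- {s:.{digits}f}"


if __name__ == "__main__":
    # Hypothetical scores from three runs of one experiment.
    print(format_cell([70.0, 71.0, 72.0]))
```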

Reproducibility Checklists, Datasheets and Model Cards

Other Topics in Experimental Design

Effects of the Random Seed

For many common tasks and neural architectures, the choice of random seed has only a small effect on the accuracy or BLEU score (a standard deviation across random seeds of, say, 0.1-0.5). For this reason, many software packages fix the random seed in advance. However, for some tasks or models, the random seed can have a larger effect. For example, Rongwen has found that it has a large effect on neural models for Compositional Generalization.
Overview: 2021 - We Need to Talk About Random Seeds. Advocates tuning the random seed.

Resources and Tutorials

nlp/experimental_method.txt · Last modified: 2023/06/15 07:36 by 127.0.0.1
