nlp:dataset_creation

Differences

This shows you the differences between two versions of the page.

nlp:dataset_creation [2023/12/10 06:01] – [Annotation] jmflanig
nlp:dataset_creation [2023/12/10 06:18] (current) – [Building Your own Annotation Tool] jmflanig
Line 20:
   * Software
     * R's [[https://cran.r-project.org/web/packages/irr/index.html|Inter-Annotator Reliability Package]] (IRR) is great. [[https://cran.r-project.org/web/packages/irr/irr.pdf|docs]] [[https://www.andywills.info/rminr/irr.html|example]]
 +
 +
 +==== Building Your Own Annotation Tool ====
 +  * For simple projects, annotation can be done in a spreadsheet.
 +  * When building your own annotation tool, here are some things to consider:
 +    * The purpose of the tool is to make annotation faster.  Think carefully about which interface will be fastest for trained annotators.
 +    * To speed up development, use whatever language and API you are most familiar with or find easiest.
 +    * Think very carefully about ways to eliminate unnecessary mouse clicks, typing, and reading.  Every mouse click counts.  Aggressively remove any unnecessary step, such as pressing Escape or Enter to save.  Instead, save automatically when the annotator moves to the next example.
 +    * Plan on iterating on the tool.  You will need to try it out and change it based on your experience.
 +    * It doesn't need to be perfect; it just needs to be fast to use.  Bugs in the annotation tool are acceptable if it is not widely used and they don't slow down annotation.
 +    * Don't make it full-featured.  You only need the features that make annotation fast.
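
The advice about saving automatically can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a recommended design: the class name `AnnotationSession`, the `KEYMAP` label set, and the JSON save file are all hypothetical choices made for this example. The point it shows is that recording a label, saving, and advancing happen in a single annotator action.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical one-key label set; replace with your own annotation scheme.
KEYMAP = {"y": "positive", "n": "negative", "s": "skip"}


class AnnotationSession:
    """Holds examples and labels; saves every time the annotator advances."""

    def __init__(self, examples, save_path):
        self.examples = list(examples)
        self.save_path = Path(save_path)
        self.labels = {}
        # Resume from an earlier session if a save file already exists.
        if self.save_path.exists():
            saved = json.loads(self.save_path.read_text(encoding="utf-8"))
            self.labels = {int(k): v for k, v in saved.items()}
        # Start at the first example that has no label yet.
        self.index = next(
            (i for i in range(len(self.examples)) if i not in self.labels),
            len(self.examples),
        )

    @property
    def done(self):
        return self.index >= len(self.examples)

    def current(self):
        return None if self.done else self.examples[self.index]

    def label_and_advance(self, label):
        """Record a label, save it, and move on, all in one step."""
        if self.done:
            raise IndexError("no examples left to annotate")
        self.labels[self.index] = label
        self._save()
        self.index += 1

    def _save(self):
        # Write to a temporary file, then rename it over the save file,
        # so a crash mid-write cannot corrupt the labels collected so far.
        fd, tmp_name = tempfile.mkstemp(dir=self.save_path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.labels, f)
        os.replace(tmp_name, self.save_path)


def run(session):
    """Minimal terminal loop; unknown keys are ignored and the example is shown again."""
    while not session.done:
        print(session.current())
        key = input("[y/n/s] > ").strip().lower()
        if key in KEYMAP:
            session.label_and_advance(KEYMAP[key])
```

Calling `run(AnnotationSession(examples, "labels.json"))` starts a session that resumes where the previous one stopped. Note that `input()` still requires pressing Enter, which is exactly the kind of extra keystroke the list above warns against; it is used here only to keep the sketch portable, and a real tool would more likely read single keypresses through a GUI or terminal library.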
  
 ===== Dataset and Data Selection Issues =====
nlp/dataset_creation.1702188097.txt.gz · Last modified: 2023/12/10 06:01 by jmflanig
