ETC: Encoding Long and Structured Inputs in Transformers


TL;DR: ETC encodes long inputs with global-local attention and represents structure by combining relative position encodings with flexible attention masking. It also uses a Contrastive Predictive Coding (CPC) pre-training objective that teaches hierarchical global tokens to summarize the structures they stand for.

Key Points

  • Global-local attention: input tokens are divided into two sets
    • Global: can attend to all input tokens
    • Long: attends locally to nearby tokens within a fixed radius, plus all global tokens
    • Related to Longformer's sliding-window + global attention.
  • Relative position representations: allow encoding arbitrary structure relations between input tokens.
  • Contrastive predictive coding (CPC): pre-training objective that helps the model learn how to use global summary tokens.
  • The experiments use benchmarks with long/structured data
    • Question answering: Natural Questions (NQ), HotpotQA, WikiHop
    • Keyphrase extraction: OpenKP
  • Pre-trained on the original BERT datasets, filtering out documents with fewer than 7 sentences, with two objectives:
    • MLM (masked language modeling)
    • CPC
    • Weights are lifted directly from RoBERTa for initialization (so ETC uses RoBERTa's vocabulary)
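The global-local attention pattern above can be illustrated as a boolean attention mask. This is a minimal sketch, not the paper's implementation: the function name and the `radius` parameter are illustrative, and the real model splits the mask into separate g2g/g2l/l2g/l2l attention pieces for efficiency.

```python
import numpy as np

def etc_attention_mask(num_global: int, num_long: int, radius: int) -> np.ndarray:
    """Boolean mask for ETC-style global-local attention.

    Rows are queries, columns are keys; True means "may attend".
    Token layout: [global tokens | long tokens].
    """
    n = num_global + num_long
    mask = np.zeros((n, n), dtype=bool)

    # g2g and g2l: global tokens attend to every token.
    mask[:num_global, :] = True

    # l2g: long tokens attend to all global tokens.
    mask[num_global:, :num_global] = True

    # l2l: long tokens attend locally, within a fixed radius.
    for i in range(num_long):
        lo = max(0, i - radius)
        hi = min(num_long, i + radius + 1)
        mask[num_global + i, num_global + lo:num_global + hi] = True

    return mask

mask = etc_attention_mask(num_global=2, num_long=6, radius=1)
```

Because l2l attention is restricted to a fixed radius, its cost grows linearly with input length instead of quadratically, which is what makes long inputs tractable.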
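The relative position representations work by assigning each query-key pair a label id, with distances clipped to a maximum radius (following Shaw et al.). A minimal sketch of the id computation, with illustrative names:

```python
def relative_position_id(i: int, j: int, max_distance: int) -> int:
    """Clipped relative position label between query i and key j.

    Distances beyond +/- max_distance share a single bucket, so the
    vocabulary of position labels has size 2 * max_distance + 1.
    ETC reuses this label mechanism to encode arbitrary pairwise
    relations (e.g. "long token j belongs to the sentence summarized
    by global token i") by reserving additional label ids.
    """
    d = max(-max_distance, min(max_distance, j - i))
    return d + max_distance  # shift into the range [0, 2 * max_distance]

rel_id = relative_position_id(3, 7, max_distance=4)  # distance +4 maps to id 8
```

Because the labels are relative rather than absolute, the same learned embeddings apply at any input length, and extra label ids can express structural edges that absolute positions cannot.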
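The CPC objective can be sketched as an InfoNCE-style contrastive loss: the model predicts a summary vector for a masked sentence, and the true summary of that sentence must score higher than the summaries of other sentences. The function below is an illustrative NumPy sketch, not the paper's implementation; in-batch rows serve as negatives.

```python
import numpy as np

def cpc_infonce_loss(predicted: np.ndarray, targets: np.ndarray) -> float:
    """InfoNCE-style contrastive loss (illustrative sketch of CPC).

    predicted: (n, d) predicted summary vectors for n masked sentences
    targets:   (n, d) true summary vectors; row i is the positive for
               predicted[i], and every other row serves as a negative.
    """
    # Cosine-similarity logits between each prediction and all targets.
    p = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = p @ t.T  # (n, n)

    # Cross-entropy with the diagonal (the true pairing) as the label.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Perfectly matched orthogonal summaries give a below-chance loss.
loss = cpc_infonce_loss(np.eye(3), np.eye(3))
```

Training on this objective is what pushes each global token to become a useful summary of its span, since the summary must be distinguishable from those of other spans.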