ETC: Encoding Long and Structured Inputs in Transformers
TL;DR: ETC encodes long inputs with global-local attention and represents structure by combining relative position representations with flexible attention masking. It also uses a CPC pre-training objective so that global tokens learn to summarize their corresponding structures (e.g., sentences).
Key Points
- Global-local attention: the input tokens are divided into two sets
- Global: can attend to all input tokens
- Long: can attend only to nearby long tokens within a fixed local radius, plus the global tokens
- Related to Longformer.
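The split above can be expressed as a boolean attention mask. The sketch below is a hypothetical, simplified construction (the function name and token ordering are assumptions, not the paper's implementation), showing the four attention pieces: global-to-all, all-to-global, and local long-to-long.

```python
import numpy as np

def global_local_mask(n_global: int, n_long: int, radius: int) -> np.ndarray:
    """Boolean mask for ETC-style global-local attention (illustrative sketch).

    Token order is assumed to be [global tokens | long tokens].
    mask[i, j] is True if token i may attend to token j.
    """
    n = n_global + n_long
    mask = np.zeros((n, n), dtype=bool)
    # Global tokens attend to every token (g2g and g2l).
    mask[:n_global, :] = True
    # Every token attends to the global tokens (l2g).
    mask[:, :n_global] = True
    # Long tokens attend to long tokens within a local radius (l2l).
    for i in range(n_long):
        lo = max(0, i - radius)
        hi = min(n_long, i + radius + 1)
        mask[n_global + i, n_global + lo:n_global + hi] = True
    return mask
```

Because long tokens only attend within a fixed radius, the cost of the long-to-long piece grows linearly with input length rather than quadratically, which is what makes long inputs tractable.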
- Relative position representations: allow encoding arbitrary structural relations between input tokens.
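A minimal sketch of the underlying idea (clipped relative positions, in the style of Shaw et al.): each token pair (i, j) is mapped to a relation id, and each id gets a learnable representation. The function name and bucketing are assumptions for illustration; ETC generalizes this by assigning extra relation ids to structural relations such as sentence membership.

```python
import numpy as np

def relative_position_ids(n: int, max_distance: int) -> np.ndarray:
    """Map each (i, j) token pair to a relative-position relation id.

    Distances are clipped to [-max_distance, max_distance], giving
    2 * max_distance + 1 relation types, each with its own learnable
    embedding. Illustrative sketch, not the paper's exact code.
    """
    idx = np.arange(n)
    rel = idx[None, :] - idx[:, None]           # signed distance j - i
    rel = np.clip(rel, -max_distance, max_distance)
    return rel + max_distance                   # shift to [0, 2 * max_distance]
```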
- Contrastive predictive coding (CPC): pre-training objective that helps the model learn how to use global summary tokens.
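The CPC objective can be sketched as an InfoNCE loss: the model's prediction for a masked sentence's global summary token should score its true summary higher than the summaries of other sentences. The standalone implementation below is a hypothetical sketch of that loss, not ETC's actual training code.

```python
import numpy as np

def info_nce(predicted: np.ndarray, targets: np.ndarray,
             temperature: float = 1.0) -> float:
    """InfoNCE loss as used in CPC-style pre-training (sketch).

    predicted: (n, d) predictions for n masked sentences' summaries.
    targets:   (n, d) summary vectors of the true sentences.
    Row i of `predicted` is treated as a classifier over the n targets,
    with target i as the correct class.
    """
    # Normalize so the dot products behave like cosine similarities.
    p = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = p @ t.T / temperature              # (n, n) similarity matrix
    # Numerically stable cross-entropy with the diagonal as the label.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```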
- The experiments use benchmarks with long and/or structured inputs
- Question answering: Natural Questions (NQ), HotpotQA, WikiHop
- Keyphrase extraction: OpenKP
- It was pre-trained on the original BERT datasets, filtering out documents with fewer than 7 sentences.
- MLM (masked language modeling)
- CPC
- Weights lifted directly from RoBERTa (reusing RoBERTa's vocabulary)