ETC: Encoding Long and Structured Inputs in Transformers
TL;DR: ETC encodes long inputs with global-local attention and represents structured inputs by combining relative position encodings with flexible attention masking. It also employs a CPC (Contrastive Predictive Coding) pre-training objective so that hierarchical global tokens learn to summarize the structure beneath them.
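To make the global-local pattern concrete, here is a minimal NumPy sketch of the four attention pieces ETC distinguishes (g2g, g2l, l2g, l2l): global tokens attend everywhere, while long-input tokens attend to all global tokens plus a local window. The function name `etc_attention_mask` and the `radius` parameter are illustrative, not from the paper's code.

```python
import numpy as np

def etc_attention_mask(n_global: int, n_long: int, radius: int) -> np.ndarray:
    """Boolean attention mask over [global tokens | long-input tokens].

    True at (i, j) means token i may attend to token j.
    """
    n = n_global + n_long
    mask = np.zeros((n, n), dtype=bool)
    # g2g and g2l: global tokens attend to every token.
    mask[:n_global, :] = True
    # l2g: long-input tokens attend to all global tokens.
    mask[n_global:, :n_global] = True
    # l2l: long-input tokens attend only within a local window of +/- radius.
    for i in range(n_long):
        lo = max(0, i - radius)
        hi = min(n_long, i + radius + 1)
        mask[n_global + i, n_global + lo : n_global + hi] = True
    return mask

# Example: 2 global tokens summarizing 6 long-input tokens, window radius 1.
mask = etc_attention_mask(n_global=2, n_long=6, radius=1)
```

The l2l cost grows linearly in the input length (each long token sees at most 2·radius + 1 neighbors), which is what lets ETC scale past the quadratic limit of full attention; structure is then expressed by further masking and by relative position labels on these same pairs.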

Key Points