Natural Language Processing • Attention Mechanisms • 8 methods
The original self-attention component in the Transformer architecture has $O\left(n^{2}\right)$ time and memory complexity, where $n$ is the input sequence length, and therefore does not scale efficiently to long inputs. Attention pattern methods aim to reduce this complexity by restricting each query to attend to only a subset of positions rather than the full sequence.
Method | Year | Papers |
---|---|---|
 | 2019 | 1324 |
 | 2019 | 1323 |
 | 2020 | 84 |
 | 2020 | 78 |
 | 2020 | 77 |
 | 2020 | 14 |
 | 2022 | 13 |
 | 2020 | 7 |
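To make the idea above concrete, here is a minimal sketch of one common attention pattern, sliding-window (local) attention, written in plain PyTorch. The function name, window size, and tensor shapes are illustrative assumptions, not the implementation of any specific method listed above.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    """Self-attention restricted to a local window of +/- `window` positions.

    q, k, v: (batch, seq_len, d) tensors. Instead of letting every query
    attend to all n keys (O(n^2) query-key pairs), each query only attends
    to its 2*window + 1 nearest positions, which is the core idea behind
    sliding-window attention patterns.
    """
    n, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d**0.5              # (batch, n, n)

    # Band mask: position i may attend to j only if |i - j| <= window.
    idx = torch.arange(n)
    band = (idx[None, :] - idx[:, None]).abs() <= window   # (n, n) bool
    scores = scores.masked_fill(~band, float("-inf"))

    return F.softmax(scores, dim=-1) @ v                   # (batch, n, d)

# Toy usage: one sequence of length 8, model dim 4, window of 2.
q = k = v = torch.randn(1, 8, 4)
out = sliding_window_attention(q, k, v, window=2)
print(out.shape)  # torch.Size([1, 8, 4])
```

Note that this dense-mask version still materializes the full $n \times n$ score matrix, so it only illustrates the attention pattern; efficient implementations compute only the banded entries to realize the memory savings.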