Hierarchical Transformers Are More Efficient Language Models

Published in Findings of NAACL 2022

Hourglass is a hierarchical Transformer that uses downsampling layers to shorten the sequence and upsampling layers to restore its original length. The paper studies how this explicit hierarchy can reduce computation in long-sequence modeling.
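As a rough illustration of the downsampling and upsampling idea, here is a minimal sketch in plain Python. It assumes average pooling for shortening and simple repetition for upsampling, which are only one possible choice and not necessarily the configuration used in the paper. The function names (`shorten`, `upsample`, `add_residual`, `attention_pairs`) are hypothetical, and the sketch leaves out the attention layers themselves as well as any handling needed to keep the model autoregressive.

```python
def shorten(tokens, k):
    """Average-pool each group of k consecutive token vectors into one (assumed pooling)."""
    assert len(tokens) % k == 0, "sequence length must be divisible by k"
    dim = len(tokens[0])
    return [
        [sum(tok[d] for tok in tokens[i:i + k]) / k for d in range(dim)]
        for i in range(0, len(tokens), k)
    ]


def upsample(short, k):
    """Repeat each shortened vector k times to restore the original length."""
    return [list(vec) for vec in short for _ in range(k)]


def add_residual(full, restored):
    """Element-wise sum of the pre-shortening tokens and the upsampled tokens."""
    return [[x + y for x, y in zip(u, v)] for u, v in zip(full, restored)]


def attention_pairs(n):
    """Number of query-key pairs in full self-attention over n positions."""
    return n * n


# Toy input: 8 tokens with 2 features each, shortened by a factor of 4.
tokens = [[float(i), float(2 * i)] for i in range(8)]
k = 4

short = shorten(tokens, k)                             # length 8 -> 2
restored = add_residual(tokens, upsample(short, k))    # length 2 -> 8

print(len(tokens), len(short), len(restored))
print(attention_pairs(len(tokens)), attention_pairs(len(short)))
```

Because full self-attention compares every position with every other, layers that run on the shortened sequence handle roughly 1/k² as many query-key pairs as layers on the full sequence, which is where the computational savings in this sketch come from.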

The model is evaluated on ImageNet32 (image generation) and enwik8 (character-level language modeling), where it improves efficiency relative to standard Transformer baselines.

Links:

Recommended citation: Nawrot, P., Tworkowski, S., Tyrolski, M., Kaiser, Ł., Wu, Y., Szegedy, C., & Michalewski, H. (2022). Hierarchical transformers are more efficient language models. Findings of the Association for Computational Linguistics: NAACL 2022, 1559-1571.
Download Paper | Download Slides