This repository contains a PyTorch implementation of ALBERT (A Lite BERT). ALBERT is a transformer-based model that uses factorized embedding parameterization and cross-layer parameter sharing to reduce the number of parameters while maintaining strong performance on a range of natural language processing tasks.
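To illustrate the factorized embedding idea, here is a minimal sketch, assuming a toy PyTorch module; the class name, default sizes, and method names are illustrative and are not the repository's actual API. Instead of a single vocabulary-by-hidden embedding matrix, tokens are embedded into a small space of size E and then projected up to the hidden size H, cutting the embedding parameter count from V×H to V×E + E×H.

```python
import torch
import torch.nn as nn


class FactorizedEmbedding(nn.Module):
    """Sketch of ALBERT-style factorized embedding parameterization (illustrative only)."""

    def __init__(self, vocab_size: int = 30000, embedding_size: int = 128, hidden_size: int = 768):
        super().__init__()
        # V x E lookup table in a small embedding space
        self.word_embeddings = nn.Embedding(vocab_size, embedding_size)
        # E x H projection up to the transformer hidden size
        self.embedding_projection = nn.Linear(embedding_size, hidden_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.embedding_projection(self.word_embeddings(input_ids))


# Example: with V=30k and H=768, an unfactorized table needs ~23M parameters;
# factorizing with E=128 needs roughly 30000*128 + 128*768 ≈ 3.9M.
emb = FactorizedEmbedding()
hidden = emb(torch.randint(0, 30000, (2, 16)))  # shape: (batch=2, seq_len=16, hidden=768)
```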
The implementation is based on the following papers:
[1] Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2020). ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv:1909.11942. https://arxiv.org/abs/1909.11942
[2] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186.