BigBird: Transformers for Longer Sequences is a sparse-attention-based model that can handle longer sequences than a standard BERT.
🦅 Longer Sequences - Handles up to 4096 tokens, 8 times the 512-token limit of BERT
⏱️ Computational Efficiency - Reduces attention complexity from O(n²) to O(n) by using sparse attention instead of full attention
- Available on the 🤗 Huggingface Hub!
- We recommend `transformers>=4.11.0`, in which some issues are fixed (a PR related to the MRC issue)
- You have to use `BertTokenizer` instead of `BigBirdTokenizer` (`BertTokenizer` will be loaded automatically if you use `AutoTokenizer`)
- For detailed guidelines, see the BigBird Transformers documentation.
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")  # BigBirdModel
tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")  # BertTokenizer
```
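Continuing from the snippet above, here is a minimal sketch of encoding a long document; the placeholder text is an illustrative assumption, not part of the original usage guide.

```python
import torch

# Encode a long document; the model accepts inputs of up to 4096 tokens
long_text = "아주 긴 한국어 문서 ..."  # placeholder text (assumption)
inputs = tokenizer(long_text, max_length=4096, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Contextual token representations: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```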
For more information, see [Pretraining BigBird]
| Model | Hardware | Max len | LR | Batch | Train Step | Warmup Step |
| --- | --- | --- | --- | --- | --- | --- |
| KoBigBird-BERT-Base | TPU v3-8 | 4096 | 1e-4 | 32 | 2M | 20k |
- Trained on various data such as Everyone's Corpus, Korean Wikipedia, Common Crawl, and news data
- Uses the ITC (Internal Transformer Construction) model for pretraining (ITC vs ETC); a configuration sketch follows below
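In transformers terms, ITC-style pretraining corresponds to the block-sparse attention mode. The sketch below is only illustrative: the `block_size` and `num_random_blocks` values are the library defaults, not confirmed pretraining hyperparameters.

```python
from transformers import BigBirdConfig, BigBirdModel

# Block-sparse (ITC-style) attention; the values below are transformers
# library defaults, used here purely for illustration.
config = BigBirdConfig(
    attention_type="block_sparse",   # O(n) sparse attention
    block_size=64,
    num_random_blocks=3,
    max_position_embeddings=4096,
)
model = BigBirdModel(config)  # randomly initialized, illustration only

# Full O(n^2) attention can still be selected, e.g. for short inputs
model.set_attention_type("original_full")
```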
For more information, see [Finetune on Short Sequence Dataset]
| Model | NSMC (acc) | KLUE-NLI (acc) | KLUE-STS (pearsonr) | Korquad 1.0 (em/f1) | KLUE MRC (em/rouge-w) |
| --- | --- | --- | --- | --- | --- |
| KoELECTRA-Base-v3 | 91.13 | 86.87 | 93.14 | 85.66 / 93.94 | 59.54 / 65.64 |
| KLUE-RoBERTa-Base | 91.16 | 86.30 | 92.91 | 85.35 / 94.53 | 69.56 / 74.64 |
| KoBigBird-BERT-Base | 91.18 | 87.17 | 92.61 | 87.08 / 94.71 | 70.33 / 75.34 |
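As a hedged sketch of how short-sequence classification finetuning (e.g., a sentiment task like NSMC) could start from this checkpoint; `num_labels=2` and the example sentence are assumptions, and the classification head is untrained until finetuned.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
# Loads BigBirdForSequenceClassification; the classification head starts untrained
model = AutoModelForSequenceClassification.from_pretrained(
    "monologg/kobigbird-bert-base", num_labels=2  # num_labels is an assumption
)

inputs = tokenizer("영화가 정말 재미있었다", return_tensors="pt")  # placeholder sentence
outputs = model(**inputs)
print(outputs.logits)  # unnormalized class scores, shape (1, num_labels)
```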
For more information, see [Finetune on Long Sequence Dataset]
| Model | TyDi QA (em/f1) | Korquad 2.1 (em/f1) | Fake News (f1) | Modu Sentiment (f1-macro) |
| --- | --- | --- | --- | --- |
| KLUE-RoBERTa-Base | 76.80 / 78.58 | 55.44 / 73.02 | 95.20 | 42.61 |
| KoBigBird-BERT-Base | 79.13 / 81.30 | 67.77 / 82.03 | 98.85 | 45.42 |
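For the long-sequence MRC tasks above, this is a hedged sketch of one way to start extractive QA from the checkpoint; the question/context strings are placeholders, and the QA head is randomly initialized until finetuned.

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
# Loads BigBirdForQuestionAnswering; the QA head starts untrained
model = AutoModelForQuestionAnswering.from_pretrained("monologg/kobigbird-bert-base")

question = "질문 ..."          # placeholder question (assumption)
context = "아주 긴 문서 ..."    # placeholder long document, up to 4096 tokens
inputs = tokenizer(question, context, max_length=4096,
                   truncation="only_second", return_tensors="pt")

outputs = model(**inputs)
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
```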
- Pretraining BigBird
- Finetune on Short Sequence Dataset
- Finetune on Long Sequence Dataset
- Download Tensorflow v1 checkpoint
- GPU Benchmark result
If you use KoBigBird in any project or research, please cite it as below.
```bibtex
@software{jangwon_park_2021_5654154,
  author    = {Jangwon Park and Donggyu Kim},
  title     = {KoBigBird: Pretrained BigBird Model for Korean},
  month     = nov,
  year      = 2021,
  publisher = {Zenodo},
  version   = {1.0.0},
  doi       = {10.5281/zenodo.5654154},
  url       = {https://doi.org/10.5281/zenodo.5654154}
}
```
KoBigBird is built with Cloud TPU support from the TensorFlow Research Cloud (TFRC) program.
Also, thanks to Seyun Ahn for the nice logo :)