BigBird: Transformers for Longer Sequences is a sparse-attention-based model that can handle longer sequences than a standard BERT.
🦅 Longer Sequences - Handles up to 4096 tokens, 8 times the 512-token limit of BERT
⏱️ Computational Efficiency - Reduces attention complexity from O(n²) to O(n) by using sparse attention instead of full attention
- Available on the 🤗 Huggingface Hub!
- We recommend `transformers>=4.11.0`, in which some issues are fixed (a PR related to the MRC issue)
- You have to use `BertTokenizer` instead of `BigBirdTokenizer` (`BertTokenizer` will be loaded automatically if you use `AutoTokenizer`)
- For detailed guidelines, see the BigBird Transformers documentation.
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")  # BigBirdModel
tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")  # BertTokenizer
```
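Continuing from the snippet above, here is a minimal sketch of encoding a long document; the placeholder text is an illustrative assumption, not part of the original usage guide.

```python
import torch

# Encode a long document; the model accepts inputs of up to 4096 tokens
long_text = "아주 긴 한국어 문서 ..."  # placeholder text (assumption)
inputs = tokenizer(long_text, max_length=4096, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Contextual token representations: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```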
For more information, see [Pretraining BigBird]
| Model | Hardware | Max len | LR | Batch | Train Step | Warmup Step |
| --- | --- | --- | --- | --- | --- | --- |
| KoBigBird-BERT-Base | TPU v3-8 | 4096 | 1e-4 | 32 | 2M | 20k |
- Trained on various data such as Everyone's Corpus, Korean Wikipedia, Common Crawl, and news data
- Uses the ITC (Internal Transformer Construction) model for pretraining (ITC vs ETC); a configuration sketch follows below
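In transformers terms, ITC-style pretraining corresponds to the block-sparse attention mode. The sketch below is only illustrative: the `block_size` and `num_random_blocks` values are the library defaults, not confirmed pretraining hyperparameters.

```python
from transformers import BigBirdConfig, BigBirdModel

# Block-sparse (ITC-style) attention; the values below are transformers
# library defaults, used here purely for illustration.
config = BigBirdConfig(
    attention_type="block_sparse",   # O(n) sparse attention
    block_size=64,
    num_random_blocks=3,
    max_position_embeddings=4096,
)
model = BigBirdModel(config)  # randomly initialized, illustration only

# Full O(n^2) attention can still be selected, e.g. for short inputs
model.set_attention_type("original_full")
```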
For more information, see [Finetune on Short Sequence Dataset]
| Model | NSMC (acc) | KLUE-NLI (acc) | KLUE-STS (pearsonr) | Korquad 1.0 (em/f1) | KLUE MRC (em/rouge-w) |
| --- | --- | --- | --- | --- | --- |
| KoELECTRA-Base-v3 | 91.13 | 86.87 | 93.14 | 85.66 / 93.94 | 59.54 / 65.64 |
| KLUE-RoBERTa-Base | 91.16 | 86.30 | 92.91 | 85.35 / 94.53 | 69.56 / 74.64 |
| KoBigBird-BERT-Base | 91.18 | 87.17 | 92.61 | 87.08 / 94.71 | 70.33 / 75.34 |
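As a hedged sketch of how short-sequence classification finetuning (e.g., a sentiment task like NSMC) could start from this checkpoint; `num_labels=2` and the example sentence are assumptions, and the classification head is untrained until finetuned.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
# Loads BigBirdForSequenceClassification; the classification head starts untrained
model = AutoModelForSequenceClassification.from_pretrained(
    "monologg/kobigbird-bert-base", num_labels=2  # num_labels is an assumption
)

inputs = tokenizer("영화가 정말 재미있었다", return_tensors="pt")  # placeholder sentence
outputs = model(**inputs)
print(outputs.logits)  # unnormalized class scores, shape (1, num_labels)
```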
For more information, see [Finetune on Long Sequence Dataset]
| Model | TyDi QA (em/f1) | Korquad 2.1 (em/f1) | Fake News (f1) | Modu Sentiment (f1-macro) |
| --- | --- | --- | --- | --- |
| KLUE-RoBERTa-Base | 76.80 / 78.58 | 55.44 / 73.02 | 95.20 | 42.61 |
| KoBigBird-BERT-Base | 79.13 / 81.30 | 67.77 / 82.03 | 98.85 | 45.42 |
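For the long-sequence MRC tasks above, this is a hedged sketch of one way to start extractive QA from the checkpoint; the question/context strings are placeholders, and the QA head is randomly initialized until finetuned.

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
# Loads BigBirdForQuestionAnswering; the QA head starts untrained
model = AutoModelForQuestionAnswering.from_pretrained("monologg/kobigbird-bert-base")

question = "질문 ..."          # placeholder question (assumption)
context = "아주 긴 문서 ..."    # placeholder long document, up to 4096 tokens
inputs = tokenizer(question, context, max_length=4096,
                   truncation="only_second", return_tensors="pt")

outputs = model(**inputs)
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
```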
- Pretraining BigBird
- Finetune on Short Sequence Dataset
- Finetune on Long Sequence Dataset
- Download Tensorflow v1 checkpoint
- GPU Benchmark result
If you use KoBigBird in any project or research, please cite it as below.
```bibtex
@software{jangwon_park_2021_5654154,
  author    = {Jangwon Park and Donggyu Kim},
  title     = {KoBigBird: Pretrained BigBird Model for Korean},
  month     = nov,
  year      = 2021,
  publisher = {Zenodo},
  version   = {1.0.0},
  doi       = {10.5281/zenodo.5654154},
  url       = {https://doi.org/10.5281/zenodo.5654154}
}
```
KoBigBird is built with Cloud TPU support from the TensorFlow Research Cloud (TFRC) program.
Also, thanks to Seyun Ahn for the nice logo :)