Title | Venue | Summary | Status | Notes |
---|---|---|---|---|
Surveys | NULL | NULL | NULL | NULL |
《Pre-trained Models for Natural Language Processing: A Survey》 | arXiv2020 | Xipeng Qiu's survey of pre-trained models: 1 Introduction; 2 Background; 3 Overview of PTMs; 4 Extensions of PTMs; 5 Adapting PTMs to Downstream Tasks; 6 Resources of PTMs; 7 Applications; 8 Future Directions; 9 Conclusion | done | NULL |
Early explorations of pretrained models | NULL | NULL | NULL | NULL |
《Semi-supervised Sequence Learning》 | NIPS2015 | The earliest paper on pretraining with contextual information: 1 a pretrained model built on a unidirectional LSTM; | done | NULL |
《Learned in Translation: Contextualized Word Vectors》 | NIPS2017 | The first paper that proposes a contextualized text representation approach? | NULL | NULL |
《Semi-supervised sequence tagging with bidirectional language models》 | ACL2017 | AllenNLP's TagLM, the precursor of ELMo: already introduces ELMo's pretrained bidirectional-LSTM embedding mechanism, but focuses on sequence tagging; | done | NULL |
《Universal Language Model Fine-tuning for Text Classification》 | ACL2018 | fast.ai's ULMFiT; billed as a contextual pretrained model that predates ELMo: three-stage training (general LM pretraining, target-task LM fine-tuning, classifier fine-tuning); | done | NULL |
《Deep contextualized word representations》 | NAACL2018 | AllenNLP's ELMo: conceptually very simple | done | NULL |
The rise of pretrained models | NULL | NULL | NULL | NULL |
《BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding》 | NAACL2019 | BERT | done | HIT & iFLYTEK Joint Lab, Chinese whole-word-masking (WWM) models: https://github.com/ymcui/Chinese-BERT-wwm |
《Cross-lingual Language Model Pretraining》 | 2019 | Facebook's cross-lingual language models (XLM) | NULL | NULL |
《MASS: Masked Sequence to Sequence Pre-training for Language Generation》 | ICML2019 | Microsoft's MASS: a pretrained model with an encoder-decoder architecture | NULL | NULL |
《Unified Language Model Pre-training for Natural Language Understanding and Generation》 | NULL | Microsoft's UniLM v1: predates XLNet; a pretrained model that combines autoregressive, autoencoding and encoder-decoder ideas in a single Transformer by switching self-attention masks (see the UniLM attention-mask sketch after the table) | NULL | https://zhuanlan.zhihu.com/p/584193190 |
《UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training》 | NULL | Microsoft's UniLM v2: compared with UniLM v1, replaces the autoencoding + autoregressive + encoder-decoder mix with autoencoding plus partially autoregressive modeling (possibly inspired by XLNet) | NULL | https://zhuanlan.zhihu.com/p/114028314 |
《MPNet: Masked and Permuted Pre-training for Language Understanding》 | 2020 | MPNet | NULL | NULL |
《RoBERTa: A Robustly Optimized BERT Pretraining Approach》 | NULL | RoBERTa | NULL | HIT & iFLYTEK Joint Lab repo: https://github.com/ymcui/Chinese-BERT-wwm |
《ALBERT: A Lite BERT for Self-supervised Learning of Language Representations》 | ICLR2020 | ALBERT: factorized embedding parameterization (see the embedding sketch after the table); cross-layer parameter sharing; sentence-order (coherence) prediction; | done | Chinese ALBERT: https://github.com/brightmart/albert_zh ; "How should we view ALBERT, the successfully slimmed-down BERT?": https://mp.weixin.qq.com/s/LF8TiVccYcm4B6krCOGVTQ |
《Pre-Training with Whole Word Masking for Chinese BERT》 | arXiv2019 | Mainly introduces whole-word masking (WWM) for Chinese (Google released only English WWM models, not Chinese ones); see the WWM sketch after the table | done | Repo: https://github.com/ymcui/Chinese-BERT-wwm |
《ERNIE: Enhanced Language Representation with Informative Entities》 | ACL2019 | Tsinghua's ERNIE: feeds in the original text together with the entities it mentions, and jointly predicts both the tokens and the entities | done | NULL |
《ERNIE: Enhanced Representation through Knowledge Integration》 | arXiv2019 | Baidu's ERNIE 1.0: same input as BERT, but masks and predicts entities and phrases (whole-span masking) | done | NULL |
《ERNIE 2.0: A Continual Pre-training Framework for Language Understanding》 | AAAI2020 | Baidu's ERNIE 2.0: multi-task, continual-learning pretraining framework | done | NULL |
《ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators》 | ICLR2020 | ELECTRA | NULL | NULL |
《K-BERT: Enabling Language Representation with Knowledge Graph》 | 2019 | K-BERT | NULL | NULL |
《StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding》 | 2019 | Alibaba's StructBERT | NULL | NULL |
《Semantics-aware BERT for Language Understanding》 | AAAI2020 | NULL | NULL | NULL |
Pretrained models for long documents | NULL | NULL | NULL | NULL |
《XLNet: Generalized Autoregressive Pretraining for Language Understanding》 | NIPS2019 | XLNet (permutation language modeling): still not fully clear to me even after reading many blog posts | done | HIT & iFLYTEK Joint Lab's open-source Chinese XLNet pretrained models: https://github.com/ymcui/Chinese-XLNet |
《Longformer: The Long-Document Transformer》 | arXiv2020 | A Transformer pretrained model for long documents: 1 sliding-window (dilated) attention; 2 global attention; 3 custom CUDA kernels; 4 further pretraining starting from RoBERTa's weights (with MLM; autoregressive character-level LM experiments are also reported); see the sliding-window attention sketch after the table | done | https://zhuanlan.zhihu.com/p/223430086 |
《Big Bird: Transformers for Longer Sequences》 | 2020 | Big Bird | NULL | NULL |
Pretrained model compression | NULL | NULL | NULL | NULL |
《Well-Read Students Learn Better: On the Importance of Pre-training Compact Models》 | arXiv2019 | NULL | NULL | Google's official distillation recipe; GitHub: https://github.com/google-research/bert |
《ALBERT: A Lite BERT for Self-supervised Learning of Language Representations》 | 2019 | ALBERT | NULL | NULL |
《DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter》 | NIPS2019 | DistilBERT | NULL | NULL |
《TinyBERT: Distilling BERT for Natural Language Understanding》 | EMNLP2020 | TinyBERT | NULL | NULL |
《FastBERT: a Self-distilling BERT with Adaptive Inference Time》 | ACL2020 | FastBERT | NULL | NULL |
Ultra-large-scale pretrained models | NULL | NULL | NULL | NULL |
《The Cost of Training NLP Models: A Concise Overview》 | NULL | Cost comparison of training pretrained NLP models | done | NULL |
Pretrained models for specific tasks and domains | NULL | NULL | NULL | NULL |
《SpanBERT: Improving Pre-training by Representing and Predicting Spans》 | TACL2020 | SpanBERT | NULL | NULL |
《Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks》 | EMNLP2019 | Sentence-BERT | NULL | NULL |
Chinese pretrained models | NULL | NULL | NULL | NULL |
《Pre-Training with Whole Word Masking for Chinese BERT》 | 2019 | Chinese BERT pretrained with whole-word masking (WWM) | NULL | NULL |
《NEZHA: Neural Contextualized Representation for Chinese Language Understanding》 | 2019 | NEZHA | NULL | NULL |
《ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations》 | 2019 | ZEN | NULL | NULL |
《Revisiting Pre-Trained Models for Chinese Natural Language Processing》 | 2020 | MacBERT | NULL | NULL |
GPT-3-style large-scale pretrained models | NULL | NULL | NULL | NULL |
《BloombergGPT: A Large Language Model for Finance》 | arXiv2023 | Bloomberg's GPT-style pretrained large model: 1 trained on in-domain financial data plus general-purpose data; 2 50B parameters; 3 implementation largely follows the BLOOM project; 4 trained on an AWS A100 GPU cluster; 5 at its core it is still a domain-specific pretrained model; compared with GPT-NeoX, OPT-66B and BLOOM-176B the results improve considerably, but the raw numbers are not stunning (typical of pretrained models); | NULL | NULL |
《BLOOM: A 176B-Parameter Open-Access Multilingual Language Model》 | arXiv2022 | BLOOM from the BigScience open collaboration (data, model, parameters, compute, training framework, evaluation): 1 pretrained models at 560M, 1.1B, 3B, 7.1B and 176B parameters; 2 trained on an A100-80GB GPU cluster with the Megatron-DeepSpeed framework (data, tensor and pipeline parallelism; see the tensor-parallel sketch after the table); 3 GPT-3 appears in the comparisons but is not the main baseline; OPT is compared against more often | NULL | NULL |
《OPT: Open Pre-trained Transformer Language Models》 | arXiv2022 | Meta's OPT models (data, model, parameters, compute, training framework, evaluation): 1 parameter counts from 125M to 175B; 2 built on the Megatron-LM codebase; 3 trained on an A100-80GB GPU cluster, accelerated with data parallelism plus Megatron-LM model (tensor) parallelism; | NULL | NULL |
《GLM: General Language Model Pretraining with Autoregressive Blank Infilling》 | ACL2022 | Zhipu's pretrained model GLM: combines autoencoding and autoregressive modeling. 1 easier to understand than the XLNet / UniLM designs; 2 key features: autoencoding: randomly masks spans of tokens in the input; autoregressive: reconstructs the tokens of each span from left to right; a 2D positional encoding represents inter-span and intra-span positions (see the blank-infilling sketch after the table); | NULL | NULL |
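
The UniLM v1 idea referenced in the table (one Transformer covering autoencoding, autoregressive and encoder-decoder objectives purely by switching self-attention masks) can be illustrated with a minimal NumPy sketch. This is an illustrative reconstruction under my own naming, not the released UniLM code; `True` means attention is allowed.

```python
import numpy as np

def unilm_attention_masks(src_len, tgt_len):
    """Illustrative UniLM-style masks: one Transformer, three objectives,
    selected purely by the attention mask (True = attention allowed)."""
    n = src_len + tgt_len
    # Bidirectional mask: every token sees every token (autoencoding / MLM).
    bidirectional = np.ones((n, n), dtype=bool)
    # Left-to-right mask: lower triangular, the classic autoregressive LM.
    left_to_right = np.tril(np.ones((n, n), dtype=bool))
    # Seq-to-seq mask: all tokens see the full source; target tokens
    # additionally see only their own prefix, never future target tokens.
    seq2seq = np.zeros((n, n), dtype=bool)
    seq2seq[:, :src_len] = True
    seq2seq[src_len:, src_len:] = np.tril(np.ones((tgt_len, tgt_len), dtype=bool))
    return bidirectional, left_to_right, seq2seq

bi, l2r, s2s = unilm_attention_masks(src_len=3, tgt_len=2)
print(s2s.astype(int))
```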
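
A minimal sketch of ALBERT's factorized embedding parameterization mentioned in the table: the V×H embedding table becomes a V×E lookup plus an E×H projection, shrinking parameters whenever E ≪ H. The sizes below follow the ALBERT-base configuration (V≈30k, E=128, H=768); the module itself is an illustrative reimplementation, not the official code.

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """ALBERT-style factorized embedding: V x E lookup + E x H projection."""
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=768):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, embed_dim)   # ~V * E parameters
        self.project = nn.Linear(embed_dim, hidden_dim)     # ~E * H parameters (plus bias)

    def forward(self, token_ids):
        return self.project(self.lookup(token_ids))

# Parameter comparison vs. BERT's direct V x H table:
# factorized: 30000*128 + 128*768 ≈ 3.9M   vs   direct: 30000*768 ≈ 23M
emb = FactorizedEmbedding()
print(emb(torch.tensor([[1, 2, 3]])).shape)   # (1, 3, 768)
```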
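
A minimal sketch of the Chinese whole-word masking (WWM) idea behind the Chinese-BERT-wwm rows: BERT tokenizes Chinese into single characters, so all characters belonging to one segmented word are masked together. The example words, the 15% masking rate and the helper name are illustrative assumptions, not the released data pipeline (which also applies BERT's 80/10/10 replacement rule).

```python
import random

def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]"):
    """Chinese WWM sketch: mask every character of a sampled word together."""
    tokens, labels = [], []
    for word in words:                      # words come from a segmenter, e.g. jieba
        chars = list(word)                  # BERT's Chinese tokens are characters
        if random.random() < mask_prob:
            tokens.extend([mask_token] * len(chars))
            labels.extend(chars)            # predict the original characters
        else:
            tokens.extend(chars)
            labels.extend([None] * len(chars))  # positions not predicted
    return tokens, labels

# Pre-segmented example sentence: 使用 / 语言 / 模型 / 来 / 预测 / 下一个 / 词
words = ["使用", "语言", "模型", "来", "预测", "下一个", "词"]
print(whole_word_mask(words))
```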
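
A NumPy sketch of the Longformer attention pattern summarized in the table: a sliding local window (optionally dilated) plus a few globally attending tokens. The window size, the choice of position 0 as the global [CLS] token, and the dense boolean mask are illustrative; the paper's CUDA kernels never materialize the full n×n matrix, which is the whole point for long documents.

```python
import numpy as np

def longformer_attention_mask(seq_len, window=4, dilation=1, global_positions=(0,)):
    """Boolean [seq_len, seq_len] mask, True = attention allowed."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    half = window // 2
    for i in range(seq_len):
        # Sliding window around i; dilation > 1 skips positions (dilated attention).
        for offset in range(-half, half + 1):
            j = i + offset * dilation
            if 0 <= j < seq_len:
                mask[i, j] = True
    for g in global_positions:              # global attention, e.g. the [CLS] token
        mask[g, :] = True                   # the global token attends everywhere
        mask[:, g] = True                   # every token attends to the global token
    return mask

print(longformer_attention_mask(8, window=4, dilation=1).astype(int))
```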
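
The tensor (model) parallelism mentioned for BLOOM and OPT can be illustrated with a toy NumPy sketch of a column-parallel linear layer: the weight matrix is split column-wise across devices, each shard is multiplied independently, and the outputs are concatenated. Everything here (the function name, the use of NumPy instead of real multi-GPU communication) is an assumption for illustration, not Megatron-LM or Megatron-DeepSpeed code.

```python
import numpy as np

def column_parallel_linear(x, W, n_devices=2):
    """Tensor-parallelism sketch: split W column-wise across devices,
    compute each shard independently, concatenate the partial outputs."""
    shards = np.array_split(W, n_devices, axis=1)   # one shard per device
    outputs = [x @ shard for shard in shards]       # would run in parallel on GPUs
    return np.concatenate(outputs, axis=-1)

x = np.random.randn(4, 8)
W = np.random.randn(8, 16)
# The sharded computation reproduces the full matmul exactly.
assert np.allclose(column_parallel_linear(x, W), x @ W)
```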
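
A sketch of GLM's autoregressive blank infilling and 2D positional encoding from the last row: Part A is the corrupted text with each sampled span collapsed into one [MASK]; Part B lists the span tokens for left-to-right generation; the first position dimension points at the corresponding [MASK], and the second counts the offset inside a span (0 everywhere in Part A). The token names ([MASK], [START]), the single example span and the data layout are illustrative assumptions, not the released GLM preprocessing code.

```python
def glm_blank_infilling(tokens, spans):
    """Build a GLM-style input: Part A (corrupted text) + Part B (spans to
    generate), with 2D positions (inter-span position, intra-span position).
    `spans` are sorted, non-overlapping (start, end) index pairs."""
    part_a, pos1_a = [], []
    mask_positions = {}
    span_starts = {start: end for start, end in spans}
    i = 0
    while i < len(tokens):
        if i in span_starts:                 # collapse the whole span to one [MASK]
            mask_positions[i] = len(part_a)
            part_a.append("[MASK]")
            pos1_a.append(len(part_a) - 1)
            i = span_starts[i]
        else:
            part_a.append(tokens[i])
            pos1_a.append(len(part_a) - 1)
            i += 1

    part_b, pos1_b, pos2_b = [], [], []
    for start, end in spans:                 # autoregressive targets, left to right
        span = ["[START]"] + tokens[start:end]
        part_b.extend(span)
        pos1_b.extend([mask_positions[start]] * len(span))   # points at the [MASK]
        pos2_b.extend(range(1, len(span) + 1))               # intra-span offsets
    pos2_a = [0] * len(part_a)                                # Part A: always 0
    return part_a + part_b, pos1_a + pos1_b, pos2_a + pos2_b

tokens = ["x1", "x2", "x3", "x4", "x5", "x6"]
print(glm_blank_infilling(tokens, spans=[(2, 4)]))   # mask the span x3 x4
```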