BERT, or Bidirectional Encoder Representations from Transformers
BERT is Google's recently proposed NLP pre-training method: a general-purpose "language understanding" model is pre-trained on a large text corpus (such as Wikipedia) and then applied to the downstream NLP tasks we care about (such as classification or reading comprehension). BERT outperforms previous methods because it is the first **unsupervised, deeply bidirectional** system for pre-training NLP.
Put simply, it crushes earlier approaches such as Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFit: BERT-based models reached state-of-the-art results on multiple language tasks (SQuAD, MultiNLI, and MRPC).
Masked LM: 15% of the tokens in each sentence are randomly selected and masked out, and the model is trained to predict them, for example:
Input: the man went to the [MASK1] . he bought a [MASK2] of milk.
Labels: [MASK1] = store; [MASK2] = gallon
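A minimal sketch of this masking step, assuming whitespace-tokenized text (the real pipeline works on WordPiece tokens and also replaces some selected tokens with random words or leaves them unchanged; the helper below is only my own illustration):

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=random.Random(0)):
    """Randomly hide ~15% of tokens and keep the originals as prediction labels."""
    masked = list(tokens)
    labels = {}  # position -> original token the model must recover
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok
            masked[i] = MASK_TOKEN
    return masked, labels

tokens = "the man went to the store . he bought a gallon of milk .".split()
masked, labels = mask_tokens(tokens)
print(" ".join(masked))  # sentence with some tokens replaced by [MASK]
print(labels)            # {position: original token}
```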
To learn relationships between sentences, two sentences are drawn from the corpus, and with 50% probability the second sentence is the actual next sentence of the first:
Sentence A: the man went to the store .
Sentence B: he bought a gallon of milk .
Label: IsNextSentence
Sentence A: the man went to the store .
Sentence B: penguins are flightless .
Label: NotNextSentence
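A sketch of how such sentence pairs could be built; the function name and the 50/50 sampling below are my own illustration, not the official data-generation code:

```python
import random

def make_nsp_example(sentences, idx, rng=random.Random(0)):
    """Pair sentence idx with either its true successor or a random other sentence (50/50)."""
    sent_a = sentences[idx]
    if rng.random() < 0.5 and idx + 1 < len(sentences):
        sent_b = sentences[idx + 1]   # the actual next sentence
        label = "IsNextSentence"
    else:
        # pick any sentence that is not the true successor
        candidates = [s for j, s in enumerate(sentences) if j != idx + 1]
        sent_b = rng.choice(candidates)
        label = "NotNextSentence"
    return sent_a, sent_b, label

corpus = ["the man went to the store .",
          "he bought a gallon of milk .",
          "penguins are flightless ."]
print(make_nsp_example(corpus, 0))
```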
Finally, the processed sentence pairs are fed into a large Transformer model, and training is completed by jointly optimizing two loss functions, one for each of the objectives above.
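Schematically, the total pre-training loss is just the sum of the two objectives. A toy version, assuming the model returns log-probabilities (this is only a sketch, not the actual TensorFlow implementation):

```python
def pretraining_loss(mlm_log_probs, mlm_labels, nsp_log_probs, nsp_label):
    """Total loss = masked-LM cross-entropy (averaged over masked positions)
    plus next-sentence cross-entropy."""
    mlm_loss = -sum(mlm_log_probs[pos][tok] for pos, tok in mlm_labels.items()) / max(len(mlm_labels), 1)
    nsp_loss = -nsp_log_probs[nsp_label]  # nsp_label is "IsNextSentence" or "NotNextSentence"
    return mlm_loss + nsp_loss
```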
The heavy lifting is done by the Transformer model; its architecture and design ideas deserve a separate analysis later.
The released pre-trained models are:

- **BERT-Base, Uncased**: 12-layer, 768-hidden, 12-heads, 110M parameters
- **BERT-Large, Uncased**: 24-layer, 1024-hidden, 16-heads, 340M parameters
- **BERT-Base, Cased**: 12-layer, 768-hidden, 12-heads, 110M parameters
- **BERT-Large, Cased**: 24-layer, 1024-hidden, 16-heads, 340M parameters (Not available yet. Needs to be re-generated.)
- **BERT-Base, Multilingual**: 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
- **BERT-Base, Chinese**: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters

Each download contains:

- `bert_model.ckpt`: the pre-trained weights (which is actually 3 files).
- `vocab.txt`: maps WordPiece tokens to word ids.
- `bert_config.json`: specifies the hyperparameters of the model.

For other languages, see the Multilingual README. A Chinese model has also been released. (Even under tight compute constraints they trained a separate Chinese version; the influence of Chinese is plain to see, and we still have work to do.)
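A quick way to inspect a downloaded checkpoint is to read `bert_config.json` and `vocab.txt` directly. The directory name below assumes the unzipped BERT-Base, Uncased download; adjust the path to whichever model you fetched:

```python
import json

model_dir = "uncased_L-12_H-768_A-12"  # assumed path of the unzipped BERT-Base, Uncased download

# bert_config.json: hyperparameters of the model
with open(f"{model_dir}/bert_config.json") as f:
    config = json.load(f)
print(config["num_hidden_layers"], config["hidden_size"], config["num_attention_heads"])

# vocab.txt: one WordPiece token per line; the line number is the token id
with open(f"{model_dir}/vocab.txt", encoding="utf-8") as f:
    vocab = {token.rstrip("\n"): idx for idx, token in enumerate(f)}
print(len(vocab), vocab["[MASK]"])
```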
For more details, see: https://github.com/google-research/bert