This problem uses a recurrent neural network to perform language translation.
To set up the environment on Ubuntu 16.04 (16 CPUs, one P100, 100 GB disk), you can use the commands below. They may vary on a different operating system or with a different graphics card.
# Install docker
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt update
# sudo apt install docker-ce -y
sudo apt install docker-ce=18.03.0~ce-0~ubuntu -y --allow-downgrades
# Install nvidia-docker2
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt install nvidia-docker2 -y
sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo pkill -SIGHUP dockerd
sudo apt install -y bridge-utils
sudo service docker stop
sleep 1;
sudo iptables -t nat -F
sleep 1;
sudo ifconfig docker0 down
sleep 1;
sudo brctl delbr docker0
sleep 1;
sudo service docker start
Download the data using the following command:
bash download_dataset.sh
Verify data with:
bash verify_dataset.sh
We use WMT16 English-German for training.
The script uses the subword-nmt package to segment text into subword units (BPE); by default it builds a shared vocabulary of 32,000 tokens. Preprocessing removes all pairs of sentences that can't be decoded with the latin-1 encoding.
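For illustration, the latin-1 filtering step could look roughly like the sketch below. This is not the repository's actual preprocessing code, and the file names are placeholders.

```python
# Sketch: drop sentence pairs that cannot be represented in latin-1.
# File names are placeholders, not the repository's actual paths.
def fits_latin1(line):
    try:
        line.encode('latin-1')
        return True
    except UnicodeEncodeError:
        return False

with open('train.en', encoding='utf-8') as f_en, \
     open('train.de', encoding='utf-8') as f_de, \
     open('train.filtered.en', 'w', encoding='utf-8') as out_en, \
     open('train.filtered.de', 'w', encoding='utf-8') as out_de:
    for en, de in zip(f_en, f_de):
        if fits_latin1(en) and fits_latin1(de):
            out_en.write(en)
            out_de.write(de)
```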
Training uses the WMT16 English-German dataset; validation is performed on the concatenation of newstest2015 and newstest2016, and BLEU evaluation is done on newstest2014.
By default the training script performs bucketing by sequence length. Before each epoch the dataset is randomly shuffled and split into chunks of 80 batches each. Within each chunk it is sorted by (src + tgt) sequence length, and the batches are then reshuffled within each chunk (see the sketch below).
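A minimal sketch of this shuffle, chunk, sort, and reshuffle scheme, assuming a list of tokenized (src, tgt) pairs; the function and parameter names are illustrative, not the script's actual ones:

```python
import random

def bucketed_batches(pairs, batch_size, batches_per_chunk=80):
    """pairs: list of (src_tokens, tgt_tokens). Yields batches of indices."""
    indices = list(range(len(pairs)))
    random.shuffle(indices)                      # shuffle the whole dataset
    chunk_size = batches_per_chunk * batch_size  # 80 batches per chunk
    for start in range(0, len(indices), chunk_size):
        chunk = indices[start:start + chunk_size]
        # sort by total (src + tgt) length so each batch has similar lengths
        chunk.sort(key=lambda i: len(pairs[i][0]) + len(pairs[i][1]))
        batches = [chunk[j:j + batch_size] for j in range(0, len(chunk), batch_size)]
        random.shuffle(batches)                  # reshuffle batches within the chunk
        for batch in batches:
            yield batch
```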
The implemented model is similar to the one from Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation paper.
The most important difference is in the attention mechanism. This repository implements gnmt_v2 attention: the output from the first LSTM layer of the decoder goes into the attention module, then the re-weighted context is concatenated with the inputs to all subsequent LSTM layers in the decoder at the current timestep.
The same attention mechanism is also implemented in the default GNMT-like models from tensorflow/nmt and NVIDIA/OpenSeq2Seq.
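A rough PyTorch-style sketch of one decoder timestep with this attention flow. Class and parameter names are illustrative rather than the repository's, and a plain additive attention stands in for the normalized Bahdanau variant the model actually uses:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BahdanauAttention(nn.Module):
    """Simple additive (Bahdanau-style) attention; the real model uses a normalized variant."""
    def __init__(self, hidden_size):
        super().__init__()
        self.query_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.key_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.score = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, query, keys):
        # query: (batch, hidden) -- output of the decoder's first LSTM layer
        # keys:  (batch, src_len, hidden) -- encoder outputs
        energy = torch.tanh(self.query_proj(query).unsqueeze(1) + self.key_proj(keys))
        weights = F.softmax(self.score(energy), dim=1)     # (batch, src_len, 1)
        return (weights * keys).sum(dim=1)                  # context: (batch, hidden)

class GNMTv2DecoderStep(nn.Module):
    """Illustrative single decoder timestep with gnmt_v2-style attention."""
    def __init__(self, hidden_size=1024, num_layers=4):
        super().__init__()
        self.layer0 = nn.LSTMCell(hidden_size, hidden_size)
        self.attn = BahdanauAttention(hidden_size)
        # layers 2..num_layers consume [previous hidden; context]
        self.layers = nn.ModuleList(
            [nn.LSTMCell(2 * hidden_size, hidden_size) for _ in range(num_layers - 1)])

    def forward(self, x, states, enc_out):
        h0, c0 = self.layer0(x, states[0])        # first LSTM layer
        context = self.attn(h0, enc_out)          # attention on its output
        new_states, h = [(h0, c0)], h0
        for i, layer in enumerate(self.layers, start=1):
            h_new, c_new = layer(torch.cat([h, context], dim=1), states[i])
            h = h + h_new if i >= 2 else h_new    # residual connections from the 3rd layer
            new_states.append((h_new, c_new))
        return h, new_states
```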
- general:
- encoder and decoder use shared embeddings
- data-parallel multi-gpu training
- dynamic loss scaling with backoff for Tensor Cores (mixed precision) training
- trained with label smoothing loss (smoothing factor 0.1)
- encoder:
- 4-layer LSTM, hidden size 1024, first layer is bidirectional, the rest are unidirectional
- with residual connections starting from 3rd layer
- uses standard LSTM layer (accelerated by cudnn)
- decoder:
- 4-layer unidirectional LSTM with hidden size 1024 and fully-connected classifier
- with residual connections starting from 3rd layer
- uses standard LSTM layer (accelerated by cudnn)
- attention:
- normalized Bahdanau attention
- model uses the gnmt_v2 attention mechanism (sketched above): output from the first LSTM layer of the decoder goes into attention, then the re-weighted context is concatenated with the input to all subsequent LSTM layers in the decoder at the current timestep
- inference:
- beam search with default beam size 5
- with coverage penalty and length normalization (see the scoring sketch after this list)
- BLEU computed by sacrebleu
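The coverage penalty and length normalization follow the candidate scoring of the GNMT paper. A hedged sketch of such a scoring function is shown below; the constants (alpha, beta) and names are illustrative and not necessarily the exact defaults used in this repository:

```python
import math

def length_penalty(length, alpha=0.6):
    # lp(Y) = ((5 + |Y|) / 6) ** alpha, as in the GNMT paper
    return ((5.0 + length) / 6.0) ** alpha

def coverage_penalty(attn_probs, beta=0.1):
    # attn_probs[i][j]: attention weight on source word i at target step j;
    # penalizes source words that received little total attention
    return beta * sum(math.log(min(max(sum(row), 1e-9), 1.0)) for row in attn_probs)

def candidate_score(log_prob, length, attn_probs, alpha=0.6, beta=0.1):
    # beam-search candidates are ranked by length-normalized log-probability plus coverage
    return log_prob / length_penalty(length, alpha) + coverage_penalty(attn_probs, beta)
```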
Cross-entropy loss with label smoothing (smoothing factor = 0.1); padding tokens are not included in the loss.
Adam optimizer with learning rate 5e-4.
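Roughly, the loss and optimizer described above could be set up as in the PyTorch sketch below. `build_model` and `PAD_IDX` are placeholders, the `label_smoothing` argument requires a recent PyTorch (an older codebase may implement its own smoothed loss), and dynamic loss scaling for mixed precision is omitted:

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # assumed padding token index; the real vocabulary may differ

# cross-entropy with label smoothing 0.1; padded positions are ignored
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX, label_smoothing=0.1)

model = build_model()  # placeholder for the actual GNMT model construction
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

def training_step(src, tgt_in, tgt_out):
    logits = model(src, tgt_in)                   # (batch, tgt_len, vocab)
    loss = criterion(logits.view(-1, logits.size(-1)), tgt_out.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```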
BLEU score on the newstest2014 dataset, as reported by the sacrebleu package.
Uncased BLEU score of 22.00.
Evaluation of BLEU score is done after every epoch.
Evaluation uses all of newstest2014.en.
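The BLEU computation could look roughly like the sketch below using the sacrebleu Python API; the file names are placeholders, and the repository may invoke the package differently:

```python
import sacrebleu

# hypotheses: detokenized model translations, one per line of newstest2014.en
with open('newstest2014.hyp.de', encoding='utf-8') as f:
    hypotheses = [line.strip() for line in f]
with open('newstest2014.de', encoding='utf-8') as f:
    references = [line.strip() for line in f]

# uncased (lowercased) corpus-level BLEU, matching the uncased 22.00 target
bleu = sacrebleu.corpus_bleu(hypotheses, [references], lowercase=True)
print(bleu.score)
```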