This is a PyTorch implementation of a simple sequence-to-sequence paraphrase generator based on the Transformer ("Attention Is All You Need"; Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin; arXiv, 2017).
python train.py \
--train_source_file <file_path> \
--train_target_file <file_path> \
--valid_source_file <file_path> \
--valid_target_file <file_path> \
--spm_file <file_path>
train_source_file
    one-sentence-per-line raw corpus file for training.
train_target_file
    one-sentence-per-line raw corpus file for training.
valid_source_file
    one-sentence-per-line raw corpus file for validation.
valid_target_file
    one-sentence-per-line raw corpus file for validation.
spm_file
    SentencePiece model file.
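The four corpus files are expected to be line-aligned: line i of a source file pairs with line i of the matching target file. A minimal sketch of that pairing (the helper name is hypothetical, not part of this repo):

```python
# Hypothetical helper (not part of this repo): pairs the line-aligned
# source/target corpus files in the format train.py expects.
def load_parallel_corpus(source_path, target_path):
    with open(source_path, encoding="utf-8") as f:
        source = [line.strip() for line in f]
    with open(target_path, encoding="utf-8") as f:
        target = [line.strip() for line in f]
    # The files must be line-aligned: sentence i of the source file
    # is paired with sentence i of the target file.
    if len(source) != len(target):
        raise ValueError("source and target files must have the same number of lines")
    return list(zip(source, target))
```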
python generate.py \
--input_file <file_path> \
--output_file <file_path> \
--spm_file <file_path> \
--search_width 3
input_file
    one-sentence-per-line raw corpus file to paraphrase.
output_file
    output file.
search_width
    beam search width.
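search_width sets the beam width: at each decoding step, only the top-k partial hypotheses by cumulative log-probability are kept. A minimal, model-free sketch of the idea (the next_log_probs callback is a hypothetical stand-in for the Transformer decoder, not this repo's API):

```python
def beam_search(next_log_probs, start_token, end_token, search_width, max_len=20):
    """Generic beam search sketch.

    next_log_probs(seq) -> {token: log_prob} stands in for one decoder step.
    """
    beams = [([start_token], 0.0)]  # (partial sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_log_probs(tuple(seq)).items():
                candidates.append((seq + [tok], score + lp))
        # Keep only the top `search_width` hypotheses at each step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:search_width]:
            if seq[-1] == end_token:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])[0]
```

With search_width=1 this degenerates to greedy decoding; larger widths trade decoding time for a broader search of candidate paraphrases.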
Download data from here. Download trained model parameters and generated paraphrases from here.