BPE Encoding files #11

fr4nc3sc4 · 2021-07-13T13:04:01Z

Hello,
During BPE encoding, subword-nmt generates the file {codes_file}. Could you please share this file? If that is not possible, could you share the training set used to obtain {codes_file}?
I would like to use OpenVocabNLM with a certain dataset and compare my results with the ones obtained in your research.

I also have another question: do you run create_subtoken_data.py and non-ascii_sequences_to_unk.py before BPE encoding?

Thank you.

lapplislazuli · 2022-01-27T09:55:38Z

@fr4nc3sc4

Not an author, but this might be what you are looking for:
https://github.com/giganticode/icse-2020
There are Zenodo artifacts for the separate datasets:
https://zenodo.org/record/3628636
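
For context on why the original {codes_file} (or its exact training set) matters for a comparison: a BPE codes file is essentially an ordered list of merge operations learned from one specific corpus, so a different corpus generally yields different merges and therefore a different subword vocabulary. The snippet below is only a toy sketch of that idea in plain Python. It is not subword-nmt's implementation, and the function name `learn_bpe_merges` and both tiny corpora are made up for illustration.

```python
from collections import Counter


def learn_bpe_merges(lines, num_merges):
    """Toy BPE learner (hypothetical helper, not the subword-nmt API).

    Returns the ordered list of merged symbol pairs, which is roughly
    the information a BPE codes file records.
    """
    # Each word becomes a tuple of characters plus an end-of-word marker.
    vocab = Counter()
    for line in lines:
        for word in line.split():
            vocab[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break

        # Pick the most frequent pair; break ties lexicographically
        # so the result is deterministic.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)

        # Rewrite every word with the chosen pair fused into one symbol.
        merged_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_vocab[tuple(out)] += freq
        vocab = merged_vocab
    return merges


# Two made-up corpora: the learned merge lists differ.
corpus_a = ["getName getValue", "setName setValue"]
corpus_b = ["parse parser parsing"]
codes_a = learn_bpe_merges(corpus_a, 5)
codes_b = learn_bpe_merges(corpus_b, 5)
print(codes_a)
print(codes_b)
```

Since the merges depend on the training corpus, encoding a new dataset with codes learned elsewhere would likely segment tokens differently from the paper's setup, which is why reusing the published artifacts seems the safer route.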
