v0.2:
- changed handling of the end-of-word token to be more consistent (commit a749a7)
- allow passing a vocabulary and a frequency threshold to apply_bpe.py to prevent the production of OOV (or rare) subword units (commit a00db)
v0.1:
- consistent cross-version Unicode handling
- all scripts are now deterministic