
Fine-tuning at16k-t2t model for domain-specific entries #5

Open
sauravjoshi opened this issue May 26, 2020 · 1 comment

sauravjoshi commented May 26, 2020

I'm relatively new to t2t and was exploring it for ASR when I came across your work.
Great work, @mohitsshah, and a clear explanation of at16k. The results are quite impressive.
I'm planning to extend the model for a domain-specific use case, which will likely involve extending the vocab.
I'd appreciate your help with the following.

From what I could drill down, the problem is registered through the class At16kSubword, which extends the asr base class, much in the way the Librispeech class inherits from SpeechRecognitionProblem (a rough sketch of my understanding is included right after the questions below).

  1. The At16kSubword class has the multiprocess_generate property set to True, which presumably means the data is generated across multiple processes. What configuration did you use for this, and, given the hours of audio involved, how long did generation take?

  2. The core generate_data and generator functions aren't defined in the repository. What data did you build the model on? Did you use LibriSpeech and add your own data on top of it?
    Those two function definitions would be needed to keep things in sync with the additional data I'll be fine-tuning on. Could you share them?

  3. Why is approx_vocab_size set to only 1000?
    If our goal is to extend the vocab, and we reuse the existing vocab loaded in feature_encoders(), will the new sub-words be added to it, or does a new vocab with the additional sub-words have to be created? As far as I know, the vocab is generated during the data-gen phase (see the snippet at the end of this comment for my current understanding).

    How much should approx_vocab_size be increased?
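
For reference, here is a rough sketch of how I understand the problem class to be structured. This is purely illustrative: the class name At16kSubwordSketch, the property values, and the method stubs are my assumptions, not the actual at16k source.

# Illustrative sketch only -- not the actual at16k code.
from tensor2tensor.data_generators import speech_recognition
from tensor2tensor.utils import registry


@registry.register_problem
class At16kSubwordSketch(speech_recognition.SpeechRecognitionProblem):
  """ASR problem with a subword target vocabulary (sketch)."""

  @property
  def approx_vocab_size(self):
    # Target size for the subword vocab built during data-gen.
    return 1000

  @property
  def multiprocess_generate(self):
    # Generate the TFRecords with several worker processes.
    return True

  def generator(self, data_dir, tmp_dir, datasets):
    # Would yield one example dict per utterance (audio plus encoded
    # transcript); depends entirely on the corpus used.
    raise NotImplementedError

  def generate_data(self, data_dir, tmp_dir, task_id=-1):
    # Would shard the generator output into train/dev files and build
    # the subword vocab along the way.
    raise NotImplementedError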

Could you provide the data generation command you used, including the additional FLAGS?
Could you also share the training command with its additional FLAGS?
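
For context, my current understanding of the vocab step is roughly the following. This is only a sketch; the file names transcripts.txt and vocab.subwords are placeholders, not the actual at16k paths.

from tensor2tensor.data_generators import text_encoder


def transcript_lines(path):
  # Yield one transcript string per utterance from a plain-text file.
  with open(path) as f:
    for line in f:
      yield line.strip()


# Built once during data-gen, aiming for roughly approx_vocab_size sub-words.
vocab = text_encoder.SubwordTextEncoder.build_from_generator(
    transcript_lines("transcripts.txt"), 1000)  # 1000 = target vocab size
vocab.store_to_file("vocab.subwords")

# feature_encoders() would then reload this stored file at training time, which
# is why I'm unsure whether new sub-words can be added without regenerating it.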


MoslemTCM commented Jul 28, 2020

I have a similar question for @mohitsshah:
I am trying to fine-tune your model on a new dataset, for example LibriSpeech. However, when I generate the LibriSpeech data and continue training from your provided weights, the results are completely wrong and don't make sense. I am using the following script to continue training the model:

DATA_DIR=/media/disk3/Voice2text/t2t_data/
TMP_DIR=/media/disk3/Voice2text/t2t_datagen/
TRAIN_DIR=/media/disk3/Voice2text/t2t_train/librispeech_english/

PROBLEM=at16k_subword

python /media/disk3/Voice2text/env/lib/python3.6/site-packages/tensor2tensor/bin/t2t_trainer.py \
  --t2t_usr_dir=/media/disk3/Voice2text/ \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --model=transformer \
  --worker_gpu_memory_fraction=0.9 \
  --hparams_set=transformer_librispeech_tpu \
  --hparams=max_length=295650,max_input_seq_length=3650,max_target_seq_length=250 \
  --train_steps=7000000 \
  --problem=$PROBLEM \
  --allow_growth=True

Could you provide the data generation command you used, applied for example to the LibriSpeech dataset and including your flags?
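
For reference, the generic invocation I would expect (reusing the variables defined above) is roughly the following; the at16k-specific flags and any preprocessing steps are exactly what I am missing:

python /media/disk3/Voice2text/env/lib/python3.6/site-packages/tensor2tensor/bin/t2t_datagen.py \
  --t2t_usr_dir=/media/disk3/Voice2text/ \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM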
