Reproduce e2e NER experiments #40

Open
thomasnguyen92 opened this issue Dec 4, 2023 · 2 comments

Comments

@thomasnguyen92

I attempted to replicate a Named Entity Recognition (NER) experiment but encountered several issues during the process.

Firstly, when executing the command python slue_toolkit/prepare/prepare_voxpopuli_nel.py create_manifest to generate manifest files, I noticed that the dev.tsv, fine-tune.tsv, and test.tsv files were merely symbolic links. They were unusable for running the end-to-end NER model. To resolve this, I had to manually copy dev.tsv and fine-tune.tsv from slue-toolkit/manifest/slue-voxpopuli into the e2e_ner directory.
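
The manual copy described above could be scripted roughly as follows. This is only a sketch: the helper name `materialize_manifests` and its default file list are illustrative and not part of slue-toolkit, and it assumes the source directory holds usable copies of the manifests.

```python
import shutil
from pathlib import Path


def materialize_manifests(src_dir, dst_dir, names=("dev.tsv", "fine-tune.tsv")):
    """Replace each named manifest in dst_dir with a regular-file copy from src_dir.

    Hypothetical helper, not part of slue-toolkit.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = src_dir / name
        dst = dst_dir / name
        if not src.is_file():
            # Skip anything the source directory does not actually contain.
            continue
        if dst.is_symlink() or dst.exists():
            # Remove the existing entry (the link itself, not its target).
            dst.unlink()
        # copyfile follows src if it is itself a symlink, so dst is a real file.
        shutil.copyfile(src, dst)
        copied.append(dst)
    return copied
```

For the layout described in this issue, the call would be something like `materialize_manifests("manifest/slue-voxpopuli", "manifest/slue-voxpopuli/e2e_ner")`, run from the repository root.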

Additionally, I faced a problem while performing evaluations with the command bash baselines/ner/e2e_scripts/eval-ner.sh w2v2-base test combined nolm. It appears that the processed test files are missing. Could you provide guidance on how to properly prepare these files for evaluation?

@ankitapasad
Contributor

Hi @thomasnguyen92

Thank you for your interest in our work!

The test data is not public yet. We'll update the repo when we make it public (soon). Until then, if you'd like the test set evaluated, you can follow the instructions here.

Can you point out the specific step/script that gave you trouble because of the symbolic links?

@maherr13

maherr13 commented Feb 4, 2024

I can provide the steps, as I faced the same problem.

Follow the steps as documented. After running python slue_toolkit/prepare/prepare_voxpopuli_nel.py create_manifest, run the command bash baselines/ner/e2e_scripts/ft-w2v2-base.sh manifest/slue-voxpopuli/e2e_ner save/e2e_ner/w2v2-base

You would get the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 129, in run_job
    ret.return_value = task_function(task_cfg)
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq_cli/hydra_train.py", line 27, in hydra_main
    _hydra_main(cfg)
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq_cli/hydra_train.py", line 56, in _hydra_main
    distributed_utils.call_main(cfg, pre_main, **kwargs)
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq/distributed/utils.py", line 404, in call_main
    main(cfg, **kwargs)
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq_cli/train.py", line 134, in main
    task.load_dataset(valid_sub_split, combine=False, epoch=1)
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq/tasks/audio_finetuning.py", line 140, in load_dataset
    super().load_dataset(split, task_cfg, **kwargs)
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq/tasks/audio_pretraining.py", line 153, in load_dataset
    self.datasets[split] = FileAudioDataset(
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq/data/audio/raw_audio_dataset.py", line 269, in __init__
    with open(manifest_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/content/manifest/slue-voxpopuli/e2e_ner/dev.tsv'

This happens even though the file exists as a symbolic link.
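
Since `open()` follows symbolic links, a `FileNotFoundError` on a path that shows up in a directory listing usually means the link's target does not resolve (a relative target is resolved against the directory containing the link). A minimal check, assuming nothing about slue-toolkit itself; `describe_path` is a hypothetical helper name:

```python
import os


def describe_path(path):
    """Classify a path as 'exists', 'symlink-ok', 'symlink-dangling', or 'missing'."""
    if os.path.islink(path):
        # os.path.exists follows links, so it is False when the target is absent.
        return "symlink-ok" if os.path.exists(path) else "symlink-dangling"
    if os.path.exists(path):
        return "exists"
    return "missing"
```

Calling `describe_path("manifest/slue-voxpopuli/e2e_ner/dev.tsv")` from the repository root would show which case applies. Note also that the path in the traceback is under /content/manifest rather than /content/slue-toolkit, so it may be worth checking which working directory the relative manifest path was resolved against.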
