Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using the code for tedlium2_v2 with merged apostroph #218

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Marvin84
Copy link
Contributor

This is the recipes version that uses the fix from here.

from i6_experiments.common.tools.sctk import compile_sctk

RASR_BINARY_PATH = compile_rasr_binaries_i6mode(
branch="apptainer_tf_2_8", configure_options=["--apptainer-patch=2023-05-08_tensorflow-2.8_v1"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not your change, but I dont like merging something that will not work, since the branch does not exist anymore. I think we can just remove this and replace this by None and then let the assert do the work.

Comment on lines +15 to +17
RASR_BINARY_PATH = compile_rasr_binaries_i6mode(
branch="apptainer_tf_2_8", configure_options=["--apptainer-patch=2023-05-08_tensorflow-2.8_v1"]
) # use most recent RASR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RASR_BINARY_PATH = compile_rasr_binaries_i6mode(
branch="apptainer_tf_2_8", configure_options=["--apptainer-patch=2023-05-08_tensorflow-2.8_v1"]
) # use most recent RASR
RASR_BINARY_PATH = None # TODO please set RASR

SCTK_BINARY_PATH = compile_sctk(branch="v2.4.12") # use last published version
SCTK_BINARY_PATH.hash_overwrite = "TEDLIUM2_DEFAULT_SCTK_BINARY_PATH"

SRILM_PATH = tk.Path("/work/tools/users/luescher/srilm-1.7.3/bin/i686-m64/")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to RASR I think this needs to be either changed to some u22 path or just asserted as False

@@ -0,0 +1,423 @@
from i6_core.features.filterbank import filter_width_from_channels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will just hijack this line just to make sure: You talked about some possible changes to FSA and Silence etc. for GMM, are they already included here? Or is this for now a straight copy?



def run_tedlium2_common_baseline(
alias_prefix="baselines/tedlium2/gmm/common_baseline",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this have the same prefix as the other baseline?
This would cause overwriting if both are enabled. But I would not mind keeping the "true" alias to this file.

from typing import Dict

from i6_experiments.common.datasets.tedlium2.constants import CONCURRENT
from i6_experiments.common.datasets.tedlium2_v2.corpus import get_corpus_object_dict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file pushed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am waiting for the referenced PR to be merged.


from i6_experiments.common.datasets.tedlium2.constants import CONCURRENT
from i6_experiments.common.datasets.tedlium2_v2.corpus import get_corpus_object_dict
from i6_experiments.common.datasets.tedlium2_v2.lexicon import (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this pushed?

from collections import defaultdict
from typing import Dict

from i6_experiments.common.datasets.tedlium2.constants import CONCURRENT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I know this does not change, if we do a "full copy" shouldn't we for completeness also copy this? @JackTemaki opinions?

from i6_core.corpus.convert import CorpusToTxtJob
from i6_core.lexicon.conversion import LexiconToWordListJob

from i6_experiments.common.datasets.tedlium2_v2.corpus import get_bliss_corpus_dict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for data.py, some files are not pushed?

@Marvin84 Marvin84 force-pushed the ted-apostropf-merge branch 2 times, most recently from d5d3ebc to cf5884a Compare June 21, 2024 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants