Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop optional CTM n-best header #542

Merged
merged 2 commits into from
Sep 11, 2024
Merged

Conversation

NeoLegends
Copy link
Contributor

This optional header fields leads to issues in GLM mapping, resulting in high (almost 100 percent) deletion rates, because all lines with fewer than 7 columns are dropped from the file.

This optional header fields leads to issues in GLM mapping leading to high (almost 100 percent) deletion rates, because all lines with fewer than 7 columns are dropped from the file.
@NeoLegends NeoLegends added the bug Something isn't working label Sep 11, 2024
@NeoLegends NeoLegends self-assigned this Sep 11, 2024
Copy link
Contributor

@michelwi michelwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with dropping the field in the header, but the actual bug, I think, lies in our (AppTek) scoring pipeline and we are diskussing how to best fix it atm.

@albertz
Copy link
Member

albertz commented Sep 11, 2024

Just to clarify: Can you explain why this was not an issue so far? Because we usually don't do GLM mapping?

@michelwi
Copy link
Contributor

Just to clarify: Can you explain why this was not an issue so far? Because we usually don't do GLM mapping?

The "bug" is - I think - in the AppTek scoring pipeline where we recently started to use the number of fields in the header to discard "invalid" lines in the CTM which do not have the correct number of fields.

So far this was not an issue because no one would use the ctm outputs of this job with the AppTek scoring pipeline and the AppTek asr pipeline would not print this optional field in the header.

@albertz
Copy link
Member

albertz commented Sep 11, 2024

Ok, then I would mention exactly that in a code comment. But in any case, I think the code change is otherwise ok then.

@NeoLegends NeoLegends merged commit 74711c5 into main Sep 11, 2024
4 checks passed
@NeoLegends NeoLegends deleted the moritz-search-words-ctm-header branch September 11, 2024 11:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants