"Test String" is detected as NN (Norwegian Nynorsk?) #246

Open
AutisticShark opened this issue Dec 7, 2024 · 0 comments

I am working on a translator project that involves automatic language detection. I set up the lingua library as instructed, but it behaved strangely during testing.

Here is part of my code:

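    from lingua import LanguageDetectorBuilder

    if source_language == "auto":
        # Build a detector over all languages lingua supports, with models loaded eagerly.
        detector = LanguageDetectorBuilder.from_all_languages().with_preloaded_language_models().build()
        # Convert the detected language to a lowercase ISO 639-1 code, e.g. "en".
        source_language = detector.detect_language_of(text).iso_code_639_1.name.lower()
        print(source_language)
        if source_language not in ['en', 'zh', 'ja', 'ko', 'fr', 'de', 'es', 'it', 'pt', 'ru', 'tr']:
            return "Unsupported language"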

And here is the console log:

test string auto zh
nn
Unsupported language
Successful Response auto zh
en
成功回应
Test String auto zh
nn
Unsupported language
This is a Test String auto zh
en
这是一个测试弦。
A Test String auto zh
nn
Unsupported language

Note that the library correctly detects "Successful Response" and "This is a Test String" as English, but it detects "test string", "Test String", and "A Test String" as nn instead.

Is this some kind of edge case, or am I doing something wrong here? Please advise, thanks.
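For reference, here is a minimal sketch of one variation that could be tried: building the detector from only the eleven languages the translator accepts instead of `from_all_languages()`, so that nn is not a candidate at all. Whether this changes the result for the strings above has not been verified. The helper names `make_lingua_detect` and `resolve_source_language` are hypothetical and not part of lingua. The sketch also treats a `None` result from `detect_language_of` as unsupported, since the original snippet would fail on `.iso_code_639_1` in that case.

```python
from typing import Callable, Optional

# ISO 639-1 codes the translator accepts (copied from the snippet above).
SUPPORTED = ("en", "zh", "ja", "ko", "fr", "de", "es", "it", "pt", "ru", "tr")


def make_lingua_detect() -> Callable[[str], Optional[str]]:
    """Hypothetical helper: a detector restricted to the SUPPORTED languages.

    Requires lingua to be installed; the import is deferred so the rest of
    this sketch can be exercised without it.
    """
    from lingua import Language, LanguageDetectorBuilder

    detector = LanguageDetectorBuilder.from_languages(
        Language.ENGLISH, Language.CHINESE, Language.JAPANESE, Language.KOREAN,
        Language.FRENCH, Language.GERMAN, Language.SPANISH, Language.ITALIAN,
        Language.PORTUGUESE, Language.RUSSIAN, Language.TURKISH,
    ).build()

    def detect(text: str) -> Optional[str]:
        language = detector.detect_language_of(text)  # may be None
        if language is None:
            return None
        return language.iso_code_639_1.name.lower()

    return detect


def resolve_source_language(
    text: str, detect: Callable[[str], Optional[str]]
) -> Optional[str]:
    """Return a supported ISO 639-1 code for text, or None if unsupported."""
    code = detect(text)
    if code is None or code not in SUPPORTED:
        return None
    return code
```

With lingua installed, `resolve_source_language("Test String", make_lingua_detect())` would run the restricted detector; the caller can then return "Unsupported language" when the result is `None`. Building the detector once and reusing it also avoids reloading the models on every request.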
