
One-liners for finding whether something is English or not #184

Answered by TomLucidor
TomLucidor asked this question in Q&A

Sorry, I found my own solution for classifying the language of quotes in a dataset. The "builders" looked intimidating at first, but there is a ready-made default for spoken languages (`from_all_spoken_languages()`). In most cases such quotes are monolingual, so I assume one language per quote, even though some can be bilingual (e.g. French with an English translation).

```python
from lingua import Language, LanguageDetectorBuilder

detector = LanguageDetectorBuilder.from_all_spoken_languages().build()  # the builder I was looking for
quotes = ["some string", "some other string"]  # example input
for quote in quotes:
    # detect_language_of returns None if no language can be determined
    if detector.detect_language_of(quote) != Language.ENGLISH:
        print(quote)
```
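A minimal sketch, assuming you want to keep undetermined quotes apart from clearly non-English ones: `detect_language_of` returns `None` when lingua cannot decide, and the loop above prints those together with the non-English quotes because `None != Language.ENGLISH`. The helper name `split_by_language` and its `detect` parameter are hypothetical; the detector is passed in as a plain function so the filtering logic can be tried without loading lingua's models.

```python
from typing import Callable, Iterable, List, Optional, Tuple


# Hypothetical helper: `detect` is any function mapping a string to a
# language label, or None when the language cannot be determined.
# With lingua it could be `detector.detect_language_of`.
def split_by_language(
    quotes: Iterable[str],
    detect: Callable[[str], Optional[object]],
    target: object,
) -> Tuple[List[str], List[str], List[str]]:
    """Partition quotes into (target language, other language, undetermined)."""
    matched: List[str] = []
    other: List[str] = []
    unknown: List[str] = []
    for quote in quotes:
        lang = detect(quote)
        if lang is None:
            unknown.append(quote)  # detector could not decide
        elif lang == target:
            matched.append(quote)
        else:
            other.append(quote)
    return matched, other, unknown
```

With lingua this could be called as `english, foreign, unknown = split_by_language(quotes, detector.detect_language_of, Language.ENGLISH)`, leaving the `unknown` list for manual review instead of mixing it into the non-English results.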


Answer selected by TomLucidor
Category
Q&A
2 participants