Generate alternative typos with a translation table #2985

Merged: 2 commits, merged on Aug 8, 2023

Commits on Aug 7, 2023

  1. Demonstrate issue with typographic apostrophe U+2019 (’)

    We miss misspellings written with the typographic apostrophe
    U+2019 (’) because the typos in our dictionaries use the typewriter
    apostrophe U+0027 (').
    DimitriPapadopoulos committed Aug 7, 2023
    Commit 333c2a0
  2. Generate alternative typos with a translation table

    This way we can catch misspellings written with alternative characters,
    typically the typographic apostrophe U+2019 (’) or the acute accent
    U+00B4 (´) instead of the typewriter apostrophe U+0027 ('). In this
    case, the alternative character is valid and is used both in the
    misspelling and in the fix(es).
    
    This differs from detecting Unicode phishing, where some
    characters like `A` are replaced, intentionally or accidentally,
    by lookalikes such as `A`, `Α`, `А`, `ᗅ`, `ᴀ`, `A`.
    In that case, the alternative character is invalid and should
    be replaced by its valid counterpart in the fix. We do not
    address that case here.
    DimitriPapadopoulos committed Aug 7, 2023
    Commit 2010c78
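A minimal Python sketch of the problem described in the first commit, assuming a dictionary line format of `typo->fix`. The entry `dosen't->doesn't` and the helper `build_misspellings` are hypothetical illustrations, not codespell's actual dictionaries or API. A lookup keyed on the typewriter apostrophe finds nothing for the same word written with U+2019:

```python
# Hypothetical dictionary entry in "typo->fix" format; both the typo
# and the fix use the typewriter apostrophe U+0027.
DICTIONARY_LINES = ["dosen't->doesn't"]


def build_misspellings(lines):
    """Map each typo to its fix exactly as written (hypothetical helper)."""
    misspellings = {}
    for line in lines:
        typo, fix = line.strip().split("->")
        misspellings[typo] = fix
    return misspellings


misspellings = build_misspellings(DICTIONARY_LINES)

typewriter = "dosen't"          # contains U+0027
typographic = "dosen\u2019t"    # same word, written with U+2019

print(typewriter in misspellings)    # True
print(typographic in misspellings)   # False: U+2019 is not U+0027
```

The two strings look alike on screen but differ in one code point, so an exact dictionary lookup silently misses the typographic spelling.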
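A minimal sketch of the approach in the second commit, assuming the translation table is a Python `str.maketrans` table. `ALT_CHARS`, `TRANSLATE_TABLES`, `build_misspellings` and the dictionary entries are hypothetical illustrations, not codespell's actual code. Each entry whose typo contains U+0027 also yields a variant with U+2019 substituted in both the typo and the fix, since the alternative character is valid text:

```python
# Hypothetical pairs of (character used in the dictionaries, alternative).
# Assumption: only the typographic apostrophe U+2019 is generated here.
ALT_CHARS = (("'", "\u2019"),)

# One translation table per pair, kept next to the character it replaces
# so entries without that character can be skipped cheaply.
TRANSLATE_TABLES = [(old, str.maketrans(old, new)) for old, new in ALT_CHARS]


def build_misspellings(lines):
    """Return {typo: fix}, plus alternative-character variants."""
    misspellings = {}
    for line in lines:
        typo, fix = line.strip().split("->")
        misspellings[typo] = fix
        for old, table in TRANSLATE_TABLES:
            if old in typo:
                # The alternative character is valid, so it goes into the
                # typo AND the fix string (all comma-separated fixes alike).
                misspellings[typo.translate(table)] = fix.translate(table)
    return misspellings


misspellings = build_misspellings(["dosen't->doesn't", "teh->the"])
for typo, fix in misspellings.items():
    print(f"{typo} -> {fix}")
# dosen't -> doesn't
# dosen’t -> doesn’t
# teh -> the
```

In this sketch the variants are generated when the dictionary is loaded, so the dictionary files need no duplicate U+2019 entries, and entries without an apostrophe, such as `teh`, are added only once.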