Undersplitting sentence ending in URL #25

Open
peter-lang-dealogic opened this issue Feb 28, 2022 · 0 comments
Labels
bug Something isn't working

peter-lang-dealogic commented Feb 28, 2022

Example:
Jackson Hospital's website at https://www.jackson-hospital.com. Individuals may also write to Jackson Hospital's Privacy Officer at 4250 Hospital Drive, Marianna, Florida 32446.

Tokens around the expected split point: ["website", "at", "https://www.jackson", "-hospital.com", ".", "Individuals", "may"]

Rule that prevents the sentence from being split:

elif "." in token_before and token_after != ".":

This rule cannot be changed easily, as it catches most dot-separated abbreviations, e.g. "U.S."
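
To make the conflict concrete, here is a minimal sketch that models only the quoted condition and applies it to both the abbreviation case and the URL case. The function names, the `KNOWN_TLDS` sample, and the assumption that `token_before` is the token immediately before the terminal dot are all hypothetical and not taken from the project's code. The extra domain check is one possible direction, not a proposed fix.

```python
# Hypothetical illustration; none of these names come from the project.
KNOWN_TLDS = {"com", "no", "org", "net"}  # tiny illustrative sample, not exhaustive


def rule_blocks_split(token_before: str, token_after: str) -> bool:
    """Model of the quoted condition: the token before the terminal
    contains a dot, and the token after the terminal is not a dot."""
    return "." in token_before and token_after != "."


def looks_like_domain_tail(token: str) -> bool:
    """Hypothetical extra check: the token contains a dot and its last
    dot-separated part is a known top-level domain."""
    if "." not in token:
        return False
    suffix = token.rpartition(".")[2]
    return suffix.lower() in KNOWN_TLDS


# Abbreviation case the rule is meant to protect: "U.S." followed by a word.
assert rule_blocks_split("U.S", "government")
# URL case from this report: the same condition fires, so no split happens.
assert rule_blocks_split("-hospital.com", "Individuals")
# In these two examples the extra check tells the cases apart.
assert not looks_like_domain_tail("U.S")
assert looks_like_domain_tail("-hospital.com")
```

A fixed TLD list is obviously incomplete and could misfire on abbreviations that happen to end in a TLD-like suffix, so any real change would need to be checked against the existing abbreviation tests.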

Additional examples:

  • The quarterly report and presentation will be made available on the company's website www.polight.com and www.newsweb.no. CEO Øyvind Isaksen and CFO Alf Henning Bekkevik will present the company's results at 09:00 am CEST through live webcast.
  • "Brian Trisler is a recognized and lauded visionary in the healthcare community," said Patrick Nagle, founder and CEO of Rehab.com. "As a co-founder of A Place for Mom, Brian created a multi-billion dollar company that enriches the lives of seniors and their families.
@fnl fnl added the bug Something isn't working label Feb 28, 2022