Hello, @tenderlove! Thank you for porting this library!

While using this library, I received feedback that the way it handles the Arabic definite article 'al-' is not quite right.

Currently the rule looks like:

The corresponding test asserts that `al Fahd` is the accepted standard. It seems common to hyphenate the definite article (`al-Fahd`).

"Al-" and its variants (ash-, ad-, ar-, etc.) are always written in lower case (unless beginning a sentence), and a hyphen separates it from the following word.

Looking closer at the regular expressions fixing "son (daughter) of" etc., they seem to have different terminators based on whether the prefix can also be a 'forename' (Ben, Al, or Van). In those cases, rather than using `\b`, it uses `(?=\s+\w)`. If `\b(?=.+\w)` were used instead, I think it would fix the Arabic issue.

Hebrew also seems to be not quite up to the current standard, as the test case 'ben Gurion' is actually more commonly seen as `Ben-Gurion`.
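
To make the terminator suggestion for the Arabic case concrete, here is a minimal sketch comparing the two lookaheads. It is written in Python only because the regex semantics involved are the same; the patterns and the `lowercase_article` helper are hypothetical stand-ins for illustration, not the library's actual rules.

```python
import re

# Hypothetical stand-ins for the two terminator styles discussed in the issue.
# These are NOT the library's real rules, only an illustration of the lookaheads.
WHITESPACE_TERMINATOR = re.compile(r"\bAl(?=\s+\w)")    # requires whitespace after "Al"
BOUNDARY_TERMINATOR = re.compile(r"\bAl\b(?=.+\w)")     # any word boundary, more text must follow


def lowercase_article(pattern, name):
    """Lowercase the article wherever the given pattern matches."""
    return pattern.sub("al", name)


samples = ["Al Fahd", "Al-Fahd", "Al", "Alan Smith"]
for sample in samples:
    with_whitespace = lowercase_article(WHITESPACE_TERMINATOR, sample)
    with_boundary = lowercase_article(BOUNDARY_TERMINATOR, sample)
    print(f"{sample!r}: whitespace={with_whitespace!r} boundary={with_boundary!r}")
```

In this sketch the whitespace lookahead leaves the hyphenated form untouched, while the boundary version lowercases it. Both leave a bare "Al" (a possible forename) and "Alan Smith" alone, which seems to be the reason for the special terminator in the first place.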