Enhance nickname processing #122

Open
wants to merge 4 commits into
base: master
Changes from 1 commit
remove testREGEXES
remove testREGEXES.py from repository
aikimark committed Mar 21, 2021
commit 4a395cffa3c1fe82a042f2f5e7b74d70575db9f3
49 changes: 0 additions & 49 deletions nameparser/config/testREGEXES.py

This file was deleted.

2 changes: 2 additions & 0 deletions nameparser/config/titles.py
@@ -166,6 +166,7 @@
'chef',
'chemist',
'chief',
+'chief justice',
Owner

I'd be kind of surprised if this works, but I guess I could see it because of how the titles chain together. "Justice" is a somewhat common first name, so we couldn't just add it as its own title. If this works, it's a nice workaround.

Author

I didn't realize that "Justice" was a common first name. I don't think my "Chief Justice" string is being matched, due to the prior parsing actions. I'm not sure what qualifies as a "common first name". "Justice" is around the 580th most common first name in America. However, I think there are probably more people named "Justice" than there are judges.
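A minimal sketch of why a two-word entry such as 'chief justice' might never match, assuming the parser compares one whitespace-delimited word at a time against the titles set. The helper `leading_titles` is hypothetical and is not nameparser's code; the thread only suggests, and does not confirm, that matching works this way.

```python
def leading_titles(full_name, titles):
    """Split off leading words that appear in `titles` (case-insensitive).

    Hypothetical helper for illustration only; not part of nameparser.
    """
    words = full_name.split()
    found = []
    # Each candidate is a single word, so an entry containing a space
    # (such as "chief justice") can never be equal to it.
    while len(words) > 1 and words[0].lower().rstrip(".") in titles:
        found.append(words.pop(0))
    return found, words


# Assumed sample set: "chief" plus the two-word entry from this diff.
sample_titles = {"chief", "chief justice"}

print(leading_titles("Chief Justice John Roberts", sample_titles))
# (['Chief'], ['Justice', 'John', 'Roberts'])
```

Under that assumption, "Chief" is consumed as a title and "Justice" is left at the front of the remaining words, where a parser would most likely read it as a first name.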

Author

I spoke with a judge and asked him about the use of the title "Justice". He said it was rare. I'll undo this change, since it was based on a false assumption.

The judge expressed some dismay that titles were being used as first names. He has encountered people with first names such as "King" and "Queen" in his courtroom.

We might want to include the parser's bias for first names over titles.

If someone is parsing names of titled people (think UN delegations), what should they do?

Owner

A bias for first names over titles is already a feature of the parser, which is why there are no potential first names in the titles constant. The first job of a name parser is to parse names; it can optionally parse titles too, but not if that messes up names.

Author

I'll do some testing after removing "Justice" and "Chief Justice" from the list. I might add tests for "David Justice" and "Justice, David" (the baseball player).
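The two test cases mentioned above could be sketched like this. `parse_simple` and `SAMPLE_TITLES` are hypothetical stand-ins written for this illustration; they are not nameparser's API, and real tests in the repository would call the library's own parser instead.

```python
import unittest

# Assumed sample title set; "justice" is deliberately absent.
SAMPLE_TITLES = frozenset({"dr", "mr", "mrs", "ms", "judge"})


def parse_simple(full_name, titles=SAMPLE_TITLES):
    """Toy parser for 'First Last' and 'Last, First' (illustration only)."""
    if "," in full_name:
        # "Last, First" form: move the surname to the end.
        last, _, given = full_name.partition(",")
        words = given.split() + [last.strip()]
    else:
        words = full_name.split()
    title = []
    # Only leading words can become titles, and at least one word must remain.
    while len(words) > 1 and words[0].lower().rstrip(".") in titles:
        title.append(words.pop(0))
    return {
        "title": " ".join(title),
        "first": " ".join(words[:-1]),
        "last": words[-1],
    }


class JusticeSurnameTest(unittest.TestCase):
    expected = {"title": "", "first": "David", "last": "Justice"}

    def test_first_last(self):
        self.assertEqual(parse_simple("David Justice"), self.expected)

    def test_last_comma_first(self):
        self.assertEqual(parse_simple("Justice, David"), self.expected)
```

Saved to a file, the cases can be run with `python -m unittest`. In this toy model the surname survives even if "justice" is added to the title set, because only leading words are tested as titles; whether the real parser behaves the same way is exactly what such tests would check.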

Copy link

I work in legal data, and I'll note that there are very few "Justices" though I suppose "justice of the peace" is a title. But in my experience, "justice" is reserved pretty much exclusively for the SCOTUS justices. You can see the way this shakes out on this page (though it doesn't discuss this topic): https://www.uscourts.gov/judges-judgeships/about-federal-judges

Copy link

Oh, sorry, meant to say that relatedly, "J." is a very common title among judges. I'm guessing it can't be added b/c it's one letter, but I thought I'd throw that out there.

'chieftain',
'choreographer',
'civil',
@@ -339,6 +340,7 @@
'judicial',
'junior',
'jurist',
+'justice',
Owner

My friend Justice would be upset that the parser would not recognize his first name.

Author

There are about the same number of people named "Justice" as there are judges in America (~33,000). What does the name parser do, or what should it do, when it encounters several names? What if "Justice" is one of the first words in a multi-name string?

This is a question similar to the one that I posed for myself when I first approached the problem of nicknames that might also be suffixes. I didn't have a good answer, so I abandoned that original approach. It is still an unanswered question.

Owner

The simplest use case for this parser is just Firstname Lastname. I feel like when that conflicts with other things the parser should/could do (e.g., recognize titles), those other things should be sacrificed to preserve its ability to split up a simple name. There is a fairly simple workaround if someone using the parser wants to change this, and a human interacting with the parser could add their first name and the parser would then figure out that it's a title, kind of like if you were interacting with a human and they had the same confusion.
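The trade-off described above can be illustrated with a toy model in which the default title set leaves out anything that doubles as a first name, and a caller who knows their data opts in. `split_name`, `DEFAULT_TITLES`, and `extra_titles` are hypothetical names invented for this sketch; nameparser has its own documented configuration mechanism, which is not used here.

```python
# Assumed sample defaults; "justice" is left out because it is also a first name.
DEFAULT_TITLES = frozenset({"dr", "mr", "mrs", "ms", "judge", "chief"})


def split_name(full_name, extra_titles=()):
    """Return (title, given_names, last) for a simple name (illustration only)."""
    titles = DEFAULT_TITLES | {t.lower() for t in extra_titles}
    words = full_name.split()
    title = []
    # Consume leading title words, always leaving at least one word as the name.
    while len(words) > 1 and words[0].lower().rstrip(".") in titles:
        title.append(words.pop(0))
    return " ".join(title), " ".join(words[:-1]), words[-1]


# Default behaviour favours the plain Firstname Lastname reading.
print(split_name("Justice Smith"))
# ('', 'Justice', 'Smith')

# A caller parsing lists of titled people opts in explicitly.
print(split_name("Justice Smith", extra_titles=["justice"]))
# ('Justice', '', 'Smith')
print(split_name("Chief Justice John Roberts", extra_titles=["justice"]))
# ('Chief Justice', 'John', 'Roberts')
```

Keeping the opt-in with the caller means the default never misreads someone whose first name is "Justice", while users parsing judicial rosters can still get the title split off.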

Author

I hadn't noticed a test for this. I'll look at it and alter my John Roberts test accordingly.

I'll remove "Justice" from the titles list.

'keyboardist',
'kingdom',
'knowledge',