Matching names is a complex problem:
Some systems store the full name in a single field, others store it split up into separate fields.
When matching, the names often come in without structure and need to be matched dynamically across multiple fields, or possibly against a single field with a different name order.
Example:
company - firstname lastname
firstname lastname, company
And many more possible orders.
When matching a mail address, the data comes in as a single string, in my example:
"Miro Dietiker, MD Systems" [email protected]
Separating the mail is easy.
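As a minimal sketch of that first step, Python's standard `email.utils.parseaddr` splits such a string into display name and address. The address `miro@example.com` is a placeholder assumed for illustration, since the real one is not shown above.

```python
from email.utils import parseaddr

# Placeholder address (assumption); only the display name is taken from the example.
raw = '"Miro Dietiker, MD Systems" <miro@example.com>'

# parseaddr returns a (display_name, address) tuple.
display_name, address = parseaddr(raw)

print(display_name)  # the quoted part, without the quotes
print(address)       # the bare mail address
```

The quoted display name keeps its comma, so the name part can be handed on to the splitting step unchanged.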
We could try splitting by known separators such as a comma:
Miro Dietiker
MD Systems
And we could try splitting further by spaces.
Miro, Dietiker, MD, Systems
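The two splitting passes could be sketched like this. The separator set (comma, semicolon, and a dash surrounded by spaces) and the function names are assumptions for illustration, not part of any existing matcher.

```python
import re

# Assumed separators: comma, semicolon, or a dash with whitespace on both sides.
# Requiring whitespace around the dash keeps hyphenated names such as "Jean-Luc" intact.
SEGMENT_SEPARATORS = re.compile(r"\s*[,;]\s*|\s+-\s+")


def split_segments(display_name):
    """First pass: split the combined string on known separators."""
    parts = SEGMENT_SEPARATORS.split(display_name.strip())
    return [part for part in parts if part]


def split_tokens(segments):
    """Second pass: split every segment further on whitespace."""
    return [token for segment in segments for token in segment.split()]


segments = split_segments("Miro Dietiker, MD Systems")
tokens = split_tokens(segments)
print(segments)
print(tokens)
```

Keeping both the segments and the tokens around is deliberate: the segments are candidates for a full match, the tokens for partial matches.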
We could then apply a contains (or even an equals) match against the normalised name fields.
Possibly we should only treat components of 3+ characters as separate and keep shorter ones joined to their neighbour?
Also we could identify known titles.
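One possible reading of these two ideas, as a rough sketch: drop tokens found in a title list, and glue tokens shorter than three characters to the following token (or to the previous one if they come last). The title list, the joining direction and all names here are assumptions.

```python
# Hypothetical and deliberately incomplete title list.
KNOWN_TITLES = {"dr", "prof", "mr", "mrs", "ms"}


def normalise(token):
    """Lowercase a token and drop surrounding dots, so "Dr." equals "dr"."""
    return token.casefold().strip(".")


def components(segment, min_len=3):
    """Split a segment into components, keeping short tokens connected."""
    tokens = [t for t in segment.split() if normalise(t) not in KNOWN_TITLES]
    result = []
    pending = []  # short tokens waiting to be joined to the next long token
    for token in tokens:
        if len(token) < min_len:
            pending.append(token)
        else:
            result.append(" ".join(pending + [token]))
            pending = []
    if pending:
        # Trailing short tokens: attach to the previous component if there is one.
        if result:
            result[-1] = " ".join([result[-1]] + pending)
        else:
            result.append(" ".join(pending))
    return result


print(components("MD Systems"))
print(components("Dr. Miro Dietiker"))
```

With this rule "MD Systems" stays a single component, so the short "MD" never has to be matched on its own.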
We could also first try the longer, still-connected names for a full match. Full matches would receive a higher score.
For partial matches of components, we should probably divide the score by the total number of components somehow.
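To make the scoring idea concrete, here is one way it might look. The weight of 2.0 for full matches, the exact formula and the function names are assumptions; the only parts taken from the text are "full matches score higher" and "divide by the total number of components".

```python
def normalise(text):
    """Lowercase and collapse whitespace so comparisons ignore case and spacing."""
    return " ".join(text.casefold().split())


def score(segments, field_values, full_weight=2.0):
    """Score incoming name segments against one record's name fields.

    A segment equal to a whole field counts full_weight per token.
    Otherwise each token contained in any field counts 1.
    The sum is divided by the total number of tokens.
    """
    fields = [normalise(value) for value in field_values if value.strip()]
    total = 0.0
    count = 0
    for segment in segments:
        seg = normalise(segment)
        tokens = seg.split()
        count += len(tokens)
        if seg in fields:
            total += full_weight * len(tokens)
        else:
            total += sum(1 for t in tokens if any(t in f for f in fields))
    return total / count if count else 0.0


incoming = ["Miro Dietiker", "MD Systems"]
print(score(incoming, ["Miro", "Dietiker", "MD Systems"]))  # 1.5
print(score(incoming, ["Miro", "Meier", "Other Ltd"]))      # 0.25
```

The contains check is intentionally loose and can produce false positives for very short tokens, which is another argument for keeping short components connected.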
The matcher would need to offer a virtual field to pass in the combined data.