This discussion was converted from issue #29 on August 23, 2023 21:03.
Hi Community,
I'm working with a number of PDF documents from which I need to pull specific data. So far, I've successfully extracted both the date and status from these documents. However, I've been unable to pull out full names.
There are several challenges when it comes to name extraction:
The location of the name isn't consistent.
The name may be in different formats (either first name + last name, or first name + middle name + last name).
Varying labels or separators may appear before or after the name (e.g., "Location:", "112street -", or commas).
Because of these factors, using Microsoft Syntex to extract full names from the PDFs has proved quite difficult.
I've tried the following approaches:
A regular expression: [A-Za-z]+[ ]+[A-Za-z] (intended to match first name + space + last name).
Before and after labels. The after label works only partially.
The Invoice model. It pulls the full name from the document in most cases, but it cannot pull the status.
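A side note on the regular expression in that list: as written, its last character class has no quantifier, so it can match only the first letter of the last name. The sketch below checks this outside Syntex with Python's `re` module, on the assumption that the pattern syntax behaves the same way for this simple case. The sample text, the `NAME` pattern, and the `find_name` helper are hypothetical illustrations, not part of Syntex.

```python
import re

# The pattern as tried: the final [A-Za-z] has no "+",
# so only ONE letter of the last name is matched.
TRIED = re.compile(r"[A-Za-z]+[ ]+[A-Za-z]")

# Hypothetical alternative: two or three capitalized words
# (first name, optional middle name, last name).
NAME = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+){1,2}\b")


def find_name(text):
    """Return the first two- or three-word capitalized run, or None."""
    match = NAME.search(text)
    return match.group() if match else None


# Made-up sample line with a label before and a separator after the name.
sample = "Location: John Michael Smith, 112street - Active"

print(TRIED.search(sample).group())  # John M
print(find_name(sample))             # John Michael Smith
```

This is only a starting point: the alternative pattern also matches any other run of capitalized words (a place such as "New York", for example), and it does not handle hyphens, apostrophes, or non-ASCII letters in names, so it would still need to be combined with label or position hints.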
Unfortunately, none of these approaches has fully worked so far. I'd welcome any suggestions you may have.
I want to extract the full name using Syntex.