plantR::prepName cannot deal with cases such as "Sobrinho, J. de P.L. (no. 1441)"
> plantR::prepName('Sobrinho, J. de P.L. (no. 1441)')
Error in gsub(x, "", y, perl = TRUE) :
invalid regular expression ')|Sobrinho'
In addition: Warning message:
In gsub(x, "", y, perl = TRUE) : PCRE pattern compilation error
'unmatched closing parenthesis'
at ')|Sobrinho'
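The same kind of failure can be reproduced outside plantR. This is a minimal sketch, not the package's actual code: the `pattern` value is copied from the error message above, and the surrounding call is a hypothetical stand-in.

```r
# Hypothetical stand-in for the failing call; not plantR's actual code.
pattern <- ")|Sobrinho"  # the unbalanced ")" makes this an invalid PCRE pattern
input <- "Sobrinho, J. de P.L. (no. 1441)"

result <- tryCatch(
  suppressWarnings(gsub(pattern, "", input, perl = TRUE)),
  error = function(e) "regex error"  # reached when the pattern fails to compile
)
print(result)
```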
This happens because the function only tracks parentheses (or brackets) when they enclose the whole string, one at the beginning and one at the end, as in "(João Silva)" (see lines 11 to 18 of prepName).
Cases such as "Sobrinho, J. de P.L. (no. 1441)" are not accounted for, so an error is returned. I've been thinking about how to solve this, since those brackets and parentheses are put back at the end of the function, but for cases like the one I mentioned this becomes too complicated. So I looked at thousands of cases like this, and pretty much all of them are one of:
a location, "(Parc National de Port-Cros)"
an institute name, "(INFLOVAR (Association))"
another name, "Franklin, M.A. (Ben)", or
a (potential) collector number, "Luetzelburg, P. von (no. 23045)"
But since collector numbers are not extracted from these columns (prepName is used on $recordedBy and $identifiedBy), I think that in cases such as "Sobrinho, J. de P.L. (no. 1441)" the parentheses and everything inside them could be removed. The function only prepares a name, and the content of within-string parentheses is not used for anything else. Also, if only the parentheses themselves are removed, i.e., "Sobrinho, J. de P.L. no. 1441", the output gets messy and "No." is treated as the surname.
If desired, the within-string parentheses and their content can be removed with
x <- trimws(ifelse(grepl("(?<!^)\\(", x, perl = TRUE) | grepl("\\)(?!$)", x, perl = TRUE), gsub("\\([^)]*\\)", "", x), x))
It will still keep cases such as "(João Silva)" as is.
The line above could be added just before this step (line 14) in the prepName function:
parent <- grepl("^\\(", x, perl = TRUE) & grepl("\\)$", x, perl = TRUE)
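The proposed replacement can be checked against the example strings quoted in this issue. This is only a sketch: `clean_parens` is a hypothetical wrapper name, not a plantR function, and the input vector is assembled from the examples above.

```r
# Hypothetical helper wrapping the proposed line; not part of plantR.
clean_parens <- function(x) {
  # TRUE when some "(" is not at the start of the string
  # or some ")" is not at its end
  inner <- grepl("(?<!^)\\(", x, perl = TRUE) |
    grepl("\\)(?!$)", x, perl = TRUE)
  # Drop each "(...)" group only in those strings, then trim whitespace
  trimws(ifelse(inner, gsub("\\([^)]*\\)", "", x), x))
}

examples <- c(
  "Sobrinho, J. de P.L. (no. 1441)",
  "Franklin, M.A. (Ben)",
  "Luetzelburg, P. von (no. 23045)",
  "(João Silva)"
)
cleaned <- clean_parens(examples)
print(cleaned)
```

The first three strings lose their parenthetical part, while "(João Silva)" is left untouched. Because `[^)]*` stops at the first closing parenthesis, nested cases such as "(INFLOVAR (Association))" would leave a stray ")" behind and may need separate handling.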