Functions handle NAs in input incorrectly #224
Comments
I had a feeling this one was more pervasive. Something we should keep in mind for future contributions as well. |
Thanks for bringing this up in PR #223! The issue came up in an example for the paper where |
Just need to edit each function to have something like:
if (is.na(query)) {
  return(NA)
}
And add tests to make sure it still returns the expected data type (i.e. a tibble). |
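To make the early-return idea concrete, here is a minimal sketch in base R. `query_one()` and its lookup table are hypothetical stand-ins for a real web service call, not webchem code. It assumes the function normally returns a data frame, so the NA branch returns a one-row data frame with the same columns rather than a bare NA:

```r
# Hypothetical single-ID query; not a webchem function.
# The lookup table stands in for a web service call.
query_one <- function(query) {
  query <- as.character(query)
  # Placeholder with the same columns and column types as a real hit
  empty <- data.frame(query = query, name = NA_character_,
                      stringsAsFactors = FALSE)
  # Early return for missing input
  if (is.na(query)) {
    return(empty)
  }
  known <- c("50-00-0" = "formaldehyde", "64-19-7" = "acetic acid")
  # Unknown IDs get the same placeholder as NA input
  if (!query %in% names(known)) {
    return(empty)
  }
  data.frame(query = query, name = unname(known[query]),
             stringsAsFactors = FALSE)
}

res_ok <- query_one("50-00-0")
res_na <- query_one(NA)

# Both results share column names, so they can be row-bound directly
combined <- rbind(res_ok, res_na)
```

A test along the lines suggested above could then compare the column names and classes of `res_ok` and `res_na` instead of only checking for NA.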
Yes, something like this. However, if a function returns e.g. a list of 8 elements, then it has to return the same structure when the input is |
I agree that the same structure should be returned, but not necessarily with the same number of elements. So if a function returns a
l = list(iris[1:10, ], data.frame(Species = NA))
dplyr::bind_rows(l)
data.table::rbindlist(l, fill = TRUE)
So, a structure like you suggest in PR #225 in the chebi.R file is in my opinion not necessary. What do you think? |
Great point! In case of I am unsure about the query functions like |
Yes, it's a rather complex data structure that is returned from ChEBI (i.e. an ontology, a data graph), which is too difficult, or rather overkill, to squash into one data.frame in my opinion. So I think there's no reason to bind it in general (be it for the user or us webchem devs). I thought the discussion in #193 about whether to unify the output only concerns the
l = chebi_comp_entity(c('CHEBI:27744', 'whatever', 'CHEBI:17790'))
l2 = l[ sapply(l, function(x) !is.na(x[[1]][1])) ]
Generally it's not the easiest topic. Hm, maybe your approach, to have the
Actually I never really bothered much about erroneous ID inputs because my workflow in ChEBI looks like this:
|
I expect that most query functions like |
With the new PubChem PUG-View web service we can access a whole PubChem page as a JSON object. Depending on the compound, these objects might not have the same paragraphs, e.g. the acetic acid page contains pKa information but the sulfuric acid page doesn't, and so a simple pKa extractor function won't work if we don't handle the missing list element. With PubChem this will be very common, and this is completely natural. Also very similar to the issue with invalid inputs. In our data flow we often have "input"-> Example:
|
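The missing-paragraph case described above can be sketched roughly as follows. The nested lists are simplified, hypothetical stand-ins for parsed PUG-View pages (the real JSON is nested differently), and `extract_pka()` is an illustrative name, not an existing webchem function. The point is only that the extractor returns NA both for an invalid page and for a valid page that lacks the element:

```r
# Hypothetical, simplified stand-ins for parsed PUG-View pages.
# Only the idea of a missing list element matters here.
pages <- list(
  "acetic acid"   = list(name = "acetic acid",
                         properties = list(pKa = 4.76)),
  "sulfuric acid" = list(name = "sulfuric acid",
                         properties = list()),
  "invalid"       = NA
)

# Extractor that returns NA_real_ instead of failing when the page
# is not a list (e.g. NA) or has no pKa element.
extract_pka <- function(page) {
  if (!is.list(page)) {
    return(NA_real_)
  }
  # Indexing a list by a missing name with [[ ]] gives NULL
  value <- page[["properties"]][["pKa"]]
  if (is.null(value)) NA_real_ else as.numeric(value)
}

# vapply guarantees one numeric value per input page
pka <- vapply(pages, extract_pka, numeric(1))
```

Using `vapply()` with `numeric(1)` means the result always has one entry per input, which is the property the vectorised functions discussed here need.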
Ok, that also sounds like a good plan to me, to handle NAs in the extractor functions!
## list() approach
l = list(iris[1:10, ],
list(),
iris[1:10, ])
# remove erroneous results
l[ !sapply(l, length) == 0 ]
l[ !lengths(l) == 0 ]
## NA approach
l2 = list(iris[1:10, ],
NA,
iris[1:10, ])
# remove erroneous results
l2[ !is.na(l2) ]
Sorry to bring it up again. What do you think? Actually, this discussion definitely also belongs to #218 |
|
Great conversation, btw! |
I see no reason for |
The following functions still have an unexpected behavior when given
Edit: I removed functions that errored when only given NA from this list. |
Thanks @Aariq, just couldn't get there lately. |
If you'd like to work on this issue, feel free to reassign it to yourself. |
aw_query(NA) returns http error 404.
get_chebiid(NA) returns sodium containing compounds.
ci_query(NA) returns NA, but the output structure is different from a valid query.
get_csid(NA) and all other ChemSpider functions return client error: (400) bad request.
These are just the first few, I believe most functions in the package are affected. Since a lot of webchem functions are vectorised, I think it is important that these functions always return the same output structure even if the input is invalid. I think we have to find a robust solution and implement it systematically in all of our affected functions.
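One possible shape for such a systematic solution, as a sketch only: a wrapper that applies a single-ID query over a vector and substitutes a fixed placeholder for NA inputs and for queries that error. `query_all()`, `toy_query()` and `placeholder` are hypothetical names introduced for illustration. Nothing here is existing webchem code, and the toy query just checks the ID format instead of calling a web service:

```r
# Hypothetical helper, not part of webchem: applies a single-ID query
# function over a vector and returns a fixed placeholder whenever the
# input is NA or the query throws an error.
query_all <- function(ids, fun, placeholder) {
  lapply(ids, function(id) {
    if (is.na(id)) {
      return(placeholder)
    }
    tryCatch(fun(id), error = function(e) placeholder)
  })
}

# Toy query standing in for a web service call
toy_query <- function(id) {
  if (!grepl("^CHEBI:[0-9]+$", id)) stop("bad request")
  data.frame(id = id, found = TRUE, stringsAsFactors = FALSE)
}

# Placeholder with the same columns as a successful result
placeholder <- data.frame(id = NA_character_, found = NA,
                          stringsAsFactors = FALSE)

res <- query_all(c("CHEBI:27744", NA, "whatever"), toy_query, placeholder)

# Every element has the same structure, so binding is straightforward
res_df <- do.call(rbind, res)
```

Whether the placeholder should be a one-row data frame, an empty list, or a bare NA is exactly the open question in this thread. The sketch only shows that the decision can live in one place.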