rethink data filtering #850

Open
adf-ncgr opened this issue Aug 21, 2023 · 0 comments

@adf-ncgr (Contributor) commented:

Currently, we have some brute-force filtering implemented client side, with regex matching applied to track names (micro and macro). While this has given us a lot of bang for the implementation buck when coupled with the naming conventions we use in LIS, it suffers from a few problems that seem worth addressing for GCV3:

  • as a client-side operation, it requires the servers to do a lot of extra computational work to deliver results that may only be a nuisance for a specific use case (e.g. a user interested only in comparisons within Glycine still gets results for every species in LIS before applying a gly.* naming filter)
  • relying on naming conventions isn't generally a good idea and sometimes fails outright (e.g. suppose the Glycine user's gly.* filter, based on our "gensp" naming, also returned genomes from genus Glycyrrhiza)
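
To make the second problem concrete, here is a minimal sketch in Python. It is not GCV code: the `Track` class, its `genus` field, and the track names are all assumptions made up for illustration, and the `glyur` name in particular is hypothetical. It contrasts a regex on "gensp"-style names with a filter on explicit metadata, which is the kind of predicate that could in principle be evaluated server side before any results are computed.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    name: str   # "gensp"-style track name (illustrative values only)
    genus: str  # explicit metadata; an assumed field, not the current data model


# Hypothetical tracks; the names only mimic the naming convention.
TRACKS = [
    Track("glyma.Wm82.gnm2.Gm01", "Glycine"),
    Track("glyso.W05.gnm1.Chr01", "Glycine"),
    Track("glyur.example.gnm1.chr1", "Glycyrrhiza"),
    Track("phavu.G19833.gnm2.Chr01", "Phaseolus"),
]


def filter_by_name(tracks, pattern):
    """Name-based approach: keep tracks whose name matches the regex.

    re.match anchors at the start of the name.
    """
    regex = re.compile(pattern)
    return [t for t in tracks if regex.match(t.name)]


def filter_by_genus(tracks, genus):
    """Metadata-based approach: compare an explicit genus field."""
    return [t for t in tracks if t.genus == genus]


by_name = filter_by_name(TRACKS, r"gly.*")
by_genus = filter_by_genus(TRACKS, "Glycine")

print("name filter: ", [t.name for t in by_name])
print("genus filter:", [t.name for t in by_genus])
```

In this sketch the name filter also returns the Glycyrrhiza track, while the genus filter returns only the two Glycine tracks. Both functions still run over an in-memory list here; the point is only that a metadata predicate does not depend on how tracks happen to be named.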

Since we may be upgrading the data model as part of GCV3, it might be a good time to consider this as well (though the arguments above would still carry weight even if we left the data model as is).
