Pre-filtering steps and requirements per sample #17
Comments
Some extra info: I have 460 samples with 3 replicates each. Multiple samples (and their replicates) can belong to a given condition, so that does make it slightly different compared to your example synthetic datasets (where it seems 1 sample belongs to 1 condition?).
Hi Joanna, Oh, wow, that sounds like an impressively big dataset! I have never tested the package with more than a few dozen samples. To get an estimate how long
I find it difficult to give generic advice on this without a better understanding of your data and your research question. In general, it is not necessary to filter out samples where the protein is missing in all replicates, Best,
Hi Constantin, thanks for your reply and time :) I have quite some more questions and I'm wondering if the GitHub issues tab is the most suitable place to discuss them, haha. Is it ok if I ask them in this thread? If not, please let me know how I should best contact you.
Hi there! I'm faced with a dataset where I can have up to 90% missing values for a given feature (an m/z value, but equivalent to a protein readout), and that means there are many cases where some samples have NAs in all 3 of their replicates.
What would you recommend in terms of input filtering? Would you make sure that a feature has at least 1 replicate present for every sample? Or set some kind of maximum missing value threshold (i.e., don't include features that have more than X% of samples missing)?
Currently I remove all m/z values that have more than 90% of samples missing, but that leaves me with approx. 60 000 features and approx. 1500 samples.
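To make the two filtering options concrete, here is a minimal sketch in plain Python. It is not part of the package discussed in this thread; the function names, the `max_missing` parameter, and the toy m/z labels are all made up for illustration. It assumes a sample counts as "missing" for a feature only when all of its replicates are NA.

```python
import math

def missing_fraction_per_sample(values, sample_ids):
    """Fraction of samples whose replicates are all missing for one feature."""
    by_sample = {}
    for value, sample in zip(values, sample_ids):
        by_sample.setdefault(sample, []).append(value)
    # A sample is counted as missing only if every replicate is NaN.
    all_missing = sum(
        all(math.isnan(v) for v in reps) for reps in by_sample.values()
    )
    return all_missing / len(by_sample)

def filter_features(matrix, sample_ids, max_missing=0.9):
    """Keep features where at most `max_missing` of the samples are fully missing."""
    return {
        name: values
        for name, values in matrix.items()
        if missing_fraction_per_sample(values, sample_ids) <= max_missing
    }

# Toy data (hypothetical): 2 samples with 3 replicates each, 3 features.
NA = float("nan")
sample_ids = ["s1", "s1", "s1", "s2", "s2", "s2"]
matrix = {
    "mz_100.1": [1.0, NA, 2.0, NA, NA, NA],      # s2 fully missing
    "mz_200.2": [NA, NA, NA, NA, NA, NA],        # both samples fully missing
    "mz_300.3": [1.0, 2.0, 3.0, 4.0, NA, 5.0],   # every sample has a value
}

kept = filter_features(matrix, sample_ids, max_missing=0.5)
print(sorted(kept))
```

With `max_missing=0.9` this corresponds to the threshold rule described above, while `max_missing=0.0` corresponds to the stricter rule of requiring at least 1 replicate present for every sample.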
I'm noticing that I can run the algorithm but it's not converging (at least not after 48 hours).
Thanks :)