Parsing of CIGAR #101

Open
chlee-tabin opened this issue Feb 24, 2020 · 0 comments

Dear dropEst maintainers/developers:

I have been using dropEst and noticed that the code does not actually parse the CIGAR string (there is a "TODO" comment in its place). Could this affect the accurate mapping of UMIs?

I have three questions about where this could matter (if I understood the code correctly):

(1) dropEst only checks the start and end coordinates of a read. In some cases the aligner erroneously maps part of the read to a very distant region of the genome (with a huge gap in between), so the UMI is assigned or discarded incorrectly. Could this be fixed by assigning the read to a gene based on the bulk of the aligned part?

(2) If I use the -f option to use the tags from 10x cellranger, how does dropEst generate and count intronic-only UMIs (which are typically not tagged in cellranger-generated .bam files)?

(3) Is there a quick remedy for cases where the last coordinate falls just outside an annotated 3'UTR? Instead of categorizing the UMI as HAS_NOT_ANNOTATED and discarding it, could dropEst keep it, or even suggest a modified annotation?
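
To illustrate what I mean in (1), here is a minimal Python sketch of CIGAR-aware block extraction. It is not dropEst code, and the names `cigar_to_blocks` and `largest_block` are hypothetical. It assumes a 0-based alignment start and the standard SAM CIGAR operations, splits the alignment at `N` (skipped region) operations, and picks the longest aligned block as the "bulk" part that could be tested against the gene annotation.

```python
import re

# CIGAR operations that consume reference bases according to the SAM format.
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_blocks(pos, cigar):
    """Return aligned reference blocks as 0-based, half-open (start, end) pairs.

    `pos` is assumed to be the 0-based leftmost aligned reference position.
    Blocks are split at N operations (skipped regions such as introns or
    spurious long gaps); deletions (D) stay inside a block.
    """
    blocks = []
    start = end = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            # Close the current block and jump over the skipped region.
            if end > start:
                blocks.append((start, end))
            start = end = end + length
        elif op in REF_CONSUMING:
            end += length
        # I, S, H and P do not advance the reference position.
    if end > start:
        blocks.append((start, end))
    return blocks


def largest_block(blocks):
    """Pick the block covering the most reference bases (the 'bulk' part)."""
    return max(blocks, key=lambda b: b[1] - b[0])


blocks = cigar_to_blocks(100, "5S20M1000N70M")
bulk = largest_block(blocks)
```

With this approach, a read whose short 20 bp segment lands far from the gene would still be evaluated by its 70 bp block, instead of by the outermost start and end coordinates alone.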

Thank you so much for the package!
