Unable to properly link sequence_spec to library_spec #62
Hi Ramon, thank you for submitting this issue. It’s great to see you putting together a seqspec for data, and I’m happy to help debug. I’ll take a look at the generated HTML page. Quick question: what does the sequencer report for the final two bases? If the sequenceable part of the library is length n and the sequencer produces reads of length n+2, are random bases being added?
Hi Sina, thanks so much for looking into it! Exactly, the two extra bases are from the UMI. For reference, this is the library structure [image] (obtained from page 20 in this protocol). Read 1 contains the cell barcode (16bp) + UMI (12bp), so it should look like this (from the 10x-RNA-v3 seqspec) [image]. Read 2 contains the probe insert (50bp) + constant sequence + probe BC + pCS1, so they should also be under "rna-R2.fastq.gz". rna-I2.fastq.gz contains the rna-index5, and rna-I1.fastq.gz the rna-index7. I understood from looking at the examples that Hope this helps!
Hi @massonix, I’ve updated seqspec print to display the reads on the sequence. You can test it by installing seqspec from the devel branch, formatting your spec file, and then running seqspec print:
pip install git+https://github.com/pachterlab/seqspec.git@devel
seqspec format -o spec.yaml spec.txt
seqspec print -f seqspec-html -o spec.html spec.yaml
Let me know if you run into any issues!
Hi Sina, Apologies for the delay, I finally had time to look into this. I ran the code that you provided and obtained this HTML: [image] I love the arrows added in the "Final Library" section, they make it easier to understand the connection between library structure and sequencing reads. However, the "Sequence structure" and "Library structure" sections still appear disconnected. I think the true genius of these HTML pages is the hierarchical dropdown arrows, where users can click on R1.fastq and immediately find the UMI and cell barcode contained there, which is the case for all the HTML pages in IGVF. For instance: [image] To rule out that this is an error specific to my seqspec, I ran the same line on 10x_rna_v3.spec.yaml and obtained this: [image] which is different from the HTML in IGVF linked above. This may be overkill to implement, though, in the edge cases where one of the reads extends into the other, so I totally understand if this feature is not prioritized. Thanks in advance and happy to discuss and test this further if needed :)
Hi Sina, I'm building another spec for a new version of NTT-seq. I set the
After converting to HTML, I get this: See how I need to scroll all the way to the right to get the arrows with the actual reads. I guess that seqspec is picking the max-len to draw the reads, but it'd be better if it was something like: Another option could be to encode in the sequence that the number of nucleotides can vary (something like XXX...XXX), to prevent long sequences, similar to In any case, I'm really happy with how seqspec works, it's truly the "lingua franca" of genomics assays ^^, my labmates are already loving it! Let me know if I can help with anything on my end. Ramon
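To make the XXX...XXX suggestion above concrete, here is a minimal sketch of how a long placeholder sequence could be shortened for display. This is only an illustration of the idea and not how seqspec renders sequences; the function name `abbreviate` and its parameters `keep` and `marker` are hypothetical.

```python
def abbreviate(sequence, keep=3, marker="..."):
    """Shorten a long sequence for display (hypothetical helper, not part of seqspec).

    Keeps the first and last `keep` characters and puts `marker` in between.
    Sequences that would not get shorter are returned unchanged.
    """
    if len(sequence) <= 2 * keep + len(marker):
        return sequence
    return sequence[:keep] + marker + sequence[-keep:]


# A region drawn at its maximum length as a long run of X placeholders
long_region = "X" * 100
print(abbreviate(long_region))

# Short sequences are left alone
print(abbreviate("ACGT"))
```

With something like this applied only at display time, the spec itself would keep its min and max lengths, and only the drawn sequence would be shortened.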
Hi Sina and Pachter lab,
Thank you for developing this wonderful tool, it's going to be super helpful for our lab.
I've built a seqspec for the Flex kit, as I couldn't find it in the assays directory of this repo. I plan to submit it for review following the instructions provided here as soon as I solve this issue.
After running the following command:
I get this html:
As you can see, seqspec is not properly linking the library structure to the fastq files. I've created it based on other examples I found here. Here is the spec file:
spec.txt
For reference, this is what read 1 and read 2 look like for my example dataset:
R1:
R2:
Note that the R2 should be 88bp: 25 RHS + 25 RHS + 16 constant sequence + 10 probe barcode + 12 pCS1. However, the actual read in the fastq is 90bp long, which may be problematic.
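The length check above can be written out as a short sketch. The region labels and lengths are copied from the sentence above and the observed length of 90bp is the one reported for the fastq; the helper names `expected_read_length` and `length_discrepancy` are made up for illustration and are not seqspec functions.

```python
# Read 2 regions and their lengths in bp, as listed in the post
R2_REGIONS = [
    ("RHS", 25),
    ("RHS", 25),
    ("constant sequence", 16),
    ("probe barcode", 10),
    ("pCS1", 12),
]


def expected_read_length(regions):
    """Sum the lengths of all regions covered by the read (hypothetical helper)."""
    return sum(length for _, length in regions)


def length_discrepancy(regions, observed_length):
    """Observed read length minus the length expected from the regions."""
    return observed_length - expected_read_length(regions)


expected = expected_read_length(R2_REGIONS)
extra = length_discrepancy(R2_REGIONS, observed_length=90)
print(f"expected R2 length: {expected} bp, extra bases in fastq: {extra}")
```

A positive discrepancy means the read runs past the regions listed for it, which is the situation described here.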
I'm running seqspec 0.3.1 and Python 3.10.16.
Many thanks in advance for your kind help!
Ramon