Unable to properly link sequence_spec to library_spec #62

massonix · 2025-02-16T21:20:26Z

Hi Sina and Pachter lab,

Thank you for developing this wonderful tool, it's going to be super helpful for our lab.

I've built a seqspec for the Flex kit, as I couldn't find it in the assays directory of this repo. I plan to submit it for review following the instructions provided here as soon as I solve this issue.

After running the following command:

seqspec print -f seqspec-html spec.yaml > spec.html

I get this html:

As you can see, seqspec is not properly linking the library structure to the fastq files. I've created it based on other examples I found here. Here is the spec file:

spec.txt

For reference, this is what read 1 and read 2 look like for my example dataset:

R1:

R2:

Note that the R2 should be 88bp: 25 RHS + 25 RHS + 16 constant sequence + 10 probe barcode + 12 pCS1. However, the actual read in the fastq is 90bp long, which may be problematic.

I'm running seqspec 0.3.1 and python 3.10.16

Many thanks in advance for your kind help!

Ramon

sbooeshaghi · 2025-02-20T20:11:32Z

Hi Ramon, thank you for submitting this issue. It’s great to see you putting together a seqspec for data—I’m happy to help debug.

I’ll take a look at the generated HTML page. Quick question: what does the sequencer report for the final two bases? If the sequenceable part of the library is length n and the sequencer produces reads of length n+2, are random bases being added?

sbooeshaghi · 2025-02-21T01:06:29Z

Ah sorry that was kind of a silly question. The answer is obviously that it would incorporate bases that correspond to the pCS1 and go into the UMI. I've mocked up the html output for your seqspec below. Would appreciate your feedback as to whether this correctly visualizes the link between your reads and library structure.

massonix · 2025-02-21T03:06:07Z

Hi Sina, thanks so much for looking into it! Exactly, the two extra bases are from the UMI.

For reference, this is the library structure

(obtained from page 20 in this protocol)

Read 1 contains the cell barcode (16bp) + UMI (12bp), so should look like this (from the 10x-RNA-v3 seqspec)

Read 2 contains probe insert (50bp) + constant sequence + probe BC + pCS1, so they should also be under "rna-R2.fastq.gz". rna-I2.fastq.gz contains the rna-index5, and rna-I1.fastq.gz the rna-index7.

I understood from looking at the examples that seqspec figures out these connections by combining primer_id, strand, min_len and max_len, but I likely specified something wrong.
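To make the read length discussion concrete, here is a minimal Python sketch of how a 90bp Read 2 maps onto the regions listed above. It is not seqspec's implementation: the `regions_covered` helper is hypothetical, the region lengths are the ones quoted in this thread, and placing the UMI directly after pCS1 is an assumption based on the comment that the extra bases run into the UMI.

```python
# Regions in the order Read 2 traverses them, lengths in bp as quoted in this thread.
# Assumption: the UMI is the next region after pCS1 in the Read 2 direction.
REGIONS = [
    ("probe insert", 50),
    ("constant sequence", 16),
    ("probe barcode", 10),
    ("pCS1", 12),
    ("UMI", 12),
]


def regions_covered(read_len, regions=REGIONS):
    """Return (name, bases_covered) for each region a read of read_len touches."""
    covered = []
    remaining = read_len
    for name, length in regions:
        if remaining <= 0:
            break
        used = min(length, remaining)  # a read may stop partway through a region
        covered.append((name, used))
        remaining -= used
    return covered


for name, used in regions_covered(90):
    print(f"{name}: {used} bp")
```

With a read length of 88 the walk ends exactly at the end of pCS1; with 90 the last entry is 2 bp of the UMI, which matches the two extra bases seen in the fastq.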

Hope this helps!

massonix · 2025-02-21T16:18:02Z

This indeed perfectly matches the link between reads and the library structure 👍

sbooeshaghi · 2025-02-21T22:46:16Z

Hi @massonix,

I’ve updated seqspec print to display the reads on the sequence. You can test it by installing seqspec from the devel branch, formatting your spec file, and then running seqspec print:

pip install git+https://github.com/pachterlab/seqspec.git@devel
seqspec format -o spec.yaml spec.txt
seqspec print -f seqspec-html -o spec.html spec.yaml

Let me know if you run into any issues!

massonix · 2025-02-26T16:52:18Z

Hi Sina,

Apologies for the delay, I finally had time to look into this. I ran the code that you provided and obtained this html:

I love the arrows added in the "Final Library" section, they make it easier to understand the connection between library structure and sequencing reads. However, the "Sequence structure" and "Library structure" sections still appear disconnected. I think the true genius of these htmls is the hierarchical dropdown arrows, where users can click on R1.fastq and immediately find the UMI and cell barcode contained there, which is the case for all the htmls in IGVF. For instance:

To rule out that this is an error specific to my seqspec, I ran the same command on 10x_rna_v3.spec.yaml and obtained this:

which is different from the html in IGVF linked above. This may be overkill to implement, though, in the edge cases where one of the reads extends into the other, so I totally understand if this feature is not prioritized. Thanks in advance and happy to discuss and test this further if needed :)

massonix · 2025-03-05T23:34:55Z

Hi Sina, I'm building another spec for a new version of NTT-seq. I set the sequence_type of the gDNA region to random, with a min_len of 100 and a max_len of 500:

  - !Region
    parent_id: histone_mark
    region_id: histone_mark-gDNA
    region_type: gdna
    name: transposed gDNA next to targeted histone mark
    sequence_type: random
    sequence: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    min_len: 100
    max_len: 500
    onlist: null
    regions: null

After converting to html, I get this:

See how I need to scroll all the way to the right to get to the arrows with the actual reads. I guess that seqspec is picking max_len to draw the reads, but it'd be better if it was something like:

Another option could be to encode in the sequence that the number of nucleotides can vary (something like XXX...XXX), to prevent long sequences, similar to
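As an illustration of the second option, here is a small Python sketch that shortens a long variable-length sequence for display. The `abbreviate` helper and its `keep` and `marker` parameters are hypothetical and are not part of seqspec; this only shows the idea of collapsing the middle of the sequence.

```python
def abbreviate(sequence, keep=3, marker="..."):
    """Collapse the middle of a long sequence, keeping `keep` characters per end."""
    # Leave short sequences untouched: abbreviating them would not save space.
    if len(sequence) <= 2 * keep + len(marker):
        return sequence
    return sequence[:keep] + marker + sequence[-keep:]


print(abbreviate("X" * 100))  # XXX...XXX
```

Only the drawn string changes here; min_len and max_len in the spec would still carry the real length range.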

In any case, I'm really happy with how seqspec works, it's truly the "lingua franca" of genomics assays ^^, my labmates are already loving it! Let me know if I can help with anything on my end.

Ramon
