Dorado 0.5.1 basecaller segfaults. #1206

Patrick-McKeever · 2025-01-01T03:08:21Z

Issue Report

Dorado occasionally segfaults.

Please describe the issue:

The following command occasionally segfaults (roughly 1 run in 10): dorado basecaller --emit-fastq /data/models/dna_r9.4.1_e8_fast@v3.4 /data/bff-test/data/pod5 -c 1000 -l /data/bff-test/bam/firstFilter -x cpu

When run inside gdb, the backtrace is as follows (I am running gdb on the compiled binary, without debug symbols):

#0  0x00007fb768ed3afb in ?? () from /usr/local/lib/libiomp5.so
No symbol table info available.
#1  0x00007fb768f47843 in ?? () from /usr/local/lib/libiomp5.so
No symbol table info available.
#2  0x00007fb7689b0ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140425037705872, -6014001725922954685, 140422034653184, 0, 140425710733264, 140425037706224, 6045206947668865603, 6045845542746239555}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#3  0x00007fb768a41a04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
No locals.

Steps to reproduce the issue:

The command is dorado basecaller --emit-fastq /data/models/dna_r9.4.1_e8_fast\@v3.4 /data/bff-test/data/pod5 -c 1000 -l /data/bff-test/bam/firstFilter -x cpu. I cannot share the relevant pod5 file, since it is proprietary. This command usually runs to completion but will segfault infrequently.
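Since the crash is intermittent, a small loop can help put a number on the failure rate before and after any change (for example an upgrade, or dropping -c). The following is only a sketch: count_segfaults is a hypothetical helper, not part of dorado, and it assumes the shell reports exit status 139 (128 + signal 11) when a process dies from SIGSEGV, which is the usual convention on Linux.

```shell
#!/bin/sh
# Hypothetical helper (not part of dorado): run a command repeatedly and
# print how many of the runs were killed by SIGSEGV.
# Usage: count_segfaults RUNS COMMAND [ARGS...]
count_segfaults() {
    cs_runs=$1
    shift
    cs_crashes=0
    cs_i=0
    while [ "$cs_i" -lt "$cs_runs" ]; do
        cs_status=0
        # Discard the command's output; only the exit status matters here.
        "$@" > /dev/null 2>&1 || cs_status=$?
        # Assumption: the shell reports 128 + 11 = 139 for death by SIGSEGV.
        if [ "$cs_status" -eq 139 ]; then
            cs_crashes=$((cs_crashes + 1))
        fi
        cs_i=$((cs_i + 1))
    done
    echo "$cs_crashes"
}
```

For this report it could be called as count_segfaults 50 dorado basecaller --emit-fastq /data/models/dna_r9.4.1_e8_fast\@v3.4 /data/bff-test/data/pod5 -c 1000 -l /data/bff-test/bam/firstFilter -x cpu, which prints the number of crashed runs out of 50. The basecaller output is thrown away, so this is only useful for measuring how often the crash occurs.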

Run environment:

Dorado version: 0.5.1+a7fb3e3d
Dorado command: dorado basecaller --emit-fastq /data/models/dna_r9.4.1_e8_fast\@v3.4 /data/bff-test/data/pod5 -c 1000 -l /data/bff-test/bam/firstFilter -x cpu
Operating system: Ubuntu 22.04.3 LTS docker image
Hardware (CPUs, Memory, GPUs): 128 GB RAM, 32 CPUs, 0 GPUs (command is run inside docker)
Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
Source data location (on device or networked drive - NFS, etc.): Local ext4 drive
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): 157 MB
Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue): I cannot provide this

Logs

root@c0b8097082df:/# dorado basecaller -v --emit-fastq /data/models/dna_r9.4.1_e8_fast@v3.4 /data/bff-test/data/pod5 -c 1000 -l /data/bff-test/bam/firstFilter  -x cpu > /data/filtered.fastq
[2025-01-01 03:06:21.060] [info]  - Note: FASTQ output is not recommended as not all data can be preserved.
[2025-01-01 03:06:21.060] [info] > Creating basecall pipeline
[2025-01-01 03:06:21.061] [debug] - CPU calling: set batch size to 128, num_cpu_runners to 30
Segmentation fault (core dumped)

HalfPhoton · 2025-01-02T10:03:21Z

Hi @Patrick-McKeever,
I see you're using dorado-0.5.1 which is now a year old.
Is it possible to upgrade to the latest release?

I also see that you're setting the chunk size to 1000 (-c 1000). Is there a particular reason for this? We normally do not recommend that users set this parameter.

Best regards,
Rich

HalfPhoton changed the title ~~Dorado basecaller segfaults.~~ Dorado 0.5.1 basecaller segfaults. Jan 2, 2025
