Dorado 0.5.1 basecaller segfaults. #1206

Open
Patrick-McKeever opened this issue Jan 1, 2025 · 1 comment

Comments

@Patrick-McKeever

Issue Report

Dorado occasionally segfaults.

Please describe the issue:

The following command occasionally gives a segmentation fault (roughly 1 time in 10): dorado basecaller --emit-fastq /data/models/dna_r9.4.1_e8_fast@v3.4 /data/bff-test/data/pod5 -c 1000 -l /data/bff-test/bam/firstFilter -x cpu.

When run inside gdb, the backtrace is as follows (I am running gdb on the compiled binary, without debug symbols):

#0  0x00007fb768ed3afb in ?? () from /usr/local/lib/libiomp5.so
No symbol table info available.
#1  0x00007fb768f47843 in ?? () from /usr/local/lib/libiomp5.so
No symbol table info available.
#2  0x00007fb7689b0ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140425037705872, -6014001725922954685, 140422034653184, 0, 140425710733264, 140425037706224, 6045206947668865603, 6045845542746239555}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#3  0x00007fb768a41a04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
No locals.

Steps to reproduce the issue:

The command is dorado basecaller --emit-fastq /data/models/dna_r9.4.1_e8_fast\@v3.4 /data/bff-test/data/pod5 -c 1000 -l /data/bff-test/bam/firstFilter -x cpu. I cannot share the relevant pod5 file, since it is proprietary. This command usually runs to completion but will segfault infrequently.
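
Because the crash is intermittent, a repeat-run loop can help put a number on the failure rate. The following is a minimal sketch, not part of Dorado: `count_segfaults` is a hypothetical helper name, and it assumes the shell reports a SIGSEGV death as exit status 139 (128 + 11), which is the usual convention for POSIX shells.

```shell
# Hypothetical helper: run a command N times and print how many runs
# ended with exit status 139 (128 + SIGSEGV), i.e. a segmentation fault.
# Usage: count_segfaults <runs> <command> [args...]
count_segfaults() {
    runs=$1
    shift
    crashes=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only the exit status matters here.
        "$@" > /dev/null 2>&1
        status=$?
        if [ "$status" -eq 139 ]; then
            crashes=$((crashes + 1))
        fi
        i=$((i + 1))
    done
    echo "$crashes"
}
```

Calling it as `count_segfaults 20` followed by the full dorado command above would print the number of crashing runs out of 20. The basecalled output is thrown away in this sketch, so it is only suitable for measuring how often the crash occurs.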

Run environment:

  • Dorado version: 0.5.1+a7fb3e3d
  • Dorado command: dorado basecaller --emit-fastq /data/models/dna_r9.4.1_e8_fast\@v3.4 /data/bff-test/data/pod5 -c 1000 -l /data/bff-test/bam/firstFilter -x cpu
  • Operating system: Ubuntu 22.04.3 LTS docker image
  • Hardware (CPUs, Memory, GPUs): 128 GB RAM, 32 CPUs, 0 GPUs (command is run inside docker)
  • Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
  • Source data location (on device or networked drive - NFS, etc.): Local ext4 drive
  • Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): 157 MB
  • Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue): I cannot provide this

Logs

root@c0b8097082df:/# dorado basecaller -v --emit-fastq /data/models/dna_r9.4.1_e8_fast@v3.4 /data/bff-test/data/pod5 -c 1000 -l /data/bff-test/bam/firstFilter  -x cpu > /data/filtered.fastq
[2025-01-01 03:06:21.060] [info]  - Note: FASTQ output is not recommended as not all data can be preserved.
[2025-01-01 03:06:21.060] [info] > Creating basecall pipeline
[2025-01-01 03:06:21.061] [debug] - CPU calling: set batch size to 128, num_cpu_runners to 30
Segmentation fault (core dumped)
@HalfPhoton
Collaborator

Hi @Patrick-McKeever,
I see you're using dorado-0.5.1 which is now a year old.
Is it possible to upgrade to the latest release?

I also see that you're setting the chunk size to 1000 (-c 1000). Is there a particular reason for this? We normally do not recommend that users set this parameter.

Best regards,
Rich

@HalfPhoton changed the title from "Dorado basecaller segfaults." to "Dorado 0.5.1 basecaller segfaults." on Jan 2, 2025