GRCh38 gtf file being used even when custom gff file is supplied #90

Open
davidecarlson opened this issue Apr 11, 2023 · 0 comments
Labels
bug Something isn't working

Description of the bug

I ran the pipeline with my own fasta and gff files, but instead of converting the gff to gtf, it downloaded and used the human GRCh38 gtf file. This causes errors later on because the genome assembly and the gtf do not match.

It also appears that the GRCh38 STAR index is used automatically, rather than a new index being built from the provided fasta file.

After converting my gff to gtf myself and building a STAR index outside of the nf-core pipeline, this issue does not occur. However, the docs indicate that a gff file can be supplied and will be converted to gtf automatically, which did not seem to happen here.
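For anyone trying to confirm this kind of mismatch on their own data, here is a minimal Python sketch that lists sequence IDs used in an annotation file (gff or gtf) but absent from the fasta headers. It is not part of nf-core/rnavar; the function names and the command-line usage are assumptions made for illustration only.

```python
import sys
from typing import Iterable, List, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names from FASTA headers (text after '>' up to the first whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def annotation_seqids(lines: Iterable[str]) -> Set[str]:
    """Collect column-1 sequence IDs from GFF/GTF records, skipping comments and blank lines."""
    seqids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # a GFF3 file may embed sequences after this directive
        if not line.strip() or line.startswith("#"):
            continue
        seqids.add(line.split("\t", 1)[0])
    return seqids


def missing_from_fasta(annotation_lines: Iterable[str], fasta_lines: Iterable[str]) -> List[str]:
    """Return annotation sequence IDs that have no matching FASTA header, sorted."""
    return sorted(annotation_seqids(annotation_lines) - fasta_names(fasta_lines))


if __name__ == "__main__":
    # Hypothetical usage: python check_seqids.py annotation.gtf genome.fasta
    with open(sys.argv[1]) as ann, open(sys.argv[2]) as fa:
        for seqid in missing_from_fasta(ann, fa):
            print(seqid)
```

If the gtf staged in a task's work directory reports IDs such as chr1 while the supplied fasta uses scaffold names, that would be consistent with the GRCh38 annotation being used in place of the custom one.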

Command used and terminal output

nextflow run nf-core/rnavar --input /gpfs/projects/GenomicsCore/fastqs/Sipperly/sipperly_samplesheet.csv --outdir sipperly_out --fasta ref/Dniv87_Chicago_SSPACE_LINKS14_1kb_ChromonomerRun4_integrated_20July2018.fasta --gff ref/Dniv87_Chicago_SSPACE_LINKS14_1kb_ChromonomerRun4_integrated_20July2018_unwrap.all.Run5.maker.genes.gff --skip_baserecalibration -profile seawulf

...

Apr-10 12:39:44.948 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_RNAVAR:RNAVAR:GATK4_BEDTOINTERVALLIST (genome.bed)'

Caused by:
  Process `NFCORE_RNAVAR:RNAVAR:GATK4_BEDTOINTERVALLIST (genome.bed)` terminated with an error exit status (3)

Command executed:

  gatk --java-options "-Xmx36g" BedToIntervalList \
      --INPUT exome.bed \
      --OUTPUT genome.bed.interval_list \
      --SEQUENCE_DICTIONARY Dniv87_Chicago_SSPACE_LINKS14_1kb_ChromonomerRun4_integrated_20July2018.dict \
      --TMP_DIR . \
  
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNAVAR:RNAVAR:GATK4_BEDTOINTERVALLIST":
      gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
  END_VERSIONS

Command exit status:
  3

Command output:
  (empty)

Command error:
  Using GATK jar /usr/local/share/gatk4-4.2.6.1-0/gatk-package-4.2.6.1-local.jar
  Running:
      java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx36g -jar /usr/local/share/gatk4-4.2.6.1-0/gatk-package-4.2.6.1-local.jar BedToIntervalList --INPUT exome.bed --OUTPUT genome.bed.interval_list --SEQUENCE_DICTIONARY Dniv87_Chicago_SSPACE_LINKS14_1kb_ChromonomerRun4_integrated_20July2018.dict --TMP_DIR .
  16:39:41.962 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/gatk4-4.2.6.1-0/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
  [Mon Apr 10 16:39:41 GMT 2023] BedToIntervalList --INPUT exome.bed --SEQUENCE_DICTIONARY Dniv87_Chicago_SSPACE_LINKS14_1kb_ChromonomerRun4_integrated_20July2018.dict --OUTPUT genome.bed.interval_list --TMP_DIR . --SORT true --UNIQUE false --DROP_MISSING_CONTIGS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
  [Mon Apr 10 16:39:42 GMT 2023] Executing as decarlson@dn022 on Linux 4.18.0-425.3.1.el8.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.9.1-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.2.6.1
  [Mon Apr 10 16:39:42 GMT 2023] picard.util.BedToIntervalList done. Elapsed time: 0.00 minutes.
  Runtime.totalMemory()=2147483648
  To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
  picard.PicardException: Sequence 'chr1' was not found in the sequence dictionary
  	at picard.util.BedToIntervalList.doWork(BedToIntervalList.java:156)
  	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
  	at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
  	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
  	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
  	at org.broadinstitute.hellbender.Main.main(Main.java:289)

Work dir:
  /gpfs/projects/GenomicsCore/nf-core/rnavar/sipperly/work/4e/3ddd2803d8b75b6479ad2394facc8c

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
Apr-10 12:39:44.967 [Task monitor] INFO  nextflow.Session - Execution cancelled -- Finishing pending tasks before exit
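
The Picard error above says that 'chr1' appears in the BED file but not in the sequence dictionary. A rough Python sketch for checking this directly from the task work directory is below. It assumes the .dict file is a SAM-style header with @SQ lines carrying SN: tags, and the helper names are hypothetical rather than anything provided by GATK or the pipeline.

```python
from typing import Iterable, List, Set


def dict_contigs(lines: Iterable[str]) -> Set[str]:
    """Sequence names from the @SQ lines of a sequence dictionary (SAM header format)."""
    names = set()
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def bed_contigs(lines: Iterable[str]) -> Set[str]:
    """Contig names from column 1 of a BED file, skipping header and comment lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def unknown_bed_contigs(bed_lines: Iterable[str], dict_lines: Iterable[str]) -> List[str]:
    """Return BED contigs that are not declared in the sequence dictionary, sorted."""
    return sorted(bed_contigs(bed_lines) - dict_contigs(dict_lines))
```

Calling `unknown_bed_contigs(open("exome.bed"), open("<reference>.dict"))` inside the work directory would list every contig that BedToIntervalList is expected to reject, not only the first one it reports.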

Relevant files

nextflow.log

System information

nextflow version 23.04.0.585
Hardware: HPC
Executor: Slurm
Container: Singularity
OS: Rocky Linux
version of nf-core/rnavar: v1.0.0

@davidecarlson davidecarlson added the bug Something isn't working label Apr 11, 2023