
Please consider adding multi-threading support #60

Open · dbrami opened this issue Jul 5, 2022 · 10 comments

Comments

dbrami commented Jul 5, 2022

Being able to scatter/gather the work over multiple CPU cores would really help speed up your script.

susannasiebert (Collaborator) commented:

Thank you for your interest in VAtools. All of the tools in this toolkit are mostly just I/O, so they shouldn't take very long. Is there a specific tool that has been running slowly for you?

dbrami (Author) commented Jul 13, 2022

Hi,
Indeed, "vcf-info-annotator" has been running much more slowly than hoped. My decompressed VCF file is about 75 GB, and the program was writing about 1 MB every 5 minutes on a beefy AWS machine.
I know that's not a lot of information. Let me know what you would like to see in order to figure out how to use your script more efficiently, or whether some code fixes can speed things up.
Thanks

susannasiebert (Collaborator) commented:

Ah, OK. I've definitely never run it on a file that large. I'll have a look to see how things can be improved.

lukaas33 commented:

I am encountering the same issue. An 11 GB VCF has been running for over 12 hours now.

susannasiebert (Collaborator) commented Aug 29, 2024

How big is your TSV? That file is read into memory, so you'll want to make sure you have at least that much memory available. Your process is probably stuck swapping memory and not actually doing any/much work.
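If you want to confirm that swapping is the bottleneck, standard Linux tools are enough for a quick check (nothing here is specific to VAtools):

```bash
# overall RAM and swap usage; very little "available" memory plus heavy swap use
# suggests the process is thrashing rather than doing useful work
free -h

# sample memory activity every 5 seconds; sustained non-zero "si"/"so"
# (swap-in/swap-out) columns indicate active swapping
vmstat 5
```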

lukaas33 commented Aug 29, 2024

The TSV is only 3 MB, the GTF around 300 MB.

Here is my top output:
[screenshot of top output]

It seems the CPU is fully used, as is the memory (64 GB total).

The exact command that I am using is:

vcf-expression-annotator -s sample -o /shared_dir/temp.vcf /shared_dir/neoantigen.vep.vcf /shared_dir/neoantigen.transcript.abundance.gtf stringtie transcript && \
vcf-expression-annotator -s sample -o /shared_dir/neoantigen.stringtie.vcf /shared_dir/temp.vcf /shared_dir/neoantigen.gene.abundance.tsv stringtie gene

I am using the Docker container.

susannasiebert (Collaborator) commented:

That's strange. I'm not sure why you are seeing multiple processes either. Do you see the same behavior when you run the two steps as separate commands?

The GTF parsing library we use relies on pandas under the hood, which, unfortunately, can use a lot more memory than expected because of the way it stores some data. Would you mind sending me your GTF file so I can play around with it?

lukaas33 commented:

When using the && separator, these commands are executed separately.
I also tried running only the first command, but that also took too long.

I will email you these files.

susannasiebert (Collaborator) commented:

OK, this is definitely not an issue with the GTF file. It is read in just fine, but you have over 4.5 million VCF entries, so processing simply takes a while. I'm not sure there is a good programmatic way to fix this, to be honest, while still preserving the ordering of the VCF. You could try manually splitting the VCF into smaller subsets before running them through the annotator.
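As a rough sketch of that manual splitting with plain Unix tools (chunk size and file names are placeholders, and the annotator arguments just mirror the command earlier in this thread):

```bash
# separate the VCF header from the records
grep '^#' input.vcf > header.txt
grep -v '^#' input.vcf | split -l 500000 - records_

# prepend the header to each chunk and annotate the pieces
for chunk in records_??; do
  cat header.txt "$chunk" > "$chunk.vcf"
  vcf-expression-annotator -s sample -o "$chunk.annotated.vcf" \
    "$chunk.vcf" neoantigen.transcript.abundance.gtf stringtie transcript
done

# stitch the annotated pieces back together in the original order
# (split names its outputs records_aa, records_ab, ..., so the glob order matches)
{ grep '^#' records_aa.annotated.vcf; grep -hv '^#' records_??.annotated.vcf; } > annotated.vcf
```

Since the per-chunk runs are independent, they could also be launched in parallel (e.g. with xargs -P or GNU parallel), which would give you roughly the scatter/gather behaviour requested at the top of this thread, as long as the pieces are concatenated back in their original order.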

lukaas33 commented:

Ah, thank you.
Perhaps another approach would be to filter the VCF, or to limit the contigs I align to upstream.
So this isn't really an issue with this tool then.
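For reference, limiting the VCF to specific contigs can also be done after the fact, e.g. with bcftools or a simple awk filter (a sketch only; the contig names are placeholders, and the bcftools route assumes the file is bgzipped and tabix-indexed):

```bash
# compress and index once, then extract only the contigs of interest
bgzip neoantigen.vep.vcf
tabix -p vcf neoantigen.vep.vcf.gz
bcftools view -r chr1,chr2 -o neoantigen.subset.vcf neoantigen.vep.vcf.gz

# alternative on the uncompressed file, no index needed:
# keep the header plus records on chr1
awk '/^#/ || $1 == "chr1"' neoantigen.vep.vcf > neoantigen.chr1.vcf
```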
