Use hpc_hmmsearch #8

apcamargo · 2021-07-07T03:30:15Z

This might be outside the scope of the project, but I thought it could be a nice addition. There's a modified version of hmmsearch (https://github.com/Larofeticus/hpc_hmmsearch) that runs much faster in parallel than the regular command. Maybe pyhmmer.hmmsearch could use it instead, or it could be made available through a separate function.

More technical details: https://docs.nersc.gov/performance/case-studies/hmmer3/


althonos · 2021-07-07T08:37:28Z

Hi @apcamargo , thanks for the heads-up!

The current version of pyhmmer.hmmsearch is already more or less its own command: it does not actually call hmmsearch under the hood, but reimplements the whole hmmsearch command using the objects exposed in pyhmmer.plan7. I could have a look at the version you linked, and if it has a more efficient way to run, I could try to replicate that.

apcamargo · 2021-07-07T17:46:42Z

Thanks! I ran some quick benchmarks here and found hpc_hmmsearch to be faster than pyhmmer.hmmsearch when using several threads. It might be worth taking a look at it.

althonos · 2021-07-16T10:09:13Z

Hi @apcamargo ,

I was able to track down a big performance penalty in the Pipeline search loop, which was causing parallel code to become less and less efficient with more threads (basically, when too many threads were invoked, they would spend more time waiting for the GIL than actually processing the HMM/sequence pairs). By adding an extra restriction on the input reference, I was able to rewrite the search loop to only reacquire the GIL when the Pipeline is completely done comparing the HMM to all the reference sequences.
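
To illustrate the idea, here is a toy model of the difference between taking a lock after every comparison and taking it once per query. It is only a sketch under assumptions: `CountingLock` is a hypothetical stand-in for the GIL, `toy_score` is a made-up scoring function, and none of this is the actual pyhmmer code (which releases the GIL at the Cython level rather than using `threading.Lock`).

```python
import threading


class CountingLock:
    """Hypothetical stand-in for the GIL that counts how often it is taken."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1  # safe: only modified while the lock is held
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def toy_score(query, target):
    """Made-up comparison: number of positions where the two strings agree."""
    return sum(a == b for a, b in zip(query, target))


def run_workers(worker, queries):
    """Run one thread per query and wait for all of them to finish."""
    threads = [threading.Thread(target=worker, args=(q,)) for q in queries]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def search_per_pair(queries, targets, lock):
    """Each worker takes the lock again after every single comparison."""
    results = {q: [] for q in queries}

    def worker(query):
        for target in targets:
            score = toy_score(query, target)
            with lock:
                results[query].append(score)

    run_workers(worker, queries)
    return results


def search_per_query(queries, targets, lock):
    """Each worker takes the lock once, after scanning every target."""
    results = {}

    def worker(query):
        scores = [toy_score(query, target) for target in targets]
        with lock:
            results[query] = scores

    run_workers(worker, queries)
    return results


queries = ["ACDE", "ACDF", "WXYZ"]
targets = ["ACDE", "AAAA", "WXYA", "ACDF"]
pair_lock, query_lock = CountingLock(), CountingLock()
per_pair = search_per_pair(queries, targets, pair_lock)
per_query = search_per_query(queries, targets, query_lock)
print(pair_lock.acquisitions, query_lock.acquisitions)  # prints "12 3"
```

Both strategies return the same scores, but the number of lock acquisitions drops from one per query/target pair to one per query, which is why contention matters less as the thread count grows.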

On a consumer PC, with a small number of threads, this didn't affect performance that much, but on larger PCs (e.g. our lab's workstation) it is making quite a difference with a higher number of threads.

Next step will be to try to rewrite the hmmsearch implementation using OpenMP instead of the Python threading, but this is going to be a bit more complicated. In the meantime, it would be great if you could test the new implementation to see if it makes any difference (installable distribution attached):
pyhmmer-0.4.4-openmp.tar.gz

apcamargo · 2021-07-16T15:47:19Z

Thanks! I'll give it a try later today.

Do you expect the OpenMP implementation to perform similarly to hpc_hmmsearch?

apcamargo · 2021-07-17T21:26:59Z

I did get some improvements with this branch, although I didn't have the chance to test it on a machine with lots of threads. Very excited about this progress! Thanks!

apcamargo · 2023-09-11T05:37:39Z

A bit more than two years later, I did some benchmarks. In a very common scenario (hmmsearch against Pfam-A), hpc_hmmsearch was about 5% faster than pyhmmer (see script below). This speedup was very consistent across several sizes of input FASTA files. I used version 0.10.2 for this benchmark.

#!/usr/bin/env python

import pyhmmer

with pyhmmer.easel.SequenceFile("test_proteins.faa", digital=True) as seq_file:
    seqs = seq_file.read_block()

with pyhmmer.plan7.HMMFile("Pfam-A.h3m") as hmms, open("hmmer_output.txt", "w") as fout:
    for hits in pyhmmer.hmmer.hmmsearch(hmms, seqs, bit_cutoffs="gathering"):
        # Write one tab-separated row per hit: target, query accession, query name, E-value, score.
        for hit in hits:
            fout.write(f"{hit.name.decode()}\t{hits.query_accession.decode()}\t{hits.query_name.decode()}\t{hit.evalue:.2e}\t{hit.score:.2f}\n")
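
For anyone who wants to repeat this kind of comparison, a minimal timing helper could look like the sketch below. It is not the harness used for the numbers above: `best_wall_time` and `percent_faster` are hypothetical names, the workload is a placeholder, and the percentage formula is just one possible definition of "X% faster".

```python
import time


def best_wall_time(func, repeats=3):
    """Return the fastest of `repeats` wall-clock timings of func(), in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def percent_faster(baseline_seconds, candidate_seconds):
    """One possible definition: extra time the baseline needs, relative to the candidate."""
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0


def placeholder_workload():
    """Placeholder; swap in the search being benchmarked."""
    sum(i * i for i in range(10_000))


elapsed = best_wall_time(placeholder_workload, repeats=5)
print(f"best of 5 runs: {elapsed:.6f} s")
```

Taking the best of several runs reduces noise from disk caching and other processes, which matters when the difference being measured is only a few percent.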

althonos added the enhancement label (New feature or request) on Jul 7, 2021
