[Question] What criteria are used to determine significant HMMER hits? #32

jolespin · 2023-07-19T19:59:01Z

I'm looking at the code here but not very familiar w/ Ruby:

module KofamScan
  class Result
    extend Autoload

    autoload :WithEvalueThreshold
    autoload :WithThresholdScale
    autoload :WithThresholdScaleAndEvalueThreshold

    def self.create(query_list, threshold_scale: nil, e_value_threshold: nil)
      if threshold_scale && e_value_threshold
        WithThresholdScaleAndEvalueThreshold.new(query_list, threshold_scale, e_value_threshold)
      elsif e_value_threshold
        WithEvalueThreshold.new(query_list, e_value_threshold)
      elsif threshold_scale
        WithThresholdScale.new(query_list, threshold_scale)
      else
        Result.new(query_list)
      end
    end

Here's what ko_list looks like:

knum	threshold	score_type	profile_type	F-measure	nseq	nseq_used	alen	mlen	eff_nseq	re/pos	definition
K00001	357.90	domain	all	0.256163	2367	1915	1975	464	14.05	0.590	alcohol dehydrogenase [EC:1.1.1.1]
K00002	443.17	full	all	0.430391	2376	2273	5878	503	7.51	0.590	alcohol dehydrogenase (NADP+) [EC:1.1.1.2]
K00003	286.37	domain	all	0.945500	6268	5369	3257	782	7.03	0.590	homoserine dehydrogenase [EC:1.1.1.3]
K00004	369.60	domain	trim	0.809403	1572	1320	1364	436	5.41	0.590	(R,R)-butanediol dehydrogenase / meso-butanediol dehydrogenase / diacetyl reductase [EC:1.1.1.4 1.1.1.- 1.1.1.303]
K00005	320.93	full	all	0.984022	1449	1051	682	366	1.99	0.590	glycerol dehydrogenase [EC:1.1.1.6]
K00006	316.47	full	all	0.899202	2118	1971	3274	549	4.64	0.590	glycerol-3-phosphate dehydrogenase (NAD+) [EC:1.1.1.8]
K00007	520.80	full	all	0.998236	358	283	834	469	1.01	0.589	D-arabinitol 4-dehydrogenase [EC:1.1.1.11]
K00008	420.17	full	all	0.500025	3737	3281	3597	524	8.29	0.590	L-iditol 2-dehydrogenase [EC:1.1.1.14]
K00009	147.27	full	trim	0.997067	1556	1192	1443	459	3.28	0.590	mannitol-1-phosphate 5-dehydrogenase [EC:1.1.1.17]

I'm seeing a score threshold but not e-value threshold. Can you elaborate on the scheme used for determining significant hits?

The text was updated successfully, but these errors were encountered:

PfisterMaxJLU · 2023-11-18T13:07:04Z

With the threshold values from the "ko_list" our group build a little helper script for the formatting and enforcement of KO cut-off thresholds in hmmsearch output files.

We discard the rare (≈1%) secondary (lower score) K assignments so keep an eye out for that. Depending on your use case, this might not be desirable.

Disregarding the dual assignments, the resulting output is functionally identical (>99.999% for our data/settings) to Kofamscan.

I uploaded a code example, which you can modify for your case. I hope this helps.

https://github.com/PfisterMaxJLU/kofamscan_cutoff

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Question] What criteria are used to determine significant HMMER hits? #32

[Question] What criteria are used to determine significant HMMER hits? #32

jolespin commented Jul 19, 2023

PfisterMaxJLU commented Nov 18, 2023

[Question] What criteria are used to determine significant HMMER hits? #32

[Question] What criteria are used to determine significant HMMER hits? #32

Comments

jolespin commented Jul 19, 2023

PfisterMaxJLU commented Nov 18, 2023