hmmsearch was not run successfully #19

Open
hahafengxiang opened this issue Dec 2, 2020 · 15 comments

@hahafengxiang

Hi There,

I am running a protein annotation with kofamscan. The error message "hmmsearch was not run successfully" keeps showing up. After a successful run with another .fasta file, I realized that something was wrong with my original file. I eventually found that ONE sequence caused the problem, and the run succeeded once I removed it, but I still don't know why.

Could anyone help explain it?

Thanks~

The sequence attached below:

>VIRSorter_k141_1676536_flag=0_multi=16_9953_len=13692-cat_2_4 # 6084 # 13691 # -1 # ID=10186_4;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.465
NGEAGTSGTSGISGINGTNGLNGTGGSSGTSGLSGVDGTSGTAGTSGTSGYSGTDGTSGT
SGISGADGMPGTSGTSGISGVDGTSGTSGINGTSGSSGTTGTSGSSGTSGISGVDGTSGT
SGLSGVDGTSGSSGTSGSSGTSGISGVDGTSGTAGTSGSSGTSGTSGISGIDGTSGSSGT
NGTSGSSGTSGISGVDGTSGTAGTSGTSGIDGTSGTSGISGVDGTSGTSGTSGISGVDGV
DGTNGTSGTSGISGVDGTSGTAGSSGTSGTTGTSGSSGTSGISGVDGTSGSSGTSGTSGI
DGTSGTSGISGVDGTSGTSGTSGSSGTSGTSGISGVDGTSGTNGSSGTSGSSGTAGTSGT
SGISGVDGTSGTSGTGTSGTSGTSGTVGTSGSSGSSGTSGISGANGEAGTSGTSGISGLN
GTNGLNGTGGSSGTSGISGVDGTSGTAGTSGTSGYSGTDGTSGTSGISGADGMPGTSGTS
GISGVDGTSGTSGTTGTSGTSGTTGTSGSSGTSGISGVDGTSGSSGTSGTSGISGVDGTS
GTSGSSGTSGTSGTSGTSGTSGISGVDGTSGSSGTSGSSGTSGSSGTSGISGINGTNGSS
GTSGISGVDGTSGTSGIDGTSGTSGIDGTSGTSGISGINGTSGTNGSSGSSGTSGLSGVD
GTSGTSGIDGTSGTSGIDGTSGTSGISGINGTSGTNGSSGSSGTSGISGVDGTSGTSGSS
GSSGTSGISGVDGTSGTSGISGIDGTSGTAGTSGTSGVDGTSGTSGISGINGTNGSSGTS
GVSGVDGTSGTSGLDGTHGTSGTTGTSGSSGTSGISGANGEAGTSGTSGISGINGTNGIA
GTGGSSGTSGISGVDGTSGTAGTSGTSGYSGTDGTSGTSGISGADGMPGTSGSSGTSGLS
GVDGTSGTAGTSGSSGTSGTTGTSGSSGTSGISGVDGTSGTAGTSGTSGISGVDGTSGSS
GTSGSSGTSGSSGTSGTSGISGVDGTSGSSGTSGSSGTSGSSGTSGISGINGTNGSSGTS
GISGVDGTSGTSGIDGTSGTSGINGTSGTSGISGVDGTSGTNGSSGSSGTSGLSGVDGTS
GTSGIDGTSGTSGIDGTSGTSGISGINGTSGTNGSSGSSGTSGLSGVDGTSGTAGTSGSS
GTSGISGVDGTSGTSGISGVDGTSGTAGTSGTSGVNGTSGTSGISGINGTNGSSGTSGIS
GVDGTSGTSGLDGTHGTSGSSGTSGTSGSSGTSGISGANGEAGTSGTSGISGVAGTNGIA
GTGGSSGTSGLSGVDGTSGTAGTSGTSGYSGTDGTSGTSGISGADGMPGTSGTSGTNGSS
GTSGLSGVDGTSGTSGTNGTSGSSGTNGSSGTSGTSGTSGISGVDGTSGTAGSSGTSGSS
GTSGLSGVDGTSGSSGTSGSSGTSGSSGTSGTSGISGVDGTSGTSGSSGTSGSSGTSGIS
GVDGTSGTSGSSGTSGIDGTSGTTGTSGISGISGTSGTNGTSGSSGTSGISGVDGTSGSS
GTSGDAGTSGTSGITGTSGISGISGTSGTNGSSGSSGTSGLSGVDGTSGTSGSSGTSGTT
GTSGTSGISGVDGTSGTSGSAGTSGTSGVDGTSGVSGVSGINGTNGSSGTSGISGVDGTS
GTSGTVGTSGTSGTNGTSGSSGTSGISGANGEAGTSGTSGISGINGTAGRQGTGGSSGTS
GVSGVDGTSGTAGTSGTSGISGTTGTSGTSGISGADGMPGTSGTSGINGTSGSSGTSGSS
GTSGSSGTSGISGINGTNGTSGISGVDGTSGSSGTSGTSGSSGTSGSSGTSGISGINGTN
GSSGTSGISGVDGTSGTSGSSGTSGSSGTSGSSGTSGSSGTSGISGVDGTSGSSGTSGIS
GVDGTSGTSGTSGSSGTSGSSGTSGSSGTSGTSGISGVDGTNGTSGTSGTSGSSGTSGSS
GTSGSSGSSGTSGISGVDGTSGSSGTSGSSGTSGISGVDGTSGTSGTSGSSGTSGSSGTS
GSSGTSGISGVNGTSGSSGTSGISGVDGTSGTAGTSGSSGTSGSAGTSGSSGTSGISGIN
GTSGTNGSSGSSGTSGVDGTSGTSGSNGTSGSSGTSGISGANGAPGTSGTSGLSGVDGTS
GTAGTSGSSGTSGSSGTSGISGVDGTSGTAGSSGTSGSSGTSGSSGTSGSSGTSGISGIN
GTSGSSGTSGSSGTSGTSGTSGSSGTSGTSGISGVDGTSGSSGTNGTSGTSGTKGTSGTS
GSSGTSGSSGSSGTSGISGINGTSGSSGTSGISGVDGTSGTAGSSGTSGTSGTSGIDGTN
GTSGSSGTSGISGINGTNGSSGTSGISGVDGIDGTSGSSGTNGTSGSSGTSGISGANGAP
GTSGTSGLSGVSGISGTNGTSGTSGTSGTTGTSGISGLNGTTGTSGTSGTGFSAILNATN
NRLITSDGTQTNAVAEANLTFDGEILNLAGVFKSKTGEGSSITANTLLYAADTALGNGWI
IDYVVKATTGVAMRTGTILAVTDGIDVTFTETSSPDLGASTAAVTFGLTINSTDLEIAAN
ISFGTWDVKVAVRVI*

@Caelyn-gao

I also hit this issue, but I couldn't find a problem sequence like you did. Do you know how to fix it? Thanks a lot.

@Caelyn-gao

How did you find the problem sequence?

@hahafengxiang

Actually, it's not an efficient way. I split the original file into small parts with SeqKit and ran kofamscan on them one by one. Next, I opened the file that failed and looked through it. Maybe I was just lucky: I spotted the sequence above as soon as I saw it, because it looks so unusual and different.
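
For anyone repeating this search, the split-and-rerun approach described above can be scripted. The following is a minimal sketch, not part of kofamscan or SeqKit: it uses plain awk instead of SeqKit, the function names `split_fasta` and `failing_chunks` are made up for illustration, and it assumes the FASTA headers start with `>`.

```shell
# split_fasta INPUT.faa OUTDIR N   (hypothetical helper)
# Writes OUTDIR/chunk_0001.faa, chunk_0002.faa, ... with at most N records each.
split_fasta() {
    input=$1; outdir=$2; n=$3
    mkdir -p "$outdir"
    awk -v n="$n" -v dir="$outdir" '
        /^>/ {
            # Start a new chunk file every n records.
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s/chunk_%04d.faa", dir, int(count / n) + 1)
            }
            count++
        }
        out != "" { print > out }
    ' "$input"
}

# failing_chunks DIR CMD [ARGS...]   (hypothetical helper)
# Runs "CMD ARGS... FILE" for every chunk in DIR and prints the chunks
# for which the command exits with a non-zero status.
failing_chunks() {
    dir=$1; shift
    for f in "$dir"/chunk_*.faa; do
        [ -e "$f" ] || continue
        if ! "$@" "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

A possible use is `split_fasta proteins.faa chunks 500` followed by `failing_chunks chunks ./run_kofamscan.sh`, where `run_kofamscan.sh` is a hypothetical wrapper around your usual kofamscan command that takes the FASTA path as its last argument. Re-splitting a failing chunk with a smaller N narrows the search down to single sequences.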

@Caelyn-gao

Can you show one of your normal sequences? Thanks a lot.

@hahafengxiang

Some look like this:

>VIRSorter_k141_593649_flag=1_multi=6_8551_len=21335-cat_2_5 # 1020 # 2204 # -1 # ID=1_5;partial=00;start_type=ATG;rbs_motif=TATAA;rbs_spacer=11bp;gc_cont=0.301
MLQELLDLKEQGLANDKAFTFEKAKGETRFVQKGHENESLAKKTNLLSRDFYRGGGICPK
GKSLDEIAEYLKIPHVFRSEGIGYDAHFNTHKRAIYLEDGILLDIVSEKWVLIQPIETIG
LFLDFCTENNLEIERIGTFISNKDKTLEGGNTDILQRYKIYITAKLDDSFEVSKGDRVSG
KLLFTFGYLNGLGFNASLLTLREICSNGLRIPVKIGGQVVSHIGELVKKKTQILKLLQDS
KQVWKKEKEDYLLFQNTEMTYLEAMMFLINNFSKIPLHKELAQKALVDWKEGKGIETLDS
IFQSNQWFDEKEIVREVINMYRENQFTGSEFCSNTVWGLLNSVTEYINWKGKQIKNPLAS
LIDVNGHRGKLMYKVRTKLHDDFVKIKQNVTVSI*
>VIRSorter_k141_593649_flag=1_multi=6_8551_len=21335-cat_2_6 # 2467 # 3360 # -1 # ID=1_6;partial=00;start_type=ATG;rbs_motif=TAA;rbs_spacer=4bp;gc_cont=0.346
MLIQFDFNHSFIRMEKRADGNIWVCITDMAKSSGKLVADWKRLKTTQDFLTAFESSMGFP
ITETIQGGQPEKQGTWAIQEVAIEFAGWCSLDFKMWMLRQIKKLMNEGQVSLKENDTLDS
QKVLLNAMDLMAQMSTTLENREKLLQQTIRSLSILEEERLDREYYLGQINEITKEHPLFS
SLLQFALTLKNEQYTFPSIGYTVCQILQMFPIRHCSEKRFANLCSDLYWLNKNKKPNEVG
VYKYVGDELIYPTVILFKQENYSWDEIKEKIEIDYKFRLPASNRRAFTEIMAKKKRK*
>VIRSorter_k141_593649_flag=1_multi=6_8551_len=21335-cat_2_7 # 3399 # 3734 # -1 # ID=1_7;partial=00;start_type=ATG;rbs_motif=TAA;rbs_spacer=5bp;gc_cont=0.214
MIDFKILNNKNLEITLNVPSFIFKRFLNQHKDLTFNQIWKKLIENTELVFIEPFEVGALT
DAPIIRFNHRYYWFSDYMVRDELKELEKNNKVIFELAYDENCRIKEFSTID*
>VIRSorter_k141_593649_flag=1_multi=6_8551_len=21335-cat_2_8 # 3727 # 3969 # -1 # ID=1_8;partial=00;start_type=ATG;rbs_motif=AAAAAA;rbs_spacer=6bp;gc_cont=0.177
MMNKNNLQKIRHQIFILYCNLISFIDDWDFLQLDLILQKIDKEYTVDIYLSPEDIKSQTY
KIISINNQINIDTILDNLYD*

@Caelyn-gao

Thanks a lot. I will try to find the problem sequences and see whether the rest runs normally.

@Caelyn-gao

Hi, I found the problem sequence using your method. It seems the sequence is so long that kofamscan cannot handle it, but I don't know why. The problem sequence below is Protein TolB according to COs, so it is not a sequence that cannot be annotated.

MMKRSWIAAWLCLVLLLLQIIFAWPLFASEQKIVYPRQEASGKYDLWMMNPDGSGQQRLT
DDAKNGTDSTSPSIFPDGKRILYQRGYNIAILNVDTRQITDLTTDGAYGIYAYSEPWLSP
DGMKITYMYGQPIAGSCSSCRTYDVWIMNADGSNRVQMTSNTYRDATPIYSPDGTKLLVT
HYQGAPSSDCCNATDVYTMDIATKVETKLYGSSYYDWGFAWNNSGILFTTQNGALVRINP
DGTGFTTVLDASNRVGSATYSVTGDKILYQSNVSGTNNLYVVNPDGTNSVAITTGMNVSG
TGVWGYINSSAIPKIYYVSNQSGTYDIWKMDPDGNNKVRVTTLPGSEAYPRVSPDGRKIA
FNSDATGSYDVYVMNVDGTDIRQLTSGKNTNGQLTWDPASTKIYYAAPAASIYDAAVRSV
NVDGTGDSQVFDHVGYHDVEVDVSPDGNSLVYIYEQCCWTPNRSIRLRNLSSGTDIELLA
ADGYSDFYPRFSPDGNSIIWTRNRNTNPYAFGYDIWRMNKDGSSKTNLTGAYSTLAFFNA
NYSRDGNKIVMSAQTNGGDSNVYTMNSDGSGLLQLTTGSSADDSPDFATIIPPPNKIVYH
SNQSGNDDIWVMDEQGGNKVQLTNATANEFQPKWSKDGLKIVYTSDASGNYDVWVMNSDG
SGKTQLTTNTALDYQPIWTPDGAKILFFSSRDGGRDVYIMDASGANQTRLTAVNGWTGRD
GIGISPEGIKIAYTTQPSGANWGYYNELYSATLTCSGNASTCSLSNTLKIASFDNQIDVT
PSFTPDGKILWSSGRHNPYGTCSTNCYFSTQEIYRINPDGSGEVQITSNTITSDVQPVSS
PDGSKIIFSSDRASGSANVSNTSDLWVVNNDGTGLTQLTNTSAYSEGGADWWASGGLVIY
TLTVTKSGTGAGTVNSNPAGISCGVTCSESYSAGTSVVLTATPDSGSTFIGWLGDCSGTG
TCTVPMGAAKNVTAAFGDTTPPDTTITAKPTNPTNSTSAAFSFTSTEAGATFQCQLDAGG
YSSCTSPKGYSGLSAGSHTFYVKATDAAGNTDATPASSAWTIDLTPPVTILPIEGRPDPI
TNSTSATFTFSSESGVMFQCQFDGGIWTTCTSPASFAGLAVGNHTLLIKATDTAGNVETP
VSYSWTIDTTAPNAPSVAGTTPTNTRTPSWGWASGGSGGNGTYRYKLDSSDLTTGATETT
ATSYTPTGNLTEGPHILYVQERDVAGNWSASGSLVIVVDITAPTDVISYSLTSNGAWTSP
VIVDEANDVGLYTSIAVDSNNKPHIAYFDLTNRDLKYTTNTSGSWAFQMIDGDQGENPSI
AIDKSNKVHIAYHDNTNLALKYLTNGTGDWVKETIDAANCEWISLALDSNNKIHISAENN
LGALPRPLRYVTNASGSWVPATIDNTVERVGFWTSLAIDRQDKIHISYYDYTNANLKYIT
NAGGSWVRTTIDETGTVGLYTSLALDSNGKAHISYYDQTNGNLKYANNVSGSWTIETADN
SSDNVGEYTAIGIDTIGKAHISYYDRTNGNLMYATNVSGSWVRKKLDGDNADAGLWTSLK
VDSQNGLHISYYDQTNGNLKYIKSTGIIQLPKTGQTTCYDTSGAVVSCSGTGQDGEIQAG
TVWPNPRFTDNGDQTVKDNLTGIIWTKDGNAPGPAACGAGVAKTWQEALDYVKCLNINNY
LGHNDWLMPNINEIESLVNYNEPNSAAWLNSQGFFNAQPDFYWSSTSSALFPNAAWNIHL
WNGMHYEDKTIVRRYVWPLRKGGSGNFVNLPQTGQTSCYHASGAVITCTGTGQDGEIQAG
SAWPNPRFTVNGDQTVKDNLTGMIWTKDGNAPGPAACGAGIAKTWQEALDYVKCLNINNY
LSRNDWHMPNMNELESLVNYNEANLVSWLNSEGFYNVQPAAYWSSDSAANDTSIAWIVKM
GDSSGSITYKTSPCYVWPVRSGQSAIFGSGISINSMAISTNSVSVNLSISAIDANSVSKM
ILSNDGTFDAEPEEDYVTSKTWTMSTGDGEKTVYVKFRDAAGNWSQVYRDSIILDTVIPV
TTIATKPASITNSTSATFSFTSDAGVTFQCQLDGGTWSTCTSPASYTVLPPGDHTLLIKA
TDAAGNTETPVSYSWTIDTTAPNAPVVSGTTPTNDQTPTWTWVSGGSGGNGTYRYKLDSS
DLTTGATETTATAFTPSTNLTEGSHTLYVQERDAAGNWSNSGSFAIVIDTTAPTAGTGGV
IQLPKTGQTKCYNSSGTEITCLGTGQDGEIQAGIAWPNPRFTSNADTSLTDNLTGLIWAK
DGSTPTFNSCTGGTVTWTAALAYVTCLNTNNYLEHNDWRLPNRIELESLGNVGWANHSTW
LNGQGFTNVQSDYYWSSSTGAYNTDGAWIIEIGGVYYVDRADDKSLNHYVWPVRNGSSGA
VQLPKTGQITSYAAGDDGDLQKGTAWPSTRFTVSGDCVIDRLTGLMWTKDANLAGGKSTW
QELLTYANNLNICGYTDWRLPNRKELSSLLDFSKSVPAVPDNHPFSNVKIEDSNNEAYWS
SSTFAYLPHGAFVVTMGNGAIGNSWKSYGGVYAQYGWPVRGGLDSGTATGTGVSINAGAA
WATGTSVTLALSAKDSNGVTHMMVSNDAAFTGATEQAYTTTKTWLLSSGDGDKTVYVKFR
DAAGNWSQIYSDSIVLDTVMPVTTIATKPAGITNGTSATFTFTSEVGATFQCQLDGGVWA
ACTSPFSYTGLLPGDHTLLIKATDAAGNVETPVSHSWTIDIAAPNAPLVTGTTPTNDQTP
TWTWASGGNGGIGTYRYKLDNSDLTIGATETTLLSYTPTGNLTEGSHTLYVQERDAAGNW
SSSGSSSIVADFTPPVGAITSVTGTFTATGPMTVNRYSHTATLLPNGKVLIAGGYPANDS
PRFNTAELYDPATGTFSATGNMISKRAQHTATLLANGKVLLAGGSVYDTSWSALNTAEIY
DPATGEFTPTGMMRDIRYCHTATRLHDGTVLIAGGWTASAALNRAEIYDPVSGTFSETGN
MGYARYVFTATLLQNGKVLIVGGTNGMTGNTGYKAEIYDPVARSFSATGDLNDSRCMHSA
VILPNGKVLVADGWYSDLQNKRLEIYDPNTGIFTASATTTSGLLASLLSNGQVLLAGGTG
TAFIFDYQTGICTPLNASMVTSRSYYNYSETLLPNGLVLLAGGSDHSPNALNSAELYLPV
SQAISINSGATATNATSSSLALWATDATGVNQMILSNDAAFSGAIAETYATAITWNLNSG
DGTKTVYVKFKDAAGNWSQAYSDTIILDSTAPTVTPSVAGGTYTETQTVTLTCNDGSGTG
CGSIYYTISGNEPTTGSTVYSSPIIISATTTLKYFAVDSAGNSSPIQTATYTIQGFNILN
TGTLSNGLVAWYPFNSNANDASGNGNNGTVNGATLTTDRFGNPNGAYSFNGSDNGVDVND
SGTLQLYDTAAFAAWFQLKQWPTSIPNNQASLICKGATADLYAEYCFMITSEKKLSLYAS
NNTYEMANVSYDISGISLNEWHHYSATFERGTIKIFVDGILKQTGTIPITALRASTNDLY
LGKWRDGWHYANGSLDDVRIYNRALSASEIQALYGSAGTINAVQGDTRLKTLNLTSFGGY
SQSADLTYAWVGTAPTGATVNITPSNIMPTPSGASASVSFTAGVNTPAGSYTLRITATSG
TITKTADILVNVSAPLAIPTLTLDPATKGTSYTPSVLASGGVGAYTFSVASGTLPTGLTL
NNNGTFSGSPTTRGTYTFTIQAMDNDGHASSREYTVRVYDPAYRKLVLESTSWSVYKNEA
TGWIYTRVLDDYDVSVTMTTPTAIDITSSSSTGKFSTDGLTWYSIFSPDISSGSSSKRFL
YKDSTNGSFTITAAGVPGSSNEQWAAGSHVLTITEPPVVDTTPPDTAITGNPQPVTNSTS
ATFTFSSSEEPATFECNLDGAGWVNCTSPENYTGLAVAGHTFQVRAKDAAVPVNNVDPSP
ASYAWKIDQTGPTGAGFGVLPRTGQTTCYDTAGAVMPCAGTGQDGEIQAGVAWPNPRFTN
TDGTSPVSDALVLDKLTGLEWPKDAGTPTAGSCTGGAKTWQGALDYVACLNTNNYLGHND
WRLPNVNELESLVSANNSGPSIPTGHPFTNVQAARYWSSSNNVDDYYGTSWAFFVAMTNG
VLDLYPKSSGYYVWPVRNGALGGTVVWRTGQTACYDSSGAAITCAGTGQDGDKLAGSAWP
SPRFTDNGNSTVTDVLMGLTWTKEANAPGPQSCSPGGAKTWQAVLDYVKCLNNNSYLGYN
DWRLPNRIELRSLSDYSIHAPSLPPGHPFTNVQVSSVYWSSTTYIDDASRAWYVYMDIGL
LSHAFKSNSYCVWPVRGGQSAGAGIGIIINNGAPATNSTSVTLGFSATDANGVSAMMVST
DANFTGAFEEPYATSKAGTLSPGEGEKTFYAKFKDTAGNWSGIYSDTITLDVTAPTLVVG
SPSASVTSNGDVTYTVTYGGADFITLSPGNITLNNTGSANGTVSVTGTGSTTRTITVSAI
TGNGTLGISLAAGTGRDAAGNQTQAAGPSGTFTVDNSAPTATIDTKPPSLTNAASESFTF
SSEQGATFQCKLDSGNYAACSGTASYSSLAEGSHIFWLKTTDTAGNITETSYSWTVDTIG
PVAGVAFNSNLVAYYPFNGNANDESGNGNNGTVNGATLTTDRFGHTNSAYSFNGSSSSIA
IDSVAPSISGQSAGAIALWFKASSSIPQGYGVPLIVFYKDMEVGPDQQGFFVVGRFSSGL
PYNQSIGYYSPRPNFEGAYINGVNAYKDDQWHHAVITIDNSGNRLYVDGQSVPLTYTSFA
GYNQNAMWSTVTSVVIGKRAIDSWVFNGAMDDVRIYNRALSATEIQAIYNGQSGPGISIN
YGAAATNTTSVTLSFSAFDANGVSEMMLSNDGSFNGAVAEAYLATKTWTLTSGDGEKTAY
VKFKDNAGNWSQVYSDSIILDTLVPVTTISDKPASVTNGTAATFTFASDAGATFQCQLDG
GAWLACTSPANYSELLPGDHTLLIKATDTAGNVEIPVSYTWTIDIAAPNAPSVAGTTPTN
NQKPTWTWASGGNGGIGTYRYKFDSSDLTTGATETTDTSYTPSSNLPEGSHTLYFQERDA
AGNWSVSGSFAITVDTTGPVAVGESVVFVSDRSGNPEIWKTSLAHPSNMVKISNFAGSMI
PSQLSWSPDGQWIAFWAFPPGESNNDIYVIKSDGSQPADRLYGRRYDAGDLARFGADNDW
VYFRDGYAALNGMIYRVNRISKVIETVQGDTSKTVQSFDISEDGRYRLETRENGCCWSPN
QYAVLYDLVDGTSTTIMPQDGNSESNPNFSHDGSKIIFTNATGYQTPQNLWVVNRDGSGK
RPVTTETGNIFYHSASWLSDNQNVLVTYNNGTRDGLYIVDTNSGQKQPFLADANYNYANS
DYRVSLGADNGISINTGASATNSASVALSLSAFDANGVSGMMVSNDDTFTGATEEGYATG
KAWSLPAGDGAKTVCVKFKDNAGNWSPVYCDSIVLDTVTPVTVISGKPAIITNGTAAIFT
FAAEAGVTFQCQLDGGAWLACTSPASYTGLPAGDHTLLIKATDVAGNVEIPVSYTWTIDT
VAPNAPSVAGTTLTNDQTPTWTWASGGNSGNGTFRYKLDNSDLTTGATETTAVSYTPLSN
LPEISHTLYVQERDAAGNWSASGSFAITVDTTVPVAESAVTPAGMVLVPAGSFQMGDSFS
EGESDERPVHTVTVSSYYLEKYEVTKALWDEVKTWATANGYEFDNAGTGTATTYPVQGVS
WYDVIKWLNARSEKEGRMPVYYTGAGQTTIYRTGQVNVVSGAVQWSANGYRLPTEAEWEY
AARAGTTTRFYTGDCISADTQANYNGNSSWSGCTGGQSRGETTAVGSFTANPWGLHDMAG
NVWEWTWDWIGSYSSTAVTNPRGPDSGSYRVFRGGSLSDGAYYLRLAYRTGSFPAGRGIN
LGFRSAHAVNTGIAKGVTINSDAFATNSTSVTLSILAFDANGVTHMKVSNDASFTGTNEE
TYITSKAWPLTSGDGIKTVYVMFKDAAGNWSQAYSDSIVLDTTGPAVVANPQGGTYLTAQ
SVTLMCSDASGSGCEKIYFTTDGSEPTIGSAVYSGSISITSTTTIKFFAVDQLGTAGPVQ
TATYTIQGVTMTAGSIKVVKGETRQTTVTLTSIGGFNAAMNLSHAWQGSALDDAVVNITP
ATVTPTPSGASANVSVTVDSATAAGNYTLQIIAEGGQYATFVNVPVTVANPLAFTTNSPI
NGVKGQPLAPITATGGIGSLICSLVITEPPGISPPGVTFNADGTFSGLPTARGTYVFTVR
CSDTDGHSVDRQYTIRVYDPAYRHLVLESASWSLQEDGGVSDWIKAKVLDDYDVSAAVTL
NSTIYITSSSATGQFSLSGSFSGSEVNALLVDIPAGSSLKSFKYRDSTAGSFAIAVIGWE
GTPSAEWGPASHQITVVEVLPTQYNLSIAKTGTGSGNVTVSTGSVSWEGNNGMATYNAGQ
QVTLTAAADPTSSTFTGWSGGGCSGPAPTCTVTMDAAKNVAAAFALKTYNITATAGANGS
ITPAGSLFYNHGSSQAFTITPDTGYHIADVLVDGYSIGAVGSRTFDPITAGHTISATFAV
NVYALTVTKTGTGTGAVTASPGTIVWTGNTGTASYNYNTPVQLTASANTGFTFTGWSGGG
CSGPVPTCTVTMDAAKNVTANFADITPPTGLVAINSGAAHTTQPAVTLSILATDASGVSK
MVVSNDANFAGGASDENYGTSKAWTLTDGDGTKTVYVKFKDAVGNWSQPYSDTIVLDTVA
PTVAAQPATGTYMSPQSVTLICDDGSGSGCANIYFTTDKTEPTINSTRYMDPIPVSATTT
LKFFARDSAGNSGTVRLETYTFPDVTMTGGNIKVVQGETRQSTVTLKALDGFNSSMTLSH
QYQGTEPANASVSITPDTVTPTTSGAAAVVAFTAGTTTATTAVDRPYIIRVTATGGEITR
TADIQVMVAAPLAIGTPTLPDGVKGQPYTAAVVATGGIAPYTFTKMSGQLPGGLTLSTGG
AISGTPTARGTFTFAVQAADSDEPRHSVTQEYTVRIYDLAYRTLVMEAGSWMVEKSSATQ
INASDLIMVMIKDDYGNYVNADANTMIRITSSSPTGRFSSDGSSFPSSSLAPTIYQGNAT
TLFFYKDTTAGTFTLTASGIAGQPSASWQAGSHEITVWFDAHETELTASATQSLVYGQGM
TVTGLLKDARDNVPLPSKTVTLAFTSPSGSILNRTAQTASDGRFTYSADAAMIDAAGPWT
GRASFSEPASYKDSAATDSFDVAKANTRLEITTSASSVPPGGQVTVSGQLSAFTSFAVSL
SGIDIVVEFIGPDGSTVHATTVKTSDASGHFTYPYTTPSVPLGLWSIRARFVGTGNLNSS
DSDAKSLNVTNSPGYAILVQGDLGGTYRDSYSASLDDIYKKLRNRSFPTGNIWYLSHPAA
THDTGIAPNAHTSKENVRKAITEWALARITEGGIAPLYIVLMDHGSSGLFHIDPETITPE
ELNGWISTIETGIKTNLQKDLTTVVINGSCYSGSFIPALSKEGRVIITSGAEDEETLQGP
DTETPNKVFGEYFVYYLFSYLAQGENLRDAFKDAAQETHRERKCEGKDCKTNSALGNQGN
TRQHPLLDDNGDKKGSWMGVVGQEDGGVTSHLVLGLGANPATIKITEVMPTTKVMMGTSS
VLAFAKTSNYAQTMASWVKVRKPFFAEPNSSGTGQVVMNLTRIEGTPNPTTGRWEYNITP
LNEAGAYTLYYYAMDSNGNILPPVAGTLYVDTPDNNPPAAFNLTSPADSAELNDAMMIFK
WAKSVDPDNDLVTYTLKIYDEETVSEIKRYELISQEFFFLNASQEKKPDGVTPLFTTGKH
YLWKVEAVDGKGKSVEVQARRFHVIFTNALTGIITGVVLSDRDYSQIASASITATIGGTV
VNIPVTDGAFVLNVNPGSLDLSSTSSGYQSASLSDVNIVAGQATMVNILLSPNGLQGDID
GSGVVDIADAILALRITAGIGPTAGVTIHRENGVNPGGAIGIHDTLYILQYLAGLRP

@hahafengxiang

Glad to know.

Hope an official solution will come up~

@hegartybr

hegartybr commented Dec 18, 2020

I'm also seeing this error (on 3 of ~70 files). It doesn't seem to be a size issue for me, because my largest gene calls run just fine on their own (and aren't as large as Caelyn996's). Going through each file, I realized that it may be similar to hahafengxiang's problem of a weird motif (see below), though I'm not sure why this would cause a failure. Any other input would be greatly appreciated. Thanks! (I'm also happy to provide any extra information or files from my runs that would be useful.)

weird_gene_call_that_failed
MAIAFEAASSTMVPSTATNPSVTITITDATDMVVAYGGSINGGASAMTFNSTGVMTQVVLENSSASTSGNLVGIGVYYILNNDLPAAGTYSVAATFAVADSQAMVGAVALSGADQSTGEHVISSTSLHNTSSTSITVSAQAATANSWWIRSVYAATTSSATSPGILIQSTDVARVSMVDTGLLSESMMTANPVNSTIASGPGFFASGSSSLNLSMVVFGVPAASASPSASPSASPSQSPSASLSPSASPSDSPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSESPSASLSPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSESPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSESPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSESPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSESPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSASPSASLSPSASPSASPSASPSASPSLSPSASPSASPSASPSESPSASLSPSASPSASPSASPSASPSASLSPSASPSASPSQSPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSASPSASLSPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSQSPSASLSPSASPSTSPSASPSASPSQSPSTSLSPSVSPSASPSASPSTSPSQSPSASLSPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSVSPSASPSQSPSASLSPSASPSASLSPSASPSASPSVSPSASPSASPSASPSASPSQSPSASLSPSASPSISPSASPSASPSQSPSASLSPSASPSASPSVSPSASPSQSPSASLSPSASPSASPSASVSPSASLSASPSESPSASVSPSASPSVVTSASPSESPSASLSPSVSPSVGGSSASPSESPSVSLSPSASPSA
SVSPSESPSPSAVTSGPSPSVSPSVAVFISASPSEAQVTFVGPPGQETLVDEEGGEPVIFKNQMYFTTTLTEDNSPILIGRRTIPTWNTTGRPKKAKVGTLGFNLKTNNLEYWDGSRWLILRMKKI*
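
The failing examples posted so far are either very long or dominated by a short repeated motif, so one rough way to shortlist candidates in a large file is to tabulate length and composition per sequence. The sketch below is only a triage heuristic under that assumption: `seq_composition` is a hypothetical helper, the three-residue measure is an arbitrary choice rather than an established low-complexity score, and it expects FASTA headers that start with `>`.

```shell
# seq_composition INPUT.faa   (hypothetical helper)
# Prints one line per record: id <TAB> length <TAB> fraction of the sequence
# made up of its three most frequent residues. A trailing "*" is ignored.
seq_composition() {
    awk '
        function report(   c, a, b, d, k) {
            if (id == "") return
            a = b = d = 0
            # Track the three largest residue counts.
            for (k in cnt) {
                c = cnt[k]
                if (c > a)      { d = b; b = a; a = c }
                else if (c > b) { d = b; b = c }
                else if (c > d) { d = c }
            }
            printf "%s\t%d\t%.2f\n", id, len, (len ? (a + b + d) / len : 0)
            split("", cnt)   # portable way to clear the array
            len = 0
        }
        /^>/ { report(); id = substr($1, 2); next }
        {
            gsub(/[* \t\r]/, "")
            n = length($0)
            for (i = 1; i <= n; i++) cnt[substr($0, i, 1)]++
            len += n
        }
        END { report() }
    ' "$1"
}
```

For example, `seq_composition proteins.faa | awk -F'\t' '$2 > 5000 || $3 >= 0.7'` would list very long or very repetitive entries; both thresholds are placeholders to adjust for your data. A flagged sequence is only a candidate, so it still needs to be rerun on its own to confirm that it triggers the error.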

@AstrobioMike

AstrobioMike commented Dec 22, 2020

Hey folks, I hit this too.

(First, thank you to the developers and maintainers of kofamscan :))

In wondering why this came up only recently with KoFamScan (with no recent changes), I wondered if it was related to a change in HMMER3 (which did update in September as far as what conda would install, and more recently HMMER as a whole if installing from GitHub).

In using the example problem sequences you all kindly provided above (keeping me from needing to hunt mine down :)), I was able to track it down to specifically the HMM for K14297. This fails with a seg fault on HMMER version 3.3.1, which is the latest conda install of HMMER, but it doesn't fail when run on HMMER version 3.3.0. Here is a reproducible example using conda environments if interested:

### getting problem sequence example ###
curl -L -o trouble-seqs.faa https://ndownloader.figshare.com/files/25861299

### getting problematic hmm profile ###
curl -L -o K14297.hmm https://ndownloader.figshare.com/files/25861284

### trying with hmmer 3.3.1 ###
conda create -y -n hmmer-3.3.1 -c conda-forge -c bioconda -c defaults hmmer=3.3.1
conda activate hmmer-3.3.1

hmmsearch K14297.hmm trouble-seqs.faa > /dev/null
    # Segmentation fault (core dumped)

conda deactivate

### trying with hmmer 3.3.0 ###
conda create -y -n hmmer-3.3.0 -c conda-forge -c bioconda -c defaults hmmer=3.3.0
conda activate hmmer-3.3.0

hmmsearch K14297.hmm trouble-seqs.faa > /dev/null
    # no problem

After putting this example together to provide as an issue to HMMER's github, I realized it seems the great folks at HMMER already caught this, as it is noted as a bug fix in v3.3.2's release notes.

It's just not updated on conda yet, so if wanting to use conda, installing hmmer=3.3.0 as shown above in the kofamscan environment should get around this problem, or of course installing the latest from their github 👍
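
To find which profile triggers the crash for a given set of sequences, one option is to run hmmsearch once per profile and record the exit status. This is a sketch of that idea, not necessarily how the profile above was identified: `crashing_profiles` is a hypothetical name, and a directory of per-KO `.hmm` files is an assumption about the local KOfam layout. The `HMMSEARCH` variable exists only so that a different binary can be substituted.

```shell
# crashing_profiles PROFILE_DIR SEQS.faa   (hypothetical helper)
# Runs "hmmsearch PROFILE SEQS.faa" for every *.hmm file in PROFILE_DIR and
# prints "exit_status <TAB> profile" for each run that did not exit with 0.
crashing_profiles() {
    profile_dir=$1; seqs=$2
    for hmm in "$profile_dir"/*.hmm; do
        [ -e "$hmm" ] || continue
        # Set HMMSEARCH to use a specific binary; defaults to the one on PATH.
        "${HMMSEARCH:-hmmsearch}" "$hmm" "$seqs" >/dev/null 2>&1
        status=$?
        if [ "$status" -ne 0 ]; then
            printf '%s\t%s\n' "$status" "$hmm"
        fi
    done
}
```

Usage would look like `crashing_profiles profiles trouble-seqs.faa`. A process killed by a segmentation fault is normally reported by the shell as status 139 (128 plus signal 11), which helps tell a crash apart from an ordinary error exit.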

@halexand

I was just coming to see if anyone else had run into this problem! I just updated the conda build of my env and it seems to have cleared up the issue for me!

name: kofamscan
channels:
    - bioconda
dependencies:    
    - kofamscan
    - hmmer=3.3.0

Thanks for working it out, all!

@hegartybr

Thank you for sharing your solutions, halexand and AstrobioMike! Unfortunately, when I try to build a conda environment specifying the older version of hmmer (like halexand showed), I get an error from kofamscan because it is requiring version 3.1 or greater:

Package hmmer conflicts for:
hmmer=3.3.0
kofamscan -> hmmer[version='>=3.1']

This doesn't make sense to me, since version 3.3.0 should satisfy kofamscan's requirement...

I'm trying to build the environment using this command: conda env create --file kofamscan.yaml, which works fine typically for me.

Any help with this would be greatly appreciated. Thanks!

@AstrobioMike

That's strange, @hegartybr. Not only does it work for me, but 3.3.0 is >= 3.1, ha (though I have seen some odd things in how conda compares versions sometimes if not all digits are placed there - as is the case there).

First thing I'd do is check whether you have the latest conda (conda --version, currently 4.9.2; you can update with conda update conda if you don't, but it's probably best to do this in the base environment and not in one that is set up for a specific project).

Then maybe try with this in a file that more explicitly sets the channels and versions (can't attach a yaml it seems):

name: kofamscan
channels:
    - conda-forge
    - bioconda
    - defaults
dependencies:
    - kofamscan=1.3.0
    - hmmer=3.3.0

@hegartybr

Setting all the channels explicitly seems to have done the trick (it loads fine if I don't specify the exact version of kofamscan and my conda version is the latest). At the very least, it is running now, so hopefully I'm good. Thanks, @AstrobioMike, for the tip!
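
After rebuilding an environment, it can be worth confirming which HMMER release was actually resolved. The sketch below assumes the banner line that HMMER programs print with `-h` has the form `# HMMER 3.3.2 (Nov 2020); http://hmmer.org/`; the function names are hypothetical, and 3.3.1 is treated as affected only because that is the release reported to segfault in this thread.

```shell
# hmmer_version   (hypothetical helper)
# Reads "hmmsearch -h" output on stdin and prints the version number found
# in the "# HMMER x.y.z (...)" banner line.
hmmer_version() {
    sed -n 's/^# HMMER \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# is_affected_version VERSION   (hypothetical helper)
# Succeeds only for 3.3.1, the release reported above to crash.
is_affected_version() {
    [ "$1" = "3.3.1" ]
}

# check_hmmer   (hypothetical helper)
# Checks the hmmsearch found on PATH in the currently active environment.
check_hmmer() {
    v=$(hmmsearch -h | hmmer_version)
    if is_affected_version "$v"; then
        echo "HMMER $v is the release reported to segfault; pin another version"
        return 1
    fi
    echo "HMMER $v is not the release reported to crash"
}
```

Running `check_hmmer` inside the activated kofamscan environment reports the version of the `hmmsearch` binary that kofamscan will pick up from that environment's PATH.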

@Caelyn-gao

Thanks so much!! I updated hmmer and it works well now!
