ValueError: invalid literal for int() with base 10: '5_42' #96

shiyi-pan · 2022-01-13T07:49:01Z

Hi, I want to use CAFE for a gene family study, but when I run the cafetutorial_report_analysis.py script I get an error.

Here is my command:

$PYTHON cafetutorial_report_analysis.py -i report_run2.cafe -o summary20220113

Here is my error:
[*-------------------------------------------------] 0.033% complete.
Traceback (most recent call last):
  File "cafetutorial_report_analysis.py", line 371, in <module>
    results_main, node_fams_main = cra(inlines_main, results_main, node_fams_main, linestart_main, ancfilename, sorted_nodes, 1);
  File "cafetutorial_report_analysis.py", line 193, in cra
    anccount = int(tlinfo[curanc][4]);
ValueError: invalid literal for int() with base 10: '5_42'

Here is my input file:
report_run2.zip

Could you help me fix it? Thank you very much. By the way, is there a CAFE paper I should cite?

yuanw-18 · 2022-03-01T14:59:12Z

I hit the same error. Do you have a solution?

shiyi-pan · 2022-03-03T01:55:58Z

I'm sorry, I can't help you. I haven't solved this problem yet.

yuanw-18 · 2022-03-03T02:05:11Z

I have solved my problem: my tree file had extra "." characters. I checked your input file. Maybe you can replace the species names with names that contain no dots, and remove all information from the tree except the times. Good luck~

gwct · 2022-03-03T16:07:27Z

Hi all,
Sorry I missed this post. This is likely because your input tree has internal node labels, which CAFE can handle but the report analysis script cannot. This is a common issue and has been reported before (https://groups.google.com/g/hahnlabcafe/c/NboOVhcbZPk/m/JluqEyZVBAAJ). I'll copy the solutions I posted to that thread here too:

Until the report analysis script is fixed, there are two solutions:

1. Using your favorite text editor or bash regex program, do a simple find and replace of all the node labels in the report file to replace them with empty strings. This might be difficult in your case because you have so many unique node labels, and this also runs the risk of accidentally replacing other important text within the report file.
2. Remove the labels from your CAFE input tree and re-run CAFE to generate a report file without the node labels. I think this will be the easiest option.
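
For the second option, a minimal Python sketch of stripping internal node labels from a Newick string might look like the following. It is not part of CAFE or of the report analysis script; the function name `strip_internal_labels` is hypothetical, and the sketch assumes an unquoted Newick tree in which internal labels (names or bootstrap values) appear directly after a closing parenthesis.

```python
import re

def strip_internal_labels(newick):
    """Remove any label that directly follows a closing parenthesis."""
    # Leaf names follow "(" or ",", so only internal node labels
    # (node names or bootstrap values) can follow ")".
    # Assumption: labels are unquoted and contain no ( ) : , ; characters.
    return re.sub(r"\)[^():,;]+", ")", newick)

labeled = "((A:1.0,B:1.0)n1:2.0,C:3.0)root;"
print(strip_internal_labels(labeled))
```

Branch lengths are left untouched because they always follow a colon, never a closing parenthesis directly.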

shiyi-pan · 2022-03-04T01:22:03Z

Thank you

Thank you for your reply. I will replace the species names with names that have no dots, for example "Cicerarietinum.representative.pep" with "Cicerarietinum_pep". But I don't know how to "remove the other information except time". Could you give me an example? Thank you very much.

yuanw-18 · 2022-03-04T01:35:30Z

tree (((((acch:74.8509,(((rhde:7.0997,rhwi:7.0997):3.6345,rhsi:10.7342):00.8280,rhov:11.5623):63.2887):8.2057,casi:83.0566):0.056636,dilo:88.7201):18.2580,prvu:106.9782):4.1897,soly:111.1678)

This is an example.

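Let me explain again: remove all spaces and bootstrap information from the tree, and keep only the time information. That is also what @gwct's comment above means. Computing the number of expanded and contracted gene families works fine either way, but if you want to use the report summary script you need to remove the extra label information from the tree.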
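
The cleanup described here (no spaces, no dots in species names, only names and times left) could be sketched in Python roughly as below. The helper names `clean_name` and `clean_tree` are hypothetical. Keeping only letters and digits is a conservative assumption, not a documented CAFE requirement. Whatever renaming is applied to the tree has to be applied to the species columns of the count table as well, so the two stay in sync.

```python
import re

def clean_name(name):
    """Keep only letters and digits in a species name (conservative assumption)."""
    return re.sub(r"[^A-Za-z0-9]", "", name)

def clean_tree(newick):
    """Drop whitespace and sanitize leaf names, leaving branch lengths untouched."""
    compact = re.sub(r"\s+", "", newick)
    # A leaf name starts right after "(" or "," and runs up to the next ":".
    # Branch lengths follow ":" instead, so their decimal points are not matched.
    return re.sub(r"(?<=[(,])[^():,;]+(?=:)",
                  lambda m: clean_name(m.group(0)), compact)

print(clean_name("Cicerarietinum.representative.pep"))
print(clean_tree("(Cicer.rep.pep:21.12, Medtr.rep.pep:21.12):19.0"))
```

This sketch does not remove internal node labels or bootstrap values; those follow a closing parenthesis and need a separate step.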

shiyi-pan · 2022-03-04T01:56:29Z

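Thank you very much for your reply. Below are my revised input file filtered.cafe.input.tsv and my script cafetutorial_run1.sh. Could you check whether there are still any formatting problems?

filtered.cafe.input.tsv is as follows: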

Desc Orthogroup ArabidopsisthalianaAraport11representativepep Cicerarietinumrepresentativepep Gmax275Wm82longestprotein Medicagotruncatularepresentativepep NN1138longestpep Nelumbonucifera11longestpep SoyL01longestProtein SoyW01longestProtein ZH13longestprotein araduV14167gnm1ann1cxSMprotein
(null) OG0000014 1 0 21 4 3 1 97 94 42 21
(null) OG0000018 1 6 52 60 12 15 20 42 29 4
(null) OG0000020 0 3 42 96 14 0 13 21 34 9
(null) OG0000021 10 5 33 4 22 52 28 31 30 9

cafetutorial_run1.sh is as follows:

#!/gss1/home/hjb20181119/panyongpeng/NN1138-2/03.orthofinder_data/04.between_species_cafe/CAFE-4.2/CAFE/release/cafe

load -i filtered.cafe.input.tsv -t 4 -l ./reports/log_run1.txt -p 0.05
tree (((((Cicerarietinumrepresentativepep:21.121085,Medicagotruncatularepresentativepep:21.121085):19.000163,((SoyL01longestProtein:2.259435,SoyW01longestProtein:2.259435):0.301212,((NN1138longestpep:2.374908,Gmax275Wm82longestprotein:2.374908):0.097247,ZH13longestprotein:2.472155):0.088492):37.560601):8.591100,araduV14167gnm1ann1cxSMprotein:48.712348):43.714107,arabidopsisthalianaaraport11representativepep:92.426456):29.573544,Nelumbonucifera11longestpep:122.000000);
lambda -s -t (((((1,1)1,((1,1)1,((1,1)1,1)1)1)1,1)1,1)1,1);
report ./reports/report_run1
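
One way to check files like these for naming problems is to compare the leaf names in the tree with the species columns in the count table header. The Python sketch below does that; `leaf_names` and `compare_names` are hypothetical helpers, and the sketch assumes the first two header columns are the description and the family ID. In the files posted here the Arabidopsis name is capitalized differently in the table header and in the tree. Nothing in this thread establishes whether CAFE treats that as a mismatch, so the check only reports the difference.

```python
import re

def leaf_names(newick):
    """Collect leaf names: tokens that follow "(" or "," and precede ":"."""
    return set(re.findall(r"(?<=[(,])[^():,;\s]+(?=:)", newick))

def compare_names(newick, header_line):
    """Return (names only in the tree, names only in the table header)."""
    # Assumption: the first two columns are the description and family ID.
    columns = set(header_line.split()[2:])
    leaves = leaf_names(newick)
    return leaves - columns, columns - leaves

# Shortened, made-up names for illustration only.
tree = "((arabidopsis:1.0,Cicer:1.0):2.0,Nelumbo:3.0)"
header = "Desc Orthogroup Arabidopsis Cicer Nelumbo"
only_tree, only_table = compare_names(tree, header)
print("only in tree:", sorted(only_tree))
print("only in table:", sorted(only_table))
```

Two empty sets mean every leaf has a matching column under an exact, case-sensitive comparison.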

shiyi-pan · 2022-03-04T01:58:18Z

The cafetutorial_report_analysis.py script now runs correctly. Thank you very much.
