ValueError: invalid literal for int() with base 10: '5_42' #96

Open
shiyi-pan opened this issue Jan 13, 2022 · 8 comments

@shiyi-pan

Hi, I want to use CAFE for a gene family study, but when I run the cafetutorial_report_analysis.py script I get an error.

Here is my command:

$PYTHON cafetutorial_report_analysis.py -i report_run2.cafe -o summary20220113

Here is my error:
[*-------------------------------------------------] 0.033% complete.
Traceback (most recent call last):
  File "cafetutorial_report_analysis.py", line 371, in <module>
    results_main, node_fams_main = cra(inlines_main, results_main, node_fams_main, linestart_main, ancfilename, sorted_nodes, 1);
  File "cafetutorial_report_analysis.py", line 193, in cra
    anccount = int(tlinfo[curanc][4]);
ValueError: invalid literal for int() with base 10: '5_42'

Here is my input file:
report_run2.zip

Could you help me fix it? Thank you very much. By the way, is there an article to cite for CAFE?

@yuanw-18

yuanw-18 commented Mar 1, 2022

I get the same error. Do you have a solution?

@shiyi-pan
Author

I'm sorry, I can't help you. I haven't solved this problem yet.

@yuanw-18

yuanw-18 commented Mar 3, 2022

I have solved my problem: my tree file had extra "." characters. I checked your input file. Maybe you can replace the species names with names that contain no dots, and remove all information from the tree except the time. Good luck~
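
The renaming suggested above can be sketched in Python as follows. This is only an illustration: the helper name `sanitize_name` is hypothetical, and keeping only letters, digits and underscores is an assumption about what counts as a safe name, not a rule stated by CAFE. Whatever mapping is used, it would need to be applied identically to the tree and to the header of the gene count table.

```python
import re

def sanitize_name(name):
    """Return the species name with every character removed that is not
    a letter, a digit, or an underscore (dots, spaces, dashes, ...)."""
    return re.sub(r"[^A-Za-z0-9_]", "", name)

# Example with a species name taken from this thread.
print(sanitize_name("Cicerarietinum.representative.pep"))
```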

@gwct
Contributor

gwct commented Mar 3, 2022

Hi all,
Sorry I missed this post. This is likely because your input tree has internal node labels, which CAFE can handle but the report analysis script cannot. This is a common issue and has been reported before (https://groups.google.com/g/hahnlabcafe/c/NboOVhcbZPk/m/JluqEyZVBAAJ). I'll copy the solutions I posted to that thread here too:

Until the report analysis script is fixed, there are two solutions:

  1. Using your favorite text editor or bash regex program, do a simple find and replace of all the node labels in the report file to replace them with empty strings. This might be difficult in your case because you have so many unique node labels, and this also runs the risk of accidentally replacing other important text within the report file.
  2. Remove the labels from your CAFE input tree and re-run CAFE to generate a report file without the node labels. I think this will be the easiest option.
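
For option 2, the labels could be removed from the input tree with a short Python sketch like the one below. It is not part of CAFE or of the report analysis script; the function name `strip_internal_labels` is hypothetical, and the regular expression assumes a plain Newick string with unquoted labels, where anything written directly after a closing parenthesis (a node name or a bootstrap value) is an internal node label. Leaf names and branch lengths are left untouched.

```python
import re

def strip_internal_labels(newick):
    """Remove any label that directly follows a closing parenthesis.

    The ':<length>' part after the label is kept, so branch lengths
    (divergence times) survive. Assumes labels are unquoted and contain
    no parentheses, commas, colons, or semicolons.
    """
    return re.sub(r"\)[^():,;]+", ")", newick)

# Hypothetical tree with a named node, a bootstrap value, and a root label.
labelled = "(((a:1.0,b:1.0)n1:2.0,c:3.0)95:1.5,d:4.5)root;"
print(strip_internal_labels(labelled))
```

It is worth comparing the output against the original tree by eye before re-running CAFE, since a tree with quoted or unusual labels would not be handled by this pattern.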

@shiyi-pan
Author

Thank you

Thank you for your reply. I will replace the species names with names that have no dots, for example replacing "Cicerarietinum.representative.pep" with "Cicerarietinum_pep". But I don't know how to "remove the other information except time". Could you give me an example? Thank you very much.

@yuanw-18

yuanw-18 commented Mar 4, 2022

tree (((((acch:74.8509,(((rhde:7.0997,rhwi:7.0997):3.6345,rhsi:10.7342):00.8280,rhov:11.5623):63.2887):8.2057,casi:83.0566):0.056636,dilo:88.7201):18.2580,prvu:106.9782):4.1897,soly:111.1678)

This is an example.

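Let me explain again: remove all spaces and bootstrap information from the tree, keeping only the time information. @gwct's comment above says the same thing. Calculating the numbers of expanded and contracted gene families works fine, but if you want to use the report summary script, you need to remove the extra label information from the tree.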

@shiyi-pan
Author

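Thank you very much for your reply. Below are my revised input file filtered.cafe.input.tsv and my script cafetutorial_run1.sh. Please help me check whether any formatting problems remain.

filtered.cafe.input.tsv is as follows: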

Desc Orthogroup ArabidopsisthalianaAraport11representativepep Cicerarietinumrepresentativepep Gmax275Wm82longestprotein Medicagotruncatularepresentativepep NN1138longestpep Nelumbonucifera11longestpep SoyL01longestProtein SoyW01longestProtein ZH13longestprotein araduV14167gnm1ann1cxSMprotein
(null) OG0000014 1 0 21 4 3 1 97 94 42 21
(null) OG0000018 1 6 52 60 12 15 20 42 29 4
(null) OG0000020 0 3 42 96 14 0 13 21 34 9
(null) OG0000021 10 5 33 4 22 52 28 31 30 9

cafetutorial_run1.sh is as follows:

#!/gss1/home/hjb20181119/panyongpeng/NN1138-2/03.orthofinder_data/04.between_species_cafe/CAFE-4.2/CAFE/release/cafe

load -i filtered.cafe.input.tsv -t 4 -l ./reports/log_run1.txt -p 0.05
tree (((((Cicerarietinumrepresentativepep:21.121085,Medicagotruncatularepresentativepep:21.121085):19.000163,((SoyL01longestProtein:2.259435,SoyW01longestProtein:2.259435):0.301212,((NN1138longestpep:2.374908,Gmax275Wm82longestprotein:2.374908):0.097247,ZH13longestprotein:2.472155):0.088492):37.560601):8.591100,araduV14167gnm1ann1cxSMprotein:48.712348):43.714107,arabidopsisthalianaaraport11representativepep:92.426456):29.573544,Nelumbonucifera11longestpep:122.000000);
lambda -s -t (((((1,1)1,((1,1)1,((1,1)1,1)1)1)1,1)1,1)1,1);
report ./reports/report_run1
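
One way to check a pair of files like the ones above for naming problems is to compare the leaf names in the tree with the species columns of the count table. The sketch below is only an illustration with hypothetical helper names. It assumes every leaf carries a branch length, that the first two table columns are the description and the family ID (as in the header shown above), and it compares names exactly, including capitalization. Whether CAFE itself is that strict is not established in this thread.

```python
import re

def tree_leaf_names(newick):
    """Collect leaf names: a leaf starts after '(' or ',' and ends at ':'."""
    return set(re.findall(r"[(,]\s*([^():,;\s]+)\s*:", newick))

def header_species(header_line):
    """Species columns of the count table, skipping the first two columns."""
    return set(header_line.split()[2:])

def compare_names(newick, header_line):
    """Return (names only in the tree, names only in the table header)."""
    leaves = tree_leaf_names(newick)
    species = header_species(header_line)
    return sorted(leaves - species), sorted(species - leaves)

# Hypothetical inputs: 'spc' and 'spC' differ only in capitalization.
tree = "((spA:1.0,spB:1.0):2.0,spc:3.0);"
header = "Desc\tOrthogroup\tspA\tspB\tspC"
only_in_tree, only_in_header = compare_names(tree, header)
print("only in tree:  ", only_in_tree)
print("only in header:", only_in_header)
```

Two empty lists mean every leaf has a matching column under exact comparison; any name listed is worth checking by hand in both files.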

@shiyi-pan
Author

The cafetutorial_report_analysis.py script now runs normally. Thank you very much.
