Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Calculated posterior probability for family" after size filtering #46

Open
4 of 5 tasks
josiahseaman opened this issue Jul 11, 2018 · 8 comments
Open
4 of 5 tasks

Comments

@josiahseaman
Copy link

josiahseaman commented Jul 11, 2018

I've previously followed the tutorial and used Cafe with my own data. Now, with this new batch of data I'm getting ".WARNING: Calculated posterior probability for family 14866 = 0" until a Bus Error crashes it. I've rerun Cafe 4.2 on my old data and it still works.

Lambda : 0.01050260385206 Mu : 0.01697552947024 & Score: -inf
.WARNING: Calculated posterior probability for family 14866 = 0

Lambda : 0.01050260385206 Mu : 0.01697552947024 & Score: -inf
.WARNING: Calculated posterior probability for family 14866 = 0
.Bus Error

$ head -n 1 filtered_OG_counts.txt
Desc    Family ID       FRAX00  FRAX01  FRAX03  FRAX04  FRAX05  FRAX06  FRAX07  FRAX08  FRAX09  FRAX11  FRAX12  FRAX13  FRAX14  FRAX15  FRAX16  FRAX19  FRAX20  FRAX21  FRAX23  FRAX25  FRAX26  FRAX27  FRAX28  FRAX29  FRAX30  FRAX31  FRAX32  FRAX33  Mguttatus   Oeuropea        Slycopersicum
]$ grep 14866 filtered_OG_counts.txt
OG0014866       14866   1       1       0       1       1       1       1       1       1       1       1       1       1       1       1       1       1       1       1       1       1    1       1       1       1       1       1       1       2       1       1

14866 Seems exceedingly normal. Removing this line just moves the error to a new family.

Things I have checked

  • Tree is ultrametric from r8s
  • There's no polytomies in the trees, shortest branch length is 0.047221
  • Header and tree species names match between Random_Sample_1000.GeneCount.txt and cafe script (below)
  • cafetutorial_clade_and_size_filter.py for size filtering
  • Branch lengths are not integer as one comment indicated, but I have other non-integer inputs working fine.
python ../cafe_tutorial/python_scripts/cafetutorial_clade_and_size_filter.py -i Cafe_orthofinder_Orthogroups.GeneCount.csv -o filtered_OG_counts.txt -s

head filtered_OG_counts.txt -n 1 > Random_Sample_1000.GeneCount.txt  # important to add header
shuf -n 1000 filtered_OG_counts.txt >> Random_Sample_1000.GeneCount.txt

#!cafe
load -i Random_Sample_1000.GeneCount.txt -t 1 -l reports/run1__orthofinder_full.txt
#Approximated ultrametric based on 763103 sites and 5 calibration points
tree (((((((((((FRAX30:2.062238,FRAX32:2.062238):0.73118,FRAX28:2.793419):1.660572,FRAX12:4.45399):4.76951,(FRAX07:8.140409,FRAX29:8.140409):1.083092):3.809055,FRAX08:13.032555):0.967445,(((((FRAX01:2.412569,FRAX16:2.412569):3.979589,FRAX15:6.392158):1.96257,FRAX00:8.354728):1.6873,(FRAX06:8.890557,FRAX23:8.890557):1.15147):3.083198,FRAX25:13.125226):0.874774):3.475312,FRAX21:17.475312):1.524688,(((FRAX19:8.534639,FRAX20:8.534639):1.589727,((FRAX11:5.06782,FRAX27:5.06782):5.009324,FRAX04:10.077145):0.047221):0.875635,(((((FRAX03:0.834928,FRAX09:0.834928):0.892633,FRAX13:1.72756):2.522183,(FRAX26:2.287722,FRAX14:2.287722):1.962021):3.374113,FRAX05:7.623856):1.770855,FRAX33:9.394711):1.605289):8.0):14.705751,FRAX31:33.705751):2.294249,Oeuropea:36.0):43.0,(Slycopersicum:36.746625,Mguttatus:36.746625):42.253375)
lambdamu -s 

I suspect there's something wrong with my tree, I just don't know what that would be.

Any advice you can offer is much appreciated. Thanks.

@josiahseaman
Copy link
Author

I was able to get Cafe to run by rounding everything to the nearest integers and then manually making it ultrametric. The tree is a lot less accurate now, but it works. That indicates that Cafe has trouble running with branch lengths close to zero. Perhaps they are rounded internally?

(((((((((((FRAX30:2,FRAX32:2):1,FRAX28:3):2,FRAX12:5):4,(FRAX07:8,FRAX29:8):1):4,FRAX08:13):1,(((((FRAX01:2,FRAX16:2):4,FRAX15:6):2,FRAX00:8):2,(FRAX06:9,FRAX23:9):1):3,FRAX25:13):1):3,FRAX21:17):2,(((FRAX19:8,FRAX20:8):2,((FRAX11:5,FRAX27:5):4,FRAX04:9):1):1,(((((FRAX03:1,FRAX09:1):1,FRAX13:2):2,(FRAX26:2,FRAX14:2):2):3,FRAX05:7):2,FRAX33:9):2):8):15,FRAX31:34):2,Oeuropea:36):43,(Slycopersicum:37,Mguttatus:37):42);

@benfulton
Copy link
Member

You're correct, CAFE rounds to the closest integer and if that integer is 0, it won't be able to do the calculations. We'll address this in a future release.

@josiahseaman
Copy link
Author

I'm considering a pull request for this issue then. It seems like a max(1, round(branch_length)) would do it. What's the downside to using a slightly non-ultrametric integer tree rather than crashing?

@benfulton
Copy link
Member

A pull request would be fine. Depending on the lengths of the other branches, CAFE still may not be able to calculate the probabilities, but certainly crashing is not a great solution :)

@Edison2021
Copy link

Hi Ben, I wonder if CAFE 4.2.1 can handle branch lengths close to zero now?

@benfulton
Copy link
Member

No. We will handle fractional branch lengths in the next major version.

@rdriver
Copy link

rdriver commented Nov 15, 2019

I have a tree with a lot of very short branch lengths below 1, some even <0.01. I'm wondering if there's anything that I can do. I'm assuming that simply multiplying the branch lengths to artificially inflate them is not an acceptable solution.

@benfulton
Copy link
Member

Hi @rdriver, please direct questions about CAFE to the Google group here: https://groups.google.com/forum/#!forum/hahnlabcafe . I asked your question there and multiplying the branch lengths should be fine. You just have to re-scale lambda to account for this at the end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants