Replies: 21 comments
-
Hi @ShyamieG 👋 This is an interesting problem, thanks for the post! I wonder if you could boil this down to a really small example (max 10 nodes each), and paste in the node and edge tables to this discussion? That will make it easier for folks to understand what it is you're trying to do. Also, is there recombination in your simulations? |
Beta Was this translation helpful? Give feedback.
-
Hi @jeromekelleher, thanks for the response and suggestion! Yes, my simulations do include recombination. I was able to pare all four tree sequences down to many fewer nodes, and reproduced the same errors as before. I couldn't quite get tree 3 below 10 nodes, unfortunately, but it's close. Here are these smaller tree sequences, and below are the node and edge tables. Sorry for formatting =\ Tree sequence 1Node table
Edge table
Tree sequence 2Node table
Edge table
Merging trees 1 and 2 works as long as check_shared_equality = False. Tree sequence 3Node table
Edge table
Tree sequence 4Node table
Edge table
Merging trees 3 and 4 does not work even when I set check_shared_equality = False. |
Beta Was this translation helpful? Give feedback.
-
(I just edited your comment there to add triple-ticks around the tables. Works OK for most browers when displaying stuff designed for fixed-width output) |
Beta Was this translation helpful? Give feedback.
-
Great, thanks @ShyamieG! Now, would you mind also pasting in the |
Beta Was this translation helpful? Give feedback.
-
Sure thing - here they are: Tree 1
Tree 2
Tree 3
Tree 4
|
Beta Was this translation helpful? Give feedback.
-
Great, thanks. OK, so I think you want to merge ts3 and ts4 first - if you look at just the first tree in these (which happen to both cover 0-830001), what would the merged tree look like? |
Beta Was this translation helpful? Give feedback.
-
I still get the same errors merging the smaller versions of ts3 and ts4 :( This is exactly what I did:
|
Beta Was this translation helpful? Give feedback.
-
Sorry, I meant to sketch out what the first tree in the sequence would look like if you did the merge. I'm still not quite following what the goal is here, and I think going through the visualisation process is really helpful as a debugging tool in general. |
Beta Was this translation helpful? Give feedback.
-
Ah, got it. I think it should look like this:
(0, 1, and 2 descend from 3 - I couldn't find the right symbol for that) |
Beta Was this translation helpful? Give feedback.
-
Hmm, I don't see quite how to do this, and I don't think this is what |
Beta Was this translation helpful? Give feedback.
-
Sorry, I made a mistake drawing the above tree! Thank you for suggesting this, as it is helping me see the potential issues. Labeling nodes by their SLiM ids also helps me avoid confusing myself. These are the starting trees and the expected merged tree for that first interval: There must be a node X relating both '736356604' and '752228938' back to '6', but it seems it has been simplified out of both tree 3 and 4 at some point. This is okay. By moving along the genome one interval at a time, I was able to narrow down the source of my error to a single interval. Here is that: I can't really see why merging just these trees would be problematic, but I did notice something weird happening with tree 4. The relationship between 752228938 and 752528962 changes between intervals - in the first, they are parent-child, and in the second they are sisters (both children of 745825540). How could this have happened? |
Beta Was this translation helpful? Give feedback.
-
Ah ha - I think you're right here, the problem is that simplification has removed bits of the "shared" portions, so that these shared portions are no longer identical. In this most recent example: you're right that Tree 3 and Tree 4 are clearly mergable, but figuring out how to actually merge them is a serious algorithmic problem. When So, the question seems to be "how to get SLiM to retain this shared stuff without modifying it?" This is achieved in this vignette by just making sure to do |
Beta Was this translation helpful? Give feedback.
-
Hi @petrelharp! Thank you for pointing me to this vignette - I'm not sure how I didn't come across this before, as it is pretty much exactly what I have been doing (and trying to do) 😅. You're right, I had not specified 'treeSeqRememberIndividuals' in my simulations, so I will try that first. Regarding tree 4 in my last post - is it okay that the relationship between '752228938' and '752528962' has changed from parent-child to sisters across the two intervals? |
Beta Was this translation helpful? Give feedback.
-
I'm impressed you figured out how to do this without the vignette!
Well, it's possible, and the underlying code is very thoroughly checked, so I'd be very surprised if there were a bug like this. Let's see - suppose that |
Beta Was this translation helpful? Give feedback.
-
Ahh, yes - I can see that. Thanks for clearing that up! |
Beta Was this translation helpful? Give feedback.
-
Great! I'm going to close this but feel free to re-open if this isn't resolved. |
Beta Was this translation helpful? Give feedback.
-
I just converted this to a discussion as I think it's a valuable thing for people to be able to see in future. |
Beta Was this translation helpful? Give feedback.
-
Hello again - I'm trying to merge two tree sequences that essentially represent two samples taken from the same population at the same time without replacement, trying to keep things as simple as possible (also no recombination). I now call sim.treeSeqRememberIndividuals() right before calling treeSeqOutput() with simplify=F. However, I'm still having problems union-ing the two resulting trees when check_shared_equality = True. I am using essentially the same code as in the vignette you pointed me to earlier. I have checked for conflicting edges, where parent-child relationships (determined by matching on the slim ids) are discordant across the two trees, but no such edges exist. What else is 'check_shared_equality' looking for? When would be okay to not 'check_shared_equality' when merging two trees? Sorry to keep bugging you about this issue, but I'm still not sure where I'm going wrong... |
Beta Was this translation helpful? Give feedback.
-
Let's see - I think that all properties of the shared nodes should be the same; all edges between shared nodes should be the same, all mutations falling on such edges should be the same, and all individuals of the shared nodes should be the same. A way to check for discrepancies would be to It sounds like the thing to start from is where your code differs from the example code? |
Beta Was this translation helpful? Give feedback.
-
Ahhh, okay! Thank you so much, that is very helpful. I think I finally found the problem - the individual IDs were not matching up. I was using the same code to calculate matches and merge as in the vignette, but my process for generating the tree sequences is a lot more complicated than in the vignette. Still, I tried it with a simple example and I can see that everything matches except the individual ids. If I merge these with 'check_share_equality = False, it looks those nodes will take on a new individual id. I think this largely solves my problem, but I will want to make sure these are the only kinds of conflicts that arise when I use check_shared_equality=False. Is there a straightforward way of doing that? |
Beta Was this translation helpful? Give feedback.
-
Sorry, a bit of an aside - for the match_nodes() function in the vignette, shouldn't you check both trees to make sure there are no node matches younger than the split time?
Because these are running in parallel, a node can be born in 'ts' under the same slim_id as a node in 'other' that existed prior to the split time (so your 'alive_before_split1' condition will not catch it). I added the following to my code to resolve these kinds of conflicts:
Does that make sense? |
Beta Was this translation helpful? Give feedback.
-
I have what might be a pretty unusual case of needing to merge a set of related tree sequences. I am simulating the evolution of a pathogen as infections are transmitted from one individual to another through time. I first simulate the infection network (epidemiological simulation) and then go back and simulate the genomes, using SLiM, as they follow the simulated infection paths (evolutionary genomic simulation). My SLiM scripts simulate pathogen reproduction within an infected individual and output the resulting tree sequence, which becomes input for the next iteration of the SLiM script (i.e. the individual to whom the infection is transmitted to next). One contagious individual can infect more than one susceptible individual, so the tree sequences can diverge and continue in parallel at various points in the infection scenario.
The result of my simulations are a set of tree sequences that have been 'sampled' from a population of infected individuals, and are related to each other by the epidemiological simulation. So far, so good. My goal now is to merge these tree sequences to get a unified tree sequence representing the genetic ancestry of all the sampled pathogens. I have figured out that I must use the union() function, but I'm running into some problems actually executing it.
To illustrate, I've attached some code and four tree sequences that represent the tips of a clade within my infection network. In my example, 1 and 2 diverge first (14 time steps ago) then 3 and 4 diverge (41 time steps ago).
My idea is to iteratively merge sister tips until I am left with just one tree (e.g. 1&2 and 3&4, then 12&34). I match nodes based on the IDs that slim assigns them, renaming nodes that are younger than the divergence of the two infections in question. Even after accounting for possible discrepancies, I find that merging trees fails pretty frequently.
In my example, if I try to merge trees 1 and 2 when 'check_shared_equality = True' union() fails with the error 'Shared portions of the tree sequences are not equal.'. I can set 'check_shared_equality = False' and merge trees 1 and 2 just fine. But do I want to do this? I'm not sure what it means for the shared portions to be equal vs. not. In what cases would it be okay to set this parameter to False?
If I try to merge trees 3 and 4, however, even setting 'check_shared_equality = False' does not work, and union() fails with a different error - 'Bad edges: contradictory children for a given parent over an interval, or indexes need to be rebuilt.'. I have spent a lot of time trying to figure out what constitutes a 'bad edge' and how to fix it, but I am not sure how to go about identifying these problematic edges. For example, trees 1 and 2 have edges where child nodes are discordant in their parents or left/right coordinates, but this does not appear to present a problem (at least not when 'check_shared_equality = False'). The message suggests that rebuilding the indexes might help, but I'm not sure how to do this - or what it even means, to be honest.
Sorry for the long post! Thanks for reading and for any advice on how I might be able fix my problem.
Beta Was this translation helpful? Give feedback.
All reactions