diff --git a/CHANGELOG.md b/CHANGELOG.md index 3f362134..cb0f9936 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,8 +1,19 @@ # Change Log -## [v3.1.6](https://github.com/sanger-pathogens/gubbins/tree/v3.1.6) (2021-1-20) +## [v3.2.1](https://github.com/sanger-pathogens/gubbins/tree/v3.2.1) (2022-5-24) +[Full Changelog](https://github.com/sanger-pathogens/gubbins/compare/v3.1.6...v3.2.1) + +- Fix problem with sequence reconstruction +- Improve detection of small recombinations by modifying window sizes +- Enable resumption of stalled analyses +- Clean C code +- Fixes to scripts +- Add CI tests and update expected results + +## [v3.1.6](https://github.com/sanger-pathogens/gubbins/tree/v3.1.6) (2022-1-20) [Full Changelog](https://github.com/sanger-pathogens/gubbins/compare/v3.1.5...v3.1.6) + - Fix problem with sequence reconstruction - Add test for consistency of reconstructions diff --git a/README.md b/README.md index b050567f..e5c45cdf 100644 --- a/README.md +++ b/README.md @@ -105,7 +105,7 @@ chmod +x configure make sudo make install cd python -python setup.py install +python3 -m pip install . ``` ### OSX/Linux/Windows - Virtual Machine diff --git a/VERSION b/VERSION index 9cec7165..e4604e3a 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -3.1.6 +3.2.1 diff --git a/docs/gubbins_manual.md b/docs/gubbins_manual.md index c4b372ae..69c27fb0 100644 --- a/docs/gubbins_manual.md +++ b/docs/gubbins_manual.md @@ -170,7 +170,7 @@ Gubbins was originally designed to use a [joint ancestral state reconstruction]( ### Recombination detection options -Recombination is detected using a [spatial scanning statistic](https://link.springer.com/chapter/10.1007/978-1-4612-1578-3_14), which relies on a sliding window. The size of this window may need to be reduced if you apply Gubbins to very small genomes (e.g. viruses). +Recombination is detected using a [spatial scanning statistic](https://link.springer.com/chapter/10.1007/978-1-4612-1578-3_14), which relies on a sliding window. 
The size of this window may need to be reduced if you apply Gubbins to very small genomes (e.g. viruses). To increase the sensitivity for detecting recombinations, `--min-snps` can be set at the minimum value of 2; the `--p-value` threshold required to detect recombinations can be increased; the `--trimming-ratio` can be raised above 1.0, to disfavour the trimming of recombination edges; and the `--extensive-search` mode can be used. ``` --min-snps MIN_SNPS, -m MIN_SNPS @@ -179,19 +179,26 @@ Recombination is detected using a [spatial scanning statistic](https://link.spri Minimum window size (default: 100) --max-window-size MAX_WINDOW_SIZE, -b MAX_WINDOW_SIZE Maximum window size (default: 10000) + --p-value P_VALUE Uncorrected p value used to identify recombinations (default: 0.05) + --trimming-ratio TRIMMING_RATIO + Ratio of log probabilities used to trim recombinations (default: 1.0) + --extensive-search Undertake slower, more thorough, search for recombination (default: False) ``` -### Algorithm stop options +### Algorithm stop and restart options -Given the scale of available dataset sizes, and the size of tree space, it is unlikely that any Gubbins analysis will ever converge based on identifying identical trees in subsequent iterations. Note that trees from previous iterations are used as starting trees for inference in subsequent iterations with IQTree and RAxML (although not RAxML-NG). In practice, there is little improvement to the tree after three iterations. +Given the scale of available dataset sizes, and the size of tree space, it is unlikely that any Gubbins analysis will ever converge based on identifying identical trees in subsequent iterations. Normally the algorithm will stop after reaching the maximum number of iterations. 
Should the run fail or stall before this point, the analysis can be restarted from the last iteration that successfully completed by providing a tree through the `--resume` flag (all other flags should be kept identical to the original command, including `--iterations`). Note that although only the tree is provided to `--resume`, the corresponding alignment generated at the end of the same iteration also needs to be available within the same directory. ``` --iterations ITERATIONS, -i ITERATIONS Maximum No. of iterations (default: 5) --converge-method {weighted_robinson_foulds,robinson_foulds,recombination}, -z {weighted_robinson_foulds,robinson_foulds,recombination} Criteria to use to know when to halt iterations (default: weighted_robinson_foulds) + --resume RESUME Intermediate tree from previous run (must include "iteration_X" in file name) (default: None) ``` +Note that trees from previous iterations are used as starting trees for inference in subsequent iterations with IQTree and RAxML (although not RAxML-NG). + ## Output files A successful Gubbins run will generate files with the suffixes: @@ -221,13 +228,15 @@ The `.per_branch_statistics.csv` file contains summary statistics for each branc * **Node** - Name of the node subtended by the branch. This can either be one of the taxa included in the input alignment, or an internal node, which are numbered * **Total SNPs** - Total number of base substitutions reconstructed onto the branch -* **Num of SNPs inside recombinations** - Number of base substitutions reconstructed onto the branch that fall within a predicted recombination (*r*) -* **Num of SNPs outside recombinations** - Number of base substitutions reconstructed onto the branch that fall outside of a predicted recombination. i.e. 
predicted to have arisen by point mutation (*m*) -* **Num of Recombination Blocks** - Total number of recombination blocks reconstructed onto the branch -* **Bases in recombinations** - Total length of all recombination events reconstructed onto the branch +* **Number of SNPs Inside Recombinations** - Number of base substitutions reconstructed onto the branch that fall within a predicted recombination (*r*) +* **Number of SNPs Outside Recombinations** - Number of base substitutions reconstructed onto the branch that fall outside of a predicted recombination. i.e. predicted to have arisen by point mutation (*m*) +* **Number of Recombination Blocks** - Total number of recombination blocks reconstructed onto the branch +* **Bases in Recombinations** - Total length of all recombination events reconstructed onto the branch +* **Cumulative Bases in Recombinations** - Total number of bases in the alignment affected by recombination on this branch and its ancestors * ***r/m*** - The r/m value for the branch. This value gives a measure of the relative impact of recombination and mutation on the variation accumulated on the branch * ***rho/theta*** - The ratio of the number of recombination events to point mutations on a branch; a measure of the relative rates of recombination and point mutation * **Genome Length** - The total number of aligned bases between the ancestral and descendent nodes for the branch excluding any missing data or gaps in either +* **Bases in Clonal Frame** - The number of called bases at the descendant node that have not been affected by recombination on this branch or an ancestor (i.e., the length of sequence that can be used for phylogenetic interpretation) Note that all positions in the output files are relative to the input alignment. 
If you wish to compare the positions of recombinations relative to a reference annotation, their coordinates will need to be adjusted to account for any gaps in the reference sequence introduced when generating the alignment. diff --git a/python/gubbins/common.py b/python/gubbins/common.py index 0645d387..eda19e43 100644 --- a/python/gubbins/common.py +++ b/python/gubbins/common.py @@ -108,11 +108,28 @@ def parse_and_run(input_args, program_description=""): gaps_vcf_filename = base_filename + ".gaps.vcf" joint_sequences_filename = base_filename + ".seq.joint.aln" + # If restarting from a previous run + starting_iteration = 1 + if input_args.resume is not None: + search_itr = re.search(r'iteration_(\d+)', input_args.resume) + if search_itr is None: + sys.stderr.write('Resuming a Gubbins run requires a tree file name containing the phrase "iteration_X"\n') + exit(1) + else: + starting_iteration = int(search_itr.group(1)) + 1 + if starting_iteration >= input_args.iterations: + sys.stderr.write('Run has already reached the number of specified iterations\n') + exit(1) + else: + sys.stderr.write('Resuming Gubbins analysis at iteration ' + str(starting_iteration) + '\n') + input_args.starting_tree = input_args.resume + current_tree_name = input_args.starting_tree + # Check if intermediate files from a previous run exist intermediate_files = [basename + ".iteration_"] - if not input_args.no_cleanup: + if not input_args.no_cleanup and input_args.resume is None: utils.delete_files(".", intermediate_files, "", input_args.verbose) - if utils.do_files_exist(".", intermediate_files, "", input_args.verbose): + if utils.do_files_exist(".", intermediate_files, "", input_args.verbose) and input_args.resume is None: sys.exit("Intermediate files from a previous run exist. 
Please rerun without the --no_cleanup option " "to automatically delete them or with the --use_time_stamp to add a unique prefix.") @@ -176,11 +193,11 @@ def parse_and_run(input_args, program_description=""): reconvert_fasta_file(gaps_alignment_filename, base_filename + ".start") # Start the main loop printer.print("\nEntering the main loop.") - for i in range(1, input_args.iterations+1): + for i in range(starting_iteration, input_args.iterations+1): printer.print("\n*** Iteration " + str(i) + " ***") # 1.1. Construct the tree-building command depending on the iteration and employed options - if i == 2: + if i == 2 or input_args.resume is not None: # Select the algorithms used for the subsequent iterations current_tree_builder, current_model_fitter, current_model, extra_tree_arguments, extra_model_arguments = return_algorithm_choices(input_args,i) # Initialise tree builder @@ -247,7 +264,7 @@ def parse_and_run(input_args, program_description=""): # 3.2a. Joint ancestral reconstruction printer.print(["\nReconstructing ancestral sequences with pyjar..."]) - if i == 1: + if i == starting_iteration: # 3.3a. 
Read alignment and identify unique base patterns in first iteration only @@ -281,6 +298,7 @@ def parse_and_run(input_args, program_description=""): info_filename = info_filename, # file containing evolutionary model parameters info_filetype = input_args.model_fitter, # model fitter - format of file containing evolutionary model parameters output_prefix = temp_working_dir + "/" + ancestral_sequence_basename, # output prefix + outgroup_name = input_args.outgroup, # outgroup for rooting and reconstruction threads = input_args.threads, # number of cores to use verbose = input_args.verbose, max_pos = max_pos) @@ -354,7 +372,8 @@ def parse_and_run(input_args, program_description=""): shutil.copyfile(current_tree_name_with_internal_nodes, current_tree_name) gubbins_command = create_gubbins_command( gubbins_exec, gaps_alignment_filename, gaps_vcf_filename, current_tree_name, - input_args.alignment_filename, input_args.min_snps, input_args.min_window_size, input_args.max_window_size) + input_args.alignment_filename, input_args.min_snps, input_args.min_window_size, input_args.max_window_size, + input_args.p_value, input_args.trimming_ratio, input_args.extensive_search) printer.print(["\nRunning Gubbins to detect recombinations...", gubbins_command]) try: subprocess.check_call(gubbins_command, shell=True) @@ -617,13 +636,16 @@ def return_algorithm(algorithm_choice, model, input_args, node_labels = None, ex return initialised_algorithm def create_gubbins_command(gubbins_exec, alignment_filename, vcf_filename, current_tree_name, - original_alignment_filename, min_snps, min_window_size, max_window_size): + original_alignment_filename, min_snps, min_window_size, max_window_size, + p_value, trimming_ratio, extensive_search): command = [gubbins_exec, "-r", "-v", vcf_filename, "-a", str(min_window_size), "-b", str(max_window_size), "-f", original_alignment_filename, "-t", current_tree_name, - "-m", str(min_snps), alignment_filename] + "-m", str(min_snps), "-p", str(p_value), "-i", 
str(trimming_ratio)] + if extensive_search: + command.append("-x") + command.append(alignment_filename) return " ".join(command) - def number_of_sequences_in_alignment(filename): return len(get_sequence_names_from_alignment(filename)) @@ -734,14 +756,13 @@ def reroot_tree(tree_name, outgroups): def reroot_tree_with_outgroup(tree_name, outgroups): clade_outgroups = get_monophyletic_outgroup(tree_name, outgroups) - outgroups = [{'name': taxon_name} for taxon_name in clade_outgroups] - - tree = Phylo.read(tree_name, 'newick') - tree.root_with_outgroup(*outgroups) - Phylo.write(tree, tree_name, 'newick') - tree = dendropy.Tree.get_from_path(tree_name, 'newick', preserve_underscores=True) - tree.deroot() + outgroup_mrca = tree.mrca(taxon_labels=clade_outgroups) + print('Edge length is: ' + str(outgroup_mrca.edge.length)) + tree.reroot_at_edge(outgroup_mrca.edge, + length1 = outgroup_mrca.edge.length/2, + length2 = outgroup_mrca.edge.length/2, + update_bipartitions=False) tree.update_bipartitions() output_tree_string = tree_as_string(tree, suppress_internal=False) with open(tree_name, 'w+') as output_file: diff --git a/python/gubbins/pyjar.py b/python/gubbins/pyjar.py index c1ac17ea..c4519f52 100755 --- a/python/gubbins/pyjar.py +++ b/python/gubbins/pyjar.py @@ -339,23 +339,10 @@ def fill_out_aln(out_aln,reconstructed_alleles,ordered_bases,ancestral_node_orde for column in base_pattern_columns: out_aln[column,index] = base -# Return positions of columns in alignment -########################################## -@njit(numba.int32[:](numba.int32[:], - numba.int32[:,:], - numba.int32), - cache=True) -def get_columns(base_pattern_columns_padded,column_positions,column_index): - base_pattern_columns_indices = numpy.argmax(base_pattern_columns_padded == -1) - if base_pattern_columns_indices == 0: - base_pattern_columns_indices = base_pattern_columns_padded.size - base_pattern_columns = column_positions[column_index,0:base_pattern_columns_indices] - return base_pattern_columns - 
# Reconstruct each base pattern ############################### -@njit(numba.void(numba.uint8[:,:], - numba.int32[:,:], +@njit(numba.void(numba.uint8[:], + numba.int32[:], numba.float32[:,:], numba.uint8[:,:], numba.typeof(numpy.dtype('i1'))[:,:], @@ -373,8 +360,8 @@ def get_columns(base_pattern_columns_padded,column_positions,column_index): numba.uint8[:], numba.int32[:]), cache=True) -def iterate_over_base_patterns(columns, - column_positions, +def iterate_over_base_patterns(column, + base_pattern_columns, Lmat, Cmat, tmp_out_aln, @@ -392,144 +379,116 @@ def iterate_over_base_patterns(columns, reconstructed_base_indices, node_snps): - column_indices = numpy.arange(columns.shape[0], dtype = numpy.int32) Cmat_null = numpy.array([0,1,2,3], dtype = numpy.uint8) - - for column_index in column_indices: - - # Get column bases - column = columns[column_index] - # Get column positions - base_pattern_columns_padded = column_positions[column_index] - base_pattern_columns = get_columns(base_pattern_columns_padded,column_positions,column_index) + # Reset matrices + Lmat.fill(numpy.NINF) + Cmat[:] = Cmat_null + + # Count unknown bases + unknown_base_count = numpy.count_nonzero(column > 3) + column_base_indices = numpy.unique(column[numpy.where(column <= 3)]) + + # Heuristic for speed: if all taxa are monomorphic, with a gap in only one sequence, then the ancestral states + # will all be the observed base, as no ancestral node will have two child nodes with unknown bases at this site + if unknown_base_count == 1 and column_base_indices.size == 1: + # If site is monomorphic - replace entire column + tmp_out_aln[base_pattern_columns,:] = ordered_bases[column_base_indices[0]] + else: + # Otherwise perform a full ML inference + #1 For each OTU y perform the following: + #Visit a nonroot internal node, z, which has not been visited yet, but both of whose sons, nodes x and y, have already been visited, i.e., Lx(j), Cx(j), Ly(j), and Cy(j) have already been defined for each j. 
Let tz be the length of the branch connecting node z and its father. For each amino acid i, compute Lz(i) and Cz(i) according to the following formulae: + #Denote the three sons of the root by x, y, and z. For each amino acid k, compute the expression Pk x Lx(k) x Ly(k) x Lz(k). Reconstruct r by choosing the amino acid k maximizing this expression. The maximum value found is the likelihood of the best reconstruction. + for node_index in postordered_nodes: + if node_index == seed_node: + continue + #calculate the transistion matrix for the branch + pij=numpy.reshape(node_pij[node_index,:].copy(),(4,4)) + if node_index in leaf_nodes: + alignment_index = node_index_to_aln_row[node_index] + taxon_base_index = column[alignment_index] + process_leaf(Lmat, + Cmat, + pij, + node_index, + taxon_base_index) + else: + #2a. Lz(i) = maxj Pij(tz) x Lx(j) x Ly(j) + #2b. Cz(i) = the value of j attaining the above maximum. + find_most_likely_base_given_descendents(Lmat, + Cmat, + pij, + node_index, + child_nodes, + column_base_indices) + + # Calculate likelihood of base at root node + child_node_indices = child_nodes[node_index,:] + child_node_indices = child_node_indices[child_node_indices > -1] + calculate_root_likelihood(Lmat, Cmat, base_frequencies, node_index, child_node_indices, column_base_indices) + max_root_base_index = Cmat[node_index,numpy.argmax(Lmat[node_index,:])] + reconstructed_base_indices[node_index] = max_root_base_index - # Reset matrices - Lmat.fill(numpy.NINF) - Cmat[:] = Cmat_null - - # Count unknown bases - unknown_base_count = numpy.count_nonzero(column > 3) - column_base_indices = numpy.unique(column[numpy.where(column <= 3)]) - - # Heuristic for speed: if all taxa are monomorphic, with a gap in only one sequence, then the ancestral states - # will all be the observed base, as no ancestral node will have two child nodes with unknown bases at this site - if unknown_base_count == 1 and column_base_indices.size == 1: - # If site is monomorphic - replace 
entire column - tmp_out_aln[base_pattern_columns,:] = ordered_bases[column_base_indices[0]] - else: - # Otherwise perform a full ML inference - #1 For each OTU y perform the following: - #Visit a nonroot internal node, z, which has not been visited yet, but both of whose sons, nodes x and y, have already been visited, i.e., Lx(j), Cx(j), Ly(j), and Cy(j) have already been defined for each j. Let tz be the length of the branch connecting node z and its father. For each amino acid i, compute Lz(i) and Cz(i) according to the following formulae: - #Denote the three sons of the root by x, y, and z. For each amino acid k, compute the expression Pk x Lx(k) x Ly(k) x Lz(k). Reconstruct r by choosing the amino acid k maximizing this expression. The maximum value found is the likelihood of the best reconstruction. - for node_index in postordered_nodes: - if node_index == seed_node: - continue - #calculate the transistion matrix for the branch - pij=numpy.reshape(node_pij[node_index,:].copy(),(4,4)) - if node_index in leaf_nodes: - alignment_index = node_index_to_aln_row[node_index] - taxon_base_index = column[alignment_index] - process_leaf(Lmat, - Cmat, - pij, - node_index, - taxon_base_index) - else: - #2a. Lz(i) = maxj Pij(tz) x Lx(j) x Ly(j) - #2b. Cz(i) = the value of j attaining the above maximum. 
- find_most_likely_base_given_descendents(Lmat, - Cmat, - pij, - node_index, - child_nodes, - column_base_indices) - - # Calculate likelihood of base at root node - child_node_indices = child_nodes[node_index,:] - child_node_indices = child_node_indices[child_node_indices > -1] - calculate_root_likelihood(Lmat, Cmat, base_frequencies, node_index, child_node_indices, column_base_indices) - max_root_base_index = Cmat[node_index,numpy.argmax(Lmat[node_index,:])] - reconstructed_base_indices[node_index] = max_root_base_index - - #Traverse the tree from the root in the direction of the OTUs, assigning to each node its most likely ancestral character as follows: - # Note that preordered node list does not include the root - for node_index in preordered_nodes: - #5a. Visit an unreconstructed internal node x whose father y has already been reconstructed. Denote by i the reconstructed amino acid at node y. - parent_node_index = parent_nodes[node_index] - i = reconstructed_base_indices[parent_node_index] - #5b. Reconstruct node x by choosing Cx(i). - reconstructed_base_indices[node_index] = Cmat[node_index,i] - - # Put gaps back in and check that any ancestor with only gaps downstream is made a gap - # store reconstructed alleles - reconstructed_alleles = numpy.full(postordered_nodes.size, 8, dtype = numpy.uint8) - reconstruct_alleles(reconstructed_alleles, - postordered_nodes, - leaf_nodes, - node_index_to_aln_row, - column, - child_nodes, - reconstructed_base_indices - ) + #Traverse the tree from the root in the direction of the OTUs, assigning to each node its most likely ancestral character as follows: + # Note that preordered node list does not include the root + for node_index in preordered_nodes: + #5a. Visit an unreconstructed internal node x whose father y has already been reconstructed. Denote by i the reconstructed amino acid at node y. + parent_node_index = parent_nodes[node_index] + i = reconstructed_base_indices[parent_node_index] + #5b. 
Reconstruct node x by choosing Cx(i). + reconstructed_base_indices[node_index] = Cmat[node_index,i] + + # Put gaps back in and check that any ancestor with only gaps downstream is made a gap + # store reconstructed alleles + reconstructed_alleles = numpy.full(postordered_nodes.size, 8, dtype = numpy.uint8) + reconstruct_alleles(reconstructed_alleles, + postordered_nodes, + leaf_nodes, + node_index_to_aln_row, + column, + child_nodes, + reconstructed_base_indices + ) - # If site is not monomorphic - replace specific entries - fill_out_aln(tmp_out_aln, + # If site is not monomorphic - replace specific entries + fill_out_aln(tmp_out_aln, + reconstructed_alleles, + ordered_bases, + ancestral_node_order, + base_pattern_columns + ) + + # enumerate the number of base subtitutions reconstructed occurring on each branch + count_node_snps(node_snps, + preordered_nodes, + parent_nodes, + seed_node, reconstructed_alleles, - ordered_bases, - ancestral_node_order, - base_pattern_columns + base_pattern_columns, ) - # enumerate the number of base subtitutions reconstructed occurring on each branch - count_node_snps(node_snps, - preordered_nodes, - parent_nodes, - seed_node, - reconstructed_alleles, - base_pattern_columns, - ) - # Convert integers to bases - ########################### @njit(numba.void(numba.int8[:], - numba.typeof(numpy.dtype('U1'))[:]), - cache = True) - def int_to_seq(seq,out_seq): for i,b in enumerate(seq): - if b == 0: - out_seq[i] = 'A' - elif b == 1: - out_seq[i] = 'C' - elif b == 2: - out_seq[i] = 'G' - elif b == 3: - out_seq[i] = 'T' - elif b == 4: - out_seq[i] = '-' - elif b == 5: - out_seq[i] = 'N' - else: - print('Unable to process integer') ################## @@ -581,25 +540,31 @@ def get_base_patterns(prefix, verbose, threads = 1): base_positions_fn = prefix + '.gaps.base_positions.csv' if not os.path.isfile(base_positions_fn): sys.exit("Unable to open base positions file " + base_positions_fn + "\n") + array_max = 0 with open(base_positions_fn, 'r') 
as positions_file: for line in positions_file: - array_of_position_arrays.append(list(map(int,line.rstrip().split(',')))) - array_max = max([max(sublist) for sublist in array_of_position_arrays]) + 1 + subarray = numpy.asarray(list(map(int,line.rstrip().split(','))), + dtype = numpy.int32) + array_of_position_arrays.append(subarray) + subarray_max = numpy.amax(subarray) + if subarray_max > array_max: + array_max = subarray_max + array_max = array_max + 1 + # Record timing - t2=time.process_time() if verbose: print("Time taken to load unique base patterns:", t2-t1, "seconds") print("Unique base patterns: ", array_max) # Return output - return sequence_names,vstacked_patterns,array_of_position_arrays, array_max + return sequence_names,vstacked_patterns,array_of_position_arrays,array_max ######################################################## # Function for reconstructing individual base patterns # ######################################################## -def reconstruct_alignment_column(column_indices, +def reconstruct_alignment_column(column_index, base_pattern_positions, tree = None, preordered_nodes = None, @@ -643,11 +608,8 @@ def reconstruct_alignment_column(column_indices, base_patterns = numpy.ndarray(base_patterns.shape, dtype = base_patterns.dtype, buffer = base_patterns_shm.buf) # Load base pattern position information - # Extract information for iterations - column_positions = convert_to_square_numpy_array(base_pattern_positions) - columns = base_patterns[column_indices] - + column = base_patterns[column_index] ### TIMING if verbose: @@ -656,8 +618,8 @@ def reconstruct_alignment_column(column_indices, calc_time_start = time.process_time() # Iterate over columns - iterate_over_base_patterns(columns, - column_positions, + iterate_over_base_patterns(column, + base_pattern_positions, Lmat, Cmat, out_aln, @@ -675,21 +637,9 @@ def reconstruct_alignment_column(column_indices, reconstructed_base_indices, node_snps) - - ### TIMING - if verbose: - calc_time_end = 
time.process_time() - calc_time = (calc_time_end - calc_time_start) - # Close shared memory out_aln_shm.close() base_patterns_shm.close() - - - ### TIMING - if verbose: - print('Time for JAR preparation:\t' + str(prep_time)) - print('Time for JAR calculation:\t' + str(calc_time)) return node_snps @@ -705,13 +655,16 @@ def jar(sequence_names = None, info_filename = None, info_filetype = None, output_prefix = None, + outgroup_name = None, threads = 1, verbose = False, - mp_metho = "spawn", + mp_method = "spawn", max_pos = None): - - + if verbose: + prep_time = 0.0 + calc_time = 0.0 + prep_time_start = time.process_time() # Create a new alignment for the output containing all taxa in the input alignment alignment_sequence_names = {} for i, name in enumerate(sequence_names): @@ -753,6 +706,7 @@ def jar(sequence_names = None, seed_node_edge_truncation = True node_index_to_aln_row = numpy.full(num_nodes, -1, dtype=numpy.int32) ancestral_node_indices = {} + for node_index,node in zip(postordered_nodes,tree.postorder_node_iter()): if node.taxon == None: nodecounter+=1 @@ -776,8 +730,15 @@ def jar(sequence_names = None, # Set the length of one root-to-child branch to ~zero # as reconstruction should occur with rooting at a node # midpoint rooting causes problems at the root, especially w/JC69 - seed_node_edge_truncation = False - node_pij[node_index,:]=calculate_pij(node.edge_length/1e6, rm) + # With outgroup: + # Prefer to truncate the ingroup branch to force the recombinations + # onto the outgroup branch + if outgroup_name is None or \ + (outgroup_name is not None and node.taxon.label != outgroup_name): + seed_node_edge_truncation = False + node_pij[node_index,:]=calculate_pij(node.edge_length/1e6, rm) + else: + node_pij[node_index,:]=calculate_pij(node.edge_length, rm) else: node_pij[node_index,:]=calculate_pij(node.edge_length, rm) # Store information to avoid subsequent recalculation as @@ -832,88 +793,125 @@ def jar(sequence_names = None, with SharedMemoryManager() 
as smm: # Convert alignment to shared memory numpy array - new_aln_shared_array = generate_shared_mem_array(new_aln_array, smm) # Convert base patterns to shared memory numpy array - base_patterns_shared_array = generate_shared_mem_array(base_patterns, smm) # Convert base pattern positions to shared memory numpy array - bp_list = list(range(len(base_patterns))) npatterns = len(base_patterns) - ntaxa_jumps = ceil(npatterns / threads) - base_pattern_indices = [bp_list[i: i + ntaxa_jumps] for i in range(0, len(bp_list), ntaxa_jumps)] - base_positions = [base_pattern_positions[i:i + ntaxa_jumps] for i in range(0, len(base_pattern_positions), ntaxa_jumps)] - - # Parallelise reconstructions across alignment columns using multiprocessing - with multiprocessing.get_context(method=mp_metho).Pool(processes = threads) as pool: - reconstruction_results = pool.starmap(partial( - reconstruct_alignment_column, - tree = tree, - preordered_nodes = preordered_nodes, - postordered_nodes = postordered_nodes, - leaf_nodes = leaf_nodes, - parent_nodes = parent_nodes, - child_nodes = child_nodes, - seed_node = seed_node, - node_pij = node_pij, - node_index_to_aln_row = node_index_to_aln_row, - ancestral_node_order = ancestral_node_order, - base_patterns = base_patterns_shared_array, - base_frequencies = f, - new_aln = new_aln_shared_array, - threads = threads, - verbose = verbose), - zip(base_pattern_indices, base_positions) - ) - - # Write out alignment while shared memory manager still active - - out_aln_shm = shared_memory.SharedMemory(name = new_aln_shared_array.name) - out_aln = numpy.ndarray(new_aln_array.shape, dtype = 'i1', buffer = out_aln_shm.buf) - - - aln_line = numpy.full(len(out_aln[:,0]),"?",dtype="U1") if verbose: - print("Printing alignment with internal node sequences: ", output_prefix+".joint.aln") - source = alignment_filename - destination = output_prefix+".joint.aln" - dest = shutil.copy(source, destination) - with open(dest, "a") as asr_output: - for i,node_index 
in enumerate(ancestral_node_order): - taxon = ancestral_node_indices[node_index] - asr_output.write('>' + taxon + '\n') - int_to_seq(out_aln[:,i], aln_line) - asr_output.write(''.join(aln_line) + "\n") - - # Release pool nodes - pool.join() - - # Combine results for each base across the alignment - for node in tree.preorder_node_iter(): - node.edge_length = 0.0 # reset lengths to convert to SNPs - node_index = node_indices[node.taxon.label] - for x in range(len(reconstruction_results)): - try: - node.edge_length += reconstruction_results[x][node_index]; - except AttributeError: - continue - - # Print tree - from gubbins.common import tree_as_string + prep_time_end = time.process_time() + prep_time = prep_time_end - prep_time_start + calc_time_start = time.process_time() + + + if threads > 1: + + # Parallelise reconstructions across alignment columns using multiprocessing + with multiprocessing.get_context(method=mp_method).Pool(processes = threads) as pool: + reconstruction_results = pool.starmap(partial( + reconstruct_alignment_column, + tree = tree, + preordered_nodes = preordered_nodes, + postordered_nodes = postordered_nodes, + leaf_nodes = leaf_nodes, + parent_nodes = parent_nodes, + child_nodes = child_nodes, + seed_node = seed_node, + node_pij = node_pij, + node_index_to_aln_row = node_index_to_aln_row, + ancestral_node_order = ancestral_node_order, + base_patterns = base_patterns_shared_array, + base_frequencies = f, + new_aln = new_aln_shared_array, + threads = threads, + verbose = verbose), + zip(bp_list, base_pattern_positions) + ) + + # Write out alignment while shared memory manager still active + out_aln_shm = shared_memory.SharedMemory(name = new_aln_shared_array.name) + out_aln = numpy.ndarray(new_aln_array.shape, dtype = 'i1', buffer = out_aln_shm.buf) + + # Release pool nodes + pool.join() - if verbose: - print("Printing tree with internal nodes labelled: ", output_prefix+".joint.tre") - with open(output_prefix+".joint.tre", "w") as tree_output: + 
else: - recon_tree = tree_as_string(tree, - suppress_rooting=True, - suppress_internal=False) - print(recon_tree.replace('\'', ''), - file = tree_output) + # Run as a loop on a single core + reconstruction_results = \ + [ + reconstruct_alignment_column( + b, + p, + tree = tree, + preordered_nodes = preordered_nodes, + postordered_nodes = postordered_nodes, + leaf_nodes = leaf_nodes, + parent_nodes = parent_nodes, + child_nodes = child_nodes, + seed_node = seed_node, + node_pij = node_pij, + node_index_to_aln_row = node_index_to_aln_row, + ancestral_node_order = ancestral_node_order, + base_patterns = base_patterns_shared_array, + base_frequencies = f, + new_aln = new_aln_shared_array, + threads = threads, + verbose = verbose) for b,p in zip(bp_list, base_pattern_positions) + ] + + # Extract final result + out_aln_shm = shared_memory.SharedMemory(name = new_aln_shared_array.name) + out_aln = numpy.ndarray(new_aln_array.shape, dtype = 'i1', buffer = out_aln_shm.buf) + + # Process outputs + aln_line = numpy.full(len(out_aln[:,0]),"?",dtype="U1") + if verbose: + print("Printing alignment with internal node sequences: ", output_prefix+".joint.aln") + source = alignment_filename + destination = output_prefix+".joint.aln" + dest = shutil.copy(source, destination) + with open(dest, "a") as asr_output: + for i,node_index in enumerate(ancestral_node_order): + taxon = ancestral_node_indices[node_index] + asr_output.write('>' + taxon + '\n') + int_to_seq(out_aln[:,i], aln_line) + asr_output.write(''.join(aln_line) + "\n") + + # Combine results for each base across the alignment + for node in tree.preorder_node_iter(): + node.edge_length = 0.0 # reset lengths to convert to SNPs + node_index = node_indices[node.taxon.label] + for x in range(len(reconstruction_results)): + try: + node.edge_length += reconstruction_results[x][node_index]; + except AttributeError: + continue + + ### TIMING + if verbose: + calc_time_end = time.process_time() + calc_time = (calc_time_end - 
calc_time_start) + + # Print tree + from gubbins.common import tree_as_string + + if verbose: + print("Printing tree with internal nodes labelled: ", output_prefix+".joint.tre") + with open(output_prefix+".joint.tre", "w") as tree_output: + + recon_tree = tree_as_string(tree, + suppress_rooting=True, + suppress_internal=False) + print(recon_tree.replace('\'', ''), + file = tree_output) if verbose: print("Done") + print('Time for JAR preparation:\t' + str(prep_time)) + print('Time for JAR calculation:\t' + str(calc_time)) + diff --git a/python/gubbins/run_gubbins.py b/python/gubbins/run_gubbins.py index 15f21c44..1675f831 100755 --- a/python/gubbins/run_gubbins.py +++ b/python/gubbins/run_gubbins.py @@ -120,12 +120,33 @@ def parse_input_args(): help='Minimum window size', type=int, default=100) gubbinsGroup.add_argument('--max-window-size','-b', help='Maximum window size', type=int, default=10000) + gubbinsGroup.add_argument('--p-value', + help='Uncorrected p value used to identify recombinations', + type=float, + default=0.05) + gubbinsGroup.add_argument('--trimming-ratio', + help='Ratio of log probabilities used to trim recombinations', + type=float, + default=1.0) + gubbinsGroup.add_argument('--extensive-search', + help='Undertake slower, more thorough, search for recombination', + action='store_true', + default=False) - stopGroup = parser.add_argument_group('Algorithm stop options') - stopGroup.add_argument('--iterations', '-i', help='Maximum No. of iterations', type=int, default=5) - stopGroup.add_argument('--converge-method', '-z', help='Criteria to use to know when to halt iterations', - default='weighted_robinson_foulds', choices=['weighted_robinson_foulds', 'robinson_foulds', + stopGroup = parser.add_argument_group('Algorithm start/stop options') + stopGroup.add_argument('--iterations', '-i', + help='Maximum No. 
of iterations', + type=int, + default=5) + stopGroup.add_argument('--converge-method', '-z', + help='Criteria to use to know when to halt iterations', + default='weighted_robinson_foulds', + choices=['weighted_robinson_foulds', 'robinson_foulds', 'recombination']) + stopGroup.add_argument('--resume', + help='Intermediate tree from previous run (must include' + ' "iteration_X" in file name)', + default=None) return parser diff --git a/python/gubbins/tests/data/clade_to_extract.txt b/python/gubbins/tests/data/clade_to_extract.txt index 281753b1..01f6543a 100644 --- a/python/gubbins/tests/data/clade_to_extract.txt +++ b/python/gubbins/tests/data/clade_to_extract.txt @@ -1,7 +1,4 @@ sequence_1 -sequence_2 -sequence_3 -sequence_4 sequence_5 sequence_6 sequence_7 diff --git a/python/gubbins/tests/data/expected_reroot_tree_with_outgroups_all_in_different_clade.tre b/python/gubbins/tests/data/expected_reroot_tree_with_outgroups_all_in_different_clade.tre index b3f0a236..895d9f37 100644 --- a/python/gubbins/tests/data/expected_reroot_tree_with_outgroups_all_in_different_clade.tre +++ b/python/gubbins/tests/data/expected_reroot_tree_with_outgroups_all_in_different_clade.tre @@ -1 +1 @@ -(((E:1.0,D:1.0):2.0,C:1.0):1.0,B:1.0,A:1.0):1.0; +(A:0.5,(B:1.0,(C:1.0,(E:1.0,D:1.0):2.0):1.0):0.5):1.0; diff --git a/python/gubbins/tests/data/expected_reroot_tree_with_outgroups_all_in_one_clade.tre b/python/gubbins/tests/data/expected_reroot_tree_with_outgroups_all_in_one_clade.tre index b3f0a236..ef2de591 100644 --- a/python/gubbins/tests/data/expected_reroot_tree_with_outgroups_all_in_one_clade.tre +++ b/python/gubbins/tests/data/expected_reroot_tree_with_outgroups_all_in_one_clade.tre @@ -1 +1 @@ -(((E:1.0,D:1.0):2.0,C:1.0):1.0,B:1.0,A:1.0):1.0; +((B:1.0,A:1.0):0.5,(C:1.0,(E:1.0,D:1.0):2.0):0.5):1.0; diff --git a/python/gubbins/tests/data/expected_reroot_tree_with_outgroups_with_two_mixed_clades.tre 
b/python/gubbins/tests/data/expected_reroot_tree_with_outgroups_with_two_mixed_clades.tre index b3f0a236..895d9f37 100644 --- a/python/gubbins/tests/data/expected_reroot_tree_with_outgroups_with_two_mixed_clades.tre +++ b/python/gubbins/tests/data/expected_reroot_tree_with_outgroups_with_two_mixed_clades.tre @@ -1 +1 @@ -(((E:1.0,D:1.0):2.0,C:1.0):1.0,B:1.0,A:1.0):1.0; +(A:0.5,(B:1.0,(C:1.0,(E:1.0,D:1.0):2.0):1.0):0.5):1.0; diff --git a/python/gubbins/tests/data/expected_test_reroot_tree_with_outgroups_all_in_one_clade_large.tre b/python/gubbins/tests/data/expected_test_reroot_tree_with_outgroups_all_in_one_clade_large.tre index 292bc7ec..4123b481 100644 --- a/python/gubbins/tests/data/expected_test_reroot_tree_with_outgroups_all_in_one_clade_large.tre +++ b/python/gubbins/tests/data/expected_test_reroot_tree_with_outgroups_all_in_one_clade_large.tre @@ -1 +1 @@ -((E:1.0,D:1.0):2.0,C:1.0,(B:1.0,A:1.0):1.0):1.0; +((C:1.0,(B:1.0,A:1.0):1.0):1.0,(E:1.0,D:1.0):1.0):1.0; diff --git a/python/gubbins/tests/data/multiple_recombinations.recombination_predictions.embl b/python/gubbins/tests/data/multiple_recombinations.recombination_predictions.embl index cdb80234..3daf4825 100644 --- a/python/gubbins/tests/data/multiple_recombinations.recombination_predictions.embl +++ b/python/gubbins/tests/data/multiple_recombinations.recombination_predictions.embl @@ -1,30 +1,30 @@ -FT misc_feature 29..49 -FT /node="Node_4->Node_3" -FT /neg_log_likelihood="4.955311" -FT /colour="2" -FT /taxa=" sequence_7 sequence_9 sequence_8" -FT /SNP_count="21" -FT misc_feature 51..84 +FT misc_feature 24..49 +FT /node="Node_3->sequence_8" +FT /neg_log_likelihood="3.983837" +FT /colour="4" +FT /taxa="sequence_8" +FT /SNP_count="22" +FT misc_feature 24..49 FT /node="Node_5->Node_4" -FT /neg_log_likelihood="10.195830" +FT /neg_log_likelihood="6.753589" FT /colour="2" -FT /taxa=" sequence_6 sequence_7 sequence_9 sequence_8" -FT /SNP_count="30" +FT /taxa=" sequence_7 sequence_9" +FT /SNP_count="22" FT 
misc_feature 51..84 -FT /node="Node_6->sequence_1" -FT /neg_log_likelihood="8.046578" +FT /node="Node_6->sequence_5" +FT /neg_log_likelihood="4.365959" FT /colour="4" -FT /taxa="sequence_1" +FT /taxa="sequence_5" FT /SNP_count="30" -FT misc_feature 124..201 -FT /node="Node_9->sequence_10" -FT /neg_log_likelihood="40.155361" -FT /colour="4" -FT /taxa="sequence_10" -FT /SNP_count="78" FT misc_feature 51..84 -FT /node="Node_9->Node_8" -FT /neg_log_likelihood="10.195830" +FT /node="Node_8->Node_2" +FT /neg_log_likelihood="9.268611" FT /colour="2" -FT /taxa=" sequence_3 sequence_4 sequence_2 sequence_5 sequence_6 sequence_7 sequence_9 sequence_8 sequence_1" +FT /taxa=" sequence_4 sequence_3 sequence_2" FT /SNP_count="30" +FT misc_feature 117..207 +FT /node="Node_9->Node_8" +FT /neg_log_likelihood="nan" +FT /colour="2" +FT /taxa=" sequence_4 sequence_3 sequence_2 sequence_1 sequence_5 sequence_8 sequence_6 sequence_7 sequence_9" +FT /SNP_count="91" diff --git a/python/gubbins/tests/data/multiple_recombinations_clade_extract.aln b/python/gubbins/tests/data/multiple_recombinations_clade_extract.aln index e63ed1ca..73be0012 100644 --- a/python/gubbins/tests/data/multiple_recombinations_clade_extract.aln +++ b/python/gubbins/tests/data/multiple_recombinations_clade_extract.aln @@ -4,24 +4,6 @@ AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AA ->sequence_2 -AAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAACACCCCCCCC -CCCCCCCACCCACCCCCCCCCCACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -AA ->sequence_3 -AAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAATAAAAAAAAAAAAAAACACCCCCCCC -CCCCCCCACCCACCCCCCCCCCACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -AA ->sequence_4 -AAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAATAAAAAGAAAAAAAAACACCCCCCCC -CCCCCCCACCCACCCCCCCCCCACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -AA >sequence_5 AAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACCCCCCCC CCCCCCCACCCACCCCCCCCCCACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA diff --git a/python/gubbins/tests/data/multiple_recombinations_clade_extract.tree b/python/gubbins/tests/data/multiple_recombinations_clade_extract.tree index 99eeab07..c6ad4b3c 100644 --- a/python/gubbins/tests/data/multiple_recombinations_clade_extract.tree +++ b/python/gubbins/tests/data/multiple_recombinations_clade_extract.tree @@ -1 +1,2 @@ -(('sequence_3':0.000213,'sequence_4':0.911185)'Node_1':0.910968,('sequence_2':0.000222,(('sequence_5':0.000213,('sequence_6':0.000212,(('sequence_7':1.622202,'sequence_9':1.647725)'Node_2':0.000302,'sequence_8':0.000264)'Node_3':1.622508)'Node_4':1.22782)'Node_5':0.911834,'sequence_1':0.000214)'Node_6':0.911867)'Node_7':0.000217)'Node_8':28097902592.0; +((sequence_5:0.000213,(sequence_6:0.000212,((sequence_7:1.622202,sequence_9:1.647725)Node_2:0.000302,sequence_8:0.000264)Node_3:1.622508)Node_4:1.22782)Node_5:0.911834,sequence_1:0.000214)Node_6:28097902592.912083; + diff --git a/python/gubbins/tests/data/multiple_recombinations_extract.aln b/python/gubbins/tests/data/multiple_recombinations_extract.aln new file mode 100644 index 00000000..73be0012 --- /dev/null +++ b/python/gubbins/tests/data/multiple_recombinations_extract.aln @@ -0,0 +1,36 @@ +>sequence_1 +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AA +>sequence_5 
+AAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACCCCCCCC +CCCCCCCACCCACCCCCCCCCCACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AA +>sequence_6 +AAAAAAAAAAAAAAAGAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AA +>sequence_7 +AAAAAAAAAAAACAAGAAAAAAAGAAACGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AA +>sequence_8 +AAAAAAAAAAAAAAAGAAAAAAAGAAACGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AA +>sequence_9 +---------------GAAAAAAAGAAAAGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AA diff --git a/python/gubbins/tests/data/multiple_recombinations_extract.gff b/python/gubbins/tests/data/multiple_recombinations_extract.gff new file mode 100644 index 00000000..a082b1bc --- /dev/null +++ b/python/gubbins/tests/data/multiple_recombinations_extract.gff @@ -0,0 +1,6 @@ +##gff-version 3 +##sequence-region SEQUENCE 1 242 +SEQUENCE GUBBINS CDS 29 49 0.000 . 0 node="Node_4->Node_3";neg_log_likelihood="4.955311";taxa=" sequence_7 sequence_9 sequence_8";snp_count="21"; +SEQUENCE GUBBINS CDS 51 84 0.000 . 0 node="Node_5->Node_4";neg_log_likelihood="10.195830";taxa=" sequence_6 sequence_7 sequence_9 sequence_8";snp_count="30"; +SEQUENCE GUBBINS CDS 51 84 0.000 . 
0 node="Node_6->sequence_1";neg_log_likelihood="8.046578";taxa="sequence_1";snp_count="30"; +SEQUENCE GUBBINS CDS 51 84 0.000 . 0 node="Node_9->Node_8";neg_log_likelihood="10.195830";taxa=" sequence_3 sequence_4 sequence_2 sequence_5 sequence_6 sequence_7 sequence_9 sequence_8 sequence_1";snp_count="30"; diff --git a/python/gubbins/tests/data/robinson_foulds_distance_tree1.tre.reroot_at_sequence_4_2_expected b/python/gubbins/tests/data/robinson_foulds_distance_tree1.tre.reroot_at_sequence_4_2_expected index caf0cc04..9c6adeda 100644 --- a/python/gubbins/tests/data/robinson_foulds_distance_tree1.tre.reroot_at_sequence_4_2_expected +++ b/python/gubbins/tests/data/robinson_foulds_distance_tree1.tre.reroot_at_sequence_4_2_expected @@ -1 +1 @@ -(((((sequence_1:0.0,(sequence_6:0.03056,(sequence_5:0.24819,sequence_8:1e-05)N8:1e-05)N7:1e-05)N1:1e-05,(sequence_7:0.30066,sequence_10:2e-05)N3:1e-05)N2:0.19334,sequence_9:0.0002)N4:1.23278,sequence_3:8.38574)N5:0.00017,sequence_2:0.0002,sequence_4:0.30659)N6:0.0; +((sequence_2:0.0002,sequence_4:0.306587)N6:8.5e-05,(sequence_3:8.38574,(sequence_9:0.000196,((sequence_7:0.300658,sequence_10:2e-05)N3:1.1e-05,(sequence_1:2e-06,(sequence_6:0.030557,(sequence_5:0.24819,sequence_8:1.1e-05)N8:9e-06)N7:1e-05)N1:1e-05)N2:0.193335)N4:1.23278)N5:8.5e-05):0.0; diff --git a/python/gubbins/tests/data/robinson_foulds_distance_tree1.tre.reroot_at_sequence_4_expected b/python/gubbins/tests/data/robinson_foulds_distance_tree1.tre.reroot_at_sequence_4_expected index 8c9e72bc..3ad1658e 100644 --- a/python/gubbins/tests/data/robinson_foulds_distance_tree1.tre.reroot_at_sequence_4_expected +++ b/python/gubbins/tests/data/robinson_foulds_distance_tree1.tre.reroot_at_sequence_4_expected @@ -1 +1 @@ 
-(((((sequence_1:0.0,(sequence_6:0.03056,(sequence_5:0.24819,sequence_8:1e-05)N8:1e-05)N7:1e-05)N1:1e-05,(sequence_7:0.30066,sequence_10:2e-05)N3:1e-05)N2:0.19334,sequence_9:0.0002)N4:1.23278,sequence_3:8.38574)N5:0.00017,sequence_2:0.0002,sequence_4:0.30659):0.0; +(sequence_4:0.1532935,(sequence_2:0.0002,(sequence_3:8.38574,(sequence_9:0.000196,((sequence_7:0.300658,sequence_10:2e-05)N3:1.1e-05,(sequence_1:2e-06,(sequence_6:0.030557,(sequence_5:0.24819,sequence_8:1.1e-05)N8:9e-06)N7:1e-05)N1:1e-05)N2:0.193335)N4:1.23278)N5:0.00017)N6:0.1532935):0.0; diff --git a/python/gubbins/tests/data/valid_newick_tree.tre b/python/gubbins/tests/data/valid_newick_tree.tre index 3d62851b..8b55f851 100644 --- a/python/gubbins/tests/data/valid_newick_tree.tre +++ b/python/gubbins/tests/data/valid_newick_tree.tre @@ -1 +1 @@ -(sequence_1:0.000002,((sequence_7:0.300658,sequence_17:0.300658,sequence_27:0.300658,sequence_37:0.300658,sequence_47:0.300658,sequence_10:0.000020)N3:0.000011,(sequence_9:0.000196,(sequence_3:8.385740,(sequence_2:0.000200,sequence_4:0.306587)N6:0.000170)N5:1.232780)N4:0.193335)N2:0.000010,(sequence_6:0.030557,(sequence_5:0.248190,sequence_8:0.000011)N8:0.000009)N7:0.000010)N1:0.0; +(sequence_1:2e-06,((sequence_7:0.300658,sequence_17:0.300658,sequence_27:0.300658,sequence_37:0.300658,sequence_47:0.300658,sequence_10:2e-05)N3:1.1e-05,(sequence_9:0.000196,(sequence_3:8.38574,(sequence_2:0.0002,sequence_4:0.306587)N6:0.00017)N5:1.23278)N4:0.193335)N2:1e-05,(sequence_6:0.030557,(sequence_5:0.24819,sequence_8:1.1e-05)N8:9e-06)N7:1e-05)N1:0.0; diff --git a/python/gubbins/tests/test_dependencies.py b/python/gubbins/tests/test_dependencies.py index f1dc310a..f64e6afa 100644 --- a/python/gubbins/tests/test_dependencies.py +++ b/python/gubbins/tests/test_dependencies.py @@ -32,11 +32,13 @@ def test_pairwise(self): self.cleanup('pairwise') assert exit_code == 0 - def test_pairwise(self): + def test_pairwise_altered_rec_options(self): exit_code = 1 parser = 
run_gubbins.parse_input_args() common.parse_and_run(parser.parse_args(["--pairwise", "--threads", "1", + "--p-value", "0.01", + "--trimming-ratio", "2.0", os.path.join(data_dir, 'pairwise.aln')])) exit_code = self.check_for_output_files('pairwise') self.cleanup('pairwise') @@ -54,6 +56,26 @@ def test_fasttree(self): self.cleanup('multiple_recombinations') assert exit_code == 0 + # Test resuming a default analysis + def test_fasttree_resume(self): + exit_code = 1 + parser = run_gubbins.parse_input_args() + common.parse_and_run(parser.parse_args(["--tree-builder", "fasttree", + "--verbose", "--iterations", "3", + "--threads", "1", + "--no-cleanup", + "--prefix", "original", + os.path.join(data_dir, 'multiple_recombinations.aln')])) + common.parse_and_run(parser.parse_args(["--tree-builder", "fasttree", + "--verbose", "--iterations", "3", + "--resume", "multiple_recombinations.iteration_1.tre", + "--threads", "1", + os.path.join(data_dir, 'multiple_recombinations.aln')])) + exit_code = self.check_for_output_files('multiple_recombinations') + self.cleanup('original') + self.cleanup('multiple_recombinations') + assert exit_code == 0 + def test_iqtree(self): exit_code = 1 parser = run_gubbins.parse_input_args() @@ -65,6 +87,26 @@ def test_iqtree(self): self.cleanup('multiple_recombinations') assert exit_code == 0 + # Test resuming a default analysis + def test_iqtree_resume(self): + exit_code = 1 + parser = run_gubbins.parse_input_args() + common.parse_and_run(parser.parse_args(["--tree-builder", "iqtree", + "--verbose", "--iterations", "3", + "--threads", "1", + "--no-cleanup", + "--prefix", "original", + os.path.join(data_dir, 'multiple_recombinations.aln')])) + common.parse_and_run(parser.parse_args(["--tree-builder", "iqtree", + "--verbose", "--iterations", "3", + "--resume", "multiple_recombinations.iteration_1.tre", + "--threads", "1", + os.path.join(data_dir, 'multiple_recombinations.aln')])) + exit_code = self.check_for_output_files('multiple_recombinations') + 
self.cleanup('original') + self.cleanup('multiple_recombinations') + assert exit_code == 0 + def test_raxml(self): exit_code = 1 parser = run_gubbins.parse_input_args() @@ -76,6 +118,26 @@ def test_raxml(self): self.cleanup('multiple_recombinations') assert exit_code == 0 + # Test resuming a default analysis + def test_raxml_resume(self): + exit_code = 1 + parser = run_gubbins.parse_input_args() + common.parse_and_run(parser.parse_args(["--tree-builder", "raxml", + "--verbose", "--iterations", "3", + "--threads", "1", + "--no-cleanup", + "--prefix", "original", + os.path.join(data_dir, 'multiple_recombinations.aln')])) + common.parse_and_run(parser.parse_args(["--tree-builder", "raxml", + "--verbose", "--iterations", "3", + "--resume", "multiple_recombinations.iteration_1.tre", + "--threads", "1", + os.path.join(data_dir, 'multiple_recombinations.aln')])) + exit_code = self.check_for_output_files('multiple_recombinations') + self.cleanup('original') + self.cleanup('multiple_recombinations') + assert exit_code == 0 + def test_raxml_quiet(self): exit_code = 1 parser = run_gubbins.parse_input_args() @@ -99,6 +161,26 @@ def test_raxmlng(self): self.cleanup('multiple_recombinations') assert exit_code == 0 + # Test resuming a default analysis + def test_raxmlng_resume(self): + exit_code = 1 + parser = run_gubbins.parse_input_args() + common.parse_and_run(parser.parse_args(["--tree-builder", "raxmlng", + "--verbose", "--iterations", "3", + "--threads", "1", + "--no-cleanup", + "--prefix", "original", + os.path.join(data_dir, 'multiple_recombinations.aln')])) + common.parse_and_run(parser.parse_args(["--tree-builder", "raxmlng", + "--verbose", "--iterations", "3", + "--resume", "multiple_recombinations.iteration_1.tre", + "--threads", "1", + os.path.join(data_dir, 'multiple_recombinations.aln')])) + exit_code = self.check_for_output_files('multiple_recombinations') + self.cleanup('original') + self.cleanup('multiple_recombinations') + assert exit_code == 0 + def 
test_rapidnj(self): exit_code = 1 parser = run_gubbins.parse_input_args() @@ -114,6 +196,28 @@ def test_rapidnj(self): self.cleanup('multiple_recombinations') assert exit_code == 0 + # Test resuming a default analysis + def test_rapidnj_resume(self): + exit_code = 1 + parser = run_gubbins.parse_input_args() + common.parse_and_run(parser.parse_args(["--tree-builder", "rapidnj", + "--verbose", "--iterations", "3", + "--threads", "1", + "--model","JC", + "--no-cleanup", + "--prefix", "original", + os.path.join(data_dir, 'multiple_recombinations.aln')])) + common.parse_and_run(parser.parse_args(["--tree-builder", "rapidnj", + "--verbose", "--iterations", "3", + "--resume", "multiple_recombinations.iteration_1.tre", + "--model","JC", + "--threads", "1", + os.path.join(data_dir, 'multiple_recombinations.aln')])) + exit_code = self.check_for_output_files('multiple_recombinations') + self.cleanup('original') + self.cleanup('multiple_recombinations') + assert exit_code == 0 + def check_rapidnj_consistency(self): new_file = os.path.join(data_dir,'new_rapidnj_jc_output.recombination_predictions.embl') reference_file = os.path.join(data_dir,'ref_rapidnj_jc_output.recombination_predictions.embl') @@ -520,4 +624,4 @@ def cleanup(prefix): os.rmdir(dir) if __name__ == "__main__": - unittest.main(buffer=True) \ No newline at end of file + unittest.main(buffer=True) diff --git a/python/gubbins/tests/test_tree_methods.py b/python/gubbins/tests/test_tree_methods.py index ca63a6d8..17dcc1bb 100644 --- a/python/gubbins/tests/test_tree_methods.py +++ b/python/gubbins/tests/test_tree_methods.py @@ -102,11 +102,12 @@ def test_reroot_tree_with_outgroups_with_two_mixed_clades(self): self.reroot_tree_check(outgroups, expected_output_file, expected_monophyletic_outgroup) def reroot_tree_check(self, outgroups, expected_output_file, expected_monophyletic_outgroup): - shutil.copyfile(os.path.join(data_dir, 'outgroups_input.tre'), '.tmp.outgroups_input.tre') - assert expected_monophyletic_outgroup 
== common.get_monophyletic_outgroup('.tmp.outgroups_input.tre', outgroups) - common.reroot_tree_with_outgroup('.tmp.outgroups_input.tre', outgroups) - assert filecmp.cmp('.tmp.outgroups_input.tre', expected_output_file) - os.remove('.tmp.outgroups_input.tre') + tmp_tree_file_name = expected_output_file.replace('expected_','generated_') + shutil.copyfile(os.path.join(data_dir, 'outgroups_input.tre'), tmp_tree_file_name) + assert expected_monophyletic_outgroup == common.get_monophyletic_outgroup(tmp_tree_file_name, outgroups) + common.reroot_tree_with_outgroup(tmp_tree_file_name, outgroups) + assert filecmp.cmp(tmp_tree_file_name, expected_output_file) + os.remove(tmp_tree_file_name) def test_split_all_non_bi_nodes(self): # best way to access it is via reroot_tree_at_midpoint because it outputs to a file diff --git a/python/gubbins/tests/test_utils.py b/python/gubbins/tests/test_utils.py index 3997f501..ce2146a4 100644 --- a/python/gubbins/tests/test_utils.py +++ b/python/gubbins/tests/test_utils.py @@ -20,8 +20,8 @@ class TestUtilities(unittest.TestCase): def test_gubbins_command(self): - assert common.create_gubbins_command('AAA', 'BBB', 'CCC', 'DDD', 'EEE', 5, 10, 200) \ - == 'AAA -r -v CCC -a 10 -b 200 -f EEE -t DDD -m 5 BBB' + assert common.create_gubbins_command('AAA', 'BBB', 'CCC', 'DDD', 'EEE', 5, 10, 200, 0.05, 1.0, 0) \ + == 'AAA -r -v CCC -a 10 -b 200 -f EEE -t DDD -m 5 -p 0.05 -i 1.0 BBB' def test_translation_of_filenames_to_final_filenames(self): assert common.translation_of_filenames_to_final_filenames('AAA', 'test') == { diff --git a/python/scripts/extract_gubbins_clade.py b/python/scripts/extract_gubbins_clade.py index dd428693..90e019af 100755 --- a/python/scripts/extract_gubbins_clade.py +++ b/python/scripts/extract_gubbins_clade.py @@ -99,8 +99,26 @@ def get_options(): schema = 'newick', preserve_underscores = True) tree.retain_taxa_with_labels(subset) - tree.write_to_path(output_tree_name, - 'newick') + + # Output tree + clade_tree_string = 
tree.as_string( + schema='newick', + suppress_leaf_taxon_labels=False, + suppress_leaf_node_labels=True, + suppress_internal_taxon_labels=False, + suppress_internal_node_labels=False, + suppress_rooting=True, + suppress_edge_lengths=False, + unquoted_underscores=True, + preserve_spaces=False, + store_tree_weights=False, + suppress_annotations=True, + annotations_as_nhx=False, + suppress_item_comments=True, + node_label_element_separator=' ' + ) + with open(output_tree_name,'w') as tree_out: + tree_out.write(clade_tree_string.replace('\'', '') + '\n') # Identify relevant recombination blocks output_gff_name = args.out + '.gff' diff --git a/python/scripts/extract_gubbins_clade_statistics.py b/python/scripts/extract_gubbins_clade_statistics.py index 565943fa..00a08e80 100755 --- a/python/scripts/extract_gubbins_clade_statistics.py +++ b/python/scripts/extract_gubbins_clade_statistics.py @@ -199,7 +199,7 @@ def get_options(): node_label_element_separator=' ' ) with open(clade_name + '.tre','w') as tree_out: - tree_out.write(clade_tree_string + '\n') + tree_out.write(clade_tree_string.replace('\'', '') + '\n') clade_info = {label:0 for label in info_labels + tree_info_labels} for node in clade_tree.preorder_node_iter(): if node != clade_tree.seed_node: diff --git a/python/scripts/generate_ska_alignment.py b/python/scripts/generate_ska_alignment.py index 1545cdfe..73a6ab36 100755 --- a/python/scripts/generate_ska_alignment.py +++ b/python/scripts/generate_ska_alignment.py @@ -123,12 +123,12 @@ def ska_map_sequences(seq, k = None, ref = None): with open(args.fastq,'r') as fastq_list: for line in fastq_list.readlines(): info = line.strip().split() - if os.path.isfile(fns[1]) and os.path.isfile(fns[2]): + if os.path.isfile(info[1]) and os.path.isfile(info[2]): fastq_names.append((info[1],info[2])) seq_names[info[1]] = info[0] all_names.append(info[0]) else: - sys.stderr.write('Unable to find files ' + fns[1] + ' and ' + fns[2] + '\n') + sys.stderr.write('Unable to find 
files ' + info[1] + ' and ' + info[2] + '\n') # Sketch into split kmers with Pool(processes = args.threads) as pool: return_codes = pool.map(partial(map_fastq_sequence, diff --git a/python/scripts/mask_gubbins_aln.py b/python/scripts/mask_gubbins_aln.py index 25aed99e..93f9599e 100755 --- a/python/scripts/mask_gubbins_aln.py +++ b/python/scripts/mask_gubbins_aln.py @@ -29,6 +29,7 @@ from Bio import SeqIO from Bio.Align import MultipleSeqAlignment from Bio.Seq import Seq +from Bio.Seq import MutableSeq # command line parsing def get_options(): @@ -67,7 +68,7 @@ def get_options(): alignment = AlignIO.read(args.aln,'fasta') for taxon in alignment: overall_taxon_list.append(taxon.id) - taxon.seq = taxon.seq.tomutable() + taxon.seq = MutableSeq(taxon.seq) overall_taxon_set = set(overall_taxon_list) # Read recombinant regions from GFF diff --git a/src/Newickform.c b/src/Newickform.c index 2b705996..38a6a11c 100644 --- a/src/Newickform.c +++ b/src/Newickform.c @@ -28,7 +28,7 @@ #define STR_OUT "out" -newick_node* build_newick_tree(char * filename, FILE *vcf_file_pointer,int * snp_locations, int number_of_snps, char** column_names, int number_of_columns,int length_of_original_genome,int min_snps, int window_min, int window_max) +newick_node* build_newick_tree(char * filename, FILE *vcf_file_pointer,int * snp_locations, int number_of_snps, char** column_names, int number_of_columns,int length_of_original_genome,int min_snps, int window_min, int window_max, float uncorrected_p_value, float trimming_ratio, int extensive_search_flag) { int iLen, iMaxLen; char *pcTreeStr; @@ -98,7 +98,23 @@ newick_node* build_newick_tree(char * filename, FILE *vcf_file_pointer,int * snp char * root_sequence = NULL; carry_unambiguous_gaps_up_tree(root); - root_sequence = generate_branch_sequences(root, vcf_file_pointer, snp_locations, number_of_snps, column_names, number_of_columns,root_sequence, length_of_original_genome, 
block_file_pointer,gff_file_pointer,min_snps,branch_snps_file_pointer,window_min, window_max); + root_sequence = generate_branch_sequences(root, + vcf_file_pointer, + snp_locations, + number_of_snps, + column_names, + number_of_columns, + root_sequence, + length_of_original_genome, + block_file_pointer, + gff_file_pointer, + min_snps, + branch_snps_file_pointer, + window_min, + window_max, + uncorrected_p_value, + trimming_ratio, + extensive_search_flag); free(root_sequence); int * parent_recombinations = NULL; fill_in_recombinations_with_gaps(root, parent_recombinations, 0, 0,0,root->block_coordinates,length_of_original_genome,snp_locations,number_of_snps); diff --git a/src/Newickform.h b/src/Newickform.h index eadf3494..f5818c0a 100644 --- a/src/Newickform.h +++ b/src/Newickform.h @@ -51,12 +51,12 @@ typedef struct newick_node #ifdef __NEWICKFORM_C__ newick_node* parseTree(char *str); -newick_node* build_newick_tree(char * filename, FILE *vcf_file_pointer,int * snp_locations, int number_of_snps, char** column_names, int number_of_columns, int length_of_original_genome,int min_snps, int window_min, int window_max); +newick_node* build_newick_tree(char * filename, FILE *vcf_file_pointer,int * snp_locations, int number_of_snps, char** column_names, int number_of_columns, int length_of_original_genome,int min_snps, int window_min, int window_max, float uncorrected_p_value, float trimming_ratio, int extensive_search_flag); void print_tree(newick_node *root, FILE * outputfile); char* strip_quotes(char *taxon); #else extern newick_node* parseTree(char *str); -extern newick_node* build_newick_tree(char * filename, FILE *vcf_file_pointer,int * snp_locations, int number_of_snps, char** column_names, int number_of_columns, int length_of_original_genome,int min_snps, int window_min, int window_max); +extern newick_node* build_newick_tree(char * filename, FILE *vcf_file_pointer,int * snp_locations, int number_of_snps, char** column_names, int number_of_columns, int 
length_of_original_genome,int min_snps, int window_min, int window_max, float uncorrected_p_value, float trimming_ratio, int extensive_search_flag); extern void print_tree(newick_node *root, FILE * outputfile); extern char* strip_quotes(char *taxon); #endif diff --git a/src/alignment_file.c b/src/alignment_file.c index 0d4d0fcf..4b1ad840 100644 --- a/src/alignment_file.c +++ b/src/alignment_file.c @@ -40,10 +40,12 @@ int line_length(FILE * alignment_file_pointer) int length_of_line = 0; int total_length_of_line = 0; - while((pcRes = fgets(szBuffer, sizeof(szBuffer), alignment_file_pointer)) != NULL){ + while((pcRes = fgets(szBuffer, sizeof(szBuffer), alignment_file_pointer)) != NULL) + { length_of_line = (int)strlen(szBuffer) - 1; total_length_of_line = total_length_of_line + length_of_line; - if((szBuffer)[length_of_line] == '\n'){ + if((szBuffer)[length_of_line] == '\n') + { break; } } @@ -65,41 +67,42 @@ void advance_to_sequence_name(FILE * alignment_file_pointer) void get_bases_for_each_snp(char filename[], int snp_locations[], char ** bases_for_snps, int length_of_genome, int number_of_snps) { - int l; - int i = 0; - int sequence_number = 0; - - gzFile fp; - kseq_t *seq; + int l; + int i = 0; + int sequence_number = 0; - fp = gzopen(filename, "r"); - seq = kseq_init(fp); + gzFile fp; + kseq_t *seq; - - while ((l = kseq_read(seq)) >= 0) - { - - for(i = 0; i< number_of_snps; i++) - { - bases_for_snps[i][sequence_number] = toupper(((char *) seq->seq.s)[snp_locations[i]]); - // Present gaps and unknowns in the same way to Gubbins - if(bases_for_snps[i][sequence_number] == 'N') - { - bases_for_snps[i][sequence_number] = '-'; - } - } - sequence_number++; - } + fp = gzopen(filename, "r"); + seq = kseq_init(fp); - kseq_destroy(seq); - gzclose(fp); + + while ((l = kseq_read(seq)) >= 0) + { + + for(i = 0; i< number_of_snps; i++) + { + bases_for_snps[i][sequence_number] = toupper(((char *) seq->seq.s)[snp_locations[i]]); + // Present gaps and unknowns in the same way to 
Gubbins + if(bases_for_snps[i][sequence_number] == 'N') + { + bases_for_snps[i][sequence_number] = '-'; + } + } + sequence_number++; + } + + kseq_destroy(seq); + gzclose(fp); } int genome_length(char filename[]) { - if( access( filename, F_OK ) == -1 ) { + if( access( filename, F_OK ) == -1 ) + { printf("Cannot calculate genome_length because file '%s' does not exist\n",filename); exit(0); } @@ -121,21 +124,21 @@ int genome_length(char filename[]) int number_of_sequences_in_file(char filename[]) { - int number_of_sequences = 0; - int l; + int number_of_sequences = 0; + int l; - gzFile fp; - kseq_t *seq; - - fp = gzopen(filename, "r"); - seq = kseq_init(fp); - - while ((l = kseq_read(seq)) >= 0) { + gzFile fp; + kseq_t *seq; + + fp = gzopen(filename, "r"); + seq = kseq_init(fp); + + while ((l = kseq_read(seq)) >= 0) { number_of_sequences++; - } - kseq_destroy(seq); - gzclose(fp); - return number_of_sequences; + } + kseq_destroy(seq); + gzclose(fp); + return number_of_sequences; } @@ -148,7 +151,7 @@ int build_reference_sequence(char reference_sequence[], char filename[]) fp = gzopen(filename, "r"); seq = kseq_init(fp); - kseq_read(seq); + kseq_read(seq); for(i = 0; i < seq->seq.l; i++) { @@ -170,59 +173,59 @@ int build_reference_sequence(char reference_sequence[], char filename[]) int detect_snps(char reference_sequence[], char filename[], int length_of_genome, int exclude_gaps) { - int i; - int number_of_snps = 0; - int l; - - gzFile fp; - kseq_t *seq; - - fp = gzopen(filename, "r"); - seq = kseq_init(fp); - // First sequence is the reference sequence so skip it - kseq_read(seq); - - while ((l = kseq_read(seq)) >= 0) { - for(i = 0; i < length_of_genome; i++) + int i; + int number_of_snps = 0; + int l; + + gzFile fp; + kseq_t *seq; + + fp = gzopen(filename, "r"); + seq = kseq_init(fp); + // First sequence is the reference sequence so skip it + kseq_read(seq); + + while ((l = kseq_read(seq)) >= 0) { - - if(exclude_gaps) - { - // If there is an indel in the reference 
sequence, replace with the first proper base you find - if((reference_sequence[i] == '-' && seq->seq.s[i] != '-' ) || (toupper(reference_sequence[i]) == 'N' && seq->seq.s[i] != 'N' )) + for(i = 0; i < length_of_genome; i++) { - reference_sequence[i] = toupper(seq->seq.s[i]); - } - - if(reference_sequence[i] != '*' && seq->seq.s[i] != '-' && toupper(seq->seq.s[i]) != 'N' && reference_sequence[i] != toupper(seq->seq.s[i])) - { - reference_sequence[i] = '*'; - number_of_snps++; - } - } - else - { - - char input_base = toupper(seq->seq.s[i]); - if(input_base == 'N') - { - input_base = '-'; - } - - if(reference_sequence[i] != '*' && reference_sequence[i] != input_base) - { - reference_sequence[i] = '*'; - number_of_snps++; + + if(exclude_gaps) + { + // If there is an indel in the reference sequence, replace with the first proper base you find + if((reference_sequence[i] == '-' && seq->seq.s[i] != '-' ) || (toupper(reference_sequence[i]) == 'N' && seq->seq.s[i] != 'N' )) + { + reference_sequence[i] = toupper(seq->seq.s[i]); + } + + if(reference_sequence[i] != '*' && seq->seq.s[i] != '-' && toupper(seq->seq.s[i]) != 'N' && reference_sequence[i] != toupper(seq->seq.s[i])) + { + reference_sequence[i] = '*'; + number_of_snps++; + } + } + else + { + + char input_base = toupper(seq->seq.s[i]); + if(input_base == 'N') + { + input_base = '-'; + } + + if(reference_sequence[i] != '*' && reference_sequence[i] != input_base) + { + reference_sequence[i] = '*'; + number_of_snps++; + } + } } - } } - - } - kseq_destroy(seq); - gzclose(fp); + kseq_destroy(seq); + gzclose(fp); - return number_of_snps; + return number_of_snps; } @@ -230,48 +233,50 @@ int detect_snps(char reference_sequence[], char filename[], int length_of_genome char * read_line(char sequence[], FILE * pFilePtr) { - char *pcRes = NULL; - long lineLength = 0; - char current_line_buffer[MAX_READ_BUFFER] = {0}; - - - while((pcRes = fgets(current_line_buffer, sizeof(current_line_buffer), pFilePtr)) != NULL){ - 
if(size_of_string(sequence) > 0) - { - sequence = realloc(sequence, sizeof(char)*(size_of_string(sequence) + size_of_string(current_line_buffer) + 2) ); + char *pcRes = NULL; + long lineLength = 0; + char current_line_buffer[MAX_READ_BUFFER] = {0}; + + while((pcRes = fgets(current_line_buffer, sizeof(current_line_buffer), pFilePtr)) != NULL) + { + if(size_of_string(sequence) > 0) + { + sequence = realloc(sequence, sizeof(char)*(size_of_string(sequence) + size_of_string(current_line_buffer) + 2) ); } - concat_strings_created_with_malloc(sequence,current_line_buffer); - current_line_buffer[0] = '\0'; + concat_strings_created_with_malloc(sequence,current_line_buffer); + current_line_buffer[0] = '\0'; lineLength = size_of_string(sequence); //if end of line character is found then exit from loop - - if((sequence)[lineLength] == '\n' || (sequence)[lineLength] == '\0'){ + + if((sequence)[lineLength] == '\n' || (sequence)[lineLength] == '\0') + { break; } } - - + + return sequence; } void get_sample_names_for_header(char filename[], char ** sequence_names, int number_of_samples) { - int l; - int i = 0; - - gzFile fp; - kseq_t *seq; - - fp = gzopen(filename, "r"); - seq = kseq_init(fp); - - while ((l = kseq_read(seq)) >= 0) { - memcpy(sequence_names[i], seq->name.s, size_of_string(seq->name.s)+1); - i++; - } - kseq_destroy(seq); - gzclose(fp); + int l; + int i = 0; + + gzFile fp; + kseq_t *seq; + + fp = gzopen(filename, "r"); + seq = kseq_init(fp); + + while ((l = kseq_read(seq)) >= 0) + { + memcpy(sequence_names[i], seq->name.s, size_of_string(seq->name.s)+1); + i++; + } + kseq_destroy(seq); + gzclose(fp); } @@ -288,10 +293,12 @@ char filter_invalid_characters(char input_char) /* Execute regular expression */ reti = regexec(&regex, input_chars, 0, NULL, 0); - if( !reti ){ + if( !reti ) + { return input_char; } - else if( reti == REG_NOMATCH ){ + else if( reti == REG_NOMATCH ) + { return '\0'; } return '\0'; diff --git a/src/block_tab_file.c b/src/block_tab_file.c index 
4fadbf39..4eadfb51 100644 --- a/src/block_tab_file.c +++ b/src/block_tab_file.c @@ -24,35 +24,35 @@ void print_block_details(FILE * block_file_pointer, int start_coordinate, int end_coordinate, int number_of_snps, char * current_node_id, char * parent_node_id, char * taxon_names, int number_of_child_nodes, double neg_log_likelihood) { - fprintf(block_file_pointer, "FT misc_feature %d..%d\n", start_coordinate, end_coordinate); - fprintf(block_file_pointer, "FT /node=\"%s->%s\"\n",parent_node_id,current_node_id); - fprintf(block_file_pointer, "FT /neg_log_likelihood=\"%f\"\n",neg_log_likelihood); + fprintf(block_file_pointer, "FT misc_feature %d..%d\n", start_coordinate, end_coordinate); + fprintf(block_file_pointer, "FT /node=\"%s->%s\"\n",parent_node_id,current_node_id); + fprintf(block_file_pointer, "FT /neg_log_likelihood=\"%f\"\n",neg_log_likelihood); - if(number_of_child_nodes > 0) - { - fprintf(block_file_pointer, "FT /colour=\"2\"\n"); - } - else - { - fprintf(block_file_pointer, "FT /colour=\"4\"\n"); - } - fprintf(block_file_pointer, "FT /taxa=\"%s\"\n",taxon_names); - fprintf(block_file_pointer, "FT /SNP_count=\"%d\"\n",number_of_snps); - fflush(block_file_pointer); + if(number_of_child_nodes > 0) + { + fprintf(block_file_pointer, "FT /colour=\"2\"\n"); + } + else + { + fprintf(block_file_pointer, "FT /colour=\"4\"\n"); + } + fprintf(block_file_pointer, "FT /taxa=\"%s\"\n",taxon_names); + fprintf(block_file_pointer, "FT /SNP_count=\"%d\"\n",number_of_snps); + fflush(block_file_pointer); } void print_branch_snp_details(FILE * branch_snps_file_pointer, char * current_node_id, char * parent_node_id, int * branches_snp_sites, int number_of_branch_snps, char * branch_snp_sequence, char * branch_snp_ancestor_sequence,char * taxon_names) { - int i = 0; - for(i=0; i< number_of_branch_snps; i++) - { - fprintf(branch_snps_file_pointer, "FT variation %d\n", branches_snp_sites[i]); - fprintf(branch_snps_file_pointer, "FT 
/node=\"%s->%s\"\n",parent_node_id,current_node_id); - fprintf(branch_snps_file_pointer, "FT /colour=\"4\"\n"); - fprintf(branch_snps_file_pointer, "FT /taxa=\"%s\"\n",taxon_names); - fprintf(branch_snps_file_pointer, "FT /parent_base=\"%c\"\n",branch_snp_ancestor_sequence[i]); - fprintf(branch_snps_file_pointer, "FT /replace=\"%c\"\n",branch_snp_sequence[i]); - fflush(branch_snps_file_pointer); - } + int i = 0; + for(i=0; i< number_of_branch_snps; i++) + { + fprintf(branch_snps_file_pointer, "FT variation %d\n", branches_snp_sites[i]); + fprintf(branch_snps_file_pointer, "FT /node=\"%s->%s\"\n",parent_node_id,current_node_id); + fprintf(branch_snps_file_pointer, "FT /colour=\"4\"\n"); + fprintf(branch_snps_file_pointer, "FT /taxa=\"%s\"\n",taxon_names); + fprintf(branch_snps_file_pointer, "FT /parent_base=\"%c\"\n",branch_snp_ancestor_sequence[i]); + fprintf(branch_snps_file_pointer, "FT /replace=\"%c\"\n",branch_snp_sequence[i]); + fflush(branch_snps_file_pointer); + } } diff --git a/src/branch_sequences.c b/src/branch_sequences.c index 38524838..22e441bb 100644 --- a/src/branch_sequences.c +++ b/src/branch_sequences.c @@ -116,7 +116,11 @@ void fill_in_recombinations_with_gaps(newick_node *root, int * parent_recombinat char * child_sequence = (char *) calloc((length_of_original_genome +1),sizeof(char)); current_recombinations = (int *) calloc((root->num_recombinations+1+parent_num_recombinations),sizeof(int)); - num_current_recombinations = copy_and_concat_integer_arrays(root->recombinations, root->num_recombinations,parent_recombinations, parent_num_recombinations, current_recombinations); + num_current_recombinations = copy_and_concat_integer_arrays(root->recombinations, + root->num_recombinations, + parent_recombinations, + parent_num_recombinations, + current_recombinations); // overwrite the bases of snps with N's int i; @@ -127,18 +131,40 @@ void fill_in_recombinations_with_gaps(newick_node *root, int * parent_recombinat 
set_number_of_snps_for_sample(root->taxon,root->number_of_snps); get_sequence_for_sample_name(child_sequence, root->taxon); - int genome_length_excluding_blocks_and_gaps = calculate_genome_length_excluding_blocks_and_gaps(child_sequence, length_of_original_genome, current_block_coordinates, num_blocks); + int genome_length_excluding_blocks_and_gaps = calculate_genome_length_excluding_blocks_and_gaps(child_sequence, + length_of_original_genome, + current_block_coordinates, + num_blocks); - set_genome_length_excluding_blocks_and_gaps_for_sample(root->taxon,genome_length_excluding_blocks_and_gaps); + set_genome_length_excluding_blocks_and_gaps_for_sample(root->taxon, + genome_length_excluding_blocks_and_gaps); int ** merged_block_coordinates; merged_block_coordinates = (int **) calloc(3,sizeof(int *)); merged_block_coordinates[0] = (int*) calloc((num_blocks + root->number_of_blocks+1),sizeof(int )); merged_block_coordinates[1] = (int*) calloc((num_blocks + root->number_of_blocks+1),sizeof(int )); - copy_and_concat_2d_integer_arrays(current_block_coordinates,num_blocks,root->block_coordinates, root->number_of_blocks,merged_block_coordinates ); + copy_and_concat_2d_integer_arrays(current_block_coordinates, + num_blocks, + root->block_coordinates, + root->number_of_blocks, + merged_block_coordinates + ); - set_number_of_blocks_for_sample(root->taxon, root->number_of_blocks ); - set_number_of_bases_in_recombinations(root->taxon, calculate_number_of_bases_in_recombations_excluding_gaps(merged_block_coordinates, (num_blocks + root->number_of_blocks), child_sequence, snp_locations,current_total_snps)); + set_number_of_blocks_for_sample(root->taxon, root->number_of_blocks); + set_number_of_branch_bases_in_recombinations(root->taxon, + calculate_number_of_bases_in_recombations_excluding_gaps(merged_block_coordinates, + root->number_of_blocks, + child_sequence, + snp_locations, + current_total_snps) + ); + set_number_of_bases_in_recombinations(root->taxon, + 
calculate_number_of_bases_in_recombations_excluding_gaps(merged_block_coordinates, + (num_blocks + root->number_of_blocks), + child_sequence, + snp_locations, + current_total_snps) + ); free(child_sequence); for(i = 0; i < num_current_recombinations; i++) @@ -146,10 +172,13 @@ void fill_in_recombinations_with_gaps(newick_node *root, int * parent_recombinat update_sequence_base('N', sequence_index, current_recombinations[i]); } - // TODO: The stats for the number of snps in recombinations will need to be updated. int * snps_in_recombinations = (int *) calloc((number_of_snps +1),sizeof(int)); - int num_snps_in_recombinations = get_list_of_snp_indices_which_fall_in_downstream_recombinations(merged_block_coordinates, (num_blocks + root->number_of_blocks),snp_locations, number_of_snps, snps_in_recombinations); + int num_snps_in_recombinations = get_list_of_snp_indices_which_fall_in_downstream_recombinations(merged_block_coordinates, + (num_blocks + root->number_of_blocks), + snp_locations, + number_of_snps, + snps_in_recombinations); for(i = 0; i < num_snps_in_recombinations; i++) { update_sequence_base('N', sequence_index, snps_in_recombinations[i]); @@ -163,14 +192,23 @@ void fill_in_recombinations_with_gaps(newick_node *root, int * parent_recombinat while (child != NULL) { - fill_in_recombinations_with_gaps(child->node, current_recombinations, num_current_recombinations,(current_total_snps + root->number_of_snps),(num_blocks + root->number_of_blocks),merged_block_coordinates,length_of_original_genome, snp_locations, number_of_snps ); + fill_in_recombinations_with_gaps(child->node, + current_recombinations, + num_current_recombinations, + (current_total_snps + root->number_of_snps), + (num_blocks + root->number_of_blocks), + merged_block_coordinates, + length_of_original_genome, + snp_locations, + number_of_snps + ); child = child->next; } } else { - set_internal_node(0,sequence_index); + set_internal_node(0,sequence_index); } free(current_recombinations); 
free(merged_block_coordinates[0]); @@ -233,7 +271,11 @@ int calculate_number_of_bases_in_recombations_excluding_gaps(int ** block_coordi } - total_bases += calculate_block_size_without_gaps(child_sequence, snp_locations, block_coordinates[0][start_block], block_coordinates[1][start_block], length_of_original_genome); + total_bases += calculate_block_size_without_gaps(child_sequence, + snp_locations, + block_coordinates[0][start_block], + block_coordinates[1][start_block], + length_of_original_genome); } return total_bases; @@ -263,14 +305,14 @@ void carry_unambiguous_gaps_up_tree(newick_node *root) } } -char *generate_branch_sequences(newick_node *root, FILE *vcf_file_pointer,int * snp_locations, int number_of_snps, char** column_names, int number_of_columns, char * leaf_sequence, int length_of_original_genome, FILE * block_file_pointer, FILE * gff_file_pointer,int min_snps,FILE * branch_snps_file_pointer, int window_min, int window_max) +char *generate_branch_sequences(newick_node *root, FILE *vcf_file_pointer,int * snp_locations, int number_of_snps, char** column_names, int number_of_columns, char * leaf_sequence, int length_of_original_genome, FILE * block_file_pointer, FILE * gff_file_pointer,int min_snps,FILE * branch_snps_file_pointer, int window_min, int window_max, float uncorrected_p_value, float trimming_ratio, int extensive_search_flag) { newick_child *child; int child_counter = 0; int current_branch =0; int branch_genome_size = 0; int number_of_branch_snps=0; - root->current_node_id = ++node_counter; + root->current_node_id = ++node_counter; if (root->childNum == 0) { @@ -298,7 +340,23 @@ char *generate_branch_sequences(newick_node *root, FILE *vcf_file_pointer,int * while (child != NULL) { // recursion - child_sequences[child_counter] = generate_branch_sequences(child->node, vcf_file_pointer, snp_locations, number_of_snps, column_names, number_of_columns, child_sequences[child_counter],length_of_original_genome, 
block_file_pointer,gff_file_pointer,min_snps,branch_snps_file_pointer, window_min, window_max); + child_sequences[child_counter] = generate_branch_sequences(child->node, + vcf_file_pointer, + snp_locations, + number_of_snps, + column_names, + number_of_columns, + child_sequences[child_counter], + length_of_original_genome, + block_file_pointer, + gff_file_pointer, + min_snps, + branch_snps_file_pointer, + window_min, + window_max, + uncorrected_p_value, + trimming_ratio, + extensive_search_flag); child_nodes[child_counter] = child->node; char delimiter_string[3] = {" "}; @@ -336,7 +394,25 @@ char *generate_branch_sequences(newick_node *root, FILE *vcf_file_pointer,int * child_nodes[current_branch]->number_of_snps = number_of_branch_snps; print_branch_snp_details(branch_snps_file_pointer, child_nodes[current_branch]->taxon,root->taxon, branches_snp_sites, number_of_branch_snps, branch_snp_sequence, branch_snp_ancestor_sequence,child_nodes[current_branch]->taxon_names); - get_likelihood_for_windows(child_sequences[current_branch], number_of_snps, branches_snp_sites, branch_genome_size, number_of_branch_snps,snp_locations, child_nodes[current_branch], block_file_pointer, root, branch_snp_sequence,gff_file_pointer,min_snps,length_of_original_genome,leaf_sequence, window_min, window_max); + get_likelihood_for_windows(child_sequences[current_branch], + number_of_snps, + branches_snp_sites, + branch_genome_size, + number_of_branch_snps, + snp_locations, + child_nodes[current_branch], + block_file_pointer, + root, + branch_snp_sequence, + gff_file_pointer, + min_snps, + length_of_original_genome, + leaf_sequence, + window_min, + window_max, + uncorrected_p_value, + trimming_ratio, + extensive_search_flag); free(branch_snp_sequence); free(branch_snp_ancestor_sequence); free(child_sequences[current_branch]); @@ -353,7 +429,7 @@ char *generate_branch_sequences(newick_node *root, FILE *vcf_file_pointer,int * // calculate window size // starting at coord of first snp, count 
number of snps which fall into window // if region is blank, move on -int calculate_window_size(int branch_genome_size, int number_of_branch_snps,int window_min, int window_max) +int calculate_window_size(int branch_genome_size, int number_of_branch_snps,int window_min, int window_max, int min_snps, int window_factor) { int window_size = 0; if(number_of_branch_snps == 0) @@ -361,7 +437,7 @@ int calculate_window_size(int branch_genome_size, int number_of_branch_snps,int return window_min; } - window_size = (int) ((branch_genome_size*1.0)/(number_of_branch_snps*1.0/WINDOW_SNP_MODE_TARGET)); + window_size = (int) ((branch_genome_size*1.0)/(window_factor*number_of_branch_snps*1.0/(min_snps - 1))); if(window_size < window_min) { @@ -376,153 +452,242 @@ int calculate_window_size(int branch_genome_size, int number_of_branch_snps,int } -void get_likelihood_for_windows(char * child_sequence, int length_of_sequence, int * snp_site_coords, int branch_genome_size, int number_of_branch_snps, int * snp_locations, newick_node * current_node, FILE * block_file_pointer, newick_node *root, char * branch_snp_sequence, FILE * gff_file_pointer, int min_snps, int length_of_original_genome, char * original_sequence,int window_min, int window_max) +void get_likelihood_for_windows(char * child_sequence, int length_of_sequence, int * snp_site_coords, int branch_genome_size, int number_of_branch_snps, int * snp_locations, newick_node * current_node, FILE * block_file_pointer, newick_node *root, char * branch_snp_sequence, FILE * gff_file_pointer, int min_snps, int length_of_original_genome, char * original_sequence,int window_min, int window_max, float uncorrected_p_value, float trimming_ratio, int extensive_search_flag) { - int i = 0; - int window_size = 0; - int number_of_snps_in_block = 0; - int block_genome_size_without_gaps = 0; - double branch_snp_density = 0.0; - double block_snp_density = 0.0; - int number_of_blocks = 0 ; - int original_branch_genome_size = branch_genome_size; - - // 
place to store coordinates of recombinations snps - current_node->recombinations = (int *) calloc((number_of_branch_snps+1),sizeof(int)); - - int number_of_windows = (branch_genome_size/window_min) + 1; - int * block_coordinates[4]; - block_coordinates[0] = (int *) calloc((number_of_windows+1),sizeof(int)); - block_coordinates[1] = (int *) calloc((number_of_windows+1),sizeof(int)); - block_coordinates[2] = (int *) calloc((number_of_windows+1),sizeof(int)); - block_coordinates[3] = (int *) calloc((number_of_windows+1),sizeof(int)); - - double * block_likelihoods; - block_likelihoods = (double *) calloc((number_of_windows+1),sizeof(double)); - - while(number_of_branch_snps > min_snps) - { - if(number_of_branch_snps <= min_snps) - { - free(block_coordinates[0]) ; - free(block_coordinates[1]) ; - free(block_coordinates[2]) ; - free(block_coordinates[3]) ; - free(block_likelihoods); - return; - } - branch_snp_density = snp_density(branch_genome_size, number_of_branch_snps); - - window_size = calculate_window_size(branch_genome_size, number_of_branch_snps,window_min,window_max); - - int cutoff = calculate_cutoff(branch_genome_size, window_size, number_of_branch_snps); - - number_of_blocks = get_blocks(block_coordinates, length_of_original_genome, snp_site_coords, number_of_branch_snps, window_size, cutoff, child_sequence,snp_locations,length_of_sequence); - + + // define variables + int i = 0; + int window_size = window_max; + int number_of_snps_in_block = 0; + int block_genome_size_without_gaps = 0; + double branch_snp_density = 0.0; + double block_snp_density = 0.0; + int number_of_blocks = 0 ; + int original_branch_genome_size = branch_genome_size; + + // place to store coordinates of recombinations snps + current_node->recombinations = (int *) calloc((number_of_branch_snps+1),sizeof(int)); + + // place to store candidate recombination information + int number_of_windows = (branch_genome_size/window_min) + 1; + int * block_coordinates[4]; + block_coordinates[0] = (int 
*) calloc((number_of_windows+1),sizeof(int)); + block_coordinates[1] = (int *) calloc((number_of_windows+1),sizeof(int)); + block_coordinates[2] = (int *) calloc((number_of_windows+1),sizeof(int)); + block_coordinates[3] = (int *) calloc((number_of_windows+1),sizeof(int)); + + // place to store candidate block likelihoods + double * block_likelihoods; + block_likelihoods = (double *) calloc((number_of_windows+1),sizeof(double)); + + // iterate over SNPs + // Keep searching while there is the possibility of detecting a small recombination containing + // the minimum number of SNPs + int window_factor = 1; + int cutoff = min_snps - 1; + int previous_cutoff = number_of_branch_snps + 1; + while(number_of_branch_snps >= min_snps && window_size > window_min) + { + + // return SNP density as double + branch_snp_density = snp_density(branch_genome_size, number_of_branch_snps); + + // return sensible window size + window_size = calculate_window_size(branch_genome_size, + number_of_branch_snps, + window_min, + window_max, + min_snps, + window_factor); + + // return a cutoff number of SNPs in a window + // for extensive search, this is every window containing min SNPs count + // otherwise focus only on windows likely to exceed the statistical threshold + // for detecting recombination (faster) + if (extensive_search_flag == 0) + { + cutoff = calculate_cutoff(branch_genome_size, + window_size, + number_of_branch_snps, + min_snps, + uncorrected_p_value); + } + + // If returned cutoff == 0, then exit the program and error + if (cutoff == 0) + { + fprintf(stderr, + "Cannot identify recombinations on at least one branch with window size %d; try increasing this value\n", + window_size); + exit(EXIT_FAILURE); + } - for(i = 0; i < number_of_blocks; i++) - { - number_of_snps_in_block = find_number_of_snps_in_block(block_coordinates[0][i], block_coordinates[1][i], snp_site_coords, branch_snp_sequence, number_of_branch_snps); - block_genome_size_without_gaps = 
calculate_block_size_without_gaps(child_sequence, snp_locations, block_coordinates[0][i], block_coordinates[1][i], length_of_sequence); + // Test if reduction in window size has reduced the cutoff + int number_of_candidate_blocks = 0; + if (cutoff < previous_cutoff) + { + // populate block coordinate data structure by identifying windows containing + // a greater number of SNPs than the threshold and trimming them based on SNP + // positions + number_of_blocks = get_blocks(block_coordinates, + length_of_original_genome, + snp_site_coords, + number_of_branch_snps, + window_size, + cutoff, + child_sequence, + snp_locations, + length_of_sequence); + + // iterate over blocks + for(i = 0; i < number_of_blocks; i++) + { + // get number of SNPs in block + number_of_snps_in_block = find_number_of_snps_in_block(block_coordinates[0][i], + block_coordinates[1][i], + snp_site_coords, + branch_snp_sequence, + number_of_branch_snps); + + // get number of bases in block + block_genome_size_without_gaps = calculate_block_size_without_gaps(child_sequence, + snp_locations, + block_coordinates[0][i], + block_coordinates[1][i], + length_of_sequence); + + // minimum number of snps to be statistically significant in block + if(number_of_snps_in_block < min_snps) + { + block_coordinates[0][i] = -1; + block_coordinates[1][i] = -1; + continue; + } - // minimum number of snps to be statistically significant in block - if(number_of_snps_in_block <= min_snps) - { - block_coordinates[0][i] = -1; - block_coordinates[1][i] = -1; - continue; - } - - block_snp_density = snp_density(block_genome_size_without_gaps, number_of_snps_in_block); - // region with low number of snps so skip over - if(block_snp_density <= branch_snp_density) - { - block_coordinates[0][i] = -1; - block_coordinates[1][i] = -1; - continue; - } - - block_likelihoods[i] = get_block_likelihood(branch_genome_size, number_of_branch_snps, block_genome_size_without_gaps, number_of_snps_in_block); - block_coordinates[2][i] = (int) 
block_likelihoods[i]; - block_coordinates[3][i] = block_genome_size_without_gaps; - } + // calculate SNP density of block + block_snp_density = snp_density(block_genome_size_without_gaps, number_of_snps_in_block); + + // region with low number of snps so skip over + if(block_snp_density <= branch_snp_density) + { + block_coordinates[0][i] = -1; + block_coordinates[1][i] = -1; + continue; + } - move_blocks_inwards_while_likelihood_improves(number_of_blocks,block_coordinates, min_snps, snp_site_coords, number_of_branch_snps, branch_snp_sequence, snp_locations, branch_genome_size, child_sequence, length_of_sequence,block_likelihoods,cutoff); + // calculate block log likelihood under null model + block_likelihoods[i] = get_block_likelihood(branch_genome_size, + number_of_branch_snps, + block_genome_size_without_gaps, + number_of_snps_in_block); + block_coordinates[2][i] = (int) block_likelihoods[i]; // casts double log likelihood to int + block_coordinates[3][i] = block_genome_size_without_gaps; + + } - int * candidate_blocks[4]; - candidate_blocks[0] = (int *) calloc((number_of_blocks+1),sizeof(int)); - candidate_blocks[1] = (int *) calloc((number_of_blocks+1),sizeof(int)); - candidate_blocks[2] = (int *) calloc((number_of_blocks+1),sizeof(int)); - candidate_blocks[3] = (int *) calloc((number_of_blocks+1),sizeof(int)); - - double * candidate_block_likelihoods; - candidate_block_likelihoods = (double *) calloc((number_of_blocks+1),sizeof(double)); - - int number_of_candidate_blocks = 0; + // trim the edges of candidate recombinations + move_blocks_inwards_while_likelihood_improves(number_of_blocks, + block_coordinates, + min_snps, + snp_site_coords, + number_of_branch_snps, + branch_snp_sequence, + snp_locations, + branch_genome_size, + child_sequence, + length_of_sequence, + block_likelihoods, + cutoff, + trimming_ratio); + + int * candidate_blocks[4]; + double * candidate_block_likelihoods; + + candidate_blocks[0] = (int *) calloc((number_of_blocks+1),sizeof(int)); + 
candidate_blocks[1] = (int *) calloc((number_of_blocks+1),sizeof(int)); + candidate_blocks[2] = (int *) calloc((number_of_blocks+1),sizeof(int)); + candidate_blocks[3] = (int *) calloc((number_of_blocks+1),sizeof(int)); + + candidate_block_likelihoods = (double *) calloc((number_of_blocks+1),sizeof(double)); + + for(i = 0 ; i < number_of_blocks; i++) + { + if(block_coordinates[0][i] == -1 || block_coordinates[1][i] == -1) + { + continue; + } + int current_start = block_coordinates[0][i]; + int current_end = block_coordinates[1][i]; + int block_snp_count = find_number_of_snps_in_block(current_start, + current_end, + snp_site_coords, + branch_snp_sequence, + number_of_branch_snps); + int block_genome_size_without_gaps = block_coordinates[3][i]; + + if(p_value_test(branch_genome_size, block_genome_size_without_gaps, number_of_branch_snps, block_snp_count, min_snps, uncorrected_p_value) == 1) + { + + candidate_blocks[0][number_of_candidate_blocks] = block_coordinates[0][i]; + candidate_blocks[1][number_of_candidate_blocks] = block_coordinates[1][i]; + // TODO use a float in a struct here, should be okay for the moment but assumes that there will be a clear integer difference between best and second best + candidate_blocks[2][number_of_candidate_blocks] = block_coordinates[2][i]; + candidate_blocks[3][number_of_candidate_blocks] = block_genome_size_without_gaps; + + candidate_block_likelihoods[number_of_candidate_blocks] = block_likelihoods[i]; + number_of_candidate_blocks++; + } + } + + if(number_of_candidate_blocks > 0) + { + // remove recombination with smallest log likelihood and + // correspondingly reduce the number of branch SNPs + number_of_branch_snps = flag_smallest_log_likelihood_recombinations(candidate_blocks, + number_of_candidate_blocks, + number_of_branch_snps, + snp_site_coords, + current_node->recombinations, + current_node->num_recombinations, + current_node, + block_file_pointer, + root, + snp_locations, + length_of_sequence, + gff_file_pointer, + 
candidate_block_likelihoods); + + branch_genome_size = original_branch_genome_size - current_node->total_bases_removed_excluding_gaps; - for(i = 0 ; i < number_of_blocks; i++) - { - if(block_coordinates[0][i] == -1 || block_coordinates[1][i] == -1) - { - continue; - } - int current_start = block_coordinates[0][i]; - int current_end = block_coordinates[1][i]; - int block_snp_count = find_number_of_snps_in_block(current_start, current_end, snp_site_coords, branch_snp_sequence, number_of_branch_snps); - int block_genome_size_without_gaps = block_coordinates[3][i]; + } + + free(candidate_blocks[0]); + free(candidate_blocks[1]); + free(candidate_blocks[2]); + free(candidate_blocks[3]); + free(candidate_block_likelihoods); - if(p_value_test(branch_genome_size, block_genome_size_without_gaps, number_of_branch_snps, block_snp_count, min_snps) == 1) - { - - candidate_blocks[0][number_of_candidate_blocks] = block_coordinates[0][i]; - candidate_blocks[1][number_of_candidate_blocks] = block_coordinates[1][i]; - // TODO use a float in a struct here, should be okay for the moment but assumes that there will be a clear integer difference between best and second best - candidate_blocks[2][number_of_candidate_blocks] = block_coordinates[2][i]; - candidate_blocks[3][number_of_candidate_blocks] = block_genome_size_without_gaps; - - candidate_block_likelihoods[number_of_candidate_blocks] = block_likelihoods[i]; - number_of_candidate_blocks++; - } - } - if(number_of_candidate_blocks == 0 ) - { - free(block_coordinates[0]) ; - free(block_coordinates[1]) ; - free(block_coordinates[2]) ; - free(block_coordinates[3]) ; - free(block_likelihoods); - free(candidate_blocks[0]); - free(candidate_blocks[1]); - free(candidate_blocks[2]); - free(candidate_blocks[3]); - free(candidate_block_likelihoods); - - int new_recombination_size = (current_node->num_recombinations+1)*sizeof(int); - if(new_recombination_size > 1024) - { - current_node->recombinations = (int *) 
realloc(current_node->recombinations, new_recombination_size); - } - return; - } - number_of_branch_snps = flag_smallest_log_likelihood_recombinations(candidate_blocks, number_of_candidate_blocks, number_of_branch_snps, snp_site_coords, current_node->recombinations, current_node->num_recombinations,current_node, block_file_pointer, root, snp_locations, length_of_sequence,gff_file_pointer,candidate_block_likelihoods ); - branch_genome_size = original_branch_genome_size - current_node->total_bases_removed_excluding_gaps; - free(candidate_blocks[0]); - free(candidate_blocks[1]); - free(candidate_blocks[2]); - free(candidate_blocks[3]); - free(candidate_block_likelihoods); - - } - free(block_coordinates[0]) ; - free(block_coordinates[1]) ; - free(block_coordinates[2]) ; - free(block_coordinates[3]) ; - free(block_likelihoods); - int new_recombination_size = (current_node->num_recombinations+1)*sizeof(int); - if(new_recombination_size > 1024) - { - current_node->recombinations = (int *) realloc(current_node->recombinations, new_recombination_size); - } + } + + if(number_of_candidate_blocks == 0) + { + window_factor = window_factor * 2; + previous_cutoff = cutoff; + } + + } + + free(block_coordinates[0]) ; + free(block_coordinates[1]) ; + free(block_coordinates[2]) ; + free(block_coordinates[3]) ; + free(block_likelihoods); + + int new_recombination_size = (current_node->num_recombinations+1)*sizeof(int); + if(new_recombination_size > 1024) + { + current_node->recombinations = (int *) realloc(current_node->recombinations, new_recombination_size); + } } int extend_upper_part_of_window(int starting_coord, int initial_max_coord, int genome_size, int8_t * gaps_in_original_genome_space) @@ -561,110 +726,123 @@ int extend_lower_part_of_window(int starting_coord, int initial_min_coord, int g int get_blocks(int ** block_coordinates, int genome_size,int * snp_site_coords,int number_of_branch_snps, int window_size, int cutoff, char * original_sequence, int * snp_locations, int 
number_of_snps) { - // Set up the window counter with 1 value per base in the branch + // Set up the window counter with 1 value per base in the branch int * window_count; window_count = (int *) calloc((genome_size+1),sizeof(int)); - - // Integer array with location of gaps + + // Integer array with location of gaps int8_t * gaps_in_original_genome_space; - gaps_in_original_genome_space = (int8_t *) calloc((genome_size+1),sizeof(int8_t)); - int x =0; - for(x=0; x< number_of_snps; x++) - { - if((original_sequence[x] == 'N' || original_sequence[x] == '-' ) && snp_locations[x] != 0) - { - gaps_in_original_genome_space[snp_locations[x]-1] = 1; - } - } - - - // create the pileup of the snps and their sphere of influence - int snp_counter = 0; - for(snp_counter = 0; snp_counter < number_of_branch_snps; snp_counter++) - { - int j = 0; - // Lower bound of the window around a snp - int snp_sliding_window_counter = snp_site_coords[snp_counter]-(window_size/2); - - snp_sliding_window_counter = extend_lower_part_of_window(snp_site_coords[snp_counter] - 1 , snp_sliding_window_counter, genome_size, gaps_in_original_genome_space); - - if(snp_sliding_window_counter < 0) - { - snp_sliding_window_counter = 0; - } - - // Upper bound of the window around a snp - int max_snp_sliding_window_counter = snp_site_coords[snp_counter]+(window_size/2); - max_snp_sliding_window_counter = extend_upper_part_of_window(snp_site_coords[snp_counter] + 1, max_snp_sliding_window_counter, genome_size, gaps_in_original_genome_space); - - if(max_snp_sliding_window_counter>genome_size) - { - max_snp_sliding_window_counter = genome_size; - } - - for(j = snp_sliding_window_counter; j < max_snp_sliding_window_counter; j++) - { - window_count[j] += 1; - } - } - - int number_of_blocks = 0; - int in_block = 0; - int block_lower_bound = 0; - // Scan across the pileup and record where blocks are above the cutoff - int i; - for(i = 0; i < genome_size; i++) - { - // Just entered the start of a block - 
if(window_count[i] > cutoff && in_block == 0) - { - block_lower_bound = i; - in_block = 1; + gaps_in_original_genome_space = (int8_t *) calloc((genome_size+1),sizeof(int8_t)); + int x = 0; + for(x=0; x< number_of_snps; x++) + { + if((original_sequence[x] == 'N' || original_sequence[x] == '-' ) && snp_locations[x] != 0) + { + gaps_in_original_genome_space[snp_locations[x]-1] = 1; + } } - // Just left a block - if((window_count[i] <= cutoff || i+1 == genome_size ) && in_block == 1) - { - block_coordinates[0][number_of_blocks] = block_lower_bound; - block_coordinates[1][number_of_blocks] = i-1; - number_of_blocks++; - in_block = 0; - } - - } + // create the pileup of the snps and their sphere of influence + int snp_counter = 0; + for(snp_counter = 0; snp_counter < number_of_branch_snps; snp_counter++) + { + int j = 0; + // Lower bound of the window around a snp + int snp_sliding_window_counter = snp_site_coords[snp_counter]-(window_size/2); - // Move blocks inwards to next SNP - for(i = 0; i < number_of_blocks; i++) - { - for(snp_counter = 0; snp_counter < number_of_branch_snps; snp_counter++) - { - if(snp_site_coords[snp_counter] >= block_coordinates[0][i] ) - { - block_coordinates[0][i] = snp_site_coords[snp_counter]; - break; - } - } - - for(snp_counter = number_of_branch_snps-1; snp_counter >= 0 ; snp_counter--) - { - if(snp_site_coords[snp_counter] <= block_coordinates[1][i] ) - { - block_coordinates[1][i] = snp_site_coords[snp_counter]; - break; - } - } - } + snp_sliding_window_counter = extend_lower_part_of_window(snp_site_coords[snp_counter] - 1 , + snp_sliding_window_counter, + genome_size, + gaps_in_original_genome_space); + + if(snp_sliding_window_counter < 0) + { + snp_sliding_window_counter = 0; + } + + // Upper bound of the window around a snp + int max_snp_sliding_window_counter = snp_site_coords[snp_counter]+(window_size/2); + max_snp_sliding_window_counter = extend_upper_part_of_window(snp_site_coords[snp_counter] + 1, + 
max_snp_sliding_window_counter, + genome_size, + gaps_in_original_genome_space); + + if(max_snp_sliding_window_counter>genome_size) + { + max_snp_sliding_window_counter = genome_size; + } + + for(j = snp_sliding_window_counter; j < max_snp_sliding_window_counter; j++) + { + window_count[j] += 1; + } + + } + + int number_of_blocks = 0; + int in_block = 0; + int block_lower_bound = 0; + // Scan across the pileup and record where blocks are above the cutoff + int i; + for(i = 0; i <= genome_size; i++) + { + // Just entered the start of a block + if(window_count[i] > cutoff && in_block == 0) + { + block_lower_bound = i; + in_block = 1; + } + + // Reached end of genome + if(i == genome_size && in_block == 1) + { + block_coordinates[0][number_of_blocks] = block_lower_bound; + block_coordinates[1][number_of_blocks] = i; + number_of_blocks++; + in_block = 0; + } + // Just left a block + else if(window_count[i] <= cutoff && in_block == 1) + { + block_coordinates[0][number_of_blocks] = block_lower_bound; + block_coordinates[1][number_of_blocks] = i-1; + number_of_blocks++; + in_block = 0; + } + + } + // Move blocks inwards to next SNP + for(i = 0; i < number_of_blocks; i++) + { + for(snp_counter = 0; snp_counter < number_of_branch_snps; snp_counter++) + { + if(snp_site_coords[snp_counter] >= block_coordinates[0][i] ) + { + block_coordinates[0][i] = snp_site_coords[snp_counter]; + break; + } + } + + for(snp_counter = number_of_branch_snps-1; snp_counter >= 0 ; snp_counter--) + { + if(snp_site_coords[snp_counter] <= block_coordinates[1][i] ) + { + block_coordinates[1][i] = snp_site_coords[snp_counter]; + break; + } + } + } - free(gaps_in_original_genome_space); - free(window_count); - return number_of_blocks; + free(gaps_in_original_genome_space); + free(window_count); + return number_of_blocks; } -void move_blocks_inwards_while_likelihood_improves(int number_of_blocks,int ** block_coordinates, int min_snps, int * snp_site_coords, int number_of_branch_snps,char * 
branch_snp_sequence, int * snp_locations, int branch_genome_size,char * child_sequence, int length_of_sequence, double * block_likelihoods, int cutoff_value) +void move_blocks_inwards_while_likelihood_improves(int number_of_blocks,int ** block_coordinates, int min_snps, int * snp_site_coords, int number_of_branch_snps,char * branch_snp_sequence, int * snp_locations, int branch_genome_size,char * child_sequence, int length_of_sequence, double * block_likelihoods, int cutoff_value, float trimming_ratio) { int i; @@ -705,9 +883,9 @@ void move_blocks_inwards_while_likelihood_improves(int number_of_blocks,int ** b int block_genome_size_without_gaps = block_coordinates[3][i]; int block_snp_count; - int next_start_position = current_start; + int next_start_position = current_start; int start_index = find_starting_index( current_start, snp_site_coords,0, number_of_branch_snps); - int end_index = find_starting_index( current_end, snp_site_coords, start_index, number_of_branch_snps); + int end_index = find_starting_index( current_end, snp_site_coords, start_index, number_of_branch_snps); block_snp_count = find_number_of_snps_in_block_with_start_end_index(current_start, current_end, snp_site_coords, branch_snp_sequence, number_of_branch_snps,start_index,end_index); @@ -722,39 +900,41 @@ void move_blocks_inwards_while_likelihood_improves(int number_of_blocks,int ** b // Move left inwards while the likelihood gets better + // to avoid the current asymmetry in which the two edges are treated, we + // should trim SNPs from each edge in an alternating order while(current_start < current_end && block_snp_count >= min_snps) { - next_start_position++; - next_start_position = advance_window_start_to_next_snp(next_start_position, snp_site_coords, branch_snp_sequence, number_of_branch_snps); + next_start_position++; + next_start_position = advance_window_start_to_next_snp(next_start_position, snp_site_coords, branch_snp_sequence, number_of_branch_snps); - if(next_start_position >= 
current_end) - { - break; - } - if(next_start_position <= current_start) - { - break; - } + if(next_start_position >= current_end) + { + break; + } + if(next_start_position <= current_start) + { + break; + } - int previous_block_snp_count = block_snp_count; - int previous_block_genome_size_without_gaps = block_genome_size_without_gaps; - block_snp_count = find_number_of_snps_in_block_with_start_end_index(next_start_position, current_end, snp_site_coords, branch_snp_sequence, number_of_branch_snps,start_index,end_index); - block_genome_size_without_gaps = calculate_block_size_without_gaps(child_sequence, snp_locations, next_start_position, current_end, length_of_sequence); + int previous_block_snp_count = block_snp_count; + int previous_block_genome_size_without_gaps = block_genome_size_without_gaps; + block_snp_count = find_number_of_snps_in_block_with_start_end_index(next_start_position, current_end, snp_site_coords, branch_snp_sequence, number_of_branch_snps,start_index,end_index); + block_genome_size_without_gaps = calculate_block_size_without_gaps(child_sequence, snp_locations, next_start_position, current_end, length_of_sequence); - double next_block_likelihood = get_block_likelihood(branch_genome_size, number_of_branch_snps, block_genome_size_without_gaps, block_snp_count); + double next_block_likelihood = get_block_likelihood(branch_genome_size, number_of_branch_snps, block_genome_size_without_gaps, block_snp_count); - if(next_block_likelihood <= current_block_likelihood) - { - current_block_likelihood = next_block_likelihood; - current_start = next_start_position; - start_index++; - } - else - { - block_snp_count = previous_block_snp_count; - block_genome_size_without_gaps = previous_block_genome_size_without_gaps; - break; - } + if(next_block_likelihood*trimming_ratio <= current_block_likelihood && block_snp_count >= min_snps) + { + current_block_likelihood = next_block_likelihood; + current_start = next_start_position; + start_index++; + } + else + { + 
block_snp_count = previous_block_snp_count; + block_genome_size_without_gaps = previous_block_genome_size_without_gaps; + break; + } } int next_end_position = current_end; @@ -762,45 +942,45 @@ void move_blocks_inwards_while_likelihood_improves(int number_of_blocks,int ** b // Move Right inwards while the likelihood gets better while(current_start < current_end && block_snp_count >= min_snps) { - next_end_position--; - next_end_position = rewind_window_end_to_last_snp(next_end_position, snp_site_coords, branch_snp_sequence, number_of_branch_snps); + next_end_position--; + next_end_position = rewind_window_end_to_last_snp(next_end_position, snp_site_coords, branch_snp_sequence, number_of_branch_snps); - if(next_end_position <= current_start ) - { - break; - } - if(next_end_position >= current_end) - { - break; - } + if(next_end_position <= current_start ) + { + break; + } + if(next_end_position >= current_end) + { + break; + } - int previous_block_snp_count = block_snp_count; - int previous_block_genome_size_without_gaps = block_genome_size_without_gaps; - block_snp_count = find_number_of_snps_in_block(current_start, next_end_position, snp_site_coords, branch_snp_sequence, number_of_branch_snps); - block_genome_size_without_gaps = calculate_block_size_without_gaps(child_sequence, snp_locations, current_start, next_end_position, length_of_sequence); + int previous_block_snp_count = block_snp_count; + int previous_block_genome_size_without_gaps = block_genome_size_without_gaps; + block_snp_count = find_number_of_snps_in_block(current_start, next_end_position, snp_site_coords, branch_snp_sequence, number_of_branch_snps); + block_genome_size_without_gaps = calculate_block_size_without_gaps(child_sequence, snp_locations, current_start, next_end_position, length_of_sequence); - double next_block_likelihood = get_block_likelihood(branch_genome_size, number_of_branch_snps, block_genome_size_without_gaps, block_snp_count); - if(next_block_likelihood <= 
current_block_likelihood) - { - current_block_likelihood = next_block_likelihood; - current_end = next_end_position; - end_index--; - } - else - { - block_snp_count = previous_block_snp_count; - block_genome_size_without_gaps = previous_block_genome_size_without_gaps; - break; - } + double next_block_likelihood = get_block_likelihood(branch_genome_size, number_of_branch_snps, block_genome_size_without_gaps, block_snp_count); + if(next_block_likelihood*trimming_ratio <= current_block_likelihood && block_snp_count >= min_snps) + { + current_block_likelihood = next_block_likelihood; + current_end = next_end_position; + end_index--; + } + else + { + block_snp_count = previous_block_snp_count; + block_genome_size_without_gaps = previous_block_genome_size_without_gaps; + break; + } } - block_coordinates[0][i] = current_start; - block_coordinates[1][i] = current_end; - block_coordinates[2][i] = (int) current_block_likelihood; - block_coordinates[3][i] = block_genome_size_without_gaps; - - block_likelihoods[i] = current_block_likelihood; + block_coordinates[0][i] = current_start; + block_coordinates[1][i] = current_end; + block_coordinates[2][i] = (int) current_block_likelihood; + block_coordinates[3][i] = block_genome_size_without_gaps; + + block_likelihoods[i] = current_block_likelihood; } } @@ -890,24 +1070,24 @@ double snp_density(int length_of_sequence, int number_of_snps) return number_of_snps*1.0/length_of_sequence; } - - - -double calculate_threshold(int branch_genome_size, int window_size) +// calculate an approximate p value threshold corrected to multiple testing +double calculate_threshold(int branch_genome_size, int window_size, float uncorrected_p_value) { - return 1-(RANDOMNESS_DAMPNER/((branch_genome_size*1.0)/((window_size*1.0)/WINDOW_SNP_MODE_TARGET))); + return 1-(uncorrected_p_value/((branch_genome_size*1.0)/(window_size*1.0))); } -int calculate_cutoff(int branch_genome_size, int window_size, int num_branch_snps) +int calculate_cutoff(int 
branch_genome_size, int window_size, int num_branch_snps, int min_snps, float uncorrected_p_value) { double threshold = 0.0; int cutoff = 0; double pvalue = 0.0; double part1, part2, part3 = 0.0; - threshold = calculate_threshold(branch_genome_size, window_size); + threshold = calculate_threshold(branch_genome_size, + window_size, + uncorrected_p_value); - while( pvalue <= threshold) + while(pvalue <= threshold) { part1 = reduce_factorial(window_size,cutoff)-reduce_factorial(cutoff,cutoff); part2 = log10((num_branch_snps*1.0)/branch_genome_size)*cutoff; @@ -917,11 +1097,23 @@ int calculate_cutoff(int branch_genome_size, int window_size, int num_branch_snp } cutoff--; - + if (cutoff < min_snps) + { + cutoff = min_snps - 1; + } + + // End if the SNP density of the branch is too high for the specified window size + if (cutoff >= 2*(int)(window_size/2)) // Account for integer division/rounding in this condition + { + return 0; // In this case, it is impossible to call recombinations on the branch + } + +// printf("Window size %i cutoff %i num_snps %i\n", window_size,cutoff,num_branch_snps); + return cutoff; } -int p_value_test(int branch_genome_size, int window_size, int num_branch_snps, int block_snp_count, int min_snps) +int p_value_test(int branch_genome_size, int window_size, int num_branch_snps, int block_snp_count, int min_snps, float uncorrected_p_value) { double threshold = 0.0; int cutoff = 0; @@ -933,7 +1125,7 @@ int p_value_test(int branch_genome_size, int window_size, int num_branch_snps, i return 0; } - threshold = 0.05/branch_genome_size; + threshold = uncorrected_p_value/branch_genome_size; while( cutoff < block_snp_count) { diff --git a/src/branch_sequences.h b/src/branch_sequences.h index f4937a38..1ab664f3 100644 --- a/src/branch_sequences.h +++ b/src/branch_sequences.h @@ -21,26 +21,26 @@ #define _BRANCH_SEQUENCES_H_ #include "seqUtil.h" #include "Newickform.h" -char *generate_branch_sequences(newick_node *root, FILE *vcf_file_pointer,int * 
snp_locations, int number_of_snps, char** column_names, int number_of_columns, char * leaf_sequence, int length_of_original_genome, FILE * block_file_pointer, FILE * gff_file_pointer,int min_snps, FILE * branch_snps_file_pointer, int window_min, int window_max); +char *generate_branch_sequences(newick_node *root, FILE *vcf_file_pointer,int * snp_locations, int number_of_snps, char** column_names, int number_of_columns, char * leaf_sequence, int length_of_original_genome, FILE * block_file_pointer, FILE * gff_file_pointer,int min_snps, FILE * branch_snps_file_pointer, int window_min, int window_max, float uncorrected_p_value, float trimming_ratio, int extensive_search_flag); void identify_recombinations(int number_of_branch_snps, int * branches_snp_sites,int length_of_original_genome); double calculate_snp_density(int * branches_snp_sites, int number_of_branch_snps, int index); -void get_likelihood_for_windows(char * child_sequence, int length_of_sequence, int * snp_site_coords, int branch_genome_size, int number_of_branch_snps, int * snp_locations, newick_node * current_node, FILE * block_file_pointer, newick_node *root, char * branch_snp_sequence, FILE * gff_file_pointer,int min_snps, int length_of_original_genome, char * original_sequence,int window_min, int window_max); +void get_likelihood_for_windows(char * child_sequence, int length_of_sequence, int * snp_site_coords, int branch_genome_size, int number_of_branch_snps, int * snp_locations, newick_node * current_node, FILE * block_file_pointer, newick_node *root, char * branch_snp_sequence, FILE * gff_file_pointer,int min_snps, int length_of_original_genome, char * original_sequence,int window_min, int window_max, float uncorrected_p_value, float trimming_ratio, int extensive_search_flag); double get_block_likelihood(int branch_genome_size, int number_of_branch_snps, int block_genome_size_without_gaps, int number_of_block_snps); -int calculate_window_size(int branch_genome_size, int number_of_branch_snps,int 
window_min, int window_max); -double calculate_threshold(int branch_genome_size, int window_size); -int p_value_test(int branch_genome_size, int window_size, int num_branch_snps, int block_snp_count, int min_snps); +int calculate_window_size(int branch_genome_size, int number_of_branch_snps,int window_min, int window_max, int min_snps, int window_factor); +double calculate_threshold(int branch_genome_size, int window_size, float uncorrected_p_value); +int p_value_test(int branch_genome_size, int window_size, int num_branch_snps, int block_snp_count, int min_snps, float uncorrected_p_value); double reduce_factorial(int l, int i); void fill_in_recombinations_with_gaps(newick_node *root, int * parent_recombinations, int parent_num_recombinations, int current_total_snps,int num_blocks, int ** current_block_coordinates,int length_of_original_genome,int * snp_locations, int number_of_snps); int copy_and_concat_integer_arrays(int * array_1, int array_1_size, int * array_2, int array_2_size, int * output_array); int copy_and_concat_2d_integer_arrays(int ** array_1, int array_1_size, int ** array_2, int array_2_size, int ** output_array); double snp_density(int length_of_sequence, int number_of_snps); -int calculate_cutoff(int branch_genome_size, int window_size, int num_branch_snps); +int calculate_cutoff(int branch_genome_size, int window_size, int num_branch_snps, int min_snps, float uncorrected_p_value); int get_smallest_log_likelihood(double * candidate_blocks, int number_of_candidate_blocks); int exclude_snp_sites_in_block(int window_start_coordinate, int window_end_coordinate, int * snp_site_coords, int number_of_branch_snps); int flag_smallest_log_likelihood_recombinations(int ** candidate_blocks, int number_of_candidate_blocks, int number_of_branch_snps, int * snp_site_coords, int * recombinations, int number_of_recombinations,newick_node * current_node, FILE * block_file_pointer, newick_node *root,int * snp_locations, int total_num_snps, FILE * gff_file_pointer, 
double * block_likelihooods); int calculate_number_of_bases_in_recombations_excluding_gaps(int ** block_coordinates, int num_blocks,char * child_sequence, int * snp_locations,int length_of_original_genome); void carry_unambiguous_gaps_up_tree(newick_node *root); -void move_blocks_inwards_while_likelihood_improves(int number_of_blocks,int ** block_coordinates, int min_snps, int * snp_site_coords, int number_of_branch_snps,char * branch_snp_sequence, int * snp_locations, int branch_genome_size,char * child_sequence, int length_of_sequence, double * block_likelihoods, int cutoff_value); +void move_blocks_inwards_while_likelihood_improves(int number_of_blocks,int ** block_coordinates, int min_snps, int * snp_site_coords, int number_of_branch_snps,char * branch_snp_sequence, int * snp_locations, int branch_genome_size,char * child_sequence, int length_of_sequence, double * block_likelihoods, int cutoff_value, float trimming_ratio); int get_blocks(int ** block_coordinates, int branch_genome_size,int * snp_site_coords,int number_of_branch_snps, int window_size, int cutoff, char * original_sequence, int * snp_locations, int number_of_snps); int extend_lower_part_of_window(int starting_coord, int initial_min_coord, int genome_size, int8_t * gaps_in_original_genome_space); int extend_upper_part_of_window(int starting_coord, int initial_max_coord, int genome_size, int8_t * gaps_in_original_genome_space); @@ -48,8 +48,6 @@ int get_list_of_snp_indices_which_fall_in_downstream_recombinations(int ** curre int calculate_genome_length_excluding_blocks_and_gaps(char * sequence, int length_of_sequence, int ** block_coordinates, int num_blocks); -#define WINDOW_SNP_MODE_TARGET 10 -#define RANDOMNESS_DAMPNER 0.05 #define MAX_SAMPLE_NAME_SIZE 1024 #endif diff --git a/src/gff_file.c b/src/gff_file.c index a277a1a3..753acc2a 100644 --- a/src/gff_file.c +++ b/src/gff_file.c @@ -31,16 +31,16 @@ void print_gff_header(FILE * gff_file_pointer, int genome_length) void print_gff_line(FILE * 
gff_file_pointer, int start_coordinate, int end_coordinate, int number_of_snps, char * current_node_id, char * parent_node_id, char * taxon_names, double neg_log_likelihood) { - fprintf(gff_file_pointer, "SEQUENCE\tGUBBINS\tCDS\t"); - fprintf(gff_file_pointer, "%d\t",start_coordinate); - fprintf(gff_file_pointer, "%d\t",end_coordinate); - fprintf(gff_file_pointer, "0.000\t.\t0\t"); - - fprintf(gff_file_pointer, "node=\"%s->%s\";", parent_node_id, current_node_id ); - fprintf(gff_file_pointer, "neg_log_likelihood=\"%f\";", neg_log_likelihood); - fprintf(gff_file_pointer, "taxa=\"%s\";", taxon_names); - fprintf(gff_file_pointer, "snp_count=\"%d\";", number_of_snps); - fprintf(gff_file_pointer, "\n"); - - fflush(gff_file_pointer); + fprintf(gff_file_pointer, "SEQUENCE\tGUBBINS\tCDS\t"); + fprintf(gff_file_pointer, "%d\t",start_coordinate); + fprintf(gff_file_pointer, "%d\t",end_coordinate); + fprintf(gff_file_pointer, "0.000\t.\t0\t"); + + fprintf(gff_file_pointer, "node=\"%s->%s\";", parent_node_id, current_node_id ); + fprintf(gff_file_pointer, "neg_log_likelihood=\"%f\";", neg_log_likelihood); + fprintf(gff_file_pointer, "taxa=\"%s\";", taxon_names); + fprintf(gff_file_pointer, "snp_count=\"%d\";", number_of_snps); + fprintf(gff_file_pointer, "\n"); + + fflush(gff_file_pointer); } diff --git a/src/gubbins.c b/src/gubbins.c index 21e06938..cdc7cdea 100644 --- a/src/gubbins.c +++ b/src/gubbins.c @@ -39,16 +39,16 @@ // given a sample name extract the sequences from the vcf // compare two sequences to get pseudo sequnece and fill in with difference from reference sequence -void run_gubbins(char vcf_filename[], char tree_filename[],char multi_fasta_filename[], int min_snps, char original_multi_fasta_filename[], int window_min, int window_max) +void run_gubbins(char vcf_filename[], char tree_filename[],char multi_fasta_filename[], int min_snps, char original_multi_fasta_filename[], int window_min, int window_max, float uncorrected_p_value, float trimming_ratio, int 
extensive_search_flag) { load_sequences_from_multifasta_file(multi_fasta_filename); - extract_sequences(vcf_filename, tree_filename, multi_fasta_filename,min_snps,original_multi_fasta_filename,window_min, window_max); + extract_sequences(vcf_filename, tree_filename, multi_fasta_filename,min_snps,original_multi_fasta_filename,window_min, window_max, uncorrected_p_value, trimming_ratio, extensive_search_flag); create_tree_statistics_file(tree_filename,get_sample_statistics(),number_of_samples_from_parse_phylip()); freeup_memory(); } -void extract_sequences(char vcf_filename[], char tree_filename[],char multi_fasta_filename[],int min_snps, char original_multi_fasta_filename[], int window_min, int window_max) +void extract_sequences(char vcf_filename[], char tree_filename[],char multi_fasta_filename[],int min_snps, char original_multi_fasta_filename[], int window_min, int window_max, float uncorrected_p_value, float trimming_ratio, int extensive_search_flag) { FILE *vcf_file_pointer; vcf_file_pointer=fopen(vcf_filename, "r"); @@ -74,7 +74,7 @@ void extract_sequences(char vcf_filename[], char tree_filename[],char multi_fast get_integers_from_column_in_vcf(vcf_file_pointer, snp_locations, number_of_snps, column_number_for_column_name(column_names, "POS", number_of_columns)); - root_node = build_newick_tree(tree_filename, vcf_file_pointer,snp_locations, number_of_snps, column_names, number_of_columns, length_of_original_genome,min_snps,window_min, window_max); + root_node = build_newick_tree(tree_filename, vcf_file_pointer,snp_locations, number_of_snps, column_names, number_of_columns, length_of_original_genome,min_snps,window_min, window_max, uncorrected_p_value, trimming_ratio, extensive_search_flag); fclose(vcf_file_pointer); int* filtered_snp_locations = calloc((number_of_snps+1), sizeof(int)); diff --git a/src/gubbins.h b/src/gubbins.h index 8a362501..10c00d95 100644 --- a/src/gubbins.h +++ b/src/gubbins.h @@ -23,8 +23,8 @@ #include "seqUtil.h" #include 
"Newickform.h" -void run_gubbins(char vcf_filename[], char tree_filename[], char multi_fasta_filename[], int min_snps, char original_multi_fasta_filename[], int window_min, int window_max); -void extract_sequences(char vcf_filename[], char tree_filename[],char multi_fasta_filename[], int min_snps, char original_multi_fasta_filename[], int window_min, int window_max); +void run_gubbins(char vcf_filename[], char tree_filename[], char multi_fasta_filename[], int min_snps, char original_multi_fasta_filename[], int window_min, int window_max, float uncorrected_p_value, float trimming_ratio, int extensive_search_flag); +void extract_sequences(char vcf_filename[], char tree_filename[],char multi_fasta_filename[], int min_snps, char original_multi_fasta_filename[], int window_min, int window_max, float uncorrected_p_value, float trimming_ratio, int extensive_search_flag); char find_first_real_base(int base_position, int number_of_child_sequences, char ** child_sequences); diff --git a/src/main.c b/src/main.c index f581db27..97c52478 100644 --- a/src/main.c +++ b/src/main.c @@ -49,6 +49,8 @@ void print_usage(FILE* stream, int exit_code) " -m Min SNPs for identifying a recombination block\n" " -a Min window size\n" " -b Max window size\n" + " -p p value for detecting recombinations\n" + " -i p value ratio for trimming recombinations\n" " -h Display this usage information.\n\n" ); exit (exit_code); @@ -70,21 +72,24 @@ int check_file_exists_or_exit(char * filename) int main (argc, argv) int argc; char **argv; { - int c; - char multi_fasta_filename[MAX_FILENAME_SIZE] = {""}; - char vcf_filename[MAX_FILENAME_SIZE] = {""}; - char tree_filename[MAX_FILENAME_SIZE] = {""}; - char original_multi_fasta_filename[MAX_FILENAME_SIZE] = {""}; + int c; + char multi_fasta_filename[MAX_FILENAME_SIZE] = {""}; + char vcf_filename[MAX_FILENAME_SIZE] = {""}; + char tree_filename[MAX_FILENAME_SIZE] = {""}; + char original_multi_fasta_filename[MAX_FILENAME_SIZE] = {""}; - int recombination_flag = 0 
; - int min_snps = 3; - int window_min = 100; - int window_max = 10000; - program_name = argv[0]; + int recombination_flag = 0 ; + int min_snps = 3; + int window_min = 100; + int window_max = 10000; + float uncorrected_p_value = 0.05; + float trimming_ratio = 1.0; + int extensive_search_flag = 0; + program_name = argv[0]; - while (1) + while (1) { - static struct option long_options[] = + static struct option long_options[] = { {"help", no_argument, 0, 'h'}, {"recombination", no_argument, 0, 'r'}, @@ -94,64 +99,76 @@ int main (argc, argv) int argc; char **argv; {"min_snps", required_argument, 0, 'm'}, {"window_min", required_argument, 0, 'a'}, {"window_max", required_argument, 0, 'b'}, - - {0, 0, 0, 0} + {"p_value", required_argument, 0, 'p'}, + {"trimming_ratio", required_argument, 0, 'i'}, + {"extended_search", required_argument, 0, 'x'}, + + {0, 0, 0, 0} }; - /* getopt_long stores the option index here. */ - int option_index = 0; - c = getopt_long (argc, argv, "hrv:f:t:m:a:b:", - long_options, &option_index); - /* Detect the end of the options. */ - if (c == -1) + /* getopt_long stores the option index here. */ + int option_index = 0; + c = getopt_long (argc, argv, "hrxv:f:t:m:a:b:p:i:", + long_options, &option_index); + /* Detect the end of the options. */ + if (c == -1) break; - switch (c) + switch (c) { - case 0: - /* If this option set a flag, do nothing else now. 
*/ - if (long_options[option_index].flag != 0) - break; - printf ("option %s", long_options[option_index].name); - if (optarg) - printf (" with arg %s", optarg); - printf ("\n"); - break; - case 'h': - print_usage(stdout, EXIT_SUCCESS); - case 'r': - recombination_flag = 1; - break; - case 'f': - memcpy(original_multi_fasta_filename, optarg, size_of_string(optarg) +1); - break; - case 'v': - memcpy(vcf_filename, optarg, size_of_string(optarg) +1); - break; - case 'm': - min_snps = atoi(optarg); - break; - case 'a': - window_min = atoi(optarg); - break; - case 'b': - window_max = atoi(optarg); - break; - case 't': - memcpy(tree_filename, optarg, size_of_string(optarg) +1); - break; - case '?': - /* getopt_long already printed an error message. */ - break; - default: - abort (); + case 0: + /* If this option set a flag, do nothing else now. */ + if (long_options[option_index].flag != 0) + break; + printf ("option %s", long_options[option_index].name); + if (optarg) + printf (" with arg %s", optarg); + printf ("\n"); + break; + case 'h': + print_usage(stdout, EXIT_SUCCESS); + case 'r': + recombination_flag = 1; + break; + case 'x': + extensive_search_flag = 1; + break; + case 'f': + memcpy(original_multi_fasta_filename, optarg, size_of_string(optarg) +1); + break; + case 'v': + memcpy(vcf_filename, optarg, size_of_string(optarg) +1); + break; + case 'm': + min_snps = atoi(optarg); + break; + case 'a': + window_min = atoi(optarg); + break; + case 'b': + window_max = atoi(optarg); + break; + case 'p': + uncorrected_p_value = atof(optarg); + break; + case 'i': + trimming_ratio = atof(optarg); + break; + case 't': + memcpy(tree_filename, optarg, size_of_string(optarg) +1); + break; + case '?': + /* getopt_long already printed an error message. */ + break; + default: + abort (); } } /* Print any remaining command line arguments (not options). 
*/ if (optind < argc) { - memcpy(multi_fasta_filename, argv[optind], size_of_string(argv[optind]) +1); - optind++; + memcpy(multi_fasta_filename, argv[optind], size_of_string(argv[optind]) +1); + optind++; } check_file_exists_or_exit(multi_fasta_filename); @@ -161,7 +178,16 @@ int main (argc, argv) int argc; char **argv; check_file_exists_or_exit(vcf_filename); check_file_exists_or_exit(tree_filename); check_file_exists_or_exit(original_multi_fasta_filename); - run_gubbins(vcf_filename,tree_filename,multi_fasta_filename, min_snps,original_multi_fasta_filename,window_min, window_max); + run_gubbins(vcf_filename, + tree_filename, + multi_fasta_filename, + min_snps, + original_multi_fasta_filename, + window_min, + window_max, + uncorrected_p_value, + trimming_ratio, + extensive_search_flag); } else { diff --git a/src/parse_phylip.c b/src/parse_phylip.c index c1c96de0..24bb2aa4 100644 --- a/src/parse_phylip.c +++ b/src/parse_phylip.c @@ -61,15 +61,15 @@ int get_internal_node(int sequence_index) void get_sequence_for_sample_name(char * sequence_bases, char * sample_name) { - int sequence_index; - sequence_index = find_sequence_index_from_sample_name( sample_name); - if(sequence_index < 0) - { - printf("Couldnt find sequence name %s with index %d\n", sample_name,sequence_index); - exit(1); - } + int sequence_index; + sequence_index = find_sequence_index_from_sample_name( sample_name); + if(sequence_index < 0) + { + printf("Couldnt find sequence name %s with index %d\n", sample_name,sequence_index); + exit(1); + } - memcpy(sequence_bases, sequences[sequence_index], size_of_string(sequences[sequence_index]) +1); + memcpy(sequence_bases, sequences[sequence_index], size_of_string(sequences[sequence_index]) +1); } @@ -99,36 +99,36 @@ void fill_in_unambiguous_gaps_in_parent_from_children(int parent_sequence_index, void fill_in_unambiguous_bases_in_parent_from_children_where_parent_has_a_gap(int parent_sequence_index, int * child_sequence_indices, int num_children) { - int 
snp_counter = 0; + int snp_counter = 0; - for(snp_counter = 0; snp_counter < num_snps ; snp_counter++) - { - if(toupper(sequences[parent_sequence_index][snp_counter]) != 'N' && sequences[parent_sequence_index][snp_counter] != '-') - { - break; - } - - int child_counter = 0; + for(snp_counter = 0; snp_counter < num_snps ; snp_counter++) + { + if(toupper(sequences[parent_sequence_index][snp_counter]) != 'N' && sequences[parent_sequence_index][snp_counter] != '-') + { + break; + } + + int child_counter = 0; char comparison_base = '\0'; - for(child_counter = 0 ; child_counter < num_children ; child_counter++) - { - int child_index = child_sequence_indices[child_counter]; - if(child_counter == 0) - { - comparison_base = toupper(sequences[child_index][snp_counter]); - } - - if(comparison_base != toupper(sequences[child_index][snp_counter]) ) - { - break; - } - } - - if(toupper(sequences[parent_sequence_index][snp_counter]) != comparison_base) - { - sequences[parent_sequence_index][snp_counter] = comparison_base; - } - } + for(child_counter = 0 ; child_counter < num_children ; child_counter++) + { + int child_index = child_sequence_indices[child_counter]; + if(child_counter == 0) + { + comparison_base = toupper(sequences[child_index][snp_counter]); + } + + if(comparison_base != toupper(sequences[child_index][snp_counter]) ) + { + break; + } + } + + if(toupper(sequences[parent_sequence_index][snp_counter]) != comparison_base) + { + sequences[parent_sequence_index][snp_counter] = comparison_base; + } + } } int does_column_contain_snps(int snp_column, char reference_base) @@ -239,52 +239,52 @@ void filter_sequence_bases_and_rotate(char * reference_bases, char ** filtered_b void set_number_of_recombinations_for_sample(char * sample_name, int number_of_recombinations) { - int sample_index ; - sample_index = find_sequence_index_from_sample_name( sample_name); - if( sample_index == -1) - { - return; - } - ((sample_statistics *) 
statistics_for_samples[sample_index])->number_of_recombinations = number_of_recombinations; + int sample_index ; + sample_index = find_sequence_index_from_sample_name( sample_name); + if( sample_index == -1) + { + return; + } + ((sample_statistics *) statistics_for_samples[sample_index])->number_of_recombinations = number_of_recombinations; } void set_number_of_snps_for_sample(char * sample_name, int number_of_snps) { - int sample_index ; - sample_index = find_sequence_index_from_sample_name( sample_name); - if( sample_index == -1) - { - return; - } - - ((sample_statistics *) statistics_for_samples[sample_index])->number_of_snps = number_of_snps; + int sample_index ; + sample_index = find_sequence_index_from_sample_name( sample_name); + if( sample_index == -1) + { + return; + } + + ((sample_statistics *) statistics_for_samples[sample_index])->number_of_snps = number_of_snps; } void set_number_of_blocks_for_sample(char * sample_name,int num_blocks) { - int sample_index ; - sample_index = find_sequence_index_from_sample_name( sample_name); - if( sample_index == -1) - { - return; - } - - ((sample_statistics *) statistics_for_samples[sample_index])->number_of_blocks = num_blocks; + int sample_index ; + sample_index = find_sequence_index_from_sample_name( sample_name); + if( sample_index == -1) + { + return; + } + + ((sample_statistics *) statistics_for_samples[sample_index])->number_of_blocks = num_blocks; } void set_genome_length_without_gaps_for_sample(char * sample_name, int genome_length_without_gaps) { - int sample_index ; - sample_index = find_sequence_index_from_sample_name( sample_name); - if( sample_index == -1) - { - return; - } - - ((sample_statistics *) statistics_for_samples[sample_index])->genome_length_without_gaps = genome_length_without_gaps; + int sample_index ; + sample_index = find_sequence_index_from_sample_name( sample_name); + if( sample_index == -1) + { + return; + } + + ((sample_statistics *) 
statistics_for_samples[sample_index])->genome_length_without_gaps = genome_length_without_gaps; } void set_genome_length_excluding_blocks_and_gaps_for_sample(char * sample_name, int genome_length_excluding_blocks_and_gaps) @@ -299,15 +299,26 @@ void set_genome_length_excluding_blocks_and_gaps_for_sample(char * sample_name, ((sample_statistics *) statistics_for_samples[sample_index])->genome_length_excluding_blocks_and_gaps = genome_length_excluding_blocks_and_gaps; } +void set_number_of_branch_bases_in_recombinations(char * sample_name, int bases_in_recombinations) +{ + int sample_index ; + sample_index = find_sequence_index_from_sample_name( sample_name); + if( sample_index == -1) + { + return; + } + ((sample_statistics *) statistics_for_samples[sample_index])->branch_bases_in_recombinations = bases_in_recombinations; +} + void set_number_of_bases_in_recombinations(char * sample_name, int bases_in_recombinations) { - int sample_index ; - sample_index = find_sequence_index_from_sample_name( sample_name); - if( sample_index == -1) - { - return; - } - ((sample_statistics *) statistics_for_samples[sample_index])->bases_in_recombinations = bases_in_recombinations; + int sample_index ; + sample_index = find_sequence_index_from_sample_name( sample_name); + if( sample_index == -1) + { + return; + } + ((sample_statistics *) statistics_for_samples[sample_index])->bases_in_recombinations = bases_in_recombinations; } @@ -360,61 +371,61 @@ int number_of_snps_in_phylip() void load_sequences_from_multifasta_file(char filename[]) { - int i; + int i; - num_snps = genome_length(filename); - num_samples = number_of_sequences_in_file(filename); - - sequences = (char **) calloc((num_samples+1),sizeof(char *)); - phylip_sample_names = (char **) calloc((num_samples+1),sizeof(char *)); - - for(i = 0; i < num_samples; i++) - { - sequences[i] = (char *) calloc((num_snps+1),sizeof(char)); - phylip_sample_names[i] = (char *) calloc((MAX_SAMPLE_NAME_SIZE+1),sizeof(char)); - } - 
get_sample_names_for_header(filename, phylip_sample_names, num_samples); - - int l; - i = 0; - int sequence_number = 0; - - gzFile fp; - kseq_t *seq; - - fp = gzopen(filename, "r"); - seq = kseq_init(fp); - - while ((l = kseq_read(seq)) >= 0) - { - for(i = 0; i< num_snps; i++) - { - sequences[sequence_number][i] = toupper(((char *) seq->seq.s)[i]); - if(sequences[sequence_number][i] == 'N') - { - sequences[sequence_number][i] = '-'; - } - } - sequence_number++; - } - - kseq_destroy(seq); - gzclose(fp); - - initialise_statistics(); - initialise_internal_node(); + num_snps = genome_length(filename); + num_samples = number_of_sequences_in_file(filename); + + sequences = (char **) calloc((num_samples+1),sizeof(char *)); + phylip_sample_names = (char **) calloc((num_samples+1),sizeof(char *)); + + for(i = 0; i < num_samples; i++) + { + sequences[i] = (char *) calloc((num_snps+1),sizeof(char)); + phylip_sample_names[i] = (char *) calloc((MAX_SAMPLE_NAME_SIZE+1),sizeof(char)); + } + get_sample_names_for_header(filename, phylip_sample_names, num_samples); + + int l; + i = 0; + int sequence_number = 0; + + gzFile fp; + kseq_t *seq; + + fp = gzopen(filename, "r"); + seq = kseq_init(fp); + + while ((l = kseq_read(seq)) >= 0) + { + for(i = 0; i< num_snps; i++) + { + sequences[sequence_number][i] = toupper(((char *) seq->seq.s)[i]); + if(sequences[sequence_number][i] == 'N') + { + sequences[sequence_number][i] = '-'; + } + } + sequence_number++; + } + + kseq_destroy(seq); + gzclose(fp); + + initialise_statistics(); + initialise_internal_node(); } void freeup_memory() { - int i; - for(i = 0; i < num_samples; i++) - { - free(sequences[i]); - free(phylip_sample_names[i]); - } - free(sequences); - free(phylip_sample_names); - free(internal_node); + int i; + for(i = 0; i < num_samples; i++) + { + free(sequences[i]); + free(phylip_sample_names[i]); + } + free(sequences); + free(phylip_sample_names); + free(internal_node); } diff --git a/src/parse_phylip.h b/src/parse_phylip.h index 
29cdfa6a..9be4c939 100644 --- a/src/parse_phylip.h +++ b/src/parse_phylip.h @@ -28,6 +28,7 @@ int genome_length_without_gaps; int number_of_blocks; int bases_in_recombinations; + int branch_bases_in_recombinations; int genome_length_excluding_blocks_and_gaps; } sample_statistics; @@ -52,6 +53,7 @@ int get_internal_node(int sequence_index); void fill_in_unambiguous_bases_in_parent_from_children_where_parent_has_a_gap(int parent_sequence_index, int * child_sequence_indices, int num_children); void fill_in_unambiguous_gaps_in_parent_from_children(int parent_sequence_index, int * child_sequence_indices, int num_children); void freeup_memory(void); +void set_number_of_branch_bases_in_recombinations(char * sample_name, int bases_in_recombinations); void set_number_of_bases_in_recombinations(char * sample_name, int bases_in_recombinations); void filter_sequence_bases_and_rotate(char * reference_bases, char ** filtered_bases_for_snps, int number_of_filtered_snps); void set_genome_length_excluding_blocks_and_gaps_for_sample(char * sample_name, int genome_length_excluding_blocks_and_gaps); diff --git a/src/parse_vcf.c b/src/parse_vcf.c index db94c420..e81dc2fb 100644 --- a/src/parse_vcf.c +++ b/src/parse_vcf.c @@ -31,62 +31,62 @@ int * column_data; void get_integers_from_column_in_vcf(FILE * vcf_file_pointer, int * integer_values, int number_of_snps, int column_number) { - rewind(vcf_file_pointer); - char * szBuffer; - szBuffer = (char *) calloc(MAX_READ_BUFFER,sizeof(char)); - int reference_index = 0; - char result[1000] = {0}; - - do{ - szBuffer[0] = '\0'; - // check the first character of the line to see if its in the header - szBuffer = read_line(szBuffer, vcf_file_pointer); - - if(szBuffer[0] == '\0') - { - break; - } - - if(szBuffer[0] != '#') - { - split_string_and_return_specific_index(result, szBuffer, column_number,100000); - integer_values[reference_index] = atoi(result); - reference_index++; - } - - }while(szBuffer[0] != '\0'); - free(szBuffer); + 
rewind(vcf_file_pointer); + char * szBuffer; + szBuffer = (char *) calloc(MAX_READ_BUFFER,sizeof(char)); + int reference_index = 0; + char result[1000] = {0}; + + do{ + szBuffer[0] = '\0'; + // check the first character of the line to see if its in the header + szBuffer = read_line(szBuffer, vcf_file_pointer); + + if(szBuffer[0] == '\0') + { + break; + } + + if(szBuffer[0] != '#') + { + split_string_and_return_specific_index(result, szBuffer, column_number,100000); + integer_values[reference_index] = atoi(result); + reference_index++; + } + + }while(szBuffer[0] != '\0'); + free(szBuffer); } void get_sequence_from_column_in_vcf(FILE * vcf_file_pointer, char * sequence_bases, int number_of_snps, int column_number) { - rewind(vcf_file_pointer); - char * szBuffer; - szBuffer = (char *) calloc(MAX_READ_BUFFER,sizeof(char)); - int reference_index = 0; - char result[1000] = {0}; - - do{ - szBuffer[0] = '\0'; - // check the first character of the line to see if its in the header - szBuffer = read_line(szBuffer, vcf_file_pointer); - - if(szBuffer[0] == '\0') - { - break; - } - - if(szBuffer[0] != '#') - { - split_string_and_return_specific_index(result, szBuffer, column_number, 1000); - sequence_bases[reference_index] = result[0]; - reference_index++; - } - - }while(szBuffer[0] != '\0'); - - sequence_bases[reference_index] = '\0'; + rewind(vcf_file_pointer); + char * szBuffer; + szBuffer = (char *) calloc(MAX_READ_BUFFER,sizeof(char)); + int reference_index = 0; + char result[1000] = {0}; + + do{ + szBuffer[0] = '\0'; + // check the first character of the line to see if its in the header + szBuffer = read_line(szBuffer, vcf_file_pointer); + + if(szBuffer[0] == '\0') + { + break; + } + + if(szBuffer[0] != '#') + { + split_string_and_return_specific_index(result, szBuffer, column_number, 1000); + sequence_bases[reference_index] = result[0]; + reference_index++; + } + + }while(szBuffer[0] != '\0'); + + sequence_bases[reference_index] = '\0'; } void 
split_string_and_return_specific_index(char * result, char * input_string, int token_index, int input_string_length) diff --git a/src/phylip_of_snp_sites.c b/src/phylip_of_snp_sites.c index 2113678b..4b9668db 100644 --- a/src/phylip_of_snp_sites.c +++ b/src/phylip_of_snp_sites.c @@ -66,6 +66,6 @@ void create_phylip_of_snp_sites(char filename[], int number_of_snps, char ** bas } fprintf( fasta_file_pointer, "\n"); } - fclose(fasta_file_pointer); - free(base_filename); + fclose(fasta_file_pointer); + free(base_filename); } diff --git a/src/snp_searching.c b/src/snp_searching.c index d018f9b0..e83f40a5 100644 --- a/src/snp_searching.c +++ b/src/snp_searching.c @@ -137,7 +137,7 @@ int get_window_end_coordinates_excluding_gaps_with_start_end_index(int window_st for(i = start_index; i < number_of_snps; i++) { - if(snp_locations[i]>= window_start_coordinate && snp_locations[i] < window_end_coordinate) + if(snp_locations[i]>= window_start_coordinate && snp_locations[i] <= window_end_coordinate) { last_snp_location = i; if(child_sequence[i] == '-' || child_sequence[i] == 'N') @@ -150,7 +150,7 @@ int get_window_end_coordinates_excluding_gaps_with_start_end_index(int window_st break; } } - + if(last_snp_location > 0) { return snp_locations[last_snp_location] + 1 ; @@ -175,12 +175,15 @@ int find_number_of_snps_in_block_with_start_end_index(int window_start_coordinat return number_of_snps; } int i; - int number_of_snps_in_block =0; - start_index = find_starting_index( window_start_coordinate, snp_locations,start_index, end_index); + int number_of_snps_in_block = 0; + start_index = find_starting_index(window_start_coordinate, + snp_locations, + start_index, + end_index); for(i = start_index; i < number_of_snps; i++) { - if(snp_locations[i]>= window_start_coordinate && snp_locations[i] < window_end_coordinate) + if(snp_locations[i]>= window_start_coordinate && snp_locations[i] <= window_end_coordinate) { if(child_sequence[i] != '-' || child_sequence[i] != 'N') { @@ -214,7 +217,7 
@@ int calculate_block_size_without_gaps_with_start_end_index(char * child_sequence for(i = start_index; i < length_of_original_genome ; i++) { - if(snp_locations[i]< ending_coordinate && snp_locations[i]>= starting_coordinate) + if(snp_locations[i]<= ending_coordinate && snp_locations[i]>= starting_coordinate) { if(child_sequence[i] == '-' || child_sequence[i] == 'N') { diff --git a/src/tree_statistics.c b/src/tree_statistics.c index 5832fba4..bf331e33 100644 --- a/src/tree_statistics.c +++ b/src/tree_statistics.c @@ -35,7 +35,7 @@ void create_tree_statistics_file(char filename[], sample_statistics ** statistic char extension[7] = {".stats"}; concat_strings_created_with_malloc(base_filename,extension); file_pointer = fopen(base_filename, "w"); - fprintf( file_pointer, "Node\tTotal SNPs\tNum of SNPs inside recombinations\tNum of SNPs outside recombinations\tNum of Recombination Blocks\tBases in Recombinations\tr/m\trho/theta\tGenome Length\tBases in Clonal Frame\n"); + fprintf( file_pointer, "Node\tTotal SNPs\tNumber of SNPs Inside Recombinations\tNumber of SNPs Outside Recombinations\tNumber of Recombination Blocks\tBases in Recombinations\tCumulative Bases in Recombinations\tr/m\trho/theta\tGenome Length\tBases in Clonal Frame\n"); for(sample_counter=0; sample_counter< number_of_samples; sample_counter++) { @@ -45,6 +45,7 @@ void create_tree_statistics_file(char filename[], sample_statistics ** statistic fprintf( file_pointer, "%i\t", sample_details->number_of_recombinations); fprintf( file_pointer, "%i\t", (sample_details->number_of_snps)); fprintf( file_pointer, "%i\t", sample_details->number_of_blocks); + fprintf( file_pointer, "%i\t", sample_details->branch_bases_in_recombinations); fprintf( file_pointer, "%i\t", sample_details->bases_in_recombinations); fprintf( file_pointer, "%f\t", recombination_to_mutation_ratio(sample_details->number_of_recombinations, (sample_details->number_of_snps))); fprintf( file_pointer, "%f\t", 
rho_theta(sample_details->number_of_blocks,sample_details->number_of_snps)); diff --git a/tests/check_gubbins.c b/tests/check_gubbins.c index 8b0578a4..931745c7 100644 --- a/tests/check_gubbins.c +++ b/tests/check_gubbins.c @@ -10,7 +10,7 @@ START_TEST (check_gubbins_no_recombinations) { remove("../tests/data/no_recombinations.tre"); cp("../tests/data/no_recombinations.tre", "../tests/data/no_recombinations.original.tre"); - run_gubbins("../tests/data/no_recombinations.aln.vcf", "../tests/data/no_recombinations.tre","../tests/data/no_recombinations.aln.snp_sites.aln",3,"../tests/data/no_recombinations.aln.snp_sites.aln",100,10000); + run_gubbins("../tests/data/no_recombinations.aln.vcf", "../tests/data/no_recombinations.tre","../tests/data/no_recombinations.aln.snp_sites.aln",3,"../tests/data/no_recombinations.aln.snp_sites.aln",100,10000,0.05,1,0); ck_assert(file_exists("../tests/data/no_recombinations.tre.tab") == 1); ck_assert(file_exists("../tests/data/no_recombinations.tre.vcf") == 1); ck_assert(file_exists("../tests/data/no_recombinations.tre.phylip") == 1); @@ -37,7 +37,7 @@ START_TEST (check_gubbins_one_recombination) { remove("../tests/data/one_recombination.tre"); cp("../tests/data/one_recombination.tre", "../tests/data/one_recombination.original.tre"); - run_gubbins("../tests/data/one_recombination.aln.vcf", "../tests/data/one_recombination.tre","../tests/data/one_recombination.aln.snp_sites.aln",3,"../tests/data/one_recombination.aln.snp_sites.aln",100,10000); + run_gubbins("../tests/data/one_recombination.aln.vcf", "../tests/data/one_recombination.tre","../tests/data/one_recombination.aln.snp_sites.aln",3,"../tests/data/one_recombination.aln.snp_sites.aln",100,10000,0.05,1,0); ck_assert(file_exists("../tests/data/one_recombination.tre.tab") == 1); ck_assert(file_exists("../tests/data/one_recombination.tre.vcf") == 1); ck_assert(file_exists("../tests/data/one_recombination.tre.phylip") == 1); @@ -68,7 +68,13 @@ START_TEST 
(check_gubbins_multiple_recombinations) remove("../tests/data/multiple_recombinations.tre"); cp("../tests/data/multiple_recombinations.tre", "../tests/data/multiple_recombinations.original.tre"); - run_gubbins("../tests/data/multiple_recombinations.aln.vcf", "../tests/data/multiple_recombinations.tre","../tests/data/multiple_recombinations.aln.snp_sites.aln",3,"../tests/data/multiple_recombinations.aln.snp_sites.aln",100,10000); + run_gubbins("../tests/data/multiple_recombinations.aln.vcf", + "../tests/data/multiple_recombinations.tre", + "../tests/data/multiple_recombinations.jar.aln", + 3, + "../tests/data/multiple_recombinations.aln", + 30,100,0.05,1,0); + ck_assert(file_exists("../tests/data/multiple_recombinations.tre.tab") == 1); ck_assert(file_exists("../tests/data/multiple_recombinations.tre.vcf") == 1); ck_assert(file_exists("../tests/data/multiple_recombinations.tre.phylip") == 1); @@ -76,9 +82,9 @@ START_TEST (check_gubbins_multiple_recombinations) ck_assert(file_exists("../tests/data/multiple_recombinations.tre.gff") == 1); ck_assert(file_exists("../tests/data/multiple_recombinations.tre.snp_sites.aln") == 1); - ck_assert(number_of_recombinations_in_file("../tests/data/multiple_recombinations.tre.tab") == 3); - ck_assert(compare_files("../tests/data/multiple_recombinations.tre","../tests/data/multiple_recombinations.expected.tre") == 1); - ck_assert(compare_files("../tests/data/multiple_recombinations.tre.branch_snps.tab","../tests/data/multiple_recombinations.tre.branch_snps.expected.tab") == 1); + ck_assert(number_of_recombinations_in_file("../tests/data/multiple_recombinations.tre.tab") == 4); + ck_assert(compare_files("../tests/data/multiple_recombinations.tre","../tests/data/multiple_recombinations.expected.tre") == 1); + ck_assert(compare_files("../tests/data/multiple_recombinations.tre.branch_snps.tab","../tests/data/multiple_recombinations.tre.branch_snps.expected.tab") == 1); remove("../tests/data/multiple_recombinations.tre"); 
remove("../tests/data/multiple_recombinations.tre.tab"); @@ -97,10 +103,15 @@ START_TEST (check_recombination_at_root) cp("../tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1", "../tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1.original.tre"); - run_gubbins("../tests/data/recombination_at_root/recombination_at_root.aln.gaps.vcf", "../tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1", "../tests/data/recombination_at_root/recombination_at_root.aln.gaps.snp_sites.aln",3,"../tests/data/recombination_at_root/recombination_at_root.aln",100,10000); - + run_gubbins("../tests/data/recombination_at_root/recombination_at_root.aln.gaps.vcf", + "../tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1", + "../tests/data/recombination_at_root/recombination_at_root.aln.gaps.snp_sites.aln", + 3, + "../tests/data/recombination_at_root/recombination_at_root.aln", + 100,10000,0.05,1,0); + ck_assert(compare_files("../tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1.tab","../tests/data/recombination_at_root/expected_RAxML_result.recombination_at_root.iteration_1.tab") == 1); - + ck_assert(file_exists("../tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1.vcf") == 1); ck_assert(file_exists("../tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1.tab") == 1); ck_assert(file_exists("../tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1.stats") == 1); diff --git a/tests/check_snp_searching.c b/tests/check_snp_searching.c index 17414734..a2725a6e 100644 --- a/tests/check_snp_searching.c +++ b/tests/check_snp_searching.c @@ -150,73 +150,72 @@ END_TEST START_TEST (check_get_window_end_coordinates_excluding_gaps) { - int coords_empty[0] = {}; - int coords_one[1] = {1}; - int coords_odd[3] = {1,3,5}; - int coords_even[8] = {1,3,5,7,11,13,17,19}; - char 
*child_sequence_without_gaps = "ACGTACGT"; - char *child_sequence_with_gaps = "-AC-GT-A"; - - //int get_window_end_coordinates_excluding_gaps(1, 3, coords_even, char * child_sequence, int number_of_snps) + int coords_empty[0] = {}; + int coords_one[1] = {1}; + int coords_odd[3] = {1,3,5}; + int coords_even[8] = {1,3,5,7,11,13,17,19}; + char *child_sequence_without_gaps = "ACGTACGT"; + char *child_sequence_with_gaps = "-AC-GT-A"; - ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_empty,child_sequence_without_gaps, 0) == 3); - ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_one, child_sequence_without_gaps, 1) == 3); - ck_assert( get_window_end_coordinates_excluding_gaps(1, 3, coords_one, child_sequence_without_gaps, 1) == 4); - ck_assert( get_window_end_coordinates_excluding_gaps(1, 3, coords_odd, child_sequence_without_gaps, 3) == 4); - ck_assert( get_window_end_coordinates_excluding_gaps(3, 3, coords_odd, child_sequence_without_gaps, 3) == 6); + //int get_window_end_coordinates_excluding_gaps(1, 3, coords_even, char * child_sequence, int number_of_snps) - ck_assert( get_window_end_coordinates_excluding_gaps(5, 3, coords_odd, child_sequence_without_gaps, 3) == 6); - ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_odd, child_sequence_without_gaps, 3) == 3); + ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_empty,child_sequence_without_gaps, 0) == 3); + ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_one, child_sequence_without_gaps, 1) == 3); + ck_assert( get_window_end_coordinates_excluding_gaps(1, 3, coords_one, child_sequence_without_gaps, 1) == 4); + ck_assert( get_window_end_coordinates_excluding_gaps(1, 3, coords_odd, child_sequence_without_gaps, 3) == 4); + ck_assert( get_window_end_coordinates_excluding_gaps(3, 3, coords_odd, child_sequence_without_gaps, 3) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(5, 3, coords_odd, child_sequence_without_gaps, 3) == 
6); + ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_odd, child_sequence_without_gaps, 3) == 4); + + ck_assert( get_window_end_coordinates_excluding_gaps(2, 3, coords_odd, child_sequence_without_gaps, 3) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(4, 3, coords_odd, child_sequence_without_gaps, 3) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(1, 3, coords_even, child_sequence_without_gaps, 8) == 4); + ck_assert( get_window_end_coordinates_excluding_gaps(3, 3, coords_even, child_sequence_without_gaps, 8) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(5, 3, coords_even, child_sequence_without_gaps, 8) == 8); + ck_assert( get_window_end_coordinates_excluding_gaps(7, 3, coords_even, child_sequence_without_gaps, 8) == 8); + ck_assert( get_window_end_coordinates_excluding_gaps(9, 3, coords_even, child_sequence_without_gaps, 8) == 12); + ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_even, child_sequence_without_gaps, 8) == 4); + ck_assert( get_window_end_coordinates_excluding_gaps(2, 3, coords_even, child_sequence_without_gaps, 8) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(4, 3, coords_even, child_sequence_without_gaps, 8) == 8); + ck_assert( get_window_end_coordinates_excluding_gaps(6, 3, coords_even, child_sequence_without_gaps, 8) == 8); + ck_assert( get_window_end_coordinates_excluding_gaps(8, 3, coords_even, child_sequence_without_gaps, 8) == 12); + + ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_empty,child_sequence_with_gaps, 0) == 3); + ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_one, child_sequence_with_gaps, 1) == 4); + ck_assert( get_window_end_coordinates_excluding_gaps(1, 3, coords_one, child_sequence_with_gaps, 1) == 5); + ck_assert( get_window_end_coordinates_excluding_gaps(1, 3, coords_odd, child_sequence_with_gaps, 3) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(3, 3, coords_odd, 
child_sequence_with_gaps, 3) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(5, 3, coords_odd, child_sequence_with_gaps, 3) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_odd, child_sequence_with_gaps, 3) == 4); + ck_assert( get_window_end_coordinates_excluding_gaps(2, 3, coords_odd, child_sequence_with_gaps, 3) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(4, 3, coords_odd, child_sequence_with_gaps, 3) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(1, 3, coords_even, child_sequence_with_gaps, 8) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(3, 3, coords_even, child_sequence_with_gaps, 8) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(5, 3, coords_even, child_sequence_with_gaps, 8) == 8); + ck_assert( get_window_end_coordinates_excluding_gaps(7, 3, coords_even, child_sequence_with_gaps, 8) == 12); + ck_assert( get_window_end_coordinates_excluding_gaps(9, 3, coords_even, child_sequence_with_gaps, 8) == 12); + ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_even, child_sequence_with_gaps, 8) == 4); + ck_assert( get_window_end_coordinates_excluding_gaps(2, 3, coords_even, child_sequence_with_gaps, 8) == 6); + ck_assert( get_window_end_coordinates_excluding_gaps(4, 3, coords_even, child_sequence_with_gaps, 8) == 8); + ck_assert( get_window_end_coordinates_excluding_gaps(6, 3, coords_even, child_sequence_with_gaps, 8) == 8); + ck_assert( get_window_end_coordinates_excluding_gaps(8, 3, coords_even, child_sequence_with_gaps, 8) == 12); - ck_assert( get_window_end_coordinates_excluding_gaps(2, 3, coords_odd, child_sequence_without_gaps, 3) == 4); - ck_assert( get_window_end_coordinates_excluding_gaps(4, 3, coords_odd, child_sequence_without_gaps, 3) == 6); - ck_assert( get_window_end_coordinates_excluding_gaps(1, 3, coords_even, child_sequence_without_gaps, 8) == 4); - ck_assert( get_window_end_coordinates_excluding_gaps(3, 3, coords_even, 
child_sequence_without_gaps, 8) == 6); - ck_assert( get_window_end_coordinates_excluding_gaps(5, 3, coords_even, child_sequence_without_gaps, 8) == 8); - ck_assert( get_window_end_coordinates_excluding_gaps(7, 3, coords_even, child_sequence_without_gaps, 8) == 8); - ck_assert( get_window_end_coordinates_excluding_gaps(9, 3, coords_even, child_sequence_without_gaps, 8) == 12); - ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_even, child_sequence_without_gaps, 8) == 3); - ck_assert( get_window_end_coordinates_excluding_gaps(2, 3, coords_even, child_sequence_without_gaps, 8) == 4); - ck_assert( get_window_end_coordinates_excluding_gaps(4, 3, coords_even, child_sequence_without_gaps, 8) == 6); - ck_assert( get_window_end_coordinates_excluding_gaps(6, 3, coords_even, child_sequence_without_gaps, 8) == 8); - ck_assert( get_window_end_coordinates_excluding_gaps(8, 3, coords_even, child_sequence_without_gaps, 8) == 11); - - ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_empty,child_sequence_with_gaps, 0) == 3); - ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_one, child_sequence_with_gaps, 1) == 4); - ck_assert( get_window_end_coordinates_excluding_gaps(1, 3, coords_one, child_sequence_with_gaps, 1) == 5); - ck_assert( get_window_end_coordinates_excluding_gaps(1, 3, coords_odd, child_sequence_with_gaps, 3) == 4); - ck_assert( get_window_end_coordinates_excluding_gaps(3, 3, coords_odd, child_sequence_with_gaps, 3) == 6); - ck_assert( get_window_end_coordinates_excluding_gaps(5, 3, coords_odd, child_sequence_with_gaps, 3) == 6); - ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_odd, child_sequence_with_gaps, 3) == 4); - ck_assert( get_window_end_coordinates_excluding_gaps(2, 3, coords_odd, child_sequence_with_gaps, 3) == 4); - ck_assert( get_window_end_coordinates_excluding_gaps(4, 3, coords_odd, child_sequence_with_gaps, 3) == 6); - ck_assert( get_window_end_coordinates_excluding_gaps(1, 3, 
coords_even, child_sequence_with_gaps, 8) == 4); - ck_assert( get_window_end_coordinates_excluding_gaps(3, 3, coords_even, child_sequence_with_gaps, 8) == 6); - ck_assert( get_window_end_coordinates_excluding_gaps(5, 3, coords_even, child_sequence_with_gaps, 8) == 8); - ck_assert( get_window_end_coordinates_excluding_gaps(7, 3, coords_even, child_sequence_with_gaps, 8) == 8); - ck_assert( get_window_end_coordinates_excluding_gaps(9, 3, coords_even, child_sequence_with_gaps, 8) == 12); - ck_assert( get_window_end_coordinates_excluding_gaps(0, 3, coords_even, child_sequence_with_gaps, 8) == 4); - ck_assert( get_window_end_coordinates_excluding_gaps(2, 3, coords_even, child_sequence_with_gaps, 8) == 4); - ck_assert( get_window_end_coordinates_excluding_gaps(4, 3, coords_even, child_sequence_with_gaps, 8) == 6); - ck_assert( get_window_end_coordinates_excluding_gaps(6, 3, coords_even, child_sequence_with_gaps, 8) == 8); - ck_assert( get_window_end_coordinates_excluding_gaps(8, 3, coords_even, child_sequence_with_gaps, 8) == 11); - } END_TEST START_TEST (check_find_number_of_snps_in_block) { - int coords_empty[0] = {}; - int coords_even[8] = {1,3,5,7,11,13,17,19}; - char *child_sequence = "AAAAA-AAAAAAAAAAAAA"; + int coords_empty[0] = {}; + int coords_even[8] = {1,3,5,7,11,13,17,19}; + char *child_sequence = "AAAAA-AAAAAAAAAAAAA"; - ck_assert( find_number_of_snps_in_block(1,3, coords_empty, child_sequence, 0) == 0); - ck_assert( find_number_of_snps_in_block(2,2, coords_even, child_sequence, 8) == 0); - ck_assert( find_number_of_snps_in_block(1,3, coords_even, child_sequence, 8) == 1); - ck_assert( find_number_of_snps_in_block(1,4, coords_even, child_sequence, 8) == 2); - ck_assert( find_number_of_snps_in_block(1,5, coords_even, child_sequence, 8) == 2); - ck_assert( find_number_of_snps_in_block(1,19, coords_even, child_sequence, 8) == 7); - ck_assert( find_number_of_snps_in_block(0,20, coords_even, child_sequence, 8) == 8); + ck_assert( find_number_of_snps_in_block(1,3, 
coords_empty, child_sequence, 0) == 0); + ck_assert( find_number_of_snps_in_block(2,2, coords_even, child_sequence, 8) == 0); + ck_assert( find_number_of_snps_in_block(1,3, coords_even, child_sequence, 8) == 2); + ck_assert( find_number_of_snps_in_block(1,4, coords_even, child_sequence, 8) == 2); + ck_assert( find_number_of_snps_in_block(1,5, coords_even, child_sequence, 8) == 3); + ck_assert( find_number_of_snps_in_block(1,19, coords_even, child_sequence, 8) == 8); + ck_assert( find_number_of_snps_in_block(0,20, coords_even, child_sequence, 8) == 8); } END_TEST diff --git a/tests/data/multiple_recombinations.aln.vcf b/tests/data/multiple_recombinations.aln.vcf index 8f6c4656..0af9a27f 100644 --- a/tests/data/multiple_recombinations.aln.vcf +++ b/tests/data/multiple_recombinations.aln.vcf @@ -1,216 +1,232 @@ -##fileformat=VCFv4.1 -##INFO= -#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sequence_1 sequence_2 sequence_3 sequence_4 sequence_5 sequence_6 sequence_7 sequence_8 sequence_9 sequence_10 -1 15 . A G . . AB . . . . . G G G G G G -1 21 . A C . . AB . . C C C . . . . . . -1 23 . A G . . AB . . . . . . . G G G . -1 27 . A C . . AB . . . . . . C C C . . -1 28 . A G . . AB . . . . . . . G G G G -1 29 . A G . . AB . . . . . . . G G G G -1 30 . A G . . AB . . . . . . . G G G G -1 31 . A G . . AB . . . . . . . G G G G -1 32 . A G . . AB . . . . . . . G G G G -1 33 . A G . . AB . . . . . . . G G G G -1 34 . A T,G . . AB . . . T T . . G G G G -1 35 . A G . . AB . . . . . . . G G G G -1 36 . A G . . AB . . . . . . . G G G G -1 37 . A G . . AB . . . . . . . G G G G -1 38 . A G . . AB . . . . . . . G G G G -1 39 . A G . . AB . . . . . . . G G G G -1 40 . A G . . AB . . . . G . . G G G G -1 41 . A G . . AB . . . . . . . G G G G -1 42 . A G . . AB . . . . . . . G G G G -1 43 . A G . . AB . . . . . . . G G G G -1 44 . A G . . AB . . . . . . . G G G G -1 45 . A G . . AB . . . . . . . G G G G -1 46 . A G . . AB . . . . . . . G G G G -1 47 . A G . . AB . . . . . . . 
G G G G -1 48 . A G . . AB . . . . . . . G G G G -1 50 . A C . . AB . . C C C C . . . . . -1 52 . A C . . AB . . C C C C . . . . . -1 53 . A C . . AB . . C C C C . . . . . -1 54 . A C . . AB . . C C C C . . . . . -1 55 . A C . . AB . . C C C C . . . . . -1 56 . A C . . AB . . C C C C . . . . . -1 57 . A C . . AB . . C C C C . . . . . -1 58 . A C . . AB . . C C C C . . . . . -1 59 . A C . . AB . . C C C C . . . . . -1 60 . A C . . AB . . C C C C . . . . . -1 61 . A C . . AB . . C C C C . . . . . -1 62 . A C . . AB . . C C C C . . . . . -1 63 . A C . . AB . . C C C C . . . . . -1 64 . A C . . AB . . C C C C . . . . . -1 65 . A C . . AB . . C C C C . . . . . -1 66 . A C . . AB . . C C C C . . . . . -1 68 . A C . . AB . . C C C C . . . . . -1 69 . A C . . AB . . C C C C . . . . . -1 70 . A C . . AB . . C C C C . . . . . -1 72 . A C . . AB . . C C C C . . . . . -1 73 . A C . . AB . . C C C C . . . . . -1 74 . A C . . AB . . C C C C . . . . . -1 75 . A C . . AB . . C C C C . . . . . -1 76 . A C . . AB . . C C C C . . . . . -1 77 . A C . . AB . . C C C C . . . . . -1 78 . A C . . AB . . C C C C . . . . . -1 79 . A C . . AB . . C C C C . . . . . -1 80 . A C . . AB . . C C C C . . . . . -1 81 . A C . . AB . . C C C C . . . . . -1 83 . A C,T . . AB . . C C C C . . . . T -1 84 . A T . . AB . . . . . . . . . . T -1 85 . A T . . AB . . . . . . . . . . T -1 86 . A T . . AB . . . . . . . . . . T -1 87 . A T . . AB . . . . . . . . . . T -1 88 . A T . . AB . . . . . . . . . . T -1 89 . A T . . AB . . . . . . . . . . T -1 90 . A T . . AB . . . . . . . . . . T -1 91 . A T . . AB . . . . . . . . . . T -1 92 . A T . . AB . . . . . . . . . . T -1 93 . A T . . AB . . . . . . . . . . T -1 94 . A T . . AB . . . . . . . . . . T -1 95 . A T . . AB . . . . . . . . . . T -1 96 . A T . . AB . . . . . . . . . . T -1 97 . A T . . AB . . . . . . . . . . T -1 98 . A T . . AB . . . . . . . . . . T -1 99 . A T . . AB . . . . . . . . . . T -1 100 . A T . . AB . . . . . . . . . . T -1 101 . A T . . 
AB . . . . . . . . . . T -1 102 . A T . . AB . . . . . . . . . . T -1 103 . A T . . AB . . . . . . . . . . T -1 104 . A T . . AB . . . . . . . . . . T -1 105 . A T . . AB . . . . . . . . . . T -1 106 . A T . . AB . . . . . . . . . . T -1 107 . A T . . AB . . . . . . . . . . T -1 108 . A T . . AB . . . . . . . . . . T -1 109 . A T . . AB . . . . . . . . . . T -1 110 . A T . . AB . . . . . . . . . . T -1 111 . A T . . AB . . . . . . . . . . T -1 112 . A T . . AB . . . . . . . . . . T -1 113 . A T . . AB . . . . . . . . . . T -1 114 . A T . . AB . . . . . . . . . . T -1 115 . A T . . AB . . . . . . . . . . T -1 116 . A T . . AB . . . . . . . . . . T -1 117 . A T . . AB . . . . . . . . . . T -1 118 . A T . . AB . . . . . . . . . . T -1 119 . A T . . AB . . . . . . . . . . T -1 120 . A T . . AB . . . . . . . . . . T -1 121 . A T . . AB . . . . . . . . . . T -1 122 . A T . . AB . . . . . . . . . . T -1 123 . A T . . AB . . . . . . . . . . T -1 124 . A T . . AB . . . . . . . . . . T -1 125 . A T . . AB . . . . . . . . . . T -1 126 . A T . . AB . . . . . . . . . . T -1 127 . A T . . AB . . . . . . . . . . T -1 128 . A T . . AB . . . . . . . . . . T -1 129 . A T . . AB . . . . . . . . . . T -1 130 . A T . . AB . . . . . . . . . . T -1 131 . A T . . AB . . . . . . . . . . T -1 132 . A T . . AB . . . . . . . . . . T -1 133 . A T . . AB . . . . . . . . . . T -1 134 . A T . . AB . . . . . . . . . . T -1 135 . A T . . AB . . . . . . . . . . T -1 136 . A T . . AB . . . . . . . . . . T -1 137 . A T . . AB . . . . . . . . . . T -1 138 . A T . . AB . . . . . . . . . . T -1 139 . A T . . AB . . . . . . . . . . T -1 140 . A T . . AB . . . . . . . . . . T -1 141 . A T . . AB . . . . . . . . . . T -1 142 . A T . . AB . . . . . . . . . . T -1 143 . A T . . AB . . . . . . . . . . T -1 144 . A T . . AB . . . . . . . . . . T -1 145 . A T . . AB . . . . . . . . . . T -1 146 . A T . . AB . . . . . . . . . . T -1 147 . A T . . AB . . . . . . . . . . T -1 148 . A T . . AB . . . . . . . . . . 
T -1 149 . A T . . AB . . . . . . . . . . T -1 150 . A T . . AB . . . . . . . . . . T -1 151 . A T . . AB . . . . . . . . . . T -1 152 . A T . . AB . . . . . . . . . . T -1 153 . A T . . AB . . . . . . . . . . T -1 154 . A T . . AB . . . . . . . . . . T -1 155 . A T . . AB . . . . . . . . . . T -1 156 . A T . . AB . . . . . . . . . . T -1 157 . A T . . AB . . . . . . . . . . T -1 158 . A T . . AB . . . . . . . . . . T -1 159 . A T . . AB . . . . . . . . . . T -1 160 . A T . . AB . . . . . . . . . . T -1 161 . A T . . AB . . . . . . . . . . T -1 162 . A T . . AB . . . . . . . . . . T -1 163 . A T . . AB . . . . . . . . . . T -1 164 . A T . . AB . . . . . . . . . . T -1 165 . A T . . AB . . . . . . . . . . T -1 166 . A T . . AB . . . . . . . . . . T -1 167 . A T . . AB . . . . . . . . . . T -1 168 . A T . . AB . . . . . . . . . . T -1 169 . A T . . AB . . . . . . . . . . T -1 170 . A T . . AB . . . . . . . . . . T -1 171 . A T . . AB . . . . . . . . . . T -1 172 . A T . . AB . . . . . . . . . . T -1 173 . A T . . AB . . . . . . . . . . T -1 174 . A T . . AB . . . . . . . . . . T -1 175 . A T . . AB . . . . . . . . . . T -1 176 . A T . . AB . . . . . . . . . . T -1 177 . A T . . AB . . . . . . . . . . T -1 178 . A T . . AB . . . . . . . . . . T -1 179 . A T . . AB . . . . . . . . . . T -1 180 . A T . . AB . . . . . . . . . . T -1 181 . A T . . AB . . . . . . . . . . T -1 182 . A T . . AB . . . . . . . . . . T -1 183 . A T . . AB . . . . . . . . . . T -1 184 . A T . . AB . . . . . . . . . . T -1 185 . A T . . AB . . . . . . . . . . T -1 186 . A T . . AB . . . . . . . . . . T -1 187 . A T . . AB . . . . . . . . . . T -1 188 . A T . . AB . . . . . . . . . . T -1 189 . A T . . AB . . . . . . . . . . T -1 190 . A T . . AB . . . . . . . . . . T -1 191 . A T . . AB . . . . . . . . . . T -1 192 . A T . . AB . . . . . . . . . . T -1 193 . A T . . AB . . . . . . . . . . T -1 194 . A T . . AB . . . . . . . . . . T -1 195 . A T . . AB . . . . . . . . . . T -1 196 . A T . . AB . . 
. . . . . . . . T -1 197 . A T . . AB . . . . . . . . . . T -1 198 . A T . . AB . . . . . . . . . . T -1 199 . A T . . AB . . . . . . . . . . T -1 200 . A T . . AB . . . . . . . . . . T -1 201 . A T . . AB . . . . . . . . . . T -1 202 . A T . . AB . . . . . . . . . . T -1 203 . A T . . AB . . . . . . . . . . T -1 204 . A T . . AB . . . . . . . . . . T -1 205 . A T . . AB . . . . . . . . . . T -1 206 . A T . . AB . . . . . . . . . . T -1 207 . A T . . AB . . . . . . . . . . T -1 208 . A T . . AB . . . . . . . . . . T -1 209 . A T . . AB . . . . . . . . . . T -1 210 . A T . . AB . . . . . . . . . . T -1 211 . A T . . AB . . . . . . . . . . T -1 212 . A T . . AB . . . . . . . . . . T -1 213 . A T . . AB . . . . . . . . . . T -1 214 . A T . . AB . . . . . . . . . . T -1 215 . A T . . AB . . . . . . . . . . T -1 216 . A T . . AB . . . . . . . . . . T -1 217 . A T . . AB . . . . . . . . . . T -1 218 . A T . . AB . . . . . . . . . . T -1 219 . A T . . AB . . . . . . . . . . T -1 220 . A T . . AB . . . . . . . . . . T -1 221 . A T . . AB . . . . . . . . . . T -1 222 . A T . . AB . . . . . . . . . . T -1 223 . A T . . AB . . . . . . . . . . T -1 224 . A T . . AB . . . . . . . . . . T -1 225 . A T . . AB . . . . . . . . . . T -1 226 . A T . . AB . . . . . . . . . . T -1 227 . A T . . AB . . . . . . . . . . T -1 228 . A T . . AB . . . . . . . . . . T -1 229 . A T . . AB . . . . . . . . . . T -1 230 . A T . . AB . . . . . . . . . . T -1 231 . A T . . AB . . . . . . . . . . T -1 232 . A T . . AB . . . . . . . . . . T -1 233 . A T . . AB . . . . . . . . . . T -1 234 . A T . . AB . . . . . . . . . . T -1 235 . A T . . AB . . . . . . . . . . T -1 236 . A T . . AB . . . . . . . . . . T -1 237 . A T . . AB . . . . . . . . . . T -1 238 . A T . . AB . . . . . . . . . . T -1 239 . A T . . AB . . . . . . . . . . T -1 240 . A T . . AB . . . . . . . . . . T -1 241 . A T . . AB . . . . . . . . . . 
T +##fileformat=VCFv4.2 +##contig= +##FORMAT= +#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sequence_1 sequence_2 sequence_3 sequence_4 sequence_5 sequence_6 sequence_7 sequence_8 sequence_9 sequence_10 +1 1 . A - . . . AB A A A A A A A A - - +1 2 . A - . . . AB A A A A A A A A - - +1 3 . A - . . . AB A A A A A A A A - - +1 4 . A - . . . AB A A A A A A A A - - +1 5 . A - . . . AB A A A A A A A A - - +1 6 . A - . . . AB A A A A A A A A - - +1 7 . A - . . . AB A A A A A A A A - - +1 8 . A - . . . AB A A A A A A A A - - +1 9 . A - . . . AB A A A A A A A A - - +1 10 . A - . . . AB A A A A A A A A - - +1 11 . A - . . . AB A A A A A A A A - - +1 12 . A - . . . AB A A A A A A A A - - +1 13 . A - . . . AB A A A A A A A A - - +1 14 . A - . . . AB A A A A A A A A - - +1 15 . A - . . . AB A A A A A A A A - - +1 16 . A G . . . AB A A A A G G G G G G +1 22 . A C . . . AB A C C C A A A A A A +1 24 . A G . . . AB A A A A A A G G G A +1 28 . A C . . . AB A A A A A C C C A A +1 29 . A G . . . AB A A A A A A G G G G +1 30 . A G . . . AB A A A A A A G G G G +1 31 . A G . . . AB A A A A A A G G G G +1 32 . A G . . . AB A A A A A A G G G G +1 33 . A G . . . AB A A A A A A G G G G +1 34 . A G . . . AB A A A A A A G G G G +1 35 . A T,G . . . AB A A T T A A G G G G +1 36 . A G . . . AB A A A A A A G G G G +1 37 . A G . . . AB A A A A A A G G G G +1 38 . A G . . . AB A A A A A A G G G G +1 39 . A G . . . AB A A A A A A G G G G +1 40 . A G . . . AB A A A A A A G G G G +1 41 . A G . . . AB A A A G A A G G G G +1 42 . A G . . . AB A A A A A A G G G G +1 43 . A G . . . AB A A A A A A G G G G +1 44 . A G . . . AB A A A A A A G G G G +1 45 . A G . . . AB A A A A A A G G G G +1 46 . A G . . . AB A A A A A A G G G G +1 47 . A G . . . AB A A A A A A G G G G +1 48 . A G . . . AB A A A A A A G G G G +1 49 . A G . . . AB A A A A A A G G G G +1 51 . A C . . . AB A C C C C A A A A A +1 53 . A C . . . AB A C C C C A A A A A +1 54 . A C . . . AB A C C C C A A A A A +1 55 . A C . . . 
AB A C C C C A A A A A +1 56 . A C . . . AB A C C C C A A A A A +1 57 . A C . . . AB A C C C C A A A A A +1 58 . A C . . . AB A C C C C A A A A A +1 59 . A C . . . AB A C C C C A A A A A +1 60 . A C . . . AB A C C C C A A A A A +1 61 . A C . . . AB A C C C C A A A A A +1 62 . A C . . . AB A C C C C A A A A A +1 63 . A C . . . AB A C C C C A A A A A +1 64 . A C . . . AB A C C C C A A A A A +1 65 . A C . . . AB A C C C C A A A A A +1 66 . A C . . . AB A C C C C A A A A A +1 67 . A C . . . AB A C C C C A A A A A +1 69 . A C . . . AB A C C C C A A A A A +1 70 . A C . . . AB A C C C C A A A A A +1 71 . A C . . . AB A C C C C A A A A A +1 73 . A C . . . AB A C C C C A A A A A +1 74 . A C . . . AB A C C C C A A A A A +1 75 . A C . . . AB A C C C C A A A A A +1 76 . A C . . . AB A C C C C A A A A A +1 77 . A C . . . AB A C C C C A A A A A +1 78 . A C . . . AB A C C C C A A A A A +1 79 . A C . . . AB A C C C C A A A A A +1 80 . A C . . . AB A C C C C A A A A A +1 81 . A C . . . AB A C C C C A A A A A +1 82 . A C . . . AB A C C C C A A A A A +1 84 . A C,T . . . AB A C C C C A A A A T +1 85 . A T . . . AB A A A A A A A A A T +1 86 . A T . . . AB A A A A A A A A A T +1 87 . A T . . . AB A A A A A A A A A T +1 88 . A T . . . AB A A A A A A A A A T +1 89 . A T . . . AB A A A A A A A A A T +1 90 . A T . . . AB A A A A A A A A A T +1 91 . A T . . . AB A A A A A A A A A T +1 92 . A T . . . AB A A A A A A A A A T +1 93 . A T . . . AB A A A A A A A A A T +1 94 . A T . . . AB A A A A A A A A A T +1 95 . A T . . . AB A A A A A A A A A T +1 96 . A T . . . AB A A A A A A A A A T +1 97 . A T . . . AB A A A A A A A A A T +1 98 . A T . . . AB A A A A A A A A A T +1 99 . A T . . . AB A A A A A A A A A T +1 100 . A T . . . AB A A A A A A A A A T +1 101 . A T . . . AB A A A A A A A A A T +1 102 . A T . . . AB A A A A A A A A A T +1 103 . A T . . . AB A A A A A A A A A T +1 104 . A T . . . AB A A A A A A A A A T +1 105 . A T . . . AB A A A A A A A A A T +1 106 . A T . . . 
AB A A A A A A A A A T +1 107 . A T . . . AB A A A A A A A A A T +1 108 . A T . . . AB A A A A A A A A A T +1 109 . A T . . . AB A A A A A A A A A T +1 110 . A T . . . AB A A A A A A A A A T +1 111 . A T . . . AB A A A A A A A A A T +1 112 . A T . . . AB A A A A A A A A A T +1 113 . A T . . . AB A A A A A A A A A T +1 114 . A T . . . AB A A A A A A A A A T +1 115 . A T . . . AB A A A A A A A A A T +1 116 . A T . . . AB A A A A A A A A A T +1 117 . A T . . . AB A A A A A A A A A T +1 118 . A T . . . AB A A A A A A A A A T +1 119 . A T . . . AB A A A A A A A A A T +1 120 . A T . . . AB A A A A A A A A A T +1 121 . A T . . . AB A A A A A A A A A T +1 122 . A T . . . AB A A A A A A A A A T +1 123 . A T . . . AB A A A A A A A A A T +1 124 . A T . . . AB A A A A A A A A A T +1 125 . A T . . . AB A A A A A A A A A T +1 126 . A T . . . AB A A A A A A A A A T +1 127 . A T . . . AB A A A A A A A A A T +1 128 . A T . . . AB A A A A A A A A A T +1 129 . A T . . . AB A A A A A A A A A T +1 130 . A T . . . AB A A A A A A A A A T +1 131 . A T . . . AB A A A A A A A A A T +1 132 . A T . . . AB A A A A A A A A A T +1 133 . A T . . . AB A A A A A A A A A T +1 134 . A T . . . AB A A A A A A A A A T +1 135 . A T . . . AB A A A A A A A A A T +1 136 . A T . . . AB A A A A A A A A A T +1 137 . A T . . . AB A A A A A A A A A T +1 138 . A T . . . AB A A A A A A A A A T +1 139 . A T . . . AB A A A A A A A A A T +1 140 . A T . . . AB A A A A A A A A A T +1 141 . A T . . . AB A A A A A A A A A T +1 142 . A T . . . AB A A A A A A A A A T +1 143 . A T . . . AB A A A A A A A A A T +1 144 . A T . . . AB A A A A A A A A A T +1 145 . A T . . . AB A A A A A A A A A T +1 146 . A T . . . AB A A A A A A A A A T +1 147 . A T . . . AB A A A A A A A A A T +1 148 . A T . . . AB A A A A A A A A A T +1 149 . A T . . . AB A A A A A A A A A T +1 150 . A T . . . AB A A A A A A A A A T +1 151 . A T . . . AB A A A A A A A A A T +1 152 . A T . . . AB A A A A A A A A A T +1 153 . A T . . . 
AB A A A A A A A A A T +1 154 . A T . . . AB A A A A A A A A A T +1 155 . A T . . . AB A A A A A A A A A T +1 156 . A T . . . AB A A A A A A A A A T +1 157 . A T . . . AB A A A A A A A A A T +1 158 . A T . . . AB A A A A A A A A A T +1 159 . A T . . . AB A A A A A A A A A T +1 160 . A T . . . AB A A A A A A A A A T +1 161 . A T . . . AB A A A A A A A A A T +1 162 . A T . . . AB A A A A A A A A A T +1 163 . A T . . . AB A A A A A A A A A T +1 164 . A T . . . AB A A A A A A A A A T +1 165 . A T . . . AB A A A A A A A A A T +1 166 . A T . . . AB A A A A A A A A A T +1 167 . A T . . . AB A A A A A A A A A T +1 168 . A T . . . AB A A A A A A A A A T +1 169 . A T . . . AB A A A A A A A A A T +1 170 . A T . . . AB A A A A A A A A A T +1 171 . A T . . . AB A A A A A A A A A T +1 172 . A T . . . AB A A A A A A A A A T +1 173 . A T . . . AB A A A A A A A A A T +1 174 . A T . . . AB A A A A A A A A A T +1 175 . A T . . . AB A A A A A A A A A T +1 176 . A T . . . AB A A A A A A A A A T +1 177 . A T . . . AB A A A A A A A A A T +1 178 . A T . . . AB A A A A A A A A A T +1 179 . A T . . . AB A A A A A A A A A T +1 180 . A T . . . AB A A A A A A A A A T +1 181 . A T . . . AB A A A A A A A A A T +1 182 . A T . . . AB A A A A A A A A A T +1 183 . A T . . . AB A A A A A A A A A T +1 184 . A T . . . AB A A A A A A A A A T +1 185 . A T . . . AB A A A A A A A A A T +1 186 . A T . . . AB A A A A A A A A A T +1 187 . A T . . . AB A A A A A A A A A T +1 188 . A T . . . AB A A A A A A A A A T +1 189 . A T . . . AB A A A A A A A A A T +1 190 . A T . . . AB A A A A A A A A A T +1 191 . A T . . . AB A A A A A A A A A T +1 192 . A T . . . AB A A A A A A A A A T +1 193 . A T . . . AB A A A A A A A A A T +1 194 . A T . . . AB A A A A A A A A A T +1 195 . A T . . . AB A A A A A A A A A T +1 196 . A T . . . AB A A A A A A A A A T +1 197 . A T . . . AB A A A A A A A A A T +1 198 . A T . . . AB A A A A A A A A A T +1 199 . A T . . . AB A A A A A A A A A T +1 200 . A T . . . 
AB A A A A A A A A A T +1 201 . A T . . . AB A A A A A A A A A T +1 202 . A T . . . AB A A A A A A A A A T +1 203 . A T . . . AB A A A A A A A A A T +1 204 . A T . . . AB A A A A A A A A A T +1 205 . A T . . . AB A A A A A A A A A T +1 206 . A T . . . AB A A A A A A A A A T +1 207 . A T . . . AB A A A A A A A A A T +1 208 . A T . . . AB A A A A A A A A A T +1 209 . A T . . . AB A A A A A A A A A T +1 210 . A T . . . AB A A A A A A A A A T +1 211 . A T . . . AB A A A A A A A A A T +1 212 . A T . . . AB A A A A A A A A A T +1 213 . A T . . . AB A A A A A A A A A T +1 214 . A T . . . AB A A A A A A A A A T +1 215 . A T . . . AB A A A A A A A A A T +1 216 . A T . . . AB A A A A A A A A A T +1 217 . A T . . . AB A A A A A A A A A T +1 218 . A T . . . AB A A A A A A A A A T +1 219 . A T . . . AB A A A A A A A A A T +1 220 . A T . . . AB A A A A A A A A A T +1 221 . A T . . . AB A A A A A A A A A T +1 222 . A T . . . AB A A A A A A A A A T +1 223 . A T . . . AB A A A A A A A A A T +1 224 . A T . . . AB A A A A A A A A A T +1 225 . A T . . . AB A A A A A A A A A T +1 226 . A T . . . AB A A A A A A A A A T +1 227 . A T . . . AB A A A A A A A A A T +1 228 . A T . . . AB A A A A A A A A A T +1 229 . A T . . . AB A A A A A A A A A T +1 230 . A T . . . AB A A A A A A A A A T +1 231 . A T . . . AB A A A A A A A A A T +1 232 . A T . . . AB A A A A A A A A A T +1 233 . A T . . . AB A A A A A A A A A T +1 234 . A T . . . AB A A A A A A A A A T +1 235 . A T . . . AB A A A A A A A A A T +1 236 . A T . . . AB A A A A A A A A A T +1 237 . A T . . . AB A A A A A A A A A T +1 238 . A T . . . AB A A A A A A A A A T +1 239 . A T . . . AB A A A A A A A A A T +1 240 . A T . . . AB A A A A A A A A A T +1 241 . A T . . . AB A A A A A A A A A T +1 242 . A T . . . 
AB A A A A A A A A A T diff --git a/tests/data/multiple_recombinations.expected.tre b/tests/data/multiple_recombinations.expected.tre index d05ed2e4..1d7f9395 100644 --- a/tests/data/multiple_recombinations.expected.tre +++ b/tests/data/multiple_recombinations.expected.tre @@ -1 +1 @@ -(sequence_3:0.000000,sequence_4:0.392946,(sequence_2:0.000000,(sequence_5:0.292537,(((sequence_7:0.000000,sequence_8:0.000000)N6:0.000000,(sequence_6:0.000000,sequence_1:1.218891)N7:15.292978)N5:0.051192,(sequence_10:395.160065,sequence_9:0.024964)N8:0.419411)N4:70.987579)N3:0.471393)N2:0.393420)N1:0.000000; \ No newline at end of file +(sequence_10:8.688181,(((((sequence_7:0.000000,sequence_8:0.000000)Node_1:1.408467,sequence_9:0.273120)Node_2:1.278621,sequence_6:0.441660)Node_3:0.709665,sequence_5:0.000000)Node_4:0.810390,(sequence_1:0.266832,(sequence_3:0.000000,(sequence_2:0.000000,sequence_4:0.000000)Node_5:0.000000)Node_6:1.063758)Node_7:0.941001)Node_8:4.481039)Node_9:0.000000; \ No newline at end of file diff --git a/tests/data/multiple_recombinations.jar.aln b/tests/data/multiple_recombinations.jar.aln new file mode 100644 index 00000000..ea202bf1 --- /dev/null +++ b/tests/data/multiple_recombinations.jar.aln @@ -0,0 +1,68 @@ +>sequence_1 +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>sequence_2 +AAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCC +CCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>sequence_3 +AAAAAAAAAAAAAAAACAAAAAAAATAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCC +CCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>sequence_4 
+AAAAAAAAAAAAAAAACAAAAAAAATAAAAAGAAAAAAAACCCCCCCCCCCCCCCCCCCC +CCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>sequence_5 +AAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCC +CCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>sequence_6 +AAAAAAAAAAAAAAAGAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>sequence_7 +AAAAAAAAAAAAAAAGAGCGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>sequence_8 +AAAAAAAAAAAAAAAGAGCGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>sequence_9 +---------------GAGAGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>sequence_10 +---------------GAAAGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAA +AAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT +TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT +TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT +>Node_1 +AAAAAAAAAAAAAAAGAGCGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>Node_2 
+AAAAAAAAAAAAAAAGAGCGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>Node_3 +AAAAAAAAAAAAAAAGAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>Node_4 +AAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>Node_5 +AAAAAAAAAAAAAAAACAAAAAAAATAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>Node_6 +AAAAAAAAAAAAAAAACAAAAAAAATAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>Node_7 +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>Node_8 +AAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +>Node_9 +AAAAAAAAAAAAAAAGAAAGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT diff --git a/tests/data/multiple_recombinations.original.tre 
b/tests/data/multiple_recombinations.original.tre index 12aa4010..c6ea9622 100644 --- a/tests/data/multiple_recombinations.original.tre +++ b/tests/data/multiple_recombinations.original.tre @@ -1 +1 @@ -(sequence_3:0.000000,sequence_4:0.004974,(sequence_2:0.000000,(sequence_5:0.003703,(((sequence_7:0.000000,sequence_8:0.000000)N6:0.000000,(sequence_6:0.000000,sequence_1:0.015429)N7:0.193582)N5:0.000648,(sequence_10:5.002026,sequence_9:0.000316)N8:0.005309)N4:0.898577)N3:0.005967)N2:0.004980)N1; +(sequence_10:0.1703565,(((((sequence_7:0.0,sequence_8:0.0)Node_1:0.027617,sequence_9:0.0053553)Node_2:0.025071,sequence_6:0.00866)Node_3:0.013915,sequence_5:0.0)Node_4:0.01589,(sequence_1:0.005232,(sequence_3:0.0,(sequence_2:0.0,sequence_4:0.0)Node_5:0.0)Node_6:0.020858)Node_7:0.018451)Node_8:0.08786350000000001)Node_9; diff --git a/tests/data/multiple_recombinations.tre b/tests/data/multiple_recombinations.tre new file mode 100644 index 00000000..16c05bf8 --- /dev/null +++ b/tests/data/multiple_recombinations.tre @@ -0,0 +1 @@ +(sequence_10:0.000000,(((((sequence_7:0.000000,sequence_8:0.000000)Node_1:0.000000,sequence_9:40.000000)Node_2:880.000000,sequence_6:0.000000)Node_3:40.000000,sequence_5:1200.000000)Node_4:0.000000,((sequence_3:0.000000,(sequence_2:40.000000,sequence_4:40.000000)Node_5:0.000000)Node_6:1280.000000,sequence_1:0.000000)Node_7:40.000000)Node_8:7200.000000)Node_9:0.000000; \ No newline at end of file diff --git a/tests/data/multiple_recombinations.tre.branch_snps.expected.tab b/tests/data/multiple_recombinations.tre.branch_snps.expected.tab index 44de466e..26f61f5e 100644 --- a/tests/data/multiple_recombinations.tre.branch_snps.expected.tab +++ b/tests/data/multiple_recombinations.tre.branch_snps.expected.tab @@ -1,1452 +1,1614 @@ -FT variation 15 -FT /node="N7->sequence_1" -FT /colour="4" -FT /taxa="sequence_1" -FT /parent_base="G" -FT /replace="A" -FT variation 27 -FT /node="N7->sequence_1" +FT variation 28 +FT /node="Node_2->sequence_9" FT /colour="4" 
-FT /taxa="sequence_1" +FT /taxa="sequence_9" FT /parent_base="C" FT /replace="A" -FT variation 23 -FT /node="N5->N7" -FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" -FT variation 28 -FT /node="N5->N7" +FT variation 24 +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 29 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 30 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 31 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 32 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 33 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 34 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 35 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT 
/replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 36 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 37 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 38 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 39 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 40 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 41 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 42 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 43 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 44 -FT 
/node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 45 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 46 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 47 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" +FT /taxa=" sequence_7 sequence_8 sequence_9" +FT /parent_base="A" +FT /replace="G" FT variation 48 -FT /node="N5->N7" +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa=" sequence_6 sequence_1" -FT /parent_base="G" -FT /replace="A" -FT variation 23 -FT /node="N8->sequence_10" -FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="G" -FT /replace="A" -FT variation 83 -FT /node="N8->sequence_10" -FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9" FT /parent_base="A" -FT /replace="T" -FT variation 84 -FT /node="N8->sequence_10" +FT /replace="G" +FT variation 49 +FT /node="Node_3->Node_2" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9" FT /parent_base="A" -FT /replace="T" -FT variation 85 -FT /node="N8->sequence_10" +FT /replace="G" +FT variation 28 +FT /node="Node_4->Node_3" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6" FT /parent_base="A" -FT /replace="T" -FT variation 86 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 51 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT 
/parent_base="A" -FT /replace="T" -FT variation 87 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 53 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 88 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 54 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 89 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 55 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 90 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 56 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 91 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 57 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 92 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 58 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 93 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 59 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 94 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 60 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 95 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 61 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 96 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 62 +FT 
/node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 97 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 63 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 98 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 64 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 99 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 65 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 100 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 66 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 101 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 67 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 102 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 69 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 103 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 70 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 104 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 71 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 105 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 73 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT 
/replace="T" -FT variation 106 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 74 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 107 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 75 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 108 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 76 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 109 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 77 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 110 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 78 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 111 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 79 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 112 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 80 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 113 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 81 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 114 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 82 +FT /node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 115 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 84 +FT 
/node="Node_4->sequence_5" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_5" FT /parent_base="A" -FT /replace="T" -FT variation 116 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 35 +FT /node="Node_5->sequence_2" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 117 -FT /node="N8->sequence_10" +FT /taxa="sequence_2" +FT /parent_base="T" +FT /replace="A" +FT variation 41 +FT /node="Node_5->sequence_4" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa="sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 118 -FT /node="N8->sequence_10" +FT /replace="G" +FT variation 22 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 119 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 35 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" FT /replace="T" -FT variation 120 -FT /node="N8->sequence_10" +FT variation 51 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 121 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 53 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 122 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 54 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 123 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 55 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 124 -FT /node="N8->sequence_10" +FT /replace="C" +FT 
variation 56 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 125 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 57 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 126 -FT /node="N8->sequence_10" -FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 127 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 58 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 128 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 59 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 129 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 60 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 130 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 61 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 131 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 62 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 132 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 63 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 133 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 64 +FT /node="Node_7->Node_6" FT 
/colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 134 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 65 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 135 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 66 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 136 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 67 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 137 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 69 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 138 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 70 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 139 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 71 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 140 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 73 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 141 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 74 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 142 -FT /node="N8->sequence_10" 
+FT /replace="C" +FT variation 75 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 143 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 76 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 144 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 77 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 145 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 78 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 146 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 79 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 147 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 80 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 148 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 81 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 149 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 82 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" -FT /replace="T" -FT variation 150 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 84 +FT /node="Node_7->Node_6" FT /colour="4" -FT /taxa="sequence_10" +FT /taxa=" sequence_3 sequence_2 sequence_4" FT /parent_base="A" 
-FT /replace="T" -FT variation 151 -FT /node="N8->sequence_10" +FT /replace="C" +FT variation 16 +FT /node="Node_8->Node_7" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 152 -FT /node="N8->sequence_10" +FT /taxa=" sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 29 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 153 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 30 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 154 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 31 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 155 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 32 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 156 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 33 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 157 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 34 +FT /node="Node_9->Node_8" FT /colour="4" -FT 
/taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 158 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 35 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 159 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 36 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 160 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 37 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 161 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 38 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 162 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 39 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 163 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 40 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 164 -FT 
/node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 41 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 165 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 42 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 166 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 43 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 167 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 44 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 168 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 45 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 169 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 46 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 170 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 
sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 47 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 171 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 48 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 172 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 49 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 173 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="G" +FT /replace="A" +FT variation 84 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 174 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 85 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 175 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 86 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 176 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 
87 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 177 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 88 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 178 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 89 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 179 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 90 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 180 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 91 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 181 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 92 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 182 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 93 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT 
/replace="T" -FT variation 183 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 94 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 184 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 95 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 185 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 96 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 186 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 97 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 187 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 98 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 188 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 99 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 189 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 
sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 100 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 190 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 101 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 191 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 102 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 192 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 103 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 193 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 104 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 194 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 105 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 195 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT 
/parent_base="T" +FT /replace="A" +FT variation 106 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 196 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 107 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 197 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 108 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 198 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 109 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 199 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 110 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 200 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 111 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 201 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 112 +FT /node="Node_9->Node_8" FT 
/colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 202 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 113 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 114 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 115 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 116 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 117 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 118 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 119 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 120 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 121 +FT 
/node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 122 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 123 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 124 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 125 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 126 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 127 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 128 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 129 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 130 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 
sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 131 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 132 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 133 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 134 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 135 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 136 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 137 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 138 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 139 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT 
variation 140 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 141 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 142 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 143 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 144 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 203 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 145 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 204 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 146 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 205 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 147 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 206 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 
sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 148 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 207 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 149 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 208 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 150 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 209 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 151 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 210 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 152 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 211 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 153 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 212 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT 
/parent_base="T" +FT /replace="A" +FT variation 154 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 213 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 155 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 214 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 156 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 215 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 157 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 216 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 158 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 217 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 159 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 218 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 160 +FT /node="Node_9->Node_8" FT 
/colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 219 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 161 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 220 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 162 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 221 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 163 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 222 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 164 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 223 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 165 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 224 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 166 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 225 
-FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 167 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 226 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 168 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 227 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 169 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 228 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 170 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 229 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 171 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 230 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 172 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 231 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 
sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 173 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 232 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 174 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 233 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 175 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 234 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 176 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 235 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 177 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 236 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 178 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 237 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT 
/replace="A" +FT variation 179 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 238 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 180 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 239 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 181 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 240 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 182 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 241 -FT /node="N8->sequence_10" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 183 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_10" -FT /parent_base="A" -FT /replace="T" -FT variation 27 -FT /node="N4->N8" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 184 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 23 -FT /node="N3->N4" +FT variation 185 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 
sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 27 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 186 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="C" -FT variation 28 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 187 +FT /node="Node_9->Node_8" +FT /colour="4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 188 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 29 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 189 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 30 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 190 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 31 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 191 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" 
sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 32 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 192 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 33 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 193 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 34 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 194 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 35 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 195 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 36 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 196 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 37 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 
sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 197 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 38 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 198 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 39 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 199 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 40 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 200 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 41 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 201 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 42 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 202 +FT /node="Node_9->Node_8" FT /colour="4" -FT 
/taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 43 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 203 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 44 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 204 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 45 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 205 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 46 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 206 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 47 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 207 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 48 -FT /node="N3->N4" +FT /taxa=" sequence_7 
sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 208 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 50 -FT /node="N3->N4" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 209 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 52 -FT /node="N3->N4" +FT variation 210 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 53 -FT /node="N3->N4" +FT variation 211 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 54 -FT /node="N3->N4" +FT variation 212 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 55 -FT /node="N3->N4" +FT variation 213 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 
sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 56 -FT /node="N3->N4" +FT variation 214 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 57 -FT /node="N3->N4" +FT variation 215 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 58 -FT /node="N3->N4" +FT variation 216 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 59 -FT /node="N3->N4" +FT variation 217 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 60 -FT /node="N3->N4" +FT variation 218 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 61 -FT /node="N3->N4" +FT variation 219 +FT 
/node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 62 -FT /node="N3->N4" +FT variation 220 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 63 -FT /node="N3->N4" +FT variation 221 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 64 -FT /node="N3->N4" +FT variation 222 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 65 -FT /node="N3->N4" +FT variation 223 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 66 -FT /node="N3->N4" +FT variation 224 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT 
/parent_base="T" FT /replace="A" -FT variation 68 -FT /node="N3->N4" +FT variation 225 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 69 -FT /node="N3->N4" +FT variation 226 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 70 -FT /node="N3->N4" +FT variation 227 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 72 -FT /node="N3->N4" +FT variation 228 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 73 -FT /node="N3->N4" +FT variation 229 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 74 -FT /node="N3->N4" +FT variation 230 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 
sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 75 -FT /node="N3->N4" +FT variation 231 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 76 -FT /node="N3->N4" +FT variation 232 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 77 -FT /node="N3->N4" +FT variation 233 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 78 -FT /node="N3->N4" +FT variation 234 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 79 -FT /node="N3->N4" +FT variation 235 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 80 -FT /node="N3->N4" +FT variation 236 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 
sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 81 -FT /node="N3->N4" +FT variation 237 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 83 -FT /node="N3->N4" +FT variation 238 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 15 -FT /node="N2->N3" +FT variation 239 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_5 sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="A" -FT /replace="G" -FT variation 21 -FT /node="N2->N3" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 240 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_5 sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" -FT /parent_base="C" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" FT /replace="A" -FT variation 40 -FT /node="N1->sequence_4" +FT variation 241 +FT /node="Node_9->Node_8" FT /colour="4" -FT /taxa="sequence_4" -FT /parent_base="A" -FT /replace="G" -FT variation 34 -FT /node="N1->N2" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" +FT /parent_base="T" +FT /replace="A" +FT variation 242 +FT 
/node="Node_9->Node_8" FT /colour="4" -FT /taxa=" sequence_2 sequence_5 sequence_7 sequence_8 sequence_6 sequence_1 sequence_10 sequence_9" +FT /taxa=" sequence_7 sequence_8 sequence_9 sequence_6 sequence_5 sequence_1 sequence_3 sequence_2 sequence_4" FT /parent_base="T" FT /replace="A" diff --git a/tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1 b/tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1 index cdb87066..396de1a1 100644 --- a/tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1 +++ b/tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1 @@ -1 +1 @@ -((sequence_three:48642.050781,sequence_two:8550105088.000000)N2:185586221056.000000,sequence_one:48642.050781,sequence_four:385245056.000000)N1:0.000000; \ No newline at end of file +((sequence_three:0.000000,sequence_two:380.000000)Node_1:0.000000,(sequence_four:0.000000,sequence_one:20.000000)Node_2:2230.000000)Node_3:0.000000; \ No newline at end of file diff --git a/tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1.original.tre b/tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1.original.tre index 214201f0..50f876df 100644 --- a/tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1.original.tre +++ b/tests/data/recombination_at_root/RAxML_result.recombination_at_root.iteration_1.original.tre @@ -1 +1,2 @@ -((sequence_three:2702.336182,sequence_two:475005824.000000)N2:10310345728.000000,sequence_one:2702.336182,sequence_four:21402504.000000)N1:0.000000; \ No newline at end of file +((sequence_three:0.0,sequence_two:38.0)Node_1:0.0,(sequence_four:0.0,sequence_one:2.0)Node_2:223.0)Node_3:0.0; + diff --git a/tests/data/recombination_at_root/expected_RAxML_result.recombination_at_root.iteration_1.tab b/tests/data/recombination_at_root/expected_RAxML_result.recombination_at_root.iteration_1.tab index 
36d15d3c..d917ab70 100644 --- a/tests/data/recombination_at_root/expected_RAxML_result.recombination_at_root.iteration_1.tab +++ b/tests/data/recombination_at_root/expected_RAxML_result.recombination_at_root.iteration_1.tab @@ -1,12 +1,18 @@ -FT misc_feature 25..978 -FT /node="N2->sequence_two" -FT /neg_log_likelihood="81.250897" +FT misc_feature 339..978 +FT /node="Node_1->sequence_two" +FT /neg_log_likelihood="77.515346" FT /colour="4" FT /taxa="sequence_two" -FT /SNP_count="34" +FT /SNP_count="32" FT misc_feature 11421..12388 -FT /node="N1->N2" -FT /neg_log_likelihood="271.826391" +FT /node="Node_3->Node_2" +FT /neg_log_likelihood="266.061457" FT /colour="2" -FT /taxa=" sequence_three sequence_two" -FT /SNP_count="210" +FT /taxa=" sequence_four sequence_one" +FT /SNP_count="209" +FT misc_feature 25..429 +FT /node="Node_3->Node_2" +FT /neg_log_likelihood="35.854004" +FT /colour="2" +FT /taxa=" sequence_four sequence_one" +FT /SNP_count="10" diff --git a/tests/data/recombination_at_root/recombination_at_root.aln.gaps.snp_sites.aln b/tests/data/recombination_at_root/recombination_at_root.aln.gaps.snp_sites.aln index 66ff6d3a..875e3b71 100644 --- a/tests/data/recombination_at_root/recombination_at_root.aln.gaps.snp_sites.aln +++ b/tests/data/recombination_at_root/recombination_at_root.aln.gaps.snp_sites.aln @@ -22,15 +22,9 @@ AGTACGTGTGGTTGATTCATGGGGTAAATAAACGGAGAATGTATAAATTAGACAATCGAA ATTCAGTAATGGGCAGACCGCATACAAGCTCATCAATCTCGGGCTAGAAGGTGATTCCAA CACTCTGATTAACCGTAAGTCTTGCCGGTCTCTAAAGGGGGGACATGGTTGTAAATGTAT GATAAGAAATAAATAGTCCCGGG ->N2 -ACTCAGGTGACATTAATAGTTTTGTTTGGCCCTTCCGCGTTATGGGAGGGCCAATAAACA -TAACAAGACACAATTAATGCATAAAGGGGTGGAAAGATGAACGATTGCTGACGGTCTAGG -GACAGAATTCAATTGACTTATGAGTCTATCTCCGTTCAATATATCGATGCCGAGAGAAGT -AGTCAATGATCGTTACTGACTACCAATACTGACTGGTAAAATGTGCATCATCTCGATGGC -ACCCGATGCGTCGCCTAGGAAAA ->N1 -GTCTGATCAATATTAATAGTTTTGTTTGGCCCTTCCTCATTATGGGAGAGTTTTCTGTTG -AGTACGTGTGGTTGATTCATGGGGTAAATAAACGGAGAATGTATAAATTAGACAATCGAA 
-ATTCAGTAATGGGCAGACCGCATACAAGCTCATCAATCTCGGGCTAGAAGGTGATTCCAA -CACTCTGATCAACCGTAAGTCTTGCCGGTCTCTAAAGGGGGGACATGGTTGTAAATGTAT -GATAAGAAATAAATAGTCCCGGG +>Node_1 +ACTCAGGTGACATTAATAGTTTTGTTTGGCCCTTCCGCGTTATGGGAGGGCCAATAAACATAACAAGACACAATTAATGCATAAAGGGGTGGAAAGATGAACGATTGCTGACGGTCTAGGGACAGAATTCAATTGACTTATGAGTCTATCTCCGTTCAATATATCGATGCCGAGAGAAGTAGTCAATGATCGTTACTGACTACCAATACTGACTGGTAAAATGTGCATCATCTCGATGGCACCCGATGCGTCGCCTAGGAAAA +>Node_2 +GTCTGATCAATATTAATAGTTTTGTTTGGCCCTTCCTCATTATGGGAGGGTTTTCTGTTGAGTACGTGTGGTTGATTCATGGGGTAAATAAACGGAGAATGTATAAATTAGACAATCGAAATTCAGTAATGGGCAGACCGCATACAAGCTCATCAATCTCGGGCTAGAAGGTGATTCCAACACTCTGATTAACCGTAAGTCTTGCCGGTCTCTAAAGGGGGGACATGGTTGTAAATGTATGATAAGAAATAAATAGTCCCGGG +>Node_3 +ACTCAGGTGACATTAATAGTTTTGTTTGGCCCTTCCGCGTTATGGGAGGGCCAATAAACATAACAAGACACAATTAATGCATAAAGGGGTGGAAAGATGAACGATTGCTGACGGTCTAGGGACAGAATTCAATTGACTTATGAGTCTATCTCCGTTCAATATATCGATGCCGAGAGAAGTAGTCAATGATCGTTACTGACTACCAATACTGACTGGTAAAATGTGCATCATCTCGATGGCACCCGATGCGTCGCCTAGGAAAA diff --git a/tests/helper_methods.c b/tests/helper_methods.c index 258d7310..294505f1 100644 --- a/tests/helper_methods.c +++ b/tests/helper_methods.c @@ -64,7 +64,7 @@ int number_of_recombinations_in_file(char * fileName) { ++lines; } - return (int) lines/5; + return (int) lines/6; } int file_exists(char * fileName)