where is the guide tree estimated? #21

Open
GavinHuttley opened this issue Dec 1, 2022 · 1 comment

@GavinHuttley
Collaborator

Please point to the lines of code where dbga estimates the guide tree.

@xingjianleng
Owner

In the latest implementation, the de Bruijn MSA uses the DistanceMatrix object from cogent3. The DistanceMatrix is an input parameter of the alignment() function, and the guide tree is constructed at the beginning of alignment.

The current implementation for estimating the DistanceMatrix has no real mathematical or biological basis, especially in how k_estimated is calculated. The implementation is at

DBGA/src/dbga/utils.py

Lines 416 to 437 in 0da8dca

def distance_matrix_prediction(seq_sc: SequenceCollection) -> Any:  # pragma: no cover
    """generate the estimated distance matrix for input sequences using the predicted pairwise similarity

    Parameters
    ----------
    seq_sc : SequenceCollection
        the SequenceCollection containing multiple sequences

    Returns
    -------
    Any
        the estimated distance matrix using the predicted pairwise similarity
    """
    dist_dict = {}
    for seq_name1, seq_name2 in combinations(seq_sc.names, r=2):
        sub_seqs = seq_sc.take_seqs((seq_name1, seq_name2))
        k_estimated = math.ceil(math.log(min(map(len, sub_seqs.seqs)), 4)) + 2
        distance = 1 - predict_p(sub_seqs, k=k_estimated)
        dist_dict[(seq_name1, seq_name2)] = distance
        dist_dict[(seq_name2, seq_name1)] = distance
    dm = DistanceMatrix(dist_dict)
    return dm
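
To make the k_estimated expression concrete, here is a minimal standalone sketch of just that arithmetic. It is not part of DBGA: the helper name `estimate_k` and its `alphabet_size` and `offset` parameters are hypothetical, introduced only for illustration. One way to read the expression is that it takes the smallest k for which 4^k is at least the length of the shorter sequence, then adds a fixed offset of 2.

```python
import math


def estimate_k(min_seq_len: int, alphabet_size: int = 4, offset: int = 2) -> int:
    """Mirror the k_estimated expression quoted above (hypothetical helper).

    min_seq_len is the length of the shorter sequence in the pair;
    alphabet_size=4 and offset=2 are the constants hard-coded in the snippet.
    """
    return math.ceil(math.log(min_seq_len, alphabet_size)) + offset


# Show how k grows with the length of the shorter sequence.
for length in (100, 1000, 10000):
    print(length, estimate_k(length))
# 100 6
# 1000 7
# 10000 9
```

Under this reading, k grows only logarithmically with sequence length, and the offset of 2 is a fixed constant rather than something derived from the data. Note also that `math.log` works in floating point, so lengths at or very near an exact power of 4 may be sensitive to rounding before the ceiling is taken.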
