Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

evaluating alignment quality #22

Open
GavinHuttley opened this issue Dec 1, 2022 · 2 comments
Open

evaluating alignment quality #22

GavinHuttley opened this issue Dec 1, 2022 · 2 comments
Assignees

Comments

@GavinHuttley
Copy link
Collaborator

where in dbga is the code for quantifying the alignment quality? (The squared-differences measure?)

@xingjianleng
Copy link
Owner

In the report the alignment_quality score from cogent3 was used for quantifying the alignment quality.

There is the implementation of the sum of pairs measure especially for multiple sequence alignment. The related paper is https://doi.org/10.1093/nar/27.13.2682 and the implementation is at

DBGA/src/dbga/utils.py

Lines 440 to 534 in 0da8dca

def pairwise_alignment_score(seqs: ArrayAlignment) -> Dict[str, int]:
"""alignment score of two sequences using SOP (sum of pairs)
Parameters
----------
seqs : ArrayAlignment
the ArrayAlignment object containing `two` aligned sequences
Returns
-------
Dict[str, int]
the mapping of alignment attribute to the statistics
"""
# seqs should contain exactly two sequences
assert seqs.num_seqs == len(seqs.seqs) == 2
rtn = {
"match": 0,
"mismatch": 0,
"gap_open": 0,
"gap_extend": 0,
}
seq1_gap_opened, seq2_gap_opened = False, False
for i in range(seqs.seq_len):
seq1_char = seqs.seqs[0][i]
seq2_char = seqs.seqs[1][i]
# exclude cases where there is a match, or both gaps
if seq1_char != seq2_char :
if seq1_char != "-" and seq2_char != "-":
# two different nucleotides
rtn["mismatch"] += 1
seq1_gap_opened = False
seq2_gap_opened = False
elif (seq1_char == "-" and seq1_gap_opened) or (seq2_char == "-" and seq2_gap_opened):
# extending a current gap
rtn["gap_extend"] += 1
elif seq1_char == "-" and seq2_gap_opened:
# end gap in seq2, start a gap in seq1
rtn["gap_open"] += 1
seq1_gap_opened = True
seq2_gap_opened = False
elif seq2_char == "-" and seq1_gap_opened:
# end a gap in seq1, start a gap in seq2
rtn["gap_open"] += 1
seq1_gap_opened = False
seq2_gap_opened = True
elif seq1_char == "-":
# open a gap in seq1 (one gap, one nucleotide)
rtn["gap_open"] += 1
seq1_gap_opened = True
elif seq2_char == "-":
# open a gap in seq2 (one gap, one nucleotide)
rtn["gap_open"] += 1
seq2_gap_opened = True
elif seq1_char != "-":
# two matching nucleotides
rtn["match"] += 1
seq1_gap_opened = False
seq2_gap_opened = False
else:
# two gaps
seq1_gap_opened = False
seq2_gap_opened = False
return rtn
def sop(seqs: ArrayAlignment) -> Dict[str, int]:
"""sop stands for sum of pairs, function for measuring alignment quality for multiple sequence alignment
Parameters
----------
seqs : ArrayAlignment
the ArrayAlignment object with two or more aligned sequences
Returns
-------
Dict[str, int]
the mapping of alignment attribute to the statistics
"""
rtn = {
"match": 0,
"mismatch": 0,
"gap_open": 0,
"gap_extend": 0,
}
for chosen in combinations(seqs.names, 2):
sub_seqs = seqs.take_seqs(chosen)
sub_seqs_sop = pairwise_alignment_score(sub_seqs)
print(sub_seqs_sop)
for key, value in sub_seqs_sop.items():
rtn[key] += value
return rtn

@GavinHuttley
Copy link
Collaborator Author

move this statistic into cogent3 and making it an option via the Alignmentalignment_quality() method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants