Small changes to simplify build_primer_pairs(). #89

Merged
merged 3 commits into from
Nov 18, 2024
14 changes: 8 additions & 6 deletions prymer/api/picking.py
Expand Up @@ -16,6 +16,7 @@
from individual left and primers.

"""

from collections.abc import Sequence
from pathlib import Path
from typing import Iterator
Expand Down Expand Up @@ -140,13 +141,14 @@ def build_primer_pairs(
# generate all the primer pairs that don't violate hard size and Tm constraints
for lp in left_primers:
for rp in right_primers:
if rp.span.end - lp.span.start + 1 > amplicon_sizes.max:
amp_span = PrimerPair.calculate_amplicon_span(lp, rp)

if amp_span.length > amplicon_sizes.max:
continue

amp_mapping = Span(refname=target.refname, start=lp.span.start, end=rp.span.end)
amp_bases = bases[
amp_mapping.start - region_start : amp_mapping.end - region_start + 1
]
# Since the amplicon span and the region_start are both 1-based, the minuend
# becomes a zero-based offset
amp_bases = bases[amp_span.start - region_start : amp_span.end - region_start + 1]
amp_tm = calculate_long_seq_tm(amp_bases)

if amp_tm < amplicon_tms.min or amp_tm > amplicon_tms.max:
Expand All @@ -159,7 +161,7 @@ def build_primer_pairs(
penalty = score(
left_primer=lp,
right_primer=rp,
amplicon=amp_mapping,
amplicon=amp_span,
amplicon_tm=amp_tm,
amplicon_sizes=amplicon_sizes,
amplicon_tms=amplicon_tms,
Expand Down
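
The offset arithmetic in the new `amp_bases` slice can be checked in isolation. The following is a minimal sketch, not code from the PR: `slice_amplicon` is a hypothetical helper, and it assumes that `bases` holds the region sequence whose first base sits at the 1-based position `region_start`, and that amplicon coordinates are 1-based and inclusive.

```python
def slice_amplicon(bases: str, region_start: int, amp_start: int, amp_end: int) -> str:
    """Return the bases covered by a 1-based, inclusive amplicon.

    Hypothetical helper for illustration only. `bases` is the sequence of a
    region whose first base is at the 1-based position `region_start`.
    """
    # Subtracting two 1-based coordinates yields a zero-based offset.
    first = amp_start - region_start
    # Python slices exclude the end index, so add 1 to keep the last base.
    last_exclusive = amp_end - region_start + 1
    return bases[first:last_exclusive]


# A region covering positions 101..110, and an amplicon covering 103..106.
region_bases = "ACGTACGTAC"
amplicon_bases = slice_amplicon(region_bases, region_start=101, amp_start=103, amp_end=106)
print(amplicon_bases, len(amplicon_bases))
```

The slice length equals `amp_end - amp_start + 1`, which matches the inclusive length that the size check compares against `amplicon_sizes.max`.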
54 changes: 39 additions & 15 deletions prymer/api/primer_pair.py
Expand Up @@ -84,7 +84,11 @@ class PrimerPair(OligoLike):
def __post_init__(self) -> None:
# Derive the amplicon from the left and right primers. This must be done before
# calling super() as `PrimerLike.id` depends on the amplicon being set
object.__setattr__(self, "_amplicon", self._calculate_amplicon())
object.__setattr__(
self,
"_amplicon",
PrimerPair.calculate_amplicon_span(self.left_primer, self.right_primer),
)
Comment on lines +87 to +91
Collaborator

@msto msto Nov 15, 2024

suggestion: Is this a good opportunity to refactor to make amplicon a cached_property, rather than mucking about with a private field and setattr?

e.g.

@cached_property
def amplicon(self) -> Span:
    """The interval spanned by the pair's amplicon."""
    return self.calculate_amplicon_span(self.left_primer, self.right_primer)

and then no need for the post-init

Member Author

I think it's questionable. For better or worse, PrimerPair relies on the checks in calculate_amplicon_span() to enforce that the pairing makes sense ... so we could make it a cached property and then still have a post_init that accesses it, or replicate the checks ... or separate the checks into yet another function. None of which seems obviously better.

Collaborator

I'd really like to remove _amplicon as a "private" field.

As a consequence of the current implementation, both asdict() and fields() return a field that isn't accepted by the class constructor. This is unusual behavior and requires the user to remember and manually remove the field.

Could we make amplicon a cached_property, and update the post-init checks to reference that property? That protects all the cases I'm aware of motivating the current implementation:

  1. Mutation-free access to an amplicon "attribute".
  2. The value of amplicon is still derived from the input primers.

As a bonus, it removes the need to use setattr (currently necessary because we're trying to mutate a frozen dataclass after instantiation).
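
The `asdict()` / `fields()` behavior described above can be reproduced with a small stand-in class. This is a sketch under stated assumptions: `Pair` and its field names are hypothetical, and the private field is assumed to be declared with `field(init=False)` and populated in `__post_init__` via `object.__setattr__`, as the diff suggests.

```python
from dataclasses import asdict, dataclass, field, fields


@dataclass(frozen=True)
class Pair:
    """Hypothetical stand-in for a frozen dataclass with a derived private field."""

    left_start: int
    right_end: int
    # Derived value: excluded from __init__ but still a dataclass field.
    _amplicon_length: int = field(init=False)

    def __post_init__(self) -> None:
        # A frozen dataclass blocks normal assignment, hence object.__setattr__.
        object.__setattr__(self, "_amplicon_length", self.right_end - self.left_start + 1)


pair = Pair(left_start=50, right_end=159)
field_names = [f.name for f in fields(pair)]  # includes "_amplicon_length"
as_dict = asdict(pair)                        # includes "_amplicon_length"

try:
    Pair(**as_dict)  # the constructor does not accept the derived field
    round_trip_ok = True
except TypeError:
    round_trip_ok = False

print(field_names, round_trip_ok)
```

A caller who wants to rebuild the object from `asdict()` output has to drop the derived key first, which is the manual step the comment objects to.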

Member Author

@tfenne tfenne Nov 18, 2024

I tried ... but failed. Changed it to a cached property and tried calling it from post_init() to run the validation. Now I get this in tests, which is super unhelpful and I'm not sure why:

ERROR tests/api/test_primer_pair.py - TypeError: No '__dict__' attribute on 'PrimerPair' instance to cache 'amplicon' property.
ERROR tests/api/test_primer_pair.py - TypeError: No '__dict__' attribute on 'PrimerPair' instance to cache 'amplicon' property.

Going to merge without making the change, and we can circle back to it separately.

super(PrimerPair, self).__post_init__()

@property
Expand Down Expand Up @@ -226,29 +230,49 @@ def __str__(self) -> str:
+ f"{self.amplicon_tm}\t{self.penalty}"
)

def _calculate_amplicon(self) -> Span:
"""
Calculates the amplicon from the left and right primers, spanning from the start of the
left primer to the end of the right primer.
@staticmethod
def calculate_amplicon_span(left_primer: Oligo, right_primer: Oligo) -> Span:
"""
Calculates the amplicon Span from the left and right primers.

Args:
left_primer: the left primer for the amplicon
right_primer: the right primer for the amplicon

Returns:
a Span starting at the first base of the left primer and ending at the last base of
the right primer

Raises:
ValueError: If `left_primer` and `right_primer` have different reference names.
ValueError: If `left_primer` doesn't start before the right primer.
ValueError: If `right_primer` ends before `left_primer`.
"""
# Require that `left_primer` and `right_primer` both map to the same reference sequence
if self.left_primer.span.refname != self.right_primer.span.refname:
if left_primer.span.refname != right_primer.span.refname:
raise ValueError(
"Left and right primers are on different references. "
f"Left primer ref: {left_primer.span.refname}. "
f"Right primer ref: {right_primer.span.refname}"
)

# Require that the left primer starts before the right primer
if left_primer.span.start > right_primer.span.start:
raise ValueError(
"The reference must be the same across primers in a pair; received "
f"left primer ref: {self.left_primer.span.refname}, "
f"right primer ref: {self.right_primer.span.refname}"
"Left primer does not start before the right primer. "
f"Left primer span: {left_primer.span}, "
f"Right primer span: {right_primer.span}"
Comment on lines +259 to +264
⚠️ Potential issue

Add validation for primer overlap.

Current check only verifies start positions. Also validate that left primer doesn't overlap significantly with right primer.

Add this check after the existing validation:

 if left_primer.span.start > right_primer.span.start:
     raise ValueError(
         "Left primer does not start before the right primer. "
         f"Left primer span: {left_primer.span}, "
         f"Right primer span: {right_primer.span}"
     )
+
+# Ensure primers don't have significant overlap
+overlap = min(left_primer.span.end, right_primer.span.end) - max(left_primer.span.start, right_primer.span.start)
+if overlap > len(left_primer.bases) / 2:  # Allow up to 50% overlap
+    raise ValueError(
+        f"Primers overlap too much ({overlap} bases). "
+        f"Left primer span: {left_primer.span}, "
+        f"Right primer span: {right_primer.span}"
+    )
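
Note that the `overlap` expression in the suggestion subtracts coordinates without adding 1, and it goes negative when the primers are disjoint. Assuming spans are 1-based and inclusive (consistent with the `+ 1` in the original size check in picking.py), the overlap length could be sketched as follows. `ClosedSpan` and `overlap_length` are hypothetical names for illustration, not part of prymer.

```python
from typing import NamedTuple


class ClosedSpan(NamedTuple):
    """Hypothetical 1-based span with inclusive start and end."""

    start: int
    end: int


def overlap_length(a: ClosedSpan, b: ClosedSpan) -> int:
    """Number of positions shared by two closed intervals (0 if disjoint)."""
    # For closed intervals the shared range is [max(starts), min(ends)],
    # whose length is min(ends) - max(starts) + 1 when it is non-empty.
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


left = ClosedSpan(50, 59)
print(overlap_length(left, ClosedSpan(55, 64)))    # positions 55..59 are shared
print(overlap_length(left, ClosedSpan(150, 159)))  # disjoint spans
```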
)

# Require that the left primer does not start to the right of the right primer
if self.left_primer.span.start > self.right_primer.span.end:
# Require that the right primer does not end before the left primer ends
if right_primer.span.end < left_primer.span.end:
raise ValueError(
"Left primer start must be less than or equal to right primer end; received "
"left primer genome span: {self.left_primer.span}, "
"right primer genome span: {self.right_primer.span}"
"Right primer ends before left primer ends. "
f"Left primer span: {left_primer.span}, "
f"Right primer span: {right_primer.span}"
)

return replace(self.left_primer.span, end=self.right_primer.span.end)
return Span(left_primer.span.refname, left_primer.span.start, right_primer.span.end)

@staticmethod
def compare(
Expand Down
29 changes: 24 additions & 5 deletions tests/api/test_primer_pair.py
Expand Up @@ -427,7 +427,7 @@ def test_reference_mismatch() -> None:

pp = PRIMER_PAIR_TEST_CASES[0].primer_pair

with pytest.raises(ValueError, match="The reference must be the same across primers in a pair"):
with pytest.raises(ValueError, match="different references"):
replace(
pp,
left_primer=replace(
Expand All @@ -436,7 +436,7 @@ def test_reference_mismatch() -> None:
),
)

with pytest.raises(ValueError, match="The reference must be the same across primers in a pair"):
with pytest.raises(ValueError, match="different references"):
replace(
pp,
right_primer=replace(
Expand All @@ -449,9 +449,7 @@ def test_reference_mismatch() -> None:
def test_right_primer_before_left_primer() -> None:
"""Test that an exception is raised if the left primer starts after the right primer ends"""
pp = PRIMER_PAIR_TEST_CASES[0].primer_pair
with pytest.raises(
ValueError, match="Left primer start must be less than or equal to right primer end"
):
with pytest.raises(ValueError, match="Left primer does not start before the right primer"):
replace(
pp,
left_primer=pp.right_primer,
Expand Down Expand Up @@ -556,3 +554,24 @@ def test_primer_pair_compare(
assert -expected_by_amplicon_false == PrimerPair.compare(
this=that, that=this, seq_dict=seq_dict, by_amplicon=False
)


def test_calculate_amplicon_span() -> None:
    left = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr1", 50, 59))
    right = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr1", 150, 159))
    assert PrimerPair.calculate_amplicon_span(left, right) == Span("chr1", 50, 159)

    left = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr2", 50, 59))
    right = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr3", 150, 159))
    with pytest.raises(ValueError, match="different references"):
        PrimerPair.calculate_amplicon_span(left, right)

    left = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr1", 150, 159))
    right = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr1", 50, 59))
    with pytest.raises(ValueError, match="Left primer does not start before the right primer"):
        PrimerPair.calculate_amplicon_span(left, right)

    left = Oligo(name="l", bases="AACCGGTTAAACGTT", tm=60, penalty=1, span=Span("chr1", 150, 164))
    right = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr1", 150, 159))
    with pytest.raises(ValueError, match="Right primer ends before left primer ends"):
        PrimerPair.calculate_amplicon_span(left, right)
Comment on lines +559 to +577
Collaborator

@msto msto Nov 18, 2024

note: I usually try to use pytest.mark.parametrize so that each test case runs and reports independently.

Tests exit after the first failed assertion, so if you bundle test cases in a single test, you run the risk of obscuring later failures. If you parametrize, you hit as many failures as possible in a single test run, and don't find yourself stuck in a loop of addressing one test case, running the test suite, and discovering a previously hidden failure.

I usually have one test parametrized over "good" cases, with each test case paired with its expected output, and one test parametrized over "bad" cases, where each case is expected to raise an exception.

Since here the span is the primary variable being manipulated by each case, I'd write something like the following:

@pytest.mark.parametrize(
    "left_primer,right_primer,expected_amplicon",
    [
        (Span("chr1", 50, 59), Span("chr1", 150, 159), Span("chr1", 50, 159)),
    ]
)
def test_calculate_amplicon_span(left_primer: Span, right_primer: Span, expected_amplicon: Span) -> None:
    """The amplicon should span from the start of the left primer to the end of the right primer."""
    # TODO add logic to build an `Oligo` from the test `Span`
    actual_amplicon: Span = calculate_amplicon_span(left_primer, right_primer)
    assert actual_amplicon == expected_amplicon


@pytest.mark.parametrize(
    "left_primer,right_primer,error_msg",
    [
        (Span("chr2", 50, 59), Span("chr3", 150, 159), "different references"),
    ],
)
def test_calculate_amplicon_span_raises(left_primer: Span, right_primer: Span, error_msg: str) -> None:
    """An error should be raised if the spans are on different references, or if the left primer is not to the left of the right primer."""
    with pytest.raises(ValueError, match=error_msg):
        calculate_amplicon_span(left_primer, right_primer)

Member Author

This is, to me, a stylistic thing, and I come down very strongly on the other side. I almost never use pytest.mark.parametrize because I personally find that it makes tests less maintainable, harder to read etc.

If the tests aren't as simple as these, then I tend to break them up into multiple test functions.
