Implementing MARK03 Experiment #114

Merged
merged 27 commits into from
Jul 20, 2023
Commits
cb42585
refine workflow readme
gordonkoehn Jul 11, 2023
43da011
move run_huntress to general rules + add 4 threads
gordonkoehn Jul 11, 2023
cdad588
add test commit non worlflows
gordonkoehn Jul 12, 2023
9bcc687
test commit in .github/workflows
gordonkoehn Jul 12, 2023
474f9b2
undo test
gordonkoehn Jul 12, 2023
d781b53
add initial points mark03
gordonkoehn Jul 12, 2023
2058950
WIP
gordonkoehn Jul 13, 2023
6fb6bdc
WIP
gordonkoehn Jul 13, 2023
eb49df5
skelleton of rules done
gordonkoehn Jul 13, 2023
45fec71
filling in combined_metric_iteration_plot
gordonkoehn Jul 13, 2023
9a1afee
add lopP + local test run
gordonkoehn Jul 13, 2023
42230ed
add first working mark03
gordonkoehn Jul 13, 2023
ca72f13
add assertion mutation shape to core
gordonkoehn Jul 18, 2023
052209e
remove huntress id
gordonkoehn Jul 18, 2023
b921b1d
remove obsolete type ignore
gordonkoehn Jul 18, 2023
41866ae
seperate huntress from trees
gordonkoehn Jul 18, 2023
c2ad5c8
fix legend position
gordonkoehn Jul 18, 2023
9592b9e
remove blank
gordonkoehn Jul 18, 2023
58ea0da
document
gordonkoehn Jul 18, 2023
bd77a9a
WIP MCMC5 moves tree
gordonkoehn Jul 18, 2023
9cb9fb8
WIP base evolve mcmc fn
gordonkoehn Jul 19, 2023
2defe06
WIP add tree node envalope / make tree_from_tree_node static
gordonkoehn Jul 19, 2023
4ea160d
added MCMC5
gordonkoehn Jul 19, 2023
b97997f
remove print
gordonkoehn Jul 19, 2023
353db46
Merge branch 'main' into gordon/mark03
gordonkoehn Jul 19, 2023
63434d7
set full experiment
gordonkoehn Jul 19, 2023
ccd131a
remove snakefmt - deactivate until issues resolved
gordonkoehn Jul 20, 2023
14 changes: 8 additions & 6 deletions .github/workflows/test.yml
@@ -22,12 +22,14 @@ jobs:
- uses: actions/checkout@v3
- name: Run black formatting check
uses: psf/black@stable
- name: Run snakefmt formatting check
uses: super-linter/super-linter@v5
env:
VALIDATE_ALL_CODEBASE: false
DEFAULT_BRANCH: main
VALIDATE_SNAKEMAKE_SNAKEFMT: true
# TODO (Gordon): Add snakefmt back in when/if fixed. See
# https://github.com/snakemake/snakefmt/issues/197
# - name: Run snakefmt formatting check
# uses: super-linter/super-linter@v5
# env:
# VALIDATE_ALL_CODEBASE: false
# DEFAULT_BRANCH: main
# VALIDATE_SNAKEMAKE_SNAKEFMT: true
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
10 changes: 6 additions & 4 deletions .pre-commit-config.yaml
@@ -15,10 +15,12 @@ repos:
rev: 23.1.0
hooks:
- id: black
- repo: https://github.com/snakemake/snakefmt
rev: 'v0.8.4'
hooks:
- id: snakefmt
# TODO (Gordon): Add snakefmt back in when/if fixed. See
# https://github.com/snakemake/snakefmt/issues/197
# - repo: https://github.com/snakemake/snakefmt
# rev: 'v0.8.4'
# hooks:
# - id: snakefmt
- repo: https://github.com/econchick/interrogate
rev: 1.5.0
hooks:
5 changes: 3 additions & 2 deletions README.md
@@ -2,7 +2,7 @@
[![build](https://github.com/cbg-ethz/PYggdrasil/actions/workflows/test.yml/badge.svg)](https://github.com/cbg-ethz/PYggdrasil/actions/workflows/test.yml)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/charliermarsh/ruff)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Code style: snakefmt](https://img.shields.io/badge/code%20style-snakefmt-000000.svg)](https://github.com/snakemake/snakefmt)
<!-- TODO (Gordon): Add snakefmt back in when/if fixed. See https://github.com/snakemake/snakefmt/issues/197 [![Code style: snakefmt](https://img.shields.io/badge/code%20style-snakefmt-000000.svg)](https://github.com/snakemake/snakefmt) -->

# PYggdrasil

@@ -50,10 +50,11 @@ The code quality checks run during on GitHub can be seen in ``.github/workflows/
We are using:
- [Ruff](https://github.com/charliermarsh/ruff) to lint the code.
- [Black](https://github.com/psf/black) to format the code.
- [Snakefmt](https://github.com/snakemake/snakefmt) to format Snakemake workflows.
- [Pyright](https://github.com/microsoft/pyright) to check the types.
- [Pytest](https://docs.pytest.org/) to run the unit tests.
- [Interrogate](https://interrogate.readthedocs.io/) to check the documentation.
<!-- TODO (Gordon): Add snakefmt back in when/if fixed. See https://github.com/snakemake/snakefmt/issues/197 -->
<!-- [Snakefmt](https://github.com/snakemake/snakefmt) to format Snakemake workflows.-->


### Workflow
4 changes: 3 additions & 1 deletion pyproject.toml
@@ -35,7 +35,9 @@ pytest-xdist = "^3.2.0"
pre-commit = "^3.1.0"
interrogate = "^1.5.0"
pyright = "^1.1.309"
snakefmt = "^0.8.4"
# TODO (Gordon): Add snakefmt back in when/if fixed. See
# https://github.com/snakemake/snakefmt/issues/197
# snakefmt = "^0.8.4"

[tool.coverage.report]
fail_under = 85.0
4 changes: 3 additions & 1 deletion scripts/make_huntress.py
@@ -4,6 +4,8 @@
Make a TreeNode tree given a mutation matrix
to generate a huntress tree.

Note: uses 4 threads by default for the huntress tree inference.

Example Usage:
poetry run python ../scripts/make_huntress.py

@@ -134,7 +136,7 @@ def main() -> None:
mut_mat = cell_simulation_data["noisy_mutation_mat"]

# run huntress tree inference
tree_n = huntress_tree_inference(mut_mat, args.fpr, args.fnr, n_threads=2)
tree_n = huntress_tree_inference(mut_mat, args.fpr, args.fnr, n_threads=4)
tree_tn = TreeNode(name=tree_n.name, parent=None, children=tree_n.children)

# Save the tree - make path
2 changes: 1 addition & 1 deletion scripts/run_mcmc.py
@@ -138,7 +138,7 @@ def run_chain(

init_tree_node = serialize.read_tree_node(params.init_tree_fp)
# convert TreeNode to Tree
init_tree = tree_inf.tree_from_tree_node(init_tree_node)
init_tree = tree_inf.Tree.tree_from_tree_node(init_tree_node)
logging.info("Loaded tree (TreeNode) from file.")

# Make Move Probabilities
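
The change above replaces the module-level `tree_inf.tree_from_tree_node(...)` call with the static method `tree_inf.Tree.tree_from_tree_node(...)`. The following is a minimal sketch of that pattern only. Both classes are hypothetical stand-ins written for illustration, not the PYggdrasil implementations (whose internal representation is not shown in this diff), and the parent-map layout is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TreeNode:
    """Hypothetical stand-in for a linked tree node (not the project's class)."""

    name: int
    children: List["TreeNode"] = field(default_factory=list)


@dataclass
class Tree:
    """Hypothetical flat tree: sorted labels plus a child -> parent map."""

    labels: List[int]
    parents: Dict[int, Optional[int]]

    @staticmethod
    def tree_from_tree_node(root: TreeNode) -> "Tree":
        """Flatten a TreeNode hierarchy with an iterative depth-first walk."""
        parents: Dict[int, Optional[int]] = {root.name: None}
        stack = [root]
        while stack:
            node = stack.pop()
            for child in node.children:
                parents[child.name] = node.name
                stack.append(child)
        return Tree(labels=sorted(parents), parents=parents)


# Root 2 has child 0, which in turn has child 1.
root = TreeNode(2, [TreeNode(0, [TreeNode(1)])])
tree = Tree.tree_from_tree_node(root)
```

Placing the converter on the class keeps the constructor-like helper next to the type it builds, which is presumably why the export `tree_from_tree_node` is dropped from `__init__.py` further down.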
9 changes: 6 additions & 3 deletions src/pyggdrasil/tree_inference/__init__.py
@@ -14,6 +14,7 @@
TreeAdjacencyMatrix,
AncestorMatrix,
CellAttachmentVector,
MoveProbabilities,
)

from pyggdrasil.tree_inference._tree_generator import (
@@ -22,7 +23,7 @@
generate_random_TreeNode,
)

from pyggdrasil.tree_inference._tree import Tree, tree_from_tree_node, get_descendants
from pyggdrasil.tree_inference._tree import Tree, get_descendants

from pyggdrasil.tree_inference._simulate import (
CellAttachmentStrategy,
@@ -52,7 +53,9 @@

from pyggdrasil.tree_inference._huntress import huntress_tree_inference

from pyggdrasil.tree_inference._mcmc_sampler import mcmc_sampler, MoveProbabilities
from pyggdrasil.tree_inference._mcmc_sampler import mcmc_sampler

from pyggdrasil.tree_inference._tree_mcmc import evolve_tree_mcmc


__all__ = [
@@ -67,7 +70,6 @@
"MutationMatrix",
"Tree",
"MoveProbabilities",
"tree_from_tree_node",
"unpack_sample",
"gen_sim_data",
"huntress_tree_inference",
@@ -93,4 +95,5 @@
"MoveProbConfigOptions",
"McmcConfigOptions",
"ErrorCombinations",
"evolve_tree_mcmc",
]
120 changes: 108 additions & 12 deletions src/pyggdrasil/tree_inference/_file_id.py
@@ -1,6 +1,8 @@
"""Provides classes for uniquely naming Tree,
Cell Simulation and MCMC run files."""

import re

from enum import Enum
from typing import Union, Optional

@@ -16,12 +18,14 @@ class TreeType(Enum):
- STAR (star tree)
- DEEP (deep tree)
- HUNTRESS (Huntress tree) - inferred from real / cell simulation data
- MCMC (tree generated by evolving an initial tree with MCMC moves)
"""

RANDOM = "r"
STAR = "s"
DEEP = "d"
HUNTRESS = "h"
MCMC = "m"
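
Each `TreeType` member is keyed by a one-letter code, and the id parsers below rely on looking a member up by that value, as in `TreeType(tree_type)`. A minimal sketch of that lookup follows. The enum is copied from the diff above so the block is self-contained, while `parse_tree_type` is a hypothetical helper that is not part of the PR.

```python
from enum import Enum


class TreeType(Enum):
    """Copy of the enum from the diff, repeated so the sketch runs alone."""

    RANDOM = "r"
    STAR = "s"
    DEEP = "d"
    HUNTRESS = "h"
    MCMC = "m"


def parse_tree_type(code: str) -> TreeType:
    """Map a one-letter code to its TreeType (hypothetical helper)."""
    try:
        # Calling an Enum class with a value returns the matching member.
        return TreeType(code)
    except ValueError as err:
        raise ValueError(f"Unknown tree type code: {code!r}") from err
```

With the new member in place, `parse_tree_type("m")` resolves to `TreeType.MCMC`, and an unknown code raises `ValueError`.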


class MutationDataId:
@@ -111,31 +115,124 @@ def from_str(cls, str_id: str):
# split string by underscore and assign to attributes
split_elements = str_id.split("_")
seed = None
mutation_data = None
rest_id = None
if len(split_elements) == 3:
_, tree_type, n_nodes = split_elements
elif len(split_elements) == 4:
_, tree_type, n_nodes, seed = split_elements
elif len(split_elements) == 5:
_, tree_type, n_nodes, seed, mutation_data = split_elements
elif len(split_elements) >= 5:
_, tree_type, n_nodes, *rest = split_elements
rest_id = "_".join(rest)
else:
raise AssertionError("Tree id has invalid format")

if seed is not None:
tree_id = TreeId(TreeType(tree_type), int(n_nodes), int(seed))
return tree_id
else:
if mutation_data is not None:
try:
mutation_data = CellSimulationId.from_str(mutation_data)
except AssertionError:
mutation_data = MutationDataId(mutation_data)

tree_id = TreeId(TreeType(tree_type), int(n_nodes), None, mutation_data)
if rest_id is not None:
# check if tree is MCMC tree
if tree_type == TreeType.MCMC.value:
try:
tree_id = McmcTreeId.from_str(str_id)
return tree_id
except AssertionError:
raise AssertionError(
"Tree id has invalid format for an MCMC tree"
)

# check if tree is Huntress tree
elif tree_type == TreeType.HUNTRESS.value:
try:
mutation_data = CellSimulationId.from_str(rest_id)
except AssertionError:
mutation_data = MutationDataId(rest_id)

tree_id = TreeId(
TreeType(tree_type), int(n_nodes), None, mutation_data
)
return tree_id
else:
tree_id = TreeId(TreeType(tree_type), int(n_nodes))
return tree_id
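
The new branch collects everything after the node count with star-unpacking (`*rest`) and re-joins it, so ids with embedded underscores (MCMC ids, Huntress ids carrying a mutation-data id) survive the split. A minimal sketch of just that splitting step, assuming the `T_<type>_<n_nodes>[_...]` layout seen in this diff. `split_tree_id` is a hypothetical helper, and unlike the code above it does not treat a four-element id as carrying a seed.

```python
from typing import Optional, Tuple


def split_tree_id(str_id: str) -> Tuple[str, int, Optional[str]]:
    """Split a tree id into (type code, n_nodes, remainder).

    Hypothetical helper for illustration; not part of the PR.
    """
    parts = str_id.split("_")
    if len(parts) < 3:
        raise ValueError("Tree id has invalid format")
    # The first element is the literal "T"; everything past n_nodes is
    # kept together so nested ids are not broken apart.
    _, tree_type, n_nodes, *rest = parts
    rest_id = "_".join(rest) if rest else None
    return tree_type, int(n_nodes), rest_id
```

For example, `split_tree_id("T_m_10_5_42_oT_r_10_34")` keeps `"5_42_oT_r_10_34"` as one remainder string that a type-specific parser can then interpret.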


class McmcTreeId(TreeId):
    """Class for tree ids of trees evolved by MCMC moves under SCITE.

    The MCMC move probabilities are not encoded in the id, so the id
    is not unique: a tree is fully reproducible only together with
    the MCMC config. Default values are assumed for the MCMC config.
    """

    tree_type: TreeType
    n_moves: int
    n_nodes: int
    mcmc_rng_seed: int
    initial_tree_id: TreeId

    def __init__(
        self,
        n_moves: int,
        n_nodes: int,
        mcmc_rng_seed: int,
        initial_tree_id: TreeId,
        tree_type: TreeType = TreeType.MCMC,
    ):
        self.initial_tree_id = initial_tree_id
        self.n_nodes = n_nodes
        self.n_moves = n_moves
        self.mcmc_rng_seed = mcmc_rng_seed
        self.tree_type = tree_type
        super().__init__(TreeType.MCMC, n_nodes)

        self.id = self._create_id()

    def _create_id(self) -> str:
        """Creates the id string for the tree
        by concatenating the values of the attributes."""

        str_rep = "T"
        str_rep = str_rep + "_" + str(self.tree_type.value)
        str_rep = str_rep + "_" + str(self.n_nodes)
        str_rep = str_rep + "_" + str(self.n_moves)
        str_rep = str_rep + "_" + str(self.mcmc_rng_seed)
        str_rep = str_rep + "_o" + str(self.initial_tree_id)

        return str_rep

    def __str__(self) -> str:
        return self.id

    @classmethod
    def from_str(cls, str_id: str):
        """Creates a tree id from a string representation of the id.

        Args:
            str_id: str
        """

        # Define the regular expression pattern to match the variables
        pattern = r"T_m_(\d+)_(\d+)_(\d+)_o(T_[a-zA-Z]_\d+_\d+)"

        # Use re.findall() to extract the matched variables
        matches = re.findall(pattern, str_id)

        # If the pattern matched, unpack the first match
        # into the individual variable values.
        if matches:
            n_nodes, n_moves, mcmc_move_seed, initial_tree_id = matches[0]

            tree_id = McmcTreeId(
                int(n_moves),
                int(n_nodes),
                int(mcmc_move_seed),
                TreeId.from_str(initial_tree_id),
            )

            return tree_id
        else:
            raise AssertionError("MCMC tree id has invalid format")
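
The id layout written by `_create_id` and read back by `from_str` can be checked with a small round trip. The sketch below reuses the regular expression from the diff verbatim, but works on plain tuples instead of `McmcTreeId` objects and uses `re.fullmatch` rather than `re.findall`, so it is an illustration under those assumptions and not the PR's code. `make_mcmc_id` and `parse_mcmc_id` are hypothetical names.

```python
import re
from typing import Tuple

# Same pattern as in McmcTreeId.from_str in the diff above.
PATTERN = r"T_m_(\d+)_(\d+)_(\d+)_o(T_[a-zA-Z]_\d+_\d+)"


def make_mcmc_id(n_nodes: int, n_moves: int, seed: int, initial_tree_id: str) -> str:
    """Build an id in the order used by _create_id (hypothetical helper)."""
    return f"T_m_{n_nodes}_{n_moves}_{seed}_o{initial_tree_id}"


def parse_mcmc_id(str_id: str) -> Tuple[int, int, int, str]:
    """Parse an MCMC tree id into (n_nodes, n_moves, seed, initial id)."""
    # fullmatch anchors the pattern at both ends, unlike findall.
    match = re.fullmatch(PATTERN, str_id)
    if match is None:
        raise ValueError("MCMC tree id has invalid format")
    n_nodes, n_moves, seed, initial_tree_id = match.groups()
    return int(n_nodes), int(n_moves), int(seed), initial_tree_id
```

One consequence of the pattern worth noting: the initial tree id must have the form `T_<letter>_<number>_<number>`, so an initial tree id without a seed, such as `T_s_10`, does not match and is rejected.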


class CellSimulationId(MutationDataId):
@@ -243,10 +340,9 @@ def from_str(cls, str_id: str):
# create tree id
tree_id = TreeId.from_str(tree_id)

# TODO: remove type ignore once PR #64 is merged
return cls(
seed,
tree_id, # type: ignore
tree_id,
n_cells,
fpr,
fnr,
13 changes: 13 additions & 0 deletions src/pyggdrasil/tree_inference/_interface.py
@@ -5,6 +5,7 @@
so we do not introduce circular imports.
"""
from typing import Union
import dataclasses

import jax
import numpy as np
@@ -50,3 +51,15 @@
# Observational Error rates
# tuple of (fpr, fnr)
ErrorRates = tuple[float, float]


@dataclasses.dataclass
class MoveProbabilities:
    """Move probabilities. The default values were taken from
    the paragraph **Combining the three MCMC moves** of page 14
    of the SCITE paper supplement.
    """

    prune_and_reattach: float = 0.1
    swap_node_labels: float = 0.65
    swap_subtrees: float = 0.25
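
The three defaults sum to 1, so they can be used directly as weights when picking the next MCMC move. A minimal sketch of validating and sampling from such a dataclass is given below. The dataclass is copied from the diff so the block runs on its own, while `validate` and `sample_move` are hypothetical helpers that use the standard-library `random` module, which is an assumption and not necessarily how the project's sampler draws moves.

```python
import dataclasses
import math
import random


@dataclasses.dataclass
class MoveProbabilities:
    """Copy of the dataclass from the diff, repeated for a runnable sketch."""

    prune_and_reattach: float = 0.1
    swap_node_labels: float = 0.65
    swap_subtrees: float = 0.25


def validate(probs: MoveProbabilities) -> None:
    """Check that the probabilities are non-negative and sum to one."""
    values = dataclasses.astuple(probs)
    if any(v < 0 for v in values):
        raise ValueError("Move probabilities must be non-negative")
    if not math.isclose(sum(values), 1.0):
        raise ValueError("Move probabilities must sum to 1")


def sample_move(probs: MoveProbabilities, rng: random.Random) -> str:
    """Draw one move name, weighted by its probability."""
    names = [f.name for f in dataclasses.fields(probs)]
    weights = list(dataclasses.astuple(probs))
    return rng.choices(names, weights=weights, k=1)[0]
```

Passing an explicit `random.Random(seed)` keeps the draw reproducible, in the same spirit as the `mcmc_rng_seed` recorded in the MCMC tree id.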