Implementing MARK03 Experiment (#114)
* refine workflow readme

* move run_huntress to general rules + add 4 threads

* add test commit non workflows

* test commit in .github/workflows

* undo test

* add initial points mark03

* WIP

* WIP

* skeleton of rules done

* filling in combined_metric_iteration_plot

* add logP + local test run

* add first working mark03

* add assertion mutation shape to core

* remove huntress id

* remove obsolete type ignore

* separate huntress from trees

* fix legend position

* remove blank

* document

* WIP MCMC5 moves tree

* WIP base evolve mcmc fn

* WIP add tree node envelope / make tree_from_tree_node static

* added MCMC5

* remove print

* set full experiment

* remove snakefmt - deactivate until issues resolved
gordonkoehn authored Jul 20, 2023
1 parent 67e06e5 commit 503d2fc
Showing 26 changed files with 901 additions and 131 deletions.
14 changes: 8 additions & 6 deletions .github/workflows/test.yml
@@ -22,12 +22,14 @@ jobs:
- uses: actions/checkout@v3
- name: Run black formatting check
uses: psf/black@stable
- name: Run snakefmt formatting check
uses: super-linter/super-linter@v5
env:
VALIDATE_ALL_CODEBASE: false
DEFAULT_BRANCH: main
VALIDATE_SNAKEMAKE_SNAKEFMT: true
# TODO (Gordon): Add snakefmt back in when/if fixed. See
# https://github.com/snakemake/snakefmt/issues/197
# - name: Run snakefmt formatting check
# uses: super-linter/super-linter@v5
# env:
# VALIDATE_ALL_CODEBASE: false
# DEFAULT_BRANCH: main
# VALIDATE_SNAKEMAKE_SNAKEFMT: true
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
10 changes: 6 additions & 4 deletions .pre-commit-config.yaml
@@ -15,10 +15,12 @@ repos:
rev: 23.1.0
hooks:
- id: black
- repo: https://github.com/snakemake/snakefmt
rev: 'v0.8.4'
hooks:
- id: snakefmt
# TODO (Gordon): Add snakefmt back in when/if fixed. See
# https://github.com/snakemake/snakefmt/issues/197
# - repo: https://github.com/snakemake/snakefmt
# rev: 'v0.8.4'
# hooks:
# - id: snakefmt
- repo: https://github.com/econchick/interrogate
rev: 1.5.0
hooks:
5 changes: 3 additions & 2 deletions README.md
@@ -2,7 +2,7 @@
[![build](https://github.com/cbg-ethz/PYggdrasil/actions/workflows/test.yml/badge.svg)](https://github.com/cbg-ethz/PYggdrasil/actions/workflows/test.yml)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/charliermarsh/ruff)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Code style: snakefmt](https://img.shields.io/badge/code%20style-snakefmt-000000.svg)](https://github.com/snakemake/snakefmt)
<!-- TODO (Gordon): Add snakefmt back in when/if fixed. See https://github.com/snakemake/snakefmt/issues/197 [![Code style: snakefmt](https://img.shields.io/badge/code%20style-snakefmt-000000.svg)](https://github.com/snakemake/snakefmt) -->

# PYggdrasil

@@ -50,10 +50,11 @@ The code quality checks run during on GitHub can be seen in ``.github/workflows/
We are using:
- [Ruff](https://github.com/charliermarsh/ruff) to lint the code.
- [Black](https://github.com/psf/black) to format the code.
- [Snakefmt](https://github.com/snakemake/snakefmt) to format Snakemake workflows.
- [Pyright](https://github.com/microsoft/pyright) to check the types.
- [Pytest](https://docs.pytest.org/) to run the unit tests.
- [Interrogate](https://interrogate.readthedocs.io/) to check the documentation.
<!-- TODO (Gordon): Add snakefmt back in when/if fixed. See https://github.com/snakemake/snakefmt/issues/197 -->
<!-- [Snakefmt](https://github.com/snakemake/snakefmt) to format Snakemake workflows.-->


### Workflow
4 changes: 3 additions & 1 deletion pyproject.toml
@@ -35,7 +35,9 @@ pytest-xdist = "^3.2.0"
pre-commit = "^3.1.0"
interrogate = "^1.5.0"
pyright = "^1.1.309"
snakefmt = "^0.8.4"
# TODO (Gordon): Add snakefmt back in when/if fixed. See
# https://github.com/snakemake/snakefmt/issues/197
# snakefmt = "^0.8.4"

[tool.coverage.report]
fail_under = 85.0
4 changes: 3 additions & 1 deletion scripts/make_huntress.py
@@ -4,6 +4,8 @@
Make a TreeNode tree given a mutation matrix
to generate a huntress tree.
Note: the huntress tree inference runs with 4 threads.
Example Usage:
poetry run python ../scripts/make_huntress.py
@@ -134,7 +136,7 @@ def main() -> None:
mut_mat = cell_simulation_data["noisy_mutation_mat"]

# run huntress tree inference
tree_n = huntress_tree_inference(mut_mat, args.fpr, args.fnr, n_threads=2)
tree_n = huntress_tree_inference(mut_mat, args.fpr, args.fnr, n_threads=4)
tree_tn = TreeNode(name=tree_n.name, parent=None, children=tree_n.children)

# Save the tree - make path
2 changes: 1 addition & 1 deletion scripts/run_mcmc.py
@@ -138,7 +138,7 @@ def run_chain(

init_tree_node = serialize.read_tree_node(params.init_tree_fp)
# convert TreeNode to Tree
init_tree = tree_inf.tree_from_tree_node(init_tree_node)
init_tree = tree_inf.Tree.tree_from_tree_node(init_tree_node)
logging.info("Loaded tree (TreeNode) from file.")

# Make Move Probabilities
9 changes: 6 additions & 3 deletions src/pyggdrasil/tree_inference/__init__.py
@@ -14,6 +14,7 @@
TreeAdjacencyMatrix,
AncestorMatrix,
CellAttachmentVector,
MoveProbabilities,
)

from pyggdrasil.tree_inference._tree_generator import (
@@ -22,7 +23,7 @@
generate_random_TreeNode,
)

from pyggdrasil.tree_inference._tree import Tree, tree_from_tree_node, get_descendants
from pyggdrasil.tree_inference._tree import Tree, get_descendants

from pyggdrasil.tree_inference._simulate import (
CellAttachmentStrategy,
@@ -52,7 +53,9 @@

from pyggdrasil.tree_inference._huntress import huntress_tree_inference

from pyggdrasil.tree_inference._mcmc_sampler import mcmc_sampler, MoveProbabilities
from pyggdrasil.tree_inference._mcmc_sampler import mcmc_sampler

from pyggdrasil.tree_inference._tree_mcmc import evolve_tree_mcmc


__all__ = [
@@ -67,7 +70,6 @@
"MutationMatrix",
"Tree",
"MoveProbabilities",
"tree_from_tree_node",
"unpack_sample",
"gen_sim_data",
"huntress_tree_inference",
@@ -93,4 +95,5 @@
"MoveProbConfigOptions",
"McmcConfigOptions",
"ErrorCombinations",
"evolve_tree_mcmc",
]
120 changes: 108 additions & 12 deletions src/pyggdrasil/tree_inference/_file_id.py
@@ -1,6 +1,8 @@
"""Provides classes for uniquely naming Tree,
Cell Simulation and MCMC run files."""

import re

from enum import Enum
from typing import Union, Optional

@@ -16,12 +18,14 @@ class TreeType(Enum):
- STAR (star tree)
- DEEP (deep tree)
- HUNTRESS (Huntress tree) - inferred from real / cell simulation data
- MCMC (tree generated by evolving an initial tree with MCMC moves)
"""

RANDOM = "r"
STAR = "s"
DEEP = "d"
HUNTRESS = "h"
MCMC = "m"


class MutationDataId:
@@ -111,31 +115,124 @@ def from_str(cls, str_id: str):
# split string by underscore and assign to attributes
split_elements = str_id.split("_")
seed = None
mutation_data = None
rest_id = None
if len(split_elements) == 3:
_, tree_type, n_nodes = split_elements
elif len(split_elements) == 4:
_, tree_type, n_nodes, seed = split_elements
elif len(split_elements) == 5:
_, tree_type, n_nodes, seed, mutation_data = split_elements
elif len(split_elements) >= 5:
_, tree_type, n_nodes, *rest = split_elements
rest_id = "_".join(rest)
else:
raise AssertionError("Tree id has invalid format")

if seed is not None:
tree_id = TreeId(TreeType(tree_type), int(n_nodes), int(seed))
return tree_id
else:
if mutation_data is not None:
try:
mutation_data = CellSimulationId.from_str(mutation_data)
except AssertionError:
mutation_data = MutationDataId(mutation_data)

tree_id = TreeId(TreeType(tree_type), int(n_nodes), None, mutation_data)
if rest_id is not None:
# check if tree is MCMC tree
if tree_type == TreeType.MCMC.value:
try:
tree_id = McmcTreeId.from_str(str_id)
return tree_id
except AssertionError:
raise AssertionError(
"Tree id has invalid format for an MCMC tree"
)

# check if tree is Huntress tree
elif tree_type == TreeType.HUNTRESS.value:
try:
mutation_data = CellSimulationId.from_str(rest_id)
except AssertionError:
mutation_data = MutationDataId(rest_id)

tree_id = TreeId(
TreeType(tree_type), int(n_nodes), None, mutation_data
)
return tree_id
else:
tree_id = TreeId(TreeType(tree_type), int(n_nodes))
return tree_id
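
The split-and-rejoin step in `TreeId.from_str` can be sketched in isolation. This is a minimal sketch, assuming a hypothetical helper `split_tree_id` that is not part of PYggdrasil; it only mirrors how the first three fields are peeled off and the remainder is rejoined with underscores, because a nested id contains underscores itself.

```python
from typing import Optional


def split_tree_id(str_id: str) -> tuple[str, int, Optional[str]]:
    """Split an id into tree type, node count and the rejoined remainder.

    Hypothetical helper for illustration only; not part of PYggdrasil.
    """
    elements = str_id.split("_")
    if len(elements) < 3:
        raise AssertionError("Tree id has invalid format")
    # The first element is the "T" prefix. Everything after n_nodes is kept
    # as a single string so that a nested id survives the split intact.
    _, tree_type, n_nodes, *rest = elements
    rest_id = "_".join(rest) if rest else None
    return tree_type, int(n_nodes), rest_id


print(split_tree_id("T_r_10_34"))
print(split_tree_id("T_m_10_500_42_oT_r_10_34"))
```

In this sketch a plain seed and a nested id both end up in the third field; deciding which one it is (by looking at the tree type) is left to the caller, as in the diff.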


class McmcTreeId(TreeId):
"""Class for ids of trees evolved by MCMC moves under SCITE.
The MCMC move probabilities are not encoded in the id, so the id
is not unique: a tree is fully reproducible only together with
the MCMC config. Default values are assumed for the MCMC config.
"""

tree_type: TreeType
n_moves: int
n_nodes: int
mcmc_rng_seed: int
initial_tree_id: TreeId

def __init__(
self,
n_moves: int,
n_nodes: int,
mcmc_rng_seed: int,
initial_tree_id: TreeId,
tree_type: TreeType = TreeType.MCMC,
):
self.initial_tree_id = initial_tree_id
self.n_nodes = n_nodes
self.n_moves = n_moves
self.mcmc_rng_seed = mcmc_rng_seed
self.tree_type = tree_type
super().__init__(TreeType.MCMC, n_nodes)

self.id = self._create_id()

def _create_id(self) -> str:
"""Creates the id string for the tree
by concatenating the values of the attributes."""

str_rep = "T"
str_rep = str_rep + "_" + str(self.tree_type.value)
str_rep = str_rep + "_" + str(self.n_nodes)
str_rep = str_rep + "_" + str(self.n_moves)
str_rep = str_rep + "_" + str(self.mcmc_rng_seed)
str_rep = str_rep + "_o" + str(self.initial_tree_id)

return str_rep

def __str__(self) -> str:
return self.id

@classmethod
def from_str(cls, str_id: str):
"""Creates a tree id from a string representation of the id.
Args:
str_id: str
"""

# Define the regular expression pattern to match the variables
pattern = r"T_m_(\d+)_(\d+)_(\d+)_o(T_[a-zA-Z]_\d+_\d+)"

# Use re.findall() to extract the matched variables
matches = re.findall(pattern, str_id)

# The 'matches' variable now contains the extracted variables.
# Let's unpack the matches to get individual variable values.
if matches:
n_nodes, n_moves, mcmc_move_seed, initial_tree_id = matches[0]

tree_id = McmcTreeId(
int(n_moves),
int(n_nodes),
int(mcmc_move_seed),
TreeId.from_str(initial_tree_id),
)

return tree_id
else:
raise AssertionError("MCMC tree id has invalid format")
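
The id layout produced by `_create_id` and consumed by `from_str` can be checked with a small round trip. The sketch below is standalone and assumes two hypothetical helpers, `make_mcmc_id` and `parse_mcmc_id`, which are not part of PYggdrasil; only the regular expression is copied from the diff.

```python
import re

# Pattern copied from McmcTreeId.from_str in the diff.
PATTERN = r"T_m_(\d+)_(\d+)_(\d+)_o(T_[a-zA-Z]_\d+_\d+)"


def make_mcmc_id(n_nodes: int, n_moves: int, seed: int, initial_tree_id: str) -> str:
    """Build the id string in the same field order as _create_id (hypothetical helper)."""
    return f"T_m_{n_nodes}_{n_moves}_{seed}_o{initial_tree_id}"


def parse_mcmc_id(str_id: str) -> tuple[int, int, int, str]:
    """Extract (n_nodes, n_moves, seed, initial_tree_id) (hypothetical helper)."""
    matches = re.findall(PATTERN, str_id)
    if not matches:
        raise AssertionError("MCMC tree id has invalid format")
    n_nodes, n_moves, seed, initial_tree_id = matches[0]
    return int(n_nodes), int(n_moves), int(seed), initial_tree_id


str_id = make_mcmc_id(10, 500, 42, "T_r_10_34")
print(str_id)
print(parse_mcmc_id(str_id))
```

One consequence of the pattern as written: the initial tree id must have the form `T_<letter>_<digits>_<digits>`, so an initial tree id without a seed, such as `T_s_10`, does not match and raises the format error.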


class CellSimulationId(MutationDataId):
@@ -243,10 +340,9 @@ def from_str(cls, str_id: str):
# create tree id
tree_id = TreeId.from_str(tree_id)

# TODO: remove type ignore once PR #64 is merged
return cls(
seed,
tree_id, # type: ignore
tree_id,
n_cells,
fpr,
fnr,
13 changes: 13 additions & 0 deletions src/pyggdrasil/tree_inference/_interface.py
@@ -5,6 +5,7 @@
so we do not introduce circular imports.
"""
from typing import Union
import dataclasses

import jax
import numpy as np
@@ -50,3 +51,15 @@
# Observational Error rates
# tuple of (fpr, fnr)
ErrorRates = tuple[float, float]


@dataclasses.dataclass
class MoveProbabilities:
"""Move probabilities. The default values were taken from
the paragraph **Combining the three MCMC moves** of page 14
of the SCITE paper supplement.
"""

prune_and_reattach: float = 0.1
swap_node_labels: float = 0.65
swap_subtrees: float = 0.25
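
The three defaults are meant to be used as a probability distribution over moves. As a minimal sketch, the snippet below redefines the dataclass standalone, checks that the defaults sum to one, and draws a move name with the standard-library `random` module. The helper `pick_move` is hypothetical, and the use of `random` is an assumption made for illustration; it is not how PYggdrasil samples moves.

```python
import dataclasses
import random


@dataclasses.dataclass
class MoveProbabilities:
    """Standalone copy of the dataclass from the diff, for illustration."""

    prune_and_reattach: float = 0.1
    swap_node_labels: float = 0.65
    swap_subtrees: float = 0.25


def pick_move(probs: MoveProbabilities, rng: random.Random) -> str:
    """Hypothetical helper: draw one move name according to its probability."""
    fields = dataclasses.asdict(probs)
    names = list(fields.keys())
    weights = list(fields.values())
    return rng.choices(names, weights=weights, k=1)[0]


probs = MoveProbabilities()
total = sum(dataclasses.asdict(probs).values())
# The defaults should form a valid probability distribution.
assert abs(total - 1.0) < 1e-9
print(pick_move(probs, random.Random(0)))
```

A caller that overrides one field would need to adjust the others so that the total stays at one; the dataclass itself does not enforce this.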