DOC: Add documentation for model_finder #138

Merged: 7 commits, Dec 13, 2024
2 changes: 2 additions & 0 deletions docs/api/index.md
@@ -14,7 +14,9 @@
| Name | Summary |
|------|---------|
| [model_finder](model/model_finder.md) | Determine the best-fit model for your data. |
| [ModelFinderResult](model/ModelFinderResult.md) | Collection of data returned by IQ-TREE's ModelFinder. |
| [Model](model/Model.md) | Class for substitution models. |
| [make_model](model/make_model.md) | Function to construct Model classes from IQ-TREE strings. |
| [SubstitutionModel](model/SubstitutionModel.md) | Enums for substitution models. |
| [FreqType](model/FreqType.md) | Enum for base frequencies. |
| [RateModel](model/RateModel.md) | Classes for rate heterogeneity. |
7 changes: 7 additions & 0 deletions docs/api/model/ModelFinderResult.md
@@ -0,0 +1,7 @@
# ModelFinderResult

::: piqtree.ModelFinderResult

## Usage

For usage, see ["Find the model of best fit with ModelFinder"](../../quickstart/using_model_finder.md).
7 changes: 7 additions & 0 deletions docs/api/model/make_model.md
@@ -0,0 +1,7 @@
# make_model

::: piqtree.make_model

## Usage

For usage, see ["Use different kinds of substitution models"](../../quickstart/using_substitution_models.md).
4 changes: 4 additions & 0 deletions docs/api/model/model_finder.md
@@ -1,3 +1,7 @@
# model_finder

::: piqtree.model_finder

## Usage

For usage, see ["Find the model of best fit with ModelFinder"](../../quickstart/using_model_finder.md).
1 change: 1 addition & 0 deletions docs/quickstart/construct_ml_tree.md
@@ -72,4 +72,5 @@ tree = build_tree(aln, model, num_threads=4)
## See also

- For how to specify a `Model`, see ["Use different kinds of substitution models"](using_substitution_models.md).
- For selecting the best `Model`, see ["Find the model of best fit with ModelFinder"](using_model_finder.md).
- For fitting branch lengths to a tree topology see ["Fit branch lengths to a tree topology from an alignment"](fit_tree_topology.md).
85 changes: 84 additions & 1 deletion docs/quickstart/using_model_finder.md
@@ -1,3 +1,86 @@
# Find the model of best fit with ModelFinder

⚠️ This page is under construction ⚠️
IQ-TREE's ModelFinder can be used to automatically find the model of best fit for an alignment using [`model_finder`](../api/model/model_finder.md).
The best-scoring model under the *Akaike information criterion* (AIC), the *corrected Akaike information criterion* (AICc), or the *Bayesian information criterion* (BIC) can be selected from the result.
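
As a rough illustration of what these criteria measure, the sketch below computes all three from a fitted model's log-likelihood, its number of free parameters, and a sample size. The function name and the numbers are hypothetical, and treating the number of alignment sites as the sample size is an assumption; the formulas are the standard textbook definitions, not code taken from piqtree or IQ-TREE. Lower scores indicate a better trade-off between fit and model complexity.

```python
import math


def information_criteria(
    log_likelihood: float, num_params: int, num_sites: int
) -> dict[str, float]:
    """Return AIC, AICc and BIC for one fitted model (lower is better)."""
    k, n = num_params, num_sites
    # AIC penalises each free parameter by 2.
    aic = 2 * k - 2 * log_likelihood
    # AICc adds a small-sample correction; it requires n > k + 1.
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    # BIC penalises each free parameter by ln(n).
    bic = k * math.log(n) - 2 * log_likelihood
    return {"aic": aic, "aicc": aicc, "bic": bic}


# Hypothetical values, for illustration only.
scores = information_criteria(log_likelihood=-1000.0, num_params=10, num_sites=500)
```

Because BIC's per-parameter penalty grows with the sample size, it tends to favour simpler models than AIC on long alignments, which is one reason the three criteria can disagree.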

## Usage

### Basic Usage

Construct a `cogent3` alignment object, then pass it to `model_finder`. The result holds the best model under each criterion.

```python
from cogent3 import load_aligned_seqs
from piqtree import model_finder

aln = load_aligned_seqs("my_alignment.fasta", moltype="dna")

result = model_finder(aln)

best_aic_model = result.best_aic
best_aicc_model = result.best_aicc
best_bic_model = result.best_bic
```
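
If the criterion is chosen at run time (for example from a configuration value), a small helper can look up the matching attribute. The sketch below is hypothetical: `pick_best` is not part of piqtree, and it relies only on the `best_aic`, `best_aicc` and `best_bic` attribute names shown above. A stand-in object with made-up model strings is used so the example runs without an alignment.

```python
from types import SimpleNamespace


def pick_best(result, criterion: str = "bic"):
    """Return the best model under the named criterion ("aic", "aicc" or "bic")."""
    allowed = ("aic", "aicc", "bic")
    key = criterion.lower()
    if key not in allowed:
        raise ValueError(f"criterion must be one of {allowed}, got {criterion!r}")
    # Maps e.g. "aicc" to the attribute result.best_aicc.
    return getattr(result, f"best_{key}")


# Stand-in for a ModelFinderResult; the model strings are invented.
fake_result = SimpleNamespace(
    best_aic="GTR+F+R3", best_aicc="GTR+F+G4", best_bic="HKY+F+G4"
)
chosen = pick_best(fake_result, "AICc")
```

With a real result, the returned `Model` could then be used wherever a `Model` is expected, such as the `model` argument of `build_tree`.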

### Specifying the Search Space

We expose the `mset`, `mfreq` and `mrate` parameters from IQ-TREE's ModelFinder to restrict the substitution model, base frequency, and rate heterogeneity search spaces. Each is given as a set of strings through `model_set`, `freq_set` and `rate_set` respectively.

```python
from cogent3 import load_aligned_seqs
from piqtree import model_finder

aln = load_aligned_seqs("my_alignment.fasta", moltype="dna")

result = model_finder(aln, model_set={"HKY", "TIM"})

best_aic_model = result.best_aic
best_aicc_model = result.best_aicc
best_bic_model = result.best_bic
```
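
To get a feel for how the three sets interact, the sketch below enumerates every combination of one model, one frequency type and one rate type. It is an illustration of search-space size only: `candidate_models` is a hypothetical helper, and the exact list of candidates that IQ-TREE evaluates (for example, combined options such as invariant sites plus gamma) may differ.

```python
from itertools import product


def candidate_models(model_set, freq_set, rate_set) -> list[str]:
    """Combine each model, frequency type and rate type into one model string."""
    return [
        "+".join(combo)
        for combo in product(sorted(model_set), sorted(freq_set), sorted(rate_set))
    ]


# Two options in each set give 2 * 2 * 2 = 8 combinations.
candidates = candidate_models({"HKY", "TIM"}, {"F", "FQ"}, {"G", "I"})
```

The count grows multiplicatively with the size of each set, so narrowing even one of them can noticeably shorten the search.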

### Reproducible Results

For reproducible results, a random seed may be specified.
> **Caution:** Passing `0` or `None` is equivalent to not specifying a random seed.

```python
from cogent3 import load_aligned_seqs
from piqtree import model_finder

aln = load_aligned_seqs("my_alignment.fasta", moltype="dna")

result = model_finder(aln, rand_seed=5)

best_aic_model = result.best_aic
best_aicc_model = result.best_aicc
best_bic_model = result.best_bic
```

### Multithreading

To speed up computation, the number of threads to be used may be specified.
By default, the computation is done on a single thread. If 0 is specified,
then IQ-TREE attempts to determine the optimal number of threads.

> **Caution:** If 0 is specified with small datasets, the time to determine the
> optimal number of threads may exceed the time to find the best-fit model.

```python
from cogent3 import load_aligned_seqs
from piqtree import model_finder

aln = load_aligned_seqs("my_alignment.fasta", moltype="dna")

result = model_finder(aln, num_threads=4)

best_aic_model = result.best_aic
best_aicc_model = result.best_aicc
best_bic_model = result.best_bic
```

## See also

- For constructing a maximum likelihood tree, see ["Construct a maximum likelihood phylogenetic tree"](construct_ml_tree.md).
- For how to specify a `Model`, see ["Use different kinds of substitution models"](using_substitution_models.md).
10 changes: 10 additions & 0 deletions docs/quickstart/using_substitution_models.md
@@ -107,6 +107,16 @@ sym_discrete_gamma_4 = Model("SYM", rate_model=FreeRateModel())
sym_invar_discrete_gamma_8 = Model("SYM", rate_model=FreeRateModel(8), invariant_sites=True)
```

### Making Model Classes from IQ-TREE Strings

For the supported model types, a `Model` instance can be created by calling [`make_model`](../api/model/make_model.md) with the IQ-TREE string representation of the model.

```python
from piqtree import make_model

model = make_model("GTR+FQ+I+R3")
```
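
To show how such a string breaks down, the sketch below splits it on `+` and labels each part. This is a simplified, hypothetical parser for illustration, not piqtree's `make_model` implementation; it assumes the first component is the substitution model and classifies the rest by their leading letter.

```python
def split_model_string(iqtree_str: str) -> dict:
    """Split an IQ-TREE model string such as "GTR+FQ+I+R3" into labelled parts."""
    name, *extras = iqtree_str.split("+")
    parts = {"model": name, "freq": None, "invariant_sites": False, "rate": None}
    for extra in extras:
        if extra.startswith("F"):
            parts["freq"] = extra  # base frequency option, e.g. F, FO, FQ
        elif extra.startswith("I"):
            parts["invariant_sites"] = True  # proportion of invariant sites
        elif extra.startswith(("G", "R")):
            parts["rate"] = extra  # discrete gamma (G) or FreeRate (R)
        else:
            raise ValueError(f"unrecognised component {extra!r} in {iqtree_str!r}")
    return parts


parts = split_model_string("GTR+FQ+I+R3")
```

Here `parts` labels `GTR` as the substitution model, `FQ` as the frequency type, `I` as invariant sites, and `R3` as the rate heterogeneity component.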

## See also

- Use a [`Model`](../api/model/Model.md) to construct a maximum likelihood tree: ["Construct a maximum likelihood phylogenetic tree"](construct_ml_tree.md).
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -61,7 +61,9 @@ nav:
- api/tree/random_trees.md
- Substitution Models:
- api/model/model_finder.md
- api/model/ModelFinderResult.md
- api/model/Model.md
- api/model/make_model.md
- api/model/SubstitutionModel.md
- api/model/FreqType.md
- api/model/RateModel.md
2 changes: 2 additions & 0 deletions src/piqtree/__init__.py
@@ -4,6 +4,7 @@

from piqtree._data import dataset_names, download_dataset
from piqtree.iqtree import (
    ModelFinderResult,
    TreeGenMode,
    build_tree,
    fit_tree,
@@ -25,6 +26,7 @@

__all__ = [
    "Model",
    "ModelFinderResult",
    "TreeGenMode",
    "__iqtree_version__",
    "available_freq_type",
44 changes: 44 additions & 0 deletions src/piqtree/iqtree/_model_finder.py
@@ -46,6 +46,24 @@ def from_string(cls, val: str) -> "ModelResultValue":

@dataclasses.dataclass(slots=True)
class ModelFinderResult:
    """Data returned by ModelFinder.

    Attributes
    ----------
    source: str
        Source of the alignment.
    raw_data: dict[str, Any]
        Raw data returned by ModelFinder.
    best_aic: Model
        The best AIC model.
    best_aicc: Model
        The best AICc model.
    best_bic: Model
        The best BIC model.
    model_stats:
        Semi-processed representation of raw_data.
    """

    source: str
    raw_data: dataclasses.InitVar[dict[str, Any]]
    best_aic: Model = dataclasses.field(init=False)
@@ -104,6 +122,32 @@ def model_finder(
    rand_seed: int | None = None,
    num_threads: int | None = None,
) -> ModelFinderResult | c3_types.SerialisableType:
    """Find the models of best fit for an alignment using ModelFinder.

    Parameters
    ----------
    aln : c3_types.AlignedSeqsType
        The alignment to find the model of best fit for.
    model_set : Iterable[str] | None, optional
        Search space for models.
        Equivalent to IQ-TREE's mset parameter, by default None
    freq_set : Iterable[str] | None, optional
        Search space for frequency types.
        Equivalent to IQ-TREE's mfreq parameter, by default None
    rate_set : Iterable[str] | None, optional
        Search space for rate heterogeneity types.
        Equivalent to IQ-TREE's mrate parameter, by default None
    rand_seed : int | None, optional
        The random seed - 0 or None means no seed, by default None.
    num_threads: int | None, optional
        Number of threads for IQ-TREE 2 to use, by default None (single-threaded).
        If 0 is specified, IQ-TREE attempts to find the optimal number of threads.

    Returns
    -------
    ModelFinderResult | c3_types.SerialisableType
        Collection of data returned from IQ-TREE's ModelFinder.
    """
    source = aln.info.source
    if rand_seed is None:
        rand_seed = 0  # The default rand_seed in IQ-TREE
12 changes: 12 additions & 0 deletions src/piqtree/model/_model.py
@@ -92,6 +92,18 @@ def invariant_sites(self) -> bool:


def make_model(iqtree_str: str) -> Model:
    """Convert an IQ-TREE model specification into a Model class.

    Parameters
    ----------
    iqtree_str : str
        The IQ-TREE model string.

    Returns
    -------
    Model
        The equivalent Model class.
    """
    if "+" not in iqtree_str:
        return Model(iqtree_str)
