- Mokapot now uses `numpy.random.Generator` instead of the deprecated `numpy.random.RandomState` API. New `rng` arguments have been added to functions and classes that rely on randomness in lieu of setting a global random seed with `np.random.seed()`. Thanks @sjust-seerbio! (#55)
- Added linting with Ruff to tests and pre-commit hooks (along with others)!
- The PepXML reader, which broke due to a Pandas update.
- Potential bug if lowercase peptide sequences were used and protein-level confidence estimates were enabled
- Multiprocessing led to the same training set being used for all splits (#104).
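
The `rng` arguments noted above follow a common NumPy pattern, which the sketch below illustrates. The `shuffle_indices` helper is hypothetical and not part of mokapot; it only assumes that an `rng` argument may be `None`, an integer seed, or an existing `numpy.random.Generator`, and normalizes it with `numpy.random.default_rng()` rather than relying on `np.random.seed()`.

```python
import numpy as np


def shuffle_indices(n_items, rng=None):
    """Return a random permutation of range(n_items).

    Hypothetical helper, not mokapot's API. `rng` may be None, an
    integer seed, or an existing numpy.random.Generator.
    """
    # default_rng() returns an existing Generator unchanged and builds
    # a new one from a seed (or from fresh entropy when rng is None).
    rng = np.random.default_rng(rng)
    return rng.permutation(n_items)


# The same seed gives the same permutation, with no global state involved.
first = shuffle_indices(10, rng=42)
second = shuffle_indices(10, rng=42)
print((first == second).all())
```

Passing a generator or seed explicitly keeps randomness local to each call, so results do not depend on whatever other code has done to NumPy's global random state.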
- Cross-validation classes are now detected by looking for inheritance from the `sklearn.model_selection._search.BaseSearchCV` class.
- Fixed backward compatibility issue for Python <3.10.
- Support for plugins, allowing mokapot to use new models.
- Added a custom Docker image with optional dependencies.
- Confidence objects are now picklable.
- Updated GitHub Actions.
- Migrated to a full pyproject.toml setuptools build. Thanks @jspaezp!
- Fixed the reported mokapot score when group FDR is used.
- `mokapot.Model()` objects now record the CV fold that they were fit on. This means that they can be provided to `mokapot.brew()` in any order and still maintain proper cross-validation bins.
- Resolved issue where models were required to have an intercept term.
- The PepXML parser would sometimes try to log-transform features containing `0`s, resulting in missing values.
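
To show why log-transforming zeros is a problem, here is a minimal sketch of one way to guard against it. The `safe_log10` helper is hypothetical and is not how mokapot's PepXML parser is implemented; it assumes a feature should be log-transformed only when every value is strictly positive.

```python
import numpy as np


def safe_log10(values):
    """Log-transform a feature only if every value is strictly positive.

    Hypothetical helper, not mokapot's implementation. Features that
    contain zeros or negative values are returned unchanged, so no
    -inf or NaN entries are introduced.
    """
    values = np.asarray(values, dtype=float)
    if (values > 0).all():
        return np.log10(values)
    return values


print(safe_log10([1.0, 10.0, 100.0]))  # log10 is applied
print(safe_log10([0.0, 10.0, 100.0]))  # returned unchanged
```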
- Support for previously trained models in the `brew()` function and the CLI using the `--load_models` argument. Thanks @sambenfredj!
- Using `clip_nterm_methionine=True` could result in peptides of length `min_length-1`.
- Links to example datasets in the documentation.
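
The `clip_nterm_methionine` fix above concerns an off-by-one in peptide length. The sketch below shows the general idea of applying the length filter after clipping. The `candidate_peptides` function is hypothetical and not mokapot's digestion code; it assumes `peptides` is the ordered list of peptides from a single protein.

```python
def candidate_peptides(peptides, min_length=6, clip_nterm_methionine=False):
    """Filter digested peptides by length, optionally clipping N-terminal Met.

    Hypothetical sketch, not mokapot's implementation. `peptides` is the
    ordered list of peptides from one protein, so only the first entry
    can carry the protein's N-terminal methionine.
    """
    candidates = list(peptides)
    if clip_nterm_methionine and peptides and peptides[0].startswith("M"):
        # Add the clipped form before filtering so it is length-checked too.
        candidates.append(peptides[0][1:])
    return [pep for pep in candidates if len(pep) >= min_length]


# "MPEPTK" has length 6; its clipped form "PEPTK" has length 5 and is dropped.
print(candidate_peptides(["MPEPTK", "ELVISK"], clip_nterm_methionine=True))
```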
Thanks to @sambenfredj, @gessulat, @tkschmidt, and @MatthewThe for PR #44, which made these things happen!
- A new command line argument, `--max_workers`. This allows the cross-validation folds to be computed in parallel.
- The `PercolatorModel` class now has an `n_jobs` parameter, which controls parallelization of the grid search.
- Improved speed by using multiple jobs for grid search by default.
- Parallelization within `mokapot.brew()` now uses `joblib` instead of `concurrent.futures`.
- Improved documentation and added warnings for `--subset_max_train`. Thanks @jspaezp!
- Fixed bug where the `--keep_decoys` option did not work with `--aggregate`. Also, added tests to cover this. Thanks @jspaezp!
- `--keep_decoys` option to the command line interface. Thanks @jspaezp!
- Notes about setting a random seed to the Python API documentation. (Issue #30)
- Added more information about peptides that couldn't be mapped to proteins. (Issue #29)
- Loading a saved model with `mokapot.load_model()` would fail because of an update to Pandas that introduced a new exception. We've updated mokapot accordingly.
- Updates to unit tests. Warnings are now treated as errors for system tests.
- Updated the build to align with PEP 517.
- Support for downstream peptide and protein quantitation with FlashLFQ. This is accomplished through the `mokapot.to_flashlfq()` function or the `to_flashlfq()` method of `LinearConfidence` objects. Note that to support the FlashLFQ format, you'll need to specify additional columns in `read_pin()` or use a PepXML input file (`read_pepxml()`).
- Added a top-level function for exporting confident PSMs, peptides, and proteins from one or more `LinearConfidence` objects as a tab-delimited file: `mokapot.to_txt()`.
- Added a top-level function for reading FASTA files for protein-level confidence estimates: `mokapot.read_fasta()`.
- Tests accompanying the support for the features above.
- Added a "mokapot cookbook" to the documentation with helpful code snippets.
- Corresponding with support for new formats, the `mokapot.read_pin()` function and the `LinearPsmDataset` constructor now have many new optional parameters. These specify the columns containing the metadata needed to write the added formats.
- Starting mokapot should be slightly faster for Python >= 3.8. We were able to eliminate the runtime call to setuptools, because of the recent addition of `importlib.metadata` to the standard library, saving a few hundred milliseconds.
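
As a rough illustration of the `importlib.metadata` approach mentioned above, the sketch below looks up an installed package's version without importing setuptools. The `get_version` helper is hypothetical; how mokapot itself sets its version string is not shown here.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(package):
    """Return the installed version of `package`, or None if not installed."""
    try:
        # Reads the installed distribution's metadata; setuptools
        # (pkg_resources) does not need to be imported at runtime.
        return version(package)
    except PackageNotFoundError:
        return None


print(get_version("mokapot"))  # a version string, or None if not installed
```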
- Now checks to verify there are no debugging print statements in the code base when linting.
- Removed debugging print statements.
- Parsing Percolator tab-delimited files with a "DefaultDirection" line.
- The `Label` column is now converted to boolean during PIN file parsing. Previously, problems occurred if the `Label` column was of dtype `object`.
- Parsing modifications from pepXML files were indexed incorrectly on the peptide string.
- Support for parsing PSMs from PepXML input files.
- This changelog.
- Parsing a FASTA file previously failed if an entry was not followed by a sequence. Now, missing sequences are tolerated and a warning is given instead.
- When the learned model was worse than the best feature and the lower scores were better for the best feature, assigning confidence would fail.
- Easy access to grouped confidence estimates in the Python API was not working due to a typo.
- Deprecation warnings from Pandas about the `regex` argument.
- Sometimes peptides were incorrectly removed as shared when part of a protein group.
- Refactored and added many new unit and system tests.
- New pull requests must now improve or maintain test coverage.
- Improved error messages.
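
The FASTA fix listed earlier (missing sequences are tolerated and a warning is given) can be sketched as follows. The `read_fasta_entries` function is hypothetical and is not mokapot's parser; it assumes the input is an iterable of text lines and that entries without a sequence should be skipped.

```python
import warnings


def read_fasta_entries(lines):
    """Parse FASTA lines into a {header: sequence} dict.

    Hypothetical sketch, not mokapot's parser. Entries with no
    sequence are skipped with a warning instead of raising an error.
    """
    entries = {}
    header, chunks = None, []
    # A trailing sentinel header flushes the final entry.
    for line in list(lines) + [">"]:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                sequence = "".join(chunks)
                if sequence:
                    entries[header] = sequence
                else:
                    warnings.warn(f"No sequence found for entry '{header}'; skipping.")
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    return entries


# The second entry has no sequence, so it triggers a warning and is skipped.
example = [">protein_a", "MPEPTIDEK", ">protein_b", ">protein_c", "ELVISK"]
print(read_fasta_entries(example))
```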