From a184800b9ce51bff5279b26114aa70eb7eed0cac Mon Sep 17 00:00:00 2001
From: Provost Simon
Date: Wed, 15 Jan 2025 15:11:15 -0500
Subject: [PATCH] refactor(docs): visualisation-based updates [cd build]

---
 README.md            | 232 +++++++++++++------------------------------
 docs/contribution.md |   6 +-
 docs/quick-start.md  |  72 ++++++++++----
 pyproject.toml       |   4 +
 4 files changed, 132 insertions(+), 182 deletions(-)

diff --git a/README.md b/README.md
index 21cf71a9..a9c60737 100644
--- a/README.md
+++ b/README.md
@@ -3,201 +3,106 @@


- Scikit-longitudinal + + Scikit-longitudinal +
Scikit-longitudinal

A specialised Python library for longitudinal data analysis built on Scikit-learn

- - - - - - - - -
-

⚙️ Project Status

-
-

☎️ Contacts

-
- - - -
- - - - - - - - -
- - pdm - - - - pytest -
- - Codecov - -
- - flake8 -
- - pylint -
- - pre-commit - -
- - isort -
- - black -
- - autopep8 - -
-
- - - - -
- - Microsoft Outlook -
- - LinkedIn -
- - Stack Overflow -
- - Google Scholar - -
-
-
-
-
-> 🌟 **Exciting Update**: We're delighted to introduce the brand new v0.1 documentation for Scikit-longitudinal! For a
-> deep dive into the library's capabilities and features,
-> please [visit here](https://simonprovost.github.io/scikit-longitudinal/).
-
-> 🎉 **PyPi is available!**: We published Scikit-Longitudinal, [here](https://pypi.org/project/Scikit-longitudinal/)!
-
-## 💡 About The Project
-
-`Scikit-longitudinal` (Sklong) is a machine learning library designed to analyse
-longitudinal data (Classification tasks focussed as of today). It offers tools and models for processing, analysing,
-and predicting longitudinal data, with a user-friendly interface that
-integrates with the `Scikit-learn` ecosystem.
-
-Please for further information, visit the [official documentation](https://simonprovost.github.io/scikit-longitudinal/).
-
-## 🛠️ Installation
-
-To install `Sklong`, take these two easy steps:
-
-1. ✅ **Install the latest version of `Sklong`**:
-
-```shell
-pip install Scikit-longitudinal
-```
-You could also install different versions of the library by specifying the version number,
-e.g. `pip install Scikit-longitudinal==0.0.1`.
-Refer to [Release Notes](https://github.com/simonprovost/scikit-longitudinal/releases)
-
-2. 📦 **[MANDATORY] Update the required dependencies (Why? See [here](https://github.com/pdm-project/pdm/issues/1316#issuecomment-2106457708))**
-
-`Scikit-longitudinal` incorporates a modified version of `Scikit-Learn` called `Scikit-Lexicographical-Trees`,
-which can be found at [this Pypi link](https://pypi.org/project/scikit-lexicographical-trees/).
+
-This revised version guarantees compatibility with the unique features of `Scikit-longitudinal`.
-Nevertheless, conflicts may occur with other dependencies in `Scikit-longitudinal` that also require `Scikit-Learn`.
-Follow these steps to prevent any issues when running your project.
+
+
+ PyPI Version
+
+
+ pytest
+
+
+ Codecov
+
+
+ pylint
+
+
+ pre-commit
+
+
+ black
+
+
+ Ruff
+
+
+ UV Managed
+
+
+[simonprovostdev.vercel.app](https://simonprovostdev.vercel.app/)
-
-🫡 Simple Setup: Command Line Installation +
-Say you want to try `Sklong` in a very simple environment. Such as without a proper `project.toml` file (`Poetry`, `PDM`, etc).
-Run the following command:
+---
-```shell
-pip uninstall scikit-learn scikit-lexicographical-trees && pip install scikit-lexicographical-trees
-```
+# 📰 Latest News
-*Note: Although the main installation command install both, yet it’s advisable to verify the correct versions used is
-`Scikit-Lexicographical-trees` to prevent conflicts.*
-
+- **Updated Workflow**: Now leveraging [UV](https://docs.astral.sh/uv/) for enhanced project management and dependency resolution.
+- **Documentation**: Dive into Scikit-longitudinal's features and capabilities in our [official documentation](https://simonprovost.github.io/scikit-longitudinal/).
+- **PyPI Availability**: The library is available on [PyPI](https://pypi.org/project/Scikit-longitudinal/).
-
-🫡 Project Setup: Using `PDM` (or any other such as `Poetry`, etc.) +--- -Imagine you have a project being managed by `PDM`, or any other package manager. The example below demonstrates `PDM`. -Nevertheless, the process is similar for `Poetry` and others. Consult their documentation for instructions on excluding a -package. +## πŸ’‘ About The Project -Therefore, to prevent dependency conflicts, you can exclude `Scikit-Learn` by adding the provided configuration -to your `pyproject.toml` file. +`Scikit-longitudinal` (Sklong) is a machine learning library designed to analyse +longitudinal data (Classification tasks focussed as of today). It offers tools and models for processing, analysing, +and predicting longitudinal data, with a user-friendly interface that +integrates with the `Scikit-learn` ecosystem. -```toml -[tool.pdm.resolution] -excludes = ["scikit-learn"] -``` +For more details, visit the [official documentation](https://simonprovost.github.io/scikit-longitudinal/). -*This exclusion ensures Scikit-Lexicographical-Trees (used as `Scikit-learn`) is used seamlessly within your project.* -
+---
-### 💻 Developer Notes
+## 🛠️ Installation
-For developers looking to contribute, please refer to the `Contributing` section of the [official documentation](https://simonprovost.github.io/scikit-longitudinal/).
+To install Scikit-longitudinal:
-## 🛠️ Supported Operating Systems
+1. ✅ Install the latest version:
+   ```bash
+   pip install Scikit-longitudinal
+   ```
-`Scikit-longitudinal` is compatible with the following operating systems:
+   To install a specific version:
+   ```bash
+   pip install Scikit-longitudinal==0.1.0
+   ```
-- MacOS 
-- Linux 🐧
-- Windows via Docker only (Docker uses Linux containers) 🪟 (To try without but we haven't tested it)
+See further in the [Quick Start of the documentation](https://simonprovost.github.io/scikit-longitudinal/quick-start) for more details.
-## 🚀 Getting Started
+---
-To perform longitudinal analysis with `Scikit-Longitudinal`, use the
-`LongitudinalDataset` class to prepare the dataset. To analyse your
-data, use the `LexicoGradientBoostingClassifier` _(i.e. Gradient Boosting variant for Longitudinal Data)_ or another
-available
-estimator/preprocessor.
+## 🚀 Getting Started
-Following that, you can apply the popular _fit_, _predict_, _prodict_proba_, or _transform_
-methods in the same way that `Scikit-learn` does, as shown in the example below.
+Here's how to analyse longitudinal data with Scikit-longitudinal:
 ``` py
 from scikit_longitudinal.data_preparation import LongitudinalDataset
 from scikit_longitudinal.estimators.ensemble.lexicographical.lexico_gradient_boosting import LexicoGradientBoostingClassifier
-dataset = LongitudinalDataset('./stroke.csv')
+dataset = LongitudinalDataset('./stroke.csv') # Note this is a fictional dataset. Use yours!
 dataset.load_data_target_train_test_split(
     target_column="class_stroke_wave_4",
 )
 # Pre-set or manually set your temporal dependencies
-dataset.setup_features_group(input_data="Elsa")
+dataset.setup_features_group(input_data="elsa")
 model = LexicoGradientBoostingClassifier(
     features_group=dataset.feature_groups(),
-    threshold_gain=0.00015
+    threshold_gain=0.00015 # Refer to the API for more hyper-parameters and their meaning
 )
 model.fit(dataset.X_train, dataset.y_train)
@@ -207,11 +112,16 @@ y_pred = model.predict(dataset.X_test)
 print(classification_report(y_test, y_pred))
 ```
-## 📝 How to Cite?
+See further in the [Quick Start of the documentation](https://simonprovost.github.io/scikit-longitudinal/quick-start) for more details.
+
+---
+
+## 📝 How to Cite
+
+If you find Scikit-longitudinal helpful, please cite us using the `CITATION.cff` file or via the "Cite this repository" button on GitHub.
-Paper has been submitted to a conference. In the meantime, for the repository, utilise the button top right corner of the
-repository "How to cite?", or open the following citation file: [CITATION.cff](./CITATION.cff).
+---
 ## 🔐 License
-[MIT License](./LICENSE)
\ No newline at end of file
+Scikit-longitudinal is licensed under the [MIT License](./LICENSE).
diff --git a/docs/contribution.md b/docs/contribution.md
index e4cedb35..f7d94f86 100644
--- a/docs/contribution.md
+++ b/docs/contribution.md
@@ -13,8 +13,8 @@ We appreciate contributions from the community and welcome your ideas, bug repor
 ### Prerequisites
 Ensure the following tools are installed:
-- [Python 3.9.x](https://www.python.org/downloads/release/python-398/)
-- [UV](https://docs.astral.sh/uv/)
+* [Python 3.9.x](https://www.python.org/downloads/release/python-398/)
+* [UV](https://docs.astral.sh/uv/)
 ---
@@ -61,7 +61,9 @@ uv run pytest -sv tests/
 ## ❌ Troubleshooting Errors
 ### General Issues
+
 If you encounter setup errors:
+
 1. 
**Deactivate Environment**:
    ```bash
    deactivate
diff --git a/docs/quick-start.md b/docs/quick-start.md
index ef2f2015..5cb92527 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -7,7 +7,7 @@ hide:
 Longitudinal datasets contain information about the same cohort of individuals (instances) over time,
 with the same set of features (variables) repeatedly measured across different time points
-(also called `waves`) [1,2].
+(also called `waves`) [1,2,3].
 `Scikit-longitudinal` (Sklong) is a machine learning library designed to analyse longitudinal data,
 also called _Panel data_ in certain fields. Today, Sklong focuses on Longitudinal Machine Learning Classification tasks.
@@ -143,37 +143,71 @@ Refer to [UV documentation](https://docs.astral.sh/uv/) for further details.
 ---
+### 💻 Developer Notes
+
+For developers looking to contribute, please refer to the `Contributing` section of the [documentation](https://simonprovost.github.io/scikit-longitudinal/).
+
+---
+
 ## 🚀 Getting Started
-To use `Sklong`, start by preparing your dataset using the `LongitudinalDataset` class, and then train a model with tools like `LexicoGradientBoostingClassifier`.
+To perform longitudinal machine learning classification using `Sklong`, start by employing the
+`LongitudinalDataset` class to prepare your dataset (i.e., the data itself, the temporal vector, etc.). To analyse your data,
+you can use, for instance, the `LexicoGradientBoostingClassifier` or any other available estimator/preprocessor.
-Here’s a quick example:
+> In a nutshell, the `LexicoGradientBoostingClassifier` is a variant of
+> [Gradient Boosting](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)
+> specifically designed for longitudinal data, using a lexicographical approach that prioritises recent
+> `waves` over older ones in certain scenarios [1].
-````python
+Next, you can apply the popular _fit_, _predict_, _predict_proba_, or _transform_
+methods, depending on the estimator or preprocessor you chose, in the same way that `Scikit-learn` does, as shown in the example below:
+
+``` py
 from sklearn.metrics import classification_report
 from scikit_longitudinal.data_preparation import LongitudinalDataset
-from scikit_longitudinal.estimators.ensemble.lexicographical import LexicoGradientBoostingClassifier
+from scikit_longitudinal.estimators.ensemble.lexicographical.lexico_gradient_boosting import LexicoGradientBoostingClassifier
-# Prepare the dataset
-dataset = LongitudinalDataset('./stroke.csv')
-dataset.load_data_target_train_test_split(target_column="class_stroke_wave_4")
-dataset.setup_features_group(input_data="Elsa")
+dataset = LongitudinalDataset('./stroke.csv') # Note this is a fictional dataset. Use yours!
+dataset.load_data_target_train_test_split(
+    target_column="class_stroke_wave_4",
+)
-# Train the classifier
-model = LexicoGradientBoostingClassifier(features_group=dataset.feature_groups(), threshold_gain=0.00015)
-model.fit(dataset.X_train, dataset.y_train)
+# Pre-set or manually set your temporal dependencies
+dataset.setup_features_group(input_data="elsa")
-# Evaluate the model
+model = LexicoGradientBoostingClassifier(
+    features_group=dataset.feature_groups(),
+    threshold_gain=0.00015 # Refer to the API for more hyper-parameters and their meaning
+)
+
+model.fit(dataset.X_train, dataset.y_train)
 y_pred = model.predict(dataset.X_test)
+# Classification report
 print(classification_report(dataset.y_test, y_pred))
-````
-For more examples, visit the [Examples](https://simonprovost.github.io/scikit-longitudinal/examples) section of the documentation.
+```
+
+!!! warning "Neural Networks models"
+    Please see the documentation's `FAQ` tab for a list of similar projects that may offer
+    Neural Network-based models, as this project presently does not.
+    If we decide to build Neural Network-based models for longitudinal data,
+    we will announce it in due course.
----
+!!! question "Want to understand what feature_groups are, and how temporal dependencies are pre-set or set manually?"
+    To understand how to set your temporal dependencies, please refer to the `Temporal Dependency` tab of the documentation.
+
+!!! question "Want more examples to grasp the idea?"
+    To see more examples, please refer to the `Examples` tab of the documentation.
+
+!!! question "Want more control over hyper-parameters?"
+    To see the full API reference, please refer to the `API` tab.
 # 📚 References
 > [1] Kelloway, E.K. and Francis, L., 2012. Longitudinal research and data analysis. In Research methods in occupational health psychology (pp. 374-394). Routledge.
-> [2] Ribeiro, C. and Freitas, A.A., 2019. A mini-survey of supervised machine learning approaches for coping with ageing-related longitudinal datasets. In 3rd Workshop on AI for Aging, Rehabilitation and Independent Assisted Living (ARIAL), held as part of IJCAI-2019 (num. of pages: 5).
\ No newline at end of file
+> [2] Ribeiro, C. and Freitas, A.A., 2019. A mini-survey of supervised machine learning approaches for coping with ageing-related longitudinal datasets. In 3rd Workshop on AI for Aging, Rehabilitation and Independent Assisted Living (ARIAL), held as part of IJCAI-2019 (num. of pages: 5).
+
+> [3] Ribeiro, C. and Freitas, A.A., 2024. A lexicographic optimisation approach to promote more recent
+features on longitudinal decision-tree-based classifiers: applications to the English Longitudinal Study
+of Ageing. Artificial Intelligence Review, 57(4), p.84.
\ No newline at end of file
diff --git a/pyproject.toml b/pyproject.toml
index 0dde6b15..1fe39c4b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -57,3 +57,7 @@ package = true
 override-dependencies = [
     "scikit-learn ; sys_platform == 'never'",
 ]
+
+[tool.setuptools]
+py-modules = []
+license-files = []
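
A note on the `override-dependencies` context lines in the `pyproject.toml` hunk above: the entry `"scikit-learn ; sys_platform == 'never'"` carries an environment marker that, as far as we can tell, can never be true, because `sys_platform` is compared against Python's `sys.platform` and no interpreter reports the value `never`. The apparent effect is that upstream `scikit-learn` is never selected, presumably leaving room for the `Scikit-Lexicographical-Trees` fork mentioned in the removed README text. The sketch below is a simplified stand-in for marker evaluation, not the resolver's actual logic; the helper name `marker_applies` is hypothetical.

```python
import sys


def marker_applies(required_platform: str) -> bool:
    """Simplified stand-in for evaluating `sys_platform == '<value>'`.

    Hypothetical helper for illustration only: real tools evaluate the full
    environment-marker grammar, not a single string comparison.
    """
    return sys.platform == required_platform


# No interpreter reports "never", so this marker never selects scikit-learn.
print(marker_applies("never"))
# A marker naming the current platform would apply.
print(marker_applies(sys.platform))
```

Under that reading, the override acts as an exclusion rule rather than a platform restriction, which would explain why the old PDM `excludes` instructions could be dropped from the README.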