[DOC] Anomaly detection module: Add informational tags back to estimator docs #2490

Open
inclinedadarsh opened this issue Jan 12, 2025 · 6 comments
Labels
anomaly detection (Anomaly detection package); documentation (Improvements or additions to documentation); maintenance (Continuous integration, unit testing & package distribution)

Comments

@inclinedadarsh
Contributor

inclinedadarsh commented Jan 12, 2025

Context

The anomaly detection estimators documented their input data format, output data format, and learning type in a manually maintained capabilities table in the estimator docs.
In #2468, we discussed removing the manually maintained capabilities table in the Anomaly Detection module in favor of the capabilities table generated automatically from the estimators' tags (#2468 (comment)). As a result, the information about the output data format and learning type is now missing from the docs. We should add it back.

Please check out #1430 & #2468 for more information.

Prerequisites

#1430 & #2468

Details

The following information is now missing in the docs:

  • CBLOF

    • Output data format: anomaly scores
    • Learning type: unsupervised or semi-supervised
  • COPOD

    For COPOD, there is no table in the documentation itself. However, the codebase does seem to contain one that, for some reason, is not shown on the website. The information below was taken directly from the codebase.

    • Output data format: anomaly scores
    • Learning type: unsupervised or semi-supervised
  • DWT_MLEAD

    • Output data format: anomaly scores
    • Learning type: unsupervised
  • IsolationForest

    • Output data format: anomaly scores
    • Learning type: unsupervised or semi-supervised
  • KMeansAD

    • Output data format: anomaly scores
    • Learning type: unsupervised or semi-supervised
  • LOF

    • Output data format: anomaly scores
    • Learning type: unsupervised or semi-supervised
  • LeftSTAMPi

    • Output data format: anomaly scores
    • Learning type: unsupervised
  • MERLIN

    • Output data format: binary classification
    • Learning type: unsupervised
  • OneClassSVM

    • Output data format: anomaly scores
    • Learning type: semi-supervised
  • PyODAdapter

    • Output data format: anomaly scores
    • Learning type: unsupervised or semi-supervised
  • STOMP

    • Output data format: anomaly scores
    • Learning type: unsupervised
  • STRAY

    • Output data format: binary classification
    • Learning type: unsupervised
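As a rough illustration, the information listed above could be encoded with the tag names proposed in #2468. This is a hypothetical sketch, not part of aeon: `MISSING_INFO` and `to_tags` are illustrative names, and the tag names themselves are still under discussion.

```python
# Hypothetical encoding of the information listed in this issue as tags.
# Tag names follow the proposal in #2468 and are not yet an agreed convention.
MISSING_INFO = {
    "CBLOF": ("anomaly scores", {"unsupervised", "semi-supervised"}),
    "COPOD": ("anomaly scores", {"unsupervised", "semi-supervised"}),
    "DWT_MLEAD": ("anomaly scores", {"unsupervised"}),
    "IsolationForest": ("anomaly scores", {"unsupervised", "semi-supervised"}),
    "KMeansAD": ("anomaly scores", {"unsupervised", "semi-supervised"}),
    "LOF": ("anomaly scores", {"unsupervised", "semi-supervised"}),
    "LeftSTAMPi": ("anomaly scores", {"unsupervised"}),
    "MERLIN": ("binary classification", {"unsupervised"}),
    "OneClassSVM": ("anomaly scores", {"semi-supervised"}),
    "PyODAdapter": ("anomaly scores", {"unsupervised", "semi-supervised"}),
    "STOMP": ("anomaly scores", {"unsupervised"}),
    "STRAY": ("binary classification", {"unsupervised"}),
}


def to_tags(output_format: str, learning_types: set) -> dict:
    """Translate one (output format, learning types) entry into a tag dict."""
    return {
        # Exact set membership, so "supervised" never matches "unsupervised".
        "capability:unsupervised": "unsupervised" in learning_types,
        "capability:semi-supervised": "semi-supervised" in learning_types,
        "capability:supervised": "supervised" in learning_types,
        "output_format": output_format,
    }


if __name__ == "__main__":
    for name, (fmt, types) in MISSING_INFO.items():
        print(name, to_tags(fmt, types))
```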
@inclinedadarsh
Contributor Author

I'd love to work on this too. Please let me know how I can move forward with it.

SebastianSchmidl changed the title [DOC] Add output format and learning type information in classes in the anomaly detection module [DOC] Anomaly detection module: Add informational tags back to estimator docs Jan 12, 2025
SebastianSchmidl added the documentation, maintenance, and anomaly detection labels Jan 12, 2025
@SebastianSchmidl
Member

We have not decided on a good way to do this yet. I see the following options:

  1. Add the information manually to the docs. ❌ I think that's a bad idea.
  2. Remove it. ❌ Also a non-option for me!
  3. Add the information to the estimator tags. ✅ I think that's the better approach because it re-uses the new documentation utilities and would also easily transfer to other modules. However, we do not have a standard way to do this currently.
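A minimal sketch of option 3, assuming estimators declare informational tags in a class-level `_tags` dict (using the tag names proposed in #2468) that subclasses override, and a generic helper renders them. `BaseEstimatorSketch`, `get_tags`, and `capability_row` are hypothetical names for illustration only; they do not mirror aeon's real base classes or documentation utilities.

```python
# Sketch of option 3. BaseEstimatorSketch, get_tags() and capability_row()
# are hypothetical names, not aeon's actual API.


class BaseEstimatorSketch:
    # Defaults shared by all estimators, whatever the module.
    _tags = {
        "capability:unsupervised": False,
        "capability:semi-supervised": False,
        "capability:supervised": False,
    }

    @classmethod
    def get_tags(cls) -> dict:
        """Merge each class's own _tags along the MRO; subclasses win."""
        tags = {}
        for klass in reversed(cls.__mro__):
            tags.update(klass.__dict__.get("_tags", {}))
        return tags


def capability_row(estimator_cls) -> str:
    """Render one capabilities-table row purely from class tags."""
    tags = estimator_cls.get_tags()
    learning = [
        kind
        for kind in ("unsupervised", "semi-supervised", "supervised")
        if tags.get(f"capability:{kind}", False)
    ]
    output = tags.get("output_format", "n/a")
    return f"{estimator_cls.__name__} | {output} | {' or '.join(learning)}"


class OneClassSVMSketch(BaseEstimatorSketch):
    # Values taken from the list in this issue.
    _tags = {
        "capability:semi-supervised": True,
        "output_format": "anomaly scores",
    }


print(capability_row(OneClassSVMSketch))
# OneClassSVMSketch | anomaly scores | semi-supervised
```

Because the helper only reads tags, the same function would work unchanged for any module whose estimators declare the same keys.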

We need to agree on a good way to represent the information in the current tags-infrastructure. How do other modules handle this?

I already described my proposal in #2468 (comment):

_tags = {
    # [...] other tags
    "capability:unsupervised": True,
    "capability:semi-supervised": False,
    "capability:supervised": False,
    "output_format": "anomaly scores",
}

@baraline, @MatthewMiddlehurst, what do you think? Does this conflict with other modules? Should we add namespaces (e.g., "ad:output_format") to module-specific tags?
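To illustrate what namespacing could buy, here is a small sketch assuming module-specific tags carry a prefix such as `ad:`. `module_tags` is a hypothetical helper, and the `ad:` prefix is only the example floated in the question above, not an agreed scheme. Note that the existing `capability:` prefix already behaves like a namespace, so one lookup works for both.

```python
# Sketch only: the "ad:" prefix and module_tags() are hypothetical, not an
# agreed aeon naming scheme.


def module_tags(tags: dict, namespace: str) -> dict:
    """Return the tags under one namespace, with the prefix stripped."""
    prefix = f"{namespace}:"
    return {
        key[len(prefix):]: value
        for key, value in tags.items()
        if key.startswith(prefix)
    }


tags = {
    "capability:unsupervised": True,
    "capability:semi-supervised": False,
    "capability:supervised": False,
    "ad:output_format": "anomaly scores",  # module-specific, namespaced
}

print(module_tags(tags, "ad"))  # {'output_format': 'anomaly scores'}
print(module_tags(tags, "capability"))
```

A docs page for a single module could then list only its `ad:` tags, while the shared `capability:` tags stay module-agnostic.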

@MatthewMiddlehurst
Member

As mentioned in the PR, I think tags are a fine solution to this. The discussion on my end is more about how we should name the tags, but I do not think that is essential for these to be implemented. The content of the tags above seems fine to me, though it may be worth discussing whether to turn "Learning type" into three distinct tags as well.
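One argument for three distinct boolean tags, sketched below under the assumption that the alternative is a single free-text `learning_type` tag (a hypothetical name): booleans can be queried directly, whereas a string must be parsed, and a naive substring check would wrongly match `"supervised"` inside `"unsupervised"`. The values are taken from the list in this issue; the `capability:*` names follow #2468.

```python
# Values from the list in this issue. "learning_type" as a single string tag
# is a hypothetical alternative; the capability:* names follow #2468.
single_tag = {
    "CBLOF": {"learning_type": "unsupervised or semi-supervised"},
    "OneClassSVM": {"learning_type": "semi-supervised"},
    "STRAY": {"learning_type": "unsupervised"},
}

distinct_tags = {
    "CBLOF": {
        "capability:unsupervised": True,
        "capability:semi-supervised": True,
        "capability:supervised": False,
    },
    "OneClassSVM": {
        "capability:unsupervised": False,
        "capability:semi-supervised": True,
        "capability:supervised": False,
    },
    "STRAY": {
        "capability:unsupervised": True,
        "capability:semi-supervised": False,
        "capability:supervised": False,
    },
}


def supports_from_string(tags: dict, kind: str) -> bool:
    # The free text must be split first: a plain substring test would report
    # that an "unsupervised" estimator also supports "supervised" learning.
    return kind in tags["learning_type"].split(" or ")


def supports_from_bool(tags: dict, kind: str) -> bool:
    # Direct lookup, no parsing needed.
    return tags.get(f"capability:{kind}", False)


for kind in ("unsupervised", "semi-supervised", "supervised"):
    from_string = sorted(
        name for name, t in single_tag.items() if supports_from_string(t, kind)
    )
    from_bool = sorted(
        name for name, t in distinct_tags.items() if supports_from_bool(t, kind)
    )
    assert from_string == from_bool  # both encodings agree on this data
    print(kind, from_bool)
```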

@inclinedadarsh
Contributor Author

@SebastianSchmidl, if this is accepted, should I go ahead and add this information back?

@SebastianSchmidl
Member

SebastianSchmidl commented Jan 19, 2025

I will bring this to our next dev meeting to discuss a naming scheme with the other module owners. Once we have agreed on one, I'll get back to you.

@inclinedadarsh
Contributor Author

Sure, thanks for the response!
