This guide applies to development within the ML-Commons project.
This guide is for any developer who wants a running local development environment where you can make, see, and test changes. It's opinionated to get you running as quickly and easily as possible, but it's not the only way to set up a development environment.
If you're only interested in installing and using this plugin's features, you can just install OpenSearch; the ml-commons plugin comes integrated with OpenSearch.
If you're planning to contribute code (features or fixes) to this repository, great! Make sure to also read the contributing guide.
ml-commons is primarily a Java-based plugin for machine learning in OpenSearch. To contribute effectively, you need to be familiar with Java.
To develop on ml-commons, you'll need:
- A GitHub account
- git for version control
- Java
- A code editor of your choice, configured for Java. If you don't have a favorite editor, we suggest IntelliJ.
If you already have these installed or have your own preferences for installing them, skip ahead to the Fork and clone ml-commons section.
If you don't already have it installed (check with git --version), we recommend following the git installation guide for your OS.
Resources to get started with git:
You can install any version of Java starting from 17. Jenv is a good option if you want to keep multiple versions of Java installed side by side.
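To confirm which Java version a shell actually picks up, a small check like the one below can help. This is only a sketch: the class name JavaVersionCheck and its helper are hypothetical and not part of ml-commons, and the minimum of 17 is taken from the requirement above.

```java
// Hypothetical helper, not part of ml-commons: reports whether the running JVM
// satisfies the "Java 17 or later" requirement stated in this guide.
final class JavaVersionCheck {
    // Minimum feature version, taken from the guide (assumption: no upper bound).
    static final int MIN_FEATURE_VERSION = 17;

    static boolean isSupported(int featureVersion) {
        return featureVersion >= MIN_FEATURE_VERSION;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() returns the major release number, e.g. 17 or 21.
        int feature = Runtime.version().feature();
        String verdict = isSupported(feature) ? "supported" : "too old, need 17 or later";
        System.out.println("Running on Java " + feature + " (" + verdict + ")");
    }
}
```

Saved as JavaVersionCheck.java, it can be run directly with java JavaVersionCheck.java, which is handy for checking that a version manager such as Jenv selected the JDK you expect.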
All local development should be done in a forked repository. Fork ml-commons by clicking the "Fork" button at the top of the GitHub repository.
Clone your forked version of ml-commons to your local machine (replace opensearch-project in the command below with your GitHub username):
$ git clone git@github.com:opensearch-project/ml-commons.git
You can install OpenSearch in multiple ways:
- https://opensearch.org/downloads.html#docker-compose
- https://opensearch.org/docs/2.5/install-and-configure/install-opensearch/tar/
opensearch.hosts: ["https://localhost:9200"]
opensearch.username: "admin" # Default username
opensearch.password: "admin" # Default password
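As a rough illustration of how the host and default credentials above could be used from Java, the sketch below builds (but does not send) an HTTP request with a Basic authentication header. The class and method names are hypothetical. Actually sending the request to a local cluster may also require trusting the cluster's TLS certificate, which is left out here.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of ml-commons: prepares a request for a local cluster.
final class LocalClusterRequest {
    // Builds the value of an HTTP Basic "Authorization" header.
    static String basicAuthHeader(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
            .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    // Builds a GET request against the cluster root; nothing is sent here.
    static HttpRequest clusterInfoRequest(String host, String username, String password) {
        return HttpRequest.newBuilder(URI.create(host))
            .header("Authorization", basicAuthHeader(username, password))
            .GET()
            .build();
    }

    public static void main(String[] args) {
        // Host and credentials are the defaults listed in this guide.
        HttpRequest request = clusterInfoRequest("https://localhost:9200", "admin", "admin");
        System.out.println(request.method() + " " + request.uri());
    }
}
```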
This package uses the Gradle build system. Gradle comes with excellent documentation that should be your first stop when trying to figure out how to operate or modify the build. We also use the OpenSearch build tools for Gradle. These tools are idiosyncratic and don't always follow the conventions and instructions for building regular Java code using Gradle. Not everything in this package will work the way it's described in the Gradle documentation. If you encounter such a situation, the OpenSearch build tools source code is your best bet for figuring out what's going on.
- ./gradlew build: builds and tests.
- ./gradlew build buildDeb buildRpm: builds the RPM and DEB packages.
- ./gradlew run: launches a single-node cluster with the ml-commons plugin installed.
- ./gradlew integTest: launches a single-node cluster with the ml-commons plugin installed and runs all integration tests except security. Use ./gradlew integTest -PnumNodes=<number> to launch a multi-node cluster.
- ./gradlew integTest --tests="<class path>.<test method>": runs a single integration test class or method, for example ./gradlew integTest --tests="org.opensearch.ml.rest.RestMLTrainAndPredictIT.testTrainAndPredictKmeansWithEmptyParam" or ./gradlew integTest --tests="org.opensearch.ml.rest.RestMLTrainAndPredictIT".
- ./gradlew integTest -Dtests.class="<class path>": runs a specific integration test class, for example ./gradlew integTest -Dtests.class="org.opensearch.ml.rest.RestMLTrainAndPredictIT".
- ./gradlew integTest -Dtests.method="<method name>": runs a specific integration test method, for example ./gradlew integTest -Dtests.method="testTrainAndPredictKmeans".
- ./gradlew integTest -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9200 -Dtests.clustername="docker-cluster" -Dhttps=true -Duser=admin -Dpassword=admin: launches integration tests against a local cluster and runs tests with security. Detailed steps: (1) download the OpenSearch tarball and install it by running opensearch-tar-install.sh; (2) build the ML plugin zip with your change and install it; (3) restart the local test cluster; (4) run this Gradle command to test.
- ./gradlew spotlessApply: formats code. Alternatively, or in addition, import the formatting rules in .eclipseformat.xml into your IDE.
When launching a cluster using one of the above commands, logs are placed in /build/cluster/run node0/opensearch-<version>/logs. Though the logs are also echoed to the console, in practice it's best to check the actual log file.
Sometimes it's useful to attach a debugger to either the OpenSearch cluster or the integ tests to see what's going on. When running unit tests you can just hit 'Debug' from the IDE's gutter to debug the tests. To debug code running in an actual server run:
./gradlew :integTest --debug-jvm # to start a cluster and run integ tests
OR
./gradlew :run --debug-jvm # to just start a cluster that can be debugged
The OpenSearch server JVM will launch suspended and wait for a debugger to attach to localhost:8000 before starting the OpenSearch server.
To debug code running in an integ test (which exercises the server from a separate JVM) run:
./gradlew -Dtest.debug :integTest
The test runner JVM will start suspended and wait for a debugger to attach to localhost:5005 before running the tests.
Effective October 2, 2024, maintainer approval will be required to run GitHub CI/CD workflow actions when pushing a pull request (PR).
This change is being implemented as part of our enhanced security measures. We appreciate your patience and cooperation.
For a list of current maintainers, please refer to MAINTAINERS.md.
All filenames should use CamelCase.
Right: ml-commons/common/src/main/java/org.opensearch/ml/common/MLModelGroup.java
Wrong: ml-commons/common/src/main/java/org.opensearch/ml/common/ml_model_group.java
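The naming rule can be checked mechanically. The sketch below is one possible interpretation, assuming "CamelCase" means an uppercase first letter followed only by letters and digits, and that only .java files are checked. The class FileNameConvention is hypothetical and not part of ml-commons.

```java
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ml-commons: checks a Java source file name
// against an assumed reading of the CamelCase rule.
final class FileNameConvention {
    // Assumed rule: uppercase first letter, then letters or digits only, ending in ".java".
    private static final Pattern CAMEL_CASE_JAVA_FILE =
        Pattern.compile("[A-Z][A-Za-z0-9]*\\.java");

    static boolean isCamelCase(String path) {
        // Only the last path segment (the file name) is checked.
        String fileName = Paths.get(path).getFileName().toString();
        return CAMEL_CASE_JAVA_FILE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String dir = "ml-commons/common/src/main/java/org.opensearch/ml/common/";
        System.out.println(isCamelCase(dir + "MLModelGroup.java"));     // prints true
        System.out.println(isCamelCase(dir + "ml_model_group.java"));   // prints false
    }
}
```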
We use a version management system. If a line of code is no longer needed, remove it; don't simply comment it out.
Avoid global definitions. Everything should be wrapped in a module that can be depended on by other modules. Even things as simple as a single value should be a module.
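In Java, one way to read this guideline is to give even a single shared value its own small class that other code depends on explicitly. The sketch below illustrates that reading. ClusterDefaults and EndpointFormatter are hypothetical names, not ml-commons classes, and the port is the one used elsewhere in this guide.

```java
// Hypothetical example, not part of ml-commons: a single value wrapped in its own class.
final class ClusterDefaults {
    // Port of the local cluster used throughout this guide.
    static final int DEFAULT_PORT = 9200;

    private ClusterDefaults() {
        // Not instantiable: this class only holds constants.
    }
}

// A consumer depends on ClusterDefaults explicitly instead of repeating the literal.
final class EndpointFormatter {
    static String localEndpoint() {
        return "https://localhost:" + ClusterDefaults.DEFAULT_PORT;
    }
}
```

With this layout, changing the value touches one class, and every dependent is easy to find by searching for ClusterDefaults.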
Keep your functions short. A good function fits on a slide that the people in the last row of a big room can comfortably read. So don't count on them having perfect vision and limit yourself to ~25 lines of code per function.
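As an illustration of the length guideline, the sketch below splits one task (averaging only the finite values in a list) into three functions of a few lines each, rather than one longer function. The class ScoreSummary and its methods are hypothetical and not part of ml-commons.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example, not part of ml-commons: one task split into short functions.
final class ScoreSummary {
    // Top-level function: reads as a summary of the steps.
    static double averageOfValid(List<Double> scores) {
        List<Double> valid = keepFinite(scores);
        return valid.isEmpty() ? 0.0 : sum(valid) / valid.size();
    }

    // Drops NaN and infinite values.
    static List<Double> keepFinite(List<Double> scores) {
        return scores.stream()
            .filter(value -> Double.isFinite(value))
            .collect(Collectors.toList());
    }

    // Adds up the values.
    static double sum(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).sum();
    }
}
```

Each helper can be tested on its own, and the top-level function stays far below the ~25 line limit.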