Operationalize AI/ML models built using Apache MADlib and Postgres PL/Python.
RTS-For-MADlib enables data scientists to deploy machine learning workflows built with Apache MADlib on Greenplum (Postgres) as a REST service. RTS-For-MADlib provides a mechanism to deploy models seamlessly to scalable container management systems such as Pivotal Container Service (PKS), Google Kubernetes Engine (GKE), and similar container-based systems.
RTS-For-MADlib enables the deployment of an AI/ML model as a workflow with the following components:
- A lightweight feature engine, which transforms the incoming payload into model input
- An Apache MADlib or PL/Python model component
- An optional cache component for feature lookup
- An orchestrator component.
RTS4MADlib is based on Docker containers. The package contains the following components:
- MLModelservice - MADlibModel application
- FeatureEngine - Transformer for payload to featureset
- FeatureCacheManager - An optional feature cache for lookup
- MLModelflow - An orchestrator component
- MLBatch - A micro batch that runs on the platform, used in delayed batch scoring scenarios.
- DockerContainers - Base Docker image(s) with Java, Python, MADlib, and Postgres
- RTS4MADlib - A client application that deploys the ML pipeline to PKS or GKE, etc.
The project provides prebuilt containers that can be downloaded from the Pivotal Data Docker repositories. If you wish to build the containers yourself and deploy them to your organization's private Docker repository, the build folder contains the scripts to do so.
To build and upload Docker images to a registry, first run the command below and provide your credentials when prompted.
docker login docker.io
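For scripted or CI builds, where nobody can answer a prompt, a non-interactive login may be more convenient. The following is a minimal sketch, not part of the project: the helper name `registry_login` and the environment variables `DOCKER_USER` and `DOCKER_PASSWORD` are assumptions chosen for illustration. It relies on the `--username` and `--password-stdin` options of `docker login`.

```shell
#!/bin/sh
# Sketch: non-interactive registry login for scripted builds.
# registry_login, DOCKER_USER and DOCKER_PASSWORD are illustrative names,
# not conventions defined by this project.
registry_login() {
    registry="${1:-docker.io}"
    if [ -z "${DOCKER_USER:-}" ] || [ -z "${DOCKER_PASSWORD:-}" ]; then
        echo "set DOCKER_USER and DOCKER_PASSWORD first" >&2
        return 1
    fi
    # Passing the password on stdin keeps it out of the process list
    # and out of the shell history.
    printf '%s' "$DOCKER_PASSWORD" |
        docker login --username "$DOCKER_USER" --password-stdin "$registry"
}

# Usage (after exporting the two variables):
#   registry_login docker.io
```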
After logging in to the registry successfully, start the build process.
###Steps:
- Build the base containers. This step builds two containers: a JDK 11 container and a Postgres 9.6 container with MADlib and PL/Python. The containers are tagged and uploaded to the specified registry.
$ cd DockerContainers
$ ./buildbaseimages.sh --registry $docker_repo
- Build the rest of the project containers. This step builds the containers for the MLModelservice, FeatureEngine, FeatureCacheManager and MLModelflow Spring Boot applications and uploads them to the specified Docker registry. Apart from the Docker containers, this step also builds the client deployment command-line tool. All the jar files for the Spring Boot applications and the RTS4MADlib client tool tar file are copied to the $project_root/dist folder.
$ cd $project_root/build
$ ./build_all.sh -R docker_repo -T release_tag -P push_image
For example:
$ ./build_all.sh -R pivotaldata -T 1.2 -P yes
If you wish to build individual containers, scripts are available in the $project_root/build folder. For example:
$ cd $project_root/build
$ ./build_mlmodelservice.sh -R docker_repo -T release_tag -P push_image
$ ./build_featurescachemanager.sh -R docker_repo -T release_tag -P push_image
$ ./build_featuresengine.sh -R docker_repo -T release_tag -P push_image
$ ./build_mlmodelflow.sh -R docker_repo -T release_tag -P push_image
$ ./build_mlmicrobatch.sh -R docker_repo -T release_tag -P push_image
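The individual builds listed above can also be driven from a small loop. The sketch below uses the script names and the -R/-T/-P options shown above. The `build_components` function and the `DRY_RUN` variable are assumptions added for illustration. By default it only prints the commands; set `DRY_RUN=0` and run it from $project_root/build to execute them.

```shell
#!/bin/sh
# Sketch: run (or just print) the individual build scripts in sequence.
# build_components and DRY_RUN are illustrative names, not part of the project.
DRY_RUN="${DRY_RUN:-1}"

build_components() {
    repo="$1"; tag="$2"; push="$3"
    for component in mlmodelservice featurescachemanager featuresengine \
                     mlmodelflow mlmicrobatch; do
        cmd="./build_${component}.sh -R ${repo} -T ${tag} -P ${push}"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"          # dry run: show the command only
        else
            $cmd || return 1     # stop at the first failing build
        fi
    done
}

# Example values taken from the build_all.sh example above.
build_components pivotaldata 1.2 yes
```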
To build only the client tool, run the steps below. This creates the client tool used to deploy models and moves it to the dist folder.
$ cd $project_root/build
$ ./build_rts4madlib.sh
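After a full build it can be useful to confirm that the dist folder was populated before moving on to installation. The following is a minimal sketch under stated assumptions: `check_dist` is a hypothetical helper, the archive name rts4madlib.tar.gz is the one used in the install steps, and the jar check only looks for at least one `*.jar` file because the exact jar names are not listed here.

```shell
#!/bin/sh
# Sketch: confirm that a build left the expected artifacts in the dist folder.
# check_dist is an illustrative helper; pass the dist directory as its argument.
check_dist() {
    dist="$1"
    [ -d "$dist" ] || { echo "missing folder: $dist" >&2; return 1; }
    [ -f "$dist/rts4madlib.tar.gz" ] || { echo "client archive not found" >&2; return 1; }
    # Expand the glob into the function's positional parameters;
    # if nothing matches, $1 is the literal pattern and the -f test fails.
    set -- "$dist"/*.jar
    [ -f "$1" ] || { echo "no jar files found" >&2; return 1; }
    echo "dist looks complete: $# jar file(s) plus rts4madlib.tar.gz"
}

# Usage:
#   check_dist "$project_root/dist"
```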
To install the RTS4MADlib tooling, first run the build commands described in the build section. After the build, follow the steps below:
- cp Realtime-scoring-for-MADlib/dist/rts4madlib.tar.gz ~/
- cd ~
- tar -zxvf rts4madlib.tar.gz
- cd ~/RTS4MADlib
- ./setup
- source ~/.bash_profile (or ~/.bashrc)
After this, run rts4madlib; you should see the output below.
$ rts4madlib
No arguments passed!
Usage:->
-------------------------------------------------------------------------------------------
rts4madlib --name unique_name --type type --action action --target target --inputJson file
name -> module name
action -> deplopy|undeploy
type -> flow|model|feature-engine|featurecache|batch
target -> docker|kubernetes
inputJson -> path to input json for model **only if action is deploy**
--------------------------------------------------------------------------------------------
We provide a prebuilt archive for convenience. Download the release from the Releases tab of this repository, then follow the instructions below:
1. wget https://github.com/pivotal/Realtime-scoring-for-MADlib/releases/download/1.5/rts4madlib.tar.gz
2. tar -zxvf rts4madlib.tar.gz -C ~/
3. cd ~/RTS4MADlib
4. ./setup
5. source ~/.bash_profile (or ~/.bashrc)
Now we are ready to start deploying models.
RTS4MADlib lets you deploy a MADlib model to Docker, PKS or Kubernetes environments.
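As an illustration of how a deployment command is put together, the sketch below composes an rts4madlib command line and checks each option against the values listed in the tool's usage text. It only prints the command and does not run it. The helper `rts_cmd`, the module name `credit_model` and the file `./model.json` are hypothetical, and the sketch assumes the action value is spelled `deploy`, as in the inputJson line of the usage text.

```shell
#!/bin/sh
# Sketch: compose an rts4madlib command line after validating each option
# against the values listed in the usage text. rts_cmd is an illustrative
# helper, not part of the tool.
rts_cmd() {
    name="$1"; kind="$2"; action="$3"; target="$4"; json="$5"
    case "$kind" in
        flow|model|feature-engine|featurecache|batch) ;;
        *) echo "bad type: $kind" >&2; return 1 ;;
    esac
    case "$action" in
        deploy|undeploy) ;;
        *) echo "bad action: $action" >&2; return 1 ;;
    esac
    case "$target" in
        docker|kubernetes) ;;
        *) echo "bad target: $target" >&2; return 1 ;;
    esac
    cmd="rts4madlib --name $name --type $kind --action $action --target $target"
    if [ "$action" = "deploy" ]; then
        # The input JSON is only needed when deploying.
        [ -n "$json" ] || { echo "deploy needs an input JSON file" >&2; return 1; }
        cmd="$cmd --inputJson $json"
    fi
    echo "$cmd"
}

# Hypothetical module name and JSON path, for illustration only.
rts_cmd credit_model model deploy docker ./model.json
rts_cmd credit_model model undeploy docker
```

Printing the command first makes it easy to review the options before running it against a Docker or Kubernetes target.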
The $RTSMADLIB_HOME/samples/ folder contains samples for testing model deployment.
If you prefer Jupyter-style samples, navigate to the $RTSMADLIB_HOME/samples/jupyter_notebooks/ folder, which contains the following samples:
- Logistic Regression - Jupyter notebook
- Random Forest - Jupyter notebook
- Credit Application - Jupyter notebook
- Credit Fraud Application - Jupyter notebook