Documentation

List of Functions/Methods

https://github.com/arita37/mlmodels/blob/dev/README_index_doc.py

Installation

### On Linux/MacOS
pip install "numpy<=1.17.0"
pip install -e . -r requirements.txt
pip install -r requirements_fake.txt


### On Windows
Install VC 14 build tools: https://www.visualstudio.com/downloads/#build-tools-for-visual-studio-2019
pip install "numpy<=1.17.0"
pip install torch==1..1 -f https://download.pytorch.org/whl/torch_stable.html
pip install -e . -r requirements_wi.txt
pip install -r requirements_fake.txt


### No Deps
# pip install -e .  --no-deps

Steps for a Pull Request, a Fix, or adding a New Model:

Coding Style

Please use a line length of 110 characters and only loose PEP8: "Pretty code is better than strict PEP8, which is ugly to read...". Please use the BLACK formatter.

0) Read the examples :

https://github.com/arita37/mlmodels/issues/102

https://github.com/arita37/mlmodels/pull/100

1) Create a branch from the dev branch, named e.g. keras_modelXXX

2) Update these files with your MODEL_NAME and BRANCH NAME:

https://github.com/arita37/mlmodels/blob/dev/.github/workflows/test_specific_model.yml

https://github.com/arita37/mlmodels/blob/dev/pullrequest.json

3) After each commit or Pull Request, automatic tests are run on GitHub.

Please check here : https://github.com/arita37/mlmodels/actions

4) Create mlmodels/model_XXXX/yyyyy.py

Template https://github.com/arita37/mlmodels/blob/dev/mlmodels/template/model_xxx.py

Example : https://github.com/arita37/mlmodels/blob/dev/mlmodels/model_keras/textcnn.py

https://github.com/arita37/mlmodels/blob/dev/mlmodels/model_tch/transformer_sentence.py

Please re-use the existing functions from https://github.com/arita37/mlmodels/blob/dev/mlmodels/util.py

 from mlmodels.util import (os_package_root_path, log,
                            path_norm, get_model_uri, path_norm_dict)

 ### Use path_norm to normalize your paths.
 data_path = path_norm("dataset/text/myfile.txt")   # -->  PATH.../mlmodels/dataset/text/myfile.txt
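path_norm_dict works the same way on a whole parameter dict (a hedged sketch, assuming it applies path_norm to the dict's values):

```python
from mlmodels.util import path_norm_dict

data_pars = {"data_path": "dataset/text/myfile.txt"}
data_pars = path_norm_dict(data_pars)   # assumed: path-like values become absolute mlmodels paths
```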

5) Create mlmodels/model_XXXX/yyyy.json, following this template:

https://github.com/arita37/mlmodels/blob/dev/mlmodels/template/models_config.json

6) Run/Test on your local machine

source activate py36
cd mlmodels
python model_XXXX/yyyy.py  

7) Check your runs on GitHub:

https://github.com/arita37/mlmodels/actions?query=workflow%3Atest_custom_model

8) Open a Pull Request to the dev branch.


How to add a new model

Source code structure is as below:

  • docs: documentation
  • mlmodels: interface wrappers for pytorch, keras, gluon, tf, transformer NLP for training and hyper-parameter search.
    • model_xxx: one folder per framework, with the same interface defined in the template folder
    • dataset: dataset files for test runs.
    • template: template interface wrapper which defines the common interface for all frameworks
    • ztest: test output for each sample test in model_xxx
  • ztest: test output for each sample test in model_xxx

How to define a custom model

1. Create a file mlmodels/model_XXXX/mymodel.py, where XXXX identifies the framework (tch: pytorch, tf: tensorflow, keras: keras, ...).

  • Declare the classes/functions below in the created file (a minimal skeleton is sketched after the Infos list):

    class Model                                                    :   Model definition
          __init__(model_pars, data_pars, compute_pars)

    def fit(model, data_pars, model_pars, compute_pars, out_pars)  :   Train the model
    def fit_metric(model, data_pars, compute_pars, out_pars)       :   Measure the results
    def predict(model, sess, data_pars, compute_pars, out_pars)    :   Predict the results

    def get_params(choice, data_path, config_mode)                 :   return parameters of the model
    def get_dataset(data_pars)                                     :   load the dataset
    def test()                                                     :   example running the model
    def test_api()                                                 :   example running the model in global settings

    def save(model, session, save_pars)                            :   save the model
    def load(load_pars)                                            :   load the trained model
  • Infos

    model :         Model(model_pars), instance of Model() object
    sess  :         Session for TF model  or optimizer in PyTorch
    model_pars :    dict containing info on model definition.
    data_pars :     dict containing info on input data.
    compute_pars :  dict containing info on model compute.
    out_pars :      dict containing info on output folder.
    save_pars/load_pars : dict for saving or loading a model
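For orientation, here is a minimal, hedged skeleton of such a file, built only from the names listed above; every body is a placeholder, not the project's actual implementation:

```python
# mlmodels/model_XXXX/mymodel.py  -- hedged skeleton, all bodies are placeholders
from mlmodels.util import path_norm, log


class Model:
    def __init__(self, model_pars=None, data_pars=None, compute_pars=None):
        # Build the underlying framework model from model_pars here
        self.model_pars, self.data_pars, self.compute_pars = model_pars, data_pars, compute_pars
        self.model = None   # e.g. a keras/torch model


def get_dataset(data_pars=None):
    # Load and pre-process the data described by data_pars
    data_path = path_norm(data_pars["path"])
    ...


def fit(model, data_pars=None, model_pars=None, compute_pars=None, out_pars=None):
    # Train model.model and return (fitted model, session)
    ...
    return model, None


def predict(model, sess=None, data_pars=None, compute_pars=None, out_pars=None):
    ...   # return predictions


def fit_metric(model, data_pars=None, compute_pars=None, out_pars=None):
    ...   # return evaluation metrics


def save(model, session=None, save_pars=None):
    ...


def load(load_pars=None):
    ...


def get_params(choice="json", data_path="model_XXXX/mymodel.json", config_mode="test"):
    # Placeholder: in a real model, read the JSON config (see step 3 below)
    return {}, {}, {}, {}


def test():
    model_pars, data_pars, compute_pars, out_pars = get_params()
    model = Model(model_pars, data_pars, compute_pars)
    model, sess = fit(model, data_pars, model_pars, compute_pars, out_pars)
    log(predict(model, sess, data_pars, compute_pars, out_pars))


if __name__ == "__main__":
    test()
```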
    

2. Write your code and create test() to test it.

  • Declare the model definition in class Model()
    self.model = DeepFM(linear_cols, dnn_cols, task=compute_pars['task'])  # see mlmodels/model_keras/01_deepctr.py
    # Model parameters such as `linear_cols, dnn_cols` are obtained from `get_params`, which returns `model_pars, data_pars, compute_pars, out_pars`
  • Implement data pre-processing in get_dataset, which returns data for both the training and test sets. Depending on the dataset type, the function can branch on the data type, as in the example below:
    if data_type == "criteo":
        df, linear_cols, dnn_cols, train, test, target = _preprocess_criteo(df, **kw)

    elif data_type == "movie_len":
        df, linear_cols, dnn_cols, train, test, target = _preprocess_movielens(df, **kw)
  • Call fit/predict with the initialized model and dataset
    # get the dataset using get_dataset
    data, linear_cols, dnn_cols, train, test, target = get_dataset(**data_pars)
    # fit the data (m is shorthand for the relevant parameter dict, e.g. compute_pars)
    model.model.fit(train_model_input, train[target].values,
                    batch_size=m['batch_size'], epochs=m['epochs'], verbose=2,
                    validation_split=m['validation_split'], )
    # predict
    pred_ans = model.model.predict(test_model_input, batch_size=compute_pars['batch_size'])
  • Calculate metrics from the prediction output
    # the inputs of metrics are the predicted output and the ground-truth data
    def metrics(ypred, ytrue, data_pars, compute_pars=None, out_pars=None, **kwargs):
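A hedged sketch of a possible body (assuming a binary-classification task and scikit-learn available; the specific metrics are illustrative, not prescribed by the project):

```python
from sklearn.metrics import log_loss, roc_auc_score

def metrics(ypred, ytrue, data_pars, compute_pars=None, out_pars=None, **kwargs):
    # Illustrative only: compare predictions with the ground truth and return a dict of scores
    return {
        "log_loss": log_loss(ytrue, ypred),
        "auc":      roc_auc_score(ytrue, ypred),
    }
```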

3. Create a JSON config file at mlmodels/model_XXXX/mymodel.json

  • Separate the configuration per environment (e.g. a test/development phase and a production phase); for each one, declare the specific parameters for the model, the dataset, and the output.
  • Example
    {
        "test": {

              "hypermodel_pars":   {
             "learning_rate": {"type": "log_uniform", "init": 0.01,  "range" : [0.001, 0.1] },
             "num_layers":    {"type": "int", "init": 2,  "range" :[2, 4] },
             "size":    {"type": "int", "init": 6,  "range" :[6, 6] },
             "output_size":    {"type": "int", "init": 6,  "range" : [6, 6] },

             "size_layer":    {"type" : "categorical", "value": [128, 256 ] },
             "timestep":      {"type" : "categorical", "value": [5] },
             "epoch":         {"type" : "categorical", "value": [2] }
           },

            "model_pars": {
                "learning_rate": 0.001,     
                "num_layers": 1,
                "size": 6,
                "size_layer": 128,
                "output_size": 6,
                "timestep": 4,
                "epoch": 2
            },

            "data_pars" :{
              "path"            : 
              "location_type"   :  "local/absolute/web",
              "data_type"   :   "text" / "recommender"  / "timeseries" /"image",
              "data_loader" :  "pandas",
              "data_preprocessor" : "mlmodels.model_keras.prepocess:process",
              "size" : [0,1,2],
              "output_size": [0, 6]              
            },


            "compute_pars": {
                "distributed": "mpi",
                "epoch": 10
            },
            "out_pars": {
                "out_path": "dataset/",
                "data_type": "pandas",
                "size": [0, 0, 6],
                "output_size": [0, 6]
            }
        },
    
        "prod": {
            "model_pars": {},
            "data_pars": {}
        }
    }
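A small hedged sketch of how get_params can read such a file and select one environment (assuming the chosen mode contains all four parameter blocks, as "test" does above):

```python
import json

def get_params(choice="json", data_path="model_XXXX/mymodel.json", config_mode="test"):
    # Read the JSON config and return the four parameter dicts for the chosen environment
    with open(data_path) as f:
        cfg = json.load(f)[config_mode]          # e.g. "test" or "prod"
    return (cfg["model_pars"], cfg["data_pars"],
            cfg["compute_pars"], cfg["out_pars"])
```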

#######################################################################################

③ CLI tools: the package provides the tools below

https://github.com/arita37/mlmodels/blob/dev/README_usage.md

- ml_models    :  mlmodels/models.py
- ml_optim     :  mlmodels/optim.py
- ml_test      :  mlmodels/ztest.py



#######################################################################################

④ Interface

models.py

   module_load(model_uri)
   model_create(module)
   fit(model, module, session, data_pars, out_pars   )
   metrics(model, module, session, data_pars, out_pars)
   predict(model, module, session, data_pars, out_pars)
   save(model, path)
   load(model)
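A hedged usage sketch chaining these calls; the argument lists mirror the signatures above, but the real functions in mlmodels/models.py may take extra keyword arguments or return values not shown here, and the parameter dicts are empty placeholders:

```python
from mlmodels import models

data_pars, out_pars = {}, {}                          # placeholders; normally taken from the JSON config
module  = models.module_load("model_tf.1_lstm.py")    # resolve the wrapper module from its model_uri
model   = models.model_create(module)                 # instantiate its Model()
session = None

models.fit(model, module, session, data_pars, out_pars)
print(models.metrics(model, module, session, data_pars, out_pars))
ypred = models.predict(model, module, session, data_pars, out_pars)
models.save(model, "ztest/mymodel/")
```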

optim.py

   optim(modelname="model_tf.1_lstm.py",  model_pars= {}, data_pars = {}, compute_pars={"method": "normal/prune"}
       , save_folder="/mymodel/", log_folder="", ntrials=2) 

   optim_optuna(modelname="model_tf.1_lstm.py", model_pars= {}, data_pars = {}, compute_pars={"method" : "normal/prune"},
                save_folder="/mymodel/", log_folder="", ntrials=2) 

Generic parameters

   Defined in models_config.json
   model_pars        :  relative to the model definition
   compute_pars      :  relative to the compute process
   data_pars         :  relative to the input data
   out_pars          :  relative to the output data

Sometimes, data_pars is required to set up the model (e.g. a CNN needs the image size...), as in the hypothetical sketch below.
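For instance (a hypothetical image-model wrapper; the key names image_size and n_filters are illustrative, not taken from the package):

```python
class Model:
    def __init__(self, model_pars=None, data_pars=None, compute_pars=None):
        # The input shape comes from data_pars, so the wrapper needs it at construction time
        height, width, channels = data_pars["image_size"]    # hypothetical key
        self.input_shape = (height, width, channels)
        self.n_filters   = model_pars.get("n_filters", 32)   # hypothetical key
        self.model       = None                              # the actual CNN would be built here


m = Model(model_pars={"n_filters": 64},
          data_pars={"image_size": [28, 28, 1]},
          compute_pars={})
```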

#######################################################################################

⑥ Naming convention

Function naming

pd_   :  input is a pandas dataframe
np_   :  input is a numpy array
sk_   :  input is related to sklearn (e.g. a sklearn model); input is a numpy array
plot_ :  plotting functions


col_  :  functions related to column lists.
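Hypothetical function names illustrating these prefixes (these specific helpers are not part of the package):

```python
import numpy as np
import pandas as pd

def pd_col_normalize(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    # pd_ prefix: takes and returns a pandas DataFrame
    df[cols] = (df[cols] - df[cols].mean()) / df[cols].std()
    return df

def np_clip_outliers(x: np.ndarray, lo: float, hi: float) -> np.ndarray:
    # np_ prefix: operates on numpy arrays
    return np.clip(x, lo, hi)

def col_remove_constant(df: pd.DataFrame) -> list:
    # col_ prefix: returns a list of column names
    return [c for c in df.columns if df[c].nunique() > 1]
```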

#####################################################################################