Tabular interpretable classifier based on Dempster-Shafer Theory and Gradient Descent


This repository contains 3 implementations of the classifier:

  • DSClassifier for binary classification problems
  • DSClassifierMulti for multi-class classification problems
  • DSClassifierMultiQ for multi-class classification problems, which also includes the commonality transformation improvement that makes computations faster

We recommend DSClassifierMultiQ in all cases, since it is the most stable and fastest implementation. The multi-class implementations can handle binary problems as well.


pip install git+


Import the module

from dsgd import DSClassifierMultiQ

Read the data

The data can be read using pandas, numpy or other libraries

import pandas as pd
data = pd.read_csv("my_data.csv")

After that, separate the data into feature vectors and their corresponding classes

y = data["class"].values
X = data.drop(columns="class").values

Ensure that the feature vectors (X) are a numpy matrix and their classes (y) a numpy array. (In the example, the DataFrame.values property converts pandas objects to numpy.) Also ensure that classes are integers from 0 to num_classes - 1; strings are not permitted as class values.
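If the class column holds strings, it has to be mapped to integers first. The following is a minimal sketch using only the standard library; the label names are made up for illustration, and the resulting list would still need to be converted to a numpy array (for instance with numpy.asarray) before being passed to the classifier.

```python
# Hypothetical string labels, as they might appear in a "class" column.
labels = ["setosa", "versicolor", "virginica", "setosa", "virginica"]

# Assign each distinct label an integer from 0 to num_classes - 1.
classes = sorted(set(labels))
class_to_index = {name: index for index, name in enumerate(classes)}

# Integer-encoded classes, in the same order as the original records.
y = [class_to_index[name] for name in labels]
num_classes = len(classes)
```

Keeping the classes list around makes it possible to translate predicted integers back to the original names later.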

Create the model

DSC = DSClassifierMultiQ(3, max_iter=150, debug_mode=True,
                        lossfn="MSE", min_dloss=0.0001, lr=0.005)

In this step we create the model and set its configuration. The only required parameter is the first one, which indicates the number of classes in the problem (3 in our case). The remaining parameters are optional:

  • lr : Initial learning rate
  • min_iter : Minimum number of epochs to perform in the training phase
  • max_iter : Maximum number of epochs to perform in the training phase
  • min_dloss : Minimum variation of the loss to consider convergence
  • optim : ( adam | sgd ) Optimization Method
  • lossfn : ( CE | MSE ) Loss function
  • debug_mode : Enables debug in training (prints and outputs metrics)
  • batch_size : For large datasets, the number of records to be processed together (batch)
  • precompute_rules : Whether to store the result of the rule computations for each record instead of recomputing them every time. It speeds up training but requires more memory.
  • force_precompute : Speeds up the training process but uses more memory, so use it carefully.
  • device : ( cpu | cuda | mps ) Device to use for computations. cuda and mps run on the GPU, which is usually faster than cpu. Using cuda requires a compatible GPU and an installed CUDA.
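As a rough illustration of how min_iter, max_iter and min_dloss could interact, the sketch below applies a stopping rule to a made-up sequence of per-epoch losses. This is an assumption about the general idea of loss-based convergence, not the library's actual training loop; the function name and the loss values are hypothetical.

```python
def epochs_until_stop(losses, min_iter=2, max_iter=150, min_dloss=1e-4):
    """Return the epoch at which training would stop for the given losses."""
    previous = float("inf")
    for epoch, loss in enumerate(losses[:max_iter], start=1):
        # Converged: the loss changed by less than min_dloss,
        # and at least min_iter epochs have been performed.
        if epoch >= min_iter and abs(previous - loss) < min_dloss:
            return epoch
        previous = loss
    # Never converged: stop at max_iter or when the losses run out.
    return min(len(losses), max_iter)


# Made-up loss values, one per epoch.
losses = [0.9, 0.5, 0.3, 0.29995, 0.29994]
stopped_at = epochs_until_stop(losses, min_dloss=1e-4)
capped_at = epochs_until_stop(losses, max_iter=2)
```

A smaller min_dloss makes training run longer before it is considered converged, while max_iter bounds the total cost.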

Rule definition

After the model is defined, we need to define the rules. There are 2 ways to define rules: manually and automatically.

Define a rule manually

from dsgd import DSRule
DSC.model.add_rule(DSRule(lambda x: x[0] > 18, "Patient is adult"))

Here we use the add_rule method of the model, which accepts a DSRule as its argument. A DSRule can be built directly with its constructor. The first argument is a lambda function that, given a feature vector x, returns whether the rule is satisfied (a boolean, True or False). The second argument is optional and provides a meaningful description of the rule. In the example, if the first column of the feature vector holds the age of a patient, the lambda x: x[0] > 18 is satisfied when the patient is an adult, which matches the description given as the second argument.
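Before attaching a rule to the model, it can help to check the predicate on a few feature vectors. The sketch below does this in plain Python, without dsgd; the column layout and the patient records are hypothetical.

```python
def is_adult(x):
    """Rule predicate: column 0 is assumed to hold the patient's age."""
    return x[0] > 18


# Hypothetical records: [age, systolic blood pressure].
patients = [
    [25, 120],
    [12, 100],
    [18, 110],
]

# Evaluate the predicate on every record; each result is a plain boolean.
satisfied = [bool(is_adult(p)) for p in patients]
```

Note that a patient aged exactly 18 does not satisfy x[0] > 18. A named function like is_adult can be used wherever the lambda is expected.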

Define rules automatically

The model provides methods to generate rules automatically based on given parameters and statistics. The two main methods are explained below.

DSC.model.generate_statistic_single_rules(X, breaks=3)

Given a sample of feature vectors (usually the same ones used for training) and a number of breaks n, the model generates simple one-attribute rules that separate each variable into n+1 groups with an equal number of records. Column names are optional and are only used to generate the descriptions.
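To make the idea of n breaks and n+1 groups concrete, the sketch below splits a single made-up attribute using empirical quantiles from the standard library. This only illustrates the concept; the library may compute its cut points differently (for example from the mean and standard deviation), and the variable names are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

# Made-up values of one attribute (e.g. age) across 12 records.
ages = [23, 35, 41, 18, 52, 67, 29, 48, 60, 31, 75, 44]
breaks = 3

# n breaks are the cut points that divide the data into n + 1 groups.
cuts = quantiles(ages, n=breaks + 1)

# Group index of each record: how many cut points lie at or below its value.
groups = [bisect_right(cuts, age) for age in ages]

# Number of records that fall into each of the n + 1 groups.
counts = [groups.count(g) for g in range(breaks + 1)]
```

Each group then corresponds to one single-attribute rule of the form "the value lies between two consecutive cut points".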

DSC.model.generate_mult_pair_rules(X, column_names=names)

Given a sample of feature vectors (usually the same ones used for training), this method creates rules for each pair of attributes, indicating whether they are both below their means, both above their means, or one above and the other below.
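The pairwise comparison can be sketched in plain Python as follows. The data and the helper pair_relation are hypothetical, and treating a value equal to the mean as "below" is an assumption; the exact set of rules the library generates per pair may differ.

```python
from itertools import combinations
from statistics import mean

# Made-up sample: 3 records with 3 attributes each.
X = [
    [1.0, 10.0, 5.0],
    [2.0, 20.0, 1.0],
    [3.0, 30.0, 3.0],
]

# Mean of each attribute (column).
means = [mean(column) for column in zip(*X)]


def pair_relation(x, i, j):
    """Describe where attributes i and j of x lie relative to their means."""
    above_i = x[i] > means[i]  # values equal to the mean count as "below"
    above_j = x[j] > means[j]
    if above_i and above_j:
        return "both above"
    if not above_i and not above_j:
        return "both below"
    return "mixed"


# Every unordered pair of attribute indices.
pairs = list(combinations(range(len(means)), 2))
relations = {pair: [pair_relation(x, *pair) for x in X] for pair in pairs}
```

With d attributes there are d * (d - 1) / 2 pairs, so the number of generated rules grows quadratically with the number of columns.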


The fit method, given a set of feature vectors X and their corresponding classes y, trains the model according to the configuration and the rules defined. When this method finishes, the model is trained to predict new instances as accurately as possible.

The training process performs many computations, so this method can take several minutes to finish.

When debug_mode is True, this method also prints its progress (e.g. the loss in each iteration) and measures and outputs the time taken in every step.


y_pred = DSC.predict(X_new)

To predict a set of new feature vectors X_new, the model provides the predict method, which returns an array with the predicted class for each feature vector (in the same notation as used in the fit method).

y_score = DSC.predict_proba(X_new)

The model also provides the predict_proba method which, instead of returning a single value for each feature vector (the predicted class), returns the estimated probability of belonging to each class.
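The relation between the two outputs can be illustrated with a made-up score matrix. The sketch assumes that the predicted class is the one with the highest estimated probability, which is the usual convention but is not confirmed here for this library.

```python
# Made-up output in the shape of predict_proba: one row per record,
# one column per class (3 classes).
y_score = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]

# Pick, for each record, the index of the largest probability.
y_pred = [max(range(len(row)), key=row.__getitem__) for row in y_score]
```

Working with the scores directly is useful when a custom decision threshold or a ranking metric is needed.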



The model can explain the decisions it makes. After training, the model can show which of the defined rules are most important for the prediction of each class. The method print_most_important_rules prints a summary of these findings, and the method find_most_important_rules returns this information in a structured way.

Save and Load trained models

As explained before, training is a very costly operation, so it is not desirable to retrain the model for every new experiment if it has already been trained. To handle this, the model provides methods to save trained models to disk and load them back.

Currently only the rules are saved (the lambdas and their adjusted values); the rest of the configuration must be set again every time. Note that the model must already have been created, and therefore configured, by the time load_rules_bin is invoked.

Full example

For a full and simple example please refer to the Iris example. Uncomment and comment lines to see other features.