Add Binning method to ensure monotonicity for continuous features #150
After a few hours of work, I have developed a function that merges initial bins to ensure monotonicity; see the code and examples below. I hope to get comments from industry peers. If the ESC team could consider optimizing this functionality and adding it to a later version, that would be a great pleasure for me.

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import pandas as pd
import toad

def bin_monotonic(table, feature, direction):
    """
    Parameters:

    Returns:
    """

# Ex1: Construct a simple dataset to test functionality
table = pd.DataFrame({'A':list(range(11))

# Ex2:
data = pd.read_csv('/test_data.csv')
c = toad.transform.Combiner()

# Adjust the precision of split points
bin_adj = bin_ori
c.update(bin_adj)

# Visualize the binning plot
from toad.plot import bin_plot

# Merge bins
from toad.stats import IV, feature_bin_stats

# Position list for split points
pos_list = list(set(ex2_dict.values()))

# Find the corresponding split points
split_list = bin_adj[col]

# Update the rule
rule = {col: split_list_merge}

bin_plot(c.transform(train_selected[[col, 'target']], labels=True), x=col, target='target')
```
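The body of `bin_monotonic` did not survive in the post, so the following is only a minimal sketch of one common way to merge adjacent bins until the bad rate is monotonic (a pool-adjacent-violators style pass). It is not the author's implementation. The name `merge_to_monotonic` and the input format, a list of `(bad_count, total_count)` pairs ordered by feature value with every total greater than zero, are assumptions made for illustration.

```python
def merge_to_monotonic(bins, direction="increasing"):
    """Merge adjacent bins until the bad rate is monotonic.

    bins: list of (bad_count, total_count), ordered by feature value.
          Every total_count is assumed to be > 0.
    Returns a list of (bad, total, first_bin_index, last_bin_index).
    """
    if direction not in ("increasing", "decreasing"):
        raise ValueError("direction must be 'increasing' or 'decreasing'")
    sign = 1 if direction == "increasing" else -1

    merged = []
    for i, (bad, total) in enumerate(bins):
        merged.append([bad, total, i, i])
        # Pool the last two blocks while they violate the requested order.
        while len(merged) > 1:
            prev, last = merged[-2], merged[-1]
            # Compare last_rate with prev_rate by cross-multiplication,
            # which avoids floating-point division.
            diff = last[0] * prev[1] - prev[0] * last[1]
            if sign * diff >= 0:
                break
            prev[0] += last[0]
            prev[1] += last[1]
            prev[3] = last[3]
            merged.pop()
    return [tuple(block) for block in merged]


# Hypothetical initial bins with bad rates 0, 2/3, 1/3, 1.
initial = [(0, 3), (2, 3), (1, 3), (3, 3)]
print(merge_to_monotonic(initial, "increasing"))
# [(0, 3, 0, 0), (3, 6, 1, 2), (3, 3, 3, 3)]
```

Each returned block records which original bins it covers, so the split points to keep are the upper edges of the last original bin in every block except the final one.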
The existing binning methods, such as chi-square, decision tree, and quantile, cannot guarantee monotonicity for continuous features. For a scorecard in commercial use, however, we usually require interpretability, which means the bad rate should move in one direction across the bins. I suggest adding a monotonicity option to the existing binning methods, especially quantile binning.
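As a rough illustration of the point about quantile binning, the sketch below bins a toy feature with `pandas.qcut` and checks whether the per-bin bad rate is monotonic. The helper names `bad_rate_by_quantile_bin` and `is_monotonic` and the toy data are hypothetical; this uses plain pandas, not toad's binning.

```python
import numpy as np
import pandas as pd


def bad_rate_by_quantile_bin(df, feature, target, n_bins=4):
    """Return the mean of `target` within each quantile bin of `feature`."""
    bins = pd.qcut(df[feature], q=n_bins, duplicates="drop")
    return df.groupby(bins, observed=True)[target].mean()


def is_monotonic(rates):
    """True if the sequence never decreases or never increases."""
    diffs = np.diff(np.asarray(rates, dtype=float))
    return bool(np.all(diffs >= 0) or np.all(diffs <= 0))


# Hypothetical toy data: the bad rate dips in the third quartile.
demo = pd.DataFrame({
    "x": list(range(12)),
    "target": [0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1],
})

rates = bad_rate_by_quantile_bin(demo, "x", "target", n_bins=4)
print(rates.tolist())       # per-bin bad rates, ordered by x
print(is_monotonic(rates))  # False
```

Here the four equal-frequency bins have bad rates of 0, 2/3, 1/3 and 1, so the quantile split alone is not monotonic and some bins would need to be merged.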
I look forward to your reply. Thanks a lot.