Negative Entropy Values and More #29

Open
naji-s opened this issue Sep 27, 2019 · 0 comments
naji-s commented Sep 27, 2019

Hello. Thanks for this package, but I am running into a lot of trouble with it.

First of all, in mi.py you use the entropy implementation by Gael Varoquaux, which gives negative MI values. I replaced that with sklearn's MI and got rid of that problem, but the features that end up being chosen still don't make sense.
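For context, here is a minimal sketch (my own check, not code from this package) of why I switched to sklearn: mutual_info_regression is a k-NN estimator that, as far as I can tell, clips small negative estimates to zero, and it can be sanity-checked against the closed-form MI of a bivariate Gaussian, -0.5 * log(1 - rho**2):

import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.RandomState(0)
rho, n = 0.8, 2000
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho ** 2) * rng.normal(size=n)

# estimate vs. analytic value for a correlated pair: both should be positive
print(mutual_info_regression(x.reshape(-1, 1), y, n_neighbors=3)[0])
print(-0.5 * np.log(1 - rho ** 2))

# for an independent pair the true MI is 0; the estimate stays non-negative
z = rng.normal(size=n)
print(mutual_info_regression(x.reshape(-1, 1), z, n_neighbors=3)[0])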

I used the iris dataset from sklearn. I duplicate a feature, and as you can see, the method ends up picking the same feature twice, which shouldn't be the case. Here is the MWE:

import numpy as np
import pandas as pd
import mifs

from sklearn import datasets
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

X = iris_df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)']].values
print (X[:5, :])
X = np.hstack((X[:,2].reshape((-1, 1)), X))
print (X[:5, :])
y = iris_df['petal width (cm)'].values.reshape((1, -1)).squeeze()

# define MI_FS feature selection method
feat_selector = mifs.MutualInformationFeatureSelector(categorical=False, n_features=2)

# find all relevant features
feat_selector.fit(X, y)

# check selected features
print (feat_selector._support_mask)

# check ranking of features
print (feat_selector.ranking_)

# call transform() on X to filter it down to selected features
X_filtered = feat_selector.transform(X)

You can comment or uncomment the line that appends the duplicated column.
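If my reading of the output is right, a quick check on the MWE's variables makes the problem explicit (a hypothetical snippet building on the code above, not part of the package):

# columns 0 and 3 of X are exact copies, so if the selector keeps both of them,
# the two columns of X_filtered come out identical
print(np.array_equal(X_filtered[:, 0], X_filtered[:, 1]))
# a redundancy-aware criterion should not keep an exact duplicate of an
# already-selected feature, so this printing True is the bug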

Also, feat_selector had no attribute called support, so I had to replace it with _support_mask in your example. The only code I changed was the function _get_first_mi, which now reads:

def _get_first_mi(i, k, MI_FS):
    n, p = MI_FS.X.shape

    if MI_FS.categorical:
        x = MI_FS.X[:, i].reshape((n, 1))
        MI = _mi_dc(x, MI_FS.y, k)
    else:
        x = MI_FS.X[:, i].reshape((n, 1))
        # replace the _mi_cc estimate with sklearn's k-NN estimator,
        # which does not return negative values
        from sklearn.feature_selection import mutual_info_regression
        MI = mutual_info_regression(x, MI_FS.y, n_neighbors=k)[0]
    # MI must be non-negative
    if MI > 0:
        return MI
    else:
        return np.nan
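
If it helps, here is a sketch of how I would write that helper so both branches go through sklearn and the result is always non-negative (just a suggestion built on the same MI_FS.X / MI_FS.y / MI_FS.categorical attributes used above, not tested against the rest of the package):

import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def _get_first_mi_sklearn(i, k, MI_FS):
    # MI between feature i and the target, estimated with sklearn's k-NN estimators
    n, p = MI_FS.X.shape
    x = MI_FS.X[:, i].reshape((n, 1))
    if MI_FS.categorical:
        MI = mutual_info_classif(x, MI_FS.y, n_neighbors=k)[0]
    else:
        MI = mutual_info_regression(x, MI_FS.y, n_neighbors=k)[0]
    # sklearn clips negative k-NN estimates to zero; keep the original
    # convention of returning NaN when there is no measurable dependence
    if MI > 0:
        return MI
    else:
        return np.nan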