
A little library containing a single class to help me evaluate discrete data consistency.


ZoomMan12312/discrete_consistency_info


discrete_consistency_info


mushrooms.csv is a dataset from Kaggle: https://www.kaggle.com/uciml/mushroom-classification

Instructions: Begin by creating an instance of the ConsistencyInfo class with ci = ConsistencyInfo(x_vectors, x_colNames, y_vector, tol=15), where tol is optional.

x_vectors (numpy array): the transposed matrix of the x columns, so that each row holds the values of one feature across all n samples.

x_colNames (numpy array/list): the names of the x columns, in the same order as the rows of x_vectors.

y_vector (numpy array): a one-dimensional vector of length n holding the y value of each sample.

tol (int/float, optional, default=15): a tolerance for features with a high number of unique attribute values. For tol > 1, a higher value means a higher tolerance. For 0 < tol < 1, the tolerance for highly unique attributes increases as tol approaches 0.
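A minimal sketch of preparing these inputs from a CSV file such as mushrooms.csv, using the standard csv module and NumPy. It assumes the file has a header row and that the target is the column named "class"; the prepare_inputs helper and the five toy rows are hypothetical, made up for illustration rather than taken from the dataset.

```python
import csv
import io

import numpy as np

# Toy rows in the assumed mushrooms.csv layout: a header row, the target
# column "class" first, every value a single-letter category. These rows are
# made up for illustration and are not taken from the real file.
SAMPLE_CSV = """class,cap-shape,cap-color,odor
p,x,n,p
e,x,y,a
e,b,w,l
p,x,w,p
e,x,g,n
"""


def prepare_inputs(csv_text, target="class"):
    """Hypothetical helper: split categorical CSV text into
    (x_vectors, x_colNames, y_vector) as described in this README."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = np.array([row for row in reader if row])  # (n_samples, n_columns)

    target_idx = header.index(target)
    feature_idx = [i for i in range(len(header)) if i != target_idx]

    y_vector = rows[:, target_idx]                 # one y value per sample
    x_colNames = [header[i] for i in feature_idx]  # one name per feature
    # Transpose so that each row of x_vectors holds one feature's values.
    x_vectors = rows[:, feature_idx].T             # (n_features, n_samples)
    return x_vectors, x_colNames, y_vector


x_vectors, x_colNames, y_vector = prepare_inputs(SAMPLE_CSV)
print(x_vectors.shape)  # (3, 5)
print(x_colNames)       # ['cap-shape', 'cap-color', 'odor']
```

The three returned values could then be passed as ConsistencyInfo(x_vectors, x_colNames, y_vector). To use the real file instead of the toy rows, pass the contents of open("mushrooms.csv", newline="").read() as csv_text.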

Attributes of the ConsistencyInfo class:

.sorted_pairs: a list of features in the form [[consistency, feature name, original index in x_vectors], ...]

.top_feature: the most consistent feature in the form [consistency, feature name, original index in x_vectors]
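The README does not say how consistency is computed. As a hedged illustration only, the sketch below scores each feature with one common measure for discrete data: the fraction of samples whose y value matches the most frequent y value among samples sharing the same feature value (one minus the inconsistency rate). It then builds lists in the documented [consistency, feature name, original index in x_vectors] form. The consistency and rank_features functions, the descending sort order, and the toy data are assumptions rather than the library's actual implementation, and tol is ignored.

```python
from collections import Counter, defaultdict

import numpy as np


def consistency(feature, y):
    """Assumed measure: share of samples whose y equals the most common y
    among samples with the same feature value (1 - inconsistency rate)."""
    groups = defaultdict(Counter)
    for value, label in zip(feature, y):
        groups[value][label] += 1
    # Per distinct feature value, keep only the count of its majority label.
    agreeing = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return agreeing / len(y)


def rank_features(x_vectors, x_colNames, y_vector):
    """Hypothetical helper returning (sorted_pairs, top_feature) in the
    documented [consistency, feature name, original index] form."""
    pairs = [
        [consistency(feature, y_vector), name, index]
        for index, (feature, name) in enumerate(zip(x_vectors, x_colNames))
    ]
    # Most consistent first; the real sort order is not documented.
    sorted_pairs = sorted(pairs, key=lambda pair: pair[0], reverse=True)
    return sorted_pairs, sorted_pairs[0]


# Toy data (made up): one row per feature, matching the x_vectors layout.
x_vectors = np.array([
    ["x", "x", "b", "x", "x"],  # cap-shape
    ["n", "y", "w", "w", "g"],  # cap-color
    ["p", "a", "l", "p", "n"],  # odor
])
x_colNames = ["cap-shape", "cap-color", "odor"]
y_vector = np.array(["p", "e", "e", "p", "e"])

sorted_pairs, top_feature = rank_features(x_vectors, x_colNames, y_vector)
print(top_feature)  # [1.0, 'odor', 2]
```

Under this measure, a feature with a distinct value for every sample would score a perfect 1.0 regardless of how useful it is, which is presumably the kind of highly unique attribute the tol parameter is meant to handle.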
