discrete_consistency_info

A small library containing a single class that helps evaluate the consistency of discrete data.

mushrooms.csv is a dataset from Kaggle: https://www.kaggle.com/uciml/mushroom-classification

Instructions: Begin by creating an instance of the ConsistencyInfo class: ci = ConsistencyInfo(x_vectors, x_colNames, y_vector, tol=15). The tol argument is optional.

x_vectors(numpy array): the matrix of the x columns, transposed so that each row holds the values of one feature.

x_colNames(numpy array/list): an array/list of the x column names.

y_vector(numpy array): an n-dimensional vector containing the n y elements.

tol(int/float)(optional, default=15): the tolerance for features with a high number of unique attributes. For values above 1, a higher number means a higher tolerance. If 0 < tol < 1, the tolerance for highly unique attributes grows as tol approaches 0.
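As a rough illustration of the argument shapes described above, the sketch below builds the three inputs from a tiny made-up table. The table values and the names rows and col_names are hypothetical, and the sketch assumes that "transposed" means one row per feature. The constructor call is left as a comment because the class itself lives in consistency.py.

```python
import numpy as np

# Toy table: each row is one sample, the last column is the label.
# These values are invented for illustration, not taken from mushrooms.csv.
rows = np.array([
    ["x", "p", "poisonous"],
    ["x", "a", "edible"],
    ["b", "l", "edible"],
    ["b", "p", "poisonous"],
])
col_names = ["cap_shape", "odor", "label"]

x_colNames = col_names[:-1]   # names of the feature (x) columns
x_vectors = rows[:, :-1].T    # transpose so that each row is one feature
y_vector = rows[:, -1]        # one y element per sample

print(x_vectors.shape)  # (2, 4): 2 features, 4 samples
print(y_vector.shape)   # (4,)

# With consistency.py available, the instance would then be created as:
# ci = ConsistencyInfo(x_vectors, x_colNames, y_vector)
```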

Attributes of the ConsistencyInfo class:

.sorted_pairs: a list of features in the form [[consistency, feature name, original index in x_vectors], ...]

.top_feature: the most consistent feature in the form [consistency, feature name, original index in x_vectors]
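To make the shape of these two attributes concrete, here is a minimal sketch that produces lists in the same [consistency, feature name, original index] form. It does not reproduce the library's calculation: the majority_consistency function is a hypothetical stand-in score, the descending sort order is an assumption, and tol is not modelled at all.

```python
from collections import Counter, defaultdict


def majority_consistency(feature_values, labels):
    """Hypothetical score, not the library's own: the share of samples whose
    label matches the most common label among samples with the same value."""
    by_value = defaultdict(Counter)
    for value, label in zip(feature_values, labels):
        by_value[value][label] += 1
    agreeing = sum(max(counts.values()) for counts in by_value.values())
    return agreeing / len(labels)


# Invented toy data: one row per feature, as x_vectors is described.
x_vectors = [
    ["x", "x", "b", "b"],  # cap_shape
    ["p", "a", "l", "p"],  # odor
]
x_colNames = ["cap_shape", "odor"]
y_vector = ["poisonous", "edible", "edible", "poisonous"]

# Build [consistency, feature name, original index] for every feature.
pairs = [
    [majority_consistency(column, y_vector), name, index]
    for index, (name, column) in enumerate(zip(x_colNames, x_vectors))
]

# Assumption: most consistent feature first.
sorted_pairs = sorted(pairs, key=lambda pair: pair[0], reverse=True)
top_feature = sorted_pairs[0]

print(sorted_pairs)  # [[1.0, 'odor', 1], [0.5, 'cap_shape', 0]]
print(top_feature)   # [1.0, 'odor', 1]
```

A score of this kind gives a perfect result to any feature whose values are all unique, which may be the sort of effect the tol parameter is meant to temper.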


ZoomMan12312/discrete_consistency_info
