
Should testing data be used for unsupervised feature transformation or selection? #23

Closed · dhimmel opened this issue Aug 1, 2016 · 2 comments

dhimmel (Member) commented Aug 1, 2016

Imagine splitting the data as follows, where X is the complete feature matrix and y is the outcome array (train_test_split doc):

```python
import sklearn.cross_validation  # renamed to sklearn.model_selection in scikit-learn 0.18

X_train, X_test, y_train, y_test = sklearn.cross_validation.train_test_split(X, y)
```

The goal of this discussion is to decide whether we should fit any unsupervised feature selection/transformation on the entire X (the union of X_train and X_test). @htcai cautioned against selection/transformation on the entire X: #18 (comment).

What are the drawbacks and advantages of performing selection/transformation on an X that includes X_test?
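
For concreteness, here is a minimal sketch of the two options (my illustration, not code from #18; StandardScaler, PCA, and n_components=10 are arbitrary placeholders):

```python
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Option A: fit the unsupervised transformation on X_train only.
# X_test is transformed using statistics learned from X_train alone,
# so the test set stays truly unseen.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=10).fit(scaler.transform(X_train))
X_test_transformed = pca.transform(scaler.transform(X_test))

# Option B: fit the transformation on the full X (X_train plus X_test).
# The test set's means, variances, and principal directions now shape
# the transformation, so information leaks from X_test into training.
scaler_full = StandardScaler().fit(X)
pca_full = PCA(n_components=10).fit(scaler_full.transform(X))
X_test_leaky = pca_full.transform(scaler_full.transform(X_test))
```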

htcai (Member) commented Aug 1, 2016

k-fold cross-validation might be useful for reducing or eliminating bias in the estimate of the model's performance. This method is also available in sklearn. I'm not sure whether this has already been taken into account; just in case.
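
A minimal sketch of what I mean (the classifier and scoring metric are placeholders):

```python
from sklearn.cross_validation import cross_val_score  # sklearn.model_selection in scikit-learn >= 0.18
from sklearn.linear_model import LogisticRegression

# Score the model on 5 held-out folds instead of a single split;
# averaging across folds gives a more stable performance estimate.
scores = cross_val_score(LogisticRegression(), X_train, y_train, cv=5, scoring='roc_auc')
print(scores.mean(), scores.std())
```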

dhimmel (Member, Author) commented Aug 1, 2016

@htcai my pull request #18 uses GridSearchCV, which performs cross-validation behind the scenes. For reference, that cross-validation occurs entirely inside X_train as defined in the comment above.
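
For illustration, a sketch of that pattern (not the exact code in #18; the pipeline steps and parameter grid are placeholders). Putting the transformation inside a Pipeline that GridSearchCV cross-validates means the transformation is refit on each training fold of X_train, so held-out folds and X_test never influence it:

```python
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in scikit-learn >= 0.18
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA()),
    ('clf', LogisticRegression()),
])
param_grid = {'pca__n_components': [5, 10, 20], 'clf__C': [0.1, 1, 10]}

# GridSearchCV cross-validates inside X_train only; every transformer in
# the pipeline is refit on the training folds of each split, so neither
# the held-out folds nor X_test leak into the fitted transformation.
grid = GridSearchCV(pipeline, param_grid, cv=5, scoring='roc_auc')
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```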
