The goal of this discussion is to evaluate whether we should apply any operations on `X` (the union of `X_train` and `X_test`). @htcai cautioned against feature selection/transformation on the entire `X`: #18 (comment).

What are the drawbacks and advantages of selection/transformation on an `X` that includes `X_test`?
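To make the trade-off concrete, here is a minimal sketch (the synthetic data, the `SelectKBest` choice, and all names are illustrative, not taken from the PR) contrasting a selector fit on the entire `X` with one fit on `X_train` only:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project's data.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Leaky: the selection scores are computed with X_test's labels included,
# so information from the test set influences which features survive.
leaky_selector = SelectKBest(f_classif, k=5).fit(X, y)

# Safe: the selector is fit on X_train only and then applied to X_test,
# mirroring how genuinely unseen data would be handled.
selector = SelectKBest(f_classif, k=5).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
```

The leaky variant tends to make test-set scores look better than they would be on truly unseen data, which is the core drawback under discussion.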
k-fold cross-validation might be useful for reducing or eliminating bias in the estimate of the model's performance. This method is also available in sklearn. I'm not sure whether this has already been taken into account; mentioning it just in case.
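For instance (a sketch with a stand-in estimator and synthetic data, since the model under discussion isn't specified here):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the training portion of the data.
X_train, y_train = make_classification(n_samples=200, n_features=20, random_state=0)

# 5-fold cross-validation: each fold is scored on data held out from the
# fit, giving a less optimistic estimate than training-set accuracy.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)
print(scores.mean(), scores.std())
```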
@htcai my pull request #18 uses `GridSearchCV`, which performs cross-validation behind the scenes. For your reference, the cross-validation occurs inside `X_train` as defined in the comment above.
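Roughly, the pattern looks like this (the estimator and parameter grid below are stand-ins, not the actual settings in #18):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the project's data.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GridSearchCV cross-validates every candidate within X_train only;
# X_test is never seen during the search.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=5,
)
search.fit(X_train, y_train)

# X_test is used exactly once, for the final held-out evaluation.
print(search.best_params_, search.score(X_test, y_test))
```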
Imagine splitting the data as follows, where `X` is the complete feature matrix and `y` is the outcome array (`train_test_split` doc):
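For example (the `test_size` and `random_state` values here are illustrative choices, and the synthetic data is a stand-in):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Stand-in for the complete feature matrix X and outcome array y.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
```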