Section "Discovering the built-in frameworks in Amazon SageMaker" in ch. 7 illustrates how to use XGBoost to predict houses prices.
In the example the housing dataset is first transformed and then split into training and validation subsets.
This isn't best practice; instead it's better to split first and transform the training dataset on. More importantly -- the one-hot encoding isn't saved anywhere so how can we use the trained model to make predictions for new houses?
Here is the relevant code snippet:
import pandas as pd
from sklearn.model_selection import train_test_split

# One-hot encode
data = pd.get_dummies(data)
# Move labels to first column, which is what XGBoost expects
data = data.drop(['y_no'], axis=1)
data = pd.concat([data['y_yes'], data.drop(['y_yes'], axis=1)], axis=1)
# Shuffle and split into training and validation (95%/5%)
data = data.sample(frac=1, random_state=123)
train_data, val_data = train_test_split(data, test_size=0.05)
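For reference, here is a minimal sketch of the alternative I'm suggesting: split first, encode the training split, align the validation split to the training columns, and persist the column layout so new data can be encoded the same way at inference time. The use of joblib, the feature_columns.joblib filename, and the new_houses DataFrame are assumptions for illustration, not code from the book.

import joblib
import pandas as pd
from sklearn.model_selection import train_test_split

# Split the raw data first, before any transformation
train_data, val_data = train_test_split(data, test_size=0.05, random_state=123)

# One-hot encode the training split, then align the validation split
# to the exact same columns (categories unseen in training become zeros)
train_data = pd.get_dummies(train_data)
val_data = pd.get_dummies(val_data).reindex(columns=train_data.columns, fill_value=0)

# (The label reordering from the snippet above would follow here.)

# Persist the training column layout so future data can be encoded identically
joblib.dump(list(train_data.columns), 'feature_columns.joblib')

# At prediction time, for a hypothetical new_houses DataFrame:
# columns = joblib.load('feature_columns.joblib')
# new_X = pd.get_dummies(new_houses).reindex(columns=columns, fill_value=0)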
Section "Discovering the built-in frameworks in Amazon SageMaker" in ch. 7 illustrates how to use XGBoost to predict houses prices.
In the example the housing dataset is first transformed and then split into training and validation subsets.
This isn't best practice; instead it's better to split first and transform the training dataset on. More importantly -- the one-hot encoding isn't saved anywhere so how can we use the trained model to make predictions for new houses?
Here is the relevant code snippet:
The text was updated successfully, but these errors were encountered: