Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Minor changes in plot_employee_salaries.py #1369

Merged
merged 1 commit into from
Feb 28, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 17 additions & 13 deletions examples/use_cases/plot_employee_salaries.py
Original file line number Diff line number Diff line change
Expand Up @@ -73,9 +73,10 @@
# Hence, during our feature engineering, we could potentially drop one of them if the
# final predictive model is sensitive to the collinearity.
#
# When looking at the "Stats" tab, we observe that the "division" and
# "employee_position_title" are two features containing a large number of categories. It
# something that we should consider in our feature engineering.
# * When looking at the "Stats" tab, we observe that the ``division`` and
# ``employee_position_title`` are two features containing a large number of
# categories.
# It is something that we should consider in our feature engineering.
#
# We can store the report in the skore project so that we can easily retrieve it later
# without necessarily having to reload the dataset and recompute the report.
Expand Down Expand Up @@ -147,8 +148,7 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):
# %%
# In the diagram above, we can see what how we performed our feature engineering:
#
# * In the diagram above, we can see what we intend to do as feature engineering.
# For categorical features, we use two approaches: if the number of categories is
# * For categorical features, we use two approaches: if the number of categories is
# relatively small, we use a `OneHotEncoder` and if the number of categories is
# large, we use a `GapEncoder` that was designed to deal with high cardinality
# categorical features.
Expand Down Expand Up @@ -181,7 +181,7 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):
#
# To accelerate any future computation (e.g. of a metric), we cache once and for all the
# predictions of our model.
# Note that we don't necessarily need to cache the predictions as the report will
# Note that we do not necessarily need to cache the predictions as the report will
# compute them on the fly (if not cached) and cache them for us.

# %%
Expand All @@ -194,7 +194,8 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):
report.cache_predictions(n_jobs=4)

# %%
# To not lose this cross-validation report, let's store it in our skore project.
# To ensure this cross-validation report is not lost, let us save it in our skore
# project.
my_project.put("Linear model report", report)

# %%
Expand Down Expand Up @@ -225,7 +226,7 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):

# %%
#
# Let's compute the cross-validation report for this model.
# Let us compute the cross-validation report for this model.
report = CrossValidationReport(estimator=model, X=df, y=y, cv_splitter=5, n_jobs=4)
report.help()

Expand All @@ -248,16 +249,19 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):
# Investigating the models
# ^^^^^^^^^^^^^^^^^^^^^^^^
#
# At this stage, we might not been careful and have already overwritten the report and
# model from our first attempt. Hopefully, because we stored the reports in our skore
# project, we can easily retrieve them. So let's retrieve the reports.
# At this point, we may not have been cautious and could have already overwritten the
# report and model from our initial attempt.
# Fortunately, since we saved the reports in our skore project, we can easily recover
# them.
# So, let us retrieve those reports.

linear_model_report = my_project.get("Linear model report")
hgbdt_model_report = my_project.get("HGBDT model report")

# %%
#
# Now that we retrieved the reports, we can make further comparison and build upon some
# usual pandas operations to concatenate the results.
# Now that we retrieved the reports, we can make some further comparison and build upon
# some usual pandas operations to concatenate the results.
import pandas as pd

results = pd.concat(
Expand Down
Loading