Add feature to automatically rename categorical variables in etable #798

Merged
merged 11 commits into from
Feb 7, 2025

dsliwka
Contributor

@dsliwka dsliwka commented Jan 25, 2025

This adds a feature to etable that automatically renames categorical variables in regression tables. The user can specify a formatting template (default is "{variable}={value}"). It should work both for pandas categoricals (e.g. "y ~ x + c" when c is categorical) and for categorical variables generated in the regression formula with the C() or i() operator.

To do this, the existing rename_categoricals function is slightly extended to allow using formatting templates.
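
As a rough illustration of template-based renaming (not the code in this PR), the sketch below maps coefficient names to labels with a "{variable}={value}" style template. The helper names `rename_part` and `build_labels` are hypothetical, and the coefficient name shapes "c[T.b]" and "C(c)[T.b]" are an assumption about how treatment-coded categoricals are named; names produced by i() may look different.

```python
import re

# Assumed name shapes: "c[T.b]" (pandas categorical) and "C(c)[T.b]" (C() operator).
_CATEGORICAL = re.compile(
    r"(?:C\((?P<wrapped>[^,)]+)[^)]*\)|(?P<plain>[^\[\]()]+))"
    r"\[(?:T\.)?(?P<value>[^\]]+)\]"
)


def rename_part(part, template):
    """Apply the template to one term; leave non-categorical terms unchanged."""
    match = _CATEGORICAL.fullmatch(part)
    if match is None:
        return part
    variable = (match.group("wrapped") or match.group("plain")).strip()
    return template.format(variable=variable, value=match.group("value"))


def build_labels(coef_names, template="{variable}={value}"):
    """Return a dict mapping each renamed coefficient to its new label."""
    labels = {}
    for name in coef_names:
        # Interactions are assumed to be joined with ":"; rename each side separately.
        new_name = ":".join(rename_part(p, template) for p in name.split(":"))
        if new_name != name:
            labels[name] = new_name
    return labels


labels = build_labels(["Intercept", "x", "c[T.b]", "C(g)[T.Male]"])
print(labels)
```

Only names that match the assumed patterns appear in the returned dictionary, so continuous regressors and the intercept keep their original names.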


codecov bot commented Jan 25, 2025

Codecov Report

Attention: Patch coverage is 96.42857% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pyfixest/report/utils.py | 95.00% | 1 Missing ⚠️ |

| Flag | Coverage Δ |
|---|---|
| core-tests | 80.89% <96.42%> (+0.04%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| pyfixest/report/summarize.py | 87.42% <100.00%> (-0.17%) ⬇️ |
| pyfixest/report/utils.py | 97.29% <95.00%> (+6.97%) ⬆️ |

... and 1 file with indirect coverage changes

@s3alfisc
Member

pre-commit.ci autofix

@s3alfisc
Member

pre-commit.ci autofix

@s3alfisc s3alfisc self-requested a review February 7, 2025 14:46
@dsliwka
Contributor Author

dsliwka commented Feb 7, 2025

Hi @s3alfisc,

thanks for reviewing this PR!

I just realized that I forgot to mention that blanks in category (i.e. level) names lead to an error when using "i(x, ref=v)". That is, i(gender, ref="Male") works well, but i(ethnicity, ref="Native American") leads to "ValueError: Value 'NativeAmerican' for TreatmentContrasts.base is not among the provided levels." even though "Native American" is among the levels of the categorical column. Since it is often convenient to have category names that include blanks, I had added set_first_cat as a workaround. Of course I understand that it is independent of pyfixest functionality, so it makes sense to drop it.
But would it be possible to keep blanks in level names when parsing?
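
For readers who hit the same error, one way to avoid passing ref= at all is to move the desired reference level to the front of the category order. The sketch below is a hypothetical stand-in for that idea, not the set_first_cat function from this PR (whose implementation is not shown here). It assumes that treatment coding uses the first level as the baseline when no reference is given.

```python
def reference_first(levels, ref):
    """Return the levels reordered so that `ref` comes first.

    Hypothetical helper: level names are compared as-is, so blanks are preserved.
    """
    levels = list(levels)
    if ref not in levels:
        raise ValueError(f"{ref!r} is not among the levels {levels}")
    return [ref] + [level for level in levels if level != ref]


ordered = reference_first(["Asian", "Native American", "White"], "Native American")
print(ordered)
```

With a pandas categorical column, the reordered list could then be passed to `Series.cat.reorder_categories`, after which the formula can use the column without a ref argument.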

@s3alfisc
Member

s3alfisc commented Feb 7, 2025

"i(x, ref=v)" then having blanks in category (i.e. level) names leads to an error.

Ah ok, I see! I think we should treat this as a bug in the i() parser, and I might open a separate issue for it.

Btw, I have also changed the default so that relabeling is not applied unless requested. The reason is that I think internal variable naming should be consistent, and users should explicitly ask for renaming. Otherwise we might run into users who see variable names in etable() but struggle to understand how they relate to what is included in .coef(), .tidy() or .summary().

@dsliwka
Contributor Author

dsliwka commented Feb 7, 2025 via email

@s3alfisc s3alfisc merged commit 06c88d5 into py-econometrics:master Feb 7, 2025
8 checks passed
@s3alfisc
Member

s3alfisc commented Feb 7, 2025

Merged this one for now, thanks Dirk! On the i-ref and white space issue: let's see if I manage to fix it. If not, happy to discuss bringing back the set_first_cat function =)

@dsliwka
Contributor Author

dsliwka commented Feb 7, 2025 via email

@dsliwka dsliwka deleted the master branch February 12, 2025 13:06