/blog/2022/05/20/marginalia/ #48
Replies: 14 comments 10 replies
-
Super interesting! This blog post is helping me a lot with my …
-
Wonderful blog post, as usual. One may "trick" emmeans into also …
-
Thanks for this post - extremely useful. However, I still don't know …
-
Even if you are not super smart (I think you are!), you are a super …
-
Andrew, this is a wonderful post. I'd like to make a few comments about … I'm not sure if comparing the point estimates and 95% CIs of the AME … "Conditional and marginal odds ratios (likewise hazard ratios) are like … What do you think?
-
This really resonated with me. Putting your work out there is always valuable, sometimes in surprisingly positive ways. (Given that it is not in error, that is 😆). Cheers and keep 'em coming.
-
Thanks for a super helpful blog post, Andrew! Have used …
-
Hello. You say "a marginal effect is only a partial derivative". …
-
Just briefly commenting. I am the developer of emmeans, and a user referred me to this. I have just skimmed it so far. I find it interesting, and at least on the surface I think you are mostly correct in explaining the distinctions between the two packages. I am not surprised that, with an econ emphasis, you find marginaleffects preferable, and for good reason, given that it more easily calculates the sorts of things needed in that application area. That's because, as stated in its vignettes, emmeans emphasizes interpretation of experimental data rather than observational data, which I believe is the primary focus of econometric analyses. In the summary, there are several topics where you say such analyses are not supported by emmeans. I think that in at least some of those cases, what you are really saying is that you don't know how to do it with emmeans. For example, the … I'm not saying that emmeans is just as good a choice for your purposes, nor even that I am sure it supports all of your scenarios. But I do think some things that are "harder to do" are characterized as "impossible."
-
Andrew, thanks for the response and for pointing me to Vincent's interesting comments and comparisons. Those are thoroughly done. The one thing I see not emphasized there is emmeans's default of equally-weighted marginal means of predictions. That is an important distinction and reflects my own emphasis on experimental studies. It really has a lot to do with the types of analyses where we try to characterize a population (observational study) versus what happens if we tweak some dials in the system (experimental study).
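(Editorial aside: a minimal R sketch of the weighting distinction described above, using the built-in `mtcars` data as a hypothetical unbalanced design. The model and dataset here are illustrative assumptions, not from the original thread; `weights = "proportional"` is emmeans's documented option for frequency-weighted, population-style averaging, while the default is equal weights.)

```r
library(emmeans)

# An unbalanced two-factor design (cell counts differ across cyl x am)
mtcars2 <- transform(mtcars, cyl = factor(cyl), am = factor(am))
m <- lm(mpg ~ cyl * am, data = mtcars2)

# Default: marginal means average predictions over cyl with EQUAL weights,
# the "tweak the dials" experimental-style estimand
emmeans(m, ~ am)

# Alternative: weight by observed cell frequencies, which is closer to a
# population-averaged, observational-style estimand
emmeans(m, ~ am, weights = "proportional")
```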
-
The light switch (categorical) versus dimmer (continuous) makes me so happy.
-
Andrew - very much appreciate both the open-process spirit and the wonderful examples. In your interpretations of the coefficients produced by glm() with link = logit, I think you are correct to caution that they must be interpreted on the logit scale. However, I wonder if the coefficients in the outputs of the marginaleffects functions may be on the same scale? That is, I think you still need to be cautious and exponentiate the coefficients to then interpret them as odds ratios... and from there, as percentages.
It's a convenient and sometimes confusing quirk of the exponential function that when a coefficient from a logistic regression model is "close" to 0 (on the scale of the linear predictor), it can be read as a close approximation to the percentage increase in the odds of success (if the coefficient is positive) or decrease (if it is negative). The slope of exp(x) is 1 at x = 0, and exp(x) is "close to linear" for a short range of x values around 0.
I struggle with explanations, so here are a few versions of what I'm trying to say.
(a) If your estimated coefficients are "close to 0" (say, between -0.15 and 0.15), the percentage change in the odds, (OR - 1) * 100%, will be "close to" the coefficient times 100%.
(b) A specific example: if the coefficient estimate (on the scale of the linear predictor) was -0.068, then exp(-0.068) = 0.934, so we'd often interpret this as a "7% decrease in the odds of success". Similarly, if the coefficient was 0.125, we could immediately interpret this as "approximately a 13% increase in the odds of success", since exp(0.125) = 1.133 (note the approximation is already getting a bit wonky as we move away from a coefficient near 0).
Two caveats: …
-
Andrew - thanks for getting back to me. If I recall my state of mind when I made the comment/asked the questions, I was wanting to confirm my understanding of the terminology you used.
Working with clients who sometimes want risk differences from linear probability models, and sometimes conditional or marginal risk differences from a logistic regression model, as well as teaching students the basics of logistic regression, I often have to explain the "handy but sometimes awkward" coincidence that a logistic regression coefficient of 0.09 means both a 0.09 increase in the log-odds of "success" and approximately a 9% increase in the odds, since exp(0.09) = 1.09 (approx). That has left me inclined to double-check that I'm on the same page with them.
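(Editorial aside: the coincidence described above is just exp(x) ≈ 1 + x near 0, which a few lines of R make concrete. The coefficient values here are made up for illustration.)

```r
# For coefficients near 0, the percent change in the odds,
# (exp(b) - 1) * 100, is roughly 100 * b; the approximation degrades
# as the coefficient moves away from 0.
b <- c(0.01, 0.09, 0.15, 0.50)
data.frame(
  coef       = b,
  naive_pct  = b * 100,             # reading the coefficient as a percent
  actual_pct = (exp(b) - 1) * 100   # exact percent change in the odds
)
# Close at 0.09 (9 vs 9.4) but well off by 0.50 (50 vs 64.9)
```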
Thanks for clarifying.
In any case, I've since been reading some of the tutorials and experimenting a bit with other elements of the package and all I can say is "Amazing work!" The breadth of the package is going to take some time to absorb. A veritable zoo, indeed!
Quick question – with apologies, as I'm sure this is answered elsewhere – but what method is used to compute SEs for avg_comparisons()? Does the default method depend on the nature of the estimand? For example, in the case of a simple linear model, is this just the SE of the contrast, calculated as we've always done it since the dawn of SAS LSMEANS? And what about in the case of the g-formula? Bootstrapping? Some form of sandwich estimator or delta method?
That question probably betrays my level of unfamiliarity with the package. RTFM might be a completely reasonable response!
Again, thanks for your response – and your work on this package!
Cheers,
Rob
…________________________________
From: Andrew Heiss
Sent: September 5, 2024 12:37 AM
Subject: Re: [andrewheiss/ath-quarto] /blog/2022/05/20/marginalia/ (Discussion #48)
Hi! It turns out that I wasn't getting notifications from GitHub about new comments, so I missed this!
{marginaleffects} returns values on the percentage point scale by default, which is why it's such a powerful and neat package for understanding and interpreting models. There's a detailed logit vignette at the {marginaleffects} documentation<https://marginaleffects.com/vignettes/logit.html> about a bunch of different estimands you can get from logistic regression models.
Here's a quick example with the penguins data, with a binary indicator showing if a penguin is a Gentoo or not:
library(tidyverse)
library(marginaleffects)
library(palmerpenguins)
library(parameters)
penguins <- penguins |>
  drop_na(sex) |>
  mutate(is_gentoo = species == "Gentoo")

model <- glm(
  is_gentoo ~ bill_length_mm + body_mass_g,
  data = penguins,
  family = binomial(link = "logit")
)
We can get the log odds values and odds ratios from the raw coefficients:
parameters(model)
#> Parameter | Log-Odds | SE | 95% CI | z | p
#> ------------------------------------------------------------------------
#> (Intercept) | -32.40 | 4.71 | [-42.81, -24.21] | -6.88 | < .001
#> bill length mm | 0.09 | 0.06 | [ -0.03, 0.21] | 1.51 | 0.132
#> body mass g | 6.36e-03 | 8.57e-04 | [ 0.00, 0.01] | 7.42 | < .001
#>
#> The model has a log- or logit-link. Consider using `exponentiate =
#> TRUE` to interpret coefficients as ratios.
parameters(model, exponentiate = TRUE)
#> Parameter | Odds Ratio | SE | 95% CI | z | p
#> ----------------------------------------------------------------------
#> (Intercept) | 8.48e-15 | 3.99e-14 | [0.00, 0.00] | -6.88 | < .001
#> bill length mm | 1.09 | 0.06 | [0.97, 1.23] | 1.51 | 0.132
#> body mass g | 1.01 | 8.63e-04 | [1.00, 1.01] | 7.42 | < .001
A 1-mm increase in bill length is associated with a 0.09 increase in the log odds of being a Gentoo (whatever that means), or, since exp(0.09) = 1.09, with a 9% increase in the odds of being a Gentoo. Those two values (β and e^β) are both on weird logit-related scales, and that's all fine and normal.
{marginaleffects} lets you find more interpretable percentage-point-scale values. For instance, if we hold body mass constant and vary bill length, we can see how the probability of being a Gentoo changes across all the values of bill length. The slope is shallow below 40 mm and steeper after 50 mm:
plot_predictions(model, condition = "bill_length_mm")
[Plot: predicted probability of being a Gentoo across values of bill length]
We can find the exact slope of that predicted line at different values:
slopes(model, newdata = datagrid(bill_length_mm = c(35, 45, 55)), variables = "bill_length_mm")
#>
#> Term bill_length_mm Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> bill_length_mm 35 0.00602 0.00188 3.21 0.00134 9.5 0.00234 0.00969
#> bill_length_mm 45 0.01200 0.00836 1.44 0.15123 2.7 -0.00439 0.02838
#> bill_length_mm 55 0.01928 0.01706 1.13 0.25847 2.0 -0.01416 0.05272
#>
#> Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, bill_length_mm, predicted_lo, predicted_hi, predicted, body_mass_g, is_gentoo
#> Type: response
That estimate column shows percentage points. For a penguin with a bill length of 35 mm, a 1-mm increase is associated with a 0.6 percentage point (0.00602 * 100) increase in the probability of being a Gentoo. For a penguin with a bill length of 55 mm, a 1-mm increase is associated with a nearly 2 percentage point (0.01928 * 100) increase in the probability of being a Gentoo. Those three slopes are what you see at the vertical lines here:
plot_predictions(model, condition = "bill_length_mm") +
geom_vline(xintercept = c(35, 45, 55), linetype = "dotted") +
scale_y_continuous(labels = scales::label_percent())
[Plot: predicted probabilities with dotted vertical lines marking the slopes at bill lengths of 35, 45, and 55 mm]
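(Editorial aside: the slopes above are evaluated at three hand-picked bill lengths. To average the unit-level slopes over every penguin in the data instead (the average marginal effect), marginaleffects provides avg_slopes(); a minimal sketch, assuming the same model object as above:)

```r
# Average marginal effect (AME): compute the instantaneous slope for each
# observed penguin, then average across the whole dataset
avg_slopes(model, variables = "bill_length_mm")
```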
-
Ah! I should have known you'd have this covered!
In the non-inferiority testing context, people get their shirts in a twist about the width of confidence intervals. We're often a bit laissez-faire about potential overestimation of variances when doing superiority testing (it's more conservative that way, after all!), but wider confidence intervals could be seen as anti-conservative when trying to show "pretty much the same".
I'm just wallowing in that literature for a non-inferiority trial with binary outcomes, and a non-inferiority margin expressed as a difference in proportions rather than something more polite like an OR.
Thanks again for the package and the assistance!
Rob
…________________________________
From: Andrew Heiss
Sent: September 5, 2024 11:47 PM
Subject: Re: [andrewheiss/ath-quarto] /blog/2022/05/20/marginalia/ (Discussion #48)
Ah, there's a whole page on standard error calculations :) https://marginaleffects.com/vignettes/uncertainty.html
By default, all the marginaleffects functions use the delta method, but you can use the vcov argument to change them to whatever you want (sandwich estimators, robust Stata-like options, bootstrapping, etc.)
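(Editorial aside: a minimal sketch of those options, assuming the penguin logistic model from earlier in the thread. `vcov = "HC3"` and `inferences(method = "boot")` are documented marginaleffects interfaces; the specific choices here are illustrative.)

```r
library(marginaleffects)

# Default: delta-method standard errors
avg_comparisons(model)

# Robust/sandwich standard errors (delegates to the {sandwich} package)
avg_comparisons(model, vcov = "HC3")

# Bootstrap confidence intervals instead of the delta method
avg_comparisons(model) |> inferences(method = "boot")
```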
-
Marginalia: A guide to figuring out what the heck marginal effects, marginal slopes, average marginal effects, marginal effects at the mean, and all these other marginal things are | Andrew Heiss
https://www.andrewheiss.com/blog/2022/05/20/marginalia/