Performance regression 2.28 #1055
Comments
(i'm the same rfc from discourse) I suspect it's the diagnostic output. With cmdstan
Could it be possible that cmdstan
cmdstan/src/cmdstan/command.hpp Lines 501 to 504 in bc4bc94
|
That is a very nice catch @fcostin!! You are spot on. Commenting out all the diagnostic writes in the stan submodule gets the performance back. So we just need to figure out how to handle this in command.hpp. |
It looks like the culprit is In the case where the user does not specify a diagnostic_file, Internally, Curiously, removing the If I add an explicit bool toggle to it looks like the default stan A clean way to do it could perhaps be if Here's my dodgy patch:
|
Thanks @fcostin for the suggested fix. I will let @SteveBronder chime in, as he knows more about the decisions that went into this. |
So is the performance regression due to buffering output after all? That's done to avoid cluttered output under threading. Hopefully we can make the buffer more efficient, as it is there for a good reason. Also: is the performance regression seen for any model which does some more intense computations? Binomial likelihood with normalisation - how is that (e.g. the Bernoulli example model)? Or more complicated stuff like a negative binomial? I have the impression that almost trivial models are affected by this... which does not mean we should not fix it, but maybe not at "any cost"? |
It's due to buffering the output for diagnostics even when the diagnostics file is
The more computation there is, the less obvious the performance hit is.
I agree that trivial models probably aren't the focal point, but I think the solution here should be trivial. Worst case we add a boolean like @fcostin proposes. The essential problem was that pre-2.28, this line produced a no-op diagnostic writer: cmdstan/src/cmdstan/command.hpp Line 211 in 998c08f
while this line https://github.com/stan-dev/cmdstan/blob/v2.28.1/src/cmdstan/command.hpp#L502 does not do that. |
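The difference described above can be sketched in isolation. This is only an illustration, not the actual cmdstan or Stan code: `Writer`, `NoOpWriter`, `BufferedWriter`, and `make_diagnostic_writer` are hypothetical names invented for the example. The idea is to choose the writer once, so that when no diagnostic file is requested every write call is a no-op and no numbers are ever formatted.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <vector>

// Hypothetical writer interface; NOT the actual Stan callback API.
struct Writer {
  virtual ~Writer() = default;
  virtual void write(const std::vector<double>& values) = 0;
};

// Discards its input without formatting anything: the cheap path.
struct NoOpWriter : Writer {
  void write(const std::vector<double>&) override {}
};

// Formats every value into an in-memory buffer: the expensive path.
struct BufferedWriter : Writer {
  std::ostringstream buffer;
  std::size_t formatted = 0;  // number of values converted to text

  void write(const std::vector<double>& values) override {
    for (std::size_t i = 0; i < values.size(); ++i) {
      if (i > 0) buffer << ',';
      buffer << values[i];
      ++formatted;
    }
    buffer << '\n';
  }
};

// Pick the writer once, based on whether a diagnostic file was requested.
std::unique_ptr<Writer> make_diagnostic_writer(bool diagnostic_file_requested) {
  if (diagnostic_file_requested) return std::make_unique<BufferedWriter>();
  return std::make_unique<NoOpWriter>();
}

// Small self-check showing both paths.
void demo() {
  auto off = make_diagnostic_writer(false);
  off->write({1.5, 2.5});  // nothing is formatted or stored
  assert(dynamic_cast<NoOpWriter*>(off.get()) != nullptr);

  auto on = make_diagnostic_writer(true);
  on->write({1.5, 2.5});
  auto* buffered = dynamic_cast<BufferedWriter*>(on.get());
  assert(buffered != nullptr && buffered->formatted == 2);
}
```

With this shape the sampling loop does not need to know whether diagnostics are enabled; it calls `write` either way and the cost is decided by which writer was constructed.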
Sounds like an easy matter then to disable the diagnostic output whenever it's not requested. |
i agree with @rok-cesnovar's summary.
That makes sense. The problem is not the buffer itself, but that diagnostic output is getting computed and written to the buffer when a user has not requested any diagnostic output. If the user has requested diagnostic output, performance seems comparable to cmdstan 2.27.0. When a user does not ask for diagnostic output on this extremely trivial model, cmdstan 2.28.1 spends approximately 50% more CPU instructions than necessary during sampling (!), and these additional CPU instructions are spent string-formatting floating point numbers for diagnostic output, which is then thrown away. i.e. the 50% additional work is pure waste. My example patch is not minimal or the best solution -- there is no need to remove the buffering in It would be cleaner to modify |
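One way to keep the buffering and still avoid the wasted work is an early return inside the writer when it has no output stream. The sketch below assumes a writer that owns an optional stream through a pointer; `GuardedStreamWriter` and its members are hypothetical names for illustration and are not taken from cmdstan or Stan.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <utility>
#include <vector>

// Hypothetical writer that owns an optional output stream.
// A null stream means "the user did not request this output".
class GuardedStreamWriter {
 public:
  explicit GuardedStreamWriter(std::unique_ptr<std::ostream> output)
      : output_(std::move(output)) {}

  void write(const std::vector<double>& values) {
    if (!output_) return;  // no sink: skip all number formatting

    // Build the whole row in a local buffer first, then emit it in one go,
    // so rows from different writers are not interleaved mid-line.
    std::ostringstream row;
    for (std::size_t i = 0; i < values.size(); ++i) {
      if (i > 0) row << ',';
      row << values[i];
    }
    row << '\n';
    *output_ << row.str();
    formatted_ += values.size();
  }

  // Number of values that were actually converted to text.
  std::size_t formatted() const { return formatted_; }

 private:
  std::unique_ptr<std::ostream> output_;
  std::size_t formatted_ = 0;
};

// Small self-check: a writer without a stream does no formatting work.
void demo_guard() {
  GuardedStreamWriter disabled(nullptr);
  disabled.write({0.25, 0.5});
  assert(disabled.formatted() == 0);
}
```

The guard costs one pointer check per call, which is negligible next to formatting a row of floating point numbers.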
Oh dear…I reviewed this pr at the time and did not notice. Hopefully @SteveBronder can sort this out quickly. |
@fcostin thank you for the thorough review! Yes, I like your solution of making these a pointer up in EDIT: Actually I think we only need a constructor for |
Closed by #1060 |
Description:
A performance regression was identified on Discourse: https://discourse.mc-stan.org/t/speed-difference-between-rstan-and-cmdstan-for-a-simple-model/25113
The regression is most evident with a trivial model like:
Execution times with N=10000:
It's much less evident with non-trivial models, but still, something to investigate.
Based on my investigation and investigation by the user rfc on Discourse (EDIT: @fcostin on GitHub), it has to do with writing output samples: https://discourse.mc-stan.org/t/speed-difference-between-rstan-and-cmdstan-for-a-simple-model/25113/23
It's probably the changes made to work with multiple chains.
EDIT: #987 was confirmed to be the issue.
Current Version:
v2.28.1