-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
frontend: effective sample size and related stats #48
Conversation
Hi @magland - sorry for not getting this to you sooner. The Stan computations for rhat and ess can be found at https://github.com/stan-dev/stan/tree/develop/src/stan/analyze/mcmc There are a few options implemented even in those files -- the ones actually used by |
I took a crack at translating compute_effective_sample_size() into typescript. In the table, there are two columns (for the time being) N_Eff1 and N_Eff2. The first is the one ported from bayes_kit and the second is the one ported from Stan C++. I don't know what the chances are that I got it right on the first attempt, but the output is within the range of reasonable... although by no means do the two columns match. I guess next step would be to put an example through stansummary and see if it's consistent. (The other derived columns definitely won't match because I have them based on N_Eff1) |
I made #68 so that we can conveniently compare this N_eff2 with the output of stansummary. Unfortunately they are not quite matching. I carefully double checked the code and it appears like an accurate translation. Are we sure stansummary is using that function you linked? I think the next step would be to either (a) build stansummary from source and put in debug print statements to see the values of the variables at each step; or (b) isolate those C++ functions into a minimal project and feed it sample draws, again with print statements, and make a test with the same input data in SP, with console.log statements. The former would be good because we would verify what exactly stansummary was calling, but the latter would be good because it leads to unit tests. |
Now that I’m done with the stanc worker PR I will take a look at this today. I’ll see if I spot anything in the code and if not I can try the approach you describe to see where they first diverge. |
I found the issue - Stan's code is not very clear about which calculation is a sample variance (divide by N - 1) and which is a population variance (divide by N). I'm honestly not 100% confident that this is not a mistake in Stan, but I'd still argue it's better to match the Stan behavior |
Oops, I pushed it to #68 originally. |
Excellent! I'll update the other columns, rearrange the file naming a bit and remove the bayes_kit stuff (we can restore later from mcmc-monitor if needed). |
I wouldn't be surprised if Rhat has a similar issue when ported over -- for your notes, |
Great! I still think it’s worth writing tests and then possibly spinning the stats functions off into their own package so other people could use them, but matching stansummary is enough for me for now |
Something is slightly off with rhat - it's not obvious with the linear regression example, but if you set the disease transmission example to 100 warmup/25 draws you can see differences. I'm trying to track down the source now |
Maybe I used the wrong variance calc again? |
… odd numbered draws
I think it actually had to do with the split_chains function when the number of chains was odd, which I guess means 25 was a lucky choice of number by me Still need to do a bit more validation |
looks like it is now matching up to 6 decimals on both ESS and Rhat Quantiles don't always agree but there are multiple valid ways to break ties there, and I don't think the stan one is necessarily better than any other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few style comments but otherwise I think this is good to go!
const uniqueChainIds = Array.from(new Set(chainIds)).sort(); | ||
const draws: number[][] = new Array(uniqueChainIds.length).fill(0).map(() => []); | ||
for (let i = 0; i < x.length; i++) { | ||
const chainId = chainIds[i]; | ||
const chainIndex = uniqueChainIds.indexOf(chainId); | ||
draws[chainIndex].push(x[i]); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on splitting this into a 'drawsByChain` function or similar? It gets duplicated just below in the rhat function, and I imagine something similar is useful for the multiple csvs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I did that.
// check if chains are constant; all equal to first draw's value | ||
let are_all_const = false; | ||
const init_draw = new Array(num_chains).fill(0); | ||
for (let chain_idx = 0; chain_idx < num_chains; chain_idx++) { | ||
const draw = draws[chain_idx]; | ||
for (let n = 0; n < num_draws; n++) { | ||
if (!isFinite(draw[n])) { | ||
// we can't compute ESS if there are non-finite values | ||
return NaN; | ||
} | ||
} | ||
|
||
init_draw[chain_idx] = draw[0]; | ||
|
||
const precision = 1e-12; | ||
if (draw.every(d => Math.abs(d - draw[0]) < precision)) { | ||
are_all_const = true; | ||
} | ||
} | ||
|
||
if (are_all_const) { | ||
// If all chains are constant then return NaN | ||
// if they all equal the same constant value | ||
const precision = 1e-12; | ||
if (init_draw.every(d => Math.abs(d - init_draw[0]) < precision)) { | ||
return NaN; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, want to pull this into a helper? I realize this is also a suggestion I could be making to the Stan implementation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best to try to match the C++ implementation as closely as possible, even with the imperfections. If the goal is to make it cleaner, there's a lot that could be done... but I think the goal here is to reproduce exactly.
@@ -1,5 +1,6 @@ | |||
import { FunctionComponent, useMemo } from "react" | |||
import { computeMean, computePercentile, computeStdDev } from "./util" | |||
import { compute_effective_sample_size, compute_split_potential_scale_reduction } from "./stan_stats/stan_stats" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment didn't post with the others - could we move the code currently in ./util
into this folder and then de-duplicate it? I think we have two mean calculations now, for example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep stan_stats as self-contained as possible and exposing as minimum as possible, because we're planning to make this a separate package (I think it would be named something other than stan_stats of course)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was imagining said other package would also include the quantiles, mean, and standard deviation functions. I guess it doesn’t need to, but they’re related and nice to have somewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Yeah I think that would be nice... but would require some planning.
This adds new columns to the output Summary table
MCSE, N_eff, N_eff/s, R_hat
The calculation of these entities is subtle. The implementation in this PR was carried over from MCMC Monitor, and that was created to match a version of bayes_kit Python package at some point in time.
These values do not exactly match the output of stansummary.
We'll need to decide on which reference implementation to target, and then how to create automated tests.