Adding Documentation Section focused on underlying stats without code #839

Merged: 77 commits merged into cms-analysis:main on Mar 14, 2024

Conversation

@kcormi (Collaborator) commented May 16, 2023

This is (the start of) an attempt to make clearer to users the underlying model, statistical tests, etc. used by combine.

These pages are designed to give users a concise but thorough and precise reference on the details of what is being done.

Much of this material is already spread throughout the existing documentation, but it may be helpful for users to have more complete explanations in one easy-to-find spot, with reminders and references back to that material from the other parts of the documentation, which are focused on how to run procedures and commands.

Open to suggestions/comments at all levels (overall structure, content, flow, choice of notation, etc.).

For those not familiar with setting up the documentation locally to have a look, please see the instructions in the contributing.md document from #838 (you can see it here: https://github.com/kcormi/HiggsAnalysis-CombinedLimit/blob/contributing/contributing.md). A rendered version, which should be identical to the one here, has also been put up at https://kcormi.github.io/HiggsAnalysis-CombinedLimit/ -- the new pages are the ones under the 'what combine does' tab.

@kcormi (Collaborator, Author) commented Nov 30, 2023

I've left this open for an unreasonably long time for no good reason. I just gave it another check, and despite what I'm sure are many flaws, I am happy enough with it to merge it and make it public. Unless there are any loud complaints soon, I will go ahead with the merge.

Closer to the time of releasing the paper, I will go through and try to harmonize some notation etc.

The observation model, $\mathcal{M}_0(\vec{\Phi}_{0})$, defines the probability for any set of observations given specific values of the input parameters of the model, $\vec{\Phi}_0$.
The probability for any observed data is denoted:

$$ p_{\mathcal{M}_{0}}(\mathrm{data}; \vec{\Phi}_0 ) $$
Contributor
Do we really need the _{0}? I think it looks better without this subscript
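
To make the observation model above concrete, here is a minimal illustration (not part of the PR text, and the symbols are invented for this example): for a single counting experiment with one observed event count $n$, the model reduces to a Poisson probability whose mean $\lambda(\vec{\Phi}_0)$ depends on the model parameters,

$$ p_{\mathcal{M}_0}(n; \vec{\Phi}_0) = \frac{\lambda(\vec{\Phi}_0)^{n}\, e^{-\lambda(\vec{\Phi}_0)}}{n!} $$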

The event-count portion of the model consists of a sum over different processes.
The expected observations, $\vec{\lambda}$, are then the sum of the expected observations for each of the processes, $\vec{\lambda} =\sum_{p} \vec{\lambda}_{p}$.

The model can also be composed of multiple channels, in which case the expected observation is the set of all expected observations from the various channels $\vec{\lambda}_{0} = \{ \vec{\lambda}_{c_1}, \vec{\lambda}_{c_2}, \ldots, \vec{\lambda}_{c_N}\}$.
Contributor
See above, we have _{0} here but in the previous paragraph, there's no subscript (prefer without)
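
As a hypothetical sketch of the bookkeeping described above (the channel and process names, array values, and binning are all made up, and this is not combine code), the per-channel expectation is just the element-wise sum of the per-process expectations:

```python
import numpy as np

# Made-up per-process expected yields for two channels; each array holds the
# expected counts per bin for that process in that channel.
expected = {
    "ch1": {"signal": np.array([1.2, 0.8]), "background": np.array([10.0, 7.5])},
    "ch2": {"signal": np.array([0.4]), "background": np.array([3.1])},
}

# lambda_c = sum_p lambda_{c,p}: element-wise sum over processes in each channel
totals = {ch: sum(yields.values()) for ch, yields in expected.items()}

# The full expectation is then the set of per-channel expectations:
# {'ch1': array([11.2,  8.3]), 'ch2': array([3.5])}
print(totals)
```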

For any given model, $\mathcal{M}(\vec{\Phi})$, [the likelihood](https://pdg.lbl.gov/2022/web/viewer.html?file=../reviews/rpp2022-rev-statistics.pdf#section.40.1) defines the probability of observing a given dataset.
It is numerically equal to the probability of observing the data, given the model.

$$ \mathcal{L}_\mathcal{M}(\vec{\Phi};\mathrm{data}) = p_{\mathcal{M}}(\mathrm{data};\vec{\Phi}) $$
Contributor
Given the amount of time we took over the review of the paper for this, I would really try to stick to the paper. Specifically, we never write a likelihood with "; data", and later in the figure and elsewhere we don't have it, so I would drop it here and just keep the parameters.
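
As a worked numerical illustration of the definition above (the numbers are invented, not from the PR): if a single-bin counting model predicts an expected count $\lambda(\vec{\Phi}) = 3$ and the observed count is $n = 5$, the likelihood of that parameter point is

$$ \mathcal{L}_{\mathcal{M}}(\vec{\Phi}) = \frac{3^{5}\, e^{-3}}{5!} \approx 0.10 $$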


The likelihood in combine takes the general form:

$$ \mathcal{L} = \mathcal{L}_{\textrm{data}} \cdot \mathcal{L}_{\textrm{constraint}} $$
Contributor
Can we use "primary" and "auxiliary", as in the paper, instead of "data" and "constraint"?
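
For concreteness, a hypothetical single-bin example of the factorization shown above (invented for illustration, not taken from the PR text), with signal strength $\mu$ and one nuisance parameter $\theta$ constrained by a unit Gaussian, might look like

$$ \mathcal{L}(\mu, \theta) = \underbrace{\mathrm{Poisson}\big(n;\, \mu s(\theta) + b(\theta)\big)}_{\mathcal{L}_{\mathrm{data}}} \cdot \underbrace{\mathcal{N}(\tilde{\theta};\, \theta, 1)}_{\mathcal{L}_{\mathrm{constraint}}} $$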

Here, $\mathcal{L}_{\mathrm{data}}$ is equal to the probability of observing the event-count data for a given set of model parameters, and $\mathcal{L}_{\mathrm{constraint}}$ represents external constraints on the parameters.
The constraint terms may encode constraints from previous measurements (such as Jet Energy Scales) or prior beliefs about the value that some parameter in the model should have.

Both $\mathcal{L}_{\mathrm{data}}$ and $\mathcal{L}_{\mathrm{constraint}}$ can be composed of many sub-likelihoods, for example for observations in different bins and constraints on different nuisance parameters.
Contributor
As before (data->primary, constraint->auxiliary)
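
To illustrate that composition (the notation here is purely illustrative and not taken from the PR), with one Poisson term per bin and one constraint term per nuisance parameter the likelihood could be expanded as

$$ \mathcal{L} = \underbrace{\prod_{i\,\in\,\mathrm{bins}} \mathrm{Poisson}\big(n_{i};\, \lambda_{i}(\vec{\Phi})\big)}_{\mathcal{L}_{\mathrm{data}}} \cdot \underbrace{\prod_{k\,\in\,\mathrm{nuisances}} p_{k}(\tilde{y}_{k};\, \nu_{k})}_{\mathcal{L}_{\mathrm{constraint}}} $$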

While we presented the likelihoods for the template and parametric models separately, they can also be combined into a single likelihood by treating each as a separate channel.
When combining the models, the data likelihoods of the binned and unbinned channels are multiplied.

$$ \mathcal{L}_{\mathrm{combined}} = \mathcal{L}_{\mathrm{data}} \cdot \mathcal{L}_\mathrm{constraint} = (\prod_{c_\mathrm{template}} \mathcal{L}_{\mathrm{data}}^{c_\mathrm{template}}) (\prod_{c_\mathrm{parametric}} \mathcal{L}_{\mathrm{data}}^{c_\mathrm{parametric}}) \mathcal{L}_{\mathrm{constraint}} $$
Contributor
just one more repeat (data->primary, constraint -> auxiliary)
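
To illustrate how the binned and unbinned pieces multiply (equivalently, how their negative log-likelihoods add), here is a small self-contained sketch. The numbers, the shapes, and the 5% effect of the nuisance parameter on the background are all invented for illustration; this is not combine's implementation.

```python
import numpy as np
from scipy.stats import poisson, norm, expon

# Hypothetical combined negative log-likelihood: one binned (template) channel and
# one unbinned (parametric) channel, sharing a signal-strength parameter mu and one
# nuisance parameter theta with a unit-Gaussian constraint term.

# --- binned channel inputs (made-up numbers) ---
n_obs = np.array([12, 9, 4])          # observed counts per bin
sig_tpl = np.array([2.0, 1.5, 0.5])   # signal template
bkg_tpl = np.array([10.0, 8.0, 3.0])  # background template

# --- unbinned channel inputs (made-up numbers) ---
x_obs = np.array([0.3, 1.2, 0.7, 2.5])  # observed values of some variable x
bkg_rate = 3.0                           # expected background events
sig_rate = 1.5                           # expected signal events at mu = 1

def nll(mu, theta):
    # Template part: product of per-bin Poisson terms (sum of -log terms);
    # a 5% effect of theta on the background yield is assumed for illustration.
    lam = mu * sig_tpl + bkg_tpl * (1.0 + 0.05 * theta)
    nll_template = -poisson.logpmf(n_obs, lam).sum()

    # Parametric part: extended unbinned likelihood. For simplicity the same
    # exponential shape is used for both components, so only the total
    # normalization depends on mu.
    n_tot = mu * sig_rate + bkg_rate
    shape = expon.pdf(x_obs, scale=1.0)
    nll_parametric = -poisson.logpmf(len(x_obs), n_tot) - np.log(shape).sum()

    # Constraint part: unit Gaussian on theta.
    nll_constraint = -norm.logpdf(theta, loc=0.0, scale=1.0)

    return nll_template + nll_parametric + nll_constraint

print(nll(mu=1.0, theta=0.0))
```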

@kcormi (Collaborator, Author) commented Mar 14, 2024

Thanks, good points Nick. I changed those cases, and I also tried to update the text to match this primary/auxiliary wording better; I found some other instances throughout where I made the notation and wording more consistent with what's in the paper.

@kcormi merged commit 8007ee2 into cms-analysis:main on Mar 14, 2024
6 checks passed
Labels: documentation (Updates for the documentation)