
Commit

Nick Wardle committed Jul 12, 2023
2 parents 9729f55 + 4da5665 commit 7ee02e5
Showing 2 changed files with 38 additions and 25 deletions.
4 changes: 4 additions & 0 deletions contributing.md
@@ -97,6 +97,10 @@ mkdocs serve

from the main repository directory. mkdocs will then print a link that you can open in your browser to check the generated page.

+**NOTE:** mkdocs builds that use internal links (or images etc.) with absolute paths will work for local deployment, but will break when deployed to the public documentation pages.
+Please ensure you use relative paths. Currently, this is the only known feature where the behaviour differs between local mkdocs and public pages deployment.
+If you'd like to test the deployment directly, the suggested method is to set up a docs page using your personal GitHub account; this should mimic the exact settings of the official page.
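
For illustration, a quick heuristic for spotting absolute internal links in the sources before deploying (the grep pattern is a rough sketch, not an official check):

```
# Rough heuristic: list markdown links/images whose target starts with "/",
# i.e. absolute paths of the form ](/... that will break on the public pages
grep -rn "](/" docs/
```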

## Big Contributions

We welcome large contributions to combine.
Expand Down
59 changes: 34 additions & 25 deletions docs/part3/commonstatsmethods.md
@@ -186,8 +186,7 @@ Again, the resulting limit tree will contain the result. You can also save the c

Exclusion regions can be made from the posterior once an ordering principle is defined to decide how to grow the contour (there are infinitely many possible regions that contain 68% of the posterior pdf). Below is a simple example script that can be used to plot the posterior distribution from these chains and calculate the *smallest* such region. Note that in this example we are ignoring the burn-in (but you can add it by editing `for i in range(mychain.numEntries()):` to `for i in range(200,mychain.numEntries()):`, e.g. for a burn-in of 200).

-<details>
-<summary><b>Show example script</b></summary>
+/// details | **Show example script**
<pre class="python"><code>
import ROOT

@@ -250,7 +249,7 @@ lu.Draw()

print " %g %% (%g %%) interval (target) = %g < r < %g "%(trueCL,CL,vl,vu)
</code></pre>
-</details>
+///
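
For reference, a hedged sketch of producing an output file containing the saved chains that the script above reads (the datacard name is a placeholder; `-M MarkovChainMC` and `--saveChain` are the method and option discussed in this section):

```
# Run the Bayesian MCMC calculation and keep the Markov chains
# in the output file so they can be plotted afterwards
combine datacard.txt -M MarkovChainMC --saveChain
```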

Running the script on the output file produced for the same datacard (including the `--saveChain` option) will produce the following output

@@ -265,15 +264,15 @@

The `MarkovChainMC` algorithm has many configurable parameters, and you're encouraged to experiment with them, because the default configuration might not be the best for you (or might not even work for you at all).

-##### Iterations, burn-in, tries
+#### Iterations, burn-in, tries

Three parameters control how the MCMC integration is performed (an example command combining them follows this list):

- the number of **tries** (option `--tries`): the algorithm will run multiple times with different random seeds, and report as result the truncated mean and RMS of the different results. The default value is 10, which should be OK for a quick computation, but for something more accurate you might want to increase this number even up to ~200.
- the number of **iterations** (option `-i`) determines how many points are proposed to fill a single Markov Chain. The default value is 10k, and a plausible range is between 5k (for quick checks) and 20-30k for lengthy calculations. Usually beyond 30k you get a better tradeoff in time vs accuracy by increasing the number of chains (option `--tries`).
- the number of **burn-in steps** (option `-b`) is the number of points that are removed from the beginning of the chain before using it to compute the limit. The default is 200. If your chain is very long, you might want to try increasing this a bit (e.g. to a few hundred). Going below 50, however, is probably dangerous.
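
Putting the three options together, a hedged example of a longer, more careful run (the datacard name and the specific values are illustrative placeholders, not recommendations):

```
# ~100 chains of 20k proposed points each, dropping the first 400 as burn-in
combine datacard.txt -M MarkovChainMC --tries 100 -i 20000 -b 400
```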

-##### Proposals
+#### Proposals

The option `--proposal` controls the way new points are proposed to fill in the MC chain.

@@ -360,8 +359,7 @@ For relatively simple models, the observed and expected limits can be calculated
combine realistic-counting-experiment.txt -M HybridNew --LHCmode LHC-limits
```

-<details>
-<summary><b>Show output</b></summary>
+/// details | **Show output**

<pre><code> <<< Combine >>>
>>> including systematics
@@ -531,8 +529,9 @@ Fit to 5 points: 1.91034 +/- 0.0388334
-- Hybrid New --
Limit: r < 1.91034 +/- 0.0388334 @ 95% CL
Done in 0.01 min (cpu), 4.09 min (real)
-Failed to delete temporary file roostats-Sprxsw.root: No such file or directory</pre></code>
-</details>
+Failed to delete temporary file roostats-Sprxsw.root: No such file or directory
+</code></pre>
+///

The result stored in the **limit** branch of the output tree will be the upper limit (and its error is stored in **limitErr**). The default behavior will be, as above, to search for the upper limit on **r**. However, the values of $p_{\mu}$, $p_{b}$ and CL<sub>s</sub> can be calculated for a particular value **r=X** by specifying the option `--singlePoint=X`. In this case, the value stored in the branch **limit** will be the value of CL<sub>s</sub> (or $p_{\mu}$) (see the [FAQ](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part4/usefullinks/#faq) section).
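
For example, a hedged sketch of evaluating these quantities at a fixed signal strength (the value 1.5 is arbitrary):

```
# Compute p_mu, p_b and CLs at r=1.5; the "limit" branch then stores CLs
combine realistic-counting-experiment.txt -M HybridNew --LHCmode LHC-limits --singlePoint 1.5
```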

@@ -616,6 +615,7 @@ The resulting output file will contain a canvas showing the distribution of the

///


## Computing Significances with toys

Computation of the expected significance with toys is a two-step procedure: first you need to run one or more jobs to construct the expected distribution of the test statistic. As with setting limits, there are a number of different configurations for generating toys, but we will use the preferred option using,
@@ -724,11 +724,17 @@ combine -M FitDiagnostics -d combined_card.root -n _fit_CRonly_result --saveShap

By taking the total background, the total signal, and the data shapes from FitDiagnostics output, we can compare the post-fit predictions from the S+B fit (first case) and the CR-only fit (second case) with the observation as reported below:

??? "FitDiagnostics S+B fit"
![](images/result_fitSB.png)
/// details | **FitDiagnostics S+B fit**

![](images/result_fitSB.png)

///

/// details | **FitDiagnostics CR-only fit**

??? "FitDiagnostics CR-only fit"
![](images/result_fitCRonly.png)
![](images/result_fitCRonly.png)

///

To compute a p-value for the two results, one needs to compare the observed goodness-of-fit value previously computed with the expected distribution of the test statistic obtained in toys:

@@ -739,11 +745,15 @@

where the former gives the result for the S+B model, while the latter gives the test statistic for the CR-only fit. The option `--setParameters r=0,mask_ch1=1` is needed to ensure that toys are thrown using the nuisance parameters estimated from the CR-only fit to the data. The comparison between the observation and the expected distribution should look like the following two plots:

??? "Goodness-of-fit for S+B model"
![](images/gof_sb.png)
/// details | **Goodness-of-fit for S+B model**

![](images/gof_sb.png)
///

??? "Goodness-of-fit for CR-only model"
![](images/gof_CRonly.png)
/// details | **Goodness-of-fit for CR-only model**

![](images/gof_CRonly.png)
///
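
As an illustration of the comparison above, a hedged sketch of the kind of toy-throwing command involved (the `GoodnessOfFit` method with `--algo saturated` and the toy count are assumptions, not commands copied from this page):

```
# Sketch: build the expected GoF distribution for the CR-only case;
# --setParameters r=0,mask_ch1=1 throws toys using the CR-only nuisance values
combine -M GoodnessOfFit combined_card.root --algo saturated -t 500 \
    --setParameters r=0,mask_ch1=1 -n _gof_CRonly_toys
```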

### Making a plot of the GoF test-statistic distribution

@@ -843,8 +853,7 @@ As an example, let's produce the $-2\Delta\ln{\mathcal{L}}$ scan as a function of
combine toy-hgg-125.root -M MultiDimFit --algo grid --points 2000 --setParameterRanges r_qqH=0,10:r_ggH=0,4 -m 125 --fastScan
```

-<details>
-<summary><b>Show output</b> </summary>
+/// details | **Show output**
<pre><code>
<<< Combine >>>
>>> including systematics
@@ -880,7 +889,7 @@ Point 220/2025, (i,j) = (4,40), r_ggH = 0.400000, r_qqH = 9.000000

Done in 0.00 min (cpu), 0.02 min (real)
</code></pre>
-</details>
+///

The scan, along with the best fit point, can be drawn using ROOT,

@@ -924,7 +933,7 @@ If you suspect your fits/uncertainties are not stable, you may also try to run c
For a full list of options use `combine -M MultiDimFit --help`
-##### Fitting only some parameters
+#### Fitting only some parameters
If your model contains more than one parameter of interest, you can still decide to fit a smaller number of them, using the option `--parameters` (or `-P`), with a syntax like this:
@@ -993,7 +1002,7 @@ The point belongs to your confidence region if $p_{x}$ is larger than $\alpha$ (
!!! warning
    You should not use this method without the option `--singlePoint`. Although combine will not complain, the crossing-finder will locate only a single crossing and therefore will not determine the correct interval. Instead, you should calculate the Feldman-Cousins intervals as described above.
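
A hedged sketch of the intended usage instead, evaluating $p_{x}$ at several fixed points one job at a time (the `--LHCmode LHC-feldman-cousins` value, the workspace name, and the grid of points are assumptions):

```
# Evaluate p_x at a grid of fixed parameter values, one job per point,
# saving the hybrid results so they can be merged and interpolated later
for r in 0.5 1.0 1.5 2.0; do
    combine workspace.root -M HybridNew --LHCmode LHC-feldman-cousins \
        --singlePoint ${r} --saveHybridResult -n _FC_point_${r}
done
```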

-#### Physical boundaries
+### Physical boundaries

Imposing physical boundaries (such as requiring $\mu>0$ for a signal strength) is achieved by setting the ranges of the physics model parameters using

@@ -1017,11 +1026,11 @@ This can sometimes be an issue as Minuit may not know if it has successfully converged
As in general for `HybridNew`, you can split the work into multiple jobs (grid and/or batch) and then merge the outputs, as described in the [combineTool for job submission](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#combinetool-for-job-submission) section.


-#### Extracting contours
+### Extracting contours

As in general for `HybridNew`, you can split the work into multiple jobs (grid and/or batch) and then merge the outputs with `hadd`. You can also refer to the [combineTool for job submission](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#combinetool-for-job-submission) section for submitting the jobs to the grid/batch.

-##### 1D intervals
+#### Extracting 1D intervals

For *one-dimensional* models only, and if the parameter behaves like a cross-section, the code can do some interpolation to determine the values of your parameter on the contour (just as it does for the limits). As with limits, read in the grid of points and extract 1D intervals using,

@@ -1033,7 +1042,7 @@ The output tree will contain the values of the POI which cross the critical va

You can produce a plot of the value of $p_{x}$ vs the parameter of interest $x$ by adding the option `--plot <plotname>`.
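
Putting this together, a hedged sketch of reading a merged grid back and extracting the 68% interval with a plot (`--readHybridResults`, `--grid`, and `--cl` are assumptions about the `HybridNew` interface; file names are placeholders):

```
# Read the merged grid of toy results, interpolate the 68% CL interval,
# and draw p_x versus the parameter of interest
combine workspace.root -M HybridNew --LHCmode LHC-feldman-cousins \
    --readHybridResults --grid=merged_grid.root --cl 0.68 --plot interval_scan.pdf
```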

-##### 2D contours
+#### 2D contours

There is a tool for extracting *2D contours* from the output of `HybridNew`, located in `test/makeFCcontour.py`, provided the option `--saveHybridResult` was included when running `HybridNew`. It can be run with the usual combine output files (or several of them) as input,

