Commit

update cmssw plots, add ttbar sample to valid, add multiparticlegun and vbf to training (#330)

* update cmssw plots, add ttbar sample to valid

* update validation notebook

* disable ray for now

* update README [skip ci]

* remove DQM part [skip ci]
jpata authored Jun 13, 2024
1 parent 0791d61 commit 98d59c2
Showing 14 changed files with 599 additions and 318 deletions.
7 changes: 7 additions & 0 deletions README.md
@@ -19,6 +19,13 @@ MLPF focuses on developing full event reconstruction based on computationally sc
- results: https://doi.org/10.5281/zenodo.10567397
- weights: https://huggingface.co/jpata/particleflow/tree/main/clic/clusters/v1.6

### Open datasets:
The following datasets are available to reproduce the studies. They include full Geant4 simulation and reconstruction based on the CLIC detector. We have no affiliation with the CLIC collaboration; these datasets are therefore to be used only for computational studies and come with no warranty.

- MLPF-CLIC, raw data: https://zenodo.org/records/8260741 or https://www.coe-raise.eu/od-pfr
- MLPF-CLIC, processed for ML, tracks and clusters: https://zenodo.org/records/8409592
- MLPF-CLIC, processed for ML, tracks and hits: https://zenodo.org/records/8414225

## MLPF development in CMS

<p float="left">
22 changes: 10 additions & 12 deletions mlpf/data_cms/README.md
@@ -93,6 +93,16 @@ vector<reco::PFCandidate> "particleFlow" ""
To test MLPF on higher statistics, it is not practical to redo the full reconstruction that precedes the particle flow step.
Instead, we can follow the same logic as the PF validation, where only the relevant PF sequences are rerun.

We use the following datasets for this:
```
/RelValQCD_FlatPt_15_3000HS_14/CMSSW_14_1_0_pre3-PU_140X_mcRun3_2024_realistic_v8_STD_2024_PU-v2/GEN-SIM-DIGI-RAW
/RelValTTbar_14TeV/CMSSW_14_1_0_pre3-PU_140X_mcRun3_2024_realistic_v8_STD_2024_PU-v2/GEN-SIM-DIGI-RAW
/RelValQQToHToTauTau_14TeV/CMSSW_14_1_0_pre3-PU_140X_mcRun3_2024_realistic_v8_STD_2024_PU-v2/GEN-SIM-DIGI-RAW
/RelValSingleEFlatPt2To100/CMSSW_14_1_0_pre3-PU_140X_mcRun3_2024_realistic_v8_STD_2024_PU-v2/GEN-SIM-DIGI-RAW
/RelValSingleGammaFlatPt8To150/CMSSW_14_1_0_pre3-PU_140X_mcRun3_2024_realistic_v8_STD_2024_PU-v2/GEN-SIM-DIGI-RAW
/RelValSinglePiFlatPt0p7To10/CMSSW_14_1_0_pre3-PU_140X_mcRun3_2024_realistic_v8_STD_2024_PU-v2/GEN-SIM-DIGI-RAW
```
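
The dataset names above follow the usual CMS `/primary/processed/tier` layout. As a minimal sketch (the helper name `parse_dataset_name` is hypothetical and not part of this repository), such a name can be split into its three components, for example to derive per-sample output directory names:

```python
def parse_dataset_name(name):
    """Split a '/Primary/Processed/Tier' dataset name into its components."""
    parts = name.strip().split("/")
    # A valid name starts with '/', so the first element of the split is empty.
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"unexpected dataset name: {name!r}")
    primary, processed, tier = parts[1:]
    return {"primary": primary, "processed": processed, "tier": tier}


if __name__ == "__main__":
    info = parse_dataset_name(
        "/RelValTTbar_14TeV/"
        "CMSSW_14_1_0_pre3-PU_140X_mcRun3_2024_realistic_v8_STD_2024_PU-v2/"
        "GEN-SIM-DIGI-RAW"
    )
    print(info["primary"])  # RelValTTbar_14TeV
    print(info["tier"])  # GEN-SIM-DIGI-RAW
```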

#### MINIAOD with PF and MLPF
The PF validation workflows can be run using the scripts in
```
@@ -105,17 +115,5 @@ cd particleflow

The MINIAOD output will be in `$CMSSW_BASE/out/QCD_PU_mlpf` and `$CMSSW_BASE/out/QCD_PU_pf`.

#### DQM plots
Now the MINIAOD output can be analyzed with the DQM and PF validation scripts:
```
./scripts/cmssw/run_dqm.sh $CMSSW_BASE/out
```

The outputs will be in:
```
ls plots
```
and can be displayed in a web browser.

## Generating MLPF training samples
TODO.
8 changes: 8 additions & 0 deletions mlpf/data_cms/check_file.py
@@ -0,0 +1,8 @@
import pickle
import sys
import bz2

try:
    data = pickle.load(bz2.BZ2File(sys.argv[1], "rb"), encoding="iso-8859-1")
except Exception:
    print(sys.argv[1])
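
The script above checks one file per invocation and prints its path if loading fails. A possible variant that checks many files in one process is sketched below; the function names `is_readable` and `find_unreadable` are illustrative assumptions, not part of this commit.

```python
import bz2
import pickle
import sys


def is_readable(path):
    """Return True if `path` is a bz2-compressed pickle that loads without error."""
    try:
        with bz2.BZ2File(path, "rb") as fi:
            pickle.load(fi, encoding="iso-8859-1")
    except Exception:
        # Missing file, truncated archive, corrupt pickle, ...
        return False
    return True


def find_unreadable(paths):
    """Return the subset of `paths` that fail to load, preserving order."""
    return [p for p in paths if not is_readable(p)]


if __name__ == "__main__":
    # Same output convention as check_file.py: print only the bad paths.
    for bad in find_unreadable(sys.argv[1:]):
        print(bad)
```

Saved under a hypothetical name such as `check_files.py`, it could be called as `python check_files.py *.pkl.bz2`, and the printed paths fed to a cleanup or reprocessing step.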
3 changes: 2 additions & 1 deletion mlpf/heptfds/cms_pf/vbf.py
@@ -21,9 +21,10 @@
class CmsPfVbf(tfds.core.GeneratorBasedBuilder):
    """DatasetBuilder for cms_pf dataset."""

    VERSION = tfds.core.Version("1.7.0")
    VERSION = tfds.core.Version("1.7.1")
    RELEASE_NOTES = {
        "1.7.0": "Add cluster shape vars",
        "1.7.1": "Increase stats to 400k events",
    }
    MANUAL_DOWNLOAD_INSTRUCTIONS = """
rsync -r --progress lxplus.cern.ch:/eos/user/j/jpata/mlpf/tensorflow_datasets/cms/cms_pf_vbf ~/tensorflow_datasets/