The code in this repository was developed as a deliverable of an IRIS-HEP Fellow project. It relies heavily on the IRIS-HEP AGC demonstration in Binder repository.
- The primary file is the Jupyter notebook `main_code.ipynb`, which contains the code, its description, and the resulting outputs.
- The directory `utils` contains the pictures used in the markdown description of the `main_code.ipynb` notebook.
- The directory `index_files` contains text files with the URLs from which the code retrieves the data for the analysis.
- The file `file_utils.py` contains utility code for setting up data retrieval from a remote server.
- The file `metadata.csv` contains the metadata of the files that are retrieved from the server during the analysis.
- The file `cabinetry_config.yml` is the configuration of the cabinetry library used for the analysis.
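
As a rough illustration of how the URL lists in `index_files` could be consumed, the sketch below reads every text file in a directory and collects the URLs it finds. It is not the code from `file_utils.py`: the function names `read_url_index` and `read_all_indexes` are hypothetical, and it assumes one URL per line, a `.txt` extension, and that lines starting with `#` are comments.

```python
from pathlib import Path


def read_url_index(index_path):
    """Return the URLs listed in one index file.

    Assumption: one URL per line; blank lines and lines starting
    with '#' are skipped.
    """
    urls = []
    for line in Path(index_path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def read_all_indexes(index_dir="index_files"):
    """Map each index file name (without extension) to its list of URLs.

    Assumption: index files carry a '.txt' extension.
    """
    return {
        path.stem: read_url_index(path)
        for path in sorted(Path(index_dir).glob("*.txt"))
    }
```

Calling `read_all_indexes()` from the repository root would then give a dictionary keyed by index file name, which is a convenient shape for building a per-sample file list.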
The code was tested on the coffea-opendata.casa cluster with the coffea 2024 server option, which has all necessary packages preinstalled. To use the code, open the `main_code.ipynb` Jupyter notebook and start running its cells.
The Analysis Grand Challenge (AGC) is about testing software toolkits along the full path from input data to the resulting plots and other types of output. This repository aims to benchmark the compatibility of the new version of the PHYSLITE data format with the existing infrastructure of high-energy physics Python packages, and to provide an example of an analysis that can be done on data stored in the PHYSLITE format. The analysis uses the Monte Carlo generated open data of the ATLAS collaboration, released together with the physics data measured at the LHC during 2015 and 2016.
This work was supported by the Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP).
The initial implementation of this project was developed as part of Denys Klekots's 2024 IRIS-HEP Fellow project: Analysis Grand Challenge with ATLAS PHYSLITE data.