This code uses the Exa.TrkX-HSF pipeline as a baseline. It relies on the `traintrack` library to run the different stages (Processing, DNN/GNN, and Segmenting) of the STT pipeline. The pipeline is intended for the Straw Tube Tracker (STT) of the PANDA experiment, which is part of the Central Tracking System (CTS) located in the Target Spectrometer.
Once a conda environment is successfully created (see `envs/README.md` for build instructions), one can run the pipeline from the root directory as follows:
```bash
# running pipeline
conda activate exatrkx-cpu
export EXATRKX_DATA=path/to/dataset
traintrack configs/pipeline_quickstart.yaml
```
Follow the instructions in the NERSC Documentation, or see the concise and essential version in `NERSC.md`, to run the pipeline on the Cori cluster at NERSC.
The deep learning pipeline consists of several stages: Processing, Graph Construction, Edge Labelling, and Graph Segmentation. The pipeline assumes that the input data are CSV files similar to the TrackML data format (see https://www.kaggle.com/c/trackml-particle-identification).
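For orientation, here is a minimal sketch of reading one TrackML-like event with pandas; the file names and columns below follow the TrackML convention and are only illustrative, since the STT/PandaRoot CSVs may differ:

```python
# Minimal sketch: read one TrackML-like event with pandas.
# File names and columns follow the TrackML convention and are
# illustrative only; the STT/PandaRoot exports may differ.
import pandas as pd

prefix = "event000000001"
hits = pd.read_csv(f"{prefix}-hits.csv")    # e.g. hit_id, x, y, z, layer_id, ...
truth = pd.read_csv(f"{prefix}-truth.csv")  # e.g. hit_id, particle_id, ...

# Attach the ground-truth particle assignment to each hit.
event = hits.merge(truth, on="hit_id")
print(event.head())
```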
- The Data Processing stage runs on the comma-separated values (CSV) files that contain raw events from the PandaRoot simulation and stores the processed data as PyTorch Geometric `Data` objects. In this stage, new quantities are derived, e.g. $r, \phi, p_t, d_0$, etc. (see the feature-derivation sketch after this list). At the moment, this stage cannot run in a CUDA-enabled environment due to the `multiprocessing` Python library; one needs to run it in a CPU-only environment.
- The Graph Construction stage builds graphs either with a heuristic method or with metric learning (embedding). At the moment, this stage is not supported; instead, heuristic graph construction is merged into the Processing stage. Since this stage is not yet supported, one needs to distribute the data into `train`, `val`, and `test` folders by hand (see the splitting sketch after this list), as the Edge Labelling (GNN/DNN) stage assumes data distributed in these folders [this may change in the future].
- The Edge Labelling stage finishes with the `GNNBuilder` callback, storing the `edge_score` for all events. One can re-run this step using e.g. `traintrack --inference configs/pipeline_quickstart.yaml`, but one needs to put a `resume_id` in the `pipeline_quickstart` config.
- The Graph Segmentation stage is meant for track building using DBSCAN or CCL (see the CCL sketch after this list). However, one may skip this stage altogether and move to the `eval/` folder, where one can perform segmenting as well as track evaluation. This is driven by post-analysis needs, as one may need to run segmenting together with evaluation using different settings. At the moment, it is recommended to skip this stage and move directly to the `eval/` directory (see `eval/README.md` for more details).
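To illustrate the kind of feature derivation done in the Processing stage, here is a hedged sketch that computes $r$ and $\phi$ from hit positions and wraps them in a PyTorch Geometric `Data` object; the toy arrays, column meanings, and units are illustrative, not the pipeline's actual schema:

```python
# Hedged sketch of Processing-style feature derivation; the toy hit
# positions and the choice of features are illustrative only.
import numpy as np
import torch
from torch_geometric.data import Data

x = np.array([10.0, 12.5, 15.0])  # toy hit x-positions
y = np.array([1.0, 2.0, 3.5])     # toy hit y-positions

r = np.sqrt(x**2 + y**2)  # transverse radius
phi = np.arctan2(y, x)    # azimuthal angle

# Store the derived per-hit features as a PyTorch Geometric Data object.
features = torch.tensor(np.stack([r, phi], axis=1), dtype=torch.float)
event = Data(x=features)
print(event)  # Data(x=[3, 2])
```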
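Since the `train`/`val`/`test` split currently has to be done by hand, a small script along the following lines can do it; the source directory name, the 80/10/10 split, and the file glob are assumptions, not the pipeline's actual layout:

```python
# Hedged sketch: distribute processed event files into train/val/test
# folders. The "processed" directory, split fractions, and glob pattern
# are assumptions.
import random
import shutil
from pathlib import Path

src = Path("processed")
files = sorted(src.glob("*.pt"))
random.seed(42)
random.shuffle(files)

n = len(files)
splits = {
    "train": files[: int(0.8 * n)],
    "val": files[int(0.8 * n) : int(0.9 * n)],
    "test": files[int(0.9 * n) :],
}

for name, subset in splits.items():
    out = src.parent / name
    out.mkdir(exist_ok=True)
    for f in subset:
        shutil.move(str(f), out / f.name)
```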
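For the Graph Segmentation step, a CCL-style track builder essentially thresholds the `edge_score` and takes connected components of the surviving graph. Here is a generic sketch; the 0.5 score cut and the toy graph are assumptions, and the actual code in `eval/` may differ:

```python
# Hedged sketch of CCL-style track building from GNN edge scores.
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

edge_index = np.array([[0, 1, 2, 3], [1, 2, 3, 4]])  # (2, E) toy graph, 5 hits
edge_score = np.array([0.9, 0.8, 0.2, 0.95])         # per-edge GNN scores
n_hits = 5

# Keep only the edges the GNN considers track-like.
keep = edge_score > 0.5
rows, cols = edge_index[0, keep], edge_index[1, keep]
adj = coo_matrix((np.ones(keep.sum()), (rows, cols)), shape=(n_hits, n_hits))

# Each connected component of the surviving graph is one track candidate.
n_tracks, labels = connected_components(adj, directed=False)
print(n_tracks, labels)  # hits sharing a label belong to the same candidate
```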
The `stttrkx` repo contains several subdirectories with code for specific tasks. The details of these subdirectories are as follows:
- `configs/` contains top-level pipeline configuration files for `traintrack`
- `eda/` contains notebooks for exploratory data analysis to understand the raw data
- `envs/` contains files for building a conda environment
- `eval/` contains code for track evaluation; it also contains code for running the segmenting stage independently of `traintrack`
- `LightningModules/` contains code for each stage of the pipeline
- `src/` contains helper code for utility functions, plotting, event building, etc.
- `RayTune/` contains helper code for hyperparameter tuning using the Ray Tune library (see the sketch after this list)
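As a reference for what hyperparameter tuning with Ray Tune looks like, here is a minimal, generic sketch; the trainable, search space, and metric name are placeholders, not the repo's actual tuning code:

```python
# Minimal, generic Ray Tune sketch; the trainable, search space, and
# metric are placeholders, not the repo's actual tuning setup.
from ray import tune

def trainable(config):
    # Stand-in for one training run of a pipeline stage.
    loss = (config["lr"] - 0.01) ** 2
    tune.report(val_loss=loss)

analysis = tune.run(
    trainable,
    config={"lr": tune.loguniform(1e-4, 1e-1)},
    num_samples=10,
    metric="val_loss",
    mode="min",
)
print(analysis.best_config)
```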
Several notebooks are available to inspect the output of each stage as well as for post analysis; they are not necessarily intended to run the stages interactively. For example:

- `stt1_proc.ipynb` inspects the output of the Processing stage
- `stt2_gnn_train.ipynb` and `stt3_gnn_infer.ipynb` inspect the output of the GNN stage
- etc.