The homepage from partizanos

Welcome to the home of the first-ever Fast Calorimeter Simulation Challenge!

The purpose of this challenge is to spur the development and benchmarking of fast and high-fidelity calorimeter shower generation using deep learning methods. Currently, generating calorimeter showers of interacting particles (electrons, photons, pions, ...) using GEANT4 is a major computational bottleneck at the LHC, and it is forecast to overwhelm the computing budget of the LHC experiments in the near future. Therefore there is an urgent need to develop GEANT4 emulators that are both fast (computationally lightweight) and accurate. The LHC collaborations have been developing fast simulation methods for some time, and the hope of this challenge is to directly compare new deep learning approaches on common benchmarks. It is expected that participants will make use of cutting-edge techniques in generative modeling with deep learning, e.g. GANs, VAEs and normalizing flows.

This challenge is modeled after two previous, highly successful data challenges in HEP – the top tagging community challenge and the LHC Olympics 2020 anomaly detection challenge.

Datasets

The challenge offers three datasets, ranging in difficulty from easy to medium to hard. The difficulty is set by the dimensionality of the calorimeter showers (the number layers and the number of voxels in each layer). The hard Dataset 3 also includes incoming particles at different angles.

Each dataset has the same general format. The detector geometry consists of concentric cylinders with particles propagating along the z-axis. The detector is segmented along the z-axis into discrete layers. Each layer has bins along the radial direction and some of them have bins in the angle α. The number of layers and the number of bins in r and α is stored in the binning .xml files and will be read out by the HighLevelFeatures class of helper functions. The coordinates Δφ and Δη correspond to the x- and y axis of the cylindrical coordinates. The image below shows a 3d view of a geometry with 3 layers, with each layer having 3 bins in radial and 6 bins in angular direction. The right image shows the front view of the geometry, as seen along the z axis.

Each CaloChallenge dataset comes as one or more .hdf5 files that were written with python's h5py module using gzip compression. Within each file, showers of the same incident energy are grouped together to a hdf5-dataset of the name data_XXXX, with XXXX being the incident energy in MeV. Each of these hdf5-datasets has the shape (num_events, num_voxels), where the energy depositions of each voxel (in MeV) are flattened. The mapping of array index to voxel location is done at the order (radial bins, angular bins, layer), so the first entries correspond to the radial bins of the first angular slice in the first layer. Then, the radial bins of the next angular slice of the first layer follow, ...

Dataset 1 can be downloaded from Zenodo with DOI /10.5281/zenodo.6234055. It is based on the ATLAS GEANT4 open datasets that were published here. There are two groups of datasets, one for photons and one for charged pions. Each set consists of 15 energies from 256 MeV up to 4 TeV produced in powers of two. Each dataset contains the voxelised shower information obtained from single particles produced at the calorimeter surface in the η range (0.2-0.25) and simulated in the ATLAS detector. 10k events are available in each sample with the exception of those at higher energies that have a lower statistics. These samples were used to train the corresponding two GANs presented in the AtlFast3 paper SIMU-2018-04 and in the FastCaloGAN note ATL-SOFT-PUB-2020-006. The number of radial and angular bins varies from layer to layer and is also different for photons and pions, resulting in 368 voxels for photons and 533 for pions.
Dataset 2 (COMING SOON) consists of a single file with GEANT4-simulated showers of electrons with energies ranging from 1 GeV to 1024 GeV. The detector has a concentric cylinder geometry with 45 layers, where each layer consists of active (silicon) and passive (tungesten) material. Each layer has 144 readout cells, 9 in radial and 16 in angular direction, yielding a total of 9x16x45 = 6480 voxels.
Dataset 3 (COMING SOON). It consists of 5 files, one for each incident angle (from 50 to 90° in steps of 10°). A 90° angle indicates a particle entering the detector perpendicularly. Each file contains GEANT4-simulated eletron showers with energies ranging from 1 GeV to 1024 GeV. The detector geometry is similar to dataset 2, but has a much higher granularity. Each of the 45 layer has now 18 radial and 50 angular bins, totalling 18x50x45=40500 voxels. This dataset was produced using the Par04 Geant4 example.

Files containing the detector binning information for each dataset as well as Python scripts that load them can be found on our Github page. This Jupyter notebook shows how each dataset can be loaded, how the helper class is initialized with the binning.xml files, how high-level features can be computed, and how showers can be visualized. Further high-level features and histograms might be added in the next weeks.

Metrics

Participants will be scored using a variety of metrics. We will include more detailed descriptions of them in the coming months. Metrics will include:

- A binary classifier trained on truth GEANT4 vs. generated shower images. - A binary classifier trained on a set of high-level features (like layer energies, shower shape variables). - A chi-squared type measure derived from histogram differences of high-level features. - Training time, calorimeter shower generation time and memory usage. - Interpolation capabilities: leave out one energy in training and generate samples at that energy after training.

It is expected that there will not necessarily be a single clear winner, but different methods will have their pros and cons. Scripts to perform the evaluation will eventually be made available on the Github page.

Timeline

The challenge will conclude approximately 1 month before the next ML4Jets conference (currently tentatively scheduled for the week of December 5, 2022). Results of the challenge will be presented at ML4Jets, and the challenge will culminate in a community paper documenting the various approaches and their outcomes.

Please do not hesitate to ask questions: we will use the ML4Jets slack channel to discuss technical questions related to this challenge. You are also encouraged to sign up for the Google groups mailing list for infrequent announcements and communications.

Good luck!

Michele Faucci Gianelli, Gregor Kasieczka, Claudius Krause, Ben Nachman, Dalila Salamani, David Shih and Anna Zaborowska

partizanos / homepage Goto Github PK

homepage's Introduction

Datasets

Metrics

Timeline

homepage's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent