jpata / particleflow
Machine-learned, GPU-accelerated particle flow reconstruction
License: Apache License 2.0
Very low priority, but we could add a pre-commit hook to run black, flake8, isort, etc. on the repo to enforce a common style and minimal diffs.
It could also run automatically to lint PRs.
We are out of LFS space: at some point I put weight files on GitHub using LFS, and they still count against the quota even after being removed.
According to the GitHub documentation, the solution is to delete and recreate the repo, but I'd like to avoid that if possible.
Downloading models/acat2022_20221004_model40M/cms-gen_20220923_163529_426249.gpu0.local/logs/train/events.out.tfevents.1663940144.gpu0.local.2696240.0.v2 (3.0 MB)
Error downloading object: models/acat2022_20221004_model40M/cms-gen_20220923_163529_426249.gpu0.local/logs/train/events.out.tfevents.1663940144.gpu0.local.2696240.0.v2 (521a8e0): Smudge error: Error downloading models/acat2022_20221004_model40M/cms-gen_20220923_163529_426249.gpu0.local/logs/train/events.out.tfevents.1663940144.gpu0.local.2696240.0.v2 (521a8e0dd0f705506862114f11c9856f82bae55543a94041dd0665eea5183cb6): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
Errors logged to /home/joosep/test/particleflow/.git/lfs/logs/20230915T104049.046923961.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: models/acat2022_20221004_model40M/cms-gen_20220923_163529_426249.gpu0.local/logs/train/events.out.tfevents.1663940144.gpu0.local.2696240.0.v2: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
https://github.com/pbelcak/fastfeedforward
Currently we are using a simple matching-based loss on individual particles/elements.
It might be interesting to try a contrastive event-based loss, such as the one proposed in this paper:
https://openaccess.thecvf.com/content/CVPR2023/papers/Huang_Learning_To_Measure_the_Point_Cloud_Reconstruction_Loss_in_a_CVPR_2023_paper.pdf
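As a strawman, an event-level contrastive term could look something like the sketch below. This is not the method from the paper: emb_pred and emb_true are hypothetical per-event embeddings (e.g. pooled particle features), and the InfoNCE form is just one possible choice.

import torch
import torch.nn.functional as F

def event_contrastive_loss(emb_pred, emb_true, temperature=0.1):
    """InfoNCE-style loss: the prediction for event i should be closest to the
    truth embedding of event i, and far from the truths of the other events.
    emb_pred, emb_true: (batch, dim) event-level embeddings (assumed)."""
    emb_pred = F.normalize(emb_pred, dim=-1)
    emb_true = F.normalize(emb_true, dim=-1)
    logits = emb_pred @ emb_true.t() / temperature  # (batch, batch) similarity matrix
    labels = torch.arange(emb_pred.shape[0], device=emb_pred.device)
    return F.cross_entropy(logits, labels)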
Put this in the tfds download instructions:
https://zenodo.org/record/8414225
Add some instructions on how to train the hit-based model here:
https://github.com/jpata/particleflow/blob/main/README_tf.md
The files fail to download from Zenodo today.
wget --no-check-certificate -nc https://zenodo.org/record/4559324/files/tev14_pythia8_ttbar_0_0.pkl.bz2
tev14_pythia8_ttbar_0_0.pkl.bz2 1%[> ] 486.98K 35.9KB/s eta 22m 21s
Unsure if this is a temporary issue or a new rate limitation.
Once cms-sw/cmssw#36963 is available, investigate enabling the MLPFProducer in CMSSW to use GPUs for inference.
This would allow an apples-to-apples comparison of PF vs. MLPF on a fully loaded machine (CPU+GPU).
This tfds generation failed partway through.
tfds build hep_tfds/heptfds/cms_pf/qcd_high_pt
...
Traceback (most recent call last):
File "/usr/local/bin/tfds", line 8, in <module>
sys.exit(launch_cli())
File "/usr/local/lib/python3.8/dist-packages/tensorflow_datasets/scripts/cli/main.py", line 102, in launch_cli
app.run(main, flags_parser=_parse_flags)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/usr/local/lib/python3.8/dist-packages/tensorflow_datasets/scripts/cli/main.py", line 97, in main
args.subparser_fn(args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow_datasets/scripts/cli/build.py", line 192, in _build_datasets
_download_and_prepare(args, builder)
File "/usr/local/lib/python3.8/dist-packages/tensorflow_datasets/scripts/cli/build.py", line 342, in _download_and_prepare
builder.download_and_prepare(
File "/usr/local/lib/python3.8/dist-packages/tensorflow_datasets/core/dataset_builder.py", line 481, in download_and_prepare
self._download_and_prepare(
File "/usr/local/lib/python3.8/dist-packages/tensorflow_datasets/core/dataset_builder.py", line 1218, in _download_and_prepare
future = split_builder.submit_split_generation(
File "/usr/local/lib/python3.8/dist-packages/tensorflow_datasets/core/split_builder.py", line 310, in submit_split_generation
return self._build_from_generator(**build_kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow_datasets/core/split_builder.py", line 371, in _build_from_generator
for key, example in utils.tqdm(
File "/usr/local/lib/python3.8/dist-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/joosep/particleflow/hep_tfds/heptfds/cms_utils.py", line 191, in generate_examples
X, ygen, ycand = prepare_data_cms(str(fi), pad_size)
File "/home/joosep/particleflow/hep_tfds/heptfds/cms_utils.py", line 162, in prepare_data_cms
jet_constituents = [index_mapping[idx] for idx in constituent_idx[jet_idx]] # map back to constituent index *before* masking
File "/home/joosep/particleflow/hep_tfds/heptfds/cms_utils.py", line 162, in <listcomp>
jet_constituents = [index_mapping[idx] for idx in constituent_idx[jet_idx]] # map back to constituent index *before* masking
IndexError: index 4659 is out of bounds for axis 0 with size 4659
The other ones (ttbar, ztt, qcd) succeeded, so I'm debugging this a bit.
Implement the GNN-LSH model in PyTorch.
For the moment, we concatenate input elements of different types (tracks, clusters, ...) into a single feature matrix. This means that some features may be defined for one type but not for others. Our downstream ML model treats all input elements as nodes of the same type. In principle, splitting up the inputs from a single feature matrix into independent feature matrices for each element type should work better.
I believe Michael has a working version for the SSL studies - let's try it out in the codebase.
As suggested by JK here: https://indico.cern.ch/event/1189908/contributions/5008659/attachments/2494293/4283744/20220818_jk_mlpf.pdf,
it might be useful to improve the reconstruction of event-level quantities by computing and comparing local averages of energies for each true/reconstructed particle.
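A minimal sketch of what such a local-average comparison could look like, assuming per-particle eta/phi/energy tensors; the cone size and the exact form of the loss term are placeholders.

import torch

def local_energy_sums(eta, phi, energy, dr=0.4):
    """For each particle, sum the energy of all particles within a cone of
    radius dr in (eta, phi). Inputs are 1D tensors of equal length (assumed)."""
    deta = eta.unsqueeze(0) - eta.unsqueeze(1)
    # wrap the phi difference into [-pi, pi)
    dphi = torch.remainder(phi.unsqueeze(0) - phi.unsqueeze(1) + torch.pi, 2 * torch.pi) - torch.pi
    in_cone = (deta**2 + dphi**2) < dr**2
    return (in_cone.float() * energy.unsqueeze(0)).sum(dim=1)

# a possible extra loss term, comparing local sums around matched true/predicted particles:
# loss_local = torch.nn.functional.mse_loss(
#     local_energy_sums(eta_pred, phi_pred, e_pred),
#     local_energy_sums(eta_true, phi_true, e_true))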
Goal: reduce CPU inference time with the ONNX backend
We made some CPU inference performance results public for 2021 in CMS, https://cds.cern.ch/record/2792320/files/DP2021_030.pdf slide 16, “For context, on a single CPU thread (Intel i7-10700 @ 2.9GHz), the baseline PF requires approximately (9 ± 5) ms, the MLPF model approximately 320 ± 50 ms for Run 3 ttbar MC events”.
Now is a good time to make the ONNX inference as fast as possible, while minimizing any physics impact.
Resources:
Try whether we can use these kNN kernels in PyTorch:
https://github.com/tklijnsma/pytorch_cmspepr/tree/main/csrc
https://github.com/jpata/particleflow/blob/main/mlpf/pipeline.py#L244-L253
This fails on an Nvidia 1080; perhaps it needs to be enabled selectively, or with a different value on different cards.
Kenneth suggested uploading the Singularity image to CVMFS:
Currently the latest TF+nvidia image is here: https://hep.kbfi.ee/~joosep/tf-2.13.0.simg
The pytorch image is here: https://hep.kbfi.ee/~joosep/pytorch.simg
Implement an approximate EMD loss to compare the predicted particle set to the target, as in point cloud regression.
https://arxiv.org/pdf/1612.00603.pdf section 4.3
https://github.com/gpeyre/SinkhornAutoDiff/blob/master/sinkhorn_pointcloud.py
A first attempt is at https://github.com/jpata/particleflow/blob/endtoend_gnn/test/sinkhorn_pointcloud.py; it still needs testing.
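For reference, a minimal self-contained Sinkhorn sketch along the lines of the linked sinkhorn_pointcloud.py, assuming uniform weights over both sets; the version in the branch above is the one that should actually be tested.

import torch

def sinkhorn_emd(x, y, eps=0.1, n_iters=50):
    """Approximate EMD between point clouds x: (n, d) and y: (m, d) via
    entropy-regularized optimal transport (Sinkhorn iterations)."""
    cost = torch.cdist(x, y) ** 2  # (n, m) pairwise squared distances
    mu = torch.full((x.shape[0],), 1.0 / x.shape[0], device=x.device)
    nu = torch.full((y.shape[0],), 1.0 / y.shape[0], device=y.device)
    k = torch.exp(-cost / eps)  # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (k.t() @ u)
        u = mu / (k @ v)
    transport = u.unsqueeze(1) * k * v.unsqueeze(0)  # approximate transport plan
    return (transport * cost).sum()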
I saw this nice presentation from CHEP2019 about doing HGCAL clustering on a GPU: https://indico.cern.ch/event/773049/contributions/3473275/attachments/1936902/3210151/chep2019.pdf
Seems like this could also be repurposed for clustering the particle flow elements.
This works fine:
CUDA_VISIBLE_DEVICES=5,6,7,8,9 singularity exec --nv /home/software/singularity/base.simg:latest python3 mlpf/launcher.py --model-spec parameters/cms-gnn-skipconn-v2.yaml --action train
...
Model: "pf_net"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_encoding (InputEncodin multiple 0
_________________________________________________________________
sparse_hashed_nn_distance (S multiple 17825
_________________________________________________________________
gnn_id (EncoderDecoderGNN) multiple 2375680
_________________________________________________________________
sequential_2 (Sequential) (5, 6400, 8) 805384
_________________________________________________________________
sequential_3 (Sequential) (5, 6400, 1) 801793
_________________________________________________________________
gnn_reg (EncoderDecoderGNN) multiple 2379776
_________________________________________________________________
sequential_4 (Sequential) (5, 6400, 5) 807941
=================================================================
Total params: 7,188,399
Trainable params: 7,185,199
Non-trainable params: 3,200
________________________________
Epoch 1/500
258/3200 [=>............................] - ETA: 19:18 - loss: 92.0839 - charge_loss: 15.1783 - cls_loss: 29.7701 - cos_phi_loss: 14.1853 - energy_loss: 26.3809 - eta_loss: 51.6222 - pt_loss: 3.7117 - sin_phi_loss: 11.3279 - cls_acc_unweighted: 0.7688
and so does
CUDA_VISIBLE_DEVICES=5,6 singularity exec --nv /home/software/singularity/base.simg:latest python3 mlpf/launcher.py --model-spec parameters/cms-gnn-skipconn-v2.yaml --action train
...
Epoch 1/500
31/8000 [..............................] - ETA: 1:03:18 - loss: 124.7493 - charge_loss: 16.4039 - cls_loss: 42.0418 - cos_phi_loss: 17.1479 - energy_loss: 32.1395 - eta_loss: 130.0834 - pt_loss: 7.7385 - sin_phi_loss: 11.0048 - cls_acc_unweighted: 0.6926
while this doesn't:
CUDA_VISIBLE_DEVICES=5,6,7,8 singularity exec --nv -B /home /home/software/singularity/base.simg:latest python3 mlpf/launcher.py --model-spec parameters/cms-gnn-skipconn-v2.yaml --action train
...
Traceback (most recent call last):
File "mlpf/launcher.py", line 26, in <module>
main(args, yaml_path, config)
File "/home/joosep/particleflow/mlpf/tfmodel/model_setup.py", line 755, in main
fit_result = model.fit(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
tmp_logs = self.train_function(iterator)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 888, in _call
return self._stateless_fn(*args, **kwds)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 2942, in __call__
return graph_function._call_flat(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 1918, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 555, in call
outputs = execute.execute(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 5 root error(s) found.
(0) Invalid argument: Tried to stack elements of an empty list with non-fully-defined element_shape: [?,512]
[[{{node replica_3/pf_net/gnn_id/StatefulPartitionedCall/conv_id0/map/TensorArrayV2Stack/TensorListStack}}]]
[[replica_2/pf_net/sparse_hashed_nn_distance/map/while/body/_1664/replica_2/pf_net/sparse_hashed_nn_distance/map/while/map/while/cond/_8452/replica_2/pf_net/sparse_hashed_nn_distance/map/while/map/while/Less_1/_823]]
(1) Invalid argument: Tried to stack elements of an empty list with non-fully-defined element_shape: [?,512]
[[{{node replica_3/pf_net/gnn_id/StatefulPartitionedCall/conv_id0/map/TensorArrayV2Stack/TensorListStack}}]]
[[replica_1/pf_net/sparse_hashed_nn_distance/SparseTensor/dense_shape/_864]]
(2) Invalid argument: Tried to stack elements of an empty list with non-fully-defined element_shape: [?,512]
[[{{node replica_3/pf_net/gnn_id/StatefulPartitionedCall/conv_id0/map/TensorArrayV2Stack/TensorListStack}}]]
[[replica_1/pf_net/sparse_hashed_nn_distance/SparseTensor/dense_shape/_863]]
(3) Invalid argument: Tried to stack elements of an empty list with non-fully-defined element_shape: [?,512]
[[{{node replica_3/pf_net/gnn_id/StatefulPartitionedCall/conv_id0/map/TensorArrayV2Stack/TensorListStack}}]]
[[pf_net/gnn_reg/StatefulPartitionedCall/conv_reg1/map/while/body/_3801/conv_reg1/map/while/SparseReshape/_2974]]
(4) Invalid argument: Tried to stack elements of an empty list with non-fully-defined element_shape: [?,512]
[[{{node replica_3/pf_net/gnn_id/StatefulPartitionedCall/conv_id0/map/TensorArrayV2Stack/TensorListStack}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_335632]
Function call stack:
train_function -> train_function -> train_function -> train_function -> train_function
Evaluating the number of steps in a dataset is currently slow in TensorFlow when the loop is IO-bound in a single process, because we use tf.data.Dataset.from_generator, which uses Python underneath and doesn't release the GIL.
It might require some changes upstream in tfds to support ArrayRecordDataSource.as_dataset: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/core/file_adapters.py#L188
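For reference, the data source API already exposes the dataset length up front, so a sketch of the desired behavior could look like this (the dataset name and data_dir are examples):

import tensorflow_datasets as tfds

builder = tfds.builder("clic_edm_ttbar_pf:1.5.0", data_dir="/home/joosep/tensorflow_datasets/")
data_source = builder.as_data_source(split="train")

# the ArrayRecordDataSource knows its length without iterating a Python
# generator, so the number of steps per epoch can be computed directly
batch_size = 50
num_steps = len(data_source) // batch_size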
We want to improve the jet and MET resolution of MLPF, but these cannot be written directly into the loss function.
JD suggested that we could cluster the gen-particles into jets and save the jet clustering index for each particle.
Since we have a one-to-one matching between gen-particles, input elements, and predicted particles, we can add a loss term that compares jets built from the predicted particles (using the gen-particle clustering index) to jets built from the gen-particles.
For the TF model, this would mean running fastjet at the tfds stage, saving the jet clustering index for each particle, propagating it to the loss function, and adding an additional term.
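A sketch of what the extra term could look like, assuming each particle carries a precomputed jet clustering index jet_idx from fastjet (with -1 for unclustered particles); for simplicity this compares only scalar-summed jet pt rather than full four-vectors.

import torch

def jet_pt_loss(pt_pred, pt_true, jet_idx, n_jets):
    """Sum particle pt into jets using the gen-level clustering index, then
    compare predicted and true jet pt. jet_idx: (n_particles,) long tensor
    with values in [0, n_jets) or -1 for unclustered particles."""
    mask = jet_idx >= 0
    idx = jet_idx[mask]
    jets_pred = torch.zeros(n_jets, device=pt_pred.device).index_add_(0, idx, pt_pred[mask])
    jets_true = torch.zeros(n_jets, device=pt_true.device).index_add_(0, idx, pt_true[mask])
    return torch.nn.functional.mse_loss(jets_pred, jets_true)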
PyTorch and tfds now support loading data from an ArrayRecordDataSource, so we can use the same efficient ML input data for both PyTorch and TensorFlow.
Here's some example code for loading the dataset in PyTorch.
#!/usr/bin/env python
# coding: utf-8
import numpy as np
import tensorflow_datasets as tfds
import torch
import torch.nn as nn
from torch import Tensor
from torch.nn import TransformerEncoder, TransformerEncoderLayer
import ray
from ray.air import session, Checkpoint
from ray import train
from ray.train.torch import TorchTrainer, TorchConfig
from ray.air.config import ScalingConfig, RunConfig, CheckpointConfig


def collate_padded_batch(inputs):
    # pad each array in the batch to the length of the longest event
    elem_keys = list(inputs[0].keys())
    ret = {}
    for elem_key in elem_keys:
        batch = [Tensor(i[elem_key]) for i in inputs]
        max_seq_len = max([x.shape[0] for x in batch])
        padded_batch_data = [nn.functional.pad(x, (0, 0, 0, max_seq_len - x.shape[0])) for x in batch]
        ret[elem_key] = torch.stack(padded_batch_data, dim=0)
    # padded elements have all-zero features, so feature 0 identifies them
    ret["mask"] = ret["X"][:, :, 0] == 0
    return ret


def my_getitem(self, vals):
    records = self.data_source.__getitems__(vals)
    return [self.dataset_info.features.deserialize_example_np(record, decoders=self.decoders) for record in records]


class Dataset:
    def __init__(self, name="clic_edm_ttbar_pf:1.5.0", split="train"):
        builder = tfds.builder(name, data_dir="/home/joosep/tensorflow_datasets/")
        self.ds = builder.as_data_source(split=split)

        # to prevent a warning from tfds about accessing sequences of indices
        self.ds.__class__.__getitems__ = my_getitem

    def get_sampler(self):
        sampler = torch.utils.data.SequentialSampler(self.ds)
        return sampler

    def get_loader(self, batch_size=20, num_workers=0, prefetch_factor=None):
        return torch.utils.data.DataLoader(
            self.ds,
            batch_size=batch_size,
            collate_fn=collate_padded_batch,
            sampler=self.get_sampler(),
            num_workers=num_workers,
            prefetch_factor=prefetch_factor,
        )

    def __len__(self):
        return len(self.ds)

    def __repr__(self):
        return self.ds.__repr__()


class TransformerModel(nn.Module):
    def __init__(
        self,
        d_in: int,
        d_out: int,
        d_model: int,
        nhead: int,
        d_hid: int,
        nlayers: int,
        dropout: float = 0.5,
    ):
        super().__init__()
        self.linear_in = nn.Linear(d_in, d_model)
        encoder_layers = TransformerEncoderLayer(d_model, nhead, d_hid, dropout, batch_first=True)
        self.transformer_encoder = TransformerEncoder(encoder_layers, nlayers)
        self.linear_out = nn.Linear(d_model, d_out)

    def forward(self, src: Tensor, src_key_padding_mask: Tensor) -> Tensor:
        src = self.linear_in(src)
        output = self.transformer_encoder(src, src_key_padding_mask=src_key_padding_mask)
        output = self.linear_out(output)
        return output


class InterleavedIterator(object):
    """Iterates over several data loaders of different lengths in an interleaved order."""

    def __init__(self, data_loaders):
        self.data_loaders_iter = [iter(dl) for dl in data_loaders]
        max_loader_size = max([len(dl) for dl in data_loaders])

        # interleave loaders of different length
        self.loader_ds_indices = []
        for i in range(max_loader_size):
            for iloader, loader in enumerate(data_loaders):
                if i < len(loader):
                    self.loader_ds_indices.append(iloader)

        self.cur_index = 0

    def __iter__(self):
        return self

    def __next__(self):
        # stop cleanly once all loaders are exhausted
        if self.cur_index >= len(self.loader_ds_indices):
            raise StopIteration
        iloader = self.loader_ds_indices[self.cur_index]
        self.cur_index += 1
        return next(self.data_loaders_iter[iloader])


# Define your train worker loop
def train_loop_per_worker():
    ds_train = [
        Dataset("clic_edm_ttbar_pf:1.5.0", "train"),
        Dataset("clic_edm_qq_pf:1.5.0", "train"),
        Dataset("clic_edm_ww_fullhad_pf:1.5.0", "train"),
        Dataset("clic_edm_zh_tautau_pf:1.5.0", "train"),
    ]
    ds_test = [
        Dataset("clic_edm_ttbar_pf:1.5.0", "test"),
        Dataset("clic_edm_qq_pf:1.5.0", "test"),
        Dataset("clic_edm_ww_fullhad_pf:1.5.0", "test"),
        Dataset("clic_edm_zh_tautau_pf:1.5.0", "test"),
    ]
    for ds in ds_train:
        print("train_dataset: {}, {}".format(ds, len(ds)))
    for ds in ds_test:
        print("test_dataset: {}, {}".format(ds, len(ds)))

    batch_size = 50
    num_workers = 0
    nepochs = 5
    prefetch_factor = None
    train_loaders = [ds.get_loader(batch_size=batch_size, num_workers=num_workers, prefetch_factor=prefetch_factor) for ds in ds_train]
    test_loaders = [ds.get_loader(batch_size=batch_size, num_workers=num_workers, prefetch_factor=prefetch_factor) for ds in ds_test]
    for dl in train_loaders:
        print("train_loader: {}, {}".format(dl.dataset, len(dl)))
    for dl in test_loaders:
        print("test_loader: {}, {}".format(dl.dataset, len(dl)))

    model = TransformerModel(17, 8, 128, 4, 128, 3, 0.1)
    device = torch.device("cuda")
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    # Get dict of last saved checkpoint and restore from it, if it exists.
    initial_global_step_idx = None
    if ray.is_initialized():
        checkpoint = session.get_checkpoint()
        if checkpoint:
            print("checkpoint follows")
            data_dict = checkpoint.to_dict()
            initial_global_step_idx = data_dict["global_step_idx"]
            model.load_state_dict(data_dict["model"])
            optimizer.load_state_dict(data_dict["optimizer"])
            print("Checkpoint restored at global_step_idx={}".format(initial_global_step_idx))

    model_parameters = filter(lambda p: p.requires_grad, model.parameters())
    params = sum([np.prod(p.size()) for p in model_parameters])
    print("trainable parameters: {}".format(params))

    loss_fn = torch.nn.MSELoss()

    if ray.is_initialized():
        print("preparing dataset and model for ray")
        train_loaders = [train.torch.prepare_data_loader(dl) for dl in train_loaders]
        test_loaders = [train.torch.prepare_data_loader(dl) for dl in test_loaders]
        model = train.torch.prepare_model(model)
        for dl in train_loaders:
            print("train_loader: {}, {}".format(dl.dataset, len(dl)))
        for dl in test_loaders:
            print("test_loader: {}, {}".format(dl.dataset, len(dl)))

    global_step_idx = 0
    for epoch in range(nepochs):
        model.train()
        train_loss_vals = []
        steps_per_epoch = sum([len(loader) for loader in train_loaders])

        # skip epoch if it was already trained on
        if initial_global_step_idx and global_step_idx + steps_per_epoch < initial_global_step_idx:
            global_step_idx += steps_per_epoch
            print("skipping epoch {}".format(epoch))
            continue

        # recreate the interleaved iterator at the start of each epoch
        data_iterator = InterleavedIterator(train_loaders)
        for data in data_iterator:
            # skip batch if it was already trained on
            if initial_global_step_idx and global_step_idx < initial_global_step_idx:
                global_step_idx += 1
                continue

            optimizer.zero_grad()
            output = model(data["X"].to(device), data["mask"].to(device))
            loss = loss_fn(output, data["ygen"].to(device))
            loss.backward()
            optimizer.step()
            train_loss_vals.append(loss.item())

            if global_step_idx > 0 and global_step_idx % 1000 == 0:
                print("checkpoint at global_step={}/{}".format(
                    global_step_idx, nepochs * steps_per_epoch
                ))
                # save checkpoint
                if ray.is_initialized():
                    checkpoint = Checkpoint.from_dict(
                        dict(
                            global_step_idx=global_step_idx,
                            model=model.state_dict(),
                            optimizer=optimizer.state_dict(),
                        )
                    )
                    session.report(dict(loss=loss.item()), checkpoint=checkpoint)

            global_step_idx += 1

        # model.eval()
        # test_loss_vals = []
        # for test_loader in test_loaders:
        #     for data in test_loader:
        #         output = model(data["X"].to(device), data["mask"].to(device))
        #         loss = loss_fn(output, data["ygen"].to(device))
        #         test_loss_vals.append(loss.item())


scaling_config = ScalingConfig(num_workers=2, use_gpu=True, resources_per_worker={"GPU": 0.5})
run_config = RunConfig(checkpoint_config=CheckpointConfig(num_to_keep=1), verbose=2)

# trainer = TorchTrainer.restore("/home/joosep/ray_results/TorchTrainer_2023-08-25_13-56-05")
trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=scaling_config,
    run_config=run_config,
    torch_config=TorchConfig(backend="gloo"),
)
result = trainer.fit()
print(result)

# train_loop_per_worker()
cc @farakiko
Currently, we define the ground truth based on the existing PFAlgo, i.e. the whole PF algorithm translates the set of PF elements to the set of PF candidates on an event-by-event basis.
Ultimately, the ground truth should be defined through generator particles suitably matched to PF elements. We need to understand where HGCAL is with this.
Via Kenichi:
A high-performance connected components implementation for GPUs: https://dl.acm.org/doi/10.1145/3208040.3208041
An Optimized Union-Find Algorithm for Connected Components Labeling Using GPUs: https://arxiv.org/abs/1708.08180
Get a tagged version of the code:
git checkout master
git pull
git submodule init
git submodule update
Download the training data:
rsync -r --progress lxplus.cern.ch:/eos/user/j/jpata/mlpf/cms/tensorflow_datasets ~/
Run the training on the full model using just the ttbar sample:
CUDA_VISIBLE_DEVICES=... python3 mlpf/pipeline.py train -c parameters/cms.yaml
Copy cms.yaml to cms-withgun.yaml and change it as follows:
training_datasets:
- cms_pf_ttbar
- cms_pf_single_pi
- cms_pf_single_electron
testing_datasets:
- cms_pf_ttbar
- cms_pf_single_pi
- cms_pf_single_electron
Train again:
CUDA_VISIBLE_DEVICES=... python3 mlpf/pipeline.py train -c parameters/cms-withgun.yaml
Note that the training is set currently for 1000 epochs, which you may want to abort early.
Look at the validation plots in experiments/cms*/history/epoch_N/..., especially the energy correlation plots under cls_2/energy_cls2*.
A fast/efficient graph constructor was recently proposed here:
https://github.com/mieskolainen/hypertrack
https://indico.jlab.org/event/459/contributions/11748/attachments/9580/14256/HyperTrack_Mieskolainen_CHEP2023_v1.pdf
We should try whether this works for our graph construction.
Provide instructions on how to use the pretrained model from https://zenodo.org/record/8328683.
The steps are:
Currently, the CMS ntuplization script mlpf/data/postprocessing2.py uses uproot3, which is the legacy package. Some minor API updates may be necessary.
Using a QCD_FlatPt_15_3000HS_14 sample, e.g. 20k events: given a list of tracks and clusters in the event, create a list of PFCandidates using a simple iterative algo. It could be something like the sketch below.
Define and measure the accuracy of this simple algo with respect to the original PFCandidates.
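A minimal sketch of such an iterative baseline, with an assumed data layout (numpy structured arrays with eta/phi/pt/energy fields) and placeholder matching criteria; the actual linking thresholds would need tuning.

import numpy as np

def simple_pf(tracks, clusters, dr_max=0.2):
    """Baseline sketch: link each track to its nearest unused cluster within
    dr_max and emit a charged candidate; leftover clusters become neutrals.
    tracks, clusters: numpy structured arrays (assumed fields: eta, phi, pt/energy)."""
    candidates = []
    used = np.zeros(len(clusters), dtype=bool)
    for trk in tracks:
        dphi = np.mod(clusters["phi"] - trk["phi"] + np.pi, 2 * np.pi) - np.pi
        dr = np.hypot(clusters["eta"] - trk["eta"], dphi)
        dr[used] = np.inf
        i = int(np.argmin(dr)) if len(dr) else -1
        if i >= 0 and dr[i] < dr_max:
            used[i] = True  # the linked cluster is consumed by this track
        # charged candidate from the track kinematics
        candidates.append(dict(pdgid=211, pt=trk["pt"], eta=trk["eta"], phi=trk["phi"]))
    for cl in clusters[~used]:
        # unlinked clusters become neutral candidates, pt = E / cosh(eta)
        candidates.append(dict(pdgid=130, pt=cl["energy"] / np.cosh(cl["eta"]),
                               eta=cl["eta"], phi=cl["phi"]))
    return candidates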
Since this question comes up often, we should have a running and well-understood benchmark of the baseline MLPF model on different computing devices and different datasets.
It should be a single script that loads an ONNX model and can be rerun easily on any new device.
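A sketch of such a benchmark script using onnxruntime directly; the model path and input shape are placeholders to be replaced with the exported MLPF model.

import time
import numpy as np
import onnxruntime as ort

# hypothetical model path; the input shape must match the exported model
session = ort.InferenceSession("mlpf.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
x = np.random.randn(1, 6400, 17).astype(np.float32)  # (batch, elements, features), assumed

# warm up, then time repeated inference
for _ in range(5):
    session.run(None, {input_name: x})
times = []
for _ in range(50):
    t0 = time.perf_counter()
    session.run(None, {input_name: x})
    times.append(time.perf_counter() - t0)
print("mean {:.1f} ms, std {:.1f} ms".format(np.mean(times) * 1e3, np.std(times) * 1e3))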
Once we have clustered the PF elements (tracks, clusters) into "blocks" based on the truth values in PFCandidate::elementsInBlocks, we have pairs of (block, candidates).
Each block usually consists of a few elements and produces a few candidates, for example:
(TRK, TRK, ECAL) -> (pi, pi)
(TRK, ECAL, HCAL) -> (K)
Therefore, we can run a regression across all blocks from all events to create the reconstructed candidates from each block. I've done a first version of this and it seems promising.
As input, we use all blocks of size <= 3, with the element type one-hot encoded and the kinematic vectors standardized.
As output, we have all the candidates (max 3) produced from those elements, with one-hot encoded pdgId and standardized kinematic vectors.
So far, a simple dense-net regression is generally able to predict the number of candidates in a block, as well as the first candidate's momentum (up to a linear transformation?).
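For illustration, the model can be as simple as the sketch below; the encoding sizes are placeholders, and the inputs/outputs are the flattened, zero-padded block encodings described above.

import torch.nn as nn

NELEM_TYPES, NCAND_TYPES, NKIN = 12, 9, 4  # hypothetical one-hot/kinematic sizes
MAX_ELEMS, MAX_CANDS = 3, 3

# input: up to 3 elements, each a one-hot type plus standardized kinematics, zero-padded
# output: up to 3 candidates, each a one-hot pdgId plus standardized kinematics
model = nn.Sequential(
    nn.Linear(MAX_ELEMS * (NELEM_TYPES + NKIN), 256),
    nn.ELU(),
    nn.Linear(256, 256),
    nn.ELU(),
    nn.Linear(256, MAX_CANDS * (NCAND_TYPES + NKIN)),
)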
Here's an attempt to put private discussions on ML-PF in public.
For ML-PF studies, one idea was to exploit the fact that in the standard PF algo, PFBlock -> elements -> candidates, such that a small set of elements is expected to produce a single PF candidate. Looking at the output of the algo via PFCandidate::elementsInBlocks,
I generally find that the association can be many-to-many, in the sense that one element may be associated with multiple candidates, and one candidate with multiple elements.
Just as an example, here is a partial event from the debugging ntuplizer on a RelValTTbar_13 PU25ns_110X_upgrade2018_realistic_v3 file:
cmsRun test/step3.py
python test/ntuplizer.py ./data/ step3_AOD.root
root -l ./data/step3_AOD.root
root [2] pftree->Scan("clusters_npfcands:clusters_ipfcand0:clusters_ipfcand1:clusters_ipfcand2:clusters_ipfcand3")
***********************************************************************************
* Row * Instance * clusters_ * clusters_ * clusters_ * clusters_ * clusters_ *
***********************************************************************************
* 0 * 0 * 3 * 381 * 827 * 974 * 0 *
* 0 * 1 * 1 * 2125 * 0 * 0 * 0 *
* 0 * 2 * 1 * 814 * 0 * 0 * 0 *
* 0 * 3 * 1 * 449 * 0 * 0 * 0 *
* 0 * 4 * 3 * 365 * 626 * 709 * 0 *
* 0 * 5 * 3 * 625 * 637 * 767 * 0 *
* 0 * 6 * 2 * 120 * 484 * 0 * 0 *
* 0 * 7 * 2 * 965 * 1057 * 0 * 0 *
* 0 * 8 * 6 * 307 * 470 * 823 * 890 *
Will need to look in more detail to understand if this is really what PFAlgo is doing, or perhaps some artifact.
cc @jmduarte @lgray @vlimant @pierinim, feel free to include others.
We had to disable ONNX export in TensorFlow due to this bug: onnx/tensorflow-onnx#2180
here: 279f6ae#diff-bc5644c81dfb5b7d28e0ddbb1272481f0874306c725b6de3a5f821e0671ce6bcR501
But it has recently been fixed in onnx/tensorflow-onnx#2225, and new versions have been released.
In the pipeline.py evaluate step, we could produce some printouts or plots of badly reconstructed cases for further debugging.
In the evaluation step, add true vs. predicted correlation plots for quantities like:
I found this interesting work by Giovanni on using FPGAs for PF in the L1 trigger: http://cds.cern.ch/record/2650974/files/CR2018_401.pdf and https://indico.cern.ch/event/587955/contributions/2935764/attachments/1686948/2713029/L1PF-chep-v2.pdf
Looks like https://github.com/calad0i/HGQ supports very fine grained quantization.
There's also https://github.com/fastmachinelearning/qonnx
One possible way to progress is to learn to group PF elements, based on their proximity, into smallish clusters of elements (miniblocks). We can also use the ground-truth clustering, in which elements form disjoint sets based on PFAlgo; ultimately, this is a semi-supervised clustering problem.
The inputs are elements: (nelem, nelem_feat), containing the raw set of all PF elements in the event, with the ground truth being element_block_id: (nelem), which associates elements to disjoint blocks or clusters. The sparse distance matrix between the elements induces an initial graph on the set.
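As a non-learned baseline, one could threshold the distance matrix, take connected components as predicted blocks, and measure the agreement with element_block_id; a sketch, assuming a dense distance matrix for simplicity:

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def block_agreement(dist, element_block_id, threshold):
    """Threshold the element-element distance matrix into a graph, take its
    connected components as predicted blocks, and return the fraction of
    element pairs on which predicted and true co-clustering agree.
    dist: (nelem, nelem) distances (assumed), element_block_id: (nelem,)."""
    adj = csr_matrix(dist < threshold)
    _, pred_block_id = connected_components(adj, directed=False)
    same_true = element_block_id[:, None] == element_block_id[None, :]
    same_pred = pred_block_id[:, None] == pred_block_id[None, :]
    return (same_true == same_pred).mean()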
In progress in this issue using GNNs: #2
Recently we implemented saving the basic optimizer state in https://github.com/jpata/particleflow/blob/master/mlpf/tfmodel/model_setup.py#L69.
This works fine for Adam with a constant learning rate, but it needs more work to:
The following error was picked up by CI:
eta = awkward.to_numpy(-np.log(tt, where=tt > 0))
File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/awkward/highlevel.py", line 1428, in __array_ufunc__
with ak._errors.OperationErrorContext(name, inputs, kwargs):
File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/awkward/_errors.py", line 67, in __exit__
self.handle_exception(exception_type, exception_value)
File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/awkward/_errors.py", line 82, in handle_exception
raise self.decorate_exception(cls, exception)
RecursionError: maximum recursion depth exceeded in comparison
This error occurred while calling
numpy.log.__call__(
<Array [0.994, 2.12, 3, ..., 2.04, 2.75, 2.92] type='111 * float32'>
where = <Array [True, True, True, ..., True, True] type='111 * bool'>
)
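A possible workaround (an untested assumption, not a confirmed fix): materialize the awkward array as numpy before the masked ufunc call, and pass out= so the masked-off entries are defined.

import awkward
import numpy as np

tt = awkward.Array(np.array([0.994, 2.12, 3.0], dtype=np.float32))  # stand-in for the array in the traceback
tt_np = awkward.to_numpy(tt)
# applying the ufunc to a plain numpy array avoids the awkward dispatch;
# out= ensures the entries where the mask is False are initialized
eta = -np.log(tt_np, where=tt_np > 0, out=np.zeros_like(tt_np))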
One of the fundamental questions is the computation of the loss function between two sets of particles of different multiplicity.
For example, given the true set, with no natural ordering:
id=211 pt=123.0 eta=... charge=...
id=130 pt=... ...
id=22 ...
and a predicted set
id=22 pt=29.0 eta=...
id=22 ...
how to compute a differentiable loss function? The loss must also be computationally efficient, as we have O(10k) particles in the true and predicted sets.
For the moment, we use the object condensation approach, assigning each true particle to a particular "seed" input element, thus converting the problem into a per-element multi-classification with an additional "no-particle" class.
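A sketch of the per-element target construction this implies, with hypothetical shapes and class conventions (class 0 = "no particle"):

import torch
import torch.nn.functional as F

def per_element_targets(n_elem, matched_elem_idx, true_cls, true_kin):
    """Assign each true particle to its 'seed' input element; unmatched elements
    get the no-particle class 0. matched_elem_idx: (n_true,) long tensor with the
    element index of each true particle; true_cls: (n_true,); true_kin: (n_true, nkin)."""
    target_cls = torch.zeros(n_elem, dtype=torch.long)
    target_kin = torch.zeros(n_elem, true_kin.shape[1])
    target_cls[matched_elem_idx] = true_cls
    target_kin[matched_elem_idx] = true_kin
    return target_cls, target_kin

# the set-to-set loss then factorizes into per-element terms, e.g.:
# loss = F.cross_entropy(pred_cls, target_cls) \
#      + F.mse_loss(pred_kin[target_cls > 0], target_kin[target_cls > 0])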
Other options to investigate include: