- A new Boosted Decision Tree (QBDT) method that incorporates systematic uncertainties into the training, for High Energy Physics
- reference: https://arxiv.org/abs/1810.08387 (accepted for publication in NIM A)
- An example in High Energy Physics, the search for Higgs -> tau tau gamma, is under the directory tautaugamma
- To use it, just git clone. No compilation is needed; only Python and ROOT are required.
- Ligang Xia, [email protected], [email protected]
- run training:
python runbdt.py trees0 0 0 10
- run testing:
python testbdt.py trees0
- After training is done, you can compare with the training and testing results I put in trees0/example/.
- run training:
python runbdt.py trees1 1 1 10
- run testing:
python testbdt.py trees1 1
- After training is done, you can compare with the training and testing results I put in trees1/example/.
- command format:
python runbdt.py dir Nsysts Switch Ntrees # see the explanation below
- dir: directory for storing training results
- Nsysts: number of systematics
- Switch: a boolean flag to switch systematics on or off in training. If Nsysts==0, Switch is always 0.
- Ntrees: number of trees used for training, 100 by default if not specified.
- command format:
python testbdt.py dir Nsysts Ntrees # see the explanation below
- dir: directory for storing training results
- Nsysts: number of systematics, 0 by default if not specified
- Ntrees: number of trees used for testing, 100 by default if not specified.
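The command-line contract above can be sketched in Python. This is only an illustration of the documented arguments and defaults (Nsysts defaults to 0, Ntrees to 100); the name `parse_testbdt_args` is hypothetical, and the real testbdt.py may parse its arguments differently:

```python
import sys

def parse_testbdt_args(argv):
    # Sketch of: python testbdt.py dir Nsysts Ntrees
    #   dir    : directory storing the training results (required)
    #   Nsysts : number of systematics, 0 by default if not specified
    #   Ntrees : number of trees used for testing, 100 by default if not specified
    if len(argv) < 2:
        raise SystemExit("usage: python testbdt.py dir [Nsysts] [Ntrees]")
    outdir = argv[1]
    nsysts = int(argv[2]) if len(argv) > 2 else 0
    ntrees = int(argv[3]) if len(argv) > 3 else 100
    return outdir, nsysts, ntrees

if __name__ == "__main__":
    print(parse_testbdt_args(sys.argv))
```

For example, `python testbdt.py trees0` would resolve to `("trees0", 0, 100)`, matching the defaults listed above.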
- qbdtmodule.py : defines the QBDT class (you do not need to touch it)
- runbdt.py : performs the training
- testbdt.py : tests and shows the performance
- root_dir : a directory storing ROOT files, including nominal and systematic ntuples
- share : a directory storing other scripts, maybe useful, but you do not need to touch it
- AtlasStyle : a config script for plotting, borrowed from ATLAS
- We have to add a branch in the ROOT file to tell the algorithm which events are used for training and which for testing. In the current example, this branch is "trainflag". It is generated randomly and uniformly between 0 and 1. Events with "trainflag<0.5" are used for training, while the other events are used for testing. I will try to split the events automatically in the future.
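The train/test split described above can be sketched as follows. This is a plain-Python illustration of the "trainflag" rule, not the repository's actual ROOT-based code; the helper names `assign_trainflag` and `split_events` are hypothetical:

```python
import random

def assign_trainflag(n_events, seed=0):
    # Mimic the "trainflag" branch: one uniform random number
    # in [0, 1) per event (seeded here for reproducibility).
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_events)]

def split_events(flags, threshold=0.5):
    # Events with trainflag < 0.5 go to training,
    # the remaining events go to testing.
    train = [i for i, f in enumerate(flags) if f < threshold]
    test = [i for i, f in enumerate(flags) if f >= threshold]
    return train, test
```

In the real workflow the flag lives as a branch in the ntuple, so each event keeps the same assignment across the nominal and systematic trees.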
- Add a function to split events for training and testing automatically.
- Try to improve the training speed. I find Python is slow; maybe I should consider rewriting it in C++.
I would like to thank my wife, who is always pushing me to publish PRL/Science/Nature papers, and whom I always disappoint ...