
TreeMaker

Instructions

The following installation instructions assume the user wants to process Summer16 MC (miniAOD v2 format) or Run2016 data (23Sep ReReco or 03Feb ReMiniAOD). (Spring16 MC and Run2016 PromptReco data are also supported, but only in limited cases.)

cmsrel CMSSW_8_0_28
cd CMSSW_8_0_28/src/
cmsenv
git cms-init
git remote add btv-cmssw https://github.com/cms-btv-pog/cmssw.git
git fetch btv-cmssw refs/tags/BoostedDoubleSVTaggerV4-WithWeightFiles-v1_from-CMSSW_8_0_21
git cms-merge-topic -u cms-btv-pog:BoostedDoubleSVTaggerV4-WithWeightFiles-v1_from-CMSSW_8_0_21
git cms-merge-topic -u kpedro88:storeJERFactor8028
git cms-merge-topic -u kpedro88:badMuonFilters_80X_v2_RA2
git cms-merge-topic -u kpedro88:FixMetSigData8028
git clone [email protected]:cms-jet/JetToolbox.git JMEAnalysis/JetToolbox -b jetToolbox_80X_V3
git clone [email protected]:TreeMaker/TreeMaker.git -b Run2
scram b -j 8
cd TreeMaker/Production/test

Several predefined scenarios are available for ease of production. These scenarios define various sample-dependent parameters, including:
global tag, collection tag name, generator info, fastsim, signal, JSON file, JEC file, residual JECs, era.
The available scenarios are:

  1. Spring16Fastsig: for Spring16 miniAOD 25ns FastSim MC (signal scans)
  2. Spring16Pmssm: for Spring16 miniAOD 25ns PMSSM MC scan (signal)
  3. Summer16: for Summer16 miniAOD 25ns MC
  4. Summer16sig: for Summer16 miniAOD 25ns MC (signal)
  5. 2016H: for 2016H PromptReco 25ns data
  6. 2016ReReco23Sep: for 2016 ReReco (23Sep) 25ns data, periods B-G
  7. 2016ReMiniAOD03Feb: for 2016 ReMiniAOD (03Feb) 25ns data, periods B-H

Unit Tests (Interactive Runs)

Several predefined run commands (at least one for each scenario) are defined in a script called unitTest.py. It has several parameters:

  • test: number of the test to run (default=-1, displays all tests)
  • name: name of the output ROOT and log files for the test (default="", each test has its own default name)
  • run: run the selected test (default=False)
  • numevents: how many events to run (default=100)
  • shell: how to format the command (default="tcsh"; "bash" is also supported)

A few examples of how to run the script:

  1. To see all tests:
python unitTest.py
  2. To run test 2:
python unitTest.py test=2 run=True

Note that all of the background estimation processes (and some processes necessary to estimate systematic uncertainties) are turned ON by default in runMakeTreeFromMiniAOD_cfg.py.

Submit Production to Condor

Condor submission on the LPC batch system is supported. Support for submission to the global pool via CMS Connect is preliminary. The scripts in this section use the Condor Python bindings, which require /usr/lib64/python2.6/site-packages to be in the PYTHONPATH environment variable. For full functionality, the Python packages paramiko and python-gssapi are also required.

To reduce the size of the CMSSW tarball sent to the Condor worker node, there are a few standard directories that can be marked as cached using the script cache_all.sh:

./cache_all.sh

The test/condorSub directory contains all of the relevant scripts. If you copy this to another directory and run the looper.py script, it will submit one job per file to condor for all of the relevant samples. Example:

cp -r condorSub myProduction
cd myProduction
python looper.py -o root://cmseos.fnal.gov//store/user/YOURUSERNAME/myProduction -s

Consult the --help option to view the available options. looper.py can also check for jobs which were completely removed from the queue and make a resubmission list.

The jobs open the files over xrootd, so looper.py will check that you have a valid grid proxy. It will also make a tarball of the current CMSSW working directory to send to the worker node. If you want to reuse an existing CMSSW tarball (no important changes have been made since the last time you submitted jobs), add the argument -k.

When the python file list for a given sample (usually data) is updated, it may be desirable to submit jobs only for the new files. The input dictionary format for looper.py optionally allows a (non-zero) starting number to be placed after the sample name. To get the number of the first new job, just use len(readFiles) from the python file list before updating it.
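For illustration, a minimal sketch of obtaining that starting number (the file paths below are made up; in practice readFiles is the list defined in the sample's "_cff.py" file, imported from that module):

```python
# Toy stand-in for the readFiles list in a sample's "_cff.py" file,
# as it looks BEFORE the update.
readFiles = [
    "/store/data/Run2016B/sample/file_1.root",
    "/store/data/Run2016B/sample/file_2.root",
    "/store/data/Run2016B/sample/file_3.root",
]

# The first new job number is the length of the pre-update list
start = len(readFiles)
print(start)
```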

When submitting jobs for prompt data, each data file will be checked to see if the run it contains is certified in the corresponding JSON file. The JSON file is taken by default from the scenario; an alternative can be specified with the --json option, e.g. if the JSON is updated and you want to submit jobs only for the newly certified runs. (Use compareJSON.py to subtract one JSON list from another, following this twiki.)
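Conceptually, the subtraction keeps only the newly certified data. A toy sketch of the idea (compareJSON.py performs this properly at the lumisection level; this illustration works only at whole-run granularity, and the run numbers are invented):

```python
# Certified-lumi JSON format: {run number: [[first LS, last LS], ...]}
new_json = {"273158": [[1, 100]], "274094": [[1, 50]]}  # updated JSON
old_json = {"273158": [[1, 100]]}                       # previously used JSON

# Keep only runs present in the new JSON but absent from the old one
newly_certified = {run: ls for run, ls in new_json.items() if run not in old_json}
print(newly_certified)
```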

Sometimes, a few jobs might fail, e.g. due to xrootd connectivity problems. Failed jobs are placed in "held" status in the Condor queue. This enables the job output and parameters to be examined. The job can be examined and resubmitted using the script manageJobs.py. Consult the --help option for the script to view the available functions.

The scripts looper.py and manageJobs.py can be configured by a file called .tmconfig (using Python ConfigParser syntax). The config parser first looks for .tmconfig in the directory where the script is located (typically Production/test/condorSub), and then looks in the user's $HOME directory. Currently, the allowed values are:

[common]
user = ...
[looper]
input = ...
[manage]
dir = ...
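As a sketch of how such a file is read (the values below are hypothetical; the scripts use the standard ConfigParser module, shown here via its Python 3 name):

```python
import configparser  # "ConfigParser" in the Python 2 environment the scripts target

# Hypothetical .tmconfig contents
sample = """
[common]
user = myusername
[looper]
input = root://cmseos.fnal.gov//store/user/myusername/myProduction
[manage]
dir = myProduction
"""

parser = configparser.ConfigParser()
parser.read_string(sample)
user = parser.get("common", "user")
print(user)
```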

Calculate Integrated Luminosity

Scripts are available to calculate the integrated luminosity from data ntuples (produced with TreeMaker):

python lumiSummary.py
python calcLumi.py

The script lumiSummary.py loops over a list of data samples (by default, a list of Run2015C and Run2015D samples) and creates a JSON file for each sample consisting of the lumisections which were actually processed. Run python lumiSummary.py --help to see the available options. (This script is based on the CRAB3 client job report scripts.)

The resulting JSON file can be run through brilcalc using calcLumi.py to determine the integrated luminosity for the dataset. Run python calcLumi.py --help to see the available options. (NB: this only works on lxplus with brilcalc installed.)

Calculate Pileup Corrections

A script is available to calculate the pileup corrections for MC:

python pileupCorr.py

A ROOT file containing the data, MC, and data/MC histograms (with uncertainty variations) is produced. Run python pileupCorr.py --help to see the available options.

Info for New Samples

The script get_mcm.py can search the McM database for given samples (with wildcard support) to discern the status of the sample (whether it has finished running), the generator cross section, and the full dataset path for the sample ("/X/Y/Z" format). Command line options exist to specify campaign names and other information, which can be viewed with the --help option. An example dictionary of samples and extensions to check can be found at dict_mcm.py. This script requires cern-get-sso-cookie to access the McM database, which is installed on lxplus and cmslpc.

The script get_py.py will automatically create the "_cff.py" python file containing the list of ROOT files for samples specified in a Python ordered dictionary, e.g. dict.py (enabled with -p). For MC samples, it can also automatically generate the appropriate configuration line to add the sample to getWeightProducer_cff.py, if the cross section is specified (enabled with -w). The script can also check to see which sites (if any) have 100% dataset presence for the sample (enabled with -s). (You may also need export SSL_CERT_DIR='/etc/pki/tls/certs:/etc/grid-security/certificates' (bash) or setenv SSL_CERT_DIR '/etc/pki/tls/certs:/etc/grid-security/certificates' (tcsh) to avoid the error SSL: CERTIFICATE_VERIFY_FAILED from urllib2.)

Before running the script for the first time, some environment settings are necessary:

source /cvmfs/cms.cern.ch/crab3/crab.csh

To run the script:

python get_py.py -d dict.py [options]

To check for new samples, use the above script get_mcm.py or query DAS, e.g.:

das_client.py --query="dataset=/*/RunIISpring16MiniAOD*/MINIAODSIM" --limit=0 |& less

Samples with Negative Weight Events

Samples produced at NLO by amcatnlo have events with negative weights, which must be handled correctly. To get the effective number of events used to weight the sample, there is a multi-step process.

Step 1: Get the "_cff.py" files, without generating WeightProducer lines (assuming samples are listed in dictNLO.py).

python get_py.py dict=dictNLO.py wp=False

Step 2: Run NeffFinder, a simple analyzer which calculates the effective number of events for a sample. The analyzer should be submitted as a Condor batch job for each sample (assuming samples are listed in looperNeff.sh), because the xrootd I/O bottleneck is prohibitive when running interactively. Be sure to sanity-check the results, as xrootd failures can cause jobs to terminate early.

cp -r condorSubNeff myNeff
cd myNeff
./looperNeff.sh
(after jobs are finished)
python getResults.py

Step 3: Update dictNLO.py with the newly-obtained Neff values and generate WeightProducer lines.

python get_py.py dict=dictNLO.py py=False
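Conceptually, the Neff value obtained in Step 2 accounts for the cancellation between positive- and negative-weight events. A toy sketch of the counting (not the NeffFinder code itself), assuming the generator weights are effectively +/-1:

```python
# Toy per-event generator weight signs for an NLO sample
weights = [1, 1, -1, 1, 1, 1, -1, 1]

n_pos = sum(1 for w in weights if w > 0)
n_neg = sum(1 for w in weights if w < 0)

# Effective number of events: positive-weight minus negative-weight events
n_eff = n_pos - n_neg
print(n_eff)
```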

Options

Brief explanation of the options in makeTree.py:

  • scenario: the scenario name, in case of special requirements (default="")
  • inputFilesConfig: name of the python file with a list of ROOT files for a sample, used for Condor production (automatically appended with "_cff") (default="")
  • nstart: first file to use in above file list (default=0)
  • nfiles: number of files to use in above file list, -1 includes all files (default=-1)
  • dataset: direct list of input files (alternative to above 3 parameters) (default=[])
  • numevents: number of input events to process, -1 processes all events (default=-1)
  • outfile: name of the ROOT output file that will be created by the TFileService (appended with outfilesuff below) (default="test_run")
  • outfilesuff: suffix to append to outfile above (default="_RA2AnalysisTree")
  • treename: name of output ROOT TTree (default="PreSelection")
  • lostlepton: switch to enable the lost lepton background estimation processes (default=True)
  • hadtau: switch to enable the hadronic tau background estimation processes (default=True)
  • hadtaurecluster: switch to enable the hadronic tau reclustering to include jets with pT < 10 GeV, options: 0 = never, 1 = only TTJets/WJets MC, 2 = all MC, 3 = always (default=1)
  • doZinv: switch to enable the Z->invisible background estimation processes (default=True)
  • systematics: switch to enable JEC- and JER-related systematics (default=True)
  • semivisible: switch to enable variables for semi-visible jets (default=False)
  • doPDFs: switch to enable the storage of PDF weights and scale variation weights from LHEEventInfo (default=True)
    The scale variations stored are: [mur=1, muf=1], [mur=1, muf=2], [mur=1, muf=0.5], [mur=2, muf=1], [mur=2, muf=2], [mur=2, muf=0.5], [mur=0.5, muf=1], [mur=0.5, muf=2], [mur=0.5, muf=0.5]
  • debugtracks: store information for all PF candidates in every event (default=False) (use with caution, increases run time and output size by ~10x)
  • applybaseline: switch to apply the baseline HT selection (default=False)

The following parameters take their default values from the specified scenario:
  • globaltag: global tag for CMSSW database conditions (ref. FrontierConditions)
  • tagname: tag name for collections that can have different tags for data or MC
  • geninfo: switch to enable use of generator information, should only be used for MC
  • fastsim: switch to enable special settings for SUSY signal scans produced with FastSim
  • pmssm: switch to enable special settings for pMSSM signal scans
  • signal: switch to enable assessment of signal systematics (currently unused)
  • jsonfile: name of JSON file to apply to data
  • jecfile: name of a database file from which to get JECs
  • jerfile: name of a database file from which to get JERs
  • residual: switch to enable residual JECs for data
  • era: CMS detector era for the dataset
  • redir: xrootd redirector or storage element address (default="root://cmsxrootd.fnal.gov/") (fastsim default="root://cmseos.fnal.gov/")
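The nine scale variations stored by the doPDFs option above can be written compactly as (mur, muf) scale factor pairs:

```python
# (mur, muf) renormalization/factorization scale factor pairs, in the order
# listed under the doPDFs option
scale_variations = [
    (1, 1), (1, 2), (1, 0.5),
    (2, 1), (2, 2), (2, 0.5),
    (0.5, 1), (0.5, 2), (0.5, 0.5),
]
print(len(scale_variations))
```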

Extra options in runMakeTreeFromMiniAOD_cfg.py:

  • reportfreq: frequency of CMSSW log output (default=1000)
  • dump: equivalent to edmConfigDump, but accounts for all command-line settings; exits without running (default=False)
  • mp: enable igprof hooks for memory profiling (default=False)
  • threads: run in multithreaded mode w/ specified number of threads (default=1)
  • streams: run w/ specified number of streams (default=0 -> streams=threads)
  • tmi: enable TimeMemoryInfo for simple profiling (default=False)

Contributors

kpedro88, awhitbeck, adrager, sbein, hatakeyamak, jbradmil, rpatelcern, vhegde91, bmahakud, kencall, nhanvtran
