Repository for the HH->bbWW analysis
For installation instructions, visit: https://twiki.cern.ch/twiki/bin/viewauth/CMS/TTHtautauFor13TeV-Tallinn
I now have write access to the original HME repository, but I need to find some time to implement a solution that enables the verbose output only if an appropriate flag has been set.
The problem right now is that the log files become extremely large: one batch of analysis jobs generates up to 15 GB of log files. I've used the following patch to minimize the excessive logging, but it needs a better solution than that:
diff --git a/src/heavyMassEstimator.cc b/src/heavyMassEstimator.cc
index bdc306d..2276201 100755
--- a/src/heavyMassEstimator.cc
+++ b/src/heavyMassEstimator.cc
@@ -400,24 +400,24 @@ heavyMassEstimator::runheavyMassEstimator(){//should not include any gen level i
*htoWW_lorentz = *onshellW_lorentz+*offshellW_lorentz;
*h2tohh_lorentz = *htoWW_lorentz+*htoBB_lorentz;
if (h2tohh_lorentz->M()<245 or h2tohh_lorentz->M()>3800) {
- std::cerr <<" heavyMassEstimator h2 mass is too small, or too large, M_h " <<h2tohh_lorentz->M() << std::endl;
- std::cerr <<" gen nu eta "<< eta_gen <<" nu phi "<< phi_gen << std::endl;
- std::cerr <<" from heavyMassEstimator mu_onshell (px,py,pz, E)= ("<< mu_onshellW_lorentz->Px()<<", "<< mu_onshellW_lorentz->Py()<<", "<< mu_onshellW_lorentz->Pz()<<", "<< mu_onshellW_lorentz->E() <<")"<< std::endl;
- std::cerr <<" from heavyMassEstimator mu_offshell (px,py,pz, E)= ("<< mu_offshellW_lorentz->Px()<<", "<< mu_offshellW_lorentz->Py()<<", "<< mu_offshellW_lorentz->Pz()<<", "<< mu_offshellW_lorentz->E() <<")"<< std::endl;
- std::cerr <<" from heavyMassEstimator nu_onshell (px,py,pz, E)= ("<< nu_onshellW_lorentz->Px()<<", "<< nu_onshellW_lorentz->Py()<<", "<< nu_onshellW_lorentz->Pz()<<", "<< nu_onshellW_lorentz->E() <<")"<< std::endl;
- std::cerr <<" from heavyMassEstimator nu_offshell (px,py,pz, E)= ("<< nu_offshellW_lorentz->Px()<<", "<< nu_offshellW_lorentz->Py()<<", "<< nu_offshellW_lorentz->Pz()<<", "<< nu_offshellW_lorentz->E() <<")"<< std::endl;
- std::cerr <<" from heavyMassEstimator htoBB, mass "<< htoBB_lorentz->M()<<"(px,py,pz, E)= ("<<htoBB_lorentz->Px()<<", "<< htoBB_lorentz->Py() <<", "<< htoBB_lorentz->Pz() <<", "<< htoBB_lorentz->E()<<")" <<std::endl;
+// std::cerr <<" heavyMassEstimator h2 mass is too small, or too large, M_h " <<h2tohh_lorentz->M() << std::endl;
+// std::cerr <<" gen nu eta "<< eta_gen <<" nu phi "<< phi_gen << std::endl;
+// std::cerr <<" from heavyMassEstimator mu_onshell (px,py,pz, E)= ("<< mu_onshellW_lorentz->Px()<<", "<< mu_onshellW_lorentz->Py()<<", "<< mu_onshellW_lorentz->Pz()<<", "<< mu_onshellW_lorentz->E() <<")"<< std::endl;
+// std::cerr <<" from heavyMassEstimator mu_offshell (px,py,pz, E)= ("<< mu_offshellW_lorentz->Px()<<", "<< mu_offshellW_lorentz->Py()<<", "<< mu_offshellW_lorentz->Pz()<<", "<< mu_offshellW_lorentz->E() <<")"<< std::endl;
+// std::cerr <<" from heavyMassEstimator nu_onshell (px,py,pz, E)= ("<< nu_onshellW_lorentz->Px()<<", "<< nu_onshellW_lorentz->Py()<<", "<< nu_onshellW_lorentz->Pz()<<", "<< nu_onshellW_lorentz->E() <<")"<< std::endl;
+// std::cerr <<" from heavyMassEstimator nu_offshell (px,py,pz, E)= ("<< nu_offshellW_lorentz->Px()<<", "<< nu_offshellW_lorentz->Py()<<", "<< nu_offshellW_lorentz->Pz()<<", "<< nu_offshellW_lorentz->E() <<")"<< std::endl;
+// std::cerr <<" from heavyMassEstimator htoBB, mass "<< htoBB_lorentz->M()<<"(px,py,pz, E)= ("<<htoBB_lorentz->Px()<<", "<< htoBB_lorentz->Py() <<", "<< htoBB_lorentz->Pz() <<", "<< htoBB_lorentz->E()<<")" <<std::endl;
if (simulation){
- std::cerr <<"following is pure gen level infromation " << std::endl;
- std::cerr <<" nu1 px "<<nu1_lorentz_true->Px() << " py " <<nu1_lorentz_true->Py() << " pt "<< nu1_lorentz_true->Pt()
- << " eta "<<nu1_lorentz_true->Eta() << " phi "<< nu1_lorentz_true->Phi() << std::endl;
- std::cerr <<" nu2 px "<<nu2_lorentz_true->Px() << " py " <<nu2_lorentz_true->Py() << " pt "<< nu2_lorentz_true->Pt()
- << " eta "<<nu2_lorentz_true->Eta() << " phi "<< nu2_lorentz_true->Phi() << std::endl;
- std::cerr <<" onshellW mass "<< onshellW_lorentz_true->M(); onshellW_lorentz_true->Print();
- std::cerr <<"offshellW mass " <<offshellW_lorentz_true->M(); offshellW_lorentz_true->Print();
- std::cerr <<" htoWW mass "<< htoWW_lorentz_true->M(); htoWW_lorentz_true->Print();
- std::cerr <<" htoBB mass "<< htoBB_lorentz_true->M(); htoBB_lorentz_true->Print();
- std::cerr <<" h2tohh, pz " <<h2tohh_lorentz_true->Pz() << " Mass " << h2tohh_lorentz_true->M() << std::endl;
+// std::cerr <<"following is pure gen level infromation " << std::endl;
+// std::cerr <<" nu1 px "<<nu1_lorentz_true->Px() << " py " <<nu1_lorentz_true->Py() << " pt "<< nu1_lorentz_true->Pt()
+// << " eta "<<nu1_lorentz_true->Eta() << " phi "<< nu1_lorentz_true->Phi() << std::endl;
+// std::cerr <<" nu2 px "<<nu2_lorentz_true->Px() << " py " <<nu2_lorentz_true->Py() << " pt "<< nu2_lorentz_true->Pt()
+// << " eta "<<nu2_lorentz_true->Eta() << " phi "<< nu2_lorentz_true->Phi() << std::endl;
+// std::cerr <<" onshellW mass "<< onshellW_lorentz_true->M(); onshellW_lorentz_true->Print();
+// std::cerr <<"offshellW mass " <<offshellW_lorentz_true->M(); offshellW_lorentz_true->Print();
+// std::cerr <<" htoWW mass "<< htoWW_lorentz_true->M(); htoWW_lorentz_true->Print();
+// std::cerr <<" htoBB mass "<< htoBB_lorentz_true->M(); htoBB_lorentz_true->Print();
+// std::cerr <<" h2tohh, pz " <<h2tohh_lorentz_true->Pz() << " Mass " << h2tohh_lorentz_true->M() << std::endl;
}
continue;
@@ -1352,7 +1352,7 @@ heavyMassEstimator::bjetsCorrection(){
b2lorentz = *hme_b2jet_lorentz;
}
else {
- std::cout <<"wired b1jet is not jet with larger pt "<< std::endl;
+ //std::cout <<"wired b1jet is not jet with larger pt "<< std::endl;
b1lorentz = *hme_b2jet_lorentz;
b2lorentz = *hme_b1jet_lorentz;
}
@@ -1381,7 +1381,7 @@ heavyMassEstimator::bjetsCorrection(){
b1rescalefactor = rescalec1;
b2rescalefactor = rescalec2;
}else{
- std::cout <<"wired b1jet is not jet with larger pt "<< std::endl;
+ //std::cout <<"wired b1jet is not jet with larger pt "<< std::endl;
b2rescalefactor = rescalec1;
b1rescalefactor = rescalec2;
}
I'll disable them by default until we have some control plots to show.
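A minimal sketch of the direction I have in mind, instead of commenting the printouts out (the verbose_ member and setVerbosity() are assumptions, not part of the current heavyMassEstimator interface):

#include <iostream>

// Sketch only: illustrative class, not the actual heavyMassEstimator.
class heavyMassEstimatorSketch
{
public:
  // Off by default, so batch jobs stay quiet; enable explicitly when debugging.
  void setVerbosity(bool verbose) { verbose_ = verbose; }

  void checkH2Mass(double mass) const
  {
    if (mass < 245. || mass > 3800.)
    {
      if (verbose_)
        std::cerr << " heavyMassEstimator h2 mass is too small, or too large, M_h "
                  << mass << std::endl;
      // ... skip this iteration, as in the original code ...
    }
  }

private:
  bool verbose_ = false;
};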
It's very likely that we're going to relax the signal lepton definition in the DL analysis so that it is much looser than our definition of preselected leptons. In order to ensure that the SL and DL analyses don't overlap, we need to veto DL events in the SL analysis using the new lepton definition in the veto. The problem is that our Ntuples store only the leptons that pass the preselection cuts, which are presumably tighter than the proposed lepton definition. This effectively means that all Ntuples have to be post-processed again if we want to include the missing leptons that would pass the new lepton definition but currently don't because of the preselection cuts.
For historical reasons, not all NanoAOD Ntuples have the LHEPart_status branch. Initially (and officially), the NanoAOD FW saves only the status = 1 particles to the Ntuple. At some point I modified our CMSSW fork such that it also saves the status flag of the LHE particles to the Ntuple, because some of the LHE Higgses had status = 2. All Ntuples produced since this change have the branch, but the Ntuples produced before it don't. See HEP-KBFI/tth-htt#99 (comment) for more context.
This in turn brings us to the root cause: HHGenKinematicsHistManager uses these LHE particles to compute the di-Higgs mass and cos(theta*) variables. This is completely redundant, because we already have these variables pre-computed in post-production and read them at the analysis level (EventInfo::gen_mHH and EventInfo::gen_cosThetaStar).
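For illustration, a minimal sketch of the intended simplification (the struct and function names are placeholders; whether gen_mHH / gen_cosThetaStar are data members or accessors of EventInfo may differ):

#include <TH1D.h>

// Sketch only: a stand-in for EventInfo with the pre-computed generator-level variables.
struct EventInfoSketch
{
  double gen_mHH;          // pre-computed in post-production
  double gen_cosThetaStar; // pre-computed in post-production
};

// Fill the gen-level kinematics directly from the pre-computed values,
// instead of re-deriving them from LHEPart (which needs the LHEPart_status branch).
void fillGenKinematics(const EventInfoSketch & eventInfo, double evtWeight,
                       TH1D & histogram_mHH, TH1D & histogram_cosThetaStar)
{
  histogram_mHH.Fill(eventInfo.gen_mHH, evtWeight);
  histogram_cosThetaStar.Fill(eventInfo.gen_cosThetaStar, evtWeight);
}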
Recent modifications
Not directly related, but somewhat relevant, as this is how the Wjj boosted subcategories are defined.
Before it slips my mind: we need to include preselected VBF jets when computing the PU jet ID SF. It's still a bit of an open question which jet collection we should use:
I think that option 2 is the most accurate choice here, because it would be more or less on the same footing as the cuts that the central jets are required to pass when entering the SF calculation.
We need to process the samples marked with [$] (modulo ggZH) first. The only complication here is the cross sections of the new DY samples.
Somehow missed that some of the DL samples contain HH->bbZZ events.
Description in the title.
So that we don't lose track of the required changes:
This feature is already disabled in the bb1l1tau channel, but not in TT1lctrl and Wctrl. I'll fix it later today.
edit: I meant bb1l channel.
As discussed in the Mattermost channel, we lose a bit of signal yield (e.g. 14% in SL at 500 GeV) if we apply a flat SF in order to account for leptonic tau decays in the simulation. Instead, we should scale down only the W->tau nu events, because the softer leptons from tau decays are less likely to pass our analysis cuts. In other words, instead of scaling everything down, we should scale down only the portion of our signal that is less likely to contribute to our SR.
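For illustration, a minimal sketch of what this could look like at the event-weight level (the gen-particle struct, the matching logic, and the SF value are placeholders, not numbers or code from the analysis):

#include <cstdlib>
#include <vector>

// Sketch only: minimal gen-particle record with the information used below.
struct GenParticle { int pdgId; int motherPdgId; };

// Illustrative check for a W -> tau nu decay: a tau whose mother is a W boson.
bool isWToTauNuDecay(const std::vector<GenParticle> & genParticles)
{
  for (const GenParticle & p : genParticles)
    if (std::abs(p.pdgId) == 15 && std::abs(p.motherPdgId) == 24)
      return true;
  return false;
}

// Per-event weight: scale down only the W -> tau nu part of the signal,
// instead of applying a flat SF to the whole sample.
double getLeptonicTauDecayWeight(const std::vector<GenParticle> & genParticles)
{
  const double sf_tauDecay = 0.9; // placeholder value, not from the analysis
  return isWToTauNuDecay(genParticles) ? sf_tauDecay : 1.;
}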
Action items:
Uniformize the H process (TTH, TH, VH) namings/decays with respect to the ttH analysis, so that the systematics are applied correctly.
Adapt the HH names so that it is easier to deal with the BR uncertainties in the CombineHarvester step.
Plus a few extra ttbar samples in 2016. Creating the issue in order to keep track of the progress.
The plan is to switch to "fit2", described on slide 12 of this presentation, because it's more accurate at higher energy scales (which we may reach in resonant searches).
I think that the copyHistograms step makes the files that enter the hadd stage fairly small, which allows us to increase the number of inputs per hadd job significantly. At the moment, the hadd stage generates too many jobs, each of which consumes a mere 250 MB of memory, while the memory cap is at 2 GB.
Require at least one fakeable lepton in the skimming, with no limits on the multiplicity of hadronic taus. This may have the potential to shorten the time needed to produce the datacards.
The non-resonant powheg NLO HH samples and the non-resonant VBF ones (anomalous couplings included) need to be normalized to the sample cross section from the HXSWG for interpretation (not necessarily the one from McM).
I could find a table of them, but I think the safest solution is to re-do the samples dictionary so that the normalizations are updated consistently (if that is automatic).
I have a feeling that I didn't update this repository in parallel with the multilepton repository when migrating to the latest HH reweighting scheme. At least lines like these suggest it: hh-bbww/bin/analyze2_hh_bb1l.cc, lines 596 to 600 at commit 1036129.
The second issue, reported by @saswatinandan, is that the BDT Ntuples are not filled with the reweighting weights. The two problems might be connected.
Almost forgot about this bug, as it was buried in a barrage of Skype messages. Opened the issue just to keep track of things; it should be easy to implement, though.
Or, if it's not too much work, configure addSystFakeRates to yield the relative uncertainties automatically.
Creating another issue in order to keep track of the changes that are related to systematic uncertainties. Currently, on the table are:
Some of these items will be discussed in tomorrow's HH meeting.
The aim is to reduce the memory consumption by limiting the number of histograms that we book in the analysis, which is proportional to the number of sources of systematic uncertainties that we consider in an analysis job. (Creating the thread to keep track of the progress.)
Step 1)
Add the MVA for the 4-jet assignment:
Run the BDT mode and:
--> for the non-res case with the sum of BM samples reweighted!
[1] That is very much inspired by the HTT-tagger.
==========================
Step 2)
Implement the MVA in the format in which you have it available here.
As it is now I am assuming
You can, of course, change this logic if you want to test the result at other points (change the naming) and/or have a unique MVA for the whole phase space.
This result is then saved in a dictionary here, which lists the MVA variables to be saved in the evt histograms in the subcategories.
[2] I do suggest that you try one simple round of card making (see here) with the working example pointed to there, to appreciate how MVA/prepareDatacards naming conventions that make sense (e.g. keeping the MVA target, or mass-range target, in the name) help you in this step.
=====================
Step 3)
For realistic results in terms of limits, do the rebinning exercise described here.
PS: Apart from this prototype implementation, I still left the dumb test of the TF loading separate, here and here, so that testing the TF compatibility is detached from this exercise.
Since running HME is quite an expensive task, it's somewhat prohibitive to run it in the analysis job. Creating a dedicated workflow for HME analogous to the MEM one is out of the question as well, because it has a lot of overhead in terms of human time: setting up the workflow takes a substantial amount of time, and the bookkeeping becomes more complex because we would have to deal with multiple sets of Ntuples (without HME and MEM, with HME but without MEM, and with both HME and MEM).
Compared to MEM, however, HME is relatively fast. Considering that we need to compute MEM and HME in the same channel, it makes sense to move the HME computation to the same place where MEM is computed, so that both are computed in one go. The only downside concerns the shape uncertainties: if the JES or JER are varied, the estimated HH mass computed by HME may change. There are multiple options to handle this:
The MEM is implemented with the 3rd option in mind, but in practice we effectively use the 1st option and run MEM only on the central values, ignoring the effects of shape systematics in order to save some computing time. I think it's worth it to:
The task itself can be broken down into the following steps:
Move the HME computation from analyze_hh_bb2l.cc to addMEM_hh_bb2l.cc;
Analogously to MEMOutputReader_hh_bb2l and MEMOutputWriter_hh_bb2l, create classes that read and write HME masses from/to a TTree. The HME branch names should at the very least encode the systematics name (as is the case with MEM). The writer class should be used in the HME+MEM jobs and the reader class in the bb2l analysis jobs;
The testing & validation should be done using the sync Ntuple.
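For illustration, a minimal sketch of what the HME writer side could look like (the class name, branch naming, and interface below are assumptions modelled on the description above, not the actual MEMOutputWriter_hh_bb2l code):

#include <TTree.h>
#include <map>
#include <string>
#include <vector>

// Sketch only: books one HME mass branch per systematic, so that the branch name
// encodes the systematics name (mirroring how the MEM output is handled).
class HMEOutputWriterSketch
{
public:
  explicit HMEOutputWriterSketch(const std::vector<std::string> & systematics)
    : systematics_(systematics)
  {}

  void setBranches(TTree * tree)
  {
    for (const std::string & sys : systematics_)
    {
      const std::string branchName = "hme_mass_" + sys; // e.g. hme_mass_central, hme_mass_JESUp
      hmeMass_[sys] = -1.;
      tree->Branch(branchName.c_str(), &hmeMass_[sys], (branchName + "/D").c_str());
    }
  }

  // Called once per event and systematic by the HME+MEM job.
  void fill(const std::string & sys, double hmeMass) { hmeMass_[sys] = hmeMass; }

private:
  std::vector<std::string> systematics_;
  std::map<std::string, double> hmeMass_;
};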
Currently, none of the skimmed samples are usable because they're skimmed by the multiplicity of leptons that pass the "old" ttH lepton definition. We should probably consider skimming SL and DL separately, depending on its effectiveness.
As opposed to dR-based gen-matching and object cleaning, we should probably consider using the index-based approach, as we already do in the ttH analysis. This is more consistent with the current/future analyses that are migrating to NanoAOD Ntuples.
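For illustration, a minimal sketch of index-based gen-matching with NanoAOD-style branches (Muon_genPartIdx follows the standard NanoAOD convention; the surrounding structs are placeholders):

#include <vector>

// Sketch only: minimal stand-ins for the gen and reco objects.
struct GenParticle { float pt, eta, phi, mass; int pdgId; };

struct RecoMuon
{
  float pt, eta, phi;
  int genPartIdx; // filled from the NanoAOD Muon_genPartIdx branch; -1 if unmatched
};

// Resolve the gen match by index instead of looping over gen particles and matching by dR.
const GenParticle * getGenMatch(const RecoMuon & muon,
                                const std::vector<GenParticle> & genParticles)
{
  if (muon.genPartIdx < 0 || muon.genPartIdx >= static_cast<int>(genParticles.size()))
    return nullptr; // no gen match stored for this muon
  return &genParticles[muon.genPartIdx];
}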
We do not need two trainings.
By now, the half of the HH non-resonant LO events (i.e. the events that need reweighting) that is not used in the application is the odd-numbered one.
In the BDT mode it loads all of them.
That convention is followed in this commit and this commit.
One needs to pay attention
PS: since no training is done on the HH non-resonant NLO samples, they do not need to be added to this logic.
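For illustration, a minimal sketch of the event-parity split described above (the function names and the exact event-number source are assumptions; only the odd/even convention is taken from the text):

#include <cstdint>

// Sketch only: split the HH non-resonant LO events (the ones that need reweighting)
// by event-number parity; the odd half is kept out of the application.
bool useForApplication(std::uint64_t eventNumber) { return eventNumber % 2 == 0; }
bool useForTraining(std::uint64_t eventNumber)    { return eventNumber % 2 == 1; }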
The problem is that embedded Python doesn't quite work when loading tensorflow, and the problem appears to be coming from tensorflow itself. Thus, the only realistic options that could work for us are:
In the first solution, we would have to spawn the script in a separate process because we want to construct the input model only once per analysis job. Constructing the model for every selected event is prohibitive due to timing constraints.
The second option is disfavored because it requires new software (or modifications to the existing software) for managing Ntuples and jobs outside the CMSSW environment. It also adds another step between Ntuple production and analysis, and requires more human time because of the additional Ntuple management.
to bookkeeping,
--> we are doing that to compare at the datacard level with Florian/Agni
Recent modification, related to the making of datacards for the above-mentioned comparison.
Following ttH, the subcategories for datacards are implemented in such a way that they only appear in the evt folder.
After yesterday's discussion it seems that we're still missing some systematic uncertainties that other groups have implemented:
The first two points are easy enough; the third requires some investigation. The fourth item is the most challenging, because our Ntuples simply lack the information needed to compute these SFs. Will need to discuss what our options are, because I don't think we can just ignore them since they rank relatively high in terms of impact.
Even if this is not strictly necessary, I'm adding here a description of how to add more plots, for when additional plots beyond the signal-extraction ones are requested (see this issue).
As promised, it takes three lines; I will exemplify them by following the making of one plot,
and a fourth line to book the prepareDatacards making.
@saswatinandan, when you get it, close the issue
Looks like the functionality of applying PU jet ID cuts was implemented only in the central AK4 jet selector class, but not in the b-jet selector class. I think we should set up the classes such that the b-jet selector class inherits from the central jet selector class. We should've done this a long time ago, as there hasn't been any real advantage in keeping them separate.
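For illustration, a minimal sketch of the intended class layout (the class names, cut values, and PU jet ID encoding below are placeholders, not the actual selector classes in the repository):

#include <cmath>

// Sketch only: hypothetical jet record and selectors.
struct RecoJet { double pt, eta, btagScore; int puId; };

// Central AK4 jet selector: kinematic cuts plus the PU jet ID requirement.
class JetSelectorAK4
{
public:
  virtual ~JetSelectorAK4() = default;
  virtual bool operator()(const RecoJet & jet) const
  {
    return jet.pt > minPt_ && std::fabs(jet.eta) < maxAbsEta_ && (jet.puId & puIdWP_);
  }
protected:
  double minPt_ = 25.;     // placeholder threshold
  double maxAbsEta_ = 2.4; // placeholder threshold
  int puIdWP_ = 4;         // placeholder working-point bit
};

// b-jet selector: inherits all central-jet cuts (incl. PU jet ID) and adds the b-tag.
class JetSelectorBtag : public JetSelectorAK4
{
public:
  bool operator()(const RecoJet & jet) const override
  {
    return JetSelectorAK4::operator()(jet) && jet.btagScore > minBtagScore_;
  }
private:
  double minBtagScore_ = 0.3; // placeholder working point
};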
The plan is to update single-electron and single-muon trigger SFs in 2016. Changing the list of HLT paths is also on the table.
And not book so many histograms if we want to do signal extraction only.