bdrum / cern-physics
My analysis workflow
For large datasets (e.g. CCUP8 or CCUP31) the scripts run slowly and consume all the memory.
Currently I use constructions like this for drawing:
```python
xs = np.linspace(x.min(), x.max(), 1000)  # common x grid for the fit curves

bw, = ax.plot(xs, fit.bw(x=xs, M=result.best_values['bw_M'], G=result.best_values['bw_G'],
                         amp=result.best_values['bw_amp']), 'g--', label=r'Breit-Wigner')
bkg, = plt.plot(xs, fit.polyn(xs, result.best_values['bckg_c0'], result.best_values['bckg_c1'],
                              result.best_values['bckg_c2'], result.best_values['bckg_c3'],
                              result.best_values['bckg_c4'], 0, 0, 0, result.best_values['bckg_amp']),
                'b--', label='Bckgr fit')
# plt.plot(bcg_x, bcg_y, 'r+', label='bkg data')
result.plot_fit(datafmt='ko', fitfmt='r-', numpoints=100, data_kws={'xerr': 0.022})
ax.set_title('')
_ = ax.set_ylabel(r'Counts per 20 MeV/$c^2$')
_ = ax.set_xlabel(r'M$(\pi^+\pi^-\pi^+\pi^-)$ (GeV/$c^2$)')
_ = plt.legend([dat, ft, bw, bkg], ['Data', 'Fit', 'Breit-Wigner', 'Bkg'])  # dat, ft come from earlier cells
```
Instead of this, I could combine the data into DataFrames and use the pandas plotting API for drawing.
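A sketch of that approach, reusing the fit objects from the snippet above (the DataFrame layout itself is my assumption):

```python
import numpy as np
import pandas as pd

# Put the fit curves into one DataFrame indexed by mass, then draw in one call
xs = np.linspace(x.min(), x.max(), 1000)
curves = pd.DataFrame({
    'Breit-Wigner': fit.bw(x=xs, M=result.best_values['bw_M'],
                           G=result.best_values['bw_G'],
                           amp=result.best_values['bw_amp']),
    'Bckgr fit': fit.polyn(xs, *[result.best_values[f'bckg_c{i}'] for i in range(5)],
                           0, 0, 0, result.best_values['bckg_amp']),
}, index=xs)
curves.plot(style=['g--', 'b--'])  # replaces the two separate plot calls
```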
Which run has more good events, and why?
Now I have the number of events with …
This means that I can get the cross section of the …
In fact this is not true, but why?
Obviously this should be flat, but with such a definition I can't confirm that:
Actually, if we take the std as …
Perhaps we have to normalize the data; in that case we can see a flat curve (last picture):
The same track can be triggered under different trigger names (CCUP2, CCUP9, and so on).
I have to estimate this influence.
I know that to get per-run luminosity data I have to work with the trending_merged_PbPb.root file. It can be found for each year, e.g. here.
A Dalitz plot should help to see the kinematics of the resonance.
Concerning the fit with two Breit-Wigners in #44:
How do I calculate the interference term of two resonances?
Here is some explanation about the interference of a BW with C
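For reference, the standard textbook construction: two resonances decaying into the same final state add coherently at the amplitude level, with a relative phase, so the intensity picks up a cross term. A sketch with the usual relativistic BW parametrization (not necessarily the form used in my fit):

```latex
BW_i(m) = \frac{1}{m^2 - M_i^2 + i M_i \Gamma_i}, \qquad
\left| a_1\, BW_1(m) + a_2\, e^{i\phi} BW_2(m) \right|^2
 = a_1^2 |BW_1|^2 + a_2^2 |BW_2|^2
 + \underbrace{2 a_1 a_2 \operatorname{Re}\!\left( e^{i\phi}\, BW_1 BW_2^* \right)}_{\text{interference term}}
```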
For the fitting in #44 I have the standard …
E.g. I've calculated the error as …
Here is a fragment from a lecture by Cambridge professor Mark Thomson on this question:
Some time ago I already obtained such a distribution here, but it has incorrect numbers. I need to reproduce these results; this will be the topic of the next presentation.
I faced a problem: after splitting events into triggered and untriggered, the original total event count differs from the sum of triggered plus untriggered events:
| ind | nTPC | total_cnt | trig_cnt | untrig_cnt | trig % | untrig % | trig_cnt + untrig_cnt | diff |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 19194 | 10349 | 8631 | 53.9 | 45.0 | 18980 | 214 |
| 1 | 1 | 17641 | 10138 | 7334 | 57.5 | 41.6 | 17472 | 169 |
| 2 | 2 | 16506 | 9572 | 6780 | 58.0 | 41.1 | 16352 | 154 |
| 3 | 3 | 12952 | 7583 | 5249 | 58.5 | 40.5 | 12832 | 120 |
| 4 | 4 | 5916 | 3506 | 2359 | 59.3 | 39.9 | 5865 | 51 |
This is definitely wrong.
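A minimal consistency check I'd run (toy sketch; the per-event trigger flag here is a made-up stand-in): for a single boolean flag the two counts must add up exactly, so a nonzero diff means the 'triggered' and 'untriggered' selections are not exact complements of each other.

```python
import numpy as np

# Toy stand-in for a per-event trigger flag (the real flag comes from the data)
rng = np.random.default_rng(0)
is_triggered = rng.random(19194) < 0.54

n_trig = np.sum(is_triggered)
n_untrig = np.sum(~is_triggered)
assert n_trig + n_untrig == len(is_triggered)  # always holds for one boolean flag

# In the real table trig_cnt + untrig_cnt < total_cnt, so the two selections
# must be built from different conditions: 'diff' counts events that pass
# neither (or both) of them, and those events are worth inspecting directly.
```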
In the slides of one presentation I've seen:
We have two types of UPC events: triggered by MUON, or triggered in the Central Barrel. In both cases the events are almost empty. The average size of AOD UPC events in the central barrel for 2015 PbPb is around 0.5 kB. We have some 30 M triggered events, which means some 15 GB of data. The purity of this sample is small, meaning that a simple pre-selection may reduce this number by a large factor (between 2 and 3).
In the past we have used a LEGO train to access the events and produced trees to perform local analyses. The code in the LEGO train is quite stable. Latchezar commented that this approach may not be optimal for such a small data set distributed over the whole PbPb data set, and recommended that we have a look at nanoAODs.
The code to select only those branches we use and create 'a nanoAOD' with them is ready. As it is so general, we do not expect the code to be changed frequently. Latchezar mentioned that there exist AOD trains in the central framework.
Could it be a good option to use them to produce UPC AODs?
In summary, what is the best way to run over the UPC triggers in the PbPb sample, so that we do not have to run over all the PbPb events to find our few gigabytes of data?
Valeri and I currently have discrepancies in the data.
He has processed fewer runs, but has more statistics than I do.
What is the reason for these discrepancies?
He'll provide me with a file containing event info, and we can find out where the difference comes from...
In continuation of #21, let's compare the mass for tracks and event …
A possible situation: I have 6 tracks, but only 4 of them satisfy the criteria (TPC + total charge).
What is the influence of such events?
I've computed the mass from the 4 tracks, but what if I calculate it from the pion subsystems?
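A minimal sketch of how I'd compute the invariant mass under the pion-mass hypothesis, for the full 4-track system or any subsystem (branch names follow those used elsewhere in these notes; the subsystem indexing is illustrative):

```python
import numpy as np

M_PI = 0.13957  # charged pion mass, GeV/c^2

def inv_mass(px, py, pz, m=M_PI):
    """Invariant mass of a set of tracks, each taken as a pion."""
    e = np.sqrt(px**2 + py**2 + pz**2 + m**2)  # energy per track
    return np.sqrt(e.sum()**2 - px.sum()**2 - py.sum()**2 - pz.sum()**2)

# Full 4-track mass vs. a 2-pion subsystem of the same event:
# m4pi = inv_mass(px, py, pz)
# m2pi = inv_mass(px[[0, 1]], py[[0, 1]], pz[[0, 1]])
```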
Plot the mass for 4-prong events with zero and non-zero total charge, for different pt intervals.
Plot a histogram of the ZDC energy distribution.
I expect to see something like |sinc|...
Also demonstrate that the basic part (say, the first half of the plot) contains events that form the mass spectrum, while the second half contains background events.
While solving #20 I found that I can't read the TObjArray of AliTriggerClass objects contained in the trending_merged_PbPb.root file using uproot4.
Is it really impossible, or is there some way to do it?
During one of the recent meetings we decided to update our group's TWiki page.
I have to add names (mine and pvn's) and also information about the ongoing analysis.
I've processed 125 runs, but only ~80 of them were recommended by the DPG.
I can compare the quality of those runs against the rest via the visible cross section.
Up to now, here I have just played with the data and obtained the correct pt, but now I have to get pt for different datasets for comparison, and so on.
ROOT has excellent histogram tools, with different visualization modes (error bars, markers) and histogram operations such as ratio or difference.
What I need is …
I've looked at many histogramming tools, including ROOT, e.g.:
scikit-hep/hist
scikit-hep/aghast
scikit-hep/boost-histogram
numpy.histogram
Perhaps many of these ticks could easily be covered by standard packages; e.g. the stats box could be provided by pd.describe (except overflow and integral).
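A sketch of that idea with toy data (the sample here is made up; only describe, numpy.histogram, and basic reductions are used):

```python
import numpy as np
import pandas as pd

pt = pd.Series(np.random.default_rng(1).exponential(0.3, 10_000))  # toy sample

print(pt.describe())  # count, mean, std, quantiles -- the usual stats box

# Overflow/underflow and the integral have to be added by hand:
lo, hi, nbins = 0.0, 2.0, 100
counts, edges = np.histogram(pt, bins=nbins, range=(lo, hi))
underflow = int(np.sum(pt < lo))
overflow = int(np.sum(pt > hi))
integral = int(counts.sum())  # entries inside the histogram range
```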
Estimate the influence of the triggers. The selected triggers are CCUP9, CCUP2, CCUP4, C1ZED, and CCUP11. What is the difference between them with respect to my events?
Until now I've used uproot4 for reading ROOT files, with a pandas DataFrame as output.
Now my file has grown from 500 MB to 1.5 GB, so the current version of the script doesn't work: it takes forever and consumes all the memory.
I've tested parsing into numpy arrays, and it works fine.
Now I have to move the main script from pandas to numpy.
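A sketch of the numpy-based reading (the file and tree names are assumptions; the branch names match those used elsewhere in these notes):

```python
import uproot  # uproot4 is imported as 'uproot'

with uproot.open('4prongs.root') as f:  # file name is hypothetical
    tree = f['events']                  # tree name is hypothetical
    # library='np' returns plain numpy arrays, skipping the pandas layer entirely
    arrays = tree.arrays(['T_Px', 'T_Py', 'T_Q'], library='np')

    # For files that don't fit in memory, reduce chunk by chunk instead:
    for chunk in tree.iterate(['T_Px', 'T_Q'], library='np', step_size='100 MB'):
        pass  # fill histograms / apply cuts here, keep only the reduced output
```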
Here I have a description of the SPD,
but I have to clarify the connection between the SPD and AliRoot.
What is ITSModule? Is (ITSModule / 1000000) the detector number? What does it mean?
What is fF0Modules in the SPD?
Right now, everything in my project directory is lumped together.
I'd prefer to split the notebooks and the C++ project, and the README should be tied to the main notebook.
The idea is that we can check …
I see plenty of duplicated tracks in my events.
They appear only in the case of the ITS selection:
```python
((data['T_HasPointOnITSLayer0']) + (data['T_HasPointOnITSLayer1'])) * \
    data['T_ITSRefit'] * (np.abs(data['T_NumberOfSigmaITSPion']) < 3)
```
When the TPC criteria are added, it looks like such tracks get thrown away (though I can't verify this)...
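A small probe I could use to quantify the duplicates, comparing momentum components within one event (exact equality here; a tolerance might be safer in practice):

```python
import numpy as np

def count_duplicates(px, py, pz):
    """Number of extra tracks whose (px, py, pz) coincides with another track."""
    p = np.stack([px, py, pz], axis=1)              # one row per track
    _, counts = np.unique(p, axis=0, return_counts=True)
    return int(np.sum(counts - 1))                  # 0 if all tracks differ

# Toy event where tracks 0 and 2 are identical:
px = np.array([0.1, -0.2, 0.1, 0.3])
py = np.array([0.4, 0.2, 0.4, -0.1])
pz = np.array([1.0, 0.5, 1.0, 0.2])
print(count_duplicates(px, py, pz))  # -> 1
```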
I've seen that relaxing the condition on the ITSNCls parameter from >3 to >=3 adds about 600 events to the statistics.
Are these events background or signal?
Ask Michal Broz to run a simulation with:

- Anchor run: the run number with the maximum number of 'good' events
- Period: LHC15o
- Process: rho(1720) → rho pi+ pi- → 4pi
- Number of generated events: 100 000

Generator code:
```cpp
AliGenerator* GeneratorCustom()
{
    AliGenCocktail* genCocktail = GeneratorCocktail("StarlightEvtGen");

    // STARlight: photoproduce the rho0' (channel 999 -> 4-pion final state)
    AliGenStarLight* genStarLight = (AliGenStarLight*)GeneratorStarlight();
    genStarLight->SetParameter("PROD_PID = 999 #Channel of interest (not relevant for photonuclear processes)");
    genStarLight->SetParameter("rho0PrimeMass = 1.720");
    genStarLight->SetParameter("rho0PrimeWidth = 0.249");
    genStarLight->SetParameter("W_MAX = 5.0");
    genCocktail->AddGenerator(genStarLight, "Starlight", 1.);

    // EvtGen: decay the rho0' (particle number 30113) via the RHOPRIME.RHOPIPI table
    AliGenSLEvtGen* genEvtGen = new AliGenSLEvtGen();
    TString decayTable = "RHOPRIME.RHOPIPI.DEC"; // default
    genEvtGen->SetUserDecayTable(gSystem->ExpandPathName(
        Form("$ALICE_ROOT/STARLIGHT/AliStarLight/DecayTables/%s", decayTable.Data())));
    genEvtGen->SetEvtGenPartNumber(30113);
    genEvtGen->SetPolarization(1); // transversal(1), longitudinal(-1), unpolarized(0)
    genCocktail->AddGenerator(genEvtGen, "EvtGen", 1.);

    return genCocktail;
}
```
Here is a question about the decay channel. It is now specified as 999, which corresponds to the 4-pion final state, but in STARlight one more channel exists: 913. It produces a final state of two pions, and it also adds the amplitudes (with interference) for photoproduction of the rho and of direct pi+pi-; but, again, the final state is only two pions.
Until now I have been studying only one decay mode: …
In the PDG I've also seen other interesting modes for …
What about …
Here is the ρ′ resonance. According to the PDG description:

> The best fit could be obtained with the hypothesis of ρ-like resonances at 1420 and 1770 MeV, with widths of about 250 MeV.
I can create a model from two BWs and fit the resonance with it.
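A minimal lmfit sketch of such a two-component model (the BW form and starting values are my assumptions; note this is an incoherent sum, so the interference question from #44 still applies):

```python
import numpy as np
from lmfit import Model

def bw(x, M, G, amp):
    """Simple (non-relativistic) Breit-Wigner; the real fit may use another form."""
    return amp * G / ((x - M)**2 + G**2 / 4)

# Two BW components with distinct prefixes, summed into one composite model
model = Model(bw, prefix='bw1_') + Model(bw, prefix='bw2_')
params = model.make_params(
    bw1_M=1.42, bw1_G=0.25, bw1_amp=1.0,   # rho-like state at 1420 MeV
    bw2_M=1.77, bw2_G=0.25, bw2_amp=1.0,   # rho-like state at 1770 MeV
)
# result = model.fit(counts, params, x=bin_centers)
# print(result.fit_report())
```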
I have many common scenarios for data visualization, e.g. …
How can I simplify the use of such patterns?
First of all, I know that in matplotlib I can specify default parameters for some functions. I can use this, e.g., to change the default histogram type to 'step', change the default colors, or apply the ROOT style as the default.
Then I can prepare functions that wrap the described common scenarios, with additional parameters like arrays of colors and labels, and extend the standard matplotlib arguments via *args and **kwargs. A sketch of both ideas follows.
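The sketch (rcParams defaults plus a thin wrapper; the function name and signature are my own invention):

```python
import matplotlib.pyplot as plt
import mplhep as hep

plt.style.use(hep.style.ROOT)      # ROOT-like look by default
plt.rcParams['hist.bins'] = 100    # default binning for plt.hist

def step_hist(ax, datasets, labels, colors, **kwargs):
    """Overlay samples as step histograms, with entry counts in the legend.

    Any extra matplotlib arguments (bins, range, linewidth, ...) pass through.
    """
    for sample, label, color in zip(datasets, labels, colors):
        ax.hist(sample, histtype='step', color=color,
                label=f'{label}; Entries {len(sample)}', **kwargs)
    ax.legend()

# usage:
# fig, ax = plt.subplots(figsize=(15, 7))
# step_hist(ax, [ptstd, pt4tpc], ['Q = 0', '4 TPC tracks'], ['black', 'red'],
#           bins=100, range=(0, 2))
```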
In my analysis, the same two criteria, expressed in different ways, give different results:
```python
import numpy as np
import matplotlib.pyplot as plt
import mplhep as hep

# Standard selection: TPC refit, cluster cuts, not ITS-standalone, TPC pion PID
selectOnlyStandard = data['T_TPCRefit'] * (data['T_TPCNCls'] > 50) * \
    (data['T_ITSNCls'] > 3) * \
    (~newT_ITSsa) * (np.abs(data['T_NumberOfSigmaTPCPion']) < 3)

# Events with exactly 4 good tracks...
GoodEventsStandard = np.argwhere(selectOnlyStandard.sum() == 4).flatten()
# ...and zero total charge
GoodEventsStandard = GoodEventsStandard[np.argwhere(
    data['T_Q'][selectOnlyStandard][GoodEventsStandard].sum() == 0).flatten()].flatten()

pxstd = data['T_Px'][selectOnlyStandard][GoodEventsStandard]
pystd = data['T_Py'][selectOnlyStandard][GoodEventsStandard]
ptstd = np.sqrt(pxstd.sum()**2 + pystd.sum()**2)  # pt of the 4-track system

plt.style.use(hep.style.ROOT)
fig = plt.figure(figsize=(15, 7))
ax = fig.add_axes([0, 0, 1, 1])
fig.suptitle('4pr pt standard criteria', fontsize=32)

counts, bins = np.histogram(ptstd, bins=100, range=(0, 2))
_ = ax.hist(ptstd, bins=bins, color='black', histtype='step',
            label=f'Q = 0; Entries {np.sum(counts)}')

# Second criterion: events selected only by the number of good TPC tracks
GoodEvents = GetGoodEvents(WithGoodNTpcTracks=4)
pt4tpc = GetPt(Draw=False)
counts, bins = np.histogram(pt4tpc, bins=100, range=(0, 2))
_ = ax.hist(pt4tpc, bins=bins, color='red', histtype='step',
            label=f'4 TPC tracks; Entries {np.sum(counts)}', linewidth=2)

plt.xlabel('Pt, GeV')
plt.ylabel('# events')
ax.legend()
```
gives:
The first criterion is a direct combination of the second one.
Obviously, in the direct criterion the condition on the total charge doesn't work:
we see the sum of zero-charge and non-zero-charge events.
The picture is for the second case, but the meaning is clear.
Right now I have a legacy README.md.
What if I used a GitHub Action to convert the current notebook to Markdown (e.g. via `jupyter nbconvert --to markdown`) and put the result into README.md?
I can get the ratio between the number of good events and the luminosity for each run.
Luminosity for the last 115 runs:
Alice/analysis/dev/pvn/lumi115runs.hist.root
The ratio curve should be constant, because the number of events equals the cross section multiplied by the luminosity.
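In formula form, the reasoning behind the flatness check:

```latex
N_{\text{ev}}(\text{run}) = \sigma_{\text{vis}} \cdot \mathcal{L}(\text{run})
\quad\Longrightarrow\quad
\frac{N_{\text{ev}}(\text{run})}{\mathcal{L}(\text{run})} = \sigma_{\text{vis}} = \text{const across runs}
```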
Right now, when I want to plot the mass with total charge zero and non-zero, I can't do it, because I don't have functions or objects that could change the state; I would have to repeat the code, which is a bad solution.
I have to parametrize the calculations and the states.
Also, keeping the code in Jupyter notebook cells makes the notebook oversized.
I think the good solution is to keep the scripts in a 'scripts' folder and use them as modules, e.g. as sketched below.
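A sketch of the parametrization (the module path and function name are my own; the selection logic mirrors the snippets above):

```python
# scripts/selection.py -- hypothetical module imported from the notebook
import numpy as np

def good_events(data, mask, n_tracks=4, zero_charge=True):
    """Events with exactly n_tracks passing `mask`, split by total charge.

    Parametrizing the charge removes the copy-pasted Q=0 / Q!=0 cells.
    """
    ev = np.argwhere(mask.sum() == n_tracks).flatten()
    q = data['T_Q'][mask][ev].sum()              # total charge per event
    keep = (q == 0) if zero_charge else (q != 0)
    return ev[np.argwhere(keep).flatten()]

# In the notebook:
# from scripts.selection import good_events
# zero_q    = good_events(data, selectOnlyStandard, zero_charge=True)
# nonzero_q = good_events(data, selectOnlyStandard, zero_charge=False)
```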
There is a claim that RooFit, which is based on Minuit, is better than lmfit, which is based on scipy.
Let's compare the quality of the fits.
In the selection script, PVN limits the number of good tracks (those that pass certain criteria) and fills predefined arrays.
I have chosen another strategy and fill vectors of dynamic size.
It would be interesting to compare the data losses in PVN's case.
Can I try to get the decay via the channel …
Can I represent the tree arrays as a matrix of shape (entries, ntracks)?
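A sketch with awkward-array (which uproot4 uses for jagged branches): pad or clip each event to a fixed track count, then convert to a rectangular numpy matrix. The example array is made up:

```python
import awkward as ak
import numpy as np

# Toy jagged branch: a variable-length list of track px per event
px = ak.Array([[0.1, -0.2, 0.3, -0.1], [0.5, -0.4], [0.2, 0.1, -0.3, 0.4]])

ntracks = 4
# Pad short events with None, clip long ones, fill the gaps with NaN
matrix = ak.to_numpy(ak.fill_none(ak.pad_none(px, ntracks, clip=True), np.nan))
print(matrix.shape)  # (entries, ntracks) == (3, 4)
```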
While working on #22 I realized that the problem with the structure is not only that it is oversized. Reinitializing variables after restarting the kernel can also be a headache, and jumping from cell to cell without tags is really slow.
I plan to create a package for my analysis based on Python modules, so that the notebooks just import the modules and call functions.
Figure out how to: …
Now I have one notebook, and this leads to problems after changing criteria for debugging, and so on.
I have to split it.