bdrum / cern-physics
My analysis workflow
For large datasets (e.g. CCUP8 or CCUP31) the scripts run slowly and consume all the memory.
Currently I use constructions like this for drawing:
```python
xs = np.linspace(x.min(), x.max(), 1000)  # common x grid for the fit curves

bw, = ax.plot(xs, fit.bw(x=xs, M=result.best_values['bw_M'], G=result.best_values['bw_G'],
                         amp=result.best_values['bw_amp']), 'g--', label=r'Breit-Wigner')
bkg, = plt.plot(xs, fit.polyn(xs, result.best_values['bckg_c0'], result.best_values['bckg_c1'],
                              result.best_values['bckg_c2'], result.best_values['bckg_c3'],
                              result.best_values['bckg_c4'], 0, 0, 0, result.best_values['bckg_amp']),
                'b--', label='Bckgr fit')
# plt.plot(bcg_x, bcg_y, 'r+', label='bkg data')
result.plot_fit(datafmt='ko', fitfmt='r-', numpoints=100, data_kws={'xerr': 0.022})
ax.set_title('')
_ = ax.set_ylabel(r'Counts per 20 MeV/$c^2$')
_ = ax.set_xlabel(r'M$(\pi^+\pi^-\pi^+\pi^-)$ (GeV/$c^2$)')
_ = plt.legend([dat, ft, bw, bkg], ['Data', 'Fit', 'Breit-Wigner', 'Bkg'])  # dat, ft come from earlier cells
```
Instead of this, I could combine the data into DataFrames and use the pandas plotting API for drawing.
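A sketch of that approach, reusing the fit objects from the snippet above (the DataFrame layout itself is my assumption):

```python
import numpy as np
import pandas as pd

# Put the fit curves into one DataFrame indexed by mass, then draw in one call
xs = np.linspace(x.min(), x.max(), 1000)
curves = pd.DataFrame({
    'Breit-Wigner': fit.bw(x=xs, M=result.best_values['bw_M'],
                           G=result.best_values['bw_G'],
                           amp=result.best_values['bw_amp']),
    'Bckgr fit': fit.polyn(xs, *[result.best_values[f'bckg_c{i}'] for i in range(5)],
                           0, 0, 0, result.best_values['bckg_amp']),
}, index=xs)
curves.plot(style=['g--', 'b--'])  # replaces the two separate plot calls
```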
Which run has more good events, and why?
Now I have the number of events with …
This means that I can get the cross section of the …
In fact this is not true, but why?
Obviously this should be flat, but with such a definition I can't confirm that:
Actually, if we take the std as …
Perhaps we have to normalize the data; in that case we can see a flat curve (last picture):
The same track can be triggered under different trigger names (CCUP2, CCUP9, and so on).
I have to estimate this influence.
I know that to get per-run luminosity data I have to work with the trending_merged_PbPb.root file. It can be found for each year, e.g. here.
A Dalitz plot should help to see the kinematics of the resonance.
Concerning the fit with two Breit-Wigners in #44:
How do I calculate the interference term of two resonances?
Here is some explanation about the interference of a BW with C
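For reference, the standard textbook construction: two resonances decaying into the same final state add coherently at the amplitude level, with a relative phase, so the intensity picks up a cross term. A sketch with the usual relativistic BW parametrization (not necessarily the form used in my fit):

```latex
BW_i(m) = \frac{1}{m^2 - M_i^2 + i M_i \Gamma_i}, \qquad
\left| a_1\, BW_1(m) + a_2\, e^{i\phi} BW_2(m) \right|^2
 = a_1^2 |BW_1|^2 + a_2^2 |BW_2|^2
 + \underbrace{2 a_1 a_2 \operatorname{Re}\!\left( e^{i\phi}\, BW_1 BW_2^* \right)}_{\text{interference term}}
```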
For the fitting in #44 I have the standard …
E.g. I've calculated the error as …
Here is a fragment from a lecture by Cambridge professor Mark Thomson on this question:
Some time ago I already obtained such a distribution here, but it has incorrect numbers. I need to reproduce these results; this will be the topic of the next presentation.
I faced a problem: after splitting events into triggered and untriggered, the original total event count differs from the sum of triggered plus untriggered events:
| ind | nTPC | total_cnt | trig_cnt | untrig_cnt | trig % | untrig % | trig_cnt + untrig_cnt | diff |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 19194 | 10349 | 8631 | 53.9 | 45.0 | 18980 | 214 |
| 1 | 1 | 17641 | 10138 | 7334 | 57.5 | 41.6 | 17472 | 169 |
| 2 | 2 | 16506 | 9572 | 6780 | 58.0 | 41.1 | 16352 | 154 |
| 3 | 3 | 12952 | 7583 | 5249 | 58.5 | 40.5 | 12832 | 120 |
| 4 | 4 | 5916 | 3506 | 2359 | 59.3 | 39.9 | 5865 | 51 |
This is definitely wrong.
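A minimal consistency check I'd run (toy sketch; the per-event trigger flag here is a made-up stand-in): for a single boolean flag the two counts must add up exactly, so a nonzero diff means the 'triggered' and 'untriggered' selections are not exact complements of each other.

```python
import numpy as np

# Toy stand-in for a per-event trigger flag (the real flag comes from the data)
rng = np.random.default_rng(0)
is_triggered = rng.random(19194) < 0.54

n_trig = np.sum(is_triggered)
n_untrig = np.sum(~is_triggered)
assert n_trig + n_untrig == len(is_triggered)  # always holds for one boolean flag

# In the real table trig_cnt + untrig_cnt < total_cnt, so the two selections
# must be built from different conditions: 'diff' counts events that pass
# neither (or both) of them, and those events are worth inspecting directly.
```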
In the slides of one presentation I've seen:
We have two types of UPC events: triggered by MUON, or triggered in the Central Barrel. In both cases the events are almost empty. The average size of AOD UPC events in the central barrel for 2015 PbPb is around 0.5 kB. We have some 30 M triggered events, which means some 15 GB of data. The purity of this sample is small, meaning that a simple pre-selection may reduce this number by a large factor (between 2 and 3).
In the past we have used a LEGO train to access the events and produced trees to perform local analyses. The code in the LEGO train is quite stable. Latchezar commented that this approach may not be optimal for such a small data set distributed over the whole PbPb data set, and recommended that we have a look at nanoAODs.
The code to select only those branches we use and create 'a nanoAOD' with them is ready. As it is so general, we do not expect the code to be changed frequently. Latchezar mentioned that there exist AOD trains in the central framework.
Could it be a good option to use them to produce UPC AODs?
In summary, what is the best way to run over the UPC triggers in the PbPb sample, so that we do not have to run over all the PbPb events to find our few gigabytes of data?
Valeri and I currently have discrepancies in the data.
He has processed fewer runs, but has more statistics than I do.
What is the reason for these discrepancies?
He'll provide me with a file containing event info, and we can find out where the difference comes from...
In continuation of #21, let's compare the mass for tracks and event …
A possible situation: I have 6 tracks, but only 4 of them satisfy the criteria (TPC + total charge).
What is the influence of such events?
I've computed the mass from the 4 tracks, but what if I calculate it from the pion subsystems?
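A minimal sketch of how I'd compute the invariant mass under the pion-mass hypothesis, for the full 4-track system or any subsystem (branch names follow those used elsewhere in these notes; the subsystem indexing is illustrative):

```python
import numpy as np

M_PI = 0.13957  # charged pion mass, GeV/c^2

def inv_mass(px, py, pz, m=M_PI):
    """Invariant mass of a set of tracks, each taken as a pion."""
    e = np.sqrt(px**2 + py**2 + pz**2 + m**2)  # energy per track
    return np.sqrt(e.sum()**2 - px.sum()**2 - py.sum()**2 - pz.sum()**2)

# Full 4-track mass vs. a 2-pion subsystem of the same event:
# m4pi = inv_mass(px, py, pz)
# m2pi = inv_mass(px[[0, 1]], py[[0, 1]], pz[[0, 1]])
```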
Plot the mass for 4-prong events with zero and non-zero total charge, for different pt intervals.
Plot a histogram of the ZDC energy distribution.
I expect to see something like |sinc|...
Also demonstrate that the basic part (say, the first half of the plot) contains events that form the mass spectrum, while the second half contains background events.
While solving #20 I found that I can't read the TObjArray of AliTriggerClass objects contained in the trending_merged_PbPb.root file using uproot4.
Is it really impossible, or is there some way to do it?
During one of the recent meetings we decided to update our group's TWiki page.
I have to add names (mine and pvn's) and also information about the ongoing analysis.
I've processed 125 runs, but only ~80 of them were recommended by the DPG.
I can compare the quality of those runs against the rest via the visible cross section.
Up to now, here I have just played with the data and obtained the correct pt, but now I have to get pt for different datasets for comparison, and so on.
ROOT has excellent histogram tools, with different visualization modes (error bars, markers) and histogram operations such as ratio or difference.
What I need is …
I've looked at many histogramming tools, including ROOT, e.g.:
scikit-hep/hist
scikit-hep/aghast
scikit-hep/boost-histogram
numpy.histogram
Perhaps many of these ticks could easily be covered by standard packages; e.g. the stats box could be provided by pd.describe (except overflow and integral).
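A sketch of that idea with toy data (the sample here is made up; only describe, numpy.histogram, and basic reductions are used):

```python
import numpy as np
import pandas as pd

pt = pd.Series(np.random.default_rng(1).exponential(0.3, 10_000))  # toy sample

print(pt.describe())  # count, mean, std, quantiles -- the usual stats box

# Overflow/underflow and the integral have to be added by hand:
lo, hi, nbins = 0.0, 2.0, 100
counts, edges = np.histogram(pt, bins=nbins, range=(lo, hi))
underflow = int(np.sum(pt < lo))
overflow = int(np.sum(pt > hi))
integral = int(counts.sum())  # entries inside the histogram range
```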
Estimate the influence of the triggers. The selected triggers are CCUP9, CCUP2, CCUP4, C1ZED, and CCUP11. What is the difference between them with respect to my events?
Until now I've used uproot4 for reading ROOT files, with a pandas DataFrame as output.
Now my file has grown from 500 MB to 1.5 GB, so the current version of the script doesn't work: it takes forever and consumes all the memory.
I've tested parsing into numpy arrays, and it works fine.
Now I have to move the main script from pandas to numpy.
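A sketch of the numpy-based reading (the file and tree names are assumptions; the branch names match those used elsewhere in these notes):

```python
import uproot  # uproot4 is imported as 'uproot'

with uproot.open('4prongs.root') as f:  # file name is hypothetical
    tree = f['events']                  # tree name is hypothetical
    # library='np' returns plain numpy arrays, skipping the pandas layer entirely
    arrays = tree.arrays(['T_Px', 'T_Py', 'T_Q'], library='np')

    # For files that don't fit in memory, reduce chunk by chunk instead:
    for chunk in tree.iterate(['T_Px', 'T_Q'], library='np', step_size='100 MB'):
        pass  # fill histograms / apply cuts here, keep only the reduced output
```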
Here I have a description of the SPD,
but I have to clarify the connection between the SPD and AliRoot.
What is ITSModule? Is (ITSModule / 1000000) the detector number? What does it mean?
What is fF0Modules in the SPD?
Right now, everything in my project directory is lumped together.
I'd prefer to split the notebooks and the C++ project, and the README should be tied to the main notebook.
The idea is that we can check …
I see plenty of duplicated tracks in my events.
They appear only in the case of the ITS selection:
```python
((data['T_HasPointOnITSLayer0']) + (data['T_HasPointOnITSLayer1'])) * \
    data['T_ITSRefit'] * (np.abs(data['T_NumberOfSigmaITSPion']) < 3)
```
When the TPC criteria are added, it looks like such tracks get thrown away (though I can't verify this)...
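A small probe I could use to quantify the duplicates, comparing momentum components within one event (exact equality here; a tolerance might be safer in practice):

```python
import numpy as np

def count_duplicates(px, py, pz):
    """Number of extra tracks whose (px, py, pz) coincides with another track."""
    p = np.stack([px, py, pz], axis=1)              # one row per track
    _, counts = np.unique(p, axis=0, return_counts=True)
    return int(np.sum(counts - 1))                  # 0 if all tracks differ

# Toy event where tracks 0 and 2 are identical:
px = np.array([0.1, -0.2, 0.1, 0.3])
py = np.array([0.4, 0.2, 0.4, -0.1])
pz = np.array([1.0, 0.5, 1.0, 0.2])
print(count_duplicates(px, py, pz))  # -> 1
```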
I've seen that relaxing the condition on the ITSNCls parameter from >3 to >=3 adds about 600 events to the statistics.
Are these events background or signal?
Ask Michal Broz to run a simulation with:

- Anchor run: the run number with the maximum number of 'good' events
- Period: LHC15o
- Process: rho(1720) → rho pi+ pi- → 4pi
- Number of generated events: 100 000

Generator code:
```cpp
AliGenerator* GeneratorCustom()
{
    AliGenCocktail* genCocktail = GeneratorCocktail("StarlightEvtGen");

    // STARlight: photoproduce the rho0' (channel 999 -> 4-pion final state)
    AliGenStarLight* genStarLight = (AliGenStarLight*)GeneratorStarlight();
    genStarLight->SetParameter("PROD_PID = 999 #Channel of interest (not relevant for photonuclear processes)");
    genStarLight->SetParameter("rho0PrimeMass = 1.720");
    genStarLight->SetParameter("rho0PrimeWidth = 0.249");
    genStarLight->SetParameter("W_MAX = 5.0");
    genCocktail->AddGenerator(genStarLight, "Starlight", 1.);

    // EvtGen: decay the rho0' (particle number 30113) via the RHOPRIME.RHOPIPI table
    AliGenSLEvtGen* genEvtGen = new AliGenSLEvtGen();
    TString decayTable = "RHOPRIME.RHOPIPI.DEC"; // default
    genEvtGen->SetUserDecayTable(gSystem->ExpandPathName(
        Form("$ALICE_ROOT/STARLIGHT/AliStarLight/DecayTables/%s", decayTable.Data())));
    genEvtGen->SetEvtGenPartNumber(30113);
    genEvtGen->SetPolarization(1); // transversal(1), longitudinal(-1), unpolarized(0)
    genCocktail->AddGenerator(genEvtGen, "EvtGen", 1.);

    return genCocktail;
}
```
Here is a question about the decay channel. It is now specified as 999, which corresponds to the 4-pion final state, but in STARlight one more channel exists: 913. It produces a final state of two pions, and it also adds the amplitudes (with interference) for photoproduction of the rho and of direct pi+pi-; but, again, the final state is only two pions.
Until now I have been studying only one decay mode: …
In the PDG I've also seen other interesting modes for …
What about …
Here is the ρ′ resonance. According to the PDG description:

> The best fit could be obtained with the hypothesis of ρ-like resonances at 1420 and 1770 MeV, with widths of about 250 MeV.
I can create a model from two BWs and fit the resonance with it.
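A minimal lmfit sketch of such a two-component model (the BW form and starting values are my assumptions; note this is an incoherent sum, so the interference question from #44 still applies):

```python
import numpy as np
from lmfit import Model

def bw(x, M, G, amp):
    """Simple (non-relativistic) Breit-Wigner; the real fit may use another form."""
    return amp * G / ((x - M)**2 + G**2 / 4)

# Two BW components with distinct prefixes, summed into one composite model
model = Model(bw, prefix='bw1_') + Model(bw, prefix='bw2_')
params = model.make_params(
    bw1_M=1.42, bw1_G=0.25, bw1_amp=1.0,   # rho-like state at 1420 MeV
    bw2_M=1.77, bw2_G=0.25, bw2_amp=1.0,   # rho-like state at 1770 MeV
)
# result = model.fit(counts, params, x=bin_centers)
# print(result.fit_report())
```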
I have many common scenarios for data visualization, e.g. …
How can I simplify the use of such patterns?
First of all, I know that in matplotlib I can specify default parameters for some functions. I can use this, e.g., to change the default histogram type to 'step', change the default colors, or apply the ROOT style as the default.
Then I can prepare functions that wrap the described common scenarios, with additional parameters like arrays of colors and labels, and extend the standard matplotlib arguments via *args and **kwargs. A sketch of both ideas follows.
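The sketch (rcParams defaults plus a thin wrapper; the function name and signature are my own invention):

```python
import matplotlib.pyplot as plt
import mplhep as hep

plt.style.use(hep.style.ROOT)      # ROOT-like look by default
plt.rcParams['hist.bins'] = 100    # default binning for plt.hist

def step_hist(ax, datasets, labels, colors, **kwargs):
    """Overlay samples as step histograms, with entry counts in the legend.

    Any extra matplotlib arguments (bins, range, linewidth, ...) pass through.
    """
    for sample, label, color in zip(datasets, labels, colors):
        ax.hist(sample, histtype='step', color=color,
                label=f'{label}; Entries {len(sample)}', **kwargs)
    ax.legend()

# usage:
# fig, ax = plt.subplots(figsize=(15, 7))
# step_hist(ax, [ptstd, pt4tpc], ['Q = 0', '4 TPC tracks'], ['black', 'red'],
#           bins=100, range=(0, 2))
```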
In my analysis, the same two criteria, expressed in different ways, give different results:
```python
import numpy as np
import matplotlib.pyplot as plt
import mplhep as hep

# Standard selection: TPC refit, cluster cuts, not ITS-standalone, TPC pion PID
selectOnlyStandard = data['T_TPCRefit'] * (data['T_TPCNCls'] > 50) * \
    (data['T_ITSNCls'] > 3) * \
    (~newT_ITSsa) * (np.abs(data['T_NumberOfSigmaTPCPion']) < 3)

# Events with exactly 4 good tracks...
GoodEventsStandard = np.argwhere(selectOnlyStandard.sum() == 4).flatten()
# ...and zero total charge
GoodEventsStandard = GoodEventsStandard[np.argwhere(
    data['T_Q'][selectOnlyStandard][GoodEventsStandard].sum() == 0).flatten()].flatten()

pxstd = data['T_Px'][selectOnlyStandard][GoodEventsStandard]
pystd = data['T_Py'][selectOnlyStandard][GoodEventsStandard]
ptstd = np.sqrt(pxstd.sum()**2 + pystd.sum()**2)  # pt of the 4-track system

plt.style.use(hep.style.ROOT)
fig = plt.figure(figsize=(15, 7))
ax = fig.add_axes([0, 0, 1, 1])
fig.suptitle('4pr pt standard criteria', fontsize=32)

counts, bins = np.histogram(ptstd, bins=100, range=(0, 2))
_ = ax.hist(ptstd, bins=bins, color='black', histtype='step',
            label=f'Q = 0; Entries {np.sum(counts)}')

# Second criterion: events selected only by the number of good TPC tracks
GoodEvents = GetGoodEvents(WithGoodNTpcTracks=4)
pt4tpc = GetPt(Draw=False)
counts, bins = np.histogram(pt4tpc, bins=100, range=(0, 2))
_ = ax.hist(pt4tpc, bins=bins, color='red', histtype='step',
            label=f'4 TPC tracks; Entries {np.sum(counts)}', linewidth=2)

plt.xlabel('Pt, GeV')
plt.ylabel('# events')
ax.legend()
```
gives:
The first criterion is a direct combination of the second one.
Obviously, in the direct criterion the condition on the total charge doesn't work:
we see the sum of zero-charge and non-zero-charge events.
The picture is for the second case, but the meaning is clear.
Right now I have a legacy README.md.
What if I used a GitHub Action to convert the current notebook to Markdown (e.g. via `jupyter nbconvert --to markdown`) and put the result into README.md?
I can get the ratio between the number of good events and the luminosity for each run.
Luminosity for the last 115 runs:
Alice/analysis/dev/pvn/lumi115runs.hist.root
The ratio curve should be constant, because the number of events equals the cross section multiplied by the luminosity.
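In formula form, the reasoning behind the flatness check:

```latex
N_{\text{ev}}(\text{run}) = \sigma_{\text{vis}} \cdot \mathcal{L}(\text{run})
\quad\Longrightarrow\quad
\frac{N_{\text{ev}}(\text{run})}{\mathcal{L}(\text{run})} = \sigma_{\text{vis}} = \text{const across runs}
```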
Right now, when I want to plot the mass with total charge zero and non-zero, I can't do it, because I don't have functions or objects that could change the state; I would have to repeat the code, which is a bad solution.
I have to parametrize the calculations and the states.
Also, keeping the code in Jupyter notebook cells makes the notebook oversized.
I think the good solution is to keep the scripts in a 'scripts' folder and use them as modules, e.g. as sketched below.
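A sketch of the parametrization (the module path and function name are my own; the selection logic mirrors the snippets above):

```python
# scripts/selection.py -- hypothetical module imported from the notebook
import numpy as np

def good_events(data, mask, n_tracks=4, zero_charge=True):
    """Events with exactly n_tracks passing `mask`, split by total charge.

    Parametrizing the charge removes the copy-pasted Q=0 / Q!=0 cells.
    """
    ev = np.argwhere(mask.sum() == n_tracks).flatten()
    q = data['T_Q'][mask][ev].sum()              # total charge per event
    keep = (q == 0) if zero_charge else (q != 0)
    return ev[np.argwhere(keep).flatten()]

# In the notebook:
# from scripts.selection import good_events
# zero_q    = good_events(data, selectOnlyStandard, zero_charge=True)
# nonzero_q = good_events(data, selectOnlyStandard, zero_charge=False)
```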
There is a claim that RooFit, which is based on Minuit, is better than lmfit, which is based on scipy.
Let's compare the quality of the fits.
In the selection script, PVN limits the number of good tracks (those that pass certain criteria) and fills predefined arrays.
I have chosen another strategy and fill vectors of dynamic size.
It would be interesting to compare the data losses in PVN's case.
Can I try to get the decay via the channel …
Can I represent the tree arrays as a matrix of shape (entries, ntracks)?
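A sketch with awkward-array (which uproot4 uses for jagged branches): pad or clip each event to a fixed track count, then convert to a rectangular numpy matrix. The example array is made up:

```python
import awkward as ak
import numpy as np

# Toy jagged branch: a variable-length list of track px per event
px = ak.Array([[0.1, -0.2, 0.3, -0.1], [0.5, -0.4], [0.2, 0.1, -0.3, 0.4]])

ntracks = 4
# Pad short events with None, clip long ones, fill the gaps with NaN
matrix = ak.to_numpy(ak.fill_none(ak.pad_none(px, ntracks, clip=True), np.nan))
print(matrix.shape)  # (entries, ntracks) == (3, 4)
```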
While working on #22 I realized that the problem with the structure is not only that it is oversized. Reinitializing variables after restarting the kernel can also be a headache, and jumping from cell to cell without tags is really slow.
I plan to create a package for my analysis based on Python modules, so that the notebooks just import the modules and call functions.
Figure out how to: …
Now I have one notebook, and this leads to problems after changing criteria for debugging, and so on.
I have to split it.