Tutorials and tips that may come in useful

This is still a massive work in progress. At some point I will copy out some of the notes I have made which have been useful and put them here.

Setting up ROOT on Windows

My short tutorial on configuring ROOT on Windows.

ROOT

Using log scales and stacking in ROOT

To create a stack of histograms you typically use the THStack class. To create a stack of TGraphs you use the TMultiGraph class, which works in a similar way.

THStack* stack = new THStack("Name","Title");

stack->Add(histogram_name);

Setting up a log scale is easy enough in ROOT. Assuming you have a TCanvas, you apply the following:

TCanvas* canvas = new TCanvas("Name","Title",10,10,900,900);

canvas->SetLogy(); //You can also do SetLogx();

stack->SetMinimum(1);

This final line is important: when stacking histograms, the canvas will typically default to showing only 3 or 4 orders of magnitude down from the largest maximum. This is a bit strange, but it means that if you have a large background plot and a tiny signal plot, the signal may not appear straight away from just stacking and setting the scale to log. Setting the minimum to 1 forces the scale to go down to zero (ie log(1) = 0), while the maximum remains the largest value from the histograms. This was particularly useful for me when I was plotting the scaled number of signal and background events passing an MVA cut: to begin with the background events dwarf the signal, and only as the cut progresses towards the end of the x scale does the background get cut down to the point where there are similar numbers of signal and background events.
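
Putting the pieces together, a minimal stacked plot with a log scale might look like this (the histogram names h_background and h_signal are made up):

THStack* stack = new THStack("stack","Stacked histograms");
stack->Add(h_background); //Add the largest contribution first
stack->Add(h_signal);

TCanvas* canvas = new TCanvas("canvas","Stacked histograms",10,10,900,900);
canvas->SetLogy();
stack->SetMinimum(1); //Force the log axis down to 1 (ie log(1) = 0)
stack->Draw("HIST"); //Draw the histograms stacked on top of each other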

Macro for stacking TGraphs

This is a simple little macro, but it took a little bit of time to work out all the options to draw the lines properly. The alternative would be to use Glen's script, which reads a text file of numbers and plots multiple TGraphs on the same canvas, which is slightly different to this. My macro takes two already-made TGraphs, scales them to whatever luminosity is required, and plots them on the same canvas in the same way THStack does.

int tgraphstack(){

    TFile* f = new TFile("f_hist.root", "READ");

    TGraph* signal = (TGraph*)f->Get("scaled signal;1");
    TGraph* background = (TGraph*)f->Get("scaled background;1");
    TMultiGraph* multiG = new TMultiGraph("multiG", "Evolution of number of events with BDT cut scaled to 20fb^{-1}");

    TCanvas* c = new TCanvas("c","c", 2000, 2000);
    c->SetLogy();
    //Note RangeAxis takes (xmin, ymin, xmax, ymax)
    c->RangeAxis(-1, 1, 1, 1000000);
    multiG->Add(background);
    multiG->Add(signal);
    //Interesting behaviour where TMultiGraph just sets the minimum y axis to 3 orders of magnitude less than the maximum point being plotted.
    //Using SetMinimum forces the minimum point to be fixed
    //Here use 1 because log(1) = 0
    multiG->SetMinimum(1);
    //"A" draws the axes, "L" draws the graphs as lines
    multiG->Draw("AL");
    //The axes only exist after the call to Draw, so set the titles afterwards
    multiG->GetXaxis()->SetTitle("t_{BDT} cut");
    multiG->GetYaxis()->SetTitle("Log(scaled number of events)");

    return 0;
}

Other macros

See my webpage for additional scripts and macros I have written, as it is generally more convenient to keep them there.

Merging ROOT files

This isn't something I have used properly yet, but I have just come across it. In addition to the TFileMerger class available in ROOT, when you set up your environment you also get a program called hadd, which uses the syntax:

hadd <targetfile.root> <input files.root>
The program will add the histograms and TTree entries together. Whilst this can be used to combine, say, all your background samples into one single sample, I realise that it could also be used as a simple alternative to PROOF and parallel processing.
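
For example, to merge three background samples into a single file (the filenames here are made up):

hadd merged_background.root bkg_ttbar.root bkg_wjets.root bkg_zjets.root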

My particular analysis program accepts as input a sample directory containing a list of ROOT files, which are added to a TChain; my program, created using MakeClass, then runs over the TChain. If I have a large number of ROOT files, I can now split the same sample into a number of dataset directories, run the analysis over the smaller directories, and then merge the results once all the processing has finished.

In the long run it will probably be better to understand and use PROOF as, from what I have read, it dynamically balances the workload across the available nodes to optimise the processing and then merges the outputs together at the end. I think you can also use PROOF-Lite to take advantage of the multiple cores in your laptop/desktop if you are processing locally, which you would not be able to do with hadd.

Using ROOT dictionaries

CINT is able to produce dictionary libraries for specific ROOT objects. The main example is wanting to save a vector of TLorentzVectors to a TTree; by default this will not work, but you can generate a dictionary library to allow it. Use the following command (in ROOT CINT):

root
root [0] gInterpreter->GenerateDictionary("vector<TLorentzVector>","TLorentzVector.h,vector")

Then to compile with this library you use the advice given above.
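
As a minimal sketch of the use case above, once the dictionary has been generated you can attach a vector of TLorentzVectors to a TTree branch (the file, tree and branch names here are made up):

#include <vector>
#include "TFile.h"
#include "TTree.h"
#include "TLorentzVector.h"

void writevectors() {
    TFile f("vectors.root", "RECREATE");
    TTree tree("tree", "Jets as TLorentzVectors");

    std::vector<TLorentzVector>* jets = new std::vector<TLorentzVector>;
    tree.Branch("jets", &jets); //This Branch call is what needs the dictionary

    for (int iEvent = 0; iEvent < 100; ++iEvent) {
        jets->clear();
        jets->push_back(TLorentzVector(10., 0., 0., 10.)); //Dummy four-vector
        tree.Fill();
    }
    tree.Write();
    f.Close();
}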

BASH Shell (Scripting and tips)

Ctrl-z in terminal - Pausing a process

If you have a process running, typing ctrl+z will pause that process, allowing you to use the command line prompt.

If you then type bg it is equivalent to having originally set your program running with an & at the end - ie it tells it to resume running in the background.

You can also type fg to bring it back to running in the foreground where you can see output, but cannot access the command line.

Apparently, typing jobs will tell you what is running in the background and allow you to bring jobs back to the foreground if need be.
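
A typical session might look like the following (the program name is made up):

./myanalysis
^Z
[1]+  Stopped                 ./myanalysis
bg
[1]+ ./myanalysis &
jobs
[1]+  Running                 ./myanalysis &
fg
./myanalysis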

Getting the full path of multiple files

In terminal you can use:

find . 
to retrieve the list of all files and folders in the current directory, much like ls.

However, this becomes useful if you want to get hold of the full path of a file, for example:

find `pwd` -type f
will return the list of all files under the current directory with their full path (as using pwd instead of . gives the absolute path to the files).

This can save some time if you need to list them in a file to run over with a script.
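
For example, to write the full paths of all the ROOT files under the current directory into a list for a script to read (the output filename is made up):

find `pwd` -type f -name "*.root" > filelist.txt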

Merging Multiple .ps files into one .pdf

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=merged.pdf \
*.ps

Error checking bash script commands

As per the link here, commands in the terminal leave a success/fail exit status in the special variable $?. For example:

ls *.root
RESULT=$?
if [ $RESULT -ne 0 ]; then
    echo "ERROR in the command"
fi

Emacs - How to make backspace key delete backwards

Whether or not it is an issue with my emacs setup on lxplus, the default backspace behaviour in

emacs -nw

is to act the same as the delete character. To change this in each session do:

M-x normal-erase-is-backspace-mode

(where M-x is typed as Esc-x)

BASH Scripting : Looping through two arrays simultaneously

I wanted to have two arrays, one with input filenames and the other with output filenames. I needed to use shell scripting to manipulate the filenames because of some systematics which needed renaming. I originally set up a loop over a list of filenames, which can be performed as:

FILES=(file1.root file2.root textfile.txt)
for F in ${FILES[@]}; do
    echo $F
done

However, you cannot quite use this as you would in C++, as you are not using a counter, per se, to control it; if you do want an explicit counter, see the sketch below.
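
If both lists live in bash arrays of the same length, you can loop over an explicit index instead (a minimal sketch; the array names are made up):

INFILES=(in1.root in2.root in3.root)
OUTFILES=(out1.root out2.root out3.root)
for (( i = 0; i < ${#INFILES[@]}; i++ )); do
    echo "${INFILES[$i]} -> ${OUTFILES[$i]}"
done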

An alternative method taken from here explains how to simultaneously loop through two distinct lists:

# First have your lists in two files (listA.txt, listB.txt here)
while read -u 5 aVAR && read -u 6 bVAR; do
    echo $aVAR "   " $bVAR
done 5<listA.txt 6<listB.txt

This makes use of file descriptors (here assigned to 5 and 6) to loop through the files (just make sure each item is on a new line in the files) and voila!

Using Pythia

Pythia is an event generator which simulates particle collisions and then uses the Lund string model to carry out the hadronisation of the interaction. Pythia 8 is currently the latest incarnation and is written in C++. It is possible to set up a local installation of Pythia on your own laptop and, assuming you also have ROOT installed, you can use both sets of libraries to carry out a parton level analysis of anything you might want to model. Obviously Pythia does not include any detector level effects (as it is parton level), so you are basically working in an "even-better-than-best-case" scenario with your results. That said, the distributions found should be similar to the proper MC and data sets. It is worth pointing out that pile-up is currently not well modelled in Pythia 8; not that it has been something I have had to worry about so far, but it was the reason for switching from Pythia 8 back to Pythia 6 in the generation of MC11b and MC11c.

You can get Pythia from here.

Once installed, you can compile against the Pythia and ROOT libraries using the following Makefile. In addition, you will need to edit a line in one of the xml files. I forget which, but on the first compile an error will come up which should point you in the direction of it. The reason is that the file uses a relative path (ie ../folder) instead of an absolute one. This means that if you try to compile outside of the folder in which Pythia generates the example code, you will get this error, so bear this in mind.

# Glen Cowan, RHUL Physics, February 2012
# Edited for use by Ian Connelly, Feb 2012

PROGNAME      = signal
SOURCES       = signal.cc
INCLUDES      = 
OBJECTS       = $(patsubst %.cc, %.o, $(SOURCES))
ROOTCFLAGS   := $(shell root-config --cflags)
ROOTLIBS     := $(shell root-config --libs)
ROOTGLIBS    := $(shell root-config --glibs)
ROOTLIBS     := $(shell root-config --nonew --libs)
ROOTINCDIR   := $(shell root-config --incdir)
CFLAGS       += $(ROOTCFLAGS)
LIBS         += $(ROOTLIBS)
#  Not sure why Minuit isn't being included -- put in by hand
#
LIBS         += -lMinuit
LDFLAGS       = -O

#  Now the Pythia stuff
#

# Location of Pythia8 directories 
# (default is parent directory, change as necessary)
PY8DIR=/Users/Ian/Documents/pythia8160
PY8INCDIR=$(PY8DIR)/include
PY8LIBDIR=$(PY8DIR)/lib
PY8LIBDIRARCH=$(PY8DIR)/lib/archive


# Include Pythia and Pythia examples config files
-include $(PY8DIR)/config.mk
-include $(PY8DIR)/examples/config.mk

LIBS += -lpythia8
LIBS += -llhapdfdummy

$(PROGNAME):  $(OBJECTS) 
      g++ -o $@ $(OBJECTS) $(LDFLAGS) -L$(PY8LIBDIRARCH) $(LIBS)

%.o : %.cc $(INCLUDES) 
   g++  ${CFLAGS} -c -I$(ROOTINCDIR) -I$(PY8INCDIR) -g -o $@ $<

test:
   @echo $(PY8LIBDIRARCH)

clean:   
   -rm -f ${PROGNAME} ${OBJECTS}

The example workbook which is available on the main site is very good at introducing the syntax that Pythia uses. The standard procedure is: set up the collision -> decide what parton modelling is required (ie hadronisation? multiple interactions?) -> run the event loop -> include your analysis in the event loop -> fin.

You are able to access every single parton which is created in the hard interaction, and also in the soft hadronisation afterwards, using something like this (for HZ->bbvv):

#include "Pythia.h"
using namespace Pythia8;

int main() {

    //Set up generation

    Pythia pythia;
    pythia.readString("HiggsSM:ffbar2HZ = on"); //Higgsstrahlung production
    pythia.readString("Beams:eCM = 8000."); //eCM = 8TeV
    pythia.readString("25:m0 = 125"); //Higgs mass 125
    pythia.readString("25:onMode = off"); //Turn off all Higgs decay modes
    pythia.readString("25:onIfAny = 5 -5"); //Allow H->bb
    pythia.readString("23:onMode = off"); //Turn off all Z decay modes
    pythia.readString("23:onIfAny = 12 14 16 -12 -14 -16"); //Allow Z->vv
    pythia.readString("PartonLevel:MI = off ! no multiple interactions");
    pythia.readString("PartonLevel:ISR = off ! no initial-state radiation");
    pythia.readString("PartonLevel:FSR = off ! no final-state radiation");
    pythia.readString("HadronLevel:Hadronize = off");

    //Initialise any TFiles, TTrees, TH1s, vectors before the event loop
    pythia.init();

    //Event loop
    for(int iEvent = 0; iEvent < 30000; ++iEvent){
        pythia.next();
        for(int i = 0; i < pythia.event.size(); ++i){
            //Status -21 = incoming particles of the hardest subprocess
            if(pythia.event[i].status() == -21){
                //Analysis of the incoming hard partons goes here
            }

        }
    }
    return 0;
}

The status codes allow you to find partons which are final state or intermediate state partons. In addition, there is a second event record, pythia.process, which apparently contains only the hard partons, whereas pythia.event contains all of them. Note though that if you turn off too many effects, so that you only have the hard interaction, pythia.process stops functioning (from what I have heard) and you can only access the partons through pythia.event.
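
For example, inside the event loop above you could pick out the final state b quarks like this (a minimal sketch):

for(int i = 0; i < pythia.event.size(); ++i){
    //isFinal() selects particles still present at the end of the generation,
    //idAbs() gives the absolute PDG code (5 = b quark)
    if(pythia.event[i].isFinal() && pythia.event[i].idAbs() == 5){
        std::cout << "b quark pT = " << pythia.event[i].pT() << std::endl;
    }
}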

The status codes (and more information about the particle classes and extra things) are available in the HTML documentation. This is installed locally, or can be accessed online at the Lund hosted version here.

Analysis Specific

Extracting the luminosity of data files

This is going here as I struggled far more than I should have to get the correct options. You use atlas-lumicalc to extract the luminosity associated with a particular good runs list (GRL). The GRL is used as a mask so that you only use data taken at a time when all the detector components were working correctly.

At the moment I am using dataset containers which have been made by the Top group to collect together the runs by period of data taking. This means that, on the face of it, it is not explicitly clear which runs are inside a container. There are, however, DQ2 commands to extract these. The following is a script I wrote which (when VOMS and DQ2 are set up) will list the datasets, extract the run numbers and write them to the terminal. This uses bash string manipulation, which is a bit cumbersome but does the job (# strips a matching pattern from the front of the string, % strips a matching pattern from the end).

# To list the datasets inside a container
# Put the data containers you want to check in list.txt, one per line
# From https://twiki.cern.ch/twiki/bin/viewauth/Atlas/DQ2ClientsHowTo#DatasetsContainerCommands
# Ian Connelly 20 Nov 2012

LIST=list.txt

while IFS= read -r DATASET; do
  echo $DATASET
  for LINE in `dq2-list-datasets-container $DATASET`; do
    # eg data12_8TeV.00207589.physics_Muons.merge.NTUP_TOPMU.f467_m1191_p1104_p1141_tid00929888_00
    RUN=${LINE#data12_8TeV.*}  # strip the project name from the front
    RUN=${RUN%*.*.*.*.*}       # strip the dotted fields after the run number
    RUN=${RUN#00*}             # strip the leading 00
    echo -n "$RUN,"
  done
  echo ""
  echo ""
done < $LIST

Once you run this, you will have on the terminal screen the name of each data container followed by the list of runs inside it. If you go to atlas-lumicalc, upload your GRL and in the extra options line put:

-r "comma-separated-string-of-runs"

Make sure there are no spaces between runs and commas. You can also use (it seems):

-r="datarun-datarun,datarun,datarun-datarun..."
This syntax indicates that all runs between the hyphenated pairs should be taken; the equals sign seems to work in this case but not in the one above.

Interesting thing with C dynamic libraries

    // Weird code to load the dynamic library and extract a function.
    // This would work if it were a C library, but it is not:
    // C++ does name mangling, which stops you from just using
    // the name of the function to get it from the library.
    // This is because C++ allows function overloading but C does not.
    // Try using ROOT to load instead.
    /*
    void* mv1cLib = dlopen("../libMV1c_cxx.so", RTLD_LAZY);
    float (*mv1cFcn)(float,float,float,float,float,float,float);
    *(void **)(&mv1cFcn) = dlsym(mv1cLib, "mv1cEval");
    */
    //gROOT->LoadMacro("MV1c_cxx.so");
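
If you control the library source, the standard way around the name mangling is to export the function with C linkage, so that its symbol is stored unmangled and dlsym can find it by name. A minimal sketch (the signature of mv1cEval here is assumed from the function pointer above):

// In the library source: extern "C" disables C++ name mangling
// for this function, so dlsym(lib, "mv1cEval") will find it
extern "C" float mv1cEval(float a, float b, float c, float d,
                          float e, float f, float g) {
    // ...evaluate and return the tagger weight...
    return 0.f;
}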

Some notes on compiling against dynamic libraries / libraries which are not listed in LD_LIBRARY_PATH

The issue with this is twofold:

  1.) When compiling and linking, the makefile/gcc needs to know where the libraries reside in order to include them.
  2.) At run time, these libraries also need to be found.

This means there are two things to include in the makefile:

-Wl,-rpath,$(abspath ./directory)

This adds a hardcoded part to the executable which says at runtime "check this path for libraries" (nb -Wl is gcc saying "pass this option to the linker").

Then one needs:

-L$(abspath ./directory) -lMyLibrary

This is used to link the library at compilation so it knows to include it.

Sidenote: ideally, a library needs to be called

 lib<Name>.so

in order to include it as

 -l<Name>

Both of these flags (which one could set in makefile variables) need to be included in the linking stage of the build.
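
Putting the two together, a link line in the makefile might look like this (the program, object and library names are made up):

g++ -o myprogram main.o -L$(abspath ./mylibs) -lMyLibrary -Wl,-rpath,$(abspath ./mylibs)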

Diagnostics

 readelf -d <Executable> 

will list the shared objects and the rpath.

 ldd <Executable> 

will display the libraries that are needed at run time.