Joshuha Thomas-Wilsker's Homepage

This page is mostly used as a place to put useful tidbits, but hopefully others may find something useful here as well.


Useful lxplus aliases:

setupATLAS

Running this shows you all of the aliases for ATLAS software:

Set up ROOT: localSetupROOT
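A typical session might look like this (a minimal sketch; with no argument localSetupROOT picks a default ROOT version, and a specific version can be given as an argument):

$> setupATLAS
$> localSetupROOT
$> root -l    # quick check that ROOT is now available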


Setting Up Your Grid Certificate

Before being able to use the grid, you must set up your grid certificate. If your certificate is installed in your browser you need to export it first.

To be able to use your certificate with the Globus toolkit (a toolkit for grid computing), you need to convert your .p12 certificate into a PEM-format key pair in separate files.

One of these files will be for the key itself, the other for the certificate.

Whether you are running on lxplus, linappserv, or locally, you will need to copy the .p12 certificate to the machine from which you will be running voms-proxy-init. In this case, I will assume you called the copy on the new machine CertName.p12.

Once you have done this, you should extract the certificate (which contains the public key) and the private key. The public key can be extracted using the following command:

$> openssl pkcs12 -in CertName.p12 -clcerts -nokeys -out $HOME/.globus/usercert.pem

The private key is extracted using the command:

$> openssl pkcs12 -in CertName.p12 -nocerts -out $HOME/.globus/userkey.pem

You then need to set the permissions on userkey.pem, otherwise voms-proxy-init will not use it:

$>chmod 400 ${HOME}/.globus/userkey.pem

One should then delete any CertName.p12 copies made on the new machine for security purposes.

Now you can run voms-proxy-init without any problems... until next year, when you do this all again with a new certificate.
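For example, a minimal sketch of creating and then inspecting a proxy (the VO name atlas matches the grid setup steps further down this page):

$> voms-proxy-init -voms atlas    # prompts for your grid certificate pass phrase
$> voms-proxy-info --all          # check the proxy details and time remaining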

 

RHUL Faraday Cluster:

To check on jobs (only accessible at RHUL):
http://gfm02.pp.rhul.ac.uk/cgi-bin/pbswebmon.py

To kill a job on the Faraday cluster use the command:
$> qdel $job_number

To kill all jobs :
$> qselect -u $USER | xargs qdel

 
1.) source /afs/cern.ch/atlas/offline/external/GRID/DA/panda-client/latest/etc/panda/panda_setup.sh

2.) export PATHENA_GRID_SETUP_SH=/afs/cern.ch/project/gd/LCG-share/current_3.2/etc/profile.d/grid_env.sh

3.) source /afs/cern.ch/project/gd/LCG-share/current_3.2/etc/profile.d/grid-env.sh

4.) voms-proxy-init -voms atlas
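If you do this often, the steps above can be collected into a small script that you source at login (a minimal sketch; the file name setup_grid.sh is just an example):

# setup_grid.sh -- source this to set up panda-client, the grid environment and a voms proxy
source /afs/cern.ch/atlas/offline/external/GRID/DA/panda-client/latest/etc/panda/panda_setup.sh
export PATHENA_GRID_SETUP_SH=/afs/cern.ch/project/gd/LCG-share/current_3.2/etc/profile.d/grid_env.sh
source /afs/cern.ch/project/gd/LCG-share/current_3.2/etc/profile.d/grid-env.sh
voms-proxy-init -voms atlas

$> source setup_grid.sh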

Making MC12 Ntuples

The following steps are used for RHUL servers running SLC5/6.

Once the grid environment has been established and you have set up the ATLAS environment, set up Athena.

The following was used for the truth-matching study:

$>export AtlasSetup=/afs/cern.ch/atlas/software/dist/AtlasSetup
$>alias asetup='source $AtlasSetup/scripts/asetup.sh'
$>asetup 17.2.13.13.2,MCProd,64,here --cmtconfig=x86_64-slc5-gcc43-opt

(also tried >asetup 17.2.13.13,slc5,here --testarea=/home/username/ATLAS/testarea)

You should also export the variable:

$> export JOBOPTSEARCHPATH=/cvmfs/atlas.cern.ch/repo/sw/Generators/MC12JobOptions/latest/common:$JOBOPTSEARCHPATH

This will allow you to access the most recent MC12JobOptions files on cvmfs.

The first example here shows how to go from an LHE (Les Houches) file to an EVGEN file.

Les Houches files are the result of an accord made in 2001 which standardised the interface between matrix element and event generator software. Since there are several commonly used programs (e.g. MadGraph, Powheg, etc.) that are capable of performing the matrix element calculation, it is important to ensure uniformity amongst their output files so that physicists may pick and choose the best calculation for the process they want to look at. Hence, LHE files are the output of the matrix element software and come before the application of parton showering tools such as Herwig and Pythia.

The following Python script takes LHE files as the input and transforms them into EVGEN files.

Generate_trf.py ecmEnergy='8000' runNumber='195848' firstEvent='1' maxEvents='5000' randomSeed='1234' jobConfig='/scratch5/jwilsker/ttH_files/LHE_PowHegPythia8_ttH125_8TeV_AZNLO_main31_mc12/mc12_ttH125_lhe_evgen_jobopts.py' outputEVNTFile='/scratch5/jwilsker/ttH_files/EVGEN_PowHegPythia8_ttH125_8TeV_AZNLO_main31_mc12/user.jthomasw.mc12.8TeV.EVGEN_PowHegPythia8_ttH125_AZNLO_main31.pool.root' inputGeneratorFile='/scratch5/jwilsker/ttH_files/LHE_PowHegPythia8_ttH125_8TeV_AZNLO_main31_mc12/group.phys-gener.Powheg_CT10.169887.PowHel-ttH_125_8TeV.TXT.mc12_v1_i11/group.phys-gener.Powheg_CT10.169887.PowHel-ttH_125_8TeV.TXT.mc12_v1._00050.tar.gz'



Notes on options (these can be seen by doing $>Generate_trf.py -h ) :

1 ecmEnergy (float) # Center of mass energy parameter in GeV e.g. 8000.
2 runNumber (int) # each run number corresponds to one physics process. If this is already defined for the process, then use the assigned number. Otherwise make one up.
3 firstEvent (int) # number of the first event in the output data file.
4 maxEvents (int) default=-1 # Maximum number of events to process.
5 randomSeed (int) # random seed for physics generators.
6 jobConfig (list) # jobOptions fragment containing the physics and the configuration settings which is passed to the generators. If the process is already cached, you can probably find the Job options file by listing the files in this directory:

$>ls /cvmfs/atlas.cern.ch/repo/sw/Generators/MC12JobOptions/latest/common/

7 outputEvgenFile (str) # Output file that contains generated events
[ 8 histogramFile] (str) default='NONE' # Output file that contains histograms.
[ 9 ntupleFile] (str) default='NONE' # Output file that contains ntuples.
[10 inputGeneratorFile] (str) default='NONE' # Input file used by the particle generator to generate events
[11 EvgenJobOpts] (str) default='NONE' # Tarball containing the MC09JobOptions to use


The maximum number of events that can be generated is 5000. If you exceed this number, the event generation will fail. When this happens, it will retry 9 times before returning an error, which will be fatal for the job.

The following is output by the Python script and can be found in inputDictionary.pickle:

Py:Generate INFO .maxeventsstrategy = 'INPUTEVENTS' # (String) what to do if number of input events is less than maxEvents. Possible values: ['IGNORE', 'INPUTEVENTS', 'ABORT']

This tells you that if you pass a maxEvents argument larger than the number of input events, the maxeventsstrategy will be employed. In this case, the maximum number of events will be set to the number of input events (INPUTEVENTS).
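If you need more than 5000 events locally, one option is to run the transform several times, changing the random seed and offsetting firstEvent for each batch (a sketch; the run number and job options file are taken from the example above, the paths are placeholders, and offsetting firstEvent per batch is an assumption to keep event numbers unique):

#!/bin/bash
# generate four batches of 5000 events each with different seeds
NEVT=5000
for i in 0 1 2 3; do
    Generate_trf.py ecmEnergy='8000' runNumber='195848' \
        firstEvent=$((1 + i*NEVT)) maxEvents=$NEVT randomSeed=$((1234 + i)) \
        jobConfig='mc12_ttH125_lhe_evgen_jobopts.py' \
        outputEVNTFile="EVGEN_batch_${i}.pool.root" \
        inputGeneratorFile='<path_to_LHE_tarball>'
done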

Going from LHE to EVGEN using the grid:

pathena --trf "Generate_trf.py ecmEnergy=8000 runNumber=195848 firstEvent=1 maxEvents=1000 randomSeed=%RNDM:1234 jobConfig=MC12.XXXXXX.py outputEVNTFile=%OUT.pool.root inputGeneratorFile=%IN" --inDS inputDS.TXT.v1/ --outDS user.jthomasw.outputdataset --nFiles 3 --nFilesPerJob 1


One can then use the Reco_trf.py script, which is a tool based on D3PDMaker classes. It converts between ntuple types, applying the basic skimming/trimming/slimming procedures set out by the ATLAS MC group.

e.g.

To go from an EVGEN file to an NTUP_TRUTH, you can use the general reconstruction job transform Reco_trf.py with the following command:

Reco_trf.py inputEVNTFile='<pathtoinput>.EVGEN.pool.root' outputNTUP_TRUTHFile='<pathtooutput>NTUP_TRUTH_test.root'

Note, one can use the preExec option with D3PDMaker flags to alter your output file. To include ALL truth information, you could use something like:

preExec='from D3PDMakerFlags import D3PDMakerFlags;D3PDMakerFlags.TruthWriteEverything.set_Value_and_Lock(True)'

Along with the rest of the command line.
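For example, a full command line with the preExec option added to the Reco_trf.py call above might look like this (a sketch; the input/output paths are placeholders):

Reco_trf.py inputEVNTFile='<pathtoinput>.EVGEN.pool.root' outputNTUP_TRUTHFile='<pathtooutput>NTUP_TRUTH_test.root' preExec='from D3PDMakerFlags import D3PDMakerFlags;D3PDMakerFlags.TruthWriteEverything.set_Value_and_Lock(True)'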

You can submit these jobs to the grid like:

pathena --trf "Reco_trf.py inputEVNTFile=%IN outputNTUP_TRUTHFile=%OUT.NTUP_TRUTH.root" --outDS user.jthomasw.mc12_8TeV.189628.PowhegJimmy_AUET2CT10_PowhelttH125inc_ljets_NTUP_TRUTH --inDS mc12_8TeV.189628.PowhegJimmy_AUET2CT10_PowHelttH125inc_ljets.evgen.EVNT.e2833/

This will run your job on the grid. For other types of file, you can change the job options.

For more information go to the D3PD Data Reduction section of the ATLAS Software Tutorial.

(2014 tutorial link here: https://indico.cern.ch/event/295572/other-view?view=standard )

 

Using the Cluster (Faraday Farm RHUL)

There are 54 SLC5 (12 SLC6) nodes, each with 8GB (16GB) of memory and ~50GB of temporary disk space (in /data).

You can run one job per CPU core.

The batch system uses Torque & Maui software to manage the farm of worker nodes.

You can log onto a node directly from pbs1 or pbs2 by simply using the command:

$>ssh nodeX

Where X is a valid worker node number. You can check which nodes are running on pbs webmon (if you are on campus).

Some useful commands:

$>qsub <script> : submits job by script.
$>qsub -N <name> -q long <script> : submits job to the long queue with the specified name.
$>qsub -v VARIABLE=value <script> : submits job script to be run with the env var set to value. You can comma-separate the variable list (VAR1=v1,VAR2=v2,…).

$>qstat -Q : show status of all queues, including no. of jobs queued and running.
$>qstat -u $USER : lists only users jobs.

$>qdel <jobid> : deletes job.

$>showq : lists jobs in the order they are scheduled to run.

$>qstat -n1 : lists extra status info (including the node your job is running on).
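A minimal sketch of a job script you could submit with qsub (the queue name long comes from the commands above; the walltime and paths are assumptions, and the next section shows the recommended pattern of writing output to /data):

#!/bin/bash
#PBS -N myjob
#PBS -q long
#PBS -l walltime=12:00:00
cd $HOME/workdir                # placeholder: directory containing your executable
./myprog > myprog.log 2>&1      # keep your own log rather than relying on batch stdout

$> qsub myjob.sh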

Running jobs in parallel

You may want to run jobs in parallel.

Torque sets some environment variables when your job script is run:

PBS_JOBNAME : name of job.
PBS_JOBID : job id
hostname : node on which job is running

For job output, your script should create a unique output directory on /data using the job ID.

JOBDIR=/data/$USER/job_$PBS_JOBID
mkdir -p $JOBDIR
cd $JOBDIR
# Run some program, which writes output to the current directory
# NB It is good practice to make your own log file, not rely on the batch system to return stdout.
$HOME/bin/myprog > log 2>&1

Now one must copy the files you need back to the directory you want them in at the end of your job:

# copy output and check it worked
cp $JOBDIR/output.txt /home/$USER/joboutput/output_${PBS_JOBID}.txt
if [ $? -ne 0 ]; then
    echo "copy failed"
    exit 1
fi
cp $JOBDIR/log /home/$USER/joboutput/log_${PBS_JOBID}
# etc.
# if copy succeeded, tidy up
rm -fr $JOBDIR

This could be a recursive copy if you want to save multiple files.
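For example (a sketch, assuming your program wrote a results/ subdirectory inside $JOBDIR):

cp -r $JOBDIR/results /home/$USER/joboutput/results_${PBS_JOBID}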

Monitoring Jobs
The following command can be used in the terminal to monitor your jobs:

$>qstat

Using the -n option allows you to see which node you are running on.

While your jobs are running you can check the error / output files in the following directories:

SLC5 machine:

$>less /var/spool/pbs/spool/<PBS_ID>.ER
$>less /var/spool/pbs/spool/<PBS_ID>.OU

SLC6 machine:

$>less /var/lib/torque/spool/<PBS_ID>.OU
$>less /var/lib/torque/spool/<PBS_ID>.ER

 

PBOOK

pbook : Grid job management
“pbook” can be used to manage your grid jobs. Type pbook to open the program; it will automatically sync, retrieving all your current jobs and the relevant information.

Inside the pbook environment you can use the commands:

retry(jobID) : Retries any failed tasks inside the job.
kill(jobID) : Kills the job.
help(<command>) : gives you options on a <command> in pbook.

and much more.
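For example, a typical session might look like this (a sketch; show() is another standard pbook command and the job IDs are placeholders):

$> pbook
>>> show()          # list your recent grid jobs and their statuses
>>> retry(1234567)  # resubmit the failed subjobs of job 1234567
>>> kill(1234568)   # kill job 1234568
>>> help(retry)     # options for the retry command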
 

Valgrind


Got a leaky memory? Call valgrind on:

$> valgrind --tool=memcheck --leak-check=full --show-reachable=yes -v ./ttH_Dilepton.exe /scratch5/connelly/ttHFiles/ProductionML_14_00_20/Simulation/Combined/181087_AFII.ML_7B.root subChan 2 2 0 10000

by adding a "-g" option into the GNUMakefile CFlags when compiling, all line numbers will get passed to to objects as well so Valgrind will also see said line numbers.

Memcheck will also report errors like "Conditional jump or move depends on uninitialised value(s)". These are hard to solve, but for additional help (and by sacrificing a bit more CPU time) you can run with the option

--track-origins=yes

to help you resolve the issue.
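For example, a sketch of the two pieces together (assuming CXXFLAGS is the relevant flags variable in your GNUMakefile):

# in the GNUMakefile: add -g so line numbers end up in the objects
CXXFLAGS += -g

$> valgrind --tool=memcheck --leak-check=full --show-reachable=yes --track-origins=yes -v ./ttH_Dilepton.exe <args>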

 