Notes on using Multivariate Analysis methods
Introduction
This page is a repository of some useful information for applying multivariate (MVA) methods to data analysis in particle physics. It is provided as support for students carrying out projects in the context of the ATLAS experiment at CERN, using the ROOT TMVA framework.
ROOT TMVA resources
- The ROOT TMVA framework Users Guide
- List of "TMVA tutorials": examples of C++ code that use the TMVA toolkit for, e.g., simple signal vs. background classification, categorisation, or regression.
General references on multivariate analysis methods
Possible lines of investigation
Type of classifier
Compare the performance of a simple cut-by-cut analysis with that of the following multivariate methods (all trained on the same input variables):
- a linear discriminant
- a Boosted Decision Tree
- a Neural Network
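Comparing classifiers requires a common figure of merit evaluated on the same test sample. A minimal sketch of two such measures follows, assuming unweighted events and that each classifier's output has already been evaluated into plain vectors of scores (higher meaning more signal-like). The function names `rocAuc` and `bkgRejectionAtSigEff` are hypothetical helpers, not part of TMVA.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Area under the ROC curve, computed as the probability that a randomly
// chosen signal event scores higher than a randomly chosen background
// event (ties count as one half). Assumes unweighted events.
double rocAuc(const std::vector<double>& sigScores,
              const std::vector<double>& bkgScores) {
    assert(!sigScores.empty() && !bkgScores.empty());
    std::vector<double> bkg(bkgScores);
    std::sort(bkg.begin(), bkg.end());
    double sum = 0.0;
    for (double s : sigScores) {
        const auto lo = std::lower_bound(bkg.begin(), bkg.end(), s);
        const auto hi = std::upper_bound(bkg.begin(), bkg.end(), s);
        // Background events strictly below s, plus half of those tied with s.
        sum += static_cast<double>(lo - bkg.begin())
             + 0.5 * static_cast<double>(hi - lo);
    }
    return sum / (static_cast<double>(sigScores.size())
                  * static_cast<double>(bkg.size()));
}

// Background rejection (1 - background efficiency) at the score threshold
// that keeps at least the requested fraction of signal events.
double bkgRejectionAtSigEff(const std::vector<double>& sigScores,
                            const std::vector<double>& bkgScores,
                            double targetSigEff) {
    assert(!sigScores.empty() && !bkgScores.empty());
    assert(targetSigEff > 0.0 && targetSigEff <= 1.0);
    std::vector<double> sig(sigScores);
    std::sort(sig.begin(), sig.end(), std::greater<double>());
    std::size_t nKeep = static_cast<std::size_t>(
        std::ceil(targetSigEff * static_cast<double>(sig.size())));
    nKeep = std::min(std::max<std::size_t>(nKeep, 1), sig.size());
    const double threshold = sig[nKeep - 1];
    const auto nPass = std::count_if(
        bkgScores.begin(), bkgScores.end(),
        [threshold](double b) { return b >= threshold; });
    return 1.0 - static_cast<double>(nPass)
                 / static_cast<double>(bkgScores.size());
}
```

For the cut-by-cut analysis there is no continuous score, so only a single working point (signal efficiency, background rejection) is available; comparing it with `bkgRejectionAtSigEff` evaluated at the same signal efficiency is one possible like-for-like comparison.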
Weak input variables
Some classifiers (such as BDTs) are known to be more robust than others (such as NNs) with respect to the inclusion of weak input variables (i.e. variables with very low signal-background discriminating power). Test this by training a classifier using, for instance, 4 strong variables and 4 very weak variables. Then train a second classifier using only the 4 strong variables as inputs, and compare the performance of the two. Do this first for, e.g., an NN, and then for a BDT.
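To decide which variables count as "strong" or "weak", some quantitative measure of single-variable discriminating power is useful. The sketch below computes a binned separation, 0.5 * sum over bins of (s_i - b_i)^2 / (s_i + b_i), where s_i and b_i are the fractions of signal and background events in bin i. This is understood to match the separation definition in the TMVA Users Guide, but the helper names, the equal-width binning and the treatment of out-of-range values are assumptions made here for illustration. The result is 0 for identical distributions and 1 for distributions with no overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Histogram with nBins equal-width bins on [xmin, xmax), normalised to unit
// sum. Values outside the range are placed in the nearest edge bin.
std::vector<double> normalisedHistogram(const std::vector<double>& values,
                                        double xmin, double xmax,
                                        std::size_t nBins) {
    assert(!values.empty() && nBins > 0 && xmax > xmin);
    std::vector<double> hist(nBins, 0.0);
    const double weight = 1.0 / static_cast<double>(values.size());
    for (double x : values) {
        const double pos =
            (x - xmin) / (xmax - xmin) * static_cast<double>(nBins);
        const double clamped =
            std::min(std::max(pos, 0.0), static_cast<double>(nBins - 1));
        hist[static_cast<std::size_t>(std::floor(clamped))] += weight;
    }
    return hist;
}

// Binned separation between the signal and background distributions of a
// single input variable: 0 means identical shapes, 1 means no overlap.
double separation(const std::vector<double>& sigValues,
                  const std::vector<double>& bkgValues,
                  double xmin, double xmax, std::size_t nBins) {
    const std::vector<double> s =
        normalisedHistogram(sigValues, xmin, xmax, nBins);
    const std::vector<double> b =
        normalisedHistogram(bkgValues, xmin, xmax, nBins);
    double sum = 0.0;
    for (std::size_t i = 0; i < nBins; ++i) {
        const double denom = s[i] + b[i];
        if (denom > 0.0) {
            sum += (s[i] - b[i]) * (s[i] - b[i]) / denom;
        }
    }
    return 0.5 * sum;
}
```

Ranking the candidate inputs by this number, and picking the top and bottom few, is one way to choose the 4 strong and 4 weak variables. Note that it ignores correlations between variables, so a variable that is weak on its own may still help in combination with others.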
Impact of architecture and training
How much does the performance of a given classifier type depend on the configuration of its internal parameters, or on the details of the training? To investigate this for, e.g., an NN, use as reference its performance out-of-the-box (i.e. using the default TMVA options).
- How important is the NN architecture (e.g. a different number of nodes in the hidden layer, multiple hidden layers, etc.)?
- What is the impact of the training sample size?
- How important is the specific form of the input variables used? For instance, for a classifier that takes as its only inputs the four-vectors of the jets, leptons and missing energy, does it matter whether the four-vector inputs are given as (px, py, pz, E) or as (η, φ, pT, E)?
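For the last question above, the same four-vectors have to be presented to the classifier in both representations. The following is a minimal sketch of the conversion, using the standard relations pT = sqrt(px^2 + py^2), φ = atan2(py, px) and η = asinh(pz / pT). The struct and function names are hypothetical and chosen here for illustration; in a ROOT-based analysis the framework's own four-vector classes would normally be used instead.

```cpp
#include <cassert>
#include <cmath>

// Four-vector in Cartesian momentum components.
struct CartesianP4 {
    double px, py, pz, e;
};

// Same four-vector expressed as (eta, phi, pT, E).
struct EtaPhiPtP4 {
    double eta, phi, pt, e;
};

// (px, py, pz, E) -> (eta, phi, pT, E).
// The pseudorapidity is undefined for momenta along the beam axis (pT = 0).
EtaPhiPtP4 toEtaPhiPt(const CartesianP4& p) {
    const double pt = std::hypot(p.px, p.py);
    assert(pt > 0.0);
    return {std::asinh(p.pz / pt), std::atan2(p.py, p.px), pt, p.e};
}

// (eta, phi, pT, E) -> (px, py, pz, E).
CartesianP4 toCartesian(const EtaPhiPtP4& p) {
    return {p.pt * std::cos(p.phi),
            p.pt * std::sin(p.phi),
            p.pt * std::sinh(p.eta),
            p.e};
}
```

Both representations carry the same information, so any difference in classifier performance reflects how easily the training can exploit each form, not a difference in the physics content of the inputs.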
-- Pedro Teixeira Dias - 24 Apr 2021