Revision 1 - 24 Apr 2021 - PedroTeixeiraDias

Notes on using Multivariate Analysis methods

Introduction

This page is a repository of some useful information for applying multivariate (MVA) methods to data analysis in particle physics. It is provided as support for students carrying out projects in the context of the ATLAS experiment at CERN, using the ROOT TMVA framework.

ROOT TMVA resources

  • The ROOT TMVA framework Users Guide
  • List of "TMVA tutorials": examples of C++ code that use the TMVA toolkit for, e.g., simple signal vs. background classification, categorisation or regression.

General references on multivariate analysis methods

Possible lines of investigation

Type of classifier

Compare the performance of a simple cut-by-cut analysis with that of the following multivariate methods (all using the same input variables):

  • a linear discriminant
  • a Boosted Decision Tree
  • a Neural Network
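To compare methods on an equal footing it helps to fix a figure of merit first. The following is a minimal sketch, not part of TMVA: the helper names `rocArea` and `backgroundRejectionAt` are hypothetical, events are assumed unweighted, and a larger classifier output is assumed to be more signal-like. The first function gives the area under the ROC curve; the second gives the background rejection at a chosen signal efficiency, which can be set to the signal efficiency of the cut-by-cut selection so that both approaches are judged at the same working point.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Area under the ROC curve, computed as the probability that a randomly
// chosen signal event has a higher score than a randomly chosen background
// event (ties count one half). Cost is O(Ns * Nb), fine for small samples.
double rocArea(const std::vector<double>& signal,
               const std::vector<double>& background) {
    assert(!signal.empty() && !background.empty());
    double wins = 0.0;
    for (double s : signal) {
        for (double b : background) {
            if (s > b) wins += 1.0;
            else if (s == b) wins += 0.5;
        }
    }
    return wins / (static_cast<double>(signal.size()) *
                   static_cast<double>(background.size()));
}

// Fraction of background events rejected by the cut "score >= threshold",
// where the threshold is chosen to keep at least the requested fraction
// of signal events.
double backgroundRejectionAt(std::vector<double> signal,
                             const std::vector<double>& background,
                             double signalEfficiency) {
    assert(!signal.empty() && !background.empty());
    assert(signalEfficiency > 0.0 && signalEfficiency <= 1.0);
    // Sort signal scores from most to least signal-like.
    std::sort(signal.begin(), signal.end(), std::greater<double>());
    std::size_t nKeep = static_cast<std::size_t>(
        std::ceil(signalEfficiency * static_cast<double>(signal.size())));
    if (nKeep == 0) nKeep = 1;
    const double threshold = signal[nKeep - 1];
    std::size_t rejected = 0;
    for (double b : background) {
        if (b < threshold) ++rejected;
    }
    return static_cast<double>(rejected) /
           static_cast<double>(background.size());
}
```

Both functions take the classifier outputs evaluated on an independent test sample, one vector for signal events and one for background events. A value of `rocArea` near 0.5 indicates no discrimination, and a value near 1 indicates almost perfect separation.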

Weak input variables

Some classifiers (such as BDTs) are known to be more robust than others (such as NNs) with respect to the inclusion of weak input variables (i.e. variables with very low signal-background discriminating power). Test this by training a classifier using, for instance, 4 strong variables and 4 very weak variables. Then train a second classifier of the same type using only the 4 strong variables as inputs, and compare the performance of the two. Do this for, e.g., an NN, and then for a BDT.
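To decide which variables count as "strong" or "weak", one option is to rank them by a one-dimensional separation measure. The sketch below assumes the commonly used definition ⟨S²⟩ = ½ ∫ (s(x) − b(x))² / (s(x) + b(x)) dx, where s and b are the unit-normalised signal and background distributions of the variable x, and evaluates it from binned distributions. The helper names, the binning and the clamping of out-of-range values are illustrative assumptions; events are taken to be unweighted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unit-normalised histogram with nBins equal-width bins on [lo, hi).
// Values outside the range are assigned to the first or last bin.
std::vector<double> normalisedHistogram(const std::vector<double>& values,
                                        std::size_t nBins,
                                        double lo, double hi) {
    assert(nBins > 0 && hi > lo && !values.empty());
    std::vector<double> hist(nBins, 0.0);
    const double width = (hi - lo) / static_cast<double>(nBins);
    const double weight = 1.0 / static_cast<double>(values.size());
    for (double v : values) {
        const double pos = (v - lo) / width;
        std::size_t bin = 0;
        if (pos >= static_cast<double>(nBins)) bin = nBins - 1;
        else if (pos > 0.0) bin = static_cast<std::size_t>(pos);
        hist[bin] += weight;
    }
    return hist;
}

// Binned separation: 0 for identical signal and background shapes,
// 1 for shapes that do not overlap in any bin.
double separation(const std::vector<double>& signal,
                  const std::vector<double>& background,
                  std::size_t nBins, double lo, double hi) {
    const std::vector<double> s = normalisedHistogram(signal, nBins, lo, hi);
    const std::vector<double> b = normalisedHistogram(background, nBins, lo, hi);
    double sum = 0.0;
    for (std::size_t i = 0; i < nBins; ++i) {
        const double total = s[i] + b[i];
        if (total > 0.0) {
            const double diff = s[i] - b[i];
            sum += diff * diff / total;
        }
    }
    return 0.5 * sum;
}
```

Calling `separation` once per input variable gives a simple ranking from which the 4 strong and 4 very weak variables can be picked. The result depends on the binning and on the sample size, so values for different variables are best compared using the same number of bins and similar statistics.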

Impact of architecture and training

How much does the performance of a given classifier type depend on the configuration of its internal parameters, or on the training procedure? To investigate this for, e.g., an NN, use as a reference its out-of-the-box performance (i.e. using the default TMVA options).

  • How important is the NN architecture (e.g. a different number of nodes in the hidden layer, or multiple hidden layers)?
  • What is the impact of the training sample size?
  • How important is the specific form of the input variables used? For instance, for a classifier that takes as its only inputs the four-vectors of the jets, leptons and missing energy, does it matter if the four-vector inputs are given as (px, py, pz, E) or as (η, φ, pT, E)?
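For the last question, the two input representations carry the same information and can be converted into one another. A minimal sketch of the conversion follows, using only the C++ standard library; the struct and function names are hypothetical, and the relations assumed are pT = √(px² + py²), φ = atan2(py, px) and η = asinh(pz / pT).

```cpp
#include <cassert>
#include <cmath>

// Four-vector in Cartesian components.
struct CartesianP4 {
    double px, py, pz, e;
};

// Four-vector in collider coordinates, in the order (eta, phi, pT, E).
struct EtaPhiPtE {
    double eta, phi, pt, e;
};

// Cartesian -> (eta, phi, pT, E). Pseudorapidity is undefined for a
// momentum exactly along the beam (z) axis, hence the assertion.
EtaPhiPtE toEtaPhiPt(const CartesianP4& p) {
    const double pt = std::hypot(p.px, p.py);
    assert(pt > 0.0);
    const double eta = std::asinh(p.pz / pt);
    const double phi = std::atan2(p.py, p.px);  // in (-pi, pi]
    return {eta, phi, pt, p.e};
}

// (eta, phi, pT, E) -> Cartesian.
CartesianP4 toCartesian(const EtaPhiPtE& p) {
    return {p.pt * std::cos(p.phi),
            p.pt * std::sin(p.phi),
            p.pt * std::sinh(p.eta),
            p.e};
}
```

Applying `toEtaPhiPt` to every jet, lepton and missing-energy four-vector before training gives the second input set from the first. One point to keep in mind when comparing the two is that φ is periodic, so values close to +π and −π describe nearly the same direction although they are numerically far apart.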

-- Pedro Teixeira Dias - 24 Apr 2021
