Background Estimation
5.2 Background Samples
5.2.8 Fake Lepton
We define leptons originating from $W$ and $Z$ boson decays as real leptons. Leptons originating from photon conversions or semileptonic decays of B hadrons, as well as jets misidentified as leptons, are defined as fake leptons. This background consists of events with fake leptons and multiple jets. We use a data-driven technique, the matrix method, to determine the normalisation of the fake-lepton background.
The matrix method for events with a single lepton works as follows. We define two sets of lepton criteria, loose and tight, which differ in their isolation and identification requirements, and prepare the two corresponding data sets. The loose sample has a larger fraction of fake leptons than the tight sample, whereas the tight sample is enriched in real leptons. The numbers of events passing the loose selection, $N^{L}$, and the tight selection, $N^{T}$, are decomposed as
$$N^{L} = N^{L}_{\mathrm{fake}} + N^{L}_{\mathrm{real}}, \tag{5.4}$$
$$N^{T} = N^{T}_{\mathrm{fake}} + N^{T}_{\mathrm{real}}, \tag{5.5}$$
where $N^{L}_{\mathrm{fake}}$ ($N^{L}_{\mathrm{real}}$) is the number of fake (real) lepton events after the loose selection and $N^{T}_{\mathrm{fake}}$ ($N^{T}_{\mathrm{real}}$) is the number of fake (real) lepton events after the tight selection. The tight sample is a subset of the loose sample, so $N^{T} \le N^{L}$. We then define the real efficiency $\varepsilon_r$ as the ratio of $N^{T}_{\mathrm{real}}$ to $N^{L}_{\mathrm{real}}$, and the fake efficiency $\varepsilon_f$ as the ratio of $N^{T}_{\mathrm{fake}}$ to $N^{L}_{\mathrm{fake}}$:
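$$\varepsilon_r = \frac{N^{T}_{\mathrm{real}}}{N^{L}_{\mathrm{real}}}, \qquad \varepsilon_f = \frac{N^{T}_{\mathrm{fake}}}{N^{L}_{\mathrm{fake}}}. \tag{5.6}$$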
$\varepsilon_r$ is the fraction of real leptons passing the loose selection that also pass the tight selection, and $\varepsilon_f$ is the corresponding fraction for fake leptons.
We measure $\varepsilon_r$ and $\varepsilon_f$ in two dedicated data samples, described below; the measurement differs between the electron and muon channels. Substituting Equation (5.6) into Equations (5.4) and (5.5) and solving the resulting linear system gives the number of fake-lepton events passing the tight selection:
$$N^{T}_{\mathrm{fake}} = \frac{\varepsilon_f}{\varepsilon_r - \varepsilon_f}\left(\varepsilon_r N^{L} - N^{T}\right). \tag{5.7}$$
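Equation (5.7) follows from solving Equations (5.4) and (5.5) for the fake component. The Python sketch below is a minimal illustration, not the analysis code: the function name `fake_tight_yield` and the efficiency values are hypothetical. It builds toy loose and tight yields from a known composition and checks that Equation (5.7) recovers the injected fake yield.

```python
def fake_tight_yield(n_loose: float, n_tight: float, eps_r: float, eps_f: float) -> float:
    """Number of fake-lepton events passing the tight selection, Eq. (5.7).

    Obtained by solving N^L = N_fake^L + N_real^L and
    N^T = eps_f * N_fake^L + eps_r * N_real^L for N_fake^L,
    then multiplying the result by eps_f.
    """
    if eps_r <= eps_f:
        raise ValueError("the matrix method needs eps_r > eps_f")
    return eps_f / (eps_r - eps_f) * (eps_r * n_loose - n_tight)


# Toy closure test with a known, purely illustrative composition.
n_real_loose, n_fake_loose = 800.0, 200.0
eps_r, eps_f = 0.90, 0.25
n_loose = n_real_loose + n_fake_loose                  # Eq. (5.4)
n_tight = eps_r * n_real_loose + eps_f * n_fake_loose  # Eq. (5.5) using Eq. (5.6)

estimate = fake_tight_yield(n_loose, n_tight, eps_r, eps_f)
true_value = eps_f * n_fake_loose
print(f"estimated {estimate:.3f}, true {true_value:.3f}")
```

The guard on `eps_r > eps_f` reflects the denominator of Equation (5.7): as the two efficiencies approach each other the estimate becomes unstable, so a good separation between $\varepsilon_r$ and $\varepsilon_f$ is needed.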
Both efficiencies depend on lepton kinematics, such as $p_{T}^{\mathrm{lepton}}$, and on event characteristics, such as the number of $b$-tagged jets, so they are parametrised as functions of several object kinematic variables. The estimate is then applied event by event, with the weight
$$w_i = \frac{\varepsilon_f}{\varepsilon_r - \varepsilon_f}\left(\varepsilon_r - \delta_i\right), \tag{5.8}$$
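where $\delta_i$ equals 1 if loose event $i$ also passes the tight selection and 0 otherwise, and $\varepsilon_r$ and $\varepsilon_f$ are evaluated for the lepton and event properties of that event. The sum of $w_i$ over all loose events in a given bin of the final observable gives the estimated number of fake-lepton events in that bin; for constant efficiencies this sum reduces to Equation (5.7).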
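The per-event form can be sketched as follows, assuming for simplicity that both efficiencies depend only on the lepton $p_{T}$ through a binned lookup. `BinnedEfficiency`, `event_weight`, `fake_yield` and all numerical values are hypothetical illustrations, not the parametrisation used in the analysis.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class BinnedEfficiency:
    """Efficiency binned in lepton pT (hypothetical helper).

    edges[i] is the lower edge of bin i; the last bin is open-ended and
    values below edges[0] fall into the first bin.
    """
    edges: list[float]
    values: list[float]

    def __call__(self, pt: float) -> float:
        i = bisect_right(self.edges, pt) - 1
        i = min(max(i, 0), len(self.values) - 1)
        return self.values[i]


def event_weight(is_tight: bool, eps_r: float, eps_f: float) -> float:
    """Per-event matrix-method weight, Eq. (5.8)."""
    delta = 1.0 if is_tight else 0.0
    return eps_f / (eps_r - eps_f) * (eps_r - delta)


def fake_yield(events, eff_r: BinnedEfficiency, eff_f: BinnedEfficiency) -> float:
    """Sum of Eq. (5.8) weights over loose events given as (lepton_pt, is_tight)."""
    return sum(event_weight(tight, eff_r(pt), eff_f(pt)) for pt, tight in events)


# Illustrative efficiencies in three pT bins starting at 25, 40 and 60 GeV.
eff_r = BinnedEfficiency(edges=[25.0, 40.0, 60.0], values=[0.80, 0.88, 0.93])
eff_f = BinnedEfficiency(edges=[25.0, 40.0, 60.0], values=[0.30, 0.22, 0.15])
events = [(32.0, True), (45.0, False), (70.0, True), (28.0, False)]
for pt, tight in events:
    print(pt, tight, round(event_weight(tight, eff_r(pt), eff_f(pt)), 4))
print("fake estimate:", round(fake_yield(events, eff_r, eff_f), 4))
```

Tight events receive a negative weight whenever $\varepsilon_r < 1$, and loose events failing the tight selection a positive one; only the sum over a bin is meaningful as a fake-lepton yield.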
The matrix method for events passing the dilepton selection works as follows. We denote the numbers of observed events with two tight leptons as $N_{tt}$, with one tight and one loose lepton as $N_{tl}$ and $N_{lt}$, and with two loose leptons as $N_{ll}$, where "loose" here means loose but not tight. The first index refers to the leading lepton: in $N_{tl}$ the leading lepton is tight and the sub-leading one loose, while in $N_{lt}$ the leading lepton is loose. Using the efficiencies $\varepsilon_r$ and $\varepsilon_f$, the observed yields are related through a linear system to the numbers of events with two real leptons, $N_{rr}$, with one real and one fake lepton, $N_{rf}$ and $N_{fr}$, and with two fake leptons, $N_{ff}$:
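$$\begin{pmatrix} N_{rr} \\ N_{rf} \\ N_{fr} \\ N_{ff} \end{pmatrix} = M^{-1} \begin{pmatrix} N_{tt} \\ N_{tl} \\ N_{lt} \\ N_{ll} \end{pmatrix}, \tag{5.9}$$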
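where $M$ is a $4\times4$ matrix written in terms of $\varepsilon_r$ and $\varepsilon_f$:

$$M = \begin{pmatrix}
\varepsilon_{r,1}\varepsilon_{r,2} & \varepsilon_{r,1}\varepsilon_{f,2} & \varepsilon_{f,1}\varepsilon_{r,2} & \varepsilon_{f,1}\varepsilon_{f,2} \\
\varepsilon_{r,1}\bar\varepsilon_{r,2} & \varepsilon_{r,1}\bar\varepsilon_{f,2} & \varepsilon_{f,1}\bar\varepsilon_{r,2} & \varepsilon_{f,1}\bar\varepsilon_{f,2} \\
\bar\varepsilon_{r,1}\varepsilon_{r,2} & \bar\varepsilon_{r,1}\varepsilon_{f,2} & \bar\varepsilon_{f,1}\varepsilon_{r,2} & \bar\varepsilon_{f,1}\varepsilon_{f,2} \\
\bar\varepsilon_{r,1}\bar\varepsilon_{r,2} & \bar\varepsilon_{r,1}\bar\varepsilon_{f,2} & \bar\varepsilon_{f,1}\bar\varepsilon_{r,2} & \bar\varepsilon_{f,1}\bar\varepsilon_{f,2}
\end{pmatrix}, \tag{5.10}$$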
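where the indices 1 and 2 on $\varepsilon_r$ and $\varepsilon_f$ refer to the leading and sub-leading lepton in the event, respectively, and $\bar\varepsilon$ stands for $(1-\varepsilon)$. Inverting $M$ gives, for each observed event, four weights $w_{rr}$, $w_{rf}$, $w_{fr}$ and $w_{ff}$, namely the elements of the column of $M^{-1}$ corresponding to the tight/loose category of the event. The fake-lepton background comprises the configurations with at least one fake lepton, with weights $w_{rf}$, $w_{fr}$ and $w_{ff}$. Multiplying each of these by the probability that both leptons pass the tight selection gives the event weight for the tight-tight selection: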
$$w_{tt} = \varepsilon_{r,1}\varepsilon_{f,2}\,w_{rf} + \varepsilon_{f,1}\varepsilon_{r,2}\,w_{fr} + \varepsilon_{f,1}\varepsilon_{f,2}\,w_{ff}. \tag{5.11}$$

We evaluate the efficiencies by measuring the fake-lepton contribution to top-quark pair production and single-top-quark production in $pp$ collision events at $\sqrt{s} = 8$ TeV [56]. The events are required to pass one of the single-electron triggers, e24vhi or e60, or one of the single-muon triggers, mu24i or mu36. We define a signal region, used for validation and for the estimation of systematic uncertainties, and a control region, used to measure the efficiencies. For the validation, we use simulated samples of the signal processes and of the Z/W+jets, diboson and dijet background processes. Both regions include the lepton-plus-jets channel, split into the e+jets and µ+jets channels, and the dilepton channel. Using the numbers of jets and $b$-tagged jets, the lepton-plus-jets signal region is split into two regions: at least four jets with no requirement on the number of $b$-tagged jets ("pretag"), and at least four jets with at least one $b$-tagged jet. To suppress the fake-lepton background in these two signal regions, we also require exactly one tight lepton and apply cuts on the missing transverse energy $E_{T}^{\mathrm{miss}}$ and the transverse mass

$$m_{T}^{W} = \sqrt{2\,p_{T}^{\mathrm{lepton}}\,E_{T}^{\mathrm{miss}}\,(1-\cos\Delta\phi)},$$

where $\Delta\phi$ is the azimuthal angle between the lepton and $E_{T}^{\mathrm{miss}}$. Events in the e+jets channel must satisfy $E_{T}^{\mathrm{miss}} > 30$ GeV and $m_{T}^{W} > 30$ GeV; events in the µ+jets channel must satisfy $E_{T}^{\mathrm{miss}} > 20$ GeV and $E_{T}^{\mathrm{miss}} + m_{T}^{W} > 60$ GeV. The dilepton channel is split into the same-flavour channels, $e^{-}e^{+}$ and $\mu^{-}\mu^{+}$, and the different-flavour channel, $e\mu$. All dilepton regions require exactly two opposite-sign (OS) leptons and exactly two jets. The same-flavour selection requires a dilepton invariant mass $m_{ll} > 15$ GeV and $|m_{ll} - m_{Z}| > 10$ GeV, together with $E_{T}^{\mathrm{miss}} > 60$ GeV. The different-flavour selection requires the scalar sum of the transverse energies of the leptons and jets to satisfy $H_{T} > 130$ GeV.
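The transverse mass and the signal-region cuts above can be written as simple predicates. This sketch assumes all energies in GeV, takes $m_Z = 91.1876$ GeV (the PDG value; the exact value used in the analysis is not stated here), and uses hypothetical function names; object definitions and trigger requirements are not modelled.

```python
import math

M_Z = 91.1876  # GeV; PDG Z mass, assumed here as the m_Z of the Z-mass veto


def transverse_mass(lep_pt: float, met: float, dphi: float) -> float:
    """m_T^W = sqrt(2 pT^lepton E_T^miss (1 - cos dphi)), energies in GeV."""
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(dphi)))


def passes_lepton_jets_met(channel: str, met: float, mtw: float) -> bool:
    """E_T^miss and m_T^W requirements of the lepton-plus-jets signal regions."""
    if channel == "e+jets":
        return met > 30.0 and mtw > 30.0
    if channel == "mu+jets":
        return met > 20.0 and met + mtw > 60.0
    raise ValueError(f"unknown channel: {channel}")


def passes_dilepton(flavour: str, n_jets: int, opposite_sign: bool,
                    mll: float, met: float, ht: float) -> bool:
    """Dilepton signal-region cuts; flavour is 'ee', 'mumu' or 'emu'."""
    if not opposite_sign or n_jets != 2:
        return False
    if flavour in ("ee", "mumu"):
        # Same flavour: low-mass cut, Z-mass veto and E_T^miss cut.
        return mll > 15.0 and abs(mll - M_Z) > 10.0 and met > 60.0
    if flavour == "emu":
        # Different flavour: scalar sum of lepton and jet transverse energies.
        return ht > 130.0
    raise ValueError(f"unknown flavour: {flavour}")


# A lepton back-to-back with E_T^miss in the transverse plane.
mtw = transverse_mass(40.0, 40.0, math.pi)
print(mtw, passes_lepton_jets_met("mu+jets", 40.0, mtw))
```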
The signal region in each channel is further divided into a pretag region and a region with at least one $b$-tagged jet.
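Returning to the dilepton matrix method of Equations (5.9) to (5.11), the sketch below builds $M$, inverts it per event and evaluates the tight-tight fake weight. It assumes, as the $\bar\varepsilon = 1-\varepsilon$ entries imply, that "loose" in $N_{tl}$, $N_{lt}$ and $N_{ll}$ means loose but not tight; the function names and the toy efficiencies are hypothetical. It uses NumPy.

```python
import numpy as np

# Tight/loose categories, leading lepton first; "l" = loose but not tight.
CATEGORIES = ("tt", "tl", "lt", "ll")


def single_lepton_block(eps_r: float, eps_f: float) -> np.ndarray:
    """Rows: (tight, loose-not-tight); columns: (real, fake)."""
    return np.array([[eps_r, eps_f], [1.0 - eps_r, 1.0 - eps_f]])


def dilepton_matrix(r1: float, f1: float, r2: float, f2: float) -> np.ndarray:
    """M of Eq. (5.10): rows (tt, tl, lt, ll), columns (rr, rf, fr, ff)."""
    return np.kron(single_lepton_block(r1, f1), single_lepton_block(r2, f2))


def tight_tight_fake_weight(category: str, r1: float, f1: float,
                            r2: float, f2: float) -> float:
    """Eq. (5.11) for one observed event in the given category."""
    m_inv = np.linalg.inv(dilepton_matrix(r1, f1, r2, f2))
    # One event in `category` contributes this column of M^-1 to
    # (N_rr, N_rf, N_fr, N_ff), i.e. the weights (w_rr, w_rf, w_fr, w_ff).
    _, w_rf, w_fr, w_ff = m_inv[:, CATEGORIES.index(category)]
    return float(r1 * f2 * w_rf + f1 * r2 * w_fr + f1 * f2 * w_ff)


# Closure test with an illustrative composition (N_rr, N_rf, N_fr, N_ff).
r1, f1, r2, f2 = 0.90, 0.20, 0.85, 0.25
true_counts = np.array([1000.0, 50.0, 40.0, 10.0])
observed = dilepton_matrix(r1, f1, r2, f2) @ true_counts  # (N_tt, N_tl, N_lt, N_ll)
estimate = sum(n * tight_tight_fake_weight(c, r1, f1, r2, f2)
               for c, n in zip(CATEGORIES, observed))
expected = r1 * f2 * 50.0 + f1 * r2 * 40.0 + f1 * f2 * 10.0
print(f"estimated {estimate:.3f}, injected {expected:.3f}")
```

Building $M$ with `np.kron` makes explicit that it is the Kronecker product of two single-lepton $2\times2$ matrices, so it is invertible whenever $\varepsilon_r \neq \varepsilon_f$ for both leptons. The closure test reproduces the injected tight-tight fake yield of 18.55 events.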
The efficiencies $\varepsilon_r$ and $\varepsilon_f$ are measured in the control regions. $\varepsilon_f$ is derived from the control regions of the e+jets and µ+jets channels, denoted CRf, and $\varepsilon_r$ from the control regions of the $e^{-}e^{+}$ and $\mu^{-}\mu^{+}$ channels. The $\varepsilon_r$ measurement uses the tag-and-probe method on $Z \to e^{-}e^{+}$ and $Z \to \mu^{-}\mu^{+}$ events: requiring one lepton from the decay (the tag lepton) to pass the tight selection yields an unbiased sample of loose leptons (the probe leptons), and $\varepsilon_r$ is the fraction of probe leptons that also pass the tight selection. For each pair, the tag and probe leptons must have opposite charges and a dilepton invariant mass $80 < m_{ll} < 100$ GeV. The $\varepsilon_f$ measurement requires a selection enriched in fake leptons: events in the CRf have exactly one loose lepton and at least one jet. The e+jets CRf additionally requires $m_{T}^{W} < 20$ GeV and $E_{T}^{\mathrm{miss}} + m_{T}^{W} < 60$ GeV, and the µ+jets CRf requires $|d_{0}^{\mathrm{sig}}| > 5$, where $d_{0}^{\mathrm{sig}} = d_{0}/\sqrt{\mathrm{err}(d_{0})}$ is the muon impact-parameter significance. The fake efficiency is the ratio of the number of CRf events whose lepton passes the tight selection to the number of CRf events with a loose lepton.
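Both measurements reduce to counting tight leptons among loose ones. The sketch below assumes each lepton of a $Z$ candidate is tried as the tag in turn, bins the efficiency in $p_{T}$ only, and reads $\mathrm{err}(d_0)$ in the significance definition as the variance of $d_0$ because its square root is taken; the `Lepton` class, the function names and the toy inputs are hypothetical.

```python
import math
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Lepton:
    pt: float     # GeV
    charge: int   # +1 or -1
    tight: bool   # passes the tight selection (all leptons here pass loose)


def probes_from_z(pairs):
    """Tag-and-probe: collect probe leptons from (lep_a, lep_b, mll) candidates.

    The pair must be opposite sign with 80 < mll < 100 GeV; each lepton is
    tried as the tag (an assumption) and must pass the tight selection.
    """
    probes = []
    for a, b, mll in pairs:
        if a.charge * b.charge >= 0 or not 80.0 < mll < 100.0:
            continue
        for tag, probe in ((a, b), (b, a)):
            if tag.tight:
                probes.append(probe)
    return probes


def binned_efficiency(leptons, edges):
    """Fraction of loose leptons passing tight in pT bins [edges[i], edges[i+1])."""
    n_loose = [0] * (len(edges) - 1)
    n_tight = [0] * (len(edges) - 1)
    for lep in leptons:
        i = bisect_right(edges, lep.pt) - 1
        if 0 <= i < len(n_loose):  # leptons outside the binning are skipped
            n_loose[i] += 1
            n_tight[i] += lep.tight
    return [t / n if n else math.nan for t, n in zip(n_tight, n_loose)]


def d0_significance(d0: float, err_d0: float) -> float:
    """d0 / sqrt(err(d0)) as written in the text; err(d0) read as a variance."""
    return d0 / math.sqrt(err_d0)


z_pairs = [
    (Lepton(45.0, +1, True), Lepton(38.0, -1, False), 91.0),
    (Lepton(50.0, +1, True), Lepton(42.0, -1, True), 89.5),
    (Lepton(30.0, -1, True), Lepton(28.0, -1, True), 90.0),  # same sign: rejected
]
probes = probes_from_z(z_pairs)
print("eps_r per pT bin:", binned_efficiency(probes, [25.0, 40.0, 60.0]))
print("passes muon CRf d0 cut:", abs(d0_significance(0.12, 0.0004)) > 5.0)
```

In this sketch, $\varepsilon_f$ would be obtained by passing the loose leptons selected in CRf to the same `binned_efficiency` function.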
Both efficiencies are measured as functions of several variables, such as the lepton $p_{T}$, the lepton $|\eta|$, the angular distance between the lepton and the closest jet, $\min\Delta R(l, \mathrm{jet})$, and the number of $b$-tagged jets, and separately for the different triggers. Figures 5.5 and 5.6 show $\varepsilon_r$ and $\varepsilon_f$ in the e+jets channel, and Figures 5.7 and 5.8 show them in the µ+jets channel, as functions of these variables.
Figure 5.5: Real efficiency $\varepsilon_r$ and fake efficiency $\varepsilon_f$ in the e+jets channel as functions of different variables, for each trigger. The variables are the electron cluster $|\eta|$, the electron transverse energy $p_{T}^{e}$, and the minimum $\Delta R$ between the electron and the jets. e60 denotes the high-$p_{T}$ trigger, e24vh the low-$p_{T}$ trigger without the isolation cut, and e24vhi the low-$p_{T}$ trigger with the isolation cut.
Figure 5.6: Real efficiency $\varepsilon_r$ and fake efficiency $\varepsilon_f$ in the e+jets channel as functions of different variables, for each trigger. The variables are the leading-jet transverse momentum $p_{T}^{\text{leading jet}}$, the number of jets $n_{\text{jet}}$, the number of $b$-tagged jets $n_{b\text{-jet}}$, and the azimuthal angle between the electron and the missing transverse energy, $\Delta\phi(e, E_{T}^{\mathrm{miss}})$. e60 denotes the high-$p_{T}$ trigger, e24vh the low-$p_{T}$ trigger without the isolation cut, and e24vhi the low-$p_{T}$ trigger with the isolation cut.
Figure 5.7: Real efficiency $\varepsilon_r$ and fake efficiency $\varepsilon_f$ in the µ+jets channel as functions of different variables, for each trigger. The variables are the muon $|\eta|$, the muon transverse momentum $p_{T}^{\mu}$, and the minimum $\Delta R$ between the muon and the jets. mu36 denotes the high-$p_{T}$ trigger, mu24 the low-$p_{T}$ trigger without the isolation cut, and mu24i the low-$p_{T}$ trigger with the isolation cut.
Figure 5.8: Real efficiency $\varepsilon_r$ and fake efficiency $\varepsilon_f$ in the µ+jets channel as functions of different variables, for each trigger. The variables are the leading-jet transverse momentum $p_{T}^{\text{leading jet}}$, the number of jets $n_{\text{jet}}$, the number of $b$-tagged jets $n_{b\text{-jet}}$, and the azimuthal angle between the muon and the missing transverse energy, $\Delta\phi(\mu, E_{T}^{\mathrm{miss}})$. mu36 denotes the high-$p_{T}$ trigger, mu24 the low-$p_{T}$ trigger without the isolation cut, and mu24i the low-$p_{T}$ trigger with the isolation cut.
The efficiencies parametrised in this way are used to compute the event weights of Equation (5.8). Because the variables used in the efficiency measurements are correlated, the efficiency is expressed as a function of combinations of the variables:
$$\varepsilon_k(x_1,\dots,x_N;\,y_1,\dots,y_M) = \frac{1}{\varepsilon_k(x_1,\dots,x_N)^{M-1}} \prod_{j=1}^{M} \varepsilon_k(x_1,\dots,x_N;\,y_j), \tag{5.12}$$
where $k \in \{r, f\}$, $\varepsilon_k(x_1,\dots,x_N)$ is the efficiency measured as a function of all the $x$ variables, and $\varepsilon_k(x_1,\dots,x_N;\,y_j)$ is the efficiency measured as a function of all the $x$ variables and the variable $y_j$. The $x$ variables are typically discrete and the $y$ variables typically continuous. Equation (5.12) assumes that the $y$ variables are uncorrelated with one another.
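Equation (5.12) can be read as multiplying $\varepsilon_k(x)$ by independent correction factors $\varepsilon_k(x; y_j)/\varepsilon_k(x)$, one per $y_j$. The sketch below implements that product with hypothetical lookup tables in which $x$ is the number of $b$-tagged jets and $y_1$, $y_2$ are coarse lepton $p_{T}$ and $\min\Delta R$ bins; the table values are invented for illustration only.

```python
from __future__ import annotations

import math


def combined_efficiency(eps_x: float, eps_x_y: list[float]) -> float:
    """Eq. (5.12): eps(x; y_1..y_M) = prod_j eps(x; y_j) / eps(x)^(M-1).

    Each factor eps(x; y_j) / eps(x) acts as a correction for y_j on top of
    eps(x); multiplying the corrections assumes the y_j are uncorrelated.
    """
    if not eps_x_y:
        return eps_x
    m = len(eps_x_y)
    return math.prod(eps_x_y) / eps_x ** (m - 1)


# Hypothetical fake-efficiency tables (illustrative numbers only).
EPS_F_NB = {0: 0.20, 1: 0.25}                         # eps_f(n_b)
EPS_F_NB_PT = {(0, "low"): 0.24, (0, "high"): 0.15,
               (1, "low"): 0.28, (1, "high"): 0.20}   # eps_f(n_b; pT)
EPS_F_NB_DR = {(0, "near"): 0.30, (0, "far"): 0.17,
               (1, "near"): 0.35, (1, "far"): 0.22}   # eps_f(n_b; min dR)

n_b, pt_bin, dr_bin = 1, "low", "near"
eps = combined_efficiency(EPS_F_NB[n_b],
                          [EPS_F_NB_PT[(n_b, pt_bin)], EPS_F_NB_DR[(n_b, dr_bin)]])
print(round(eps, 4))
```

With a single $y$ variable the formula returns $\varepsilon_k(x; y_1)$ unchanged, as expected from Equation (5.12) with $M = 1$.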
The main sources of systematic uncertainty on the fake-lepton background estimate are the measurement of the real efficiency, the use of MC simulation to correct the efficiency measurements, the different background composition in the signal regions, and the treatment of the dependence of the efficiencies on lepton and event properties.