Section 3.4), $\Omega$ is the configuration index and r is the individual identity index
3.4 Fixed Sample Number per Subject Constraint Relaxation
3.4.1 Rigidity due to Fixed Sample Number per Subject
As discussed for the joint probabilistic method, it is assumed that the number of images per subject, $m$, is the same for every subject in a given template $\Omega$. This is rarely the case, and it is a troublesome constraint to satisfy in practice. It is much more convenient if domains can be constructed from individuals or situations without such an imposition and assembled according to the task at hand. Thus, an alternative is to standardize $m$ to increase robustness.
For explanation purposes, assume that there is only one subject ($N = 1$) with $M$ sample images; for this sub-section only, these are the sample images used to build template $\Omega$. To standardize the $M$ sample images to $m$, (3.16) and (3.19) can be rewritten as:
$$\mathbf{X} = \frac{m}{M}\sum_{i=1}^{M}\mathbf{x}_i \qquad (3.27)$$
CHAPTER 3. JOINT PROBABILISTIC OPEN SET FACE IDENTIFICATION WITH
TRANSFER LEARNING
$$\mathcal{X} = \frac{m}{M}\sum_{i=1}^{M}\mathbf{x}_i\mathbf{x}_i^{T} \qquad (3.28)$$
This can be seen as artificially increasing or decreasing the number of sample images to a constant $m$, on which the calculation of $G$ and $A$ is based. Experimental results show that artificially increasing the number of sample images increases accuracy while reducing variance [129], giving stable performance, as shown by the experiments in Section 3.7.2.
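A minimal NumPy sketch of the standardization in (3.27) and (3.28), assuming the $M$ samples of one subject are stored as the rows of an array. The function name `standardize_sums` and the value $m = 10$ are illustrative assumptions, not taken from this work. After rescaling, subjects with very different $M$ contribute sums on the same scale of $m$ samples.

```python
import numpy as np

def standardize_sums(samples: np.ndarray, m: int):
    """Rescale the sums of M samples (rows) as if there were m samples.

    Follows (3.27)-(3.28): (m/M) * sum x_i and (m/M) * sum x_i x_i^T.
    """
    M = samples.shape[0]
    scale = m / M
    first = scale * samples.sum(axis=0)    # (3.27): d-vector
    second = scale * samples.T @ samples   # (3.28): d x d, scaled sum of x_i x_i^T
    return first, second

rng = np.random.default_rng(0)
x_few = rng.normal(size=(3, 4))    # M = 3 images of one subject
x_many = rng.normal(size=(50, 4))  # M = 50 images of another subject
m = 10                             # common standardized count (assumed value)

s1_few, s2_few = standardize_sums(x_few, m)
s1_many, s2_many = standardize_sums(x_many, m)

# Both results are on the scale of m samples: (3.27) equals m times the sample mean.
print(np.allclose(s1_few, m * x_few.mean(axis=0)))             # True
print(np.allclose(s2_many, m * (x_many.T @ x_many) / 50))      # True
```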
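Every domain template can therefore be represented by $C_1^{\Omega} = m^{\Omega}$, $C_2^{\Omega} = \frac{m}{M}\sum_{i=1}^{M}\mathbf{x}_i$ and $C_3^{\Omega} = \frac{m}{M}\sum_{i=1}^{M}\mathbf{x}_i\mathbf{x}_i^{T}$, which together are denoted $C^{\Omega}$. $C_2^{\Omega}$ and $C_3^{\Omega}$ are needed to obtain the covariance, whereas $C_1^{\Omega}$ is crucial for incremental learning, when $C_2^{\Omega}$ and $C_3^{\Omega}$ need to be manipulated. Intuitively, $C^{\Omega}$ belongs to an identity. With the standardization in (3.27) and (3.28), $C^{\Omega}$ from various identities can be merged through (3.16)-(3.19).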
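The size of the template is $0.5d^2 + 1.5d + 1$, where $d$ is the dimension of the image representation: one scalar for $C_1^{\Omega}$, $d$ entries for $C_2^{\Omega}$ and $d(d+1)/2$ entries for the symmetric $C_3^{\Omega}$. Because this size does not depend on the number of samples, a large number of training samples does not affect the computational load and speed. Moreover, with $C^{\Omega}$, templates can easily be generated and merged for transfer learning across different domains, depending on the application.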
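The template size can be checked with a small sketch that packs $C_1^{\Omega}$, $C_2^{\Omega}$ and the upper triangle of the symmetric $C_3^{\Omega}$ into one flat array. The packing layout and the names `build_template` and `template_size` are hypothetical; the point is only that the stored size is fixed by $d$, whether the template is built from 5 or 5000 samples.

```python
import numpy as np

def build_template(samples: np.ndarray, m: int) -> np.ndarray:
    """Hypothetical flat packing of C = (C1, C2, C3) for one identity.

    C3 is symmetric, so only its upper triangle (d(d+1)/2 entries) is kept.
    """
    M, d = samples.shape
    c1 = np.array([float(m)])              # C1: standardized count
    c2 = (m / M) * samples.sum(axis=0)     # C2: d entries
    c3 = (m / M) * samples.T @ samples     # C3: d x d, symmetric
    iu = np.triu_indices(d)                # upper triangle including the diagonal
    return np.concatenate([c1, c2, c3[iu]])

def template_size(d: int) -> float:
    return 0.5 * d**2 + 1.5 * d + 1

rng = np.random.default_rng(1)
d = 6
small = build_template(rng.normal(size=(5, d)), m=10)
large = build_template(rng.normal(size=(5000, d)), m=10)
print(small.size, large.size, template_size(d))  # 28 28 28.0
```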
3.4.2 Number per Subject Variation Induced Distortion
This sub-section analyzes the effect of using an arbitrary number of images per subject, $a$, instead of the actual number, $m$. The zero-mean condition is assumed to hold. Let us denote:
$$\Sigma = \frac{\mathbf{X}}{m^2} \qquad (3.29)$$
If $a$ is the new parameter for the number of images per class (instead of $m$), then Eq. 3.23, ignoring coefficients, can be rewritten as:
$$a^2 H_a \Sigma H_a^{T} = a^2 H_a\left(\frac{\mathbf{X}}{m^2}\right)H_a^{T} = \frac{a^2}{m^2} H_a \mathbf{X} H_a^{T} \qquad (3.30)$$
where $H_a$ denotes $H$ with $a$ replacing $m$. The difference is then:
$$\Delta = \frac{a^2}{m^2} H_a \mathbf{X} H_a^{T} - H_m \mathbf{X} H_m^{T} \qquad (3.31)$$
Expanding Eq. 3.31, we obtain:
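$$\Delta = \frac{1}{m}\left[(a - m)I - \left(a^2 S_\mu (aS_\mu + S_\varepsilon)^{-1} - m^2 S_\mu (mS_\mu + S_\varepsilon)^{-1}\right)\right] S_\mu S_\varepsilon^{-1} \qquad (3.32)$$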
It can be observed that distortion is introduced into $S_\mu$ by $S_\varepsilon$ as the parameter $m$ varies. However, the larger $m$ and $a$ are, the less effect $S_\varepsilon$ has on $S_\mu$. A large $m$, meaning more images per class, is crucial for reducing the overall difference. Further analysis is needed to obtain features that induce more favorable $S_\mu$ and $S_\varepsilon$.
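Equation (3.32) can be sanity-checked numerically. The sketch below assumes the joint Bayesian forms $F = S_\varepsilon^{-1}$, $G = -(nS_\mu + S_\varepsilon)^{-1}S_\mu S_\varepsilon^{-1}$ and $H_n = S_\mu(F + nG)$ for $n$ images per class; under these assumptions the right-hand side of (3.32) equals $\frac{a}{m}H_a - H_m$, the change in the $m$-dependent factor. The helper names are hypothetical and the covariances are random positive-definite stand-ins.

```python
import numpy as np

def random_spd(d, rng):
    """Random symmetric positive-definite matrix (stand-in covariance)."""
    B = rng.normal(size=(d, d))
    return B @ B.T + d * np.eye(d)

def G_of(n, S_mu, S_eps):
    # Assumed: G = -(n S_mu + S_eps)^{-1} S_mu S_eps^{-1}
    return -np.linalg.solve(n * S_mu + S_eps, S_mu) @ np.linalg.inv(S_eps)

def H_of(n, S_mu, S_eps):
    # Assumed: H = S_mu (F + n G) with F = S_eps^{-1}
    return S_mu @ (np.linalg.inv(S_eps) + n * G_of(n, S_mu, S_eps))

def delta_332(a, m, S_mu, S_eps):
    """Right-hand side of (3.32)."""
    I = np.eye(S_mu.shape[0])
    inner = (a**2 * S_mu @ np.linalg.inv(a * S_mu + S_eps)
             - m**2 * S_mu @ np.linalg.inv(m * S_mu + S_eps))
    return ((a - m) * I - inner) @ S_mu @ np.linalg.inv(S_eps) / m

rng = np.random.default_rng(2)
d, m, a = 4, 5, 12
S_mu, S_eps = random_spd(d, rng), random_spd(d, rng)
lhs = (a / m) * H_of(a, S_mu, S_eps) - H_of(m, S_mu, S_eps)
print(np.allclose(lhs, delta_332(a, m, S_mu, S_eps)))  # True
```

Setting $a = m$ makes both sides vanish, as expected when the actual number of images is used.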
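For $S_\varepsilon$, Eq. 3.26, ignoring coefficients and $\mathcal{X}$ (since $m$ has no effect on $\mathcal{X}$), can be rewritten as:

$$\frac{a^2}{m^2}\mathbf{X} G_a + \frac{a^2}{m^2} G_a \mathbf{X}^{T} + \frac{a^3}{m^2} G_a \mathbf{X} G_a \qquad (3.33)$$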
The difference can be obtained component by component. For the first component of Eq. 3.33 (the second component is its transpose):
$$\Delta = \frac{1}{m^2}\left(m^2(mS_\mu + S_\varepsilon)^{-1} - a^2(aS_\mu + S_\varepsilon)^{-1}\right)S_\mu S_\varepsilon^{-1} \qquad (3.34)$$
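For the third component of Eq. 3.33:

$$\Delta = \frac{1}{m}\left(m(mS_\mu + S_\varepsilon)^{-1} - a(aS_\mu + S_\varepsilon)^{-1}\right)S_\mu S_\varepsilon^{-1} \qquad (3.35)$$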
As can be observed from Eqs. 3.34 and 3.35, a larger $m$, that is, more sample images, reduces the distortion. At the same time, a larger $a$ also reduces the effect of $S_\varepsilon$. Therefore, given sufficient samples (a sufficiently large $m$) and a very large $a$, the difference approaches 0.
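The first-component difference (3.34) can be checked in the same way. Under the same assumed form $G_n = -(nS_\mu + S_\varepsilon)^{-1}S_\mu S_\varepsilon^{-1}$, the coefficient of $\mathbf{X}$ in the first term of (3.33) is $\frac{a^2}{m^2}G_a$, and for $a = m$ it reduces to $G_m$; the sketch confirms that their difference matches (3.34). The helpers are hypothetical and the matrices are random stand-ins.

```python
import numpy as np

def random_spd(d, rng):
    """Random symmetric positive-definite matrix (stand-in covariance)."""
    B = rng.normal(size=(d, d))
    return B @ B.T + d * np.eye(d)

def G_of(n, S_mu, S_eps):
    # Assumed: G = -(n S_mu + S_eps)^{-1} S_mu S_eps^{-1}
    return -np.linalg.solve(n * S_mu + S_eps, S_mu) @ np.linalg.inv(S_eps)

def delta_334(a, m, S_mu, S_eps):
    """Right-hand side of (3.34)."""
    inner = (m**2 * np.linalg.inv(m * S_mu + S_eps)
             - a**2 * np.linalg.inv(a * S_mu + S_eps))
    return inner @ S_mu @ np.linalg.inv(S_eps) / m**2

rng = np.random.default_rng(3)
d, m, a = 4, 5, 12
S_mu, S_eps = random_spd(d, rng), random_spd(d, rng)
lhs = (a**2 / m**2) * G_of(a, S_mu, S_eps) - G_of(m, S_mu, S_eps)
print(np.allclose(lhs, delta_334(a, m, S_mu, S_eps)))  # True
```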
3.4.3 Modifications for Transfer Learning
For transfer learning, in the M-step, as shown by [27], a target domain with scarce samples can be merged directly with the information-rich source domain to obtain the new covariances in (3.36) and (3.37), given that $\boldsymbol{\mu}_i$ and $\boldsymbol{\varepsilon}_j$ are obtained from target-domain samples:
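$$T_\mu = w S_\mu + (1 - w)\frac{1}{n}\sum_{i=1}^{n}\boldsymbol{\mu}_i\boldsymbol{\mu}_i^{T} \qquad (3.36)$$

$$T_\varepsilon = w S_\varepsilon + (1 - w)\frac{1}{k}\sum_{j=1}^{k}\boldsymbol{\varepsilon}_j\boldsymbol{\varepsilon}_j^{T} \qquad (3.37)$$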
$w \in (0, 1)$ is a weight that determines how strongly the source domain and the target domain each affect the final covariance, and $n$ and $k$ are the numbers of samples of the factors $\boldsymbol{\mu}$ and $\boldsymbol{\varepsilon}$, respectively. The source domain provides the appropriate baseline on which the target domain builds.
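A minimal sketch of the weighted M-step in (3.36) and (3.37), assuming the target-domain factors $\boldsymbol{\mu}_i$ and $\boldsymbol{\varepsilon}_j$ have already been estimated (for example, by an E-step) and are stored as rows of arrays. The function name `transfer_covariances`, the source covariances and the weight $w = 0.7$ are illustrative assumptions only.

```python
import numpy as np

def transfer_covariances(S_mu, S_eps, mu_target, eps_target, w):
    """Blend source covariances with target-domain factor statistics, as in (3.36)-(3.37).

    mu_target:  (n, d) identity factors estimated on target samples.
    eps_target: (k, d) within-class factors estimated on target samples.
    """
    if not 0.0 < w < 1.0:
        raise ValueError("w must lie in (0, 1)")
    n, k = mu_target.shape[0], eps_target.shape[0]
    T_mu = w * S_mu + (1 - w) * (mu_target.T @ mu_target) / n
    T_eps = w * S_eps + (1 - w) * (eps_target.T @ eps_target) / k
    return T_mu, T_eps

rng = np.random.default_rng(4)
d = 3
S_mu, S_eps = 2.0 * np.eye(d), 0.5 * np.eye(d)  # stand-ins for source-domain covariances
mu_t = rng.normal(size=(8, d))    # n = 8 target identity factors
eps_t = rng.normal(size=(40, d))  # k = 40 target within-class factors
T_mu, T_eps = transfer_covariances(S_mu, S_eps, mu_t, eps_t, w=0.7)
print(np.allclose(T_mu, T_mu.T), np.allclose(T_eps, T_eps.T))  # True True
```

A $w$ close to 1 keeps the result near the source baseline; a smaller $w$ lets the scarce target data pull the covariances further.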
With transfer learning, this method achieves state-of-the-art accuracy. It has been shown that the transfer learning method performs similarly to a system trained on a large-scale database 20 times larger, meaning that it is a highly data-efficient method.
The joint probabilistic method as described is rigid because it does not directly cater for an arbitrary number of training samples per subject during template training. This matters because, if the number of samples does not conform to a single $m$, the E-step may need to be performed separately for each different $m$. In addition, training is inefficient, as every sample needs to be stored and looped through. This problem is worsened if samples accumulate continually.
Fortunately, the joint probabilistic method is reformulated as described in Section 3.3.3 to enable more efficient training and storage, with templates that are easy to store, categorize and merge between domains. The E-step and M-step as described can be combined.
With the new representation of the database (as configurations $\Omega$) and the conditions for applying different numbers of images per class $m$, transfer learning can be implemented. Let us denote:
$$\Sigma_{\mathcal{X}} = \frac{\mathcal{X}}{(Nm)^2} \qquad (3.38)$$

Given configurations $\Omega = 1, 2, \ldots, Q$ that share the same $m$ but have different $N$, and using Eqs. 3.29 and 3.38, the configurations can be combined (similar to training on their databases together) as:
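$$\mathcal{X}_{\text{new}} = \sum_{\Omega=1}^{Q}(N_\Omega m)^2\,\Sigma_{\mathcal{X}}^{\Omega} \qquad (3.39)$$

$$\mathbf{X}_{\text{new}} = \sum_{\Omega=1}^{Q} m^2\,\Sigma^{\Omega} \qquad (3.40)$$

$$N_{\text{new}} = \sum_{\Omega=1}^{Q} N_\Omega \qquad (3.41)$$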
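The merge in (3.39) to (3.41) can be illustrated with a small sketch. It assumes, as hypothetical definitions, that $\mathcal{X}^{\Omega}$ is the sum of $\mathbf{x}\mathbf{x}^{T}$ over all images of configuration $\Omega$ and that $\mathbf{X}^{\Omega}$ is the sum of outer products of the per-identity image sums; $\Sigma^{\Omega}$ and $\Sigma_{\mathcal{X}}^{\Omega}$ then follow from (3.29) and (3.38). Under these assumptions, merging the normalized statistics reproduces the statistics of the union of the databases. The data layout and function names are illustrative only.

```python
import numpy as np

def config_stats(images):
    """images: (N, m, d) array of N identities with m images each (hypothetical layout)."""
    N, m, _ = images.shape
    flat = images.reshape(N * m, -1)
    calX = flat.T @ flat          # assumed: sum of x x^T over all images
    sums = images.sum(axis=1)     # (N, d) per-identity image sums
    X = sums.T @ sums             # assumed: sum of per-identity outer products
    return {"N": N, "m": m,
            "Sigma_X": calX / (N * m) ** 2,  # (3.38)
            "Sigma": X / m ** 2}             # (3.29)

def merge_configs(configs):
    """Combine configurations that share the same m via (3.39)-(3.41)."""
    m = configs[0]["m"]
    if any(c["m"] != m for c in configs):
        raise ValueError("this sketch requires a common m")
    calX_new = sum((c["N"] * m) ** 2 * c["Sigma_X"] for c in configs)  # (3.39)
    X_new = sum(m ** 2 * c["Sigma"] for c in configs)                 # (3.40)
    N_new = sum(c["N"] for c in configs)                               # (3.41)
    return calX_new, X_new, N_new

rng = np.random.default_rng(5)
db1 = rng.normal(size=(4, 3, 2))  # 4 identities, m = 3, d = 2
db2 = rng.normal(size=(7, 3, 2))  # 7 identities, m = 3, d = 2
calX_new, X_new, N_new = merge_configs([config_stats(db1), config_stats(db2)])

# Compare with statistics computed directly on the union of both databases.
union = config_stats(np.concatenate([db1, db2]))
print(N_new, np.allclose(X_new, union["Sigma"] * 3 ** 2))  # 11 True
```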
When $m$ differs between configurations $\Omega$, there are two alternatives: 1) apply a different $H$ and $K$ (since both are parameterized by $m$) to each configuration; or 2) use the same $m$ for $H$ and $K$, but rescale $\mathbf{X}$ and $\mathcal{X}$ to a standard $m$ with Eqs. 3.29 and 3.38.
For alternative 1, the method is similar to that of Eqs. 3.13 to 3.15, but with the additional step of constructing the different $H$ and $K$ matrices. For alternative 2, there will be distortions due to differences in $\Sigma$ ($D \propto \varepsilon_1 - \varepsilon_2$) of the same class, given the different $m$ of each $\mathbf{X}$. To obtain the new configurations, the method is the same as in Eqs. 3.39 to 3.41, but with each $m$ taken from its respective database. If $m$ is large, by the law of large numbers, the distortion is negligible.
3.5 Domain Mergence
As shown in Figure 3.5, prototype images (the images compared with the input) are stored according to identity. Across identities, they can be separated by domain (for example, the medium with which the image was captured, the type of lighting, or the room in which it was taken). Transfer learning is then applied: information from the target domain of other identities (often scarce in sample images) is combined with the source domain (built from a large number of non-specific sample images) to facilitate recognition.
In the figure, target domain A adapts the source domain B (which provides a baseline) to better suit the recognition of a human subject currently captured under the conditions of domain A. Algorithms 1 and 2 show the implementation of domain-specific recognition.
$C^{\Omega}$ refers to one subject with $C_1^{\Omega}$ sample images, meaning that a domain refers to a specific identity. Whether the identity should be unique, and the effects of this choice, are not studied in this work.
For the domain merging shown in Figure 3.5, the input consists of the templates of the domains to be merged, and the output is a new template used for the subsequent training of $S_\mu$ and $S_\varepsilon$ (as shown in Algorithm 2). To increase the effect of a certain domain, its data are multiplied by $w_f$, which is set to 1 by default. This parameter is used when the data samples of the target domain are to be multiplied. The experiments in Section 3.7.3.2 show that it can increase accuracy.
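One way to read the weight $w_f$ is sketched below, as an assumption rather than the exact procedure of Algorithm 2: if each domain is summarized by a sample count, a sum of samples and a sum of outer products, multiplying the target domain's statistics by an integer $w_f$ is equivalent to duplicating its samples $w_f$ times before merging. The function names `suff_stats` and `weighted_merge` are hypothetical.

```python
import numpy as np

def suff_stats(samples):
    """Count, sum and sum of outer products of a set of samples (rows)."""
    return samples.shape[0], samples.sum(axis=0), samples.T @ samples

def weighted_merge(source, target, w_f=1.0):
    """Merge two sets of statistics, multiplying the target's by w_f (default 1)."""
    n_s, s1_s, s2_s = source
    n_t, s1_t, s2_t = target
    return n_s + w_f * n_t, s1_s + w_f * s1_t, s2_s + w_f * s2_t

rng = np.random.default_rng(6)
src = rng.normal(size=(100, 3))  # many generic source-domain samples
tgt = rng.normal(size=(5, 3))    # few target-domain samples
merged = weighted_merge(suff_stats(src), suff_stats(tgt), w_f=3)

# With an integer w_f this equals duplicating the target samples w_f times.
replicated = suff_stats(np.concatenate([src] + [tgt] * 3))
print(all(np.allclose(p, q) for p, q in zip(merged, replicated)))  # True
```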
$S_\mu$ and $S_\varepsilon$ are used to obtain $A$ and $G$ via Algorithm 1; these are termed the template used for face recognition. Both $A$ and $G$ are applied in (3.5) to compare the prototype images with the input image. The class whose prototype image achieves the highest value is taken as the identity of the input. To achieve open-set recognition, one may evaluate the maximum (winning) value and consider the input unknown if this value is below a certain threshold.
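The open-set decision can be sketched as follows. This assumes that (3.5) is the joint Bayesian log-likelihood ratio $r(\mathbf{x}_1, \mathbf{x}_2) = \mathbf{x}_1^{T}A\mathbf{x}_1 + \mathbf{x}_2^{T}A\mathbf{x}_2 - 2\mathbf{x}_1^{T}G\mathbf{x}_2$, with $A$ and $G$ read off the inverse of the joint intra-personal covariance; the helper names, covariances, prototypes and thresholds are illustrative assumptions, not values from this work.

```python
import numpy as np

def ratio_matrices(S_mu, S_eps):
    """A and G of the joint Bayesian log-likelihood ratio (assumed form of (3.5))."""
    d = S_mu.shape[0]
    joint = np.block([[S_mu + S_eps, S_mu], [S_mu, S_mu + S_eps]])
    inv = np.linalg.inv(joint)           # [[F + G, G], [G, F + G]]
    F_plus_G, G = inv[:d, :d], inv[:d, d:]
    A = np.linalg.inv(S_mu + S_eps) - F_plus_G
    return A, G

def score(x1, x2, A, G):
    # r(x1, x2) = x1^T A x1 + x2^T A x2 - 2 x1^T G x2 (constant terms dropped)
    return x1 @ A @ x1 + x2 @ A @ x2 - 2 * x1 @ G @ x2

def identify(query, prototypes, A, G, threshold):
    """Index of the best-scoring prototype, or None (unknown) if below threshold."""
    scores = np.array([score(query, p, A, G) for p in prototypes])
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else None

S_mu, S_eps = 4.0 * np.eye(2), np.eye(2)
A, G = ratio_matrices(S_mu, S_eps)
protos = np.array([[3.0, 0.0], [-3.0, 0.0]])
q = np.array([2.8, 0.1])
print(identify(q, protos, A, G, threshold=0.0))  # 0
print(identify(q, protos, A, G, threshold=5.0))  # None
```

The winner's score here is about 1.48, so it is accepted at threshold 0 and rejected as unknown at threshold 5; in practice the threshold would be tuned on validation data.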