Conclusion - A Study on Interaction between Human and Digital Content

Clip name                         Person      Face        False       Face
                                  appearing   detected    detected    detection
                                  frames      frames      frames      rate [%]
Let it be / The Beatles           7073        3610        1           99.97
Hey Jude / The Beatles            7119        3871        1           99.97
Get Back / The Beatles            4957        935         1           99.89
Two of us / The Beatles           6136        1717        0           100.00
Sum of 4 clips of The Beatles     25285       10133       3           99.97
Can You Keep A Secret? / Utada    4208        1486        11          97.27
Wait & See / Utada                3974        1446        39          97.37
For You / Utada                   5447        2404        98          96.08
Final Distance / Utada            3905        553         6           98.93
Sum of 4 clips of Utada           17527       5889        154         97.45

Table 8.3: Comparison of face identification rate (IDR).

Clip                      Face detected   Identified    Identified    IDR with     IDR with
                          frames after    frames with   frames with   each frame   FTC
                          tracking        each frame    FTC
4 clips of The Beatles    10136           2871          6489          28.3         64.0
4 clips of Utada          6043            1960          5778          32.4         95.6
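The IDR columns of Table 8.3 can be reproduced from the frame counts. The following is a minimal sketch, assuming that IDR is the number of identified frames divided by the number of face detected frames after tracking, expressed in percent. This definition is inferred from the table values rather than quoted from the text, and the names `TABLE_8_3`, `idr`, and `idr_table` are hypothetical.

```python
# Frame counts copied from Table 8.3.
TABLE_8_3 = {
    "4 clips of The Beatles": {"detected": 10136, "each_frame": 2871, "ftc": 6489},
    "4 clips of Utada": {"detected": 6043, "each_frame": 1960, "ftc": 5778},
}


def idr(identified_frames: int, detected_frames: int) -> float:
    """Return the identification rate in percent, rounded to one decimal.

    Assumption (not stated in the text): IDR = identified / detected.
    """
    return round(100.0 * identified_frames / detected_frames, 1)


def idr_table(rows: dict) -> dict:
    """Map each row name to (IDR with each frame, IDR with FTC)."""
    return {
        name: (idr(r["each_frame"], r["detected"]), idr(r["ftc"], r["detected"]))
        for name, r in rows.items()
    }


if __name__ == "__main__":
    for name, (per_frame, with_ftc) in idr_table(TABLE_8_3).items():
        print(f"{name}: each frame {per_frame}%, with FTC {with_ftc}%")
```

Under this assumption the computed rates agree with the two IDR columns of the table for both sets of clips.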

8.5 Conclusion

We have described a face annotation method that automatically identifies a person's name in video clips. To improve robustness, the method combines face recognition, mouth aperture detection, and vocal activity detection.

In our implementation, face images must be manually prepared in advance to identify the face of a person. In the future, we aim to support face images that are extracted from the web together with their name labels.

Although it is generally difficult to identify a person from the back view, we plan to annotate scenes showing the back view of a person by improving the face tracking method that extends the facial-temporal continuum. Our method will also be extended to other video objects.

Chapter 9 FaceXplorer: A Video Browsing Interface based on Face Recognition

This chapter describes FaceXplorer, an interface that enables a user to browse video clips by retrieving and playing back scenes that include faces of interest. Given the huge number of video clips on the Web, it is difficult to find those in which a particular person, such as a favorite actor or artist, appears. Although text-based video retrieval can find video clips whose textual descriptions include the name of a person of interest, the face of the person might not appear in those video clips, and it is still difficult to find and watch scenes including his or her face. FaceXplorer therefore analyzes video clips by using face recognition to find scenes including faces and visualizes those scenes for interactive browsing. In addition to typical text-based title retrieval, a user can give a query with facial parameters, such as brightness of face and degree of smile, to retrieve corresponding faces. To achieve this, we propose a method that detects the face of the same person in each video clip and extracts its facial parameters. Scenes with these detected faces are then visualized to enable the user to play back the scenes. When the user selects a face of interest, FaceXplorer visualizes its scenes along the timeline. This helps the user keep watching the face of a favorite person.

The face recognition method for FaceXplorer is based on the method in Chapter 8.

9.1 Introduction

Since no one can watch all of the ever-increasing video clips on the Web, it is indispensable to provide an appropriate user interface for finding video clips of interest to a user. The goal of this research is to develop an interface that enables a user to find scenes of interest by leveraging the appearance of favorite actors or artists for video retrieval. In fact, some people watch movies or TV programs simply because their favorite actors or artists appear in them. Information about who is appearing in a video clip is thus important.

Most video sharing services on the Web provide a text-based video retrieval interface. In this case, a user query is matched against the titles, tags, and descriptions of video clips. The descriptions sometimes include the names of main actors or artists, but usually do not include the names of actors or artists who are not famous. Detailed descriptions of facial features are also rarely included and cannot be used for retrieval. Furthermore, text-based retrieval does not focus on temporal aspects of clips, such as several different scenes of the same person appearing in a clip. To overcome this problem, an approach that analyzes time-synchronous user comments on a video clip has been reported [92]. Although this approach assumes that a scene containing a character will receive many comments pertaining to that character, a scene does not always receive a sufficient number of comments. There is also an interface based on video scene annotation from social activities on the Web [102]. This interface enables a user to retrieve video scenes based on tags. However, tags do not always reflect content because words are limited in expressing what we see. It is still difficult to retrieve a clip that cannot easily be described in words.

A retrieval system based on video analysis can compensate for these limitations of text-based retrieval. If computers can analyze video clips and show only key parts, it becomes possible to watch only the key parts in which the user is interested. Though the name of a person is representative textual information, what we actually see in a video clip is how the person appears, such as their facial impression or expression. The face is one of the most representative parts of a human and carries high-dimensional information about the person. Not only can we identify a person from the face, but we can also infer information such as the person's emotion and age. Sometimes facial information plays a more important role than the person's name. For example, when watching a video collection of specific people, e.g. a family video, the name is less important than how they look or how old they are.

For these reasons, we propose FaceXplorer (Figure 9.1), an interface that enables a user to browse video clips by retrieving and playing back scenes including faces of interest by using a face recognition approach. We also propose a face recognition method for video clips. FaceXplorer consists of two components. One component handles facial appearance-based video retrieval (Figure 9.1 Retrieval), and the other handles playback of the retrieved video clip (Figure 9.1 Playback). Specifically, by showing a list of faces extracted from a collection of video clips by video analysis, the retrieval interface enables the user to search for a clip from a selection of faces. After the user chooses a clip, the playback component visualizes scenes where the selected face appears and enables the user to enjoy watching only those scenes that contain the face.

Figure 9.1: FaceXplorer. Panels: Playback, Retrieval. Labels in the figure: keyword + facial parameters; set facial parameters to retrieve a face of interest; display faces that match the requirements; visualizes when faces appear; selected face is visualized on a timeline; a list of faces that appear in the video clip (only text information).
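To illustrate how scenes containing a selected face might be laid out on a timeline, the following minimal sketch merges the frame indices in which a face was detected into contiguous scenes. It is not the implementation used in FaceXplorer. The function name `frames_to_scenes` and the `max_gap` tolerance are assumptions introduced only for this example.

```python
from typing import Iterable, List, Tuple


def frames_to_scenes(frames: Iterable[int], max_gap: int = 1) -> List[Tuple[int, int]]:
    """Merge frame indices into (start, end) scenes.

    Two detections belong to the same scene when their frame indices
    differ by at most `max_gap` (a hypothetical tolerance that bridges
    short detection dropouts).
    """
    scenes: List[Tuple[int, int]] = []
    for frame in sorted(set(frames)):
        if scenes and frame - scenes[-1][1] <= max_gap:
            # Extend the current scene up to this frame.
            scenes[-1] = (scenes[-1][0], frame)
        else:
            # Start a new scene at this frame.
            scenes.append((frame, frame))
    return scenes


if __name__ == "__main__":
    detected = [3, 1, 2, 10, 11, 30]
    print(frames_to_scenes(detected))              # strict adjacency
    print(frames_to_scenes(detected, max_gap=20))  # tolerant merging
```

A playback interface could then draw each (start, end) pair as a bar on the timeline and skip playback between bars. A larger `max_gap` gives fewer, longer scenes.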

FaceXplorer not only makes it possible for the user to enjoy the scenes in which their favorite person appears but also enables detailed scene retrieval based on facial appearance. It enables the retrieval of video scenes such as those in which the person is smiling, which are difficult to retrieve using text-based video retrieval. This is an essential information retrieval approach because the user does not need to think of a word or other text information to retrieve a video clip. The user can intuitively retrieve video clips based on facial appearance.
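Retrieval by facial appearance can be pictured as a range query over per-face parameters. The sketch below is a simplified illustration and not the actual FaceXplorer implementation. The `Face` record, the parameter names `smile` and `brightness`, their normalization to [0, 1], and the function `query_faces` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Face:
    clip: str                 # title of the video clip (hypothetical field)
    person: str               # identified person name (hypothetical field)
    params: Dict[str, float]  # facial parameters, assumed normalized to [0, 1]


def query_faces(faces: List[Face], ranges: Dict[str, Tuple[float, float]]) -> List[Face]:
    """Return the faces whose parameters lie inside every requested range.

    `ranges` maps a parameter name to an inclusive (low, high) interval.
    A face that lacks a requested parameter does not match.
    """
    return [
        face
        for face in faces
        if all(
            name in face.params and low <= face.params[name] <= high
            for name, (low, high) in ranges.items()
        )
    ]


if __name__ == "__main__":
    faces = [
        Face("clip A", "person 1", {"smile": 0.9, "brightness": 0.7}),
        Face("clip B", "person 1", {"smile": 0.2, "brightness": 0.8}),
        Face("clip C", "person 2", {"smile": 0.8, "brightness": 0.3}),
    ]
    # Bright, smiling faces only.
    for face in query_faces(faces, {"smile": (0.7, 1.0), "brightness": (0.5, 1.0)}):
        print(face.clip, face.person)
```

In this form the user never has to name a word for the desired appearance. Adjusting the ranges, for example through sliders, is enough to narrow down the list of faces.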
