
5.4 Experiment

5.4.2 Interaction recognition result

Based on the contribution estimation results, we determine the major participants and use their action information for interaction recognition.
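The thesis gives no code, but the selection step can be sketched as follows. The function name and the threshold value are illustrative assumptions, and the per-participant BoW histograms and contribution scores are assumed to be computed already:

```python
import numpy as np

def select_major_features(contributions, histograms, threshold=0.35):
    """Keep only the BoW histograms of the major participants.

    contributions: per-participant contribution scores (assumed to sum to 1);
    histograms:    one BoW feature histogram per participant;
    threshold:     illustrative cut-off, not a value from the thesis.
    Summing the retained histograms keeps the descriptor dimension fixed
    for the SVM, whether one or both participants are judged major.
    """
    major = [h for c, h in zip(contributions, histograms) if c >= threshold]
    if not major:          # fall back to using all participants
        major = histograms
    return np.sum(major, axis=0)
```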

Following the experimental procedure in Section 5.3.4, the proposed algorithm is tested on the UT and LIMU datasets. Since BoW is employed to integrate the local features, a proper vocabulary size must be determined. When the HoG/HoF descriptor is used to represent actions, the optimal vocabulary sizes are set experimentally to 570 for the UT dataset and 2500 for the LIMU dataset.
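As a minimal sketch of how such a vocabulary size could be chosen experimentally, one can cross-validate over a list of candidate sizes; the clustering method, classifier settings, and candidate list below are assumptions, not details from the thesis:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def bow_histograms(vocab, descriptor_sets):
    """Quantize each video's local descriptors against the vocabulary."""
    hists = []
    for descs in descriptor_sets:
        words = vocab.predict(descs)
        h = np.bincount(words, minlength=vocab.n_clusters).astype(float)
        hists.append(h / max(h.sum(), 1.0))  # L1-normalized histogram
    return np.vstack(hists)

def pick_vocab_size(all_descriptors, descriptor_sets, labels,
                    candidates=(500, 570, 900, 1300, 2500)):
    """Return the candidate vocabulary size with the best CV accuracy."""
    best_size, best_acc = None, -1.0
    for k in candidates:
        vocab = MiniBatchKMeans(n_clusters=k, random_state=0).fit(all_descriptors)
        X = bow_histograms(vocab, descriptor_sets)
        acc = cross_val_score(SVC(kernel="rbf"), X, labels, cv=5).mean()
        if acc > best_acc:
            best_size, best_acc = k, acc
    return best_size
```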

For the experiments using CHOG3D, the proper vocabulary sizes are 900 and 1300 for the UT and LIMU datasets, respectively. The average recognition accuracy is calculated for each dataset for comparison with state-of-the-art algorithms. Since our proposed method adopts the local feature HoG/HoF for action representation and an SVM for recognition, we compare it with the method "HoG/HoF + SVM," which treats the two participants' actions as a single action category without considering contribution estimation. Through this comparison, we can see how contribution estimation affects the final interaction recognition results. Table 5.2 compares the recognition results of the proposed method and the state-of-the-art algorithms. Since its feature points are sparse, CHOG3D does not perform very well; its recognition results are worse than those of HoG/HoF, but still better than those of previous algorithms. Among these state-of-the-art algorithms, the proposed method using HoG/HoF outperforms all others. The results are discussed in detail below.

Many algorithms have been tested on set one of the UT dataset. We compare our method (Proposed-HoG/HoF) with the methods that combine local features and the relationships among features for interaction recognition [18]. Our algorithm performs nearly 20% better than theirs.

Compared with the methods that represent the actions of the two participants as one category using local features, i.e., Laptev + SVM [71], Cuboid + SVM [71], and HoG/HoF + SVM, the results show that contribution estimation improves recognition accuracy by 5% or more.

Waltisberg et al. [45] recognized single actions and combined them for interaction recognition. Their method obtained an accuracy of 88% on set one, which was the best result among previous works. The proposed method performs 2% better than theirs. This comparison indicates that contribution estimation helps to correctly recognize interactions.

Only a few algorithms have been tested on set two of the UT dataset, and most of them performed much worse on set two than on set one. This is because the environment of set two is much more complex than that of set one, involving camera jitter and background movement. In addition, the background in set two is filled with grass and trees, which hinders the extraction of effective feature points. Even so, the proposed algorithm (Proposed-HoG/HoF) overcomes these disadvantages and obtains a much better result. Compared with the existing algorithms in Table 5.2, the proposed method achieves a recognition accuracy at least 13% higher.

For the LIMU dataset, we compare the proposed method with the method "HoG/HoF + SVM," whose recognition accuracy is 82.2%. When the proposed method with contribution estimation is used, the result is improved to 89%, an improvement of 6.8%.

Table 5.2: Recognition accuracy comparison with the state-of-the-art (%)

Algorithm                 UT#1   UT#2   LIMU
Ryoo et al. [18]          70.8   -      -
Laptev+SVM [71]           68     65     -
Cuboid+SVM [71]           85     70     -
Waltisberg et al. [45]    88     77     -
HoG/HoF+SVM               85     77     82
Proposed-CHOG3D           87     85     80
Proposed-HoG/HoF          90     90     89

To demonstrate the effectiveness of our method for "co-contribution" and "single-contribution" interactions, we compare the average recognition accuracy for the two kinds of interactions in Table 5.3. The comparison is performed on set one and set two of the UT dataset and on the LIMU dataset. In the table, "CO" refers to co-contribution and "Single" refers to single-contribution.
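Per-group averages of this kind can be computed from per-video predictions along the following lines; the grouping of class labels into CO and Single, and the example labels, are illustrative assumptions:

```python
import numpy as np

def per_group_accuracy(y_true, y_pred, co_classes):
    """Average the per-class accuracies within the CO and Single groups.

    co_classes: set of class labels counted as co-contribution
    (e.g. {"handshake", "hug"}); the example labels are illustrative.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    result = {}
    for group, classes in (
        ("CO", [c for c in np.unique(y_true) if c in co_classes]),
        ("Single", [c for c in np.unique(y_true) if c not in co_classes]),
    ):
        # Mean of per-class accuracies, matching the table's group averages.
        accs = [(y_pred[y_true == c] == c).mean() for c in classes]
        result[group] = 100.0 * float(np.mean(accs)) if accs else float("nan")
    return result
```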

On the UT dataset, we compare the recognition accuracy of the two kinds of interactions obtained by the proposed method (Proposed-HoG/HoF) with that of several previous methods. Compared with the methods that regard the two participants' actions as one action category, e.g., Cuboid+SVM and HoG/HoF+SVM, our proposed method greatly improves the recognition of single-contribution interactions. Compared with the highest accuracy of 80% by Cuboid+SVM on set one, the proposed method improves by more than 7%. Compared with HoG/HoF+SVM on set two, our method improves by 22%. The methods that recognize interactions based on single-action recognition, e.g., Waltisberg et al. [45], perform better in single-contribution recognition than the methods that regard the two participants' actions as one category. The method of [45] obtains an accuracy of 87% for single-contribution interactions on set one of UT, which is at the same level as our method.

Table 5.3: Recognition result comparison of co-contribution and single-contribution (%)

Algorithm                 UT#1           UT#2           LIMU
                          CO    Single   CO    Single   CO    Single
Laptev+SVM [71]           65    67       60    60       -     -
Cuboid+SVM [71]           85    80       80    57       -     -
Waltisberg et al. [45]    85    87       70    73       -     -
HoG/HoF+SVM               90    77       80    73       90    76
Proposed-CHOG3D           90    85       90    82.5     88    74
Proposed-HoG/HoF          95    88       80    95       95    84

However, our method performs much better than theirs on set two, 22% higher. For co-contribution interactions, our method also performs better than previous methods on both sets.

On the LIMU dataset, we compare the proposed method (Proposed-HoG/HoF) with HoG/HoF+SVM. Our proposed method improves the recognition accuracy for co-contribution and single-contribution interactions by 5% and 8%, respectively. Compared with the UT dataset, LIMU adds one co-contribution interaction with dissimilar actions, handover. The recognition accuracy for this category is 80% with "HoG/HoF+SVM" but 90% with our proposed method, which shows that our proposed method recognizes interactions with dissimilar actions more correctly.

In conclusion, our proposed method outperforms the previous methods, improving recognition accuracy greatly. Compared with "co-contribution" interactions, our proposed method is more effective for the recognition of "single-contribution" interactions. Therefore, our method solves a problem that was ignored by previous algorithms and improves the accuracy of interaction recognition. In all the experiments, CHOG3D, with its sparse feature points, performs worse than HoG/HoF but still better than the other previous methods.
