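Object Classification Evaluation Experiments

5.4.1 Evaluation Experiment on Outdoor Environment Data

■ Experimental Data

We evaluate the structure-based features. The classification targets are four classes: single human (SH), human group (HG), bike (BK), and vehicle (VH). For the evaluation data, we selected 30-minute video scenes from three years of recorded video to build a video database totaling 23 hours, and used 200 patterns per class from it, 800 patterns in total. The codebook size is 32, and the number of normal distributions fitted is 4. Figure 5.11 shows examples of the images used in the experiment.

Figure 5.11: Examples of experimental data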

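■ Experimental Results

Table 5.2 shows the classification results as the integration ratio α is varied. Integrating the global features with the structure-based features improved the overall classification rate by 3.2 percentage points, from 85.0% with global features only (α = 0) to 88.2% at α = 0.1. Table 5.3 shows the confusion matrix of the classification results at α = 0.1. Adding structural information reduced the number of patterns in which a human was misclassified as a bike when only global information (α = 0) was used. Figure 5.12 shows the vector-quantization histograms obtained from the whole region and from each sub-region for a human and a bike. The global features of the human and the bike are very similar, so a description that captures only global features is likely to cause misclassification. In the structure-based features, on the other hand, the upper-body features are similar, as in (a)(b) and (e)(f) of Figure 5.12, but the lower-body features differ, as in (c)(d) and (g)(h). We believe this is why misclassification between humans and bikes, whose appearance is partly similar, could be suppressed. Figure 5.13 shows examples that were classified correctly once the structure-based features were added.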
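The role of the integration ratio α can be illustrated with a short sketch. It assumes that the global and structure-based matching costs are combined as a convex combination, (1 − α) · global + α · structure. This form is consistent with α = 0 meaning global information only, but it is an assumption and is not spelled out in this section. The function names and the cost values are hypothetical, chosen only for illustration, and are not taken from the experiment.

```python
def integrated_cost(global_cost, structure_cost, alpha):
    """Blend two matching costs (assumed form: convex combination).

    alpha = 0 uses the global cost only; alpha = 1 uses the structure cost only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * global_cost + alpha * structure_cost


def classify(costs_by_class, alpha):
    """Return the class whose integrated cost is smallest."""
    return min(
        costs_by_class,
        key=lambda name: integrated_cost(*costs_by_class[name], alpha),
    )


# Hypothetical (global_cost, structure_cost) pairs for one input pattern.
# Globally the pattern looks slightly more like a bike, but its parts
# match the human class much better.
toy_costs = {"SH": (0.30, 0.10), "BK": (0.28, 0.40)}

label_global_only = classify(toy_costs, alpha=0.0)
label_integrated = classify(toy_costs, alpha=0.1)
print(label_global_only, label_integrated)
```

With these toy values the pattern is assigned to BK at α = 0 and to SH at α = 0.1, which mirrors the kind of human-to-bike error that the structural term is reported to reduce.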

Table 5.2: Classification results [%] for each integration ratio α

Class    0.0    0.1    0.3    0.5    0.7    0.9    1.0
SH      75.6   80.8   77.5   74.7   71.4   70.9   71.8
HG      80.4   87.1   85.7   85.7   85.7   85.2   85.2
BK      86.3   87.7   86.3   87.2   86.3   85.3   85.8
VH      97.3   96.8   95.9   95.9   96.4   96.4   96.4
Total   85.0   88.2   86.4   85.9   85.0   84.5   84.9

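Table 5.3: Confusion matrix of the classification results (α = 0.1). Rows are the input class (in); columns are the output class (out).

in \ out    SH    HG    BK    VH   correct   rate [%]
SH         172    24    16     1       172       80.8
HG           9   182    16     2       182       87.1
BK          15    10   185     1       185       87.7
VH           7     0     0   212       212       96.8
Total                                  751       88.2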
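The per-class rates in Table 5.3 can be reproduced from the confusion matrix by dividing each diagonal entry by its row sum. The sketch below does this with the counts copied from the table; the function and variable names are chosen here for illustration. The pooled rate (all correct patterns over all patterns) comes out at roughly 88.1%, close to the 88.2% in the table. How the tabulated total was averaged or rounded is not stated in the text, so the small gap is left as is.

```python
labels = ["SH", "HG", "BK", "VH"]

# Rows: input class, columns: output class (counts from Table 5.3).
confusion = [
    [172, 24, 16, 1],
    [9, 182, 16, 2],
    [15, 10, 185, 1],
    [7, 0, 0, 212],
]


def per_class_rates(matrix):
    """Percentage of each row (input class) that lands on the diagonal."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def pooled_rate(matrix):
    """Percentage of all patterns that were classified correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


rates = per_class_rates(confusion)
for name, rate in zip(labels, rates):
    print(f"{name}: {rate:.1f}")
# SH: 80.8
# HG: 87.1
# BK: 87.7
# VH: 96.8
```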

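5.4.2 Evaluation Experiment on Generic Object Recognition

■ Experimental Data

We conduct a classification experiment using the Caltech database ¹. Figure 5.14 shows examples of the data used. In this experiment we compare the following three methods.

bok1 [3] Bag of keypoints. SIFT features are extracted from the target image, and the object is described by a vector-quantization histogram of them. The method is robust to background changes and requires no structural information about the object.

bok2 [35] The target region is divided into a grid, and a vector-quantization histogram is created from each grid cell.

proposed method The region is segmented using a GMM, and a vector-quantization histogram is created from each resulting region.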
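A vector-quantization histogram, the building block shared by all three methods, can be sketched as follows. Each descriptor is assigned to its nearest codeword and the counts are normalized; the grid variant builds one histogram per cell, in the spirit of bok2. The function names, the Euclidean nearest-codeword rule, and the 2-D toy descriptors (real SIFT descriptors are 128-dimensional) are assumptions made for illustration.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[i])),
    )


def vq_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments (all zeros if empty)."""
    hist = [0.0] * len(codebook)
    for descriptor in descriptors:
        hist[nearest_codeword(descriptor, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def grid_histograms(keypoints, codebook, width, height, n=2):
    """One histogram per cell of an n x n grid; keypoints are (x, y, descriptor)."""
    cells = [[] for _ in range(n * n)]
    for x, y, descriptor in keypoints:
        col = min(int(x / width * n), n - 1)
        row = min(int(y / height * n), n - 1)
        cells[row * n + col].append(descriptor)
    return [vq_histogram(cell, codebook) for cell in cells]


# Toy data: a 2-word codebook and four keypoints in a 100 x 100 region.
codebook = [(0.0, 0.0), (1.0, 1.0)]
keypoints = [
    (10, 10, (0.1, 0.0)),
    (90, 10, (0.9, 1.0)),
    (10, 90, (0.2, 0.1)),
    (20, 80, (1.0, 0.8)),
]

whole = vq_histogram([d for _, _, d in keypoints], codebook)
per_cell = grid_histograms(keypoints, codebook, width=100, height=100)
print(whole)     # histogram of the whole region
print(per_cell)  # one histogram per grid cell
```

The whole-region histogram discards where each keypoint lies, while the per-cell histograms keep a coarse spatial layout; the proposed method replaces the fixed grid cells with GMM-derived regions.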

¹ http://www.vision.caltech.edu/Image_Datasets/Caltech256/

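■ Experimental Results

Figure 5.15 shows the results for each method. The proposed method improved accuracy by about 17.6% over bok1. Compared with bok2, however, its classification rate is about 5.6% lower. When the input image is rotated by 45 degrees, on the other hand, the classification rates of the proposed method and bok1 do not change, whereas that of bok2 drops because the rotated image is divided into a grid. Figure 5.16 shows the GMM-based region segmentation results for rotated images. As the figure shows, the GMM-based segmentation yields the same regions even when the image is rotated, so the proposed method can extract rotation-invariant features. With bok2, in contrast, rotating the image changes the structural information captured by the grid, and bok2 therefore fails to recognize the object correctly.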
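The rotation argument can be checked on a toy example. The sketch below uses k-means as a simple hard-assignment stand-in for the GMM (a substitution made here for brevity, not the method of this thesis) and a 2 × 2 grid over the bounding box as a stand-in for bok2. The keypoint coordinates are invented, and all function names are hypothetical.

```python
import math


def rotate(points, degrees):
    """Rotate 2-D points about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def grid_labels(points, n=2):
    """Assign each point to a cell of an n x n grid over the bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    w = (max(xs) - x0) or 1.0
    h = (max(ys) - y0) or 1.0
    labels = []
    for x, y in points:
        col = min(int((x - x0) / w * n), n - 1)
        row = min(int((y - y0) / h * n), n - 1)
        labels.append(row * n + col)
    return labels


def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_labels(points, k, iters=20):
    """Hard clustering by k-means, seeded with the first k points."""
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


# Invented keypoints of an upright object: an upper blob and a lower blob.
points = [(0, 4), (0, -4), (-1, 3), (1, 3), (0, 2), (-1, -3), (1, -3), (0, -2)]
rotated = rotate(points, 45)

same_clusters = kmeans_labels(points, 2) == kmeans_labels(rotated, 2)
same_grid = grid_labels(points) == grid_labels(rotated)
print(same_clusters, same_grid)
```

Because rotation preserves distances between points, the distance-based clustering groups the same keypoints before and after rotation, whereas the axis-aligned grid assigns them to different cells.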

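5.5 Summary

As an application of image segmentation, we proposed an object classification method based on region segmentation. The method segments the extracted object region using a Gaussian mixture model and extracts a vector-quantization histogram based on SIFT features from each region. These histograms serve as the features of the regions and are represented as a graph; a matching cost between graphs is then computed by graph matching. In the evaluation experiment on outdoor environment data, using both the global features obtained from the whole region and the local features obtained from each region improved classification performance by 3.2 percentage points over using global features alone. In particular, when objects differ only in local information, as with humans and bikes, that information is buried in the global features, but adding local information improves the classification rate. In generic object recognition, the method improved classification performance by about 17.6% over the conventional bag-of-keypoints method.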

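Figure 5.12: Differences between the features of a human and a bike

Figure 5.13: Examples classified correctly

Figure 5.14: Examples of experimental data

Figure 5.15: Experimental results

Figure 5.16: Region segmentation results for rotated images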

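Chapter 6

Conclusion

This thesis described an image segmentation method using graph cuts with iterated smoothing and, as an application of the segmentation results, an object classification method based on region segmentation. The chapters are summarized below.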

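Chapter 3 proposed a method that uses iterated smoothing to proceed step by step from global segmentation to local segmentation. We confirmed that the proposed method segments images stably even when they contain complex edges. In the evaluation experiment, segmentation accuracy improved by about 4.79%. We also confirmed experimentally that segmentation is stable with respect to the graph-cut parameter λ.

Chapter 4 proposed a method for video that creates superpixels by Mean Shift Segmentation and varies the bandwidth used to create them, proceeding step by step from global segmentation to local segmentation. In the evaluation experiment, iterating while varying the bandwidth improved segmentation accuracy by about 4.23% compared with using superpixels alone.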

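Chapter 5 proposed, as an application of image segmentation, an object classification method based on region segmentation. In the evaluation experiment on outdoor environment data, using both the global features obtained from the whole region and the local features obtained from each region improved classification performance by 3.2 percentage points over using global features alone. In particular, when objects differ only in local information, as with humans and bikes, that information is buried in the global features, but adding local information improves the classification rate. In generic object recognition, the method improved classification performance by about 17.6% over the conventional bag-of-keypoints method.

Future work includes improving the accuracy of the Mean Shift Segmentation used to create superpixels in video segmentation. We also plan to investigate a method that describes the structural information of an object from the intermediate results of image segmentation and performs object recognition from that description.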

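Acknowledgments

I would like to express my sincere gratitude to Associate Professor Hironobu Fujiyoshi of Chubu University, my supervisor, for his kind and constant guidance throughout this research.

I am also deeply grateful to Professor Yuji Iwahori and Professor Yutaka Hirata of the same university for their kind and constant guidance.

I would further like to express my heartfelt thanks to Professor Takeo Kanade of Carnegie Mellon University and Mr. Masato Kazui of Hitachi, Ltd. for their support in carrying out this research.

Finally, I thank all the members of the Fujiyoshi Laboratory for their cooperation, including help in developing the programs used in this research and discussions about the work.

References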

[1] Y. Li, J. Sun, C.-K. Tang and H.-Y. Shum: “Lazy snapping”, ACM Trans. Graph., 23, 3, pp. 303–308 (2004).

[2] C. Rother, V. Kolmogorov and A. Blake: “‘GrabCut’: interactive foreground extraction using iterated graph cuts”, ACM Trans. Graph., 23, 3, pp. 309–314 (2004).

[3] C. Dance, J. Willamowski, L. Fan, C. Bray and G. Csurka: “Visual categorization with bags of keypoints”, ECCV International Workshop on Statistical Learning in Computer Vision (2004).

[4] D. Comaniciu and P. Meer: “Mean shift: A robust approach toward feature space analysis”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 5, pp. 603–619 (2002).

[5] F.-F. Li and P. Perona: “A bayesian hierarchical model for learning natural scene categories”, CVPR ’05: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) - Volume 2, Washington, DC, USA, IEEE Computer Society, pp. 524–531 (2005).

[6] M. Kass, A. Witkin and D. Terzopoulos: “Snakes: Active contour models”, Int. J. Computer Vision, 1, 4, pp. 321–331 (1988).

[7] M. Sussman, P. Smereka and S. Osher: “A level set approach for computing solutions to incompressible two-phase flow”, J. Comput. Phys., 114, 1, pp. 146–159 (1994).

[8] Y. Boykov and V. Kolmogorov: “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 9, pp. 1124–1137 (2004).

[9] Y. Boykov and M.-P. Jolly: “Interactive graph cuts for optimal boundary & region segmentation of objects in n-d images”, ICCV2001, 01, p. 105 (2001).

[10] Y. Boykov and G. Funka-Lea: “Graph cuts and efficient n-d image segmentation”, Int. J. Comput. Vision, 70, 2, pp. 109–131 (2006).

[11] H. Ishikawa: “Graph cuts (tutorial)” (in Japanese), IPSJ SIG Technical Report, CVIM (Computer Vision and Image Media), 31, pp. 193–204 (2007).

[12] D. Greig, B. Porteous and A. Seheult: “Exact maximum a posteriori estimation for binary images”, J. Royal Statistical Soc., Series B, 51, 2, pp. 271–279 (1989).

[13] Y. Boykov, O. Veksler and R. Zabih: “Fast approximate energy minimization via graph cuts”, IEEE Trans. Pattern Analysis and Machine Intelligence, 23, 11, pp. 1222–1239 (2001).

[14] Y. Boykov, O. Veksler and R. Zabih: “Markov random fields with efficient approximations”, Technical Report TR97-1658 (1997).

[15] H. Ishikawa and D. Geiger: “Segmentation by grouping junctions”, CVPR ’98: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, IEEE Computer Society, p. 125 (1998).

[16] H. Ishikawa and D. Geiger: “Occlusions, discontinuities, and epipolar lines in stereo”, ECCV ’98: Proceedings of the 5th European Conference on Computer Vision-Volume I, London, UK, Springer-Verlag, pp. 232–248 (1998).

[17] J. Kim, V. Kolmogorov and R. Zabih: “Visual correspondence using energy minimization and mutual information”, ICCV ’03: Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA, IEEE Computer Society, p. 1033 (2003).

[18] V. Kolmogorov and R. Zabih: “Computing visual correspondence with occlusions via graph cuts”, ICCV, pp. 508–515 (2001).

[19] M. H. Lin and C. Tomasi: “Surfaces with occlusions from layered stereo”, CVPR, 01, p. 710 (2003).

[20] Y. Boykov and G. Funka-Lea: “Graph cuts and efficient n-d image segmentation”, Int. J. Comput. Vision, 70, 2, pp. 109–131 (2006).

[21] Y. Boykov and V. Kolmogorov: “Computing geodesics and minimal surfaces via graph cuts”, ICCV ’03: Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA, IEEE Computer Society, p. 26 (2003).

[22] V. Kolmogorov and R. Zabih: “Multi-camera scene reconstruction via graph cuts”, ECCV ’02: Proceedings of the 7th European Conference on Computer Vision-Part III, London, UK, Springer-Verlag, pp. 82–96 (2002).

[23] L. Ford and D. Fulkerson: “Flow in Networks” (1962).

[24] A. V. Goldberg and R. E. Tarjan: “A new approach to the maximum flow problem”, STOC ’86: Proceedings of the eighteenth annual ACM symposium on Theory of computing, New York, NY, USA, ACM, pp. 136–146 (1986).

[25] C. Stauffer and W. E. L. Grimson: “Adaptive background mixture models for real-time tracking”, Proceedings of the IEEE Computer Science Conference on Computer Vision and Pattern Recognition (CVPR-99), Los Alamitos, IEEE, pp. 246–252 (1999).

[26] A. P. Dempster, N. M. Laird and D. B. Rubin: “Maximum likelihood from incomplete data via the EM algorithm”, Journal of the Royal Statistical Society. Series B (Methodological), 39, 1, pp. 1–38 (1977).

[27] P. Quelhas, F. Monay, J.-M. Odobez, D. Gatica-Perez, T. Tuytelaars and L. V. Gool: “Modeling scenes with local descriptors and latent aspects”, ICCV ’05: Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, Washington, DC, USA, IEEE Computer Society, pp. 883–890 (2005).

[28] D. G. Lowe: “Distinctive image features from scale-invariant keypoints”, Int. J. Comput. Vision, 60, 2, pp. 91–110 (2004).

[29] K. Mikolajczyk and C. Schmid: “An affine invariant interest point detector”, ECCV (1), pp. 128–142 (2002).

[30] T. Hofmann: “Unsupervised learning by probabilistic latent semantic analysis”, Mach. Learn., 42, 1/2, pp. 177–196 (2001).

[31] J. Sivic and A. Zisserman: “Video Google: A text retrieval approach to object matching in videos”, Proceedings of the International Conference on Computer Vision, Vol. 2, pp. 1470–1477 (2003).

[32] M. Seki, K. Sumi, H. Taniguchi and M. Hashimoto: “Gaussian mixture model for object recognition”, MIRU2004, 1, pp. 344–349 (2004).

[33] H. Nami, S. Makito, O. Haruhisa and H. Manabu: “Vehicle detection using Gaussian mixture model from IR image”, Technical report of IEICE. PRMU, 105, 62, pp. 37–42 (2005).

[34] N. Ueda and R. Nakano: “Deterministic annealing EM algorithm”, Neural Netw., 11, 2, pp. 271–282 (1998).

[35] S. Lazebnik, C. Schmid and J. Ponce: “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories”, CVPR ’06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, IEEE Computer Society, pp. 2169–2178 (2006).

[36] J. J. Koenderink: “The structure of images”, Proc. of Biological Cybernetics, 50, pp. 363–370 (1984).

[37] T. Lindeberg: “Scale-space theory: A basic tool for analysing structures at different scales”, J. of Applied Statistics, 21(2), pp. 224–270 (1994).

[38] D. G. Lowe: “Object recognition from local scale-invariant features”, Proc. of the International Conference on Computer Vision ICCV, Corfu, pp. 1150–1157 (1999).

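List of Publications

Journal Papers

[1] T. Nagahashi, H. Fujiyoshi, and T. Kanade. “Image Segmentation Using Graph Cuts with Iterated Smoothing” (in Japanese), IPSJ Transactions on Computer Vision and Image Media, Vol. 22, 2008 (to appear).

International Conference Papers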

[1] S. Shimizu, T. Nagahashi, and H. Fujiyoshi. “Robust and Accurate Detection of Object Orientation and ID without Color Segmentation”, Proc. on ROBOCUP2005 SYMPOSIUM, 2005.

[2] Tomoyuki Nagahashi, Hironobu Fujiyoshi, and Takeo Kanade. “Object Type Classification Using Structure-based Feature Representation”, MVA2007: IAPR Conference on Machine Vision Applications, pp. 142-145, May, 2007.

[3] Tomoyuki Nagahashi, Hironobu Fujiyoshi, and Takeo Kanade. “Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing”, Asian Conference on Computer Vision 2007, Part II, LNCS 4844, pp. 806-816, 2007.

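Technical Reports

[1] T. Nagahashi, S. Shimizu, and H. Fujiyoshi. “ID Recognition Robust to Illumination Changes and Robot Orientation Detection” (in Japanese), 21st SIG-Challenge Workshop, pp. 32-37, May, 2005.

[2] T. Nagahashi, H. Fujiyoshi, and T. Kanade. “Object Classification by Graph Matching Using Structure-based Features” (in Japanese), IPSJ SIG Technical Report CVIM 154, pp. 69-74, 2006.

[3] T. Nagahashi, H. Fujiyoshi, and T. Kanade. “Object Classification Using SIFT Features Based on Region Segmentation” (in Japanese), IEEJ Technical Meeting on Systems and Control, SC-07-8, pp. 39-44, Jan. 2007.

[4] T. Nagahashi, H. Fujiyoshi, and T. Kanade. “Graph Cuts for Image Segmentation by Iterated Smoothing” (in Japanese), 10th Meeting on Image Recognition and Understanding (MIRU2007), pp. 241-248, Jul, 2007.

Oral Presentations

[1] T. Nagahashi, H. Fujiyoshi, and T. Kanade. “Object Classification Using Structural Information and Global Features” (in Japanese), Tokai-Section Joint Conference on Electrical and Related Engineering, O-455, Sep, 2006.

Awards

[1] MIRU2007 Student Award

[2] RoboCup Research Award, fiscal year 2005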

Appendix A
