6.3 Future Work

6.3.1 Using Acquired Knowledge for Tasks

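Building on the discussion above, this section considers how the knowledge acquired by the robot can be put to use in tasks. A robot carries out a task through interaction with the environment in which it is placed. To complete a task the robot must select appropriate actions, which means that executing a task in practice requires solving an action-decision problem. Reinforcement learning is the standard framework for letting a robot solve this problem autonomously [73]. In reinforcement learning, the robot takes an action in some state, the environment changes, and a corresponding reward is given. The reward serves as an evaluation of the action the robot took in that situation, so the action-decision problem for a task can be solved as reward maximization. Four elements are important here: determining the state space, defining the actions, recognizing the state, and defining the reward. This thesis addressed the first three. As shown in Figure 2.20, categorizing perceptual information generates the state space, and the relationship between these categories and motion information yields a set of actions that is acquired automatically as action concepts. Because these states and actions can also be inferred from perceptual information, the framework supports state recognition as well. The framework does not include a reward, however, so it cannot realize reinforcement learning on its own. Rewards generally differ from task to task and are usually specified by the designer.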
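The four elements named above (state space, action set, state recognition, reward) can be made concrete with a minimal tabular Q-learning sketch. Everything in it is an illustrative assumption rather than part of the proposed system: the five-state corridor, the reward of 1.0 at the goal, and the names `step`, `q_learning`, and `greedy_policy` are hypothetical. In the framework of this thesis the states and actions would come from the learned categories and action concepts, while the reward is exactly the element that is still missing.

```python
import random

N_STATES = 5          # hypothetical state categories 0..4; state 4 is the goal
ACTIONS = (-1, +1)    # hypothetical action set: move left / move right


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # designer-specified reward (assumption)
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by maximizing the expected discounted reward."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                a = rng.randrange(len(ACTIONS))
            else:
                # Exploit: pick a best-valued action, breaking ties randomly.
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # One-step Q-learning update toward reward + discounted best value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action for every non-goal state."""
    return [ACTIONS[row.index(max(row))] for row in q[:-1]]
```

Calling `greedy_policy(q_learning())` returns the learned action for each non-goal state. Replacing the hand-written `reward` line is where a reward derived from human interaction or from emotion, as discussed next, would enter.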

The author believes that the reward could be set through, for example, interaction with people or by giving the robot emotions [74]; this is left as future work.

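Humans, moreover, have emotions and sensibility (kansei), and sensibility influences knowledge and behavior. Here, the robot's sensibility is regarded as acting as a filter on perceptual information. Reference [75] discusses how "beautiful" can be understood as a robot sensibility. "Beautiful" is considered to be the feeling evoked by something central to a concept, and it is reasonable to suppose that it is felt toward things for which prediction accuracy is very high. This mechanism can be interpreted as metacognition of the case in which input perceptual information, when fitted to the concept structure, falls at the center of a concept. The author believes that giving a robot such a mechanism could realize concepts such as "clean" and "dirty" that appeared in the cleaning task, and that such a mechanism needs to be examined in future work.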

6.3.2 Issues with the Proposed Model

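The proposed model is the multi-layered multimodal LDA (mMLDA), a hierarchical extension of the probabilistic model multimodal LDA (MLDA), and the number of categories for each concept must be specified by hand in advance. When a robot actually forms hierarchical concepts, however, the number of categories cannot be known beforehand. In addition, the multimodal information obtained varies with the environment and with sensor performance, so providing an appropriate number of categories in advance is difficult. A model that can estimate the number of categories in each concept automatically is therefore needed. Possible solutions include variational Bayes and nonparametric Bayesian methods. For example, previous work proposed multimodal HDP (MHDP) [62], a multimodal extension of the nonparametric Bayesian model Hierarchical Dirichlet Processes (HDP) [60], and demonstrated its effectiveness for object categorization. Drawing on this finding, a hierarchical version of MHDP that can estimate the number of categories automatically should be examined in future work.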
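To illustrate why a nonparametric Bayesian prior removes the need to fix the number of categories, the sketch below samples category assignments from a Chinese restaurant process (CRP), the clustering prior that underlies Dirichlet process mixtures. It is not MHDP or the proposed mMLDA: there is no likelihood over multimodal features, and the function name `crp_assignments` and the concentration parameter `alpha` are assumptions made for this example.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample category assignments from a Chinese restaurant process.

    The number of categories is not given in advance; it grows with the
    data at a rate controlled by the concentration parameter alpha (> 0).
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items already in category k
    assignments = []
    for _ in range(n_items):
        # An existing category k is chosen with weight counts[k];
        # a new category is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new category
        else:
            counts[k] += 1        # join an existing category
        assignments.append(k)
    return assignments, counts
```

Calling `crp_assignments(200, alpha=1.0)` yields some number of categories that the caller did not specify. In a full model this prior would be combined with the likelihood of the observed multimodal data, so that the inferred number of categories reflects the data and not only the prior.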

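Language generation in this thesis is based on a method that integrates the acquired grammar with a language model. This method can generate sentences for learned scenes but has difficulty generating sentences for unlearned scenes. To address this, future work will consider learning the grammar and generating sentences using bigrams of concept classes together with syntax.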
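One way to read the concept-class bigram idea is sketched below: transitions are counted between concept classes instead of between individual words, so the learned ordering can be reused for a scene whose words never appeared in training. This is only a guess at the intended direction, not the author's method. The class labels (`person`, `object`, `motion`), the example words, and the greedy decoding are all hypothetical, and the syntax component is omitted.

```python
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


def learn_class_bigrams(class_sequences):
    """Count transitions between concept classes (not between words)."""
    counts = defaultdict(Counter)
    for seq in class_sequences:
        padded = [BOS, *seq, EOS]
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    return counts


def generate(counts, words_for_scene, max_len=10):
    """Greedily follow the most frequent class transition and fill each
    class slot with the word inferred for the current scene."""
    sentence, prev = [], BOS
    for _ in range(max_len):
        if not counts[prev]:
            break
        cur = counts[prev].most_common(1)[0][0]
        if cur == EOS:
            break
        sentence.append(words_for_scene[cur])
        prev = cur
    return sentence


# Hypothetical training data: class sequences observed in taught sentences.
training = [
    ["person", "object", "motion"],
    ["person", "object", "motion"],
    ["object", "motion"],
]
bigrams = learn_class_bigrams(training)

# Hypothetical unseen scene: one inferred word per concept class.
scene_words = {"person": "mother", "object": "juice", "motion": "drink"}
print(generate(bigrams, scene_words))
```

Because the bigram is defined over classes, the same ordering applies to any scene for which the concept model can infer one word per class, which is the generalization that word-level statistics lack.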

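This thesis also proposed a method that models the various contexts important for real communication and applies them to a robot. The proposed method integrates action context, place context, and spoken commands to decide the robot's actions, and its effectiveness was shown in simulation experiments. Learning and action-decision experiments in more realistic scenarios and in the field are planned.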

[1] iROBOT, "ROOMBA", http://www.irobot.com/.

[2] SONY, "AIBO", http://www.sony.jp/products/Consumer/aibo/.

[3] NDSOFT, "PARO", http://www.ndsoft.jp/paro.php/.

[4] HONDA, "ASIMO", http://www.honda.co.jp/ASIMO/.

[5] TOYOTA, "TPR", http://www.toyota.co.jp/.

[6] DARPA, "DARPA Robotics Challenge", http://www.theroboticschallenge.org/.

[7] "Robocup@home", http://www.ai.rug.nl/robocupathome/.

[8] J. Locke, "An Essay Concerning Human Understanding", London, 1689.

[9] D. L. Medin and L. J. Rips, "Concepts and Categories: Memory, Meaning, and Metaphysics", 2005.

[10] F. G. Ashby and W. T. Maddox, "Human Category Learning", Annual Review of Psychology, vol. 56, pp. 149–178, 2005.

[11] E. Rosch, "Principles of categorization", Concepts: core readings, pp. 189–206, 1999.

[12] S. Lewandowsky, M. Kalish, S. K. Ngang, "Simplified learning in complex situations: knowledge partitioning in function learning", Journal Exp. Psychol: Gen, vol. 131, pp. 163–193, 2002.

[13] A. B. Markman, B. H. Ross, "Category use and category learning", Psychol. Bull, vol. 129, no. 4, pp. 592–613, 2003.

[14] E. M. Markman, "The whole-object, taxonomic, and mutual exclusivity assumptions as initial constraints on word meanings", Perspectives on Language and Thought: Interrelations in Development, pp. 72–106, 1991.

[15] S. Harnad, "The symbol grounding problem", Physica D, vol. 42, pp. 335–346, 1990.

[16] J. F. Sowa, "Semantic Networks", Encyclopedia of Artificial Intelligence, Wiley, 1987.

[17] S. J. Russell, P. Norvig, "Artificial intelligence: a modern approach (3rd ed.)", Prentice Hall, 2010.

[18] 中村友昭, 長井隆行, 岩橋直人, "ロボットによる物体のマルチモーダルカテゴリゼーション" [Multimodal categorization of objects by a robot], 電子情報通信学会和文論文誌, vol. J92-D, no. 10, pp. 2507–2518, 2008 (in Japanese).

[19] R. Fergus, P. Perona, and A. Zisserman, "Object Class Recognition by Unsupervised Scale-Invariant Learning", in Proc. of CVPR 2003, vol. 2, pp. 264–271, 2003.

[20] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman, "Discovering Object Categories in Image Collections", in Proc. of ICCV 2005, pp. 370–377, 2005.

[21] L. Fei-Fei, "A bayesian hierarchical model for learning natural scene categories", IEEE Conf. on Computer Vision and Pattern Recognition, pp. 524–531, 2005.

[22] C. Wang, D. Blei, and L. Fei-Fei, "Simultaneous image classification and annotation", IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1903–1910, 2009.

[23] E. Torres-Jara, L. Natale, and P. Fitzpatrick, "Tapping into Touch", Lund University Cognitive Studies, pp. 22–24, 2005.

[24] J. Sinapov and A. Stoytchev, "Object Category Recognition by a Humanoid Robot Using Behavior-grounded Relational Learning", in Proc. of ICRA 2011, pp. 184–190, 2011.

[25] W. Takano, H. Imagawa, and Y. Nakamura, "Prediction of Human Behaviors in the Future through Symbolic Inference", in Proc. of ICRA 2011, pp. 1970–1975, 2011.

[26] W. Takano and Y. Nakamura, "Bigram-Based Natural Language Model and Statistical Motion Symbol Model for Scalable Language of Humanoid Robots", in Proc. of ICRA 2012, pp. 1232–1237, 2012.

[27] T. Taniguchi and S. Nagasaka, "Double Articulation Analyzer for Unsegmented Human Motion using Pitman-Yor Language Model and Infinite Hidden Markov Model", in Proc. of SII 2011, pp. 250–255, 2011.

[28] T. Ogata, S. Nishide, H. Kozima, K. Komatani, and H. Okuno, "Inter-modality Mapping in Robot with Recurrent Neural Network", Pattern Recognition Letters, vol. 31, no. 12, pp. 1560–1569, 2010.

[29] L. Montesano, M. Lopes, A. Bernardino, and J. S. Victor, "Learning Object Affordances: From Sensory-Motor Coordination to Imitation", IEEE Trans. on Robotics, vol. 24, no. 1, 2008.

[30] B. Moldovan, P. Moreno, M. Otterlo, J. S. Victor, and L. D. Raedt, "Learning Relational Affordance Models for Robots in Multi-Object Manipulation Tasks", in Proc. of ICRA 2012, pp. 4373–4378, 2012.

[31] A. Gupta, A. Kembhavi, and L. S. Davis, "Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition", IEEE Trans. on PAMI, vol. 31, no. 10, pp. 1775–1789, 2009.

[32] B. Yao and L. Fei-Fei, "Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses", IEEE Trans. on PAMI, vol. 34, pp. 1691–1703, 2012.

[33] C. L. Teo, Y. Yang, H. Daumé III, C. Fermüller, Y. Aloimonos, "A Corpus-Guided Framework for Robotic Visual Perception", in Proc. of AAAI 2011, 2011.

[34] H. Yu, J. M. Siskind, "Grounded Language Learning from Video Described with Sentences", in Proc. of ACL, pp. 53–63, 2013.

[35] M. Regneri, M. Rohrbach, D. Wetzel, S. Thater, B. Schiele, and M. Pinkal, "Grounding Action Descriptions in Videos", in Proc. of ACL, pp. 25–36, 2013.

[36] R. Brooks, "A robust layered control system for a mobile robot", IEEE Journal of Robotics and Automation, vol. RA-2, no. 1, 1986.

[37] M. Attamimi, A. Mizutani, T. Nakamura, T. Nagai, K. Funakoshi, M. Nakano, "Real-Time 3D Visual Sensor for Robust Object Recognition", Int. Conf. on IROS, pp. 4560–4565, 2010.

[38] S. M. Lavalle, "Rapidly-exploring random trees: A new tool for path planning", 1998.

[39] H. Murase, V. V. Vinod, "Fast Visual Search Using Focussed Color Matching: Active Search", IEICE J81-D-2, no. 9, pp. 2035–2042, 1998 (in Japanese).

[40] K. Okada, S. Kagami, M. Inaba, H. Inoue, "Plane Segment Finder: Algorithm, Implementation and Applications", Int. Conf. on ICRA, pp. 2120–2125, 2001.

[41] R. Osada, T. Funkhouser, B. Chazelle, D. Dobkin, "Shape Distributions", ACM Transactions on Graphics, vol. 21, no. 4, pp. 807–832, 2002.

[42] G. Csurka, C. Dance, L. Fan, J. Williamowski, C. Bray, "Visual Categorization with Bags of Keypoints", Int. Workshop on Statistical Learning in Computer Vision, pp. 1–22, 2004.

[43] E. Nowak, F. Jurie, B. Triggs, "Sampling Strategies for Bag-of-Features Image Classification", Int. Conf. on ECCV, vol. 3954, pp. 490–503, 2006.

[44] A. Vedaldi, B. Fulkerson, "VLFeat - An Open and Portable Library of Computer Vision Algorithms", ACM Multimedia, pp. 1469–1472, 2010.

[45] T. Oggier, "Miniature 3D TOF Camera for Real-Time Imaging", Perception and Interactive Technology, pp. 212–216, 2006.

[46] M. Bohme, "Shading Constraint Improves Accuracy of Time-of-Flight Measurements", CVIU, 2010.

[47] M. Sturmer, "Standardization of Intensity Values Acquired by Time-of-Flight-Cameras", CVPRW, 2008.

[48] C. C. Chang and C. J. Lin, "LIBSVM: A Library for Support Vector Machines", ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 1–27, 2011.

[49] C. E. Rasmussen, "The Infinite Gaussian Mixture Model", in Advances in Neural Information Processing Systems, vol. 12, pp. 554–560, 2000.

[50] D. Görür and C. E. Rasmussen, "Dirichlet Process Gaussian Mixture Models: Choice of the Base Distribution", Journal of Computer Science and Technology, vol. 25, no. 4, pp. 653–664, 2010.

[51] M. Everingham, L. Gool, C. K. Williams, J. Winn, and A. Zisserman, "The Pascal Visual Object Classes (VOC) Challenge", Int. Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010.

[52] P. J. Besl and N. D. McKay, "A Method for Registration of 3-D Shapes", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239–256, 1992.

[53] M. Attamimi, T. Nakamura, and T. Nagai, "Hierarchical Multilevel Object Recognition Using Markov Model", in Proc. ICPR 2012, pp. 2963–2966, 2012.

[54] M. Attamimi, K. Ito, T. Nakamura, and T. Nagai, "A Planning Method for Efficient Mobile Manipulation Considering Ambiguity", in Proc. IROS 2012, pp. 965–972, 2012.

[55] 長井隆行, 中村友昭, "マルチモーダルカテゴリゼーション：経験を通して概念を形成し言葉の意味を理解するロボットの実現に向けて" [Multimodal categorization: toward robots that form concepts through experience and understand the meanings of words], 人工知能学会誌, vol. 27, no. 6, pp. 555–562, 2012 (in Japanese).

[56] T. Hofmann, "Unsupervised learning by probabilistic latent semantic analysis", Machine Learning, vol. 42, pp. 177–196, 2001.

[57] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation", Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[58] T. Nakamura, T. Araki, T. Nagai, and N. Iwahashi, "Grounding of Word Meanings in LDA-Based Multimodal Concepts", Advanced Robotics, vol. 25, no. 17, pp. 2189–2206, 2011.

[59] 中村友昭, 西田匡志, 長井隆行, "把持動作による物体カテゴリの形成と認識" [Formation and recognition of object categories through grasping motions], 情報処理学会全国大会, 5V-3, 2010 (in Japanese).

[60] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, "Hierarchical Dirichlet processes", Journal of the American Statistical Association, vol. 101, no. 476, pp. 1566–1581, 2006.

[61] O. Mangin and P. Y. Oudeyer, "Learning to Recognize Parallel Combinations of Human Motion Primitives with Linguistic Descriptions using Non-negative Matrix Factorization", in Proc. of IROS 2012, pp. 3268–3275, 2012.

[62] 中村友昭, 荒木孝弥, 長井隆行, 岩橋直人, "階層ディリクレ過程に基づくロボットによる物体のマルチモーダルカテゴリゼーション" [Multimodal categorization of objects by a robot based on the hierarchical Dirichlet process], 計測自動制御学会論文集, vol. 49, no. 4, pp. 469–478, 2013 (in Japanese).

[63] K. Kinoshita, Y. Konishi, S. Lao, and M. Kawade, "Facial Feature Extraction and Head Pose Estimation Using Fast 3D Model Fitting", in Proc. of MIRU 2008, pp. 1325–1329, 2008 (in Japanese).

[64] Y. Konishi, K. Kinoshita, S. Lao, and M. Kawade, "Real-Time Estimation of Smile Intensities", in Proc. of Interaction 2008, no. 2008, vol. 4, pp. 47–48, 2008 (in Japanese).

[65] K. Papineni, S. Roukos, T. Ward, W. J. Zhu, "BLEU: a Method for Automatic Evaluation of Machine Translation", in Proc. of ACL, 2002.

[66] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen, "The infinite hidden markov model", Advances in neural information processing systems, pp. 577–584, 2001.

[67] R. Kelley, M. Nicolescu, A. Tavakkoli, C. King, and G. Bebis, "Understanding human intentions via Hidden Markov Models in autonomous mobile robots", ACM/IEEE Int. Conf. in HRI, pp. 367–374, 2008.

[68] D. Gehrig, P. Krauthausen, L. Rybok, H. Kuehne, U. D. Hanebeck, T. Schultz, and R. Stiefelhagen, "Combined intention, activity, and motion recognition for a humanoid household robot", IEEE Int. Conf. on IROS, pp. 4819–4825, 2011.

[69] H. Koppula, R. Gupta, and A. Saxena, "Learning Human Activities and Object Affordances from RGB-D Videos", Int. Journal of Robotics Research, vol. 32, no. 8, pp. 951–970, 2013.

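[70] 中村友昭, 船越孝太郎, 長井隆行, "HDP-HMMを用いたロボットによる物体軌道の学習と予測" [Learning and prediction of object trajectories by a robot using HDP-HMM], 日本ロボット学会学術講演会, 2C1-05, 2013 (in Japanese).

[71] K. Sugiura, N. Iwahashi, H. Kawai, and S. Nakamura, "Situated spoken dialogue with robots using active learning", Advanced Robotics, vol. 25, no. 17, pp. 2207–2232, 2011.

[72] 太田裕治, 元岡展久, 椎尾一郎, 塚田浩二, 神原啓介, "ユビキタスコンピューティング実験住宅における無侵襲歩行モニタリングの試み" [A trial of non-invasive gait monitoring in a ubiquitous-computing experimental house], 電気学会論文誌C編（電子・情報・システム部門誌）, vol. 130, no. 3, pp. 383–387, 2010 (in Japanese).

[73] 浅田稔, "ロボットの行動獲得のための能動学習" [Active learning for robot behavior acquisition], 情報処理, vol. 38, no. 7, pp. 583–588, 1997 (in Japanese).

[74] 山口拓郎, アッタミミムハンマド, 中村友昭, 長井隆行, 池原雅章, "マルチモーダルLDAを用いたロボットによる感情語彙の獲得" [Acquisition of emotion vocabulary by a robot using multimodal LDA], 第14回計測自動制御学会システムインテグレーション部門講演会, 1M4-5, 2014 (in Japanese).

[75] 長井隆行, "自身の経験が生み出すロボットの知性と感性" [Robot intelligence and sensibility arising from its own experience], 第15回感性工学会大会, 企画：感性ロボティクスの未来(感性ロボティクス部会), 招待講演, D71, 2013 (in Japanese).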

Related Publications

Journal Papers

[1] Muhammad Attamimi, Takaya Araki, Tomoaki Nakamura, and Takayuki Nagai, "Visual Recognition System for Cleaning Tasks by Humanoid Robots", International Journal of Advanced Robotic Systems: Humanoid, pp. 1–14, 2013. (Related to Chapter 2)

[2] アッタミミムハンマド, ファドリルムハンマド, 阿部香澄, 中村友昭, 船越孝太郎, 長井隆行, "多層マルチモーダルLDAを用いた人の動きと物体の統合概念の形成" [Formation of integrated concepts of human motion and objects using multi-layered multimodal LDA], 日本ロボット学会誌, vol. 32, no. 8, pp. 753–764, 2014 (in Japanese). (Related to Chapter 3)

International Conference Proceedings

[1] Muhammad Attamimi, Tomoaki Nakamura, and Takayuki Nagai, "Hierarchical Multilevel Object Recognition Using Markov Model", ICPR 2012, pp. 2963–2966, 2012. (Related to Chapter 2)

[2] Muhammad Attamimi, Muhammad Fadlil, Kasumi Abe, Tomoaki Nakamura, Kotaro Funakoshi, and Takayuki Nagai, "Integration of Various Concepts and Grounding of Word Meanings Using Multi-layered Multimodal LDA for Sentence Generation", IROS 2014, pp. 2194–2201, 2014. (Related to Chapters 4 and 5)

