


Academic year: 2021


Construction of Domain Specific DistilBERT Model by Using Fine-Tuning

Hiroyuki Shinnou, Bai Jing, Cao Rui, Ma Wen

Major in Computer and Information Sciences, Graduate School of Science and Engineering, Ibaraki University

In this paper, we point out that BERT is domain dependent, and we propose constructing a domain-specific pre-trained model by fine-tuning. Specifically, the parameters of a DistilBERT model are initialized from a trained BERT model and then tuned on a corpus from the target domain. This lets us construct a domain-specific DistilBERT model efficiently. In the experiments, we create a test set for each domain that consists of masked-word prediction problems, in which the task is to estimate a masked word in a sentence. Using these test sets, we compare the domain-specific DistilBERT models with a general BERT model and show that the proposed models perform better.

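1. Introduction

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained model. It has been applied to a wide range of natural language processing tasks and has contributed substantially to the accuracy of those systems [Devlin 19]. However, BERT is trained on a large corpus in an unsupervised framework, so it should naturally inherit the biases of the corpus used to train it. In this paper, we demonstrate this domain dependence of BERT with a simple experiment and propose constructing pre-trained models that are specialized to a domain.

One way to build a domain-specific pre-trained model is to prepare a corpus for each domain and train a BERT model on it. However, a large corpus for the domain is not always available, and training BERT requires substantial computational resources. This naive approach therefore makes domain-specific pre-trained models difficult to build.

To address this problem, we use DistilBERT [Sanh 19] as the pre-trained model. DistilBERT is a lightweight version of BERT built with a distillation technique. BERT normally has 12 layers of multi-head attention, whereas DistilBERT has 6. The initial values of the parameters of these 6 layers are copied from layers of an existing BERT model. In other words, DistilBERT can be trained as a fine-tuning of BERT. Because this is fine-tuning, only a small additional training corpus is needed, and the pre-trained model can be built efficiently.

In the experiments, we use an Amazon review dataset with three domains: books, dvd, and music. For each domain, we built a domain-specific DistilBERT model from an existing BERT model. We then created a [MASK] word prediction task for each domain. Comparing the constructed models with the existing BERT model on these tasks shows the usefulness of the domain-specific models.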

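Contact: Hiroyuki Shinnou, [email protected]

2. Domain Dependence of BERT

BERT stacks 12 (or 24) layers of the multi-head attention used in the Transformer [Vaswani 17]. Its parameters are learned in an unsupervised framework by solving two tasks, Masked Language Model and Next Sentence Prediction. The training data therefore only needs to be raw text. Because Wikipedia is both large and openly available, many BERT models use it as their training corpus. A BERT model built this way should clearly be biased toward Wikipedia.

To illustrate this, we ran a simple experiment. Two publicly available Japanese BERT models exist: the BERT released by Tohoku University (hereafter tohoku-BERT)*1 and the BERT released by Stockmark Inc. (hereafter stockmark-BERT)*2. Both models use MeCab-NEologd as the tokenizer and share the same model architecture, but their training corpora differ. tohoku-BERT was trained on Japanese Wikipedia, and stockmark-BERT was trained on Japanese business news articles (3 million articles). Using these two BERT models, we estimated the word at [MASK] in the following sentence.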





日本の[MASK]は非常にすばらしい ("Japan's [MASK] is truly wonderful.")





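The top five words estimated by tohoku-BERT and by stockmark-BERT are listed below. The two lists are completely different; the only word they share is [UNK]. This confirms that each model is biased by its training corpus.

tohoku-BERT top 5: ['場合' (case), '[UNK]', '株価' (stock price), 'チャート' (chart), '気候' (climate)]
stockmark-BERT top 5: ['[UNK]', '技術' (technology), 'サービス' (service), '製造業' (manufacturing), '事例' (case study)]

If a pre-trained model is biased by the corpus it was trained on, a system should perform better with a pre-trained model built for the domain that the system actually targets. For example, [Alsentzer 19] reports results on clinical text. A BERT model built from a clinical-domain corpus improved accuracy on three of five clinical tasks compared with an existing BERT built from Wikipedia. Likewise, [Shibayama 19] compares several BERT models on document classification tasks and attributes the resulting differences in accuracy to the different training corpora used to build each BERT.

*1 https://github.com/cl-tohoku/bert-japanese
*2 https://drive.google.com/drive/folders/1iDlmhGgJ54rkVBtZvgMlgbuNwtFQ50V-1

The 34th Annual Conference of the Japanese Society for Artificial Intelligence, 2020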

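3. Using DistilBERT

Domain-specific pre-trained models are useful, but building them takes some ingenuity. The straightforward approach is to collect a large corpus from the target domain and train BERT on it. When the target domain is narrow, however, collecting a large corpus is difficult. Training BERT also requires substantial computational resources, so repeating such heavy computation every time the domain changes is impractical.

We therefore use DistilBERT, which compresses an existing BERT model with the distillation method [Hinton 15]. The resulting model is used in the same way as a BERT model. Because it has only about 60% of the parameters, training time is greatly reduced.

Distillation trains a student model using a loss called the soft target loss (see Figure 1). This loss is the difference between the label distribution output by the teacher model and the label distribution output by the student model. DistilBERT is likewise trained by reducing the soft target loss together with the MLM loss.

Figure 1: Basic method of distillation (soft target loss between the teacher and student outputs)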
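The objective described above can be illustrated with a minimal Python sketch. It computes a soft target loss and an MLM loss for a single masked position and mixes them.

The sketch rests on these assumptions:
* The soft target loss is taken to be the KL divergence between temperature-softened teacher and student distributions, scaled by T² (following [Hinton 15]).
* The temperature and the mixing weight `alpha` are hypothetical values, not settings from this paper.
* The cosine-embedding term that the original DistilBERT objective also uses [Sanh 19] is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def mlm_loss(student_logits, gold_index):
    """Cross-entropy of the student's (T = 1) prediction for the masked token."""
    return -math.log(softmax(student_logits)[gold_index])


def distillation_loss(teacher_logits, student_logits, gold_index,
                      alpha=0.5, temperature=2.0):
    """Hypothetical weighted mix of the soft target loss and the MLM loss."""
    return (alpha * soft_target_loss(teacher_logits, student_logits, temperature)
            + (1.0 - alpha) * mlm_loss(student_logits, gold_index))


# Toy logits over a 3-word vocabulary; index 0 is the gold word.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(distillation_loss(teacher, student, gold_index=0))
```

In an actual run, the logits would come from the teacher (tohoku-BERT in the experiments below) and from the student DistilBERT at every masked position in a batch.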

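Another major feature of DistilBERT is that it can be trained as a fine-tuning of BERT. Apart from the number of layers, DistilBERT has essentially the same structure as BERT: BERT stacks 12 layers of multi-head attention, whereas DistilBERT has 6. We use the parameters of layers 0, 2, 4, 7, 9, and 11 of an existing BERT model as the initial values of the corresponding DistilBERT layers, which makes fine-tuning possible (see Figure 2). If DistilBERT is then trained on a corpus from the target domain, the result is a domain-specific pre-trained model.

Figure 2: Initializing DistilBERT from BERT (parameters are copied from the BERT layers to the DistilBERT layers)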
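The layer-copy initialization can be sketched in plain Python as below. The teacher-layer list `[0, 2, 4, 7, 9, 11]` is the one given in the text; the rest of the setup is illustrative:
* Parameters are modeled as a dictionary whose keys follow a hypothetical `layer.<i>.<name>` pattern.
* Strings stand in for weight tensors.
* Real BERT and DistilBERT checkpoints use different parameter names, so an actual conversion would also need a name-mapping table.

```python
# Teacher (BERT) layers whose parameters initialize the 6 student layers.
TEACHER_LAYERS = [0, 2, 4, 7, 9, 11]


def init_student_from_teacher(teacher_state, teacher_layers=TEACHER_LAYERS):
    """Copy selected teacher layers into a smaller student, renumbering them.

    Keys are assumed to look like "layer.<i>.<param>"; everything else
    (e.g. embeddings) is copied unchanged. Teacher layers not listed are dropped.
    """
    student_index = {t: s for s, t in enumerate(teacher_layers)}
    student_state = {}
    for key, value in teacher_state.items():
        parts = key.split(".")
        if parts[0] == "layer":
            t = int(parts[1])
            if t not in student_index:
                continue  # no counterpart in the 6-layer student
            parts[1] = str(student_index[t])
            student_state[".".join(parts)] = value
        else:
            student_state[key] = value
    return student_state


# Toy 12-layer "teacher": strings stand in for weight tensors.
teacher = {"embeddings.word": "E"}
teacher.update({f"layer.{i}.attention": f"A{i}" for i in range(12)})
student = init_student_from_teacher(teacher)
print(sorted(student.items()))
```

With real tensors, the resulting dictionary would be loaded through the framework's state-dict loading. In PyTorch, for instance, `load_state_dict` copies the values into the student's own parameters, so the teacher's weights stay untouched.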

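4. Experiments

4.1 Training Data

Training DistilBERT requires a BERT model that serves as the teacher; here we use tohoku-BERT. We also used tohoku-BERT's parameters as the initial values of the model for fine-tuning.

As the training corpus, we use the Amazon review documents published at the following site:

https://webis.de/data/webis-cls-10.html

This dataset has three domains: books, dvd, and music. Table 1 shows the number of documents in each domain.

Table 1: Number of documents in each domain

             books      dvd    music
train        2,000    2,000    2,000
test         2,000    2,000    2,000
unlabeled  169,779   68,326   55,892

We use the unlabeled documents as the training corpus. By extracting sentences from each document, we created three corpora: 821,892 sentences for books, 315,327 for dvd, and 221,367 for music*3.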

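4.2 Evaluation Data

We evaluate the constructed models on a fill-in-the-blank word prediction task. Specifically, we replace a word w in a sentence s with [MASK] and feed s to the model, which gives the probability that [MASK] is w. The models are evaluated by the value of this probability.

For each domain, sentences and words are selected from the test data as follows:
1. We create a frequency table of the nouns in that domain's test data.
2. We select the 20 most frequent words.
3. For each word, we randomly extract 5 sentences containing it from that domain's test data.

This gives 100 test sentences per domain. We call the evaluation data for the books, dvd, and music domains books-test, dvd-test, and music-test, respectively. The 20 words selected for each domain are as follows.

books: 本 (book), 人 (person), 著者 (author), 内容 (content), 自分 (oneself), 作品 (work), 本書 (this book), 感じ (feeling), 文章 (writing), 主人公 (protagonist), 小説 (novel), 部分 (part), 最後 (ending), 言葉 (words), 読者 (reader), 作者 (writer), 人間 (human), 物語 (story), 他 (other), 世界 (world)
dvd: 映画 (movie), 作品 (work), 人 (person), シーン (scene), 映像 (footage), 原作 (original work), 自分 (oneself), ストーリー (story), 内容 (content), ファン (fan), 感じ (feeling), 主人公 (protagonist), 最後 (ending), アニメ (anime), ドラマ (drama), 物語 (story), 人間 (human), 世界 (world), 子供 (child), 部分 (part)
music: 曲 (song), アルバム (album), 作品 (work), 人 (person), 音楽 (music), 感じ (feeling), ファン (fan), 音 (sound), バンド (band), 自分 (oneself), 歌詞 (lyrics), 声 (voice), ギター (guitar), 歌 (singing), CD, 楽曲 (track), サウンド (sound), ライブ (live performance), シングル (single), 前作 (previous work)

*3 We extracted only text that clearly forms a sentence, so these counts are smaller than the actual number of sentences in the documents.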
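The test-set construction described above can be sketched as follows. This minimal sketch assumes that the nouns of each sentence have already been extracted, for example with a morphological analyzer (not shown). The function name, the fixed random seed, and the string-level masking are illustrative choices, not details from the paper.

```python
import random
from collections import Counter


def build_test_set(sentences, nouns_per_sentence, n_words=20, n_sents=5, seed=0):
    """Return (masked_sentence, target_word) pairs for one domain.

    nouns_per_sentence[i] lists the nouns found in sentences[i].
    """
    # Frequency table of nouns over the domain's test data.
    freq = Counter(noun for nouns in nouns_per_sentence for noun in nouns)
    top_words = [word for word, _ in freq.most_common(n_words)]

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    pairs = []
    for word in top_words:
        candidates = [s for s, nouns in zip(sentences, nouns_per_sentence) if word in nouns]
        for s in rng.sample(candidates, min(n_sents, len(candidates))):
            # Replace the first occurrence of the word with the mask token.
            pairs.append((s.replace(word, "[MASK]", 1), word))
    return pairs


sentences = ["本を読んだ", "本が好きだ", "人が多い", "人と本"]
nouns = [["本"], ["本"], ["人"], ["人", "本"]]
for masked, word in build_test_set(sentences, nouns, n_words=1, n_sents=2):
    print(masked, word)
```

With real data, the word should be masked at the token level. A plain string replacement can also hit a substring of a longer word; for example, 本 occurs inside 本書, and both are on the books list.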





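For each evaluation sentence, a model gives the probability that [MASK] is the correct word. The sum of these probabilities over all evaluation sentences is the evaluation value. Since each probability is at most 1, the maximum possible value for a domain's 100 test sentences is 100.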
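The evaluation value can be written as a short function. Two assumptions are made here:
* `predict` is a hypothetical callable that maps a masked sentence to a dictionary of candidate words and their probabilities at the [MASK] position. With a real masked language model, it would be the softmax over the vocabulary at that position.
* A word missing from the dictionary is treated as having probability 0.

```python
from typing import Callable, Dict, List, Tuple


def evaluation_value(test_set: List[Tuple[str, str]],
                     predict: Callable[[str], Dict[str, float]]) -> float:
    """Sum over test sentences of P([MASK] = correct word | sentence)."""
    return sum(predict(masked).get(word, 0.0) for masked, word in test_set)


def toy_predict(masked: str) -> Dict[str, float]:
    """Hypothetical stand-in for a masked language model."""
    return {"本": 0.6, "人": 0.3, "作品": 0.1}


test_set = [("[MASK]を読んだ", "本"), ("[MASK]が多い", "人"), ("[MASK]の世界", "世界")]
print(evaluation_value(test_set, toy_predict))
```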

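4.3 Results

Using each domain's training corpus, we built a DistilBERT model specialized to that domain. We call the models for the books, dvd, and music domains books-model, dvd-model, and music-model, respectively. Training ran for up to 10 epochs. We saved the model after each epoch, computed its evaluation value on the evaluation data, and selected the model with the highest evaluation value. Table 2 shows the results. On each test set, the model specialized to that domain achieved the highest evaluation value. However, no model exceeds the evaluation value of the teacher, tohoku-BERT, on the domains other than its own.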
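The checkpoint selection described above saves a model after every epoch and keeps the one with the highest evaluation value, which amounts to an argmax over epochs. The per-epoch scores below are made-up placeholders, since the paper reports them only graphically (Figures 3 to 5).

```python
def select_best_epoch(scores_per_epoch):
    """Return (epoch, score) for the highest evaluation value.

    scores_per_epoch[k] is the evaluation value of the checkpoint saved
    after epoch k + 1; ties are resolved in favor of the earliest epoch.
    """
    best_k = max(range(len(scores_per_epoch)), key=lambda k: scores_per_epoch[k])
    return best_k + 1, scores_per_epoch[best_k]


# Hypothetical evaluation values for 10 epochs (not the paper's numbers).
scores = [6.1, 6.8, 7.3, 7.0, 7.4, 7.4, 7.2, 7.1, 7.3, 7.2]
print(select_best_epoch(scores))
```

Python's `max` returns the first maximal element, which is why ties go to the earlier checkpoint.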

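5. Discussion

5.1 Effect of Fine-Tuning

To confirm the effect of fine-tuning, we trained DistilBERT without it, that is, with the model's initial parameters set randomly. Table 3 shows the results. With fine-tuning, the DistilBERT model for each target domain exceeded the evaluation value of the teacher, tohoku-BERT, on that domain. Without fine-tuning, it did not exceed tohoku-BERT's evaluation value, which confirms the effect of fine-tuning.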

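5.2 Using a Corpus Covering All Domains

Using a larger corpus that includes the domain, rather than a domain-specific corpus, might improve the evaluation value further. We ran an experiment to check this. Specifically, we combined the books, dvd, and music corpora used above into a single corpus of 1,358,586 sentences and repeated the previous experiment with it. We call the resulting model d3-model. Table 4 shows the results.

d3-model achieves the highest evaluation value in every domain; on music-test it ties with music-model at 7.02. This indicates that, to build a pre-trained model specialized to a domain, it is better to use a large corpus that includes the domain than a corpus restricted to that domain. However, training cost grows with corpus size. Table 5 shows the training time per epoch for each model in these experiments*4, and the time is roughly proportional to the corpus size. Training only on the domain-specific corpus is therefore more efficient.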

*4 The machine has a Core i7-8700 CPU, 64 GB of memory, and a GeForce RTX 2070 GPU.

Table 5: Training time per epoch

              Sentences  Time (min)
books-model     821,892         132
dvd-model       315,327          49
music-model     221,367          34
d3-model      1,358,586         214
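The claim that training time is roughly proportional to corpus size can be checked directly from Table 5. The short script below is an illustrative sketch, not part of the original experiments. It does two things:
* It converts each row to minutes per 100,000 sentences.
* It confirms that the three domain corpora add up to the d3-model corpus.

```python
# (number of sentences, minutes per epoch), copied from Table 5.
TABLE5 = {
    "books-model": (821_892, 132),
    "dvd-model": (315_327, 49),
    "music-model": (221_367, 34),
    "d3-model": (1_358_586, 214),
}


def minutes_per_100k(table):
    """Training minutes per epoch, normalized to 100,000 sentences."""
    return {name: minutes / n * 100_000 for name, (n, minutes) in table.items()}


for name, rate in minutes_per_100k(TABLE5).items():
    print(f"{name:12s} {rate:5.1f} min / 100k sentences")

# The combined corpus is exactly the union of the three domain corpora.
domain_total = sum(TABLE5[m][0] for m in ("books-model", "dvd-model", "music-model"))
print(domain_total == TABLE5["d3-model"][0])
```

All four rates lie between about 15.4 and 16.1 minutes per 100,000 sentences, which is consistent with near-linear scaling.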

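5.3 Models That Surpass the Teacher

When a model is compressed by distillation, the resulting model rarely surpasses the teacher. Even in the original DistilBERT paper [Sanh 19], the compressed model improves construction and inference time but does not surpass the teacher in performance. In our experiments, the models surpass the teacher when specialized to the target domain, but not when the target domain is taken broadly. Table 6 shows the average evaluation value over the three domains used in the experiments. The teacher model has the highest average.

Table 6: Experimental results (averages)

              3-domain average
tohoku-BERT               8.67
books-model               7.57
dvd-model                 7.48
music-model               6.85

Distillation has the drawback that the student rarely surpasses the teacher. Still, restricting the target domain, as in this paper, makes it possible to surpass the teacher on that domain, so we consider this approach practical.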
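Table 6 is the plain mean of each model's three evaluation values in Table 2. The following sketch recomputes it from the Table 2 values, which shows where each average comes from.

```python
# Evaluation values from Table 2, keyed by model, then by test set.
TABLE2 = {
    "tohoku-BERT": {"books-test": 10.26, "dvd-test": 9.39, "music-test": 6.37},
    "books-model": {"books-test": 10.75, "dvd-test": 8.95, "music-test": 3.00},
    "dvd-model": {"books-test": 6.61, "dvd-test": 11.48, "music-test": 4.34},
    "music-model": {"books-test": 6.27, "dvd-test": 7.27, "music-test": 7.02},
}


def average_over_domains(table):
    """Mean evaluation value of each model across the three test sets."""
    return {model: sum(vals.values()) / len(vals) for model, vals in table.items()}


for model, avg in average_over_domains(TABLE2).items():
    print(f"{model:12s} {avg:.2f}")
```

Rounded to two decimals, the averages are 8.67, 7.57, 7.48, and 6.85, matching Table 6.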

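5.4 Optimal Number of Epochs

In the experiments, we trained for up to 10 epochs and selected the model with the highest evaluation value. One might expect more epochs to always yield a better model, but in practice this is not the case.

Figures 3, 4, and 5 show the evaluation value at each epoch for three model and test-set pairs: books-model on books-test, dvd-model on dvd-test, and music-model on music-test.

Figure 3: Change in evaluation value with the number of epochs (books-model); legend: tohoku-bert, books-model for books-test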


Table 2: Experimental results

             tohoku-BERT   books-model     dvd-model   music-model
books-test         10.26         10.75          6.61          6.27
dvd-test            9.39          8.95         11.48          7.27
music-test          6.37          3.00          4.34          7.02

Table 3: Results without fine-tuning (DistilBERT initialized randomly)

             tohoku-BERT   books-model     dvd-model   music-model
books-test         10.26          8.63          4.66          3.96
dvd-test            9.39          6.80          6.98          3.69
music-test          6.37          2.48          2.46          4.08

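Table 4: Results using the corpus that covers all domains

             tohoku-BERT   books-model     dvd-model   music-model      d3-model
books-test         10.26         10.75          6.61          6.27         11.20
dvd-test            9.39          8.95         11.48          7.27         11.84
music-test          6.37          3.00          4.34          7.02          7.02

Figure 4: Change in evaluation value with the number of epochs (dvd-model); legend: tohoku-bert, dvd-model for dvd-test

Figure 5: Change in evaluation value with the number of epochs (music-model); legend: tohoku-bert, music-model for music-test

books-model and dvd-model reach their best values at the maximum number of epochs, but music-model does not. Judging from its curve, further training would not improve its evaluation value. How to determine the optimal number of epochs, that is, when to stop training, is left for future work.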

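6. Conclusion

In this paper, we pointed out the domain dependence of BERT and used fine-tuning to build domain-specific pre-trained models efficiently. Specifically, we used the parameters of an existing BERT model as the initial values of DistilBERT's parameters and trained the DistilBERT model on a corpus for each domain. In the experiments, we created a masked-word prediction task for each domain. We compared the model built for each task's domain with the existing BERT model and showed the usefulness of the constructed models. When training with distillation, deciding when to stop training is important, and we plan to investigate this in future work.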

References

[Alsentzer 19] Alsentzer, E., Murphy, J., Boag, W., Weng, W.-H., Jindi, D., Naumann, T., and McDermott, M.: Publicly Available Clinical BERT Embeddings, in Proceedings of the 2nd Clinical Natural Language Processing Workshop, pp. 72–78 (2019)

[Devlin 19] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186 (2019)

[Hinton 15] Hinton, G., Vinyals, O., and Dean, J.: Distilling the knowledge in a neural network, arXiv preprint arXiv:1503.02531 (2015)

[Sanh 19] Sanh, V., Debut, L., Chaumond, J., and Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, arXiv preprint arXiv:1910.01108 (2019)

[Vaswani 17] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I.: Attention is all you need, in Advances in neural information processing systems, pp. 5998–6008 (2017)

[Shibayama 19] Shibayama, N., Cao, R., Bai, J., Ma, W., and Shinnou, H.: Comparison of Japanese Pretrained BERT Models (in Japanese), The 15th Text Analytics Symposium, 21 (2019)

