Construction of Domain Specific DistilBERT Model by Using Fine-Tuning
Hiroyuki Shinnou, Bai Jing, Cao Rui, Ma Wen
Major in Computer and Information Sciences, Graduate School of Science and Engineering, Ibaraki University
In this paper, we point out that BERT is domain dependent and propose constructing a domain-specific pre-trained model by fine-tuning. Specifically, the parameters of a DistilBERT model are initialized from a trained BERT model and then tuned on a corpus from the target domain. This lets us construct a domain-specific DistilBERT model efficiently. In the experiments, we build a test set for each domain, in which a masked word in a sentence must be estimated. Using these test sets, we compare the domain-specific DistilBERT models with a general BERT model and show the superiority of the proposed models.
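1. Introduction
BERT (Bidirectional Encoder Representations from Transformers), a pre-trained model, is used for a wide range of natural language processing tasks and contributes greatly to the accuracy of the systems that use it [Devlin 19]. Useful as it is, BERT itself is trained in an unsupervised framework on a large corpus, so it is naturally subject to the bias of the corpus used. In this paper we demonstrate this domain dependence of BERT with a simple experiment and propose constructing pre-trained models specialized to a domain.
The straightforward way to construct a domain-specific pre-trained model is to prepare a corpus for each domain and train a BERT model on it. However, a large corpus is not always available for a given domain, and training BERT requires enormous computational resources. With this simple approach it is therefore difficult to construct a domain-specific pre-trained model.
To address this problem, we use DistilBERT [Sanh 19] as the pre-trained model. DistilBERT is a lightweight version of BERT built with distillation. BERT normally has 12 layers of multi-head attention, whereas DistilBERT has 6. The initial values of the parameters of these 6 layers are copied from layers of an existing BERT model; in other words, DistilBERT can be trained in the form of fine-tuning BERT. Because this is fine-tuning, only a small additional training corpus is needed, and the pre-trained model can be constructed efficiently.
In the experiments we use the Amazon review dataset, which has three domains: books, dvd, and music. For each domain we constructed a domain-specific DistilBERT model from an existing BERT model. We also created a masked-word estimation task for each domain and show the usefulness of the domain-specific models by comparing them with the existing BERT model.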
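2. Domain Dependence of BERT
BERT is a model that stacks 12 (or 24) layers of the multi-head attention used in the Transformer [Vaswani 17]. Its parameters are learned in an unsupervised framework by solving two tasks, Masked Language Model and Next Sentence Prediction. The data needed for training is therefore a plain text corpus, and because Wikipedia is both large and open, many BERT models use it as the training corpus. In that case the resulting BERT model is clearly subject to the bias of Wikipedia.
To show this, we run a simple experiment. Two Japanese BERT models are currently public: the BERT released by Tohoku University (hereafter tohoku-BERT)∗1 and the BERT released by Stockmark Inc. (hereafter stockmark-BERT)∗2. Both use MeCab-NEologd as the tokenizer and share the same model framework, but their training corpora differ: tohoku-BERT uses Japanese Wikipedia, and stockmark-BERT uses Japanese business news articles (3 million articles). Using these two BERT models, we estimate the word at [MASK] in the following sentence.
日本の[MASK]は非常にすばらしい ("The [MASK] of Japan is truly wonderful")
The top 5 words estimated by tohoku-BERT and stockmark-BERT are listed below. The results differ completely, confirming that each model is subject to the bias of its training corpus.
tohoku-BERT top 5: ['合', '[UNK]', '株価', 'チャート', '気候']
stockmark-BERT top 5: ['[UNK]', '技術', 'サービス', '業', '事例']
(株価: stock price; チャート: chart; 気候: climate; 技術: technology; サービス: service; 事例: case example)
If a pre-trained model is biased by the corpus used to train it, then constructing and using a pre-trained model specialized to the domain the system actually targets should improve system performance. For example, [Alsentzer 19] reports that, when handling medical text, using a BERT built from a medical-domain corpus instead of an existing BERT built from Wikipedia improved accuracy on 3 of 5 medical tasks. [Shibayama 19] compares several BERT models on a document classification task and attributes the differences in accuracy to differences in the training corpora used to build the models.
∗1 https://github.com/cl-tohoku/bert-japanese
∗2 https://drive.google.com/drive/folders/1iDlmhGgJ54rkVBtZvgMlgbuNwtFQ50V-1
Contact: Hiroyuki Shinnou, [email protected]
The 34th Annual Conference of the Japanese Society for Artificial Intelligence, 2020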
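3. Using DistilBERT
A domain-specific pre-trained model is useful, but constructing one is not straightforward. Naively, one would collect a large corpus from the target domain and build a BERT model from it. Once the target domain is restricted, however, collecting a large corpus is difficult. Building BERT also requires enormous computational resources, so repeating that computation every time the domain changes is impractical.
We therefore use DistilBERT. DistilBERT uses distillation [Hinton 15] to shrink an existing BERT model. The resulting model is used in the same way as a BERT model. Because the number of parameters is roughly 60% of BERT's, training time is greatly reduced.
Distillation is basically a method that trains with a loss (the soft target loss) defined as the difference between the label distribution output by the teacher model and the label distribution output by the student model (see Figure 1). DistilBERT is basically trained by reducing the soft target loss together with the MLM loss.
Figure 1: Basic method of distillation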
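The combination of soft target loss and MLM loss described above can be sketched as follows. This is a minimal illustration and not the training code used in the paper: measuring the "difference" between distributions with KL divergence, the temperature of 2.0, and the mixing weight `alpha` are all assumptions chosen for the example, and the function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between the two softened distributions (assumed form)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def mlm_loss(student_logits, gold_index):
    """Cross-entropy of the student against the true masked token."""
    return -math.log(softmax(student_logits)[gold_index])

def distillation_loss(teacher_logits, student_logits, gold_index,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of the two losses; `alpha` is an illustrative weight."""
    soft = soft_target_loss(teacher_logits, student_logits, temperature)
    hard = mlm_loss(student_logits, gold_index)
    return alpha * soft + (1.0 - alpha) * hard
```

When the student reproduces the teacher's logits exactly, the soft target term is zero and only the MLM term remains, which is why the two terms are reduced together rather than separately.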
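Another major feature of DistilBERT is that it can be trained by fine-tuning BERT. Apart from the number of layers, DistilBERT has basically the same structure as BERT. BERT stacks 12 layers of multi-head attention, whereas DistilBERT has 6. Fine-tuning therefore becomes possible by using the parameters of layers 0, 2, 4, 7, 9, and 11 of an existing BERT model as the initial values of the parameters of the DistilBERT layers (see Figure 2). If a corpus from the target domain is then used to train DistilBERT, a domain-specific pre-trained model can be constructed.
Figure 2: Initialization of DistilBERT from BERT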
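The layer-copy initialization can be sketched on a plain dictionary of named parameters. The key pattern `encoder.layer.<i>.<rest>` and the function name are hypothetical and do not correspond to any particular library's naming; copying non-layer parameters (such as embeddings) unchanged is also an assumption, since the text only specifies which layers are copied.

```python
import re

# Teacher layers whose parameters initialize student layers 0..5 (from the text).
TEACHER_LAYERS = [0, 2, 4, 7, 9, 11]

def init_student_from_teacher(teacher_params, teacher_layers=TEACHER_LAYERS):
    """Build the student's initial parameters from a teacher parameter dict.

    Keys are assumed to look like 'encoder.layer.<i>.<rest>' (hypothetical
    naming). Values are copied by reference here; with real tensors one
    would clone them instead.
    """
    layer_map = {t: s for s, t in enumerate(teacher_layers)}
    pattern = re.compile(r"^encoder\.layer\.(\d+)\.(.+)$")
    student = {}
    for name, value in teacher_params.items():
        match = pattern.match(name)
        if match is None:
            # Assumption: parameters outside the layer stack are copied as-is.
            student[name] = value
            continue
        teacher_index = int(match.group(1))
        if teacher_index in layer_map:
            new_name = f"encoder.layer.{layer_map[teacher_index]}.{match.group(2)}"
            student[new_name] = value
        # Teacher layers not in the list are simply dropped.
    return student
```

For a 12-layer teacher this yields a 6-layer student whose layer 3, for example, starts from the teacher's layer 7.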
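4. Experiments
4.1 Training data
Training DistilBERT requires a teacher BERT model; here we use tohoku-BERT. The initial parameter values of the model for fine-tuning are also taken from tohoku-BERT.
As the training corpus we use the Amazon review documents published at the following site.
https://webis.de/data/webis-cls-10.html
This dataset has three domains: books, dvd, and music. Table 1 shows the number of documents in each domain.
Table 1: Number of documents in each domain
            books     dvd      music
train       2,000     2,000    2,000
test        2,000     2,000    2,000
unlabeled   169,779   68,326   55,892
We use the unlabeled documents as the training corpus. Extracting sentences from each document gave corpora of 821,892 sentences for books, 315,327 sentences for dvd, and 221,367 sentences for music∗3.
4.2 Evaluation data
Models are evaluated by their performance on a masked-word estimation task. Specifically, a word w in a sentence s is replaced with [MASK]; when s is input to a model, the model yields the probability that [MASK] is w. The model is evaluated by the value of this probability.
For each domain, sentences and words are selected from the test data. First, a frequency table of the nouns in the test data is built for each domain. The top 20 words are selected, and for each word 5 sentences containing it are drawn at random from the test data of that domain. This gives 100 test sentences per domain. We call the evaluation data for the books domain books-test, that for the dvd domain dvd-test, and that for the music domain music-test. The 20 words selected for each domain are as follows.
∗3 Only strings that clearly form sentences were extracted, so these counts are smaller than the number of sentences actually present in the documents.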
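The construction of the evaluation data (top 20 nouns, 5 random sentences per noun, one word replaced by [MASK]) can be sketched as below. This is an illustration under assumptions, not the authors' script: sentences are assumed to be already tokenized and part-of-speech tagged, the tag name "NOUN" and the function name are hypothetical, and only the first occurrence of the target word in a sentence is masked.

```python
import random
from collections import Counter

def build_test_set(tagged_sentences, n_words=20, n_sentences=5, seed=0):
    """Return a list of (masked_tokens, target_word) pairs.

    tagged_sentences: list of sentences, each a list of (token, pos) pairs.
    """
    rng = random.Random(seed)
    # Frequency table of nouns over the whole test data.
    noun_freq = Counter(tok for sent in tagged_sentences
                        for tok, pos in sent if pos == "NOUN")
    items = []
    for word, _count in noun_freq.most_common(n_words):
        # Sentences that contain the word as a noun.
        candidates = [sent for sent in tagged_sentences
                      if any(tok == word and pos == "NOUN" for tok, pos in sent)]
        chosen = rng.sample(candidates, min(n_sentences, len(candidates)))
        for sent in chosen:
            tokens = [tok for tok, _pos in sent]
            # Mask the first noun occurrence of the target word.
            index = next(i for i, (tok, pos) in enumerate(sent)
                         if tok == word and pos == "NOUN")
            masked = tokens[:index] + ["[MASK]"] + tokens[index + 1:]
            items.append((masked, word))
    return items
```

With the defaults and enough sentences per word, this produces 20 × 5 = 100 test items per domain, matching the size described above. Each item would then be scored by the probability a model assigns to the target word at the [MASK] position.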
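4.3 Experimental results
Using the training corpus of each domain, we constructed a DistilBERT model specialized to that domain. We call the model for the books domain books-model, the model for the dvd domain dvd-model, and the model for the music domain music-model. Training ran for 10 epochs. The model obtained at each epoch was saved, an evaluation value was computed for each saved model on the evaluation data, and the model with the highest evaluation value was selected. Table 2 shows the results. As the table makes clear, the model specialized to the target domain achieved the highest evaluation value. Outside its target domain, however, no model exceeds the evaluation value of the teacher, tohoku-BERT.
5. Discussion
5.1 Effect of fine-tuning
To confirm the effect of fine-tuning, Table 3 shows the results of training without fine-tuning, that is, with the initial values of the DistilBERT model set randomly. With fine-tuning, the DistilBERT model for the target domain exceeded the evaluation value of the teacher tohoku-BERT; without fine-tuning it could not, which confirms the effect of fine-tuning.
5.2 Using a corpus that covers all domains
Using a larger corpus that contains the target domain, rather than a corpus specialized to it, might improve the evaluation values further. We ran an experiment to check this. Specifically, we combined the books, dvd, and music corpora used in the previous experiment into a single corpus of 1,358,586 sentences and repeated the same experiment. We call the resulting model d3-model. Table 4 shows the results.
d3-model gives the highest evaluation value in every domain (tying with music-model on music-test). This indicates that, when building a pre-trained model specialized to a domain, a large corpus containing that domain serves better than a corpus restricted to it. However, Table 5 shows the training time each model required per epoch in these experiments∗4. Training time is almost proportional to corpus size, so training on the domain-specific corpus alone is more efficient.
∗4 Machine specifications: CPU Core i7-8700, 64 GB memory, GPU GeForce RTX 2070.
Table 5: Training time per epoch
              Sentences    Time (min)
books-model   821,892      132
dvd-model     315,327      49
music-model   221,367      34
d3-model      1,358,586    214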
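The claim that training time is almost proportional to corpus size can be checked directly from the figures in Table 5. The short sketch below computes sentences processed per unit of training time (in whatever unit Table 5 uses) for each model; the variable and function names are illustrative.

```python
# (number of sentences, training time per epoch) as given in Table 5.
TABLE5 = {
    "books-model": (821_892, 132),
    "dvd-model": (315_327, 49),
    "music-model": (221_367, 34),
    "d3-model": (1_358_586, 214),
}

def throughput(table):
    """Sentences processed per unit of training time, for each model."""
    return {name: size / time for name, (size, time) in table.items()}

rates = throughput(TABLE5)
mean_rate = sum(rates.values()) / len(rates)

for name, rate in rates.items():
    deviation = 100.0 * (rate - mean_rate) / mean_rate
    print(f"{name:12s} {rate:8.1f} sentences per time unit ({deviation:+.1f}% vs. mean)")
```

All four rates fall within a few percent of their mean, which is what proportionality predicts. The d3-model corpus size is also exactly the sum of the three domain corpora, so one d3-model epoch costs about as much as one epoch of each domain model combined (214 versus 132 + 49 + 34 = 215).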
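5.3 Models that surpass the teacher model
When a model is compressed by distillation, it is difficult for the resulting model to surpass the teacher model. Even in the original paper, the compressed model improves construction time and execution time but does not surpass the teacher model in performance. In our experiments the models surpass the teacher by specializing to the target domain, but they do not when the target domain is taken broadly. Table 6 shows the average of the evaluation values over the three domains used in the experiments; the teacher model has the highest performance.
Table 6: Experimental results (average)
              Average evaluation value over the 3 domains
tohoku-BERT   8.67
books-model   7.57
dvd-model     7.48
music-model   6.85
Distillation has the problem that surpassing the teacher model's performance is difficult. As shown in this paper, however, restricting the target domain makes it possible to surpass the teacher model within that domain, and we consider this approach practical.
5.4 Optimal number of epochs
In the experiments we trained for 10 epochs and selected the model with the highest evaluation value. One might expect that more epochs generally yield a better model, but in practice this is not necessarily so. Figures 3, 4, and 5 show the evaluation value at each epoch for books-model on books-test, dvd-model on dvd-test, and music-model on music-test.
Figure 3: Evaluation value by number of epochs (books-model). Legend: tohoku-bert; books-model for books-test.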
Table 2: Experimental results
              tohoku-BERT   books-model   dvd-model   music-model
books-test    10.26         10.75         6.61        6.27
dvd-test      9.39          8.95          11.48       7.27
music-test    6.37          3.00          4.34        7.02
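The three-domain averages reported in Table 6 follow from the columns of Table 2. The sketch below recomputes them and also picks the best model per test set; the data are copied from Table 2, and the variable names are illustrative.

```python
# Evaluation values from Table 2, indexed by model and then by test set.
TABLE2 = {
    "tohoku-BERT": {"books-test": 10.26, "dvd-test": 9.39, "music-test": 6.37},
    "books-model": {"books-test": 10.75, "dvd-test": 8.95, "music-test": 3.00},
    "dvd-model": {"books-test": 6.61, "dvd-test": 11.48, "music-test": 4.34},
    "music-model": {"books-test": 6.27, "dvd-test": 7.27, "music-test": 7.02},
}

def domain_average(scores):
    """Mean evaluation value over all test sets for one model."""
    return sum(scores.values()) / len(scores)

# Average over the three domains, rounded as in Table 6.
averages = {model: round(domain_average(s), 2) for model, s in TABLE2.items()}

# Model with the highest evaluation value on each test set.
best_per_test = {
    test: max(TABLE2, key=lambda model: TABLE2[model][test])
    for test in ("books-test", "dvd-test", "music-test")
}
```

The averages put the teacher first overall, while the per-test maxima each go to the model trained on the matching domain, which is the pattern the discussion relies on.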
Table 3: Experimental results without fine-tuning
              tohoku-BERT   books-model   dvd-model   music-model
books-test    10.26         8.63          4.66        3.96
dvd-test      9.39          6.80          6.98        3.69
music-test    6.37          2.48          2.46        4.08
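Table 4: Experimental results using a corpus that covers all domains
              tohoku-BERT   books-model   dvd-model   music-model   d3-model
books-test    10.26         10.75         6.61        6.27          11.20
dvd-test      9.39          8.95          11.48       7.27          11.84
music-test    6.37          3.00          4.34        7.02          7.02
Figure 4: Evaluation value by number of epochs (dvd-model). Legend: tohoku-bert; dvd-model for dvd-test.
Figure 5: Evaluation value by number of epochs (music-model). Legend: tohoku-bert; music-model for music-test.
books-model and dvd-model give their best values at the maximum number of epochs, but music-model does not, and its curve suggests that further training would not improve the evaluation value. How to determine the optimal number of epochs, that is, at what point to stop training, is left for future work.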
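The selection used in the experiments (save a model at every epoch, keep the one with the highest evaluation value) can be sketched as `best_epoch` below. The second function, `stop_epoch`, is a common patience-based stopping rule shown only as one possible way to approach the open question; it is not a method proposed or evaluated in this paper, and the patience value and function names are assumptions.

```python
def best_epoch(scores):
    """Return (1-based epoch, score) of the highest per-epoch evaluation value."""
    index = max(range(len(scores)), key=lambda i: scores[i])
    return index + 1, scores[index]

def stop_epoch(scores, patience=3):
    """Return the epoch at which training would stop under a patience rule.

    Training stops once `patience` consecutive epochs fail to improve on the
    best value seen so far. Returns None if that never happens.
    """
    best = float("-inf")
    epochs_since_best = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best:
            best = score
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return None
```

For a hypothetical curve such as `[1.0, 2.0, 3.0, 2.5, 2.8, 2.9, 2.7]`, `best_epoch` picks epoch 3 and `stop_epoch` with a patience of 3 halts at epoch 6; for a curve that keeps rising, `stop_epoch` returns None and training would continue to the last epoch.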
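6. Conclusion
In this paper we pointed out the domain dependence of BERT and efficiently constructed domain-specific pre-trained models by fine-tuning. Specifically, we used the parameters of an existing BERT model as the initial values of the DistilBERT parameters and trained the DistilBERT model on a corpus for each domain. In the experiments we created a masked-word estimation task for each domain and showed the usefulness of the constructed models by comparing the model specialized to each domain with the existing BERT model. In distillation-based training, the point at which training is stopped is important; we plan to investigate this in future work.
References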
[Alsentzer 19] Alsentzer, E., Murphy, J., Boag, W., Weng, W.-H., Jindi, D., Naumann, T., and McDermott, M.: Publicly Available Clinical BERT Embeddings, in Proceedings of the 2nd Clinical Natural Language Processing Workshop, pp. 72–78 (2019)
[Devlin 19] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186 (2019)
[Hinton 15] Hinton, G., Vinyals, O., and Dean, J.: Distilling the knowledge in a neural network, arXiv preprint arXiv:1503.02531 (2015)
[Sanh 19] Sanh, V., Debut, L., Chaumond, J., and Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, arXiv preprint arXiv:1910.01108 (2019)
[Vaswani 17] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I.: Attention is all you need, in Advances in neural information processing systems, pp. 5998–6008 (2017)
[Shibayama 19] Shibayama, N., Cao, R., Bai, J., Ma, W., and Shinnou, H.: Comparison of Japanese Pretrained BERT Models, in The 15th Text Analytics Symposium, 21 (2019) (in Japanese)