Construction of Database for Korean Books Written in Chinese Characters and Input Process of Chinese Characters


Vol.2014-DD-94 No.1 2014/7/24. IPSJ SIG Technical Report.

SUNGDUK CHO †1, YONGJU HWANG †2

Countries that use Chinese characters, such as Korea, China, and Japan, have struggled to display on computer systems both the Chinese characters found in old texts and those that are newly created, and have discussed the problem at IRG meetings. We proposed an input method for Korean variants of Chinese characters for the digitization of traditional texts and put the method to practical use; most icheja characters have been input as standard forms or ihyeongja characters. †3

1. Introduction

This paper discusses methodological issues in the digitization of Korean books written in Chinese characters, focusing on the input method for Chinese characters. Today Koreans do not use Chinese characters widely in daily life; instead, they use Hangeul, the vernacular Korean writing system created in 1443 by King Sejong the Great. Historically, the majority of books published during the Joseon Dynasty (1392-1910) were written in Chinese characters. The mandatory use of Hangeul in official documents, however, was codified when the Republic of Korea was established in 1948. At first a clause permitted Chinese characters to be added if needed, but it is now common to use Hangeul alone. The exclusive use of Hangeul in official documents is also stated in the Framework Act on the National Language, enacted in 2005 for the development and preservation of the Korean language. In short, government-initiated language policies have encouraged the use of Hangeul in place of the traditionally used Chinese characters; the concern now is not the use of Chinese characters but the excessive use of English words.
Even though the official writing system of Korea is Hangeul, Chinese characters still have to be dealt with in certain areas: historical texts, and personal names on government-issued certificates such as the family relation certificate and the copy of resident registration. Although their use is limited in the everyday life of contemporary Koreans, the importance of Chinese characters should not be underestimated, as they are vehicles of cultural tradition and are also used in government documents. That is why we have been participating in the international standardization of Chinese characters.

Each country has made continuous efforts to preserve and hand down its cultural traditions, and the development of online digital archives is one of those endeavors. In Korea, the Korean Literary Collection Database by the Institute for the Translation of Korean Classics and the Annals of the Joseon Dynasty Database by the National Institute of Korean History are two of the most representative projects. Inputting Chinese characters became a problem from the early stage of database development, because some of the Chinese characters in classical texts were missing from the computer system. Developing a method to input those characters was an essential part of the project. The following chapters explore the input method of Chinese characters, focusing on the variant forms used in Korean historical texts.

†1 Sungkyunkwan University.
†2 The National Institute of the Korean Language.
†3 The information discussed in this paper is not an official national opinion or position of South Korea; it reflects only the authors' individual views.
ⓒ 2014 Information Processing Society of Japan.

2. Definition of Icheja

In previous discussions of variant Chinese characters, two types of variant have been distinguished from the standard character: ihyeongja and icheja[1]. We follow the definitions of each as described below[2].
1) Standard character: A standard character is the form that most clearly reflects the character's formation principle, or the representative form determined by government investigation. Most standard characters are identical to the standard forms in Kanghijajeon (Kangxi Dictionary, 康熙字典). There are exceptions, however: another form is chosen as the standard if it is more widely used in Korea than the one in Kanghijajeon.

2) Ihyeongja: An ihyeongja is an allograph whose components differ from those of the standardized form in strokes or radicals. Some show typical stroke variations, while others take arbitrarily different forms. The latter include variant characters with a completely different-shaped radical. Old characters cited in Jaseo (Character Book) are also a kind of ihyeongja.

3) Icheja: An icheja is an allograph that has the same components as the standard form in terms of strokes or radicals but a different overall shape. Icheja came to be used in casual handwriting and in different calligraphic styles. Their shapes also depend on the writing tools used and sometimes reflect artistic purposes.

(The scope of icheja can be understood in a broad sense or a narrow sense. In the broad sense, an icheja is a character that has a different shape, regardless of its original sound or meaning. Therefore, if two different characters have the same sound and meaning so that they are interchangeable, the two can be described as icheja. Likewise, when a character is newly created to replace an old one whose meaning has changed considerably, the old and new characters can be described as icheja. In the narrow sense, however, icheja means only a variant that was used in the same period as a standard form and has the same sound and meaning[3].)

1 For previous studies on the concept of icheja, see References 10), 11), and 12).
2 The definitions of icheja and ihyeongja are quoted from the study of icheja in Joseon Dynasty documents listed as Reference 17), from the part reviewing previous studies (the concept and the classification of icheja).

3. History of the Research on Korean Icheja

The first attempt to examine Korean icheja was made in Jahakgo, written by Kyeong'am Nosu Kim in 1945[4]. In the part of Jahakgo titled Jeong'i, he illustrated 697 commonly used icheja characters and added annotations where needed. Jeong'i consists of three sections, Kyujangjeonun (奎章全韻), Jeonunokpyun (全韻玉篇), and Sokrye, as shown in Figure 1; Kyujang and Okpyun show the standard forms given in the books Kyujangjeonun and Jeonunokpyun, respectively, while Sokrye shows everyday usage. Some characters in the Sokrye section were marked as 'wrong', which sheds light on how widely those forms were used. Jahakgo is of great importance because it gives a clue to the shapes of Chinese characters that were widely used in commercial novels and other types of popular texts.

Figure 1. Jahakgo
Yet the standard forms suggested in the Kyujang and Okpyun sections of Jahakgo are not identical to the current standardized forms. In other words, the standard forms recommended in Jahakgo are regarded as icheja at present.

Hanguksokjabo (韓國俗字譜) (Figure 2), published in 1986 by Professor Young-hwa Kim of the Chinese Culture University in Taiwan, categorized Korean icheja into 19 types by examining 3,478 characters (1,764 types) used in 36 volumes of traditional handwritten novels. The book has been assessed as the first monograph to classify icheja according to handwriting styles[5].

Figure 2. Hanguksokjabo

The Hangukhanjajahyeongjosa (韓國漢字字形調査, Research on the Shape of Korean Chinese Characters) project was the first government-initiated survey of icheja; it was carried out by the National Institute of the Korean Language (NIKL) from 1996 to 2000. The project team examined 4,622 of the 4,888 Chinese characters in KSC-5601 by exploring 290 old Korean texts. The result of the project (Figure 3) presents 3,492 standard characters and 14,877 icheja characters[6]. The project covered part of the Korean Literary Collection. It was carried out, however, at an early stage of the development of the Korean Literary Collection; as a result, only 26 of the 1,259 texts in the final version of the Korean Literary Collection could be examined. Those 26 texts range from the late Goryeo Dynasty to the early Joseon Dynasty and are mostly standardized woodprint editions. Hence it is hard to say that the result of this project covers the diverse shapes of icheja found in handwritten texts or collections of personal writings.

Figure 3. Hangukhanjajahyeongjosa 1-2
Figure 4. Korean Icheja Table

3 Reference 13), p. 327.
4 Reference 3), vol. 13, Jahakgo. For details, see Reference 15).
5 Reference 13), pp. 329-330.
6 Reference 13), p. 328.

4. Korean Icheja Table V 10.1

4.1. Outline

The Korean Icheja Table (hereinafter referred to as the Table) has been developed since 2000 as a follow-up to the 1995 project that examined literary collections and national treasury texts. Its aims were to conduct full-scale research on icheja by improving the previous data and to support the correction and input of Chinese characters in the construction of the Korean History Online Database. The Table is now used as a guideline for the correction or input of icheja and ihyeongja when building digital databases.

The first table contained 500 high-frequency character types used in the Korean Literary Collection Database. The number was expanded to 1,000 types in 2003 and to 4,363 types in 2004, which completed the List of Icheja with standard forms, icheja, and ihyeongja. In 2006 it was expanded again to 4,850 types and 14,000 characters, and reference information for about 3,000 characters was added. This 2006 version was made available on the web as the Icheja Retrieval Dictionary System[7]. In 2007 it was updated by selecting high-resolution data and deleting overlapping character shapes. The database will continue to be revised.

The Table draws on around 20 old Korean texts, including the Korean Literary Collection, the Annals of the Joseon Dynasty (manuscripts of the King Tajoe and King Jeongjong parts), the Diaries of the Royal Secretariat (King Kojong part), Kyujangjeonun, Muyedobotongji, the Chronicles of the Three States, Honameupji (Jeolla-do part), Pyeok'onbang, Pyeok'onsinbang, Imhapilgi, Kosachwal'yo, and Sohakjeunghe. In addition, the Table illustrates 34,000 characters, including 6,761 Unicode standard characters, 13,435 variant characters (icheja and ihyeongja), and 14,000 characters as images.

The Table is currently being prepared for publication, and the editing of the Korean Icheja Dictionary (Figure 5, hereinafter referred to as the Dictionary) has been finished. The Dictionary presents, century by century, information on the shapes of the Chinese characters, Korean pronunciations, variant forms, print types, references, and so on. The index of the Dictionary will follow the alphabetic order of the Korean pronunciation[8].

Figure 5. The Korean Icheja Dictionary

7 Reference 13), p. 329.

4.2. Scope of the Chinese Characters

The Table includes Chinese characters extracted from certain texts, as described in the introductory remarks. If a character shape outside EXT_A appeared frequently in Korean historical texts, a code was expected to be needed for it when building digital databases, and the best-quality image of the character was collected. If both the standard form and the variant form of a character belonged to the range of EXT_B, they were processed as images.

4.3. Use of Data

The 2004 guideline for correction and input has been cited in Chapter 4 (The Optimization and Upgrade of Icheja Dictionary Table) of the Report of Academic Research Project published by the National Institute of Korean History in 2005.
The contents of the List of Icheja, except for the images, have been added to the Dictionary Table of Icheja by KRISTAL of the Korean History Online Database[9]. In addition, the data is now widely used by national institutes when building databases of old Korean texts, such as the Korean Literary Collection Database built by the Institute for the Translation of Korean Classics. The guideline is also used, with partial revision, by government document databases such as the Annals of the Joseon Dynasty and the Diaries of the Royal Secretariat. The 2004 guideline is also used in the major research institute translation project of the Institute for the Translation of Korean Classics.

8 The Korean Icheja Dictionary contains icheja found in Korean literary collections from the thirteenth to the twentieth centuries.
9 Academic research report - Search engine upgrades dictionary table, the National Institute of Korean History (Reference 16)).

5. Processing of Ihyeongja and Icheja

Because Chinese characters beyond EXT_B cannot be displayed on the web, we assigned a code point "KC_0000" to characters outside the range of EXT_B; for characters within the range of EXT_B, we used their own code points. The Table was expected to be used mostly by researchers in Korean studies and by translators, so inputting variant shapes as images was thought to cause users much trouble. Moreover, it was impossible to input every variant form into the database. We therefore made separate guidelines for standard forms, ihyeongja, and icheja, and described input principles and exception clauses for the sake of the consistency of the database as well as the user's convenience. Standard forms and ihyeongja were in principle input as they appear in the text, while icheja characters were replaced by their corresponding standard forms or ihyeongja. However, where a standard form has too many ihyeongja, some of them were input as the most similar-looking standard form or ihyeongja. For example, the variant forms of one character were grouped according to the shapes of a few of its forms (Figure 6).

Figure 6. Ihyeongja

For some character pairs in which the simplified form is not commonly used in Korean texts, we put the simplified character in the icheja group. Conversely, even where Kanghijajeon describes two characters as an original form and its simplified form, we allocated a code space to both characters if both are widely used in Korean texts. The following sections briefly introduce the hengseo (行書), choseo (草書), yeseo (隸書), and heseo (楷書) parts of the Table.

5.1. Font Style Processing

1) Hengseo (行書)

There are approximately 300 hengseo characters in the Korean Literary Collection, and most of them appear in prefaces or yumuk (遺墨). As hengseo and choseo were difficult to input, they were input in the representative forms recommended and confirmed by hengseo experts. Kiujip, published in 1897, also contains hengseo characters together with heseo and choseo in its preface (Figure 7). As shown in Figure 7, several characters, among them 學 and 與, are written in hengseo style, and others are written in their icheja forms. These variant forms were all input in the standard forms.

Figure 7. Kiujip

2) Choseo (草書)

There are few choseo characters in the prefaces or yumuk of the Korean Literary Collection, and they were input in standard forms. The following example is from Jebongjip, published in 1617. The choseo characters in this example were all input in the standard forms.

Figure 8. Jebongjip

3) Yeseo (隸書)

The Korean Literary Collection contains two texts written exclusively in yeseo style: Wangujip, published in 1820, and Sagajipseo (the preface of the book Sagajip), published in 1705. Yeseo characters in these two texts were also input in the representative forms recommended by yeseo experts. The underlined characters in the examples were input in the standard forms.

Figure 9. Sagajipseo
Figure 10. Wangujip

4) Heseo (楷書)

Most of the texts in the Korean Literary Collection were written in heseo regardless of print method (woodblock, type, or handwriting). The following is an example of heseo characters taken from the preface of Gyegokjip.

Figure 11. Gyegokjip

The underlined characters in the above example were all typed as the standard characters suggested in the Table.

In summary, hengseo, choseo, and yeseo characters were input in the standard forms, while heseo characters were input in the standard forms recommended in the Table. The input of cursive forms such as choseo and hengseo was raised by a Korean representative at the 40th IRG meeting in Hong Kong and again at the 42nd meeting in Qingdao[10]. The following shows what was discussed in Hong Kong.

ROK considers that those four characters are derived/modified as follows:

1.1 original character -> cursive form or handwriting -> print (block) form
- Problems: In general, several cursive forms could be derived from ONE original character; furthermore, several print (block) forms could be derived from EACH cursive form. Therefore, in general, tens of print (block) forms could be derived from ONE original character.

1.2 Simplification: original form -> simplified form
- In the case of Chinese Simplified characters, there is a "fixed and stable" correspondence between Traditional and Simplified characters.
- Problems: However, in general, several (or even tens of) simplified forms could be derived from one original character, depending on which part is simplified and how much it is simplified.

The problem regarding 'yeo (與)' discussed above is one example among many similar cases. Although variants of this kind can be treated separately in domestic word processors, it does not seem ideal to put them in Unicode. The side effect is well illustrated by EXT_B, which contains too many characters with minor differences such as one dot or one stroke. This can confuse users when typing, not only in Korea but also in other countries where texts in Chinese characters have to be processed. If variants are to be included in the Unicode system, it is space for hengseo, not for heseo, that is needed. Otherwise, this kind of problem will continue to be raised in the future.

The following part illustrates the common character shapes found in Korean texts. Some of the characters presented are registered in Unicode and some are not; this inconsistency often causes trouble when processing variant forms for database construction.

5.2. Character Shape Processing

1) Non-Unicode (~EXT_F) Character Types

Example 1) [b113;444b;7][11]: This variant frequently appears in Korean handwritten scripts but is not included in Unicode. The variant form and the standard form contain different components, so the two characters should be processed separately. Nevertheless, the variant was typed in its standard form.

Example 2) [b113;542b;1]: One component has been replaced by another that has a similar shape as well as a comparable meaning.

Example 3) [b081;070b;1]: The component on the right side of the character has been replaced by its hengseo-style form.

10 ISO/IEC JTC1/SC2/WG2/IRG N1921_ROK_Feedback1_Consol_Comm. Doc. #: Korea JTC1/SC2 K2191_26; ISO/IEC JTC1/SC2/WG2/IRG N1979_KR_Feedback1. Doc. #: Korea JTC1/SC2 K2247_1.
11 [b113;444b;7] is a reference to the Korean Literary Collection. The first 'b' indicates the sequel series, '113' the volume number, '444' the page number, the second 'b' the section of the page, and '7' the line number. That is, [b113;444b;7] refers to the Korean Literary Collection, sequel series, volume 113, page 444, upper section, left side, line 7. In addition, the mark '*' indicates registration in EXT_B~EXT_F1, EXT_F2.
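The bracketed source references used in the examples, such as [b113;444b;7], follow the pattern explained in footnote 11. A minimal Python sketch of how such a reference could be split into its parts is given here. The class name, field names, and regular expression are hypothetical and are inferred only from the references quoted in this paper; they are not part of the authors' system.

```python
import re
from dataclasses import dataclass

# Pattern inferred from references quoted in the text, such as
# [b113;444b;7] and [a249;038d;6]. It is an assumption, not a published spec.
_REF_PATTERN = re.compile(r"\[([a-z])(\d+);(\d+)([a-z]);(\d+)\]")


@dataclass(frozen=True)
class SourceRef:
    series: str   # leading letter, e.g. "b" for the sequel series
    volume: int   # volume number
    page: int     # page number
    section: str  # letter naming the section of the page
    line: int     # line number


def parse_source_ref(text: str) -> SourceRef:
    """Parse one bracketed reference such as "[b113;444b;7]"."""
    match = _REF_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a source reference: {text!r}")
    series, volume, page, section, line = match.groups()
    return SourceRef(series, int(volume), int(page), section, int(line))


if __name__ == "__main__":
    print(parse_source_ref("[b113;444b;7]"))
```

Keeping the series and section as letters and converting only the numeric parts means that references with leading zeros, such as the page "038", compare equal to their plain numeric values.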

Example 4) [a249;038d;6]: A symmetrical component has been replaced by another form. Variants of a similar type are registered in Unicode.

Example 5) [b098;057b;9], F05874: One component has been replaced by another. This change is found in every character containing that component, in all periods.

Example 6) [b098;134b;10]: One component has been replaced by another, but this change seldom appears in handwritten texts.

[b098;022b;1]: In this type, an element has been added on either side of a component, which makes the shape of the component more stable and aesthetic. This change, which appears in handwritten texts of all periods, is also found in other characters.

So far, we have presented non-Unicode characters that appear with relatively high frequency and have examined whether characters of a similar type are registered in Unicode. As seen in Example 1), some characters of a given component-replacement type are included in Unicode, while others are not.

2) Unicode (~EXT_F) Character Types

Example 1) to Example 4) (character images)

We have presented Unicode characters that appear frequently and have examined whether characters of a similar type are registered in Unicode. As a result, we have found that some variants of a similar type are not included in Unicode. The symbol '*' in the above examples indicates that the same character or the same type of character is included in Unicode (~EXT_F). Characters without a code number below them are not included in Unicode; they are currently input in the standard form in our database.

5.3 Other Examples

1) Type of Icheja by Centuries

The shape of the same gugeon (component) has changed continuously. To date, 462 items in ten types have been examined; types (1) to (10) contain 17, 35, 146, 32, 58, 51, 17, 43, 35, and 28 items, respectively. The table below illustrates the shape of one of these components by centuries[12].

ex) Change of shape within one type

12 The following contents are quoted from Reference 17), from the part on the aspects and characteristics of icheja (types of icheja appearing in each period).

2) A Table of Character Shapes by Centuries

The table below presents various forms of icheja characters that appear frequently in historical texts. The data is shown in two parts: the early Joseon period and the late Joseon period. Texts from the late Goryeo period also contain icheja, but the types of icheja were not diverse, so they are not shown in the table. To date, 100 frequently used types have been analyzed.

3) Added or deleted strokes

Added strokes

Deleted strokes

6. Conclusion

1) Advantages

One of the main purposes of building a national database is to help the majority of people find data more easily. Thus icheja characters, except for those used to explain character styles, are input as the similar-looking standard forms, so that users do not have to type image characters when editing texts from the database for their own use.
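The advantage described above rests on replacing each icheja with its standard form at input time. The following Python sketch shows one way such a replacement could be expressed as a lookup from variant to standard form. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the sample pairs are common variant pairs chosen for illustration, not entries taken from the Table.

```python
from typing import Dict, Iterable, Mapping


def build_variant_index(groups: Mapping[str, Iterable[str]]) -> Dict[str, str]:
    """Map every variant, and each standard form itself, to its standard form."""
    index: Dict[str, str] = {}
    for standard, variants in groups.items():
        for form in (standard, *variants):
            # setdefault keeps an earlier assignment, so a form listed under
            # two different standard forms is reported instead of overwritten.
            if index.setdefault(form, standard) != standard:
                raise ValueError(f"{form!r} belongs to two groups")
    return index


def normalize(text: str, index: Mapping[str, str]) -> str:
    """Replace each known variant with its standard form; keep other characters."""
    return "".join(index.get(ch, ch) for ch in text)


# Illustrative groups only; these pairs are not taken from the Table.
SAMPLE_GROUPS = {"國": ["国", "囯"], "學": ["学"]}

if __name__ == "__main__":
    index = build_variant_index(SAMPLE_GROUPS)
    print(normalize("韓国文学", index))
```

Applying the same normalization to a search query as to the stored text lets a user who types only standard forms still match passages whose originals used variants, at the cost that the original shape has to be checked in the source image.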

2) Drawbacks

Because characters consisting of different components are typed in the standard form, the original shape of a character cannot be confirmed from the database text. Users therefore have to examine the image file of the original text to check the character shape. This can be troublesome for those who study graphonomy or compile dictionaries.

3) Database Update

The contents of the Table have been updated for around ten years. The remaining task is the reorganization of the icheja groups according to the Principle of Classifying Icheja (Figure 12)[13]. The first step is to examine the icheja characters that are input as a standard form or ihyeongja in the current database. The next step is to consider whether the present grouping is appropriate. If there are any inappropriate icheja-standard form or icheja-ihyeongja pairs, the group is disbanded and the information is revised. Conversely, where similar icheja characters exist among those currently input, they are bound into one group under the same standard character.

Figure 12. The Principle of Classifying Icheja

13 The Principle of Classifying Icheja contains the result of the 2006 Icheja Information Search Project; the list of icheja was categorized into 316 types.

Reference
1) 鯸珕踖, “ꇺ愲姾敾閃”, 壅蝗壅澮崽荥, 1986.
2) 昍鴫蠏袩旋, “疊敾姾敾昴箎銏”, 煵網語, 1930.
3) 鯸ꖖ着, “澓棜ꃭ”, 1945.
4) 愲蓲愲铅艶蒝ꂉ, “ꇺ愲祉敾羗ꓻ敾铦爢”, 2002.
5) 韀氷槞, “ꇺ愲羗ꓻ敾銏”, 2014.5.
6) 韀氷槞, “ꇺ愲羗ꓻ敾実” 駸就.
7) 韀氷槞, 㺀羗ꓻ屋旁尭ꊅ巆屮㺁, 2006梛.
8) ISO/IEC JTC1/SC2/WG2/IRG N1921_ROK_Feedback1_Consol_Comm. Doc. #: Korea JTC1/SC2 K2191_26, ISO/IEC JTC1/SC2/WG2/IRG N1979_KR_Feedback1. Doc. #: Korea JTC1/SC2 K2247_1.
9) 煵慔罙, 㺀“꓿ꢾ捎迶螺”ⳡ 羗ꓻ敾艶蒝㺁, 塔愲铅澮斟艶蒝焪, “塔愲铅澮斟铽ꃭ” 蕓7邆, 1995. 6.
10) 煵炖骇, 㺀疊敾⯉ 羗ꓻ敾⭙ ᢉ㫥訪昆㺁, ꇺ愲塔铅塔澮斟焪, “塔铅塔澮斟” 蕓23饖, 1999.
11) 鯸揲鉴, 㺀羗ꓻ敾㺁ⳡ 旁觐尭ꊅ ↘ 訪鯲濠盼⭙ 鏧㫡⭵㺁, 塔愲铅澮斟艶蒝焪, “塔愲铅澮斟铽ꃭ” 蕓15邆, 2000.
12) 鑘徭, 㺀羗ꓻ敾澛縭ⳡ 縥盨ኅ 晼煂(1)-羗ꓻ敾ⳡ 戡煓琑樜ኅ 駸縥夊敾銏澛縭ⳡ 痻餪ἅ 塔槪ⳅᷥ-㺁, ꇺ愲祉敾祉澮潵諙斟焪, “祉敾祉澮潵諙” 蕓20饖, 2008. 5.
13) 韀氷槞, 㺀ꇺ愲澮ꃭ 羗ꓻ敾敾榉艶蒝(1) -ꇺ愲澮ꃭ帉就塔槪-㺁, 熘濠祉澮斟焪, “熘濠祉澮斟” 蕓36饖, 2008. 9.
14) 韀氷槞, 㺀ꇺ愲澮ꃭ 羗ꓻ敾敾榉艶蒝(2) -蝸,粣,繗,账㦮腯财⯒ 襦葖㦒⪲-㺁, ꇺ愲帋実ꌃ閖ꂉ, “瘸濶澮崽” 蕓35饖, 2010.
15) 罚脟霉, 㺀澓棜鯸ꖖ着ⳡ “敾斟訪”⭙ 昴㫥艶蒝㺁, ꇺ愲祉澮帋実斟焪, 祉澮帋実艶蒝, 蕓 24饖, 2012.
16) SK C&C Consortium, Academic research report - Search engine upgrades dictionary table, the National Institute of Korean History, 2005.11.30.
17) 韀氷槞, 嶁挒斟奴隒嫀铽澮 “煄ꗕ煄澮綢羗ꓻ敾艶蒝”, 2010.12.
