• 検索結果がありません。

鹿児島大学リポジトリ

N/A
N/A
Protected

Academic year: 2021

シェア "鹿児島大学リポジトリ"

Copied!
6
0
0

読み込み中.... (全文を見る)

全文

(1)

AN IMPROVEMENT OF A PATTERN MATCHING ALGORITHM

BY MAKING USE OF A STATISTICAL PROPERTY OF

LANGUAGES

著者

TOGASHI Akira, KAWASAKI Hirotoki

journal or

publication title

鹿児島大学理学部紀要. 数学・物理学・化学

volume

16

page range

27-30

別言語のタイトル

言語の統計的性質を利用したパターンマッチングア

ルゴリズムの改良について

URL

http://hdl.handle.net/10232/6406

(2)

AN IMPROVEMENT OF A PATTERN MATCHING ALGORITHM

BY MAKING USE OF A STATISTICAL PROPERTY OF

LANGUAGES

著者

TOGASHI Akira, KAWASAKI Hirotoki

journal or

publication title

鹿児島大学理学部紀要. 数学・物理学・化学

volume

16

page range

27-30

別言語のタイトル

言語の統計的性質を利用したパターンマッチングア

ルゴリズムの改良について

URL

http://hdl.handle.net/10232/00007015

(3)

Rep. Fac. Sci., Kagoshima Univ., (Math., Phys., & Chem.), No.16, p.27-30, 1983

AN IMPROVEMENT OF A PATTERN MATCHING

ALGORITHM BY MAKING USE OF A

STATISTICAL PROPERTY OF LANGUAGES

Akira Togasi* and Hirotoki Kawasaki*

(Received September 9, 1983)

Abstract

In this paper we give an algorithm improving processing times for a pattern matching

● ●

problem, which is often used to information retrieval, by using statistical properties of

languages.

1. Introduction

The statistical properties of the natural languages have been investigated by several researchers. Especially there are many papers describing the investigation about proper-ties relating to consecutive characters in a word, and also the properproper-ties of the vocal sounds in it, since the alphabets used in the India-European language being very small sets, we can easily seek statistical properties of languages.

In this paper we consider an improvement about the number of handling times of the pattern matching to the natural language.

2. Pattern Matching

Suppose that r is a string whose length is m, n is a string whose length is n, and assume that m>n.

In this paper, we say r a text and k a pattern. Moreover suppose r and k are strings

onthe alphabet ∑-{tfi,..., as) i.e. r,7C∈∑+-∑*-w.

Then the problem whether a pattern n exists in a text z or not is said to be a pattern matching problem. And it is the problem we consider in this paper.

3. Basic Algorithm

We give one of the most primitive algorithms for the pattern matching (algorithm 1).

First, let us denote the i-th character of r and n by z{i) and x{i¥ respectively. Fig. 1 is the aowchart of algorithm 1.

(4)

A. Togasi and H. Kawasaki

Fig. 1 The flowchart of Algorithm 1.

The program of Algorithm 1.

1-1,j-l,M-l

L:IF ri)-i;)

THEN IF j-n

THEN O.K.

ELSE IF /-1

THEN

M-i-i+l

]-)+l

GO TO L

ELSE   2-2+1

J-j+1

GO TO L

ELSE IF i-m-n+l ¥Jm-n

THEN false

ELSE i-M+l

-1

GO TO L

Algorithm 1 starts from M-l, compares rd+M-1) with x(i). If the matching

succeeds to i- n, it turns on O.K. (i.e. k exists in r), and if the unmatching happens on the

halfway, adds one to M and does the same things. When M- m-n+l, ii the unmatching

happens, as it is impossible to increase M more, so it turns on false{n does not exist in r).

Then let TN be the number of times comparing rii+M-1) with n¥i) until it turns

on O.K. or false.

(5)

t)

m川川  umm山川日掛仙打hmRHH国肌川mmmWm川川川mm川N相  川川M川仙川m川mmm川帆山川仙川矧川川仙      川川仙川脚h川相川川

An Improvement of a Pattern Matching Algorithm 29

most (when it turns on O.K. or false on M-m-n+1, i- n after the unmatchings happen

on i-n from M-¥ to M-m+n), that is,

n≦TN<nX(tn-n+1).

Definition 1. Given a text r and a pattern n, we define the referring order by the number of times going through the label L of algorithm 1.

4. An improvement of the algorithm

When the text r is a string on the natural language, the appearance pattern of r is not uniform. The probability of z{i)-aj is not generally equal to that of r(i)-a/{j:^j ).

So let us assume that />,-, is the probability of z{i)-aj and pt/ that of r(i)-aj.

Then we can de丘ne an appearance pattern matrix ;

(≡

P-♪11 '-- -Pr, ●      ● \Pis蝣-Pms inthis,triviallywehave s 2pu-l j=l[i-l,2,-,m) Given方suchthat tc-an..Mme∑+, theprobabilitythatkexistsfromk-thinris n Hbj十k-ljtj  --  -・・----・・・-  -     --      (1) .7-1

Definition 2. Given the pattern %, we define the averaged referring order by the average of the referring order of matching by the text r which has an appearance pattern matrix P.

Moreover, suppose Tk is the averaged referring order that n exists from k-th in r.

Let us denote by F′・ the averaged referring order that the unmatching happens on 〟 - ㍗(㍗

-1, 2,..., k-1). Then we have immediately

n h-¥

F,-∑ hinpji+r-l,ijX(1 Ph+r, ih))

.7-1 (Fr is independent of k), and k-1

∴ Tk-∑Ft+n

1-1

Therefore the averaged referring order 7'of Algorithm 1 is

t n

T-∑ Tk npJ+k-i.    ----・-・--・-・--・・・--・-   (4)

.7-1

¥t-m-n+¥)

Now we know that T is dependent on {1, 2,..., /} which is the order M moves inthe

algorithm 1. So we can decrease 7'by changing the order 〟 moves.

Suppose that

W-{w¥w is apermutationof {1, 2,..., t}},

(6)

30 A. Togasi and H. Kawasaki

W∋W-{wi,..., Wt)

as the new order M moves, we denote the new Tk by Then Tkw is k-1 7V-2 Fwt+n一        (5) z-l So7Wis t n

Tw-∑   npj, _i,a-・-  -   -・・-     -・・--・- (6)

j-¥

Therefore choosing w* such that

Tw -min T"

w<=W

as new order 〟 moves, then

qrw <^p

So we can improve the algorithm 1. We show the algorithm 2 improving the algori-thml.

The program of Algorithm 2 Search w* such that min Tw

welT

i-wi,y-i,M-¥

L: IF tU)-icU)

THEN IF j-n

THEN O.K.

ELSE IF /-1

THEN M-i

ELSE

i-i+l

/-;+!

GO TO L

1-1+1

- +1

GO TO L

ELSE IFi-t v m-n

THEN false

ELSE i - u)M^¥

7-1 GO TO L References

[ 1 ] Suen, C.Y : n-Gram Statistics for Natural Language Understanding and Test Processing, IEEE Trans. Vol. PAMI-1, No.2, April 1979

[ 2 ] Knuth, E.E., Morris, J.HJr., Pratt, V.R. : Fast Pattern Matching in Strings, SIAM J. Comput Vol.6, No. 2, pp. 323-350 (1977).

[ 3 ] Boyer, R.S. and Moore, J.S. : A Fast String Searching Algorithm, Comm. ACM, Vol. 20, No. 10,

pp. 762-772 (1977).

[4] Guibas, LJ. and Odlyzko, A.M.: A New Proof of the Linearity of the Boyer-Moore String Searching Algorithm, SIAM J. Comput., Vol. 9, No. 4, pp. 672-682 (1980).

参照

関連したドキュメント

Next, we prove bounds for the dimensions of p-adic MLV-spaces in Section 3, assuming results in Section 4, and make a conjecture about a special element in the motivic Galois group

Maria Cecilia Zanardi, São Paulo State University (UNESP), Guaratinguetá, 12516-410 São Paulo,

Abstract The representation theory (idempotents, quivers, Cartan invariants, and Loewy series) of the higher-order unital peak algebras is investigated.. On the way, we obtain

In Section 13, we discuss flagged Schur polynomials, vexillary and dominant permutations, and give a simple formula for the polynomials D w , for 312-avoiding permutations.. In

Debreu’s Theorem ([1]) says that every n-component additive conjoint structure can be embedded into (( R ) n i=1 ,. In the introdution, the differences between the analytical and

Then it follows immediately from a suitable version of “Hensel’s Lemma” [cf., e.g., the argument of [4], Lemma 2.1] that S may be obtained, as the notation suggests, as the m A

Applications of msets in Logic Programming languages is found to over- come “computational inefficiency” inherent in otherwise situation, especially in solving a sweep of

Abstract The classical abelian invariants of a knot are the Alexander module, which is the first homology group of the the unique infinite cyclic covering space of S 3 − K ,