
Application of an Information Loss Index to Non-numerical Data


IPSJ SIG Technical Report, Vol.2017-MPS-113 No.10 / Vol.2017-BIO-50 No.10, 2017/6/23

Hiroko Akiyama 1,a), Masaaki Wada 2

1 Nagano College of Technology
2 Osaka University
a) h [email protected]

ⓒ 2017 Information Processing Society of Japan

Abstract: We define a new information loss index, ILD, as the ratio of the amount of information lost by microaggregation. Because ILD is based on the distance between data, it applies not only to numerical data but also to data such as character strings, provided a suitable distance is chosen. In this paper we apply ILD to non-numerical datasets and discuss the choice of distance. For microaggregation of numerical datasets that uses the average of each group as its representative, ILD coincides with the information loss index based on the sum of squared differences between the data and the average. In this sense, ILD is a natural extension of that information loss index.

Keywords: Information Loss, Microaggregation, Anonymization

1.

ILD: Information Loss based on Distance [1], [2].
ILSSDM: Information Loss based on Sum of Square Difference from the Mean.

[3] [4] [5] [6] [7] [8], [9], [10]

2. ILD

2.1

Let $X$ be a set on which a distance $d(x, y)$ is defined for any two elements $x, y \in X$. Examples of distances:

• $X = \mathbb{R}^n$ with the Euclidean distance
\[ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}, \qquad x, y \in \mathbb{R}^n \]

• any set $X$ with the discrete distance
\[ d(x, y) = \begin{cases} 1 & (x \neq y) \\ 0 & (x = y) \end{cases} \qquad x, y \in X \]

For data $A = (x_1, \ldots, x_N)$, $x_i \in X$ $(i = 1, \ldots, N)$, define
\[ I(A) = \sum_{i=1}^{N} \sum_{j=1}^{N} d(x_i, x_j)^2 . \]

2.2 ILD

Let
\[ A = (x_{11}, \ldots, x_{1 n_1}, x_{21}, \ldots, x_{2 n_2}, \ldots, x_{m1}, \ldots, x_{m n_m}) \]
be divided into groups $A_1, \ldots, A_m$ with $A_k = (x_{k1}, \ldots, x_{k n_k})$ $(k = 1, \ldots, m)$, and let $\hat{x}_k$ be the representative of $A_k$. Replacing every element of $A_k$ by $\hat{x}_k$ gives
\[ \hat{A} = (\hat{x}_1, \ldots, \hat{x}_1, \hat{x}_2, \ldots, \hat{x}_2, \ldots, \hat{x}_m, \ldots, \hat{x}_m), \]
and
\[ \mathrm{ILD} = \frac{I(A) - I(\hat{A})}{I(A)} . \]

Example 2.1. $D = (1, 2, 3, 4)$ with groups $D_1 = (1, 2)$, $D_2 = (3, 4)$ and $\hat{D} = (1.5, 1.5, 3.5, 3.5)$:
\[ I(D) = 40, \qquad I(\hat{D}) = 32, \qquad \mathrm{ILD} = \frac{40 - 32}{40} = 0.2 . \]

Example 2.2. $S = (a_{11}, a_{12}, a_{21}, a_{22})$ with groups $S_1 = (a_{11}, a_{12})$, $S_2 = (a_{21}, a_{22})$ and $\hat{S} = (a_1, a_1, a_2, a_2)$:
\[ I(S) = 144, \qquad I(\hat{S}) = 32, \qquad \mathrm{ILD} = \frac{144 - 32}{144} = 0.778 . \]
For $\hat{\hat{S}} = (a, a, a, a)$:
\[ I(\hat{\hat{S}}) = 0, \qquad \mathrm{ILD} = \frac{144 - 0}{144} = 1 . \]
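The definitions $I(A) = \sum_i \sum_j d(x_i, x_j)^2$ and $\mathrm{ILD} = (I(A) - I(\hat{A}))/I(A)$ can be checked numerically with a minimal Python sketch. The function names `information`, `ild`, `abs_dist` and `tree_dist` are illustrative, not taken from the paper. For Example 2.2 the distance did not survive in the text, so the sketch assumes a path-length distance on a hypothetical tree with root `a`, children `a1`, `a2`, and leaves `a11`, `a12`, `a21`, `a22`. That assumption does reproduce the values 144, 32 and 0, but it may not be the authors' actual choice.

```python
from itertools import product


def information(data, dist):
    """I(A): sum of squared distances over all ordered pairs of A."""
    return sum(dist(x, y) ** 2 for x, y in product(data, repeat=2))


def ild(original, anonymized, dist):
    """ILD = (I(A) - I(A_hat)) / I(A)."""
    i_orig = information(original, dist)
    return (i_orig - information(anonymized, dist)) / i_orig


def abs_dist(x, y):
    """Euclidean distance on the real line."""
    return abs(x - y)


# Example 2.1: each group is replaced by its average.
D = [1, 2, 3, 4]
D_hat = [1.5, 1.5, 3.5, 3.5]
print(information(D, abs_dist), information(D_hat, abs_dist), ild(D, D_hat, abs_dist))

# ASSUMPTION: hypothetical tree for Example 2.2 (child -> parent).
PARENT = {"a11": "a1", "a12": "a1", "a21": "a2", "a22": "a2",
          "a1": "a", "a2": "a", "a": None}


def path_to_root(node):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def tree_dist(x, y):
    """Number of edges on the path between x and y in the tree."""
    steps_from_y = {node: k for k, node in enumerate(path_to_root(y))}
    for steps_from_x, node in enumerate(path_to_root(x)):
        if node in steps_from_y:  # lowest common ancestor
            return steps_from_x + steps_from_y[node]
    raise ValueError("nodes are not in the same tree")


S = ["a11", "a12", "a21", "a22"]
S_hat = ["a1", "a1", "a2", "a2"]
S_hat_hat = ["a", "a", "a", "a"]
print(information(S, tree_dist), information(S_hat, tree_dist))
print(round(ild(S, S_hat, tree_dist), 3), ild(S, S_hat_hat, tree_dist))
```

Because `information` takes the distance as a parameter, the same two functions serve numerical and non-numerical data alike; only `dist` changes.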

3. ILD

3.1

ILD for the data $D$ and $\hat{D}$ under three distances:

(1) $I_{\mathrm{discr}}(D) = 56$, $I_{\mathrm{discr}}(\hat{D}) = 48$, $\mathrm{ILD}_{\mathrm{discr}} = 0.14285$

(2) $I_t(D) = 1440$, $I_t(\hat{D}) = 576$, $\mathrm{ILD}_t = 0.6$

(3) $I_g(D) = 648$, $I_g(\hat{D}) = 640$, $\mathrm{ILD}_g = 0.01234$

Distance   I(D)    I(D̂)    ILD
discr      56      48      0.14285
t          1440    576     0.60000
g          648     640     0.01234

3.2 ILD

On $\mathbb{R}^n$, for $1 \le p \le \infty$, the $p$-distance is
\[ d_p(x, y) = \begin{cases} \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} & (1 \le p < \infty) \\ \max_{i = 1, \ldots, n} |x_i - y_i| & (p = \infty) \end{cases} \]
For $p = 2$, $d_2(x, y)$ is the Euclidean distance.

[Second table; row labels not recoverable]

I(D)    I(D̂)    ILD
56      48      0.14285
272     160     0.41176
168     160     0.04762
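The effect of the choice of distance on ILD can be explored with a short Python sketch of the $p$-distance and the discrete distance. The dataset $D$ of Section 3.1 did not survive in the text, so all data below are hypothetical stand-ins. Eight distinct labels split into four groups of two is only one grouping consistent with the reported discrete-distance values $I(D) = 56$ and $I(\hat{D}) = 48$, and the 2-D points are invented purely to show ILD varying with $p$.

```python
import math
from itertools import product


def p_dist(x, y, p):
    """p-distance on R^n; p = math.inf gives the maximum coordinate difference."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def discrete_dist(x, y):
    """Discrete distance: 0 for equal elements, 1 otherwise."""
    return 0 if x == y else 1


def information(data, dist):
    """I(A): sum of squared distances over all ordered pairs of A."""
    return sum(dist(x, y) ** 2 for x, y in product(data, repeat=2))


def ild(original, anonymized, dist):
    """ILD = (I(A) - I(A_hat)) / I(A)."""
    i_orig = information(original, dist)
    return (i_orig - information(anonymized, dist)) / i_orig


# ASSUMPTION: eight distinct labels, four groups of two, one label per group.
labels = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]
group_labels = ["g1", "g1", "g2", "g2", "g3", "g3", "g4", "g4"]
i_discr = information(labels, discrete_dist)
i_discr_hat = information(group_labels, discrete_dist)
print(i_discr, i_discr_hat, ild(labels, group_labels, discrete_dist))

# ASSUMPTION: invented 2-D points, each group replaced by its mean.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 3.0), (5.0, 5.0)]
means = [(0.5, 0.0), (0.5, 0.0), (4.5, 4.0), (4.5, 4.0)]
ild_by_p = {
    p: ild(points, means, lambda x, y, p=p: p_dist(x, y, p))
    for p in (1, 2, math.inf)
}
print(ild_by_p)
```

Under the discrete distance, $I$ only counts pairs of unequal elements, which is why the result depends on the group sizes alone and not on what the labels mean.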

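4. ILD and ILSSDM

4.1 ILSSDM

Let $N$ data $X \subset \mathbb{R}^n$ be divided into groups $X_1, \ldots, X_m$, where $X_i$ consists of $n_i$ elements $x_{ij}$ $(1 \le j \le n_i)$. Write $\bar{x}_i$ for the mean of $X_i$ and $\bar{x}$ for the mean of $X$. Then
\[ \mathrm{SSE} = \sum_{i=1}^{m} \sum_{j=1}^{n_i} |x_{ij} - \bar{x}_i|^2 \quad \text{(Sum of Squared Errors)} \]
\[ \mathrm{SSA} = \sum_{i=1}^{m} n_i |\bar{x}_i - \bar{x}|^2 \]
\[ \mathrm{SST} = \sum_{i=1}^{m} \sum_{j=1}^{n_i} |x_{ij} - \bar{x}|^2 = \mathrm{SSE} + \mathrm{SSA} \]
\[ \mathrm{ILSSDM} = \frac{\mathrm{SSE}}{\mathrm{SST}} = \frac{\mathrm{SST} - \mathrm{SSA}}{\mathrm{SST}} \]
With $\mathrm{var}(X_i) = \frac{1}{n_i} \sum_{j} |x_{ij} - \bar{x}_i|^2$ the variance of $X_i$ and $\mathrm{var}(X)$ the variance of $X$, we have $\mathrm{SSE} = \sum_i n_i \mathrm{var}(X_i)$ and $\mathrm{SST} = N \mathrm{var}(X)$, so
\[ \mathrm{ILSSDM} = \frac{\sum_{i=1}^{m} n_i \, \mathrm{var}(X_i)}{N \, \mathrm{var}(X)} \tag{1} \]

4.2

With the Euclidean distance,
\begin{align*}
I(X) &= \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{x \in X_i} \sum_{y \in X_j} |x - y|^2 \\
&= \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{x \in X_i} \sum_{y \in X_j} |(x - \bar{x}_i) + (\bar{x}_i - \bar{x}_j) + (\bar{x}_j - y)|^2 \\
&= \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{x \in X_i} \sum_{y \in X_j} \left( |x - \bar{x}_i|^2 + |\bar{x}_i - \bar{x}_j|^2 + |\bar{x}_j - y|^2 \right) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{m} n_i n_j \left( \mathrm{var}(X_i) + |\bar{x}_i - \bar{x}_j|^2 + \mathrm{var}(X_j) \right) \tag{2}
\end{align*}
The cross terms vanish in the third line because $\sum_{x \in X_i} (x - \bar{x}_i) = 0$ and $\sum_{y \in X_j} (\bar{x}_j - y) = 0$.

Treating $X$ as a single group in the same way,
\begin{align*}
I(X) &= \sum_{x \in X} \sum_{y \in X} |x - y|^2 \\
&= \sum_{x \in X} \sum_{y \in X} |(x - \bar{x}) + (\bar{x} - y)|^2 \\
&= \sum_{x \in X} \sum_{y \in X} \left( |x - \bar{x}|^2 + |\bar{x} - y|^2 \right) \\
&= 2 N^2 \, \mathrm{var}(X) \tag{3}
\end{align*}

4.3

Let $X'$ be the data obtained by replacing every element of $X_i$ by $\bar{x}_i$. Then
\[ I(X') = \sum_{i=1}^{m} \sum_{j=1}^{m} n_i n_j |\bar{x}_i - \bar{x}_j|^2 \tag{4} \]
From (2) and (4),
\begin{align*}
I(X) - I(X') &= \sum_{i=1}^{m} \sum_{j=1}^{m} n_i n_j \left( \mathrm{var}(X_i) + \mathrm{var}(X_j) \right) \\
&= 2 \sum_{i=1}^{m} \sum_{j=1}^{m} n_i n_j \, \mathrm{var}(X_i) \\
&= 2 N \sum_{i=1}^{m} n_i \, \mathrm{var}(X_i) \tag{5}
\end{align*}
From (3) and (5),
\[ \mathrm{ILD} = \frac{I(X) - I(X')}{I(X)} = \frac{\sum_{i=1}^{m} n_i \, \mathrm{var}(X_i)}{N \, \mathrm{var}(X)} \tag{6} \]
Comparing (1) and (6), $\mathrm{ILD} = \mathrm{ILSSDM}$.

4.4

For $N$ data, computing ILD directly from the definition takes $O(N^2)$, whereas ILSSDM can be computed in $O(N)$.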
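The identity $\mathrm{ILD} = \mathrm{ILSSDM}$ for mean-based microaggregation, and the $O(N)$ versus $O(N^2)$ cost difference, can be illustrated with a small Python sketch for one-dimensional data. The function names and the randomly generated groups are assumptions made for illustration only; this is a numerical sanity check, not a proof.

```python
import random
from itertools import product
from statistics import fmean


def ilssdm(groups):
    """SSE / SST for groups of real numbers; one pass per group, O(N) overall."""
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    sst = sum((x - grand_mean) ** 2 for x in values)
    sse = 0.0
    for g in groups:
        group_mean = fmean(g)
        sse += sum((x - group_mean) ** 2 for x in g)
    return sse / sst


def ild_mean_representatives(groups):
    """ILD from the pairwise definition, O(N^2), using group means as representatives."""
    original = [x for g in groups for x in g]
    anonymized = [fmean(g) for g in groups for _ in g]

    def info(data):
        return sum((x - y) ** 2 for x, y in product(data, repeat=2))

    i_orig = info(original)
    return (i_orig - info(anonymized)) / i_orig


# The data of Example 2.1, grouped as (1, 2) and (3, 4).
example = [[1, 2], [3, 4]]
print(ilssdm(example), ild_mean_representatives(example))

# Hypothetical random groups of unequal size, with a fixed seed.
rng = random.Random(0)
random_groups = [[rng.uniform(0, 10) for _ in range(n)] for n in (3, 5, 2, 4)]
print(ilssdm(random_groups), ild_mean_representatives(random_groups))
```

For numerical data with mean representatives, the linear-time `ilssdm` is therefore a practical way to obtain the ILD value. The pairwise computation is still needed for general distances, where no mean is available.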

References

[1] Josep Domingo-Ferrer and Vicenç Torra. Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Mining and Knowledge Discovery, Vol. 11, No. 2, pp. 195–212, 2005.
[2] Josep Domingo-Ferrer, Antoni Martínez-Ballesté, Josep Maria Mateo-Sanz, and Francesc Sebé. Efficient multivariate data-oriented microaggregation. The VLDB Journal: The International Journal on Very Large Data Bases, Vol. 15, No. 4, pp. 355–369, 2006.
[3] Anthony WF Edwards and L Luka Cavalli-Sforza. A method for cluster analysis. Biometrics, pp. 362–375, 1965.
[4] AD Gordon and JT Henderson. An algorithm for Euclidean sum of squares classification. Biometrics, pp. 355–362, 1977.
[5] Pierre Hansen, Brigitte Jaumard, and Nenad Mladenovic. Minimum sum of squares clustering in a low dimensional space. Journal of Classification, Vol. 15, No. 1, pp. 37–55, 1998.
[6] James MacQueen, et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 281–297. Oakland, CA, USA, 1967.
[7] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, Vol. 58, No. 301, pp. 236–244, 1963.
[8] Agusti Solanas, Antoni Martinez-Balleste, and J Domingo-Ferrer. V-MDAV: a multivariate microaggregation with variable group size. In 17th COMPSTAT Symposium of the IASC, Rome, pp. 917–925, 2006.
[9] Anna Oganian and Josep Domingo-Ferrer. On the complexity of optimal microaggregation for statistical disclosure control. Statistical Journal of the United Nations Economic Commission for Europe, Vol. 18, No. 4, pp. 345–353, 2001.
[10] Josep Domingo-Ferrer and Josep Maria Mateo-Sanz. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 1, pp. 189–201, 2001.
