Nonlinear Regression Modeling via the Relevance Vector Machine

Kazuki Matsuda (Major in Mathematics)

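1 Introduction

Recent dramatic advances in computer systems, together with the development of data networks, have made it possible to acquire and accumulate large and diverse data in fields such as medicine, pharmacy, environmental science, economics, and marketing. More effective multivariate analysis methods are needed for two purposes: to elucidate the structure of the phenomena underlying such aggregated databases, and to extract useful information from them efficiently.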

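The relevance vector machine (RVM; Tipping, 2001) is a Bayesian learning machine built on kernel functions. It constructs sparse models that depend on only a subset of the data, and it has been widely applied to regression and classification problems. As with conventional modeling methods, model selection is the central issue in RVM-based nonlinear regression modeling. For RVM regression models, stable model evaluation with widely used information criteria such as AIC and BIC is difficult, and these criteria do not function effectively for model evaluation. To address this problem, we examine model evaluation based on the predictive distribution and compare criteria through numerical experiments. We also propose a method that circumvents the evaluation and selection process for RVM regression models.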

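Large-scale natural disasters and similar events can induce discontinuous structural changes. For data containing such change points, it is important to capture the change points appropriately. Tateishi and Konishi (2011) proposed a change-point search method based on RVM regression. We propose modifications that make this method more useful and examine the usefulness of the modified method.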

Finally, we fit a discontinuous model to data related to the Great East Japan Earthquake of March 2011.


2 RVM Regression Modeling

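Let $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$ be $n$ observations of a one-dimensional explanatory variable $x$ and a response variable $y$. RVM-based regression modeling uses the basis expansion method, which is widely used to construct nonlinear models, with Gaussian kernel functions as the basis functions. It considers the following model: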

$$ y_i = w_0 + \sum_{j=1}^{n} w_j \exp\left\{ -\frac{(x_i - x_j)^2}{2h^2} \right\} + \varepsilon_i, \qquad i = 1, 2, \ldots, n. \tag{2.1} $$

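Here, $w_j$ ($j = 0, 1, \ldots, n$) are coefficient parameters that adjust the weight of each basis function. The parameter $h^2$ controls the spread of the Gaussian kernel functions. The error terms $\varepsilon_i$ ($i = 1, 2, \ldots, n$) are mutually uncorrelated and follow the normal distribution $N(0, \beta^{-1})$. Here $\beta$ is a precision (inverse-variance) parameter that controls the dispersion of the errors.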
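The following sketch makes model (2.1) concrete. It builds the $n \times (n+1)$ design matrix: the first column corresponds to $w_0$, and the remaining columns are the Gaussian basis functions centred at the observed $x_j$. It then simulates responses from a sparse coefficient vector. This is a minimal illustration assuming NumPy. The function name `gaussian_design_matrix` and all numerical values ($n = 30$, $h^2 = 0.01$, $\beta = 400$, and the nonzero weights) are hypothetical choices, not settings from this work.

```python
import numpy as np

def gaussian_design_matrix(x_new, centers, h2):
    """Design matrix of model (2.1): a column of ones (for w_0) followed by
    one Gaussian basis exp{-(x - x_j)^2 / (2 h^2)} per center x_j."""
    x_new = np.asarray(x_new, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sq_dist = (x_new[:, None] - centers[None, :]) ** 2   # shape (len(x_new), len(centers))
    return np.hstack([np.ones((len(x_new), 1)), np.exp(-sq_dist / (2.0 * h2))])

# Hypothetical illustrative values: n = 30 inputs, h^2 = 0.01, beta = 400
rng = np.random.default_rng(0)
n, h2, beta = 30, 0.01, 400.0
x = np.sort(rng.uniform(0.0, 1.0, n))
Phi = gaussian_design_matrix(x, x, h2)          # shape (n, n + 1)

# Sparse coefficient vector w = (w_0, w_1, ..., w_n) with three nonzero entries
w = np.zeros(n + 1)
w[[0, 5, 20]] = [0.5, 1.0, -2.0]

# Errors eps_i ~ N(0, beta^{-1}), i.e. standard deviation beta^{-1/2} = 0.05
eps = rng.normal(0.0, beta ** -0.5, n)
y = Phi @ w + eps
```

Each kernel column takes the value 1 at its own centre, so the diagonal of `Phi[:, 1:]` is all ones.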

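Suppose the coefficient vector $\boldsymbol{w} = (w_0, w_1, \ldots, w_n)^T$ is estimated under the ARD (Automatic Relevance Determination) prior

$$ p(\boldsymbol{w} \mid \boldsymbol{\alpha}) = \prod_{j=0}^{n} N(w_j;\, 0,\, \alpha_j^{-1}). \tag{2.2} $$

Most of the coefficients are then estimated to be 0, and a sparse model is obtained. Here, $\alpha_j$ ($j = 0, 1, \ldots, n$) is the hyperparameter associated with the coefficient $w_j$. Since $\alpha_j$ is the precision of the prior on $w_j$, a large $\alpha_j$ concentrates $w_j$ near 0. This is how irrelevant basis functions are switched off.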
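The sparsity induced by the ARD prior (2.2) can be illustrated with a minimal sketch of the fixed-point re-estimation of Tipping (2001). Let $\Phi$ be the design matrix of (2.1) and $A = \mathrm{diag}(\alpha_0, \ldots, \alpha_n)$. Given $(\boldsymbol{\alpha}, \beta)$, the posterior of $\boldsymbol{w}$ is Gaussian with covariance $\Sigma = (\beta \Phi^T \Phi + A)^{-1}$ and mean $\boldsymbol{m} = \beta \Sigma \Phi^T \boldsymbol{y}$. The updates are:

- $\alpha_j \leftarrow \gamma_j / m_j^2$, with $\gamma_j = 1 - \alpha_j \Sigma_{jj}$;
- $\beta \leftarrow (n - \sum_j \gamma_j) / \lVert \boldsymbol{y} - \Phi \boldsymbol{m} \rVert^2$.

Basis functions whose $\alpha_j$ exceeds a threshold are removed. The sketch assumes NumPy. The names `fit_rvm` and `gaussian_design_matrix`, the threshold `alpha_max`, the iteration limit, and the synthetic data are hypothetical. This is not necessarily the estimation procedure used in this work.

```python
import numpy as np

def gaussian_design_matrix(x_new, centers, h2):
    """Intercept column plus Gaussian bases exp{-(x - x_j)^2 / (2 h^2)}, as in (2.1)."""
    sq = (np.asarray(x_new, float)[:, None] - np.asarray(centers, float)[None, :]) ** 2
    return np.hstack([np.ones((len(x_new), 1)), np.exp(-sq / (2.0 * h2))])

def fit_rvm(Phi, y, n_iter=1000, alpha_max=1e6, tol=1e-6):
    """Type-II maximum likelihood for (alpha, beta) under the ARD prior (2.2),
    using the fixed-point re-estimation formulas of Tipping (2001).
    Returns the indices of the retained columns of Phi (relevance vectors),
    their posterior mean weights, and the estimated precision beta."""
    N, M = Phi.shape
    keep = np.arange(M)            # columns still in the model
    alpha = np.ones(M)             # ARD hyperparameters alpha_j
    beta = 1.0 / np.var(y)         # initial error precision
    for _ in range(n_iter):
        P = Phi[:, keep]
        # Posterior of the retained weights given (alpha, beta) is N(m, Sigma)
        Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha))
        m = beta * Sigma @ P.T @ y
        # gamma_j in [0, 1] measures how well w_j is determined by the data
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, 1.0)
        alpha_new = gamma / np.maximum(m ** 2, 1e-300)
        resid = y - P @ m
        beta = max(N - gamma.sum(), 1e-6) / max(resid @ resid, 1e-12)
        # A diverging alpha_j pins w_j at 0, so that basis is pruned
        mask = alpha_new < alpha_max
        mask[np.argmin(alpha_new)] = True   # always keep at least one basis
        change = np.max(np.abs(np.log(alpha_new[mask] / alpha[mask])))
        keep, alpha = keep[mask], alpha_new[mask]
        if change < tol:
            break
    # Posterior mean for the final set of retained bases
    P = Phi[:, keep]
    Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha))
    return keep, beta * Sigma @ P.T @ y, beta

# Hypothetical synthetic data (not the data analysed in this work)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 50))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, 50)

Phi = gaussian_design_matrix(x, x, h2=0.01)
idx, w, beta = fit_rvm(Phi, y)
y_hat = Phi[:, idx] @ w
print(f"{len(idx)} of {Phi.shape[1]} basis functions retained; beta = {beta:.1f}")
```

Pruning columns as their $\alpha_j$ diverges keeps the matrix being inverted small. It also avoids numerical trouble from very large precisions.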

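Model selection then requires determining an optimal value for the tuning parameter $h^2$ in the Gaussian kernel functions. However, the sparse models constructed by the RVM change substantially as $h^2$ varies. As a result, information criteria widely used for model evaluation, such as AIC and BIC, do not function effectively as evaluation criteria.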
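The sensitivity to $h^2$ described above can be observed by refitting the RVM over a grid of $h^2$ values and recording how many basis functions survive. The sketch below does this on hypothetical synthetic data, not the data used in this work. It uses the fixed-point ARD updates of Tipping (2001). The grid, the pruning threshold `alpha_max`, and the function names are assumptions. It prints the number of relevance vectors and the residual sum of squares for each $h^2$, and makes no claim about which value is best.

```python
import numpy as np

def gaussian_design_matrix(x_new, centers, h2):
    """Intercept column plus Gaussian bases exp{-(x - x_j)^2 / (2 h^2)}, as in (2.1)."""
    sq = (np.asarray(x_new, float)[:, None] - np.asarray(centers, float)[None, :]) ** 2
    return np.hstack([np.ones((len(x_new), 1)), np.exp(-sq / (2.0 * h2))])

def fit_rvm(Phi, y, n_iter=1000, alpha_max=1e6, tol=1e-6):
    """Fixed-point ARD updates (Tipping, 2001); returns retained columns, weights, beta."""
    N, M = Phi.shape
    keep, alpha, beta = np.arange(M), np.ones(M), 1.0 / np.var(y)
    for _ in range(n_iter):
        P = Phi[:, keep]
        Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha))
        m = beta * Sigma @ P.T @ y
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, 1.0)
        alpha_new = gamma / np.maximum(m ** 2, 1e-300)
        resid = y - P @ m
        beta = max(N - gamma.sum(), 1e-6) / max(resid @ resid, 1e-12)
        mask = alpha_new < alpha_max          # prune bases whose alpha_j diverges
        mask[np.argmin(alpha_new)] = True     # keep at least one basis
        change = np.max(np.abs(np.log(alpha_new[mask] / alpha[mask])))
        keep, alpha = keep[mask], alpha_new[mask]
        if change < tol:
            break
    P = Phi[:, keep]
    Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha))
    return keep, beta * Sigma @ P.T @ y, beta

# Hypothetical synthetic data
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 60))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 60)

results = []   # (h^2, number of relevance vectors, residual sum of squares)
for h2 in [0.002, 0.005, 0.01, 0.02, 0.05]:
    Phi = gaussian_design_matrix(x, x, h2)
    idx, w, beta = fit_rvm(Phi, y)
    rss = float(np.sum((y - Phi[:, idx] @ w) ** 2))
    results.append((h2, len(idx), rss))
    print(f"h^2 = {h2:<6} relevance vectors = {len(idx):3d}  RSS = {rss:.3f}")
```

Because the set of retained basis functions itself changes with $h^2$, counting it as "the number of parameters" in a criterion such as AIC or BIC is not straightforward.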

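To address this problem, we examine model evaluation by the predictive information criterion PIC, which is derived from the predictive distribution. We compare it with other model evaluation criteria through numerical experiments. Furthermore, we propose the Multi-Overlapping RVM, a method that circumvents the evaluation and selection process for RVM regression models and constructs more flexible models.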
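For reference, the predictive distribution on which such criteria are based can be sketched for model (2.1) under the ARD prior (2.2). The following is the standard plug-in form given by Tipping (2001), with the hyperparameters replaced by their estimates $\hat{\boldsymbol{\alpha}}$ and $\hat{\beta}$. The vector $\boldsymbol{\phi}(x)$ and the matrices $\Phi$ and $A$ are notation introduced only for this sketch. The specific form of PIC used in this work is not reproduced here.

```latex
\begin{align*}
\boldsymbol{\phi}(x) &= \Bigl(1,\ \exp\Bigl\{-\tfrac{(x - x_1)^2}{2h^2}\Bigr\},\ \ldots,\ \exp\Bigl\{-\tfrac{(x - x_n)^2}{2h^2}\Bigr\}\Bigr)^T,
\qquad \Phi = \bigl(\boldsymbol{\phi}(x_1), \ldots, \boldsymbol{\phi}(x_n)\bigr)^T,
\qquad A = \operatorname{diag}(\hat\alpha_0, \ldots, \hat\alpha_n), \\
\Sigma &= \bigl(\hat\beta\, \Phi^T \Phi + A\bigr)^{-1},
\qquad \boldsymbol{m} = \hat\beta\, \Sigma\, \Phi^T \boldsymbol{y}, \\
p(y_\ast \mid x_\ast, \boldsymbol{y}) &\approx
N\bigl(y_\ast;\ \boldsymbol{m}^T \boldsymbol{\phi}(x_\ast),\ \hat\beta^{-1} + \boldsymbol{\phi}(x_\ast)^T \Sigma\, \boldsymbol{\phi}(x_\ast)\bigr).
\end{align*}
```

The predictive variance has two parts. The term $\hat\beta^{-1}$ is the error variance, and $\boldsymbol{\phi}(x_\ast)^T \Sigma\, \boldsymbol{\phi}(x_\ast)$ reflects the remaining uncertainty in $\boldsymbol{w}$.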

3 Change-Point Detection Based on the RVM

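Large-scale natural phenomena, corporate bankruptcies, chemical changes in materials, and similar events can cause abrupt changes in related phenomena and produce discontinuous change points. When the structure of a phenomenon contains such discontinuities, it is important to capture the change points appropriately.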


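Tateishi and Konishi (2011) proposed a method that estimates change points using RVM regression and then uses the result to construct a discontinuous regression structure. We propose modifications that make this method more useful.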
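The procedure of Tateishi and Konishi (2011) is built on RVM regression and is not reproduced here. To illustrate the general idea of fitting a discontinuous regression structure, the sketch below swaps in a simpler, generic technique. It searches exhaustively over a single candidate split and fits a separate least-squares polynomial on each side. The split with the smallest total residual sum of squares is kept. It assumes NumPy. The function names, the polynomial degree, the minimum segment size, and the synthetic data are hypothetical choices.

```python
import numpy as np

def segment_rss(x, y, degree):
    """Residual sum of squares of a least-squares polynomial fit on one segment."""
    coef = np.polyfit(x, y, degree)
    resid = y - np.polyval(coef, x)
    return float(resid @ resid)

def find_single_change_point(x, y, degree=2, min_size=5):
    """Exhaustive search for one change point: split the sorted data at each
    admissible index k, fit a polynomial to each side separately, and keep
    the split with the smallest total residual sum of squares."""
    order = np.argsort(x)
    x, y = np.asarray(x, float)[order], np.asarray(y, float)[order]
    best_k, best_rss = None, np.inf
    for k in range(min_size, len(x) - min_size + 1):
        rss = segment_rss(x[:k], y[:k], degree) + segment_rss(x[k:], y[k:], degree)
        if rss < best_rss:
            best_k, best_rss = k, rss
    # Place the estimated change point midway between the neighbouring inputs
    return 0.5 * (x[best_k - 1] + x[best_k]), best_rss, best_k

# Hypothetical synthetic data with a jump of height 2 at x = 0.5
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = 0.5 * np.sin(2 * np.pi * x) + 2.0 * (x > 0.5) + rng.normal(0.0, 0.1, 100)

tau, rss, k = find_single_change_point(x, y)
print(f"estimated change point: {tau:.3f} (split after index {k - 1})")
```

Fitting each side independently allows the fitted curve to jump at the estimated change point. A single smooth fit would instead blur the jump over its neighbourhood.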

We also consider an application to data related to the Great East Japan Earthquake, which occurred on March 11, 2011.

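The following is the result of applying the method to radiation dose data recorded at 10-minute intervals at Tokai Village, Ibaraki Prefecture, on March 15, 2011.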

[Figure 1: left and right panels; horizontal axis x (0 to 140), vertical axis y (0 to 3500)]

Figure 1: Radiation dose data recorded at 10-minute intervals at Tokai Village, Ibaraki Prefecture (left), and the fitted result (right).

References

[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. 2nd Inter. Symp. Information Theory (eds. B. N. Petrov and F. Csaki), Akademiai Kiado, Budapest, 267-281. (Reproduced in Breakthroughs in Statistics, Vol. I, Foundations and Basic Theory (eds. S. Kotz and N. L. Johnson), Springer-Verlag, New York, (1992) 610-624.)

[2] Akaike, H. (1977). On entropy maximization principle. Applications of Statistics (ed. P. R. Krishnaiah), North-Holland, 27-41.

[3] Akaike, H. (1978). A Bayesian analysis of the minimum AIC procedure. Annals of the Institute of Statistical Mathematics. 30, 9-14.

[4] Akaike, H. (1979). A Bayesian extension of the minimum AIC procedure of autoregressive model fitting. Biometrika, 66, 237-242.

[5] Ando, T., Konishi, S. and Imoto, S. (2008). Nonlinear regression modeling via regularized radial basis function networks. Journal of Statistical Planning and Inference, 138, 3616-3633.

[6] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.

[7] Burnham, K. P. and Anderson, D. R. (2002). Model Selection and Multimodel Inference. A Practical Information-Theoretic Approach, 2nd ed., Springer.

[8] Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31, 377-403.

[9] Davison, A. C. (1986). Approximate predictive likelihood. Biometrika, 73, 323-332.

[10] de Boor, C. (2001). A Practical Guide to Splines. Springer.

[11] Denison, D. G. T., Holmes, C. C., Mallick, B. K. and Smith, A. F. M. (2002). Bayesian Methods for Nonlinear Classification and Regression. Wiley.


[12] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist., 32, 407-499.

[13] Friedman, J., Hastie, T. and Tibshirani, R. (2009). Regularization paths for generalized linear models via coordinate descent. Technical Report, Stanford University.

[14] Gijbels, I., Lambert, A. and Qiu, P. (2007). Jump-preserving regression and smoothing using local linear fitting: a compromise. Annals of the Institute of Statistical Mathematics, 59, 235-272.

[15] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman & Hall, London.

[16] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning (2nd edition). Springer-Verlag, New York.

[17] Imoto, S. and Konishi, S. (2003). Selection of smoothing parameter in B-spline nonparametric regression models using information criteria. Annals of the Institute of Statistical Mathematics, 55, 671-687.

[18] Kawano, S. and Konishi, S. (2007). Nonlinear regression modeling via regularized Gaussian basis functions. Bull. Inform. Cybern., 39, 83-96.

[19] Kitagawa, G. (1997). Information criteria for the predictive evaluation of Bayesian models. Communications in Statistics-Theory and Methods, 26, 2223-2246.

[20] Konishi, S., Ando, T. and Imoto, S. (2004). Bayesian information criteria and smoothing parameter selection in radial basis function networks. Biometrika, 91, 27-43.

[21] Konishi, S. and Kitagawa, G. (1996). Generalised information criteria in model selection. Biometrika, 83, 875-890.

[22] Konishi, S. and Kitagawa, G. (2008). Information Criteria and Statistical Modeling. Springer.

[23] Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22, 79-86.

[24] Loader, C. R. (1996). Change point estimation using nonparametric regression. Annals of Statistics, 24, 1667-1678.

[25] MacKay, D. J. C. (1994). Bayesian methods for backpropagation networks. In E. Domany, J. L. van Hemmen, and K. Schulten (Eds.), Models of Neural Networks III, Chapter 6, pp. 211-254. Springer.

[26] Muller, H. G. (1992). Change-points in nonparametric regression analysis. Annals of Statistics, 20, 737-761.

[27] Neal, R. M. (1996). Bayesian Learning for Neural Networks. Springer.

[28] Qiu, P. (2003). A jump-preserving curve fitting procedure based on local piecewise-linear kernel estimation. Journal of Nonparametric Statistics, 15, 437-453.

[29] Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464.

[30] Speckman, P. L. (1997). Detection of change-points in nonparametric regression. Unpublished manuscript.

[31] Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions (with discussion). J. Roy. Statist. Soc. Ser. B, 36, 111-147.

[32] Tateishi, S. and Konishi, S. (2011). Nonlinear regression modeling and detecting change points via the relevance vector machine. Computational Statistics, 26, 477-490.

[33] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B, 58, 267-288.

[34] Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1, 211-244.
