Introduction to Sequential Data assimilation methods:

(1)

Introduction to Sequential Data assimilation methods:

Their mathematical basis and recent development

Tomoyuki Higuchi

Research Organization of Information and Systems The Institute of Statistical Mathematics/JST CREST

1/42

Deductive Approach

Inductive Approach

TESD: Four Kinds of Methodology of Science

T:Theory E:Experiment

S:Simulation D 䠖 Massive Data Analysis

(Axle)

Data Assimilation

Drive Force for Science

2/42 ^{Red color}indicates a slide used in the last year’s presentation

(2)

3

What is Data Assimilation?

• Emerging subject in meteorology and oceanography.

• Methodology to synthesize numerical simulation model and observed data

– Simulation model can not reflect real physics accurately.

• 䠄e.g.䠅Accurate weather forecast needs good initial conditions.

• Uncertainty in the model (boundary condition, initial condition, unknown parameters, unknown dynamics...) exists.

– Observation data have some physical/budgetary restrictions.

Correct variables in numerical simulation model using observation data. = Data Assimilation

Simulation model Observation data

3/42

Objects of Data Assimilation from a viewpoint of Meteorology and Oceanography

[1] To produce the best (better) initial condition for forecasting. It is actually realized in the real weather forecast (ex., Japan Meteorological Agency).

[2] To find the best (better) boundary condition in constructing a simulation model. This procedure includes a setting of appropriate boundary conditions necessary for dealing with a coupled phenomena.

[3] To attain an optimal parametervector that appears in an empirical law (scheme) employed for describing complicated phenomena with the different time and spatial scales. A validationof the empirically given values is

regarded as this problem.

[4] To inter/extrapolate (estimate) an physical quantity at times and locations without observations based on a numerical simulation model. This procedure is called “a generation of re-analysis dataset (product)”. This dataset is used to discover a new scientific findings by general geophysical researchers.

[5] To conduct an experiment with a virtual observation network and perform a sensitivity analysis in an attempt to construct an effective observation

(3)

5

Outline

• Mathematical basis and Bayesian computation

• Sequential data assimilation

• Ensemble-based nonlinear filtering method

䞊Particle filter (PF)

• Advanced methods for PF

䞊Merging PF 䞊Meta PF

䞊PF with GPGPU

• Conclusions

5/42

Construction of Simulation Model

Observation points and observed variables are limited.

) , , ( _i _i _i

i T U V

[

i 1 i

physical variable vector is assigned at each grid point. temperature

Wind vector

1

[i

) , ( _t ₁ _t

t F x v

x

) ( _t₁

t F x

x

vt

w

w ₂ t cx x

䠄simplified meteorological model around Japan䠅

PDE : Partial differential equation

(time varying)

6/42

(4)

777

Past and

Present Present

and Future State

x _t

State Vector 䠖 Contact point between past and future

) ( _t ₁

t F x

x

Simulation Model

State of time t-1

State of timet

7/42

From one path to PDF (=Probability Distribution Function)

Simulation model

1 t t

t F x

x

1

| _t _t

t F x

x

Simulation model with uncertainty

x

t

x

t

_t _t₁

t F x

x

p G

xt

p

t t

t h x

y |

Relation to observed data _yt

_xt _yt

p |

Estimate

DA

^xt

xt yy

p |

1 t

t x

F

)

| (

_t _t

p x y

Conditional distribution

Next slide

(5)

999

Conditional and Joint Probabilities

9/42

䠝

^B

䠞

A

{ ) A (

p Probability of A

) { B A, (

p Probability of A and B

{ ) B

| A (

p Probability of B given A

Conditional Probability Joint Probability

Total: 100

# of consumers to buy a bag of coffee grounds : 30

# of consumers to buy milk: 60

# of consumers to buy a coffee bag and milk: 10

A=1: Buy a bag of coffee grounds

=0: Not buy B=1: Buy milk

=0: Not buy

A B

20 10 50

30 60

20 )

10 20 (

10 30

) 10 1 A

| 1 B ( 100, ) 10 1 B , 1 A ( 100, ) 30 1 A

( p p

p

¦¦

⁴

1 6

1

1 ) B , A (

i j

j i

p

10 10 10

Marginalization

10/42

¦

⁶

1

) B

, 3 A ( )

3 A (

j

j p

p p ( A i , B j )

Joint probability

A=3

B=1 B=6

A

A=1: Yahoo

=2: Google

=3: Microsoft

=4: Others

B=1: NEC, =2: Fujitsu, =3: Dell

=4: Toshiba, =5: Apple , =6: Others Others

B

possible

¦

B _'

) B B , A ( )

A A (

j j

j i

i p

p

PC Web Search Engine

³ ⁽ ^A, ^B ⁾ ^B

) A

( p d

p

(6)

11 11 11

Bayes’ Theorem

11/42

䠝

^B

䠞

A

B A

20 10 50

30 60

20

¦

) A ( A)

| (B

) A ( A)

| (B )

B A, (

) B A, ( )

B (

B) , ) (A

B

| A (

) A ( ) A

| B ( B) , ) (A

A (

B) , ) (A

A

| B (

A A

p p

p p p

p p

p p p p

䚷䚷䚷䚷䚷䚷

䛾䛸䜛ྍ⬟ᛶ

50 10

10 100

70 70 50 100

30 30

10 100 30 30 10

0) (A 0) A

| 1 (B 1) (A 1) A

| 1 (B

1) (A 1) A

| 1 (B ) 1 B

| 1 A (

p p

p

It is

easy

^to

calculate.

A=1: Buy a bag of coffee grounds (Search Engine type) B=1: Buy milk (PC type)

Generative Model, Inversion with Bayes’ theorem, and Data Assimilation

y x x y p( | )

Data distribution :Forward Posterior distribution:

Inverse

y x y x p( | ) x

x p( )

Prior distribution :Forward

Build a generative model and Use Bayes’ theorem Simulation Fitness of Simulation to Data

x

Latent Variables䠖

) , (

) ( )

| (

) (

) ( )

| ) (

| (

y x p

x p x y p

y p

x p x y y p

x p

v

{

䚷䚷䚷䚷

Bayes’ Theorem

䚷䚷䚷䚷

x䠖simulation variables

y䠖_data

Prior dist.

Data dist.䠄likelihood function䠅

Posterior dist.

(7)

) ,

(

) ,

( ₁

t t

t

t t

t

w x

h y

v x

f

x

Data Assimilation in Generalized State Space Model

State Vector䠄Simulation variables䠅

Stochastic simulation model

Observation model

Measurement model

map nonlinear :

L L

t t

G G

!!

' '

1 step time simulation :

ns observatio of

time sampling :

14/42

G t

' t

Missing value (Outlier)

time integration

%D\HVLDQ HVWLPDWLRQ

Prior Belief about values of x Posterior

Improved knowledge about values of x

13 Likelihood

Feasibility of realization of y for given x

) ( )

| ( )

|

( x y p y x p x

p v

13/42

Cyclical structure

(8)

x

1

x

₂

x

₃

) ( x

₁

x p

)

| ( X x

T

p

:

x

1

X {

x

t

) (

_t

x p x

x

T

State Vector and Concatenated State Vectors

15 15

Chain Structure Graphical Model

x

0

y

2

y

_t

x

2

x

_t

Latent Vector

Observable Vector

y

1

Observation model

x

1

System model

^ `

^

^x₀^,^y^x¹₁^,^,^,^,^y^x^t_t^,^,^,^,^y^x_T^T

`

)

| (

]) ,

, , [

| (

]) ,

, , [

| (

]) ,

, , [

| (

: 1 :

1

2 1 :

1

2 1 :

1

1 2

1 1

: 1

T T

t

t t

t

t t

t

p

y y

y p

y y

y p

y y

y p

y x

{

Today’s economic situation given

yesterday’s stock market data Today’s economic situation estimated by the stock market data up to today Today’s economic situation analyzed by

using all available data when we look back on the today in future

Suppose a statistical inference problem on a daily economic status given daily stock market data

15/42

(9)

Find X=[x₁, …, x_T ] that maximize

y

_T

p X |

₁_:

For t= 1, …,T :

estimate ^p

^x_t ^| ^y₁_:_T

x1 xt1 xt xt1 xT

y1 yt1 yt yt1 yT

x1 xt1 xt xt1 xT

y1 yt1 yt yt1 yT

Variational DA method Sequential DA method

•Adjoint method (4DVar)

•Representer method

•Kalman filter (KF), smoother

•Extended KF (EKF)

•Ensemble KF (EnKF)

•Particle filter (PF)

Two ways of DA method

Smoothing dist.

17/42

Optimization and Statistical Inference

x

t

t t t

t

t t

d p

p f

x x x

x

x x

³ ⁽ ⁾

ˆ

) ( )

(

^ ⁽ ⁾ `

ˆ max ) (

X X

X

f f

Dimension of Xis huge

^ `

^ ` ^

f T

`

T f

T

h f exp ( )/

/ ) ( exp

/ ) ( ) exp

( ș

ș

ș ș v

¦

18/42

(10)

Recursive formula

)

|

di ti d it (

Conditional Distribution

)

| (

)

|

(

_t _1t_: ₁

x

p x p

y y

predictive density:

filter density:

Today’s economic situation given yesterday’s stock market data Today’s economic situation

)

| (

)

| (

: 1

T t

t t

x p

y y

filter density:

smoother density:

estimated by the stock market data up to today

Today’s economic situation analyzed by using all available data when we

)

| (

_t _1T_:

p y

y

)

| ( x

_j ₁_:_k

p y j

y g

look back on the today in future

)

| ( x

_t ₁

y

₁_:_t ₁

p (

_t₁

| y

₁_:_t^prediction₁

) p ( x

_t

| y

_1t_: ₁

) ^y

¹^:^t

^{ ^{ ^y

¹

^, ^, ^y

^t

^} p

k _p ₍ _x

_t₁

_| _y

₁_:_t

₎

)

| (

_t

y

₁_:_t₁

p

)

| ( x

_t

y

_1t_:

p

filtering

)

| ( x

_n₁

y

₁_:_n₁

p

19 19

smoothing 19

)

| ( x

_t ₁

y

₁_:_T

p

)

| ( x

_t

y

_1T_:

p p ( x

_T

| y

_1T_:

)

19/42

Prediction

1 :

1

)

| ( x

_t

y

_t

p

1 1

: 1 1 1

: 1

)

| ,

(

|

³

^t ^t ^t ^t

t t

dx y

x x p y

1 1

: 1 1 1

: 1

1

, ) ( | )

|

(

³ ^p ^x

^t

^x

^t

^y

^t

^p ^x

^t

^y

^t

^dx

^t

³

)

| ( ) ,

|

(x_t x_t₁ y₁_:_t₁ p x_t x_t₁

p Markov property䠄䠍䠅

1 1

: 1 1

1

) ( | )

|

(

³ ^p ^x

^t

^x

^t

^p ^x

^t

^y

^t

^dx

^t

Filter pdf at time t-1

(11)

filtering

t

y

x

p ( |

₁_:

) Posterior, Belief

t t t

y y x p

)

| , (

) ,

| (

1 : 1

t t

t t t

y y p

)

| ( )

| (

)

| (

)

| , (

1 : 1

Markov Property䠄䠎䠅

t t

t t t

t t

y y p

y x p y

x y p

)

| (

)

| ( ) ,

| (

1 : 1

1 : 1 1

:

1 p(y_t |x_t,y₁_:_t₁) p(y_t |x_t)

Markov Property䠄䠎䠅

_t _t t

t

y y p

y x p x y p

)

| (

)

| ( )

|

(

₁_: ₁

_t _t

t t

y x p x y p

y y p

)

| ( )

| (

)

| (

1 : 1 1

: 1

³ ^p ⁽ ^y

^t

^| ^x

^t

⁾ ^p ⁽ ^x

^t

^| ^y

¹^:^t¹

⁾ ^dx

^t 21 19-2/42

Smoothing

: 1

)

| (

)

| (

³

T t

dx y

x x p y x p

1 :

1 1 :

1 1

1 :

1 1

)

| ( ) ,

| (

)

| , (

³

_t _T _t

T t

t

t T t

t

dx y

x p y

x x p

dx y

x x p

1 :

1 1 :

1 1

1 :

1 1 :

1 1

)

| ( ) ,

| (

)

| ( ) ,

| (

³

_t _T _t

t t

t

t T t

T t

t

dx y

x p y

x x p

y p

1 :

1 1 :

1

( | )

)

| (

)

| , (

³ ^p _p ^x

^t

_x ^x

^t

_y ^y

^t

^p ^x

^t

^y

^T

^dx

^t

1 :

1 : 1

1 1

: 1

: 1 1

)

| ) (

,

| ( )

| (

)

| (

³

^t ^t

^t ^t ^t

^t ^T ^t

t t

dx y

x y p

x x p y

x p

y x p

1

1 :

1 1 :

1 1

)

| ) (

| (

)

| ) (

| (

³

t t

t T t

t t

x d x p

y y p

x p

Filter Dist. Smoothing Dist.

22 1

: 1 1 :

1 1 : 1

1

( | )

)

| (

)

| ) (

|

(

³

_t _T _t

t t

t t t

t

p x y dx

y x p y p

x p

Prediction Dist.

19-3/42

(12)

Sequential Data Assimilation

)

|

( x

_t₁

y

₁_:_t₁

p p ( x

_t

| y

₁_:_t₁

) y

t Simulation

)

| ( x

_t

y

₁_:_t

p p ( x

_t₁

| y

₁_:_t

)

Simulation

) , , ,

| ( )

|

(x_i y₁_:_k p x_i y₁ y₂ y_k

p

䠄䠅

Estimate PDF of state vector or its moments (mean, variance, …) sequentially on each observation

1

y

t

xt

1

y

t

20/42

– : All variables in simulation model – : All observed variables

– : Stochastic part to represent uncertainty of model (boundary condition, …)

– : Observation error

– : Normally Gaussian

&KDOOHQJLQJSUREOHP +XJHGLPHQVLRQDQGLQYHUVLRQ

t t t t

w x H y

x v x F x

, )

(

₁ ₀

y^tt

x

t

t w

v ,

• Data Assimilation = Estimation problem of state vector 䠖

vt

wt (system model)

(observation model)

dimension

x

_t:

10 ⁴ ~10 ⁶ ^y

^t ^:10²^~10⁵

x

t

t t

t

h x w

y

or

) dim(

)

dim( x

_t

!! y

_t

x⁰: Initial condition

(13)

25 25 25

Numerical representation of distribution

True distribution

)

| ( ),

|

( x

_t

y

₁_:_t ₁

p x

_t

y

₁_:_t

p x

_t

y

₁_:_T

p

Monte Carlo approximation

Represent pdf by the actual realizations.

N: # of particles

> @

>

⁽| ⁾

@

) 2 (

| ) 1 (

|

| :

1

) (

1

| )

2 (

1

| ) 1 (

1

| 1

: 1

, , ,

)

| (

, , ,

)

| (

N t t t

t t t t

t t

t

N t t t

t t

t

x x

x X

y x p

x x

x X

y x p

{

#

{

#

22/42

^ ` x

t⁽_|ⁱt⁾₁ _i^N₁

^ x

t⁽ⁱ⁾₁_|t₁

`

_i^N₁

) , ( _t⁽ⁱ⁾₁_|_t ₁ _t⁽ⁱ⁾

t x v

f

x

t

^Time

State

Prediction Step (Common in EnKF and PF)

) 1 (

1

| 1 t

xt ) 2 (

1

| 1

t

xt

) (

1

| 1 N t

xt

) 2 (

1

|t

xt

) 1 (

1

|t

xt ) (

1

| N t

xt ) (

1

| i t

xt

Prediction step

: ensemble member of predictive PDF : ensemble member of filtered PDF simulation

1 t

23/42

(14)

^ ` x

t⁽_|ⁱt⁾₁ _i^N₁

x

t

^Time

State

Filtering Step of PF

) 2 (

1

|t

xt

) 1 (

1

|t

xt ) (

1

| N t

xt

^ ` x

t⁽_|ⁱt⁾ _i^N₁

Observation:

y

t

Filtered by a resample proportional to likelihood Calculate

likelihood for each particle likelihood

Filtering step

¸¸

¸

¹

·

¨¨

¨

©

§

¦

j

j t t t

i t t i t

t

t p y x

x y p

)

| (

)

| (

) (

1

| ) (

1 ) |

(

Z|

: ensemble member of predictive PDF : ensemble member of filtered PDF

) 2 (

|t

Zt

) 1 (

|t

Zt ) (

| N t

Zt

24/42

㽢㽢㽢㽢

1

| 1 1

: 1

1| )

( _t _t #X_t_t p x y

1

| 1 :

1 )

|

t t t

t X

p(x |y₁_:)# _| timet-1

timet

y

t

) (i

vt ^{䝅䝇䝔䝮䝜䜲䝈䝪䞊䝹}

() Ft

) (

1

| i t t

Z

N

i t t

) 1

(

Z|

25/42

(15)

Family of particle filters

• Kalman filter (EKF: Extended Kalman filter)

• EnKF: Ensemble Kalman filter (Evensen 1994)

• Particle filter

– SIR filter (e.g., Gordon et al. 1993, Kitagawa 1993)䖩 – Gaussian particle filter (e.g., Kotecha and Djuric 2003) – Kernel filter (e.g., Hurzeler and Kunsh 1998)

– Merging particle filter (MPF)

䖩It encounters a problem called ‘degeneration’ in applying to high-dimensional models. (i.e., the diversity of an ensemble is lost after repeating resampling procedures. )y p g p g p )

ͤGaussian PF: 1) Each particle in filtered ensemble is drawn from a Gaussian function with the mean and covariance of the forecast ensemble. 2) It requires high computational cost due to a factorization of a high-dimensional covariance matrix in generating Gaussian samples.

29

g g g p

26/42

Particle filter

(SIR: Sequential Importance Resampling)

A posterior (filtered) ensemble is obtained by resampling the forecast ensemble with weights of likelihood. Thus, an ensemble member is duplicated in the filtered ensemble according to its likelihood.

27/42

(16)

31

Merging particle filter (MPF)

28/42

We draw samples from the forecast ensemble with weights of , and obtain an ensemble: . A subset from this samples satisfies

because it is a filtered ensemble obtained using normal PF.

Each number of a filtered ensemble is generated as a weighted sum of n samples from the sample set as:

n N u w

i

n N u

MPF algorithm (1)

(17)

MPF algorithm (2)

In order to ensure that the newly generated ensemble

approximately preserves the mean and covariances of the filtered PDF the merging weights

D

are set to satisfy

PDF, the merging weights are set to satisfy

1

2

1 ( 3 such that 0 for all )

n n

n j

D D d D z

¦ ¦

D

j

h h i l b

1 1

1 , 1 ( 3 such tha t 0 for all )

j j

j j j

n j

D D d D z

¦ ¦

where each is a real number.

Then, a new ensemble approximation of the filtered PDF

D

j

|

1:

(

_t _t

) p x y

is obtained as

30/42

)ORZFKDUWRI3)

A concept of the “Islands” in GA is similar, but different.

31/42

(18)

0HWDSDUWLFOHILOWHU

'3)'LVWULEXWHGSDUWLFOHILOWHU

Inter-node resampling:

Resampling between ensembles (not ensemble member) each of which consists of many particles in a node. This ensemble is called

“super-particle”.

When a weight for any super- particles (i.e., nodes) exceeds 0.3, the inter-node resampling procedure is applied.

32/42

] [

1

| j t

t

:

] [

1

| j t

t

: ]

)[

( 1

| j i t

t

Z

5HVXOW

FRPSXWDWLRQDOWLPH

1 core

Each core has 4,098particles.

2 cores 4 cores 64 cores

Each core has 256 particles.

16 times

(19)

Inside GPU: ¹²⁸ ^䡚 240 parallel computing

GPGPU’s power is equivalent to a computer with 128 1.2GHz processors

3(^memory 3( ^memory

3( ^memory 3( ^memory

3( ^memory 3(^memory

3(^memory 3(^memory

4Kb=1,000 variables

Shared

memory 4,000ኚᩘ

3( 3URFHVVLQJ(OHPHQW On tip memory

Device memory 䠐GB

# of multiprocessor䠍䠒~䠏䠌

$FFHVVVSHHG

register

[ 3(

0XOWLSURFHVVRU

34/42

㽢㽢㽢㽢

1

| 1 1

: 1

1| )

1

| 1 :

1 )

|

t t t

t X

p(x |y₁_:)# _| timet-1

timet

timet+1

y

t

) (i

vt ^{䝅䝇䝔䝮䝜䜲䝈䝪䞊䝹}

() Ft

) (

1

| i t t

Z

N

i t t

) 1

(

Z|

෌ᥖ/42

(20)

㻻㼜㼠㼑㼞㼛㼚㻗㻌㼀㼑㼟㼘㼍㻌㻗㻌㻯㼘㼑㼍㼞㻿㼜㼑㼑㼐㻬㻴㼕㼓㼡㼏㼔㼕㻌㻸㼍㼎㻚

Host machine (CPU)

HP xw9400 Workstation

Dual-Core AMD Opteron(tm) Processor 2220

Tesla (GPU/Cuda)

nVIDIA S1070 1U GPU computing server (1.6GiB/1.6GHz/410GB/s)

ClearSpeed e 710)

35/42

SISmeetsGPGPU.

• SISonGPGPUdesignedforparameterestimation.

– SimulationiscarriedoutonGPGPU.

– ParameterestimationiscarriedoutonCPU.

particles PF

Opteron 2220 1core

PF

Opteron2220 1core +GPGPU(Tesla C870)

SIS

Opteron2220 1core +GPGPU(Tesla C870) 100,000,000

(1൨)

8Days (6.8㽢10⁵sec)

12Hours (4.5㽢10⁴sec)

3Hours (1.0㽢10⁴sec)

䠍㽢䠍䠑㽢䠒䠓

1,000,000,000 (10൨)

2.6Months? 5Days? 28Hours

(1.0㽢10⁵sec) (Nakamura et al., 2009)

(21)

HPC of our group

1HKDOHP

;HRQ

# of Cores

²⁰⁰⁰

200

37/42

# of Cores

Next-Generation of Supercomputer in Japan at Kobe

䕔 Grand Challenge:

-- Nanotech

(Institute for Molecular Science)

-- Life Science

^(RIKEN)

Japanese Government will spend more than 1 billion US$ for this national project. It has more than 600,000cores.

38/42

(22)

43

Research projects in progress by our group

࣭ Coupled Ocean-Atmosphere model

䞉 3D structure of ring current

࣭ Tsunami model

࣭ Ocean tide

࣭ Genome informatics

࣭ Marketing (with agent simulations)

39/42

TIPS:Achoiceofthedataassimilationmethods

Reductionofdegreeoffreedom

Degree of Freedom of the system

Information provided by data at each time- integration step of the simulation.

Nonlinearity

Small Large

Genome Informatics

Oceanandatmospheric science

(23)

Contact

Email: [email protected] Homepage:

http://daweb ism ac jp/

http://daweb.ism.ac.jp/

42/42

Introduction to Sequential Data assimilation methods: