Introduction to Sequential Data assimilation methods:
Their mathematical basis and recent development
Tomoyuki Higuchi
Research Organization of Information and Systems The Institute of Statistical Mathematics/JST CREST
1/42
Deductive Approach
Inductive Approach
TESD: Four Kinds of Methodology of Science
T:Theory E:Experiment
S:Simulation D 䠖 Massive Data Analysis
(Axle)
Data Assimilation
Drive Force for Science
2/42 Red colorindicates a slide used in the last year’s presentation
3
What is Data Assimilation?
• Emerging subject in meteorology and oceanography.
• Methodology to synthesize numerical simulation model and observed data
– Simulation model can not reflect real physics accurately.
• 䠄e.g.䠅Accurate weather forecast needs good initial conditions.
• Uncertainty in the model (boundary condition, initial condition, unknown parameters, unknown dynamics...) exists.
– Observation data have some physical/budgetary restrictions.
Correct variables in numerical simulation model using observation data. = Data Assimilation
Simulation model Observation data
3/42
Objects of Data Assimilation from a viewpoint of Meteorology and Oceanography
[1] To produce the best (better) initial condition for forecasting. It is actually realized in the real weather forecast (ex., Japan Meteorological Agency).
[2] To find the best (better) boundary condition in constructing a simulation model. This procedure includes a setting of appropriate boundary conditions necessary for dealing with a coupled phenomena.
[3] To attain an optimal parametervector that appears in an empirical law (scheme) employed for describing complicated phenomena with the different time and spatial scales. A validationof the empirically given values is
regarded as this problem.
[4] To inter/extrapolate (estimate) an physical quantity at times and locations without observations based on a numerical simulation model. This procedure is called “a generation of re-analysis dataset (product)”. This dataset is used to discover a new scientific findings by general geophysical researchers.
[5] To conduct an experiment with a virtual observation network and perform a sensitivity analysis in an attempt to construct an effective observation
5
Outline
• Mathematical basis and Bayesian computation
• Sequential data assimilation
• Ensemble-based nonlinear filtering method
䞊Particle filter (PF)
• Advanced methods for PF
䞊Merging PF 䞊Meta PF
䞊PF with GPGPU
• Conclusions
5/42
Construction of Simulation Model
Observation points and observed variables are limited.
) , , ( i i i
i T U V
[
i 1 i
physical variable vector is assigned at each grid point. temperature
Wind vector
1
[i
[i
) , ( t 1 t
t F x v
x
) ( t1
t F x
x
vt
w
w 2 t cx x
䠄simplified meteorological model around Japan䠅
PDE : Partial differential equation
(time varying)
6/42
777
Past and
Present Present
and Future State
x t
State Vector 䠖 Contact point between past and future
) ( t 1
t F x
x
Simulation Model
State of time t-1
State of timet
7/42
From one path to PDF (=Probability Distribution Function)
Simulation model
1 t t
t F x
x
1
| t t
t F x
x
Simulation model with uncertainty
x
tx
t t t1t F x
x
p G
xt
p
t t
t h x
y |
Relation to observed data yt
xt yt
p |
Estimate
DA
xtxt yy
p |
1 t
t x
F
)
| (
t tp x y
Conditional distribution
Next slide
999
Conditional and Joint Probabilities
9/42
䠝
B䠞
A
{ ) A (
p Probability of A
) { B A, (
p Probability of A and B
{ ) B
| A (
p Probability of B given A
Conditional Probability Joint ProbabilityTotal: 100
# of consumers to buy a bag of coffee grounds : 30
# of consumers to buy milk: 60
# of consumers to buy a coffee bag and milk: 10
A=1: Buy a bag of coffee grounds
=0: Not buy B=1: Buy milk
=0: Not buy
A B
20 10 50
30 60
20 )
10 20 (
10 30
) 10 1 A
| 1 B ( 100, ) 10 1 B , 1 A ( 100, ) 30 1 A
( p p
p
¦¦
41 6
1
1 ) B , A (
i j
j i
p
10 10 10
Marginalization
10/42
¦
61
) B
, 3 A ( )
3 A (
j
j p
p p ( A i , B j )
Joint probability
A=3
B=1 B=6
A
A=1: Yahoo
=2: Google
=3: Microsoft
=4: Others
B=1: NEC, =2: Fujitsu, =3: Dell
=4: Toshiba, =5: Apple , =6: Others Others
B
possible
¦
B ') B B , A ( )
A A (
j j
j i
i p
p
PC Web Search Engine
³ ( A, B ) B
) A
( p d
p
11 11 11
Bayes’ Theorem
11/42
䠝
B䠞
A
B A
20 10 50
30 60
20
¦
¦
) A ( A)
| (B
) A ( A)
| (B )
B A, (
) B A, ( )
B (
B) , ) (A
B
| A (
) A ( ) A
| B ( B) , ) (A
A (
B) , ) (A
A
| B (
A A
p p
p p
p p p
p p
p p
p p p p
䚷䚷 䚷䚷䚷䚷
䛾䛸䜛ྍ⬟ᛶ
50 10
10 100
70 70 50 100
30 30
10 100 30 30 10
0) (A 0) A
| 1 (B 1) (A 1) A
| 1 (B
1) (A 1) A
| 1 (B ) 1 B
| 1 A (
p p
p p
p p
p
It is
easy
tocalculate.
A=1: Buy a bag of coffee grounds (Search Engine type) B=1: Buy milk (PC type)
Generative Model, Inversion with Bayes’ theorem, and Data Assimilation
y x x y p( | )
Data distribution :Forward Posterior distribution:
Inverse
y x y x p( | ) x
x p( )
Prior distribution :Forward
Build a generative model and Use Bayes’ theorem Simulation Fitness of Simulation to Data
x
Latent Variables䠖
) , (
) ( )
| (
) (
) ( )
| ) (
| (
y x p
x p x y p
y p
x p x y y p
x p
v
{
䚷䚷䚷䚷
Bayes’ Theorem
䚷䚷䚷䚷
x䠖simulation variables
y䠖data
Prior dist.
Data dist.䠄likelihood function䠅
Posterior dist.
) ,
(
) ,
( 1
t t
t
t t
t
w x
h y
v x
f
x
Data Assimilation in Generalized State Space Model
State Vector䠄Simulation variables䠅
Stochastic simulation model
Observation model
Measurement model
map nonlinear :
L L
t t
t t
G G
!!
' '
1 step time simulation :
ns observatio of
time sampling :
14/42
G t
' t
Missing value (Outlier)time integration
%D\HVLDQ HVWLPDWLRQ
Prior Belief about values of x Posterior
Improved knowledge about values of x
13 Likelihood
Feasibility of realization of y for given x
) ( )
| ( )
|
( x y p y x p x
p v
13/42
Cyclical structure
x
1x
2x
3) ( x
1x p
)
| ( X x
T
p
:
x
1X {
x
t) (
tx p x
x
TState Vector and Concatenated State Vectors
15 15
Chain Structure Graphical Model
x
0y
2y
tx
2x
tLatent Vector
Observable Vector
y
1Observation model
x
1System model
^ `
^
x0,yx11,,,,yxtt,,,,yxTT`
)
| (
]) ,
, , [
| (
]) ,
, , [
| (
]) ,
, , [
| (
: 1 :
1
2 1 :
1
2 1 :
1
1 2
1 1
: 1
T T
T T
t
t t
t
t t
t
p
y y
y p
y y
y p
y y
y p
y x
y x
y x
y x
{
{
{
Today’s economic situation given
yesterday’s stock market data Today’s economic situation estimated by the stock market data up to today Today’s economic situation analyzed by
using all available data when we look back on the today in future
Suppose a statistical inference problem on a daily economic status given daily stock market data
15/42
Find X=[x1, …, xT ] that maximize
y
Tp X |
1:For t= 1, …,T :
estimate p
xt | y1:Tx1 xt1 xt xt1 xT
y1 yt1 yt yt1 yT
x1 xt1 xt xt1 xT
y1 yt1 yt yt1 yT
Variational DA method Sequential DA method
•Adjoint method (4DVar)
•Representer method
•Kalman filter (KF), smoother
•Extended KF (EKF)
•Ensemble KF (EnKF)
•Particle filter (PF)
Two ways of DA method
Smoothing dist.
17/42
Optimization and Statistical Inference
x
tt t t
t
t t
d p
p f
x x x
x
x x
³ ( )
ˆ
) ( )
(
^ ( ) `
ˆ max ) (
X X
X
f f
Dimension of Xis huge
^ `
^ ` ^
f T`
T f
T
h f exp ( )/
/ ) ( exp
/ ) ( ) exp
( ș
ș
ș ș v
¦
18/42
Recursive formula
)
|
di ti d it (
Conditional Distribution
)
| (
)
|
(
t 1t: 1x
p x p
y y
predictive density:
filter density:
Today’s economic situation given yesterday’s stock market data Today’s economic situation
)
| (
)
| (
: 1
: 1
T t
t t
x p
x p
y y
filter density:
smoother density:
estimated by the stock market data up to today
Today’s economic situation analyzed by using all available data when we
)
| (
t 1T:p y
y
)
| ( x
j 1:kp y j
y g
look back on the today in future
)
| ( x
t 1y
1:t 1p (
t1| y
1:tprediction1) p ( x
t| y
1t: 1) y
1:t{ { y
1, , y
t} p
k p ( x
t1| y
1:t)
)
| (
ty
1:t1p
)
| ( x
ty
1t:p
filtering
)
| ( x
n1y
1:n1p
19 19
smoothing 19
)
| ( x
t 1y
1:Tp
)
| ( x
ty
1T:p p ( x
T| y
1T:)
19/42
Prediction
1 :
1
)
| ( x
ty
tp
1 1
: 1 1 1
: 1
)
| ,
(
|
³
t t t tt t
dx y
x x p y
1 1
: 1 1 1
: 1
1
, ) ( | )
|
(
³ p x
tx
ty
tp x
ty
tdx
t³
)
| ( ) ,
|
(xt xt1 y1:t1 p xt xt1
p Markov property䠄䠍䠅
1 1
: 1 1
1
) ( | )
|
(
³ p x
tx
tp x
ty
tdx
tFilter pdf at time t-1
filtering
t
t
y
x
p ( |
1:) Posterior, Belief
t t t
t t t
y y x p
y y x p
)
| , (
) ,
| (
1 : 1
1 : 1
t t
t t t
y y p
y y p
)
| ( )
| (
)
| (
)
| , (
1 : 1
1 : 1
Markov Property䠄䠎䠅
t t
t t t
t t
y y p
y x p y
x y p
)
| (
)
| ( ) ,
| (
1 : 1
1 : 1 1
:
1 p(yt |xt,y1:t1) p(yt |xt)
Markov Property䠄䠎䠅
t t tt
y y p
y x p x y p
)
| (
)
| ( )
|
(
1: 1
t tt t
t t
y x p x y p
y y p
)
| ( )
| (
)
| (
1 : 1 1
: 1
³ p ( y
t| x
t) p ( x
t| y
1:t1) dx
t 21 19-2/42Smoothing
: 1
)
| (
)
| (
³
T t
dx y
x x p y x p
1 :
1 1 :
1 1
1 :
1 1
)
| ( ) ,
| (
)
| , (
³
³
t T tT t
t
t T t
t
dx y
x p y
x x p
dx y
x x p
1 :
1 1 :
1 1
1 :
1 1 :
1 1
)
| ( ) ,
| (
)
| ( ) ,
| (
³
³
t T tt t
t
t T t
T t
t
dx y
x p y
x x p
y p
y p
1 :
1 1 :
1
1
( | )
)
| (
)
| , (
³ p p x
tx x
ty y
t p x
ty
Tdx
t1 :
1 : 1
1 1
: 1
: 1 1
)
| ) (
,
| ( )
| (
)
| (
³
t t
t t t
t T tt t
dx y
x y p
x x p y
x p
y x p
1
1 :
1 1 :
1 1
)
| ) (
| ) (
| (
)
| ) (
| (
³
³
t t
t T t
t t
x d x p
y y p
x p
Filter Dist. Smoothing Dist.
22 1
: 1 1 :
1 1 : 1
1
( | )
)
| (
)
| ) (
|
(
³
t T tt t
t t t
t
p x y dx
y x p y p
x p
Prediction Dist.
19-3/42
Sequential Data Assimilation
)
|
( x
t1y
1:t1p p ( x
t| y
1:t1) y
t Simulation)
| ( x
ty
1:tp p ( x
t1| y
1:t)
Simulation
) , , ,
| ( )
|
(xi y1:k p xi y1 y2 yk
p
䠄 䠅
Estimate PDF of state vector or its moments (mean, variance, …) sequentially on each observation
1
y
txt
1
y
t20/42
– : All variables in simulation model – : All observed variables
– : Stochastic part to represent uncertainty of model (boundary condition, …)
– : Observation error
– : Normally Gaussian
&KDOOHQJLQJSUREOHP +XJHGLPHQVLRQDQGLQYHUVLRQ
t t t t
t t t t
w x H y
x v x F x
, )
(
1 0ytt
x
t
t w
v ,
• Data Assimilation = Estimation problem of state vector 䠖
vt
wt (system model)
(observation model)
dimension
x
t:10 4 ~10 6 y
t :102~105x
tt t
t
t
h x w
y
or
) dim(
)
dim( x
t!! y
tx0: Initial condition
25 25 25
Numerical representation of distribution
True distribution
)
| ( ),
| ( ),
|
( x
ty
1:t 1p x
ty
1:tp x
ty
1:Tp
Monte Carlo approximation
Represent pdf by the actual realizations.
N: # of particles
> @
>
(| )@
) 2 (
| ) 1 (
|
| :
1
) (
1
| )
2 (
1
| ) 1 (
1
| 1
| 1
: 1
, , ,
)
| (
, , ,
)
| (
N t t t
t t t t
t t
t
N t t t
t t
t t
t t
t
x x
x X
y x p
x x
x X
y x p
{
#
{
#
22/42
^ ` x
t(|it)1 iN1^ x
t(i)1|t1`
iN1) , ( t(i)1|t 1 t(i)
t x v
f
x
t
TimeState
Prediction Step (Common in EnKF and PF)
) 1 (
1
| 1 t
xt ) 2 (
1
| 1
t
xt
) (
1
| 1 N t
xt
) 2 (
1
|t
xt
) 1 (
1
|t
xt ) (
1
| N t
xt ) (
1
| i t
xt
Prediction step
: ensemble member of predictive PDF : ensemble member of filtered PDF simulation
1 t
23/42
^ ` x
t(|it)1 iN1x
t
TimeState
Filtering Step of PF
) 2 (
1
|t
xt
) 1 (
1
|t
xt ) (
1
| N t
xt
^ ` x
t(|it) iN1Observation:
y
tFiltered by a resample proportional to likelihood Calculate
likelihood for each particle likelihood
Filtering step
¸¸
¸
¹
·
¨¨
¨
©
§
¦
j
j t t t
i t t i t
t
t p y x
x y p
)
| (
)
| (
) (
1
| ) (
1 ) |
(
Z|
: ensemble member of predictive PDF : ensemble member of filtered PDF
) 2 (
|t
Zt
) 1 (
|t
Zt ) (
| N t
Zt
24/42
㽢 㽢 㽢 㽢
1
| 1 1
: 1
1| )
( t t #Xtt p x y
1
| 1 :
1 )
|
( t t #Xtt p x y
t t t
t X
p(x |y1:)# | timet-1
timet
y
t) (i
vt 䝅䝇䝔䝮䝜䜲䝈䝪䞊䝹
() Ft
) (
1
| i t t
Z
N
i t t
) 1
(
Z|
25/42
Family of particle filters
• Kalman filter (EKF: Extended Kalman filter)
• EnKF: Ensemble Kalman filter (Evensen 1994)
• Particle filter
– SIR filter (e.g., Gordon et al. 1993, Kitagawa 1993)䖩 – Gaussian particle filter (e.g., Kotecha and Djuric 2003) – Kernel filter (e.g., Hurzeler and Kunsh 1998)
– Merging particle filter (MPF)
䖩It encounters a problem called ‘degeneration’ in applying to high-dimensional models. (i.e., the diversity of an ensemble is lost after repeating resampling procedures. )y p g p g p )
ͤGaussian PF: 1) Each particle in filtered ensemble is drawn from a Gaussian function with the mean and covariance of the forecast ensemble. 2) It requires high computational cost due to a factorization of a high-dimensional covariance matrix in generating Gaussian samples.
29
g g g p
26/42
Particle filter
(SIR: Sequential Importance Resampling)
A posterior (filtered) ensemble is obtained by resampling the forecast ensemble with weights of likelihood. Thus, an ensemble member is duplicated in the filtered ensemble according to its likelihood.
27/42
31
Merging particle filter (MPF)
28/42
We draw samples from the forecast ensemble with weights of , and obtain an ensemble: . A subset from this samples satisfies
because it is a filtered ensemble obtained using normal PF.
Each number of a filtered ensemble is generated as a weighted sum of n samples from the sample set as:
n N u w
in N u
n N u
MPF algorithm (1)
MPF algorithm (2)
In order to ensure that the newly generated ensemble
approximately preserves the mean and covariances of the filtered PDF the merging weights
D
are set to satisfyPDF, the merging weights are set to satisfy
1
21 ( 3 such that 0 for all )
n n
n j
D D d D z
¦ ¦
D
jh h i l b
1 1
1 , 1 ( 3 such tha t 0 for all )
j j
j j j
n j
D D d D z
¦ ¦
where each is a real number.
Then, a new ensemble approximation of the filtered PDF
D
j|
1:(
t t) p x y
is obtained as
30/42
)ORZFKDUWRI3)
A concept of the “Islands” in GA is similar, but different.
31/42
0HWDSDUWLFOHILOWHU
'3)'LVWULEXWHGSDUWLFOHILOWHU
Inter-node resampling:
Resampling between ensembles (not ensemble member) each of which consists of many particles in a node. This ensemble is called
“super-particle”.
When a weight for any super- particles (i.e., nodes) exceeds 0.3, the inter-node resampling procedure is applied.
32/42
] [
1
| j t
t
:
] [
1
| j t
t
: ]
)[
( 1
| j i t
t
Z
5HVXOW
FRPSXWDWLRQDOWLPH
1 core
Each core has 4,098particles.
2 cores 4 cores 64 cores
Each core has 256 particles.
16 times
Inside GPU: 128 䡚 240 parallel computing
GPGPU’s power is equivalent to a computer with 128 1.2GHz processors
3(memory 3( memory
3( memory 3( memory
3( memory 3(memory
3(memory 3(memory
4Kb=1,000 variables
Shared
memory 4,000ኚᩘ
3( 3URFHVVLQJ(OHPHQW On tip memory
Device memory 䠐GB
# of multiprocessor䠍䠒~䠏䠌
$FFHVVVSHHG
register
[ 3(
0XOWLSURFHVVRU
34/42
㽢 㽢 㽢 㽢
1
| 1 1
: 1
1| )
( t t #Xtt p x y
1
| 1 :
1 )
|
( t t #Xtt p x y
t t t
t X
p(x |y1:)# | timet-1
timet
timet+1
y
t) (i
vt 䝅䝇䝔䝮䝜䜲䝈䝪䞊䝹
() Ft
) (
1
| i t t
Z
N
i t t
) 1
(
Z|
ᥖ/42
㻻㼜㼠㼑㼞㼛㼚 㻗㻌㼀㼑㼟㼘㼍㻌㻗㻌㻯㼘㼑㼍㼞㻿㼜㼑㼑㼐㻬 㻴㼕㼓㼡㼏㼔㼕㻌㻸㼍㼎㻚
Host machine (CPU)
HP xw9400 Workstation
Dual-Core AMD Opteron(tm) Processor 2220
Tesla (GPU/Cuda)
nVIDIA S1070 1U GPU computing server (1.6GiB/1.6GHz/410GB/s)
ClearSpeed e 710)
35/42
SISmeetsGPGPU.
• SISonGPGPUdesignedforparameterestimation.
– SimulationiscarriedoutonGPGPU.
– ParameterestimationiscarriedoutonCPU.
particles PF
Opteron 2220 1core
PF
Opteron2220 1core +GPGPU(Tesla C870)
SIS
Opteron2220 1core +GPGPU(Tesla C870) 100,000,000
(1൨)
8Days (6.8㽢105sec)
12Hours (4.5㽢104sec)
3Hours (1.0㽢104sec)
䠍 㽢䠍䠑 㽢䠒䠓
1,000,000,000 (10൨)
2.6Months? 5Days? 28Hours
(1.0㽢105sec) (Nakamura et al., 2009)
HPC of our group
1HKDOHP
;HRQ
# of Cores
2000200
37/42
# of Cores
Next-Generation of Supercomputer in Japan at Kobe
䕔 Grand Challenge:
-- Nanotech
(Institute for Molecular Science)-- Life Science
(RIKEN)Japanese Government will spend more than 1 billion US$ for this national project. It has more than 600,000cores.
38/42
43
Research projects in progress by our group
࣭ Coupled Ocean-Atmosphere model
䞉 3D structure of ring current
࣭ Tsunami model
࣭ Ocean tide
࣭ Genome informatics
࣭ Marketing (with agent simulations)
39/42
TIPS:Achoiceofthedataassimilationmethods
Reductionofdegreeoffreedom
Degree of Freedom of the system
Information provided by data at each time- integration step of the simulation.
Nonlinearity
Small Large
Genome Informatics
Oceanandatmospheric science