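Proposal of a Tucker manifold geometry based on Riemannian metric tuning and its application to optimization problems (The State of the Art in Optimization Techniques and Future Developments)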


Hiroyuki Kasai
Department of Computer and Network Engineering, Graduate School of Informatics and Engineering, The University of Electro-Communications

Bamdev Mishra
Amazon Development Centre India

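Abstract

This article summarizes the paper [1], which proposes a new geometric space for the low-rank Tucker decomposition of tensors, the Scaled Tucker Manifold, together with an efficient method for the tensor completion problem built on it. More generally, the proposed approach makes it possible to establish efficient solution methods for general tensor regression problems via the Scaled Tucker Manifold. To derive the Scaled Tucker Manifold, we propose a new Riemannian metric that exploits the symmetry structure of the Tucker decomposition and the least-squares structure of the regression problem, and we derive the various components that define the geometric space.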

1 Introduction

This paper addresses the problem of low-rank tensor completion when the rank is a priori known or estimated. Without loss of generality, we focus on 3-order tensors. Given a tensor $\mathcal{X}^{\star}\in\mathbb{R}^{n_1\times n_2\times n_3}$ whose entries $\mathcal{X}^{\star}_{i_1,i_2,i_3}$ are only known for some indices $(i_1,i_2,i_3)\in\Omega$, where $\Omega$ is a subset of the complete set of indices $\{(i_1,i_2,i_3):i_d\in\{1,\ldots,n_d\},\ d\in\{1,2,3\}\}$, the fixed-rank tensor completion problem is formulated as

$$\min_{\mathcal{X}\in\mathbb{R}^{n_1\times n_2\times n_3}}\ \frac{1}{|\Omega|}\Vert\mathcal{P}_{\Omega}(\mathcal{X})-\mathcal{P}_{\Omega}(\mathcal{X}^{\star})\Vert_F^2\quad\text{subject to}\quad\operatorname{rank}(\mathcal{X})=\mathbf{r},\qquad(1)$$

where the operator $\mathcal{P}_{\Omega}(\mathcal{X})_{i_1i_2i_3}=\mathcal{X}_{i_1i_2i_3}$ if $(i_1,i_2,i_3)\in\Omega$ and $\mathcal{P}_{\Omega}(\mathcal{X})_{i_1i_2i_3}=0$ otherwise, and (with a slight abuse of notation) $\Vert\cdot\Vert_F$ is the Frobenius norm.
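As an illustration of the cost in (1), the minimal NumPy sketch below stores $\Omega$ as a boolean mask of the same shape as the tensor. This is an assumption made for readability (an implementation at scale would keep only the index list of $\Omega$), and the function names `project_omega` and `completion_cost` are hypothetical.

```python
import numpy as np


def project_omega(X, mask):
    """P_Omega(X): keep the entries whose index is in Omega (mask True), zero the rest."""
    return np.where(mask, X, 0.0)


def completion_cost(X, X_star, mask):
    """Cost of (1): ||P_Omega(X) - P_Omega(X_star)||_F^2 / |Omega|."""
    residual = project_omega(X, mask) - project_omega(X_star, mask)
    return np.sum(residual ** 2) / np.count_nonzero(mask)


rng = np.random.default_rng(0)
X_star = rng.standard_normal((4, 5, 6))
mask = rng.random(X_star.shape) < 0.3  # roughly 30% of the entries are known

# Entries outside Omega do not influence the cost.
X = X_star.copy()
X[~mask] = 100.0
print(completion_cost(X, X_star, mask))  # 0.0
```

Only the observed entries enter the cost; it is the rank constraint in (1) that ties the unobserved entries to the observed ones.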

$\operatorname{rank}(\mathcal{X})$ $(=\mathbf{r}=(r_1,r_2,r_3))$, called the multilinear rank of $\mathcal{X}$, is the tuple of the ranks of the mode-$d$ unfolding matrices, and $r_d\ll n_d$ enforces a low-rank structure. The mode-$d$ unfolding of a tensor is a matrix obtained by concatenating the mode-$d$ fibers along its columns, and the mode-$d$ unfolding of $\mathcal{X}$ is …


The optimization problem (1) has many variants, and one of them extends the nuclear norm regularization approach from the matrix case [2] to the tensor case. While this generalization leads to good results [3-5], its scalability to large-scale instances is not trivial, especially due to the necessity of high-dimensional singular value decomposition computations. A different approach exploits the Tucker decomposition [6, Section 4] of a low-rank tensor $\mathcal{X}$ to develop large-scale algorithms for (1), e.g., in [7, 8].

The present paper exploits both the symmetry present in the Tucker decomposition and the least-squares structure of the cost function of (1) by using the concept of preconditioning. While preconditioning in unconstrained optimization is well studied [9, Chapter 5], preconditioning on constraints with symmetries, owing to the non-uniqueness of the Tucker decomposition [6, Section 4.3], is not straightforward. We build upon the recent work [10], which suggests using Riemannian preconditioning with a tailored metric (inner product) in the Riemannian optimization framework on quotient manifolds [11-13]. Our proposed preconditioned nonlinear conjugate gradient algorithm is implemented in the Matlab toolbox Manopt [14], and it outperforms state-of-the-art methods. In the supplementary material of [1], we show concrete mathematical derivations and additional numerical comparisons. We also provide a generic Manopt factory (a manifold description Matlab file) with additional support for second-order implementations, e.g., the trust-region method.

2 Exploiting the problem structure

We focus on the two fundamental structures present in (1): the symmetry in the constraints and the least-squares structure of the cost function. Finally, a novel metric is proposed.

The quotient and least-squares structures. The Tucker decomposition of a tensor $\mathcal{X}\in\mathbb{R}^{n_1\times n_2\times n_3}$ of rank $\mathbf{r}$ $(=(r_1,r_2,r_3))$ is [6, Section 4.1]

$$\mathcal{X}=\mathcal{G}\times_1\mathrm{U}_1\times_2\mathrm{U}_2\times_3\mathrm{U}_3,$$

where $\mathrm{U}_d\in\mathrm{St}(r_d,n_d)$ for $d\in\{1,2,3\}$ belongs to the Stiefel manifold of matrices of size $n_d\times r_d$ with orthonormal columns, and $\mathcal{G}\in\mathbb{R}^{r_1\times r_2\times r_3}$. Here, $\mathcal{W}\times_d\mathrm{V}\in\mathbb{R}^{n_1\times\cdots\times n_{d-1}\times m\times n_{d+1}\times\cdots\times n_N}$ computes the $d$-mode product of a tensor $\mathcal{W}\in\mathbb{R}^{n_1\times\cdots\times n_N}$ and a matrix $\mathrm{V}\in\mathbb{R}^{m\times n_d}$.
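The Tucker format and the $d$-mode product can be illustrated with a short NumPy sketch. It is not the Matlab/Manopt implementation of [1]; the helper names are hypothetical, modes are indexed from 0 in code, and `unfold` orders the columns differently from the convention of [6], which leaves the rank of each unfolding unchanged. The last line checks that the multilinear rank of the reconstructed tensor equals $\mathbf{r}$.

```python
import numpy as np


def mode_product(W, V, d):
    """d-mode product W x_d V for V of shape (m, W.shape[d]); axis d becomes size m."""
    return np.moveaxis(np.tensordot(V, W, axes=([1], [d])), 0, d)


def unfold(X, d):
    """Mode-d unfolding: rows indexed by mode d, columns by the remaining modes."""
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)


def tucker_to_tensor(G, Us):
    """X = G x_1 U1 x_2 U2 x_3 U3 (axes 0, 1, 2 in code)."""
    X = G
    for d, U in enumerate(Us):
        X = mode_product(X, U, d)
    return X


def random_stiefel(n, r, rng):
    """Random n x r matrix with orthonormal columns, i.e. a point on St(r, n)."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return Q


rng = np.random.default_rng(0)
n, r = (30, 25, 20), (4, 3, 2)
Us = [random_stiefel(nd, rd, rng) for nd, rd in zip(n, r)]
G = rng.standard_normal(r)
X = tucker_to_tensor(G, Us)
print(X.shape)  # (30, 25, 20)
print([int(np.linalg.matrix_rank(unfold(X, d))) for d in range(3)])  # [4, 3, 2]
```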

The Tucker decomposition is not unique, as $\mathcal{X}$ remains unchanged under the transformation

$$(\mathrm{U}_1,\mathrm{U}_2,\mathrm{U}_3,\mathcal{G})\mapsto(\mathrm{U}_1\mathrm{O}_1,\mathrm{U}_2\mathrm{O}_2,\mathrm{U}_3\mathrm{O}_3,\mathcal{G}\times_1\mathrm{O}_1^T\times_2\mathrm{O}_2^T\times_3\mathrm{O}_3^T)$$

for all $\mathrm{O}_d\in\mathcal{O}(r_d)$, where $\mathcal{O}(r_d)$ is the set of orthogonal matrices of size $r_d\times r_d$; indeed, $(\mathcal{G}\times_d\mathrm{O}_d^T)\times_d(\mathrm{U}_d\mathrm{O}_d)=\mathcal{G}\times_d(\mathrm{U}_d\mathrm{O}_d\mathrm{O}_d^T)=\mathcal{G}\times_d\mathrm{U}_d$. The classical remedy to remove this indeterminacy is to impose additional structure on $\mathcal{G}$, such as sparsity or restricted orthogonal rotations [6, Section 4.3]. In contrast, we encode the transformation in an abstract search space of equivalence classes, defined as

$$[(\mathrm{U}_1,\mathrm{U}_2,\mathrm{U}_3,\mathcal{G})]:=\{(\mathrm{U}_1\mathrm{O}_1,\mathrm{U}_2\mathrm{O}_2,\mathrm{U}_3\mathrm{O}_3,\mathcal{G}\times_1\mathrm{O}_1^T\times_2\mathrm{O}_2^T\times_3\mathrm{O}_3^T):\mathrm{O}_d\in\mathcal{O}(r_d)\}.$$

The set of equivalence classes is the quotient manifold [15, Theorem 9.16]

$$\mathcal{M}/\sim\ :=\ \mathcal{M}/(\mathcal{O}(r_1)\times\mathcal{O}(r_2)\times\mathcal{O}(r_3)),$$

where $\mathcal{M}$ is called the total space (computational space), that is, the product space

$$\mathcal{M}:=\mathrm{St}(r_1,n_1)\times\mathrm{St}(r_2,n_2)\times\mathrm{St}(r_3,n_3)\times\mathbb{R}^{r_1\times r_2\times r_3}.$$

Owing to the non-uniqueness of the Tucker decomposition, the local minima of (1) in $\mathcal{M}$ are not isolated, but they become isolated on $\mathcal{M}/\sim$.
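The invariance that makes the local minima non-isolated in $\mathcal{M}$ can be observed numerically. The NumPy sketch below (hypothetical helper names, modes indexed from 0 in code) applies random orthogonal $\mathrm{O}_d$ to the factors and, through $\mathrm{O}_d^T$, to the core, and confirms that the represented tensor is unchanged although its factors differ.

```python
import numpy as np


def mode_product(W, V, d):
    """d-mode product W x_d V, with V of shape (m, W.shape[d])."""
    return np.moveaxis(np.tensordot(V, W, axes=([1], [d])), 0, d)


def tucker_to_tensor(G, Us):
    """G x_1 U1 x_2 U2 x_3 U3 (axes 0, 1, 2 in code)."""
    X = G
    for d, U in enumerate(Us):
        X = mode_product(X, U, d)
    return X


def random_orthonormal(n, r, rng):
    """n x r matrix with orthonormal columns (r == n gives an orthogonal matrix)."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return Q


rng = np.random.default_rng(1)
n, r = (30, 25, 20), (4, 3, 2)
Us = [random_orthonormal(nd, rd, rng) for nd, rd in zip(n, r)]
G = rng.standard_normal(r)

# Apply (U_d, G) -> (U_d O_d, G x_1 O_1^T x_2 O_2^T x_3 O_3^T).
Os = [random_orthonormal(rd, rd, rng) for rd in r]
Us_new = [U @ O for U, O in zip(Us, Os)]
G_new = G
for d, O in enumerate(Os):
    G_new = mode_product(G_new, O.T, d)

X, X_new = tucker_to_tensor(G, Us), tucker_to_tensor(G_new, Us_new)
print(np.allclose(X, X_new))  # True: same tensor, different representation
```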

Consequently, problem (1) is an optimization problem on a quotient manifold, for which systematic procedures are proposed in [11-13] by endowing $\mathcal{M}/\sim$ with a Riemannian structure. We call $\mathcal{M}/\sim$ the Tucker manifold.
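Counting dimensions shows how small this search space is. Since $\dim\mathrm{St}(r,n)=nr-r(r+1)/2$, $\dim\mathcal{O}(r)=r(r-1)/2$, and the action of $\mathcal{O}(r_1)\times\mathcal{O}(r_2)\times\mathcal{O}(r_3)$ is free ($\mathrm{U}_d\mathrm{O}_d=\mathrm{U}_d$ forces $\mathrm{O}_d=\mathrm{I}$ because $\mathrm{U}_d^T\mathrm{U}_d=\mathrm{I}$), the Tucker manifold has dimension $\sum_d(n_dr_d-r_d^2)+r_1r_2r_3$. The small Python helper below (hypothetical names) evaluates this count; the second example uses one of the large-scale sizes of Section 4.

```python
def stiefel_dim(n, r):
    """Dimension of St(r, n), the n x r matrices with orthonormal columns."""
    return n * r - r * (r + 1) // 2


def orthogonal_group_dim(r):
    """Dimension of the orthogonal group O(r)."""
    return r * (r - 1) // 2


def tucker_manifold_dim(n, r):
    """dim(M / ~) = dim(M) - dim(O(r1) x O(r2) x O(r3))."""
    dim_total = sum(stiefel_dim(nd, rd) for nd, rd in zip(n, r)) + r[0] * r[1] * r[2]
    dim_group = sum(orthogonal_group_dim(rd) for rd in r)
    return dim_total - dim_group


print(tucker_manifold_dim((100, 100, 100), (10, 10, 10)))  # 3700
print(tucker_manifold_dim((10000, 10000, 10000), (10, 10, 10)))  # 300700
```

For the $10000\times10000\times10000$ tensor with $\mathbf{r}=(10,10,10)$, the search space thus has about $3\times10^5$ dimensions, compared with $10^{12}$ tensor entries.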

Another structure that is present in (1) is the least-squares structure of the cost function. A way to exploit it is to endow the search space with a metric (inner product) induced by the Hessian of the cost function [9]. This induced metric (or its approximation) alleviates the convergence issues of first-order optimization algorithms. Specifically, for the case of quadratic optimization with a rank constraint (matrix case), Mishra and Sepulchre [10, Section 5] propose a family of Riemannian metrics derived from the Hessian of the cost function. Since applying this approach directly to (1) is computationally costly, we consider a simplified cost function by assuming that $\Omega$ contains the full set of indices, i.e., we focus on $\Vert\mathcal{X}-\mathcal{X}^{\star}\Vert_F^2$ to propose a metric candidate. A good candidate is obtained by considering only the block diagonal elements of the Hessian of $\Vert\mathcal{X}-\mathcal{X}^{\star}\Vert_F^2$.

It should be emphasized that the cost function $\Vert\mathcal{X}-\mathcal{X}^{\star}\Vert_F^2$ is convex and quadratic in $\mathcal{X}$. Consequently, since $\mathcal{X}$ is linear in each of $\mathrm{U}_1$, $\mathrm{U}_2$, $\mathrm{U}_3$, and $\mathcal{G}$ separately, the cost is also convex and quadratic in each of the arguments $(\mathrm{U}_1,\mathrm{U}_2,\mathrm{U}_3,\mathcal{G})$ individually, i.e., when the other arguments are fixed. The block diagonal approximation of the Hessian of $\Vert\mathcal{X}-\mathcal{X}^{\star}\Vert_F^2$ in $(\mathrm{U}_1,\mathrm{U}_2,\mathrm{U}_3,\mathcal{G})$ is (up to the constant factor 2)

$$\big((\mathrm{G}_1\mathrm{G}_1^T)\otimes\mathrm{I}_{n_1},\ (\mathrm{G}_2\mathrm{G}_2^T)\otimes\mathrm{I}_{n_2},\ (\mathrm{G}_3\mathrm{G}_3^T)\otimes\mathrm{I}_{n_3},\ \mathrm{I}_{r_1r_2r_3}\big),$$

where $\mathrm{G}_d$ is the mode-$d$ unfolding of $\mathcal{G}$ and is assumed to have full row rank. This is possible only when $r_1\leq r_2r_3$, $r_2\leq r_1r_3$, and $r_3\leq r_1r_2$, and under this assumption the terms $\mathrm{G}_d\mathrm{G}_d^T$ for $d\in\{1,2,3\}$ are positive definite.
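The block structure can be verified numerically. The NumPy sketch below (hypothetical helper names) treats $\mathrm{U}_1$ as an unconstrained $n_1\times r_1$ matrix, as the block diagonal approximation does, and uses the fact that the cost is exactly quadratic in $\mathrm{U}_1$ (and in $\mathcal{G}$), so a central second difference returns the Hessian quadratic form without truncation error. It confirms that the $\mathrm{U}_1$ block acts as $\mathrm{Z}\mapsto2\mathrm{Z}\mathrm{G}_1\mathrm{G}_1^T$ and the core block as $2\mathrm{I}$, the factor 2 coming from the square.

```python
import numpy as np


def mode_product(W, V, d):
    return np.moveaxis(np.tensordot(V, W, axes=([1], [d])), 0, d)


def unfold(X, d):
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)


def tucker_to_tensor(G, Us):
    X = G
    for d, U in enumerate(Us):
        X = mode_product(X, U, d)
    return X


def full_cost(G, Us, X_star):
    """||X - X_star||_F^2 with X = G x_1 U1 x_2 U2 x_3 U3 (all indices observed)."""
    return np.sum((tucker_to_tensor(G, Us) - X_star) ** 2)


rng = np.random.default_rng(2)
n, r = (30, 25, 20), (4, 3, 2)
Us = [np.linalg.qr(rng.standard_normal((nd, rd)))[0] for nd, rd in zip(n, r)]
G = rng.standard_normal(r)
X_star = rng.standard_normal(n)
f0 = full_cost(G, Us, X_star)

# U1 block: second difference along Z equals 2 <Z, Z G1 G1^T>.
Z = rng.standard_normal((n[0], r[0]))
second_diff = (full_cost(G, [Us[0] + Z, Us[1], Us[2]], X_star) - 2 * f0
               + full_cost(G, [Us[0] - Z, Us[1], Us[2]], X_star))
G1 = unfold(G, 0)
block = 2 * np.trace(Z @ (G1 @ G1.T) @ Z.T)  # vec(Z)^T (2 G1 G1^T kron I) vec(Z)
print(np.isclose(second_diff, block))  # True

# Core block: second difference along H equals 2 ||H||^2 (identity block).
H = rng.standard_normal(r)
core_diff = full_cost(G + H, Us, X_star) - 2 * f0 + full_cost(G - H, Us, X_star)
print(np.isclose(core_diff, 2 * np.sum(H ** 2)))  # True
```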

A novel Riemannian metric and its motivation. An element $x$ in the total space $\mathcal{M}$ has the matrix representation $(\mathrm{U}_1,\mathrm{U}_2,\mathrm{U}_3,\mathcal{G})$. Consequently, the tangent space $T_x\mathcal{M}$ is the Cartesian product of the tangent spaces of the individual manifolds, i.e., $T_x\mathcal{M}$ has the matrix characterization [13]

$$T_x\mathcal{M}=\{(\mathrm{Z}_{\mathrm{U}_1},\mathrm{Z}_{\mathrm{U}_2},\mathrm{Z}_{\mathrm{U}_3},\mathrm{Z}_{\mathcal{G}})\in\mathbb{R}^{n_1\times r_1}\times\mathbb{R}^{n_2\times r_2}\times\mathbb{R}^{n_3\times r_3}\times\mathbb{R}^{r_1\times r_2\times r_3}:\ \mathrm{U}_d^T\mathrm{Z}_{\mathrm{U}_d}+\mathrm{Z}_{\mathrm{U}_d}^T\mathrm{U}_d=0,\ d\in\{1,2,3\}\}.$$

The earlier discussion on symmetry and least-squares structure leads to the novel metric $g_x:T_x\mathcal{M}\times T_x\mathcal{M}\rightarrow\mathbb{R}$,

$$g_x(\xi_x,\eta_x)=\langle\xi_{\mathrm{U}_1},\eta_{\mathrm{U}_1}(\mathrm{G}_1\mathrm{G}_1^T)\rangle+\langle\xi_{\mathrm{U}_2},\eta_{\mathrm{U}_2}(\mathrm{G}_2\mathrm{G}_2^T)\rangle+\langle\xi_{\mathrm{U}_3},\eta_{\mathrm{U}_3}(\mathrm{G}_3\mathrm{G}_3^T)\rangle+\langle\xi_{\mathcal{G}},\eta_{\mathcal{G}}\rangle,\qquad(2)$$

where $\xi_x,\eta_x\in T_x\mathcal{M}$ are tangent vectors with matrix characterizations $(\xi_{\mathrm{U}_1},\xi_{\mathrm{U}_2},\xi_{\mathrm{U}_3},\xi_{\mathcal{G}})$ and $(\eta_{\mathrm{U}_1},\eta_{\mathrm{U}_2},\eta_{\mathrm{U}_3},\eta_{\mathcal{G}})$, respectively, and $\langle\cdot,\cdot\rangle$ is the Euclidean inner product.

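Since $\langle\xi_{\mathrm{U}_d},\eta_{\mathrm{U}_d}(\mathrm{G}_d\mathrm{G}_d^T)\rangle=\mathrm{vec}(\xi_{\mathrm{U}_d})^T\big((\mathrm{G}_d\mathrm{G}_d^T)\otimes\mathrm{I}_{n_d}\big)\mathrm{vec}(\eta_{\mathrm{U}_d})$, each term of (2) weights the tangent vectors by the corresponding block of the block diagonal Hessian approximation above. In contrast to the classical Euclidean metric, the metric (2) therefore scales the level sets of the cost function on the search space, which leads to a preconditioning effect on the algorithms developed on the Tucker manifold.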
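The metric (2) is cheap to evaluate because each weight $\mathrm{G}_d\mathrm{G}_d^T$ is only $r_d\times r_d$. The NumPy sketch below uses hypothetical names, represents vectors as tuples $(\xi_{\mathrm{U}_1},\xi_{\mathrm{U}_2},\xi_{\mathrm{U}_3},\xi_{\mathcal{G}})$, and draws random ambient arrays instead of tangent vectors, which is legitimate because (2) is an inner product on the whole ambient space whenever every $\mathrm{G}_d\mathrm{G}_d^T$ is positive definite. It checks symmetry and illustrates the scaling behavior: unlike the Euclidean metric, the length of a factor direction grows with the size of the core.

```python
import numpy as np


def unfold(X, d):
    """Mode-d unfolding; the column order does not affect G_d G_d^T."""
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)


def scaled_metric(G, xi, eta):
    """g_x(xi, eta) of (2) at a point with core G.

    xi and eta are tuples (xi_U1, xi_U2, xi_U3, xi_G) of NumPy arrays.
    """
    *xi_U, xi_G = xi
    *eta_U, eta_G = eta
    value = np.sum(xi_G * eta_G)  # <xi_G, eta_G>
    for d in range(3):
        Gd = unfold(G, d)
        value += np.sum(xi_U[d] * (eta_U[d] @ (Gd @ Gd.T)))  # <xi_Ud, eta_Ud G_d G_d^T>
    return value


rng = np.random.default_rng(3)
n, r = (30, 25, 20), (4, 3, 2)
shapes = [(n[0], r[0]), (n[1], r[1]), (n[2], r[2]), r]
G = rng.standard_normal(r)
xi = tuple(rng.standard_normal(s) for s in shapes)
eta = tuple(rng.standard_normal(s) for s in shapes)
print(np.isclose(scaled_metric(G, xi, eta), scaled_metric(G, eta, xi)))  # True

# Scaling the core by 10 multiplies the squared length of a pure factor direction by 100.
xi_factors = xi[:3] + (np.zeros(r),)
ratio = scaled_metric(10 * G, xi_factors, xi_factors) / scaled_metric(G, xi_factors, xi_factors)
print(np.isclose(ratio, 100.0))  # True
```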

3 Notions of optimization on quotient manifolds

Each point on a quotient manifold represents an entire equivalence class of matrices in the total space. Abstract geometric objects on a quotient manifold call for matrix representatives in the total space $\mathcal{M}$. Similarly, algorithms are run in the total space $\mathcal{M}$, but under appropriate compatibility between the Riemannian structure of $\mathcal{M}$ and the Riemannian structure of the quotient manifold $\mathcal{M}/\sim$, they define algorithms on the quotient manifold. Once we endow $\mathcal{M}/\sim$ with a Riemannian structure (here induced by the metric (2)), the constrained optimization problem (1) is conceptually transformed into an unconstrained optimization problem over the Riemannian quotient manifold.

When the points $x$ and $y$ in $\mathcal{M}$ belong to the same equivalence class, they represent a single point $[x]:=\{y\in\mathcal{M}:y\sim x\}$ on the quotient manifold $\mathcal{M}/\sim$. The abstract tangent space $T_{[x]}(\mathcal{M}/\sim)$ at $[x]\in\mathcal{M}/\sim$ has a matrix representation in $T_x\mathcal{M}$, but restricted to the directions that do not induce a displacement along the equivalence class $[x]$. This is realized by decomposing $T_x\mathcal{M}$ into two complementary subspaces. The vertical space $\mathcal{V}_x$ is the tangent space of the equivalence class $[x]$. On the other hand, the horizontal space $\mathcal{H}_x$ is the orthogonal complement of $\mathcal{V}_x$ with respect to $g_x$, i.e., $T_x\mathcal{M}=\mathcal{V}_x\oplus\mathcal{H}_x$. The horizontal subspace provides a valid matrix representation of the abstract tangent space $T_{[x]}(\mathcal{M}/\sim)$ [11, Section 3.5.8].

An abstract tangent vector $\xi_{[x]}\in T_{[x]}(\mathcal{M}/\sim)$ at $[x]$ has a unique element $\xi_x\in\mathcal{H}_x$ that is called its horizontal lift.

Endowed with the Riemannian metric (2), $\mathcal{M}/\sim$ is a Riemannian quotient manifold of $\mathcal{M}$, i.e., the natural projection from $\mathcal{M}$ onto $\mathcal{M}/\sim$ is a Riemannian submersion. The submersion principle then allows us to work out concrete matrix representations of abstract objects on $\mathcal{M}/\sim$. In particular, starting from an arbitrary matrix (with appropriate dimensions), two linear projections are needed: the first projection $\Psi_x$ is onto the tangent space $T_x\mathcal{M}$, while the second projection $\Pi_x$ is onto the horizontal subspace $\mathcal{H}_x$. The computational cost of these projections is $O(n_1r_1^2+n_2r_2^2+n_3r_3^2)$.
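The text does not spell out $\Psi_x$. Assuming it is the orthogonal projection onto $T_x\mathcal{M}$ with respect to $g_x$ in (2), a short derivation (ours, not taken from [1]) gives the following. With $\mathrm{M}_d=\mathrm{G}_d\mathrm{G}_d^T$, vectors $g_x$-orthogonal to the tangent space have the form $\mathrm{U}_d\mathrm{S}_d\mathrm{M}_d^{-1}$ with $\mathrm{S}_d$ symmetric, so $\Psi(\mathrm{Y}_{\mathrm{U}_d})=\mathrm{Y}_{\mathrm{U}_d}-\mathrm{U}_d\mathrm{S}_d\mathrm{M}_d^{-1}$, where $\mathrm{S}_d$ solves the Lyapunov equation $\mathrm{M}_d\mathrm{S}_d+\mathrm{S}_d\mathrm{M}_d=\mathrm{M}_d(\mathrm{U}_d^T\mathrm{Y}_{\mathrm{U}_d}+\mathrm{Y}_{\mathrm{U}_d}^T\mathrm{U}_d)\mathrm{M}_d$; the core component is left unchanged. The NumPy sketch below (hypothetical names) solves the $r_d\times r_d$ Lyapunov equation by an eigendecomposition, so a factor costs $O(n_dr_d^2+r_d^3)$, consistent with the stated complexity. The horizontal projection $\Pi_x$ is not sketched.

```python
import numpy as np


def solve_sym_lyapunov(M, Q):
    """Solve M S + S M = Q for symmetric positive definite M and symmetric Q."""
    lam, V = np.linalg.eigh(M)  # M = V diag(lam) V^T
    Q_rot = V.T @ Q @ V
    S_rot = Q_rot / (lam[:, None] + lam[None, :])  # (lam_i + lam_j) S_ij = Q_ij
    return V @ S_rot @ V.T


def project_factor(U, Y, M):
    """Psi on one Stiefel factor: g-orthogonal projection of Y onto
    {Z : U^T Z + Z^T U = 0} for the metric <xi, eta M>, M = G_d G_d^T."""
    S = solve_sym_lyapunov(M, M @ (U.T @ Y + Y.T @ U) @ M)
    return Y - U @ np.linalg.solve(M, S).T  # Y - U S M^{-1} (S and M symmetric)


rng = np.random.default_rng(4)
n, r = 30, 4
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
Gd = rng.standard_normal((r, 6))  # stands in for a full-row-rank mode-d unfolding
M = Gd @ Gd.T
Y = rng.standard_normal((n, r))

P = project_factor(U, Y, M)
print(np.allclose(U.T @ P + P.T @ U, 0))  # True: P is a tangent vector
Z = project_factor(U, rng.standard_normal((n, r)), M)  # another tangent vector
print(np.isclose(np.sum((Y - P) * (Z @ M)), 0))  # True: Y - P is g-orthogonal to Z
```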

Finally, we propose a Riemannian nonlinear conjugate gradient algorithm for (1) that scales well to large-scale instances. Specifically, we use the conjugate gradient implementation of Manopt with the ingredients described in Table ??. The convergence analysis of this method follows from [11, 16, 17].

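If $f(\mathcal{X})=\Vert\mathcal{P}_{\Omega}(\mathcal{X})-\mathcal{P}_{\Omega}(\mathcal{X}^{\star})\Vert_F^2/|\Omega|$, then the Riemannian gradient $\mathrm{grad}_xf$ has the matrix characterization $\Psi_x\big(\mathrm{egrad}_{\mathrm{U}_1}f\,(\mathrm{G}_1\mathrm{G}_1^T)^{-1},\ \mathrm{egrad}_{\mathrm{U}_2}f\,(\mathrm{G}_2\mathrm{G}_2^T)^{-1},\ \mathrm{egrad}_{\mathrm{U}_3}f\,(\mathrm{G}_3\mathrm{G}_3^T)^{-1},\ \mathrm{egrad}_{\mathcal{G}}f\big)$, where $\mathrm{egrad}_xf=(\mathrm{egrad}_{\mathrm{U}_1}f,\mathrm{egrad}_{\mathrm{U}_2}f,\mathrm{egrad}_{\mathrm{U}_3}f,\mathrm{egrad}_{\mathcal{G}}f)$ is the Euclidean gradient of $f$; the factors $(\mathrm{G}_d\mathrm{G}_d^T)^{-1}$ compensate for the scaling introduced by the metric (2). In [1], we also show how to compute a step-size guess efficiently.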
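The NumPy sketch below assembles the Euclidean gradient of $f$ by the chain rule and maps it to a Riemannian gradient for the metric (2). It assumes that $\mathrm{grad}_xf$ is obtained by applying $\Psi_x$, taken to be the $g_x$-orthogonal projection onto $T_x\mathcal{M}$, to the Euclidean gradient with each factor component right-multiplied by $(\mathrm{G}_d\mathrm{G}_d^T)^{-1}$; it stores $\Omega$ as a dense boolean mask for clarity, whereas an efficient implementation touches only the $|\Omega|$ known entries. All names are hypothetical. The final check is the defining property $g_x(\mathrm{grad}_xf,\eta_x)=\langle\mathrm{egrad}_xf,\eta_x\rangle$ for a tangent vector $\eta_x$.

```python
import numpy as np


def mode_product(W, V, d):
    """d-mode product W x_d V, with V of shape (m, W.shape[d])."""
    return np.moveaxis(np.tensordot(V, W, axes=([1], [d])), 0, d)


def unfold(X, d):
    """Mode-d unfolding (rows: mode d; columns: the other modes in order)."""
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)


def tucker_to_tensor(G, Us):
    X = G
    for d, U in enumerate(Us):
        X = mode_product(X, U, d)
    return X


def egrad(G, Us, X_star, mask):
    """Euclidean gradient of f = ||P_Omega(X) - P_Omega(X_star)||_F^2 / |Omega|."""
    R = 2.0 / mask.sum() * np.where(mask, tucker_to_tensor(G, Us) - X_star, 0.0)
    grads_U = []
    for d in range(3):
        T = R
        for e in range(3):
            if e != d:
                T = mode_product(T, Us[e].T, e)  # contract every mode except d
        grads_U.append(unfold(T, d) @ unfold(G, d).T)
    grad_G = R
    for e in range(3):
        grad_G = mode_product(grad_G, Us[e].T, e)
    return grads_U, grad_G


def solve_sym_lyapunov(M, Q):
    """Solve M S + S M = Q for symmetric positive definite M."""
    lam, V = np.linalg.eigh(M)
    return V @ ((V.T @ Q @ V) / (lam[:, None] + lam[None, :])) @ V.T


def project_factor(U, Y, M):
    """g-orthogonal projection onto {Z : U^T Z + Z^T U = 0} for <xi, eta M>."""
    S = solve_sym_lyapunov(M, M @ (U.T @ Y + Y.T @ U) @ M)
    return Y - U @ np.linalg.solve(M, S).T  # Y - U S M^{-1}


def rgrad(G, Us, X_star, mask):
    """Psi_x(egrad_Ud (G_d G_d^T)^{-1} for each d, egrad_G)."""
    grads_U, grad_G = egrad(G, Us, X_star, mask)
    out = []
    for d, U in enumerate(Us):
        Gd = unfold(G, d)
        M = Gd @ Gd.T
        out.append(project_factor(U, np.linalg.solve(M, grads_U[d].T).T, M))
    return out, grad_G


rng = np.random.default_rng(5)
n, r = (12, 10, 8), (3, 2, 2)
Us = [np.linalg.qr(rng.standard_normal((nd, rd)))[0] for nd, rd in zip(n, r)]
G = rng.standard_normal(r)
X_star = rng.standard_normal(n)
mask = rng.random(n) < 0.4

grad_U, grad_G = rgrad(G, Us, X_star, mask)
eg_U, eg_G = egrad(G, Us, X_star, mask)

# Random tangent vector eta (factor parts projected with the Euclidean metric, M = I).
eta_U = [project_factor(U, rng.standard_normal(U.shape), np.eye(U.shape[1])) for U in Us]
eta_G = rng.standard_normal(r)

Ms = [unfold(G, d) @ unfold(G, d).T for d in range(3)]
lhs = sum(np.sum(g * (e @ M)) for g, e, M in zip(grad_U, eta_U, Ms)) + np.sum(grad_G * eta_G)
rhs = sum(np.sum(g * e) for g, e in zip(eg_U, eta_U)) + np.sum(eg_G * eta_G)
print(np.isclose(lhs, rhs))  # True
```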

The total computational cost per iteration of our proposed algorithm is $O(|\Omega|r_1r_2r_3)$, where $|\Omega|$ is the number of known entries.
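The $O(|\Omega|r_1r_2r_3)$ figure reflects that the entries of $\mathcal{X}$ are only needed on $\Omega$. As a minimal sketch (ours, assuming $\Omega$ is stored as three integer index arrays; the function name is hypothetical), each known entry $\mathcal{X}_{i_1i_2i_3}=\sum_{a,b,c}\mathcal{G}_{abc}(\mathrm{U}_1)_{i_1a}(\mathrm{U}_2)_{i_2b}(\mathrm{U}_3)_{i_3c}$ can be evaluated without forming the dense $n_1\times n_2\times n_3$ tensor:

```python
import numpy as np


def sampled_entries(G, Us, idx):
    """X[i1, i2, i3] for every (i1, i2, i3) in Omega, without forming X.

    idx is a tuple of three integer arrays of length |Omega|; the naive
    einsum contraction below costs O(|Omega| r1 r2 r3) operations.
    """
    A, B, C = (U[i] for U, i in zip(Us, idx))  # rows of U_d selected by Omega
    return np.einsum('abc,na,nb,nc->n', G, A, B, C)


rng = np.random.default_rng(6)
n, r = (40, 30, 20), (3, 3, 3)
Us = [np.linalg.qr(rng.standard_normal((nd, rd)))[0] for nd, rd in zip(n, r)]
G = rng.standard_normal(r)
flat = rng.choice(n[0] * n[1] * n[2], size=500, replace=False)
idx = np.unravel_index(flat, n)

vals = sampled_entries(G, Us, idx)
X_dense = np.einsum('abc,ia,jb,kc->ijk', G, *Us)  # only feasible for small sizes
print(np.allclose(vals, X_dense[idx]))  # True
```

The same row-gathering idea applies to the residual and gradient computations, which is what keeps the cost linear in $|\Omega|$ rather than in $n_1n_2n_3$.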

4 Numerical comparisons

We show numerical comparisons of our proposed algorithm with state-of-the-art algorithms, including TOpt [7] and geomCG [8] for comparisons with Tucker decomposition based algorithms, and HaLRTC [3], Latent [4], and Hard [5] as nuclear norm minimization algorithms.

All simulations are performed in Matlab on a 2.6 GHz Intel Core i7 machine with 16 GB RAM. For specific operations with unfoldings of $\mathcal{S}$, we use the mex interfaces that are provided in geomCG.

For large-scale instances, our algorithm is only compared with geomCG, as the other algorithms cannot handle these instances. We randomly select entries, with their number determined by the over-sampling (OS) ratio, to create the training set $\Omega$.

Algorithms (and problem instances) are initialized randomly, as in [8], and are stopped when either the mean square error (MSE) on the training set $\Omega$ is below $10^{-12}$ or the number of iterations exceeds 250. We also evaluate the mean square error on a test set $\Gamma$, which is different from $\Omega$. Five runs are performed in each scenario.
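The evaluation protocol can be summarized in a hypothetical NumPy sketch. The set sizes are plain parameters here (the exact relation between $|\Omega|$ and the OS ratio is not reproduced), $\Gamma$ is drawn disjoint from $\Omega$ (our reading of "different from $\Omega$"), and `should_stop` encodes the stated stopping rule with $10^{-12}$ and 250 iterations as defaults.

```python
import numpy as np


def split_train_test(shape, n_train, n_test, rng):
    """Draw disjoint index sets Omega (train) and Gamma (test) uniformly at random."""
    flat = rng.choice(int(np.prod(shape)), size=n_train + n_test, replace=False)
    omega = np.unravel_index(flat[:n_train], shape)
    gamma = np.unravel_index(flat[n_train:], shape)
    return omega, gamma


def mse(X, X_star, idx):
    """Mean square error of X against X_star on the entries listed in idx."""
    return np.mean((X[idx] - X_star[idx]) ** 2)


def should_stop(train_mse, iteration, tol=1e-12, max_iter=250):
    """Stop when the training MSE is below tol or the iteration count exceeds max_iter."""
    return train_mse < tol or iteration > max_iter


rng = np.random.default_rng(7)
shape = (20, 20, 20)
omega, gamma = split_train_test(shape, n_train=800, n_test=200, rng=rng)
X_star = rng.standard_normal(shape)
print(should_stop(mse(X_star, X_star, omega), iteration=1))  # True: zero training error
print(should_stop(1.0, iteration=251))  # True: iteration budget exhausted
print(should_stop(1.0, iteration=10))  # False
```

For the MovieLens experiment, the same rule would be used with `max_iter=500`.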

Case 1 considers synthetic small-scale tensors of size $100\times100\times100$, $150\times150\times150$, and $200\times200\times200$ with rank $\mathbf{r}=(10,10,10)$, and OS $\in\{10,20,30\}$. The results show that the convergence behavior of our proposed algorithm is either competitive with or faster than that of the others.

Next, Case 2 considers large-scale tensors of size $3000\times3000\times3000$, $5000\times5000\times5000$, and $10000\times10000\times10000$ with ranks $\mathbf{r}=(5,5,5)$ and $(10,10,10)$. OS is 10. Our proposed algorithm outperforms geomCG.

Case 3 considers instances where the dimensions and ranks along certain modes differ from the others. Two cases are considered. Case (3.a) considers tensors of size $20000\times7000\times7000$, $30000\times6000\times6000$, and $40000\times5000\times5000$ with rank $\mathbf{r}=(5,5,5)$. Case (3.b) considers a tensor of size $10000\times10000\times10000$ with ranks $(7,6,6)$, $(10,5,5)$, and $(15,4,4)$. In all the cases, the proposed algorithm converges faster than geomCG.

Finally, Case 4 considers the MovieLens-10M dataset, which contains 10000054 ratings corresponding to 71567 users and 10681 movies. We split the time axis into 7-day-wide bins and finally obtain a tensor of size $71567\times10681\times731$. The fraction of known entries is less than 0.002%. We perform five random 80/10/10 train/validation/test partitions. The maximum number of iterations is set to 500. Our proposed algorithm consistently gives lower test errors than geomCG across different ranks.

5 Conclusion and future work

We have proposed a preconditioned nonlinear conjugate gradient algorithm for the tensor completion problem that exploits two fundamental structures: the symmetry due to the non-uniqueness of the Tucker decomposition, and the least-squares structure of the cost function. A novel Riemannian metric is proposed that enables us to use the versatile Riemannian optimization framework. Numerical comparisons suggest that our proposed algorithm has superior performance on different benchmarks.

References

[1] H. Kasai and B. Mishra. Low-rank tensor completion: a Riemannian manifold preconditioning approach. In ICML, 2016.

[2] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Found.


[3] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell., 35(1):208-220, 2013.

[4] R. Tomioka, K. Hayashi, and H. Kashima. Estimation of low-rank tensors via convex optimization. Technical report, arXiv preprint arXiv:1010.0789, 2011.

[5] M. Signoretto, Q. T. Dinh, L. De Lathauwer, and J. A. K. Suykens. Learning with tensors: a framework based on convex optimization and spectral regularization. Mach. Learn., 94(3):303-351, 2014.

[6] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Rev., 51(3):455-500, 2009.

[7] M. Filipović and A. Jukić. Tucker factorization with missing data with application to low-n-rank tensor completion. Multidim. Syst. Sign. P., 2013.

[8] D. Kressner, M. Steinlechner, and B. Vandereycken. Low-rank tensor completion by Riemannian optimization. BIT Numer. Math., 54(2):447-468, 2014.

[9] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, second edition, 2006.

[10] B. Mishra and R. Sepulchre. Riemannian preconditioning. SIAM J. Optim., 26(1):635-660, 2016.

[11] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.

[12] S. T. Smith. Optimization techniques on Riemannian manifolds. In A. Bloch, editor, Hamiltonian and Gradient Flows, Algorithms and Control, volume 3, pages 113-136. Amer. Math. Soc., Providence, RI, 1994.

[13] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl., 20(2):303-353, 1998.

[14] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt: a Matlab toolbox for optimization on manifolds. J. Mach. Learn. Res., 15(1):1455-1459, 2014.

[15] J. M. Lee. Introduction to Smooth Manifolds, volume 218 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 2003.

[16] H. Sato and T. Iwai. A new, globally convergent Riemannian conjugate gradient method. Optimization, 64(4):1011-1031, 2015.

[17] W. Ring and B. Wirth. Optimization methods on Riemannian manifolds and their
