• 検索結果がありません。

SIGMODJ no47 session1 2

N/A
N/A
Protected

Academic year: 2018

シェア "SIGMODJ no47 session1 2"

Copied!
24
0
0

読み込み中.... (全文を見る)

全文

(1)

Copyright 2011 NTT Corporation

-- MapReduce ✬ --

1

㝩Ⴆ ⌀

ਥᐙ⎇ⓥຬ㩿․೎⎇ⓥຬ㪀

㪥㪫㪫䉰䉟䊋䊷䉴䊕䊷䉴⎇ⓥᚲ

Copyright 2011 NTT Corporation

⋡ᰴ

MapReduce 䈫䈲䋿

MapReduce 䈱䊂䉱䉟䊮䊌䉺䊷䊮

MapReduce 㜞ㅦൻ䈱⎇ⓥേะ

ᮡḰ஍Ꮕ / PageRank䈱⸘▚଀

MapReduce 䈱㑐ㅪᛛⴚ䈱⎇ⓥേะ

MapReduce 㜞ㅦൻ䈱ข䉍⚵䉂

Map Multi-Reduce: reduce 䈱੐೨ታⴕ

PJoin: ੐೨䊌䊷䊁䉞䉲䊢䊮ಽഀ䈫Ḱ⚿ว䈱೑↪

ⶄᢙಽᨆಣℂ䈮䈍䈔䉎MapReduceᦨㆡൻ

⎇ⓥ䈱ᣇะᕈ

(2)

Copyright 2011 NTT Corporation

MapReduce 䈱᭎ⷐ

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

skip

Copyright 2011 NTT Corporation

MapReduce 䈫䈲?

ಽᢔಣℂ䈱䊒䊨䉫䊤䊚䊮䉫䊝䊂䊦䋧䉲䉴䊁䊛

㜞䉴䉬䊷䊦ᕈ: 1ਁ䊉䊷䊄(䊙䉲䊮)䋬䊕䉺䊋䉟䊃䊂䊷䉺

䉲䊮䊒䊦䈭API: map㑐ᢙ/reduce㑐ᢙ

ಽᢔಣℂ․᦭䈱ⶄ㔀䈘(⽶⩄ಽᢔ䊶น↪ᕈ)䈏㓝䈽䈇

Google 䈏㐿⊒

web䈱ᬌ⚝䉣䊮䉳䊮䈱䊋䉾䉪䉣䊮䊄䈱↪ㅜ䈱䈢䉄㐿⊒

PageRank ⸘▚䋬ォ⟎䊐䉜䉟䊦᭴▽

2008ᐕᤨὐ䈪 20PB/day 䉕 MapReduce䈪ಣℂ

[Dean et al., OSDI 2004, CACM Jan 2008, CACM Jan 2010]

Hadoop project 䈮䈍䈇䈩䉥䊷䊒䊮䉸䊷䉴ൻ

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

(3)

Copyright 2011 NTT Corporation

MapReduce 䈱ㆡ↪଀

n-gram 䈮䉋䉎Web 䉮䊮䊁䊮䉿ಽᨆ

ᣣᧄ⺆ 30ం䊕䊷䉳

web ਄䈪ᄙ䈇ᢥ┨䈲૗䈎䋿

ㄉᦠ: 㖸ჿ⹺⼂䋬ᣣ⧷⠡⸶

Web 䈱䉪䉣䊥䊶䉝䉪䉶䉴䊨䉫

ᬌ⚝ㅪേᐢ๔ (ad-words)

೑↪⠪䈮ㆡ䈚䈢䉮䊮䊁䊮䉿䈲䋿

䊌䊷䉸䊅䊤䉟䉵ᬌ⚝

䊈䉾䊃䊪䊷䉪䊃䊤䊐䉞䉾䉪⸃ᨆ

40Gbps 䊈䉾䊃䊪䊷䉪

QoS ೙ᓮ, 㓚ኂ⸃ᨆ

serve as the incoming 92

serve as the incubator 99

serve as the independent 794

4-gram

䊌䉬䉾䊃ㆃᑧ䋬䊌䉬䉾䊃ᰳ៊

“὾㈪” ĺᳯᚭಾሶ (edo kiriko)

ᬌ⚝䊪䊷䊄 ᬌ⚝ㅪേᐢ๔

Copyright 2011 NTT Corporation

MapReduce 䈱䉝䊷䉨䊁䉪䉼䊞

MapReduce: ಽᢔಣℂ䉲䉴䊁䊛

ಽᢔ䊐䉜䉟䊦䉲䉴䊁䊛 DFS ਄䈪᭴▽

DFS: GFS (Google FS), HDFS (Hadoop DFS)

2䊐䉢䊷䉵 + Shuffle

Map 䊐䉢䊷䉵: ౉ജ䊧䉮䊷䊄Ფ䈮map㑐ᢙ䉕ታⴕ

䊘䉟䊮䊃: 䊙䉲䊮Ფ䈮⁛┙䈚䈢ಣℂ

Shuffle: ห৻key 䈱(key, value) ⟲䉕᧤䈰䉎

Reduce 䊐䉢䊷䉵: ᧤䈰䈢⚿ᨐ䈮 reduce 㑐ᢙ䉕ታⴕ

䊘䉟䊮䊃: ⶄᢙ䊙䉲䊮䈮䉁䈢䈏䉍ᔅⷐ䈭ಣℂ

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

(4)

Copyright 2011 NTT Corporation

MapReduce 䈱䉝䊷䉨䊁䉪䉼䊞(⛯䈐)

User

Program

worker

Input Data

fork fork

fork

Master

worker

worker

assign

map

assign

reduce

local

write

remote read,

sort

Output

File 0

Output

File 1

Split 1

Split 0

Split 2

Split 3

Split 4

read

worker

node

process

file

worker

worker

worker

Map 䉺䉴䉪 Reduce 䉺䉴䉪

Map 䉺䉴䉪 Map task Reduce task

shuffle

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

Copyright 2011 NTT Corporation

䊒䊨䉫䊤䊛଀: wordCount

౉ജ䊧䉮䊷䊄Ფ䈮 map㑐ᢙ䉕ㆡ↪䈜䉎

න⺆Ფ䈮 (word, 1) 䉕಴ജ䈜䉎

න⺆Ფ䈮㗫ᐲ䉕Ⓧ▚䈜䉎

[Dean et al., OSDI 2004]

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

(5)

Copyright 2011 NTT Corporation

MapReduce 䈱䊂䉱䉟䊮䊌䉺䊷䊮

䈅䉎⒳䈱㜞ㅦൻ䈱䊁䉪䊆䉾䉪㓸 + ታ䉝䊒䊥

Data intensive text processing with MapReduce,

[Lin et al., 2010]

Local Aggregation

Pairs and Stripes

Computing Relative Frequencies

Secondary Sorting

Relational Joins

Copyright 2011 NTT Corporation

Computing Relative Frequencies

㪈㪇

᧦ઙઃ䈐⏕₸ P(X|C i )䉕⸘▚䈜䉎

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

(6)

May 26, 2011 Data Mining: Concepts and Techniques 11

Naïve Bayesian Classifier: Training Dataset

Class:

C1:buys_computer = ‘yes’

C2:buys_computer = ‘no’

Data sample

X = (age <=30,

Income = medium,

Student = yes

Credit_rating = Fair)

age income studentcredit_rating_comp

<=30 high no fair no

<=30 high no excellent no

31…40 high no fair yes

>40 medium no fair yes

>40 low yes fair yes

>40 low yes excellent no

31…40 low yes excellent yes

<=30 medium no fair no

<=30 low yes fair yes

>40 medium yes fair yes

<=30 medium yes excellent yes

31…40 medium no excellent yes

31…40 high yes fair yes

>40 medium no excellent no

May 26, 2011 Data Mining: Concepts and Techniques 12

Naïve Bayesian Classifier: An Example

„

P(C

i

): P(buys_computer = “yes”) = 9/14 = 0.643

P(buys_computer = “no”) = 5/14= 0.357

„

Compute P(X|C

i

) for each class

P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222

P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6

P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444

P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4

P(student = “yes” | buys_computer = “yes) = 6/9 = 0.667

P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2

P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667

P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

„

X = (age <= 30 , income = medium, student = yes, credit_rating = fair)

P(X|C

i

) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044

P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|C

i

)*P(C

i

) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028

P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007

Therefore, X belongs to class (“buys_computer = yes”)

(7)

Copyright 2011 NTT Corporation

Computing Relative Frequencies

㪈㪊

᧦ઙઃ䈐⏕₸ P(X | C i )䉕⸘▚䈜䉎

P(X | C i ) = P(Xŀ& i ) / P(C i )

䈬䈉䉇䈦䈩 MapReduce ਄䈮ታⵝ䈜䉏䈳⦟䈇䈎䋿

1. ⋥ⷰ⊛䈭ᣇᴺ: iteration 䉕2࿁ታⴕ䈜䉎

• 1 st iteration: P(C i )䉕⸘▚䈜䉎

• 2 nd iteration: P(Xŀ& i ) 䉕⸘▚䈚䈩 P(X | C i ) 䉕ᓧ䉎

2. 䊂䉱䉟䊮䊌䉺䊷䊮䈪䈱ᣇᴺ: iteration 1࿁䈪ታⴕ

• map 㑐ᢙ䈪 2⒳㘃䈱୯䉕 emit : P(C i )↪, P(Xŀ& i ) ↪

• C i 䉕䉨䊷䈫䈚䈩 partition ಽ䈔

• P(C i )䉕వ䈮⸘▚䋬ᰴ䈮 P(Xŀ& i ) 䉕⸘▚䋬 P(X | C i ) 䉕ᓧ䉎

P(C

1

): 䉮䊮䊏䊠䊷䉺⾼౉䈜䉎⏕₸

P(age = “<=30” ŀ&

1

): 30ᱦએਅ䈪⾼౉

Copyright 2011 NTT Corporation

Computing Relative Frequencies

㪈㪋

map (String key, String value){

// value is an input record

for each class in value:

EmitIntermediate(“_C1_”+class, 1);

for each attribute in value:

EmitIntermediate(“_C2_”+class+”:”+attribute, 1);

}

Private HashMap<Integer> PC = new HashMap<Integer>();

reduce (String key, Iterator values){

if prefix(key) == _C1_ then PC[getKey(key)] = compute P(C i );

else if prefix(key) == _C2_ then

compute P(Xŀ& i )

P(X | C i ) = P(Xŀ& i ) / PC[getKey(key)];

}

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

map 㑐ᢙ䈪 2⒳㘃䈱୯䉕 emit

map 㑐ᢙ䈪 2⒳㘃䈱୯䉕 emit

P(C i ) 䉕⸘▚ᓟ䈮P(Xŀ& i )䉕⸘▚

(8)

Copyright 2011 NTT Corporation

䊂䉱䉟䊮䊌䉺䊷䊮䈎䉌ቇ䈼䉎䈖䈫

㪈㪌

MapReduce 䈱䊊䉾䉨䊮䉫䊁䉪䊆䉾䉪

⁁ᴫ䈮ᔕ䈛䈩೑↪䈜䉎䈖䈫䈪ಣℂ䉕㜞ㅦൻน⢻

ታⵝ䊉䉡䊊䉡䉕⫾Ⓧ䈜䉎䈖䈫䈲㊀ⷐ

䈚䈎䈚䋬䊌䉵䊦䉂䈢䈇䈪ᤃ䈚䈒䈲䈭䈇

GoF 䈱䊂䉱䉟䊮䊌䉺䊷䊮䉅᥉ㅢ䈮䈲ᕁ䈇䈧䈎䈭䈇

ⶄ㔀䈭䊉䉡䊊䉡䉕೑↪⠪䈎䉌㓝䈽䈇ൻ䈚䈢䈇

䈅䉎⒳䈱ᦨㆡൻ䈮㑐䉒䉎⎇ⓥ⺖㗴

1. ᣂ䈢䈭䊂䉱䉟䊮䊌䉺䊷䊮䉕⷗䈧䈔䉎

2. 䉲䉴䊁䊛䈏⥄േ⊛䈮ㆡಾ䈭䊂䉱䉟䊮䊌䉺䊷䊮䉕ㆬᛯ䈜䉎

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

Copyright 2011 NTT Corporation

ዋ䈚૛⺣: MapReduce 䈦䈩⚿ዪ૗䋿

㪈㪍

એਅ䈱ⷐ⚛䈎䉌䈭䉎䉲䉴䊁䊛

㑐ᢙဳ⸒⺆䈱䉟䊮䉺䊐䉢䊷䉴

• map (f) [r1,...,rn] = [f(r1),...., f(rn)]

• reduce (Ҷ) [r1,...,rn] = r1 Ҷ .... Ҷ rn

ಽᢔⅣႺ䈻䈱ㆡ↪

• map:䊙䉲䊮ౝ䊂䊷䉺ಣℂ䋬reduce:䊙䉲䊮㑆䊂䊷䉺ಣℂ

• ⺰ℂ䊒䊤䊮 = ‛ℂ䊒䊤䊮

• (key, value) + hash 䈮䉋䉎䉲䊞䉾䊐䊦ಣℂ

• 䉣䊤䊷ಣℂ䈭䈬䈱ಽᢔ․᦭䈱ಣℂ䉕㓝䈽䈇

• 䊐䉜䉟䊦䉕੺䈚䈩䊒䊨䉶䉴㑆ㅢା (㜞ㅦ䈭䊥䉦䊋䊥)

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

(9)

Copyright 2011 NTT Corporation

ዋ䈚૛⺣: Dryad 䉅䈅䉎䉋䈰䋿

㪈㪎

Dryad: 䊙䉟䉪䊨䉸䊐䊃䈱ಽᢔಣℂ䉲䉴䊁䊛

MapReduce 䈱⺖㗴䉕⸃ᶖ䈚䈩䈇䉎

䈚䈎䈚䋬ᄙ᭽䈭ㇱຠ䉕⚵䉂ว䉒䈞䉎䊒䊨䉫䊤䊚䊮䉫䋬

‛ℂ䊒䊤䊮䈱ᕈ⢻䉼䊠䊷䊆䊮䉫䈏ⶄ㔀䈠䈉

Hadoop MapReduce 䈱䉮䊚䊠䊆䊁䉞ⷙᮨ䈱Ꮕ

䊒䊨䉫䊤䊚䊮䉫

䉴䉺䉟䊦

ታⴕ䊒䊤䊮 䊒䊨䉶䉴㑆ㅢା 䊂䊷䉺ಣℂ⸒⺆

㪤㪸㫇㪩㪼㪻㫌㪺㪼 㫄㪸㫇㪆㫉㪼㪻㫌㪺㪼 ⺰ℂ䊒䊤䊮䈫

‛ℂ䊒䊤䊮䈲ห䈛

䊐䉜䉟䊦⚻↱ 㪧㫀㪾㩷㪣㪸㫋㫀㫅㪃㩷

㪟㫀㫍㪼

㪛㫉㫐㪸㪻 㪛㪘㪞㩷䈮䉋䉎ᨵエ

䈭䊪䊷䉪䊐䊨䊷

⸥ㅀ

⺰ℂ䊒䊤䊮䉕ᦨㆡൻ䈚䋬

‛ℂ䊒䊤䊮䉕ዉ಴

㩿ᄙᲑ 㫉㪼㪻㫌㪺㪼 ಣℂઁ㪀

㪫㪚㪧㪄㫇㫀㫇㪼㩷㪸㫅㪻㩷

㫀㫅㪄㫄㪼㫄㫆㫉㫐㩷㪝㪠㪝㪦㪃

䊐䉜䉟䊦⚻↱

㪣㪠㪥㪨

Copyright 2011 NTT Corporation

MapReduce 䈱㜞ㅦൻ䈱⎇ⓥേะ

㪈㪏

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

(10)

Copyright 2011 NTT Corporation

ᮡḰ஍Ꮕ

ᐔဋ୯

䊂䊷䉺ઙᢙ

଀1: ᮡḰ஍Ꮕ䈱⸘▚

㪈㪐

䈬䈉䉇䈦䈩 MapReduce ਄䈮ታⵝ䈜䉏䈳⦟䈇䈎䋿

where ȝ is the mean value of

X = x 1 ,…,x i ,…, x N = ab = a ы b

2. 䉟䊮䉪䊥䊜䊮䉺䊦䈭ᣇᴺ: 1࿁䈱 iteration

(summation form, Ḱหဳ)

Calculating variance and mean with MapReduce (Python)

http://blog.cordiner.net/?p=764

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

1. ⋥ⷰ⊛䈭ᣇᴺ: 2࿁䈱 iteration

• 1

st

iteration: ȝ = sum/count 䈫 N (count) 䉕⸘▚

• 2

nd

iteration: ȝ 䉕૶䈦䈩ቯ⟵ᑼ䈮ᓥ䈇 ı 䉕⸘▚

Copyright 2011 NTT Corporation

଀1: ᮡḰ஍Ꮕ䈱⸘▚

㪉㪇

ı 0 ȝ 0 , n 0

ıȝQRI;

ı i , ȝ i , n i

ı N , ȝ N , n N

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

(11)

Copyright 2011 NTT Corporation

Summary form 䈱ᔕ↪

㪉㪈

䉳䊢䉟䊮Ṷ▚

䉟䊮䉪䊥䊜䊮䉺䊦䊎䊠䊷ᦝᣂ [Gupta et al., IEEE 1995]

R ڇ (S1 ыS2) = R ڇ S1 ы R ڇ S2

ᯏળቇ⠌ [Chu et al, NIPS 2006]

10 ⒳ 㘃 䈱 ઍ ⴫ ⊛ 䈭 ᯏ ળ ቇ ⠌ 䈲 (naive bayse, k-

means, SVM,…) summation form 䈪⴫⃻น⢻

summation form 䈲 MapReduce 䈻䈱ᄌ឵䈏ኈᤃ

⹏ଔታ㛎䈱⚿ᨐ䉮䉝ᢙ䈮⦟䈒䉴䉬䊷䊦䈜䉎⚿ᨐ䈏ᓧ䉌䉏䈢

Copyright 2011 NTT Corporation

଀ 2: PageRank

㪉㪉

ૉ䈚䋬 A 䈲web䉫䊤䊐䈱㓞ធⴕ೉, c 䈲䊤䊮䉻䊛䉳䊞䊮䊒䈱⏕₸䋬

u 䈲䊤䊮䉻䊛䈭䊕䊷䉳䈱䊔䉪䊃䊦

page p 䈱 PageRank v 䈲೑↪⠪䈮䉋䈦䈩 p 䈏䉝䉪䉶䉴䈘䉏䉎⏕₸䋮

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

಴ዷ: The architecture of complexity, ASIS Keynote 2006

(12)

Copyright 2011 NTT Corporation

଀ 2: PageRank

㪉㪊

ૉ䈚䋬 A 䈲web䉫䊤䊐䈱㓞ធⴕ೉, c 䈲䊤䊮䉻䊛䉳䊞䊮䊒䈱⏕₸䋬

u 䈲䊤䊮䉻䊛䈭䊕䊷䉳䈱䊔䉪䊃䊦

v 䈲䊤䊮䉻䊛䉰䊷䊐䉜䊷䊝䊂䊦䈮䉋䉎෼᧤୯䈫䈚䈩ᓧ䉌䉏䉎

» »

»

¼

º

« «

«

¬

ª

v ( 1  c )

» »

»

¼

º

« «

«

¬

ª

A 

» »

»

¼

º

« «

«

¬

ª

v c

» »

»

¼

º

« «

«

¬

ª

u

» »

»

¼

º

« «

«

¬

ª

v

» »

»

¼

º

« «

«

¬

ª

v

» »

»

¼

º

« «

«

¬

ª

v

page p 䈱 PageRank v 䈲೑↪⠪䈮䉋䈦䈩 p 䈏䉝䉪䉶䉴䈘䉏䉎⏕₸䋮

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

Copyright 2011 NTT Corporation

଀ 2: PageRank

㪉㪋

2. 䉅䈦䈫㜞ㅦ䈮ಣℂ䈜䉎䈮䈲䋿

iteration ᢙ䈱೥ᷫ: ⶄᢙ iteration ĺ 1 MR job

•PEGASUS [Kang et al., ICDM 2009]

㓞ធⴕ೉ A 䈍䉋䈶䊔䉪䊃䊦 v 䉕䊑䊨䉾䉪ൻ

ⶄᢙ job 䈮䈍䈇䈩ᄌൻ䈚䈭䈇䊂䊷䉺䉕䉨䊞䉾䉲䊠䈜䉎

•Ha䌌oop [Bu et al, PVLDB 2010]

1. ⋥ⷰ⊛䈭ᣇᴺ: 1 iteration ĺ 1 MapReduce job

• ෼᧤䈜䉎䉁䈪䋬䈅䉎䈇䈲৻ቯ࿁ᢙ䈪⸘▚䉕஗ᱛ

䈬䈉䉇䈦䈩 MapReduce ਄䈮ታⵝ䈜䉏䈳⦟䈇䈎䋿

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

ૉ䈚䋬 A 䈲web䉫䊤䊐䈱㓞ធⴕ೉, c 䈲䊤䊮䉻䊛䉳䊞䊮䊒䈱⏕₸䋬

u 䈲䊤䊮䉻䊛䈭䊕䊷䉳䈱䊔䉪䊃䊦

(13)

Copyright 2011 NTT Corporation

HaLoop [Bu et al, PVLDB 2010]

㪉㪌

䉨䊞䉾䉲䊠+䉟䊮䊂䉾䉪䉴ൻ䈮䉋䉎ᄙᲑ iteration 㜞ㅦൻ

mapper input cache: HDFS read ㊂䉕೥ᷫ

k-means: ౉ജ䊂䊷䉺䉕ో mapper node 䈮䉨䊞䉾䉲䊠

reducer input cache: shuffle ㊂䉕೥ᷫ

PageRank: web㓞ធⴕ೉䈱ฦ partition 䉕䋬ᜂᒰ reducer

node 䈮䉨䊞䉾䉲䊠 + 䉟䊮䊂䉾䉪䉴᭴▽

Copyright 2011 NTT Corporation

MapReduce 䈮㑐䈜䉎⎇ⓥ䈱േะ (2/2)

㪉㪐

㜞ㅦൻ

5. 䉳䊢䉟䊮ಣℂ:

•[Dittrich et al., PVLDB 2010]: co-partitioning join tables

•[Afrati et al., EDBT 2010]: shuffle 䉮䉴䊃ᦨㆡൻ

6. หᦼಣℂ䈱࿁ㆱ:

•MapReduce online [Condie et al., NSDI 2010]

7. 䉴䉬䉳䊠䊷䊤: LATE [Zaharia et al., OSDI 2008]

8. ᦨᣂ䊊䊷䊄䉡䉢䉝: Mars [He et al., PACT 2008]: GPUs

ಽᢔಣℂ䊝䊂䊦

• Pregel [Malewicz, SIGMOD 2010]: 䉫䊤䊐䊝䊂䊦

• Dryad [Isard et al., EuroSys 2007]: 䊂䊷䉺䊐䊨䊷䊝䊂䊦

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

(14)

Copyright 2011 NTT Corporation

⎇ⓥേะ䉁䈪䈱䉁䈫䉄

㪊㪇

MapReduce

䉴䉬䊷䊤䊑䊦 (1ਁบ, 䊕䉺䊋䉟䊃䉴䉬䊷䊦)

䉲䊮䊒䊦䈭API (map/reduce)

⎇ⓥ䈏⋓䉖䈮ⴕ䉒䉏䈩䈇䉎

䊂䊷䉺䊔䊷䉴♽ળ⼏ (SIGMOD, VLDB,…)

OS♽ળ⼏ (OSDI, SOSP, NSDI…)

NL/䊙䉟䊆䊮䉫♽ળ⼏ (NIPS, ICDM…)

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

SOCC

Copyright 2011 NTT Corporation

MapReduce 㜞 㜞ㅦൻ䈱ข䉍⚵䉂

1. PJoin:

Efficient join processing with MapReduce

for OLAP applications

㪊㪈

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

(15)

Copyright 2011 NTT Corporation

㪧㪡㫆㫀㫅 㩿㫇㫉㪼㪄㫇㪸㫉㫋㫀㫋㫀㫆㫅㪄㪹㪸㫊㪼㪻 㫁㫆㫀㫅㪀

㪲೔㆐ὐ㪴 ᄙᰴర䊂䊷䉺ಽᨆ㩿㪦㪣㪘㪧㪀ಣℂ䈮䈍䈇䈩䋬

䉲䊞䉾䊐䊦㊂䉕 㪈㪆㪊 䈮೥ᷫ䈜䉎䈖䈫䈪ಣℂᤨ㑆䉕䋬ᓥ

᧪ᛛⴚ䉋䉍 㪋㪉㪅㪏㩼 㜞ㅦൻ

㪲㪲ᚢ⇛㪴 ⶄᢙ䈱ಽᨆಣℂ䈮䈍䈇䈩౒ㅢ䈜䉎䉲䊞䉾䊐

䊦ಣℂ䉕೨䉅䈦䈩ታⴕ(੐೨ಣℂ)䈜䉎䈖䈫䈪䋬ಽᨆಣ

ℂᤨ䈱䉮䉴䊃䉕೥ᷫ

㪊㪉

Copyright 2011 NTT Corporation

⢛᥊: ᄙᰴర䊂䊷䉺ಽᨆ(OLAP)䈫䈲䋿

㪊㪊

⛔⸘⊛䈭ಽᨆಣℂ䈱ౖဳ⊛䈭ᚻᴺ

ጁᱧ䊶䊃䊤䊮䉱䉪䉲䊢䊮䊂䊷䉺䉕ᩰ⚊䈜䉎 㪽㪸㪺㫋䊁䊷䊑䊦㩿ᄢ㪀

㪽㪸㪺㫋 䊁䊷䊑䊦䉕ᄙⷺ⊛䈭ᰴర䈪㓸⚂Ṷ▚䉕ታᣉ䈜䉎䈢䉄

䈱䋬ⶄᢙ䈱 㪻㫀㫄㪼㫅㫊㫀㫆㫅 䊁䊷䊑䊦

PARTKEY NAME MFGR BRAND TYPE SIZE CONTAINER

COMMENT RETAILPRICE

PARTKEY SUPPKEY AVAILQTY SUPPLYCOST COMMENT

SUPPKEY NAME ADDRESS NATIONKEY PHONE ACCTBAL COMMENT

ORDERKEY PARTKEY SUPPKEY LINENUMBER

RETURNFLAG LINESTATUS SHIPDATE COMMITDATE RECEIPTDATE SHIPINSTRUCT SHIPMODE COMMENT

CUSTKEY ORDERSTATUS TOTALPRICE ORDERDATE ORDER- PRIORITY

SHIP- PRIORITY CLERK

COMMENT CUSTKEY

NAME ADDRESS

PHONE ACCTBAL MKTSEGMENT COMMENT PART (P_)

SF*200,000

PARTSUPP (PS_) SF*800,000

LINEITEM (L_) SF*6,000,000

ORDERS (O_) SF*1,500,000

CUSTOMER (C_) SF*150,000

SUPPLIER (S_) SF*10,000

ORDERKEY

NATIONKEY

EXTENDEDPRICE DISCOUNT TAX QUANTITY

NATIONKEY NAME REGIONKEY

NATION (N_) 25

COMMENT

REGIONKEY NAME COMMENT

REGION (R_) 5

䉴䉺䊷ဳ䉴䉨䊷䊙 (TPC-H) 䉪䉣䊥 9 (TPC-H)

㓸⚂Ṷ▚

ᄙ䊁䊷䊑䊦⚿วಣℂ

䉫䊦䊷䊒ൻ

(16)

Copyright 2011 NTT Corporation

䊊䉾䉲䊠⚿ว䈮䉋䉎ᣇᴺ:

• 䉲䊞䉾䊐䊦㊂䈮⿠࿃䈜䉎ㅢା䉮䉴䊃䊶IO䉮䉴䊃䈏ᄢ

ㅢା䉮䉴䊃䊶IO䉮䉴䊃䈱೥ᷫ䈏ᦨ㊀ⷐ⺖㗴

⢛᥊: ᓥ᧪䈱 MapReduce ⚿วಣℂ䈱⺖㗴

㪊㪋

orders 1

hash(x)

mapper

lineitem n

lineitem 1

Join

reducer

hash(x)

Join

orders n

Join

䉲䊞䉾䊐䊦㊂䈏ᄢ

hash(y)

hash(y)

⵾ຠන૏䈱

ᄁ䉍਄䈕䊨䉫

વ␿

Copyright 2011 NTT Corporation

[ᣇ ᣇ㊎] 䊁䊷䊑䊦⚿วಣℂᤨ䈱䉲䊞䉾䊐䊦㊂䉕೥ᷫ䈜䉎

[೨ឭ] OLAPಽᨆ䈪䈲䋬ᦝᣂ䉋䉍䉅ෳᾖಣℂ䈱ᕈ⢻䈏㊀ⷐ

1ኻᄙ䈱㑐ଥ䈪䊁䊷䊑䊦⚿วಣℂ䈜䉎

[ᚢ⇛]

ⶄᢙ䈱ಽᨆಣℂ䈮䈍䈇䈩౒ㅢ⊛䈭䉲䊞䉾䊐䊦ಣℂ䉕

੐೨ಣℂ䈜䉎䈖䈫䈪䋬ಽᨆಣℂᤨ䈱䉮䉴䊃䉕೥ᷫ

• ⚿ว᧦ઙ㩿ਥ䉨䊷㪀䈮䉋䉎䊁䊷䊑䊦䈱੐ ੐೨䉲䊞䉾䊐䊦

• Ḱ⚿ว䈱ᵴ↪ 㪂 ਛ㑆䊂䊷䉺䈱੐ ੐೨↢ᚑ

ⶄᢙMapReduce 㑆䈪䉲䊞䉾䊐䊦㊂䉕೥ᷫ䈜䉎⚿ว⸘↹

[ലᨐ]

ಽᨆಣℂᤨ䈱䉲䊞䉾䊐䊦㊂䉕೥ᷫ

N䊁䊷䊑䊦⚿วṶ▚䈮䉋䉍MapReduce jobᢙ䉕೥ᷫ

PJoin 䈱᭎ⷐ

㪊㪌

(17)

Copyright 2011 NTT Corporation

㪊㪍

Query execution

lineitem a

orders processing

+

Ḱ⚿ว

mapper

lineitem_

orders n

orders n

lineitem_

orders 1

orders 1

orders processing

+

Ḱ⚿ว

Joining with

liteitem

reducer

Joining with

liteitem

lineitem z

䊁䊷䊑䊦䈱੐೨䉲䊞䉾䊐䊦ታⴕ䋬Ḱ⚿วਛ㑆䊂䊷䉺䈱੐೨↢ᚑ

mapper 䈪Ḱ⚿วಣℂᓟ䈮䋬reducer 䈪ᱷಣℂ䉕ታⴕ

Pre-computation

lineitem

orders

hash(x)

hash(y)

lineitem b

lineitem a

lineitem z

orders 1

orders n

lineitem_

orders n

lineitem_

orders 1

hash(y)

lineitem primary key &

foreign key (orders primary key)

DFS read

shuffle

વ␿

⵾ຠන૏䈱

ᄁ䉍਄䈕䊨䉫

Copyright 2011 NTT Corporation

ᓥ᧪ᚻᴺ䈫 PJoin 䈱Ყセ

㪊㪎

PJoin

lineitem a

orders processing

+

Ḱ⚿ว

mapper

lineitem_

orders n

orders n

lineitem_

orders 1

orders 1

orders processing

+

Ḱ⚿ว

Joining with

liteitem

reducer

Joining with

liteitem

lineitem z

䉲䊞䉾䊐䊦㊂䈏೥ᷫ

䉲䊞䉾䊐䊦㊂䈏೥ᷫ

DFS read

shuffle

orders 1

hash(x)

mapper

lineitem z

lineitem a

Join

reducer

hash(x)

Join

orders n

Join

䉲䊞䉾䊐䊦㊂䈏ᄢ

hash(y)

hash(y)

ᓥ᧪ᚻᴺ: 䊊䉾䉲䊠⚿ว

DFS read 䈏Ⴧട

(18)

Copyright 2011 NTT Corporation

PJoin 䈱․ᓽ㽳: N䊁䊷䊑䊦⚿ว

㪊㪏

PJoin

䉴䉺䊷ဳ䉴䉨䊷䊙䈱႐ว

ⶄᢙ mapper 䉕ታⴕ

ో mapper ⚿ᨐ䈲 S 䈱ਥ䉨䊷䈪䉲䊞䉾䊐䊦

reducer 䈲䋱䈧䈪ಣℂน⢻

mapper (T 䈱ਥ䉨䊷䈪⚿วಣℂ) reducer (S 䈱ਥ䉨䊷䈪⚿วಣℂ)

T

one many

S

T 1

one many

S

T N

one many

T 䈱ਥ䉨䊷䈪⚿วಣℂ

N䊁䊷䊑䊦⚿ว䈏 1䈧䈱

MapReduce job 䈪ታ⃻น⢻

⵾ຠන૏䈱

ᄁ䉍਄䈕䊨䉫

વ␿

⵾ຠන૏䈱

ᄁ䉍਄䈕䊨䉫

વ␿

ዊᄁᐫ

Copyright 2011 NTT Corporation

PJoin 䈱․ᓽ㽴: 䉴䉺䊷ဳ䉪䉣䊥⚿ว⸘↹

㪊㪐

ᄙᰴర䊂䊷䉺ಽᨆ䈱䉴䉺䊷ဳ䉴䉨䊷䊙

dimension 䊁䊷䊑䊦䈎䉌ಣℂ䉕㐿ᆎ䈚䋬fact 䊁䊷䊑䊦䉕ᦨ

ᓟ䈮ታⴕ䈜䉎䈖䈫䈪䋬ਛ㑆䊂䊷䉺䉕೥ᷫ䈜䉎

㫃㫀㫅㪼㫀㫋㪼㫄

㫆㫉㪻㪼㫉㫊

㫇㪸㫉㫋㫊㫌㫇㫇 㫇㪸㫉㫋

㫊㫌㫇㫇㫃㫀㪼㫉

㫅㪸㫋㫀㫆㫅

*

*

* *

*

㫃㫀㫅㪼㫀㫋㪼㫄㪃

㫆㫉㪻㪼㫉㫊㪃㩷

㩻㪡㪉

㫇㪸㫉㫋㫊㫌㫇㫇㪃㩷

㫇㪸㫉㫋㪃㩷

㩻㪡㪈

㫊㫌㫇㫇㫃㫀㪼㫉㪃

㫅㪸㫋㫀㫆㫅

J1

J2

J3

Q9 䈪䉝䉪䉶䉴䈜䉎䊁䊷䊑䊦᭴ᚑ Q9 ⚿ว⸘↹

J1

J2

J3 fact 䊁䊷䊑䊦䈱⚿ว䉕ᦨᓟ䈮ታⴕ

(19)

Copyright 2011 NTT Corporation

⹏ଔⅣႺ

Linux (CentOS5.4, 2.8GHz, ਥ⸥ᙘ 8GB)㬍50บ

Java1.6, Hadoop 0.19.2

䊔䊮䉼䊙䊷䉪: TPC-H 䊔䊮䉼䊙䊷䉪

[䊂䊷䉺] 104GB, 207GB, 311GB

Ḱ⚿วਛ㑆䊂䊷䉺: 83GB, 167GB, 250GB

[ಽᨆ䉪䉣䊥] TPC-H䈮䈍䈔䉎⚿วṶ▚䉕೑↪䈜䉎 17䉪䉣䊥

⹏ଔⷰὐ: PJoin 䈫ᓥ᧪䈱 Join ᚻᴺ䈱Ყセ

• ో૕: PJoin 䈫䋬ᓥ᧪䈱2䈧䈱 Join ᚻᴺ䈱ᔕ╵ᕈ⢻Ყセ

• PJoin vs reduce-side join (䉴䉺䊷ဳ⚿ว⸘↹, Hive ⸘↹)

• ⹦⚦

• PJoin 䈮䉋䉎 䉲䊞䉾䊐䊦㊂䋬HDFS read/write ㊂䈱ᓇ㗀

• 䉴䉬䊷䊦ᕈ⹏ଔ

㪋㪇

Copyright 2011 NTT Corporation

ᔕ╵ᤨ㑆 (104GB, 50บ)

ᓥ᧪ᚻᴺ䉋䉍33.4% (star planᲧ), 42.8% (Hive planᲧ)

ᡷༀ䈱ലᨐ: MapReduce job ᢙ೥ᷫ 䋬HDFS+䉲䊞䉾䊐䊦㊂೥ᷫ

ᕈ⢻ = {(HDFS read) + (HDFS write) + 2㬍(shuffle)}㬍ଥᢙ

㪋㪈

0

1

2

3

4

5

6

re sponse tim e (m in)

PJoin

reduce-side join (star plan)

reduce-side join (Hive plan)

(20)

Copyright 2011 NTT Corporation

䉲䊞䉾䊐䊦㊂ (104GB, 50บ)

ᓥ᧪ᚻᴺ䉋䉍62.6% (star plan), 62.2% (Hive plan)ᡷༀ

Q18 䈲ᖡൻ: WHERE᧦ઙ䈱ㆬᛯലᨐ䈱ૐਅ䈏ේ࿃

㪋㪉

0

5

10

15

20

25

30

35

40

45

Shuffle (GB)

PJoin

reduce-side join (star plan)

reduce-side join (Hive plan)

Copyright 2011 NTT Corporation

HDFS read㊂ (104GB, 50บ)

ᓥ᧪ᚻᴺ䉋䉍51.4% (star plan), 52.2 % (Hive plan)Ⴧട

PJoin 䈪䈲Ḱ⚿วਛ㑆䊂䊷䉺䉕ෳᾖ䈜䉎䈢䉄

㪋㪊

0

10

20

30

40

50

60

HDFS Read (GB)

PJoin

reduce-side join (star plan)

reduce-side join (Hive plan)

(21)

Copyright 2011 NTT Corporation

䊂䊷䉺㊂: PJoin 䈱ᣇ䈏䉴䉬䊷䊦ᕈ䈏⧯ᐓఝ䉏䉎

䊙䉲䊮ᢙ: PJoin䉅ᓥ᧪ᚻᴺ䉅บᢙലᨐ䈏಴䈩䈇䉎

㪋㪋

0

10

20

30

40

50

60

70

㪇 㪈㪇㪇 㪉㪇㪇 㪊㪇㪇 㪋㪇㪇

re sponse tim e (m in)

䊂䊷䉺㊂ GB (50บ)

PJoin

reduce-side join (star plan)

0

50

100

150

200

250

300

350

400

㪇 㪉㪇 㪋㪇 㪍㪇

re sponse tim e (m in)

䊙䉲䊮บᢙ (䊂䊷䉺㊂ 311GB)

PJoin

reduce-side join (star plan)

Copyright 2011 NTT Corporation

㪋㪌

PJoin 䉁䈫䉄

․ᓽ:

䉲䊞䉾䊐䊦ಣℂ䈱੐೨ታⴕ(pre-partitioning)䋬Ḱ⚿

วਛ㑆䊂䊷䉺䈱੐೨↢ᚑ

Ḱ ⚿ ว 䉕 mapper 䈪 ታ ⴕ 䋬 ᱷ 䉍 䈱 ⚿ ว ಣ ℂ 䉕

reducer 䈪ታⴕ

ലᨐ:

TPC-H 䈮䈍䈇䈩 30-40%ᔕ╵ᕈ⢻䉕ᡷༀ

䉲䊞䉾䊐䊦㊂䈲1/3 䈮೥ᷫ

੹ᓟ䈱⺖㗴:

ᦨㆡ䊒䊤䊮ㆬᛯ

MapReduce 䈱⿷࿁䉍䈱⺖㗴 (HDFS read/write)

back

(22)

Copyright 2011 NTT Corporation

⎇ⓥ䈱ᣇะᕈ

㪋㪍

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

Copyright 2011 NTT Corporation

⎇ⓥ䈱ᣇะᕈ

㪋㪎

MapReduce 䈱䊜䊥䉾䊃

ૐ䉮䉴䊃䈪◲න䈮(ᄢⷙᮨ䈭)ಽᢔಣℂ䈏น⢻

Hadoop 䉮䊚䊠䊆䊁䉞䈮ෳട䈪䈐䉎

Amazon 䉕૶䈋䈳ᄢⷙᮨⅣႺ䉅೑↪䈚䉇䈜䈇

䉪䊤䉡䊄♽⎇ⓥ䈱㔍䈚䈘

Hadoop 䈲1000บⷙᮨ䈓䉌䈇䉁䈪䈲䉴䉬䊷䊦䈜䉎

ᄢⷙᮨ䈭䈾䈬䉴䉬䊷䊦䊜䊥䉾䊃䈏↢䈐䈩䈒䉎

ᄢⷙᮨ䈭ⅣႺ䈪䈱ᛛⴚ⊛䈭⺖㗴䈮ᚻ䈏ዯ䈎䈭䈇

• The Next Generation of Apache Hadoop MapReduce

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

(23)

Copyright 2011 NTT Corporation

⎇ⓥ䈱ᣇะᕈ

㪋㪏

⎇ⓥ䈱ᣇะᕈ

MR䈱ᡷༀ䈱૛࿾䈏䈅䉍DBᛛⴚ䈱ㆡ↪䈲⛯䈒

• ข䉍⚵䉂䉇䈜䈇䈏ᄢ䈐䈇⽸₂䈲㔍䈚䈇

MapReduce ⁛⥄䈱ⷰὐ

• shuffle ㊂䈱೥ᷫ䋬ᄙᲑ job ಣℂ

• ᛩᯏታⴕ䋬㓚ኂ࿁ᓳ

ಽᢔಣℂ䈮䈍䈔䉎↢↥ᕈะ਄

• 䊄䊜䉟䊮․ൻဳ䈱ಽᢔ䊝䊂䊦 (Pregel, Parcolator)

• SQL䉇㑐ᢙဳ⸒⺆䈱ᦨㆡൻᛛⴚ䉕ᵴ↪

• ት⸒⊛⸒⺆䈎䉌ታⴕ䊒䊤䊮䉕ዉ಴

• 㑐ᢙဳ⸒⺆: ⚵ൻ䋬Ⲣว

Copyright 2011 NTT Corporation

䈍䉒䉍䈮

㪋㪐

Hadoop䈱OSS䉕䊔䊷䉴䈮ข䉍⚵䉄䉎䊜䊥䉾䊃

䉮䊚䊠䊆䊁䉞䈻䈱⽸₂

ᄙ䈒䈱೑↪⠪䈲 Hadoop䈏ਁ⢻䈣䈫⠨䈋䈩䈇䉎

PBⷙᮨ䈱䊂䊷䉺䈲䊙䉲䊮䈏ᄙ䈒䈩䉅ኈᤃ䈪䈲䈭䈇

NTT ⎇ⓥᚲ䈪䈱ข䉍⚵䉂

R&D 䉪䊤䉡䊄 1500บⅣႺ

ᔕ↪

ᄢⷙᮨ web䊂䊷䉺, 䉪䉣䊥䊨䉫䋬䉝䉪䉶䉴䊨䉫

䊃䊤䊐䉞䉾䉪䊂䊷䉺⸃ᨆ

㖸ᭉ≮ᴺ䊂䊷䉺䊶䊙䉟䊆䊮䉫

㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ

⥝๧䈱䈅䉎ᣇ䋬ᤚ㕖⼏⺰䈚䉁䈚䉊䈉䋣

(24)

Copyright 2011 NTT Corporation

ᢥ₂

[Abouzeid et al., PVLDB 2009] HadoopDB: An Architectural Hybrid of MapReduce and DBMS

Technologies for Analytical Workloads, A. Abouzeid et al., In PVLDB 2(1), 2009

[Afrati et al., EDBT 2010] Optimizing joins in a map-reduce environment, F. N. Afrati, J. D. Ullman, In

EDBT 2010

[Agrawal et al., VLDB 2010] Big Data and Cloud Computing:

New Wine or just New Bottles?, D. Agrawal, S. Das, A. Abbadi, In VLDB tutorial 2010

[Blanas et al., SIGMOD 2010] A Comparison of Join Algorithms for Log Processing in MapReduce, S.

Blanas, J.M. Patel, V.Ercegovac, J. Rao, In SIGMOD 2010

[Bu et al., PVLDB 2010] HaLoop: Efficient Iterative Data Processing on Large Clusters, S. Blanas, J.M.

Patel, V.Ercegovac, J. Rao, In SIGMOD 2010

[Chen, PVLDB 2010] Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce, S.

Chen, In PVLDB 3(2), 2010

[Chu et al., NIPS 2006] Map-reduce for machine learning on multicore, C. Chu, S. Kim, Y. Lin, Y. Yu, G.

Bradski, A. Ng, K. Olukotun, In NIPS 2006

[Condie et al., NSDI 2010] MapReduce Online, T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, In

NSDI2010

[Dean et al., ODSI 2004] MapReduce: Simplified Data Processing on Large Clusters, J. Dean, S.

Ghemawat, In OSDI 2004

[Dean et al., CACM 2008] MapReduce: Simplified Data Processing on Large Clusters, J. Dean, S.

Ghemawat, In CACM Jan 2008

[Dean et al., CACM 2010] MapReduce: a flexible data processing tool, J. Dean, S. Ghemawat, In CACM

Jan 2010

[DeWitt, 2009] Mapreduce: A major step backwards, D. DeWitt, M. Stonebraker, The Database column

2009

㪌㪇

Copyright 2011 NTT Corporation

ᢥ₂

[Dittrich et al., PVLDB 2010] Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even

Noticing), J. Dittrich, J. Quiane-Ruiz, A. Jindal, Y. Kargin, V. Setty. J. Schad, In PVLDB 3(1), 2010

[Gupta et al., IEEE 1995] Maintenance of materialized views: Problems, techniques and applications, UA.

Gupta, I.S. Mumick, IEEE Data Eng. 18(2) 1995

[Isard et al., EuroSys 2007] Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks,

M. Budiu, Y. Yu, A. Birrell, D.Fetterly, In EuroSys 2007

[He et al., PACT 2008] Mars: A MapReduce framework on graphics processors, B. He, W. Fang, Q. Luo,

N.K. Govindaraju, T. Wang, In PACT 2008

[Kang et al., ICDM 2009] PEGASUS: A Peta-Scale Graph Mining System Implementation and

Observations, U. Kang, C. Tsourakakis, C. Faloutsos, In ICDM 2009

[Lang et al., PVLDB 2010] Energy Management for MapReduce Clusters, W. Lang, J. M. Patel, In VLDB

3(1), 2010

[Lin et al., 2010] Data intensive text processing with MapReduce, J. Lin and C. Dyer, Morgan & Claypool

2010

[Malewicz, SIGMOD 2010] Pregel: a system for large-scale graph processing, G. Malewicz, M. Austern, A.

Bik, J. Dehnert, I. Horn, N. Leiser, G. Czajkowski, In SIGMOD 2010

[Nykiel et al., PVLDB 2010] MRShare: Sharing across multiple queries in MapReduce, T. Nykiel, M.

Potamias, C. Mishra, G. Kollios, N. Kodas, In VLDB 3(1), 2010

[Panda et al., PVLDB 2009] PLANET: Massively parallel learning of tree ensembles with MapReduce, B.

Panda, J.S. Herbach, S. Basu, R.J. Bayardo, In PVLDB 2(2), 2009

[Zaharia et al. OSDI 2008] Improving MapReduce performance in heterogeneous environments, M.

Zaharia, A. Konwinski, A. Joseph, In OSDI 2008

㪌㪈

参照

関連したドキュメント

High quality fuel (exclude coal) ratio in the whole city energy consumption : 45.4.. % in 2000 to 76%

Block copolymers containing cate- chol moieties have also been synthesized reversible addition-fragmentation transfer (RAFT) polymer- ization of 3,4-dihydroxy styrene and

(5) As explained in Note 17 to the accompanying consolidated financial statements, expenses and/or losses for scrapping Fukushima Nuclear Power Station Units 1 through 4

Due to suspension of this power panel, transfer of accumulated water at Unit 2 turbine building vertical shaft to the Centralized Radiation Waste Treatment Facility (Process

- At 9:46 am, August 30, we stopped transfer of accumulated water from the basement of turbine building of unit 3 to Centralized Radiation Waste Treatment Facility (Process

[r]

䇭䊶㪥㪢⸽ᦠ⊒ⴕ䈮ᔅⷐ䈭ᦠ㘃䈱 㩷㩷㩷㩷ឭଏ 䇭䊶㪡㪞ឭ಴㪥㪢䊧䊘䊷䊃䊄䊤䊐䊃 㩷㩷㩷㩷૞ᚑଐ㗬 㩷㩷㩷䋨᭎䈰䊐䊤䉾䉫䊋䉾䉪䈱䋱ㅳ㑆

䋤䋱㪩㪆㪙 䋤䋱㪫㪆㪙 䋤䋲㪩㪆㪙 䋤䋲㪫㪆㪙 䋤䋳㪩㪆㪙 䋤䋳㪫㪆㪙 䋤䋴㪩㪆㪙 䋤䋴㪫㪆㪙 䋤䋵㪩㪆㪙 䋤䋵㪫㪆㪙 䋤䋶㪩㪆㪙 䋤䋶㪫㪆㪙 䋤䋷㪩㪆㪙 䋤䋷㪫㪆㪙