Copyright 2011 NTT Corporation
-- MapReduce ✬ --
1
㝩Ⴆ ⌀
ਥᐙ⎇ⓥຬ㩿․⎇ⓥຬ㪀
㪥㪫㪫䉰䉟䊋䊷䉴䊕䊷䉴⎇ⓥᚲ
Copyright 2011 NTT Corporation
⋡ᰴ
MapReduce 䈫䈲䋿
MapReduce 䈱䊂䉱䉟䊮䊌䉺䊷䊮
MapReduce 㜞ㅦൻ䈱⎇ⓥേะ
ᮡḰᏅ / PageRank䈱⸘▚
MapReduce 䈱㑐ㅪᛛⴚ䈱⎇ⓥേะ
MapReduce 㜞ㅦൻ䈱ข䉍⚵䉂
Map Multi-Reduce: reduce 䈱೨ታⴕ
PJoin: ೨䊌䊷䊁䉞䉲䊢䊮ಽഀ䈫Ḱ⚿ว䈱↪
ⶄᢙಽᨆಣℂ䈮䈍䈔䉎MapReduceᦨㆡൻ
⎇ⓥ䈱ᣇะᕈ
㪉
Copyright 2011 NTT Corporation
MapReduce 䈱ⷐ
㪊
㪤
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
skip
Copyright 2011 NTT Corporation
MapReduce 䈫䈲?
ಽᢔಣℂ䈱䊒䊨䉫䊤䊚䊮䉫䊝䊂䊦䋧䉲䉴䊁䊛
㜞䉴䉬䊷䊦ᕈ: 1ਁ䊉䊷䊄(䊙䉲䊮)䋬䊕䉺䊋䉟䊃䊂䊷䉺
䉲䊮䊒䊦䈭API: map㑐ᢙ/reduce㑐ᢙ
ಽᢔಣℂ․䈱ⶄ㔀䈘(⽶⩄ಽᢔ䊶น↪ᕈ)䈏㓝䈽䈇
Google 䈏㐿⊒
web䈱ᬌ⚝䉣䊮䉳䊮䈱䊋䉾䉪䉣䊮䊄䈱↪ㅜ䈱䈢䉄㐿⊒
PageRank ⸘▚䋬ォ⟎䊐䉜䉟䊦᭴▽
2008ᐕᤨὐ䈪 20PB/day 䉕 MapReduce䈪ಣℂ
[Dean et al., OSDI 2004, CACM Jan 2008, CACM Jan 2010]
Hadoop project 䈮䈍䈇䈩䉥䊷䊒䊮䉸䊷䉴ൻ
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
㪌
MapReduce 䈱ㆡ↪
n-gram 䈮䉋䉎Web 䉮䊮䊁䊮䉿ಽᨆ
ᣣᧄ⺆ 30ం䊕䊷䉳
web 䈪ᄙ䈇ᢥ┨䈲䈎䋿
ㄉᦠ: 㖸ჿ⼂䋬ᣣ⧷⠡⸶
Web 䈱䉪䉣䊥䊶䉝䉪䉶䉴䊨䉫
ᬌ⚝ㅪേᐢ๔ (ad-words)
↪⠪䈮ㆡ䈚䈢䉮䊮䊁䊮䉿䈲䋿
䊌䊷䉸䊅䊤䉟䉵ᬌ⚝
䊈䉾䊃䊪䊷䉪䊃䊤䊐䉞䉾䉪⸃ᨆ
40Gbps 䊈䉾䊃䊪䊷䉪
QoS ᓮ, 㓚ኂ⸃ᨆ
serve as the incoming 92
serve as the incubator 99
serve as the independent 794
…
4-gram
䊌䉬䉾䊃ㆃᑧ䋬䊌䉬䉾䊃ᰳ៊
“㈪” ĺᳯᚭಾሶ (edo kiriko)
ᬌ⚝䊪䊷䊄 ᬌ⚝ㅪേᐢ๔
Copyright 2011 NTT Corporation
MapReduce 䈱䉝䊷䉨䊁䉪䉼䊞
㪍
MapReduce: ಽᢔಣℂ䉲䉴䊁䊛
ಽᢔ䊐䉜䉟䊦䉲䉴䊁䊛 DFS 䈪᭴▽
DFS: GFS (Google FS), HDFS (Hadoop DFS)
2䊐䉢䊷䉵 + Shuffle
Map 䊐䉢䊷䉵: ജ䊧䉮䊷䊄Ფ䈮map㑐ᢙ䉕ታⴕ
䊘䉟䊮䊃: 䊙䉲䊮Ფ䈮⁛┙䈚䈢ಣℂ
Shuffle: ห৻key 䈱(key, value) ⟲䉕᧤䈰䉎
Reduce 䊐䉢䊷䉵: ᧤䈰䈢⚿ᨐ䈮 reduce 㑐ᢙ䉕ታⴕ
䊘䉟䊮䊃: ⶄᢙ䊙䉲䊮䈮䉁䈢䈏䉍ᔅⷐ䈭ಣℂ
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
MapReduce 䈱䉝䊷䉨䊁䉪䉼䊞(⛯䈐)
㪎
User
Program
worker
Input Data
fork fork
fork
Master
worker
worker
assign
map
assign
reduce
local
write
remote read,
sort
Output
File 0
Output
File 1
Split 1
Split 0
Split 2
Split 3
Split 4
read
worker
node
process
file
worker
worker
worker
Map 䉺䉴䉪 Reduce 䉺䉴䉪
Map 䉺䉴䉪 Map task Reduce task
shuffle
㪤
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
䊒䊨䉫䊤䊛: wordCount
ജ䊧䉮䊷䊄Ფ䈮 map㑐ᢙ䉕ㆡ↪䈜䉎
න⺆Ფ䈮 (word, 1) 䉕ജ䈜䉎
㪏
න⺆Ფ䈮㗫ᐲ䉕Ⓧ▚䈜䉎
[Dean et al., OSDI 2004]
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
MapReduce 䈱䊂䉱䉟䊮䊌䉺䊷䊮
㪐
䈅䉎⒳䈱㜞ㅦൻ䈱䊁䉪䊆䉾䉪㓸 + ታ䉝䊒䊥
Data intensive text processing with MapReduce,
[Lin et al., 2010]
Local Aggregation
Pairs and Stripes
Computing Relative Frequencies
Secondary Sorting
Relational Joins
Copyright 2011 NTT Corporation
Computing Relative Frequencies
㪈㪇
᧦ઙઃ䈐⏕₸ P(X|C i )䉕⸘▚䈜䉎
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
May 26, 2011 Data Mining: Concepts and Techniques 11
Naïve Bayesian Classifier: Training Dataset
Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’
Data sample
X = (age <=30,
Income = medium,
Student = yes
Credit_rating = Fair)
age income studentcredit_rating_comp
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
May 26, 2011 Data Mining: Concepts and Techniques 12
Naïve Bayesian Classifier: An Example
P(C
i): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
Compute P(X|C
i) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
P(X|C
i) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|C
i)*P(C
i) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
Copyright 2011 NTT Corporation
Computing Relative Frequencies
㪈㪊
᧦ઙઃ䈐⏕₸ P(X | C i )䉕⸘▚䈜䉎
P(X | C i ) = P(Xŀ& i ) / P(C i )
䈬䈉䉇䈦䈩 MapReduce 䈮ታⵝ䈜䉏䈳⦟䈇䈎䋿
1. ⋥ⷰ⊛䈭ᣇᴺ: iteration 䉕2࿁ታⴕ䈜䉎
• 1 st iteration: P(C i )䉕⸘▚䈜䉎
• 2 nd iteration: P(Xŀ& i ) 䉕⸘▚䈚䈩 P(X | C i ) 䉕ᓧ䉎
2. 䊂䉱䉟䊮䊌䉺䊷䊮䈪䈱ᣇᴺ: iteration 1࿁䈪ታⴕ
• map 㑐ᢙ䈪 2⒳㘃䈱୯䉕 emit : P(C i )↪, P(Xŀ& i ) ↪
• C i 䉕䉨䊷䈫䈚䈩 partition ಽ䈔
• P(C i )䉕వ䈮⸘▚䋬ᰴ䈮 P(Xŀ& i ) 䉕⸘▚䋬 P(X | C i ) 䉕ᓧ䉎
P(C
1): 䉮䊮䊏䊠䊷䉺⾼䈜䉎⏕₸
P(age = “<=30” ŀ&
1): 30ᱦએਅ䈪⾼
Copyright 2011 NTT Corporation
Computing Relative Frequencies
㪈㪋
map (String key, String value){
// value is an input record
for each class in value:
EmitIntermediate(“_C1_”+class, 1);
for each attribute in value:
EmitIntermediate(“_C2_”+class+”:”+attribute, 1);
}
Private HashMap<Integer> PC = new HashMap<Integer>();
reduce (String key, Iterator values){
if prefix(key) == _C1_ then PC[getKey(key)] = compute P(C i );
else if prefix(key) == _C2_ then
compute P(Xŀ& i )
P(X | C i ) = P(Xŀ& i ) / PC[getKey(key)];
}
㪤
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
map 㑐ᢙ䈪 2⒳㘃䈱୯䉕 emit
map 㑐ᢙ䈪 2⒳㘃䈱୯䉕 emit
P(C i ) 䉕⸘▚ᓟ䈮P(Xŀ& i )䉕⸘▚
Copyright 2011 NTT Corporation
䊂䉱䉟䊮䊌䉺䊷䊮䈎䉌ቇ䈼䉎䈖䈫
㪈㪌
MapReduce 䈱䊊䉾䉨䊮䉫䊁䉪䊆䉾䉪
⁁ᴫ䈮ᔕ䈛䈩↪䈜䉎䈖䈫䈪ಣℂ䉕㜞ㅦൻน⢻
ታⵝ䊉䉡䊊䉡䉕⫾Ⓧ䈜䉎䈖䈫䈲㊀ⷐ
䈚䈎䈚䋬䊌䉵䊦䉂䈢䈇䈪ᤃ䈚䈒䈲䈭䈇
GoF 䈱䊂䉱䉟䊮䊌䉺䊷䊮䉅᥉ㅢ䈮䈲ᕁ䈇䈧䈎䈭䈇
ⶄ㔀䈭䊉䉡䊊䉡䉕↪⠪䈎䉌㓝䈽䈇ൻ䈚䈢䈇
䈅䉎⒳䈱ᦨㆡൻ䈮㑐䉒䉎⎇ⓥ⺖㗴
1. ᣂ䈢䈭䊂䉱䉟䊮䊌䉺䊷䊮䉕䈧䈔䉎
2. 䉲䉴䊁䊛䈏⥄േ⊛䈮ㆡಾ䈭䊂䉱䉟䊮䊌䉺䊷䊮䉕ㆬᛯ䈜䉎
㪤
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
ዋ䈚⺣: MapReduce 䈦䈩⚿ዪ䋿
㪈㪍
એਅ䈱ⷐ⚛䈎䉌䈭䉎䉲䉴䊁䊛
㑐ᢙဳ⸒⺆䈱䉟䊮䉺䊐䉢䊷䉴
• map (f) [r1,...,rn] = [f(r1),...., f(rn)]
• reduce (Ҷ) [r1,...,rn] = r1 Ҷ .... Ҷ rn
ಽᢔⅣႺ䈻䈱ㆡ↪
• map:䊙䉲䊮ౝ䊂䊷䉺ಣℂ䋬reduce:䊙䉲䊮㑆䊂䊷䉺ಣℂ
• ⺰ℂ䊒䊤䊮 = ‛ℂ䊒䊤䊮
• (key, value) + hash 䈮䉋䉎䉲䊞䉾䊐䊦ಣℂ
• 䉣䊤䊷ಣℂ䈭䈬䈱ಽᢔ․䈱ಣℂ䉕㓝䈽䈇
• 䊐䉜䉟䊦䉕䈚䈩䊒䊨䉶䉴㑆ㅢା (㜞ㅦ䈭䊥䉦䊋䊥)
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
ዋ䈚⺣: Dryad 䉅䈅䉎䉋䈰䋿
㪈㪎
Dryad: 䊙䉟䉪䊨䉸䊐䊃䈱ಽᢔಣℂ䉲䉴䊁䊛
MapReduce 䈱⺖㗴䉕⸃ᶖ䈚䈩䈇䉎
䈚䈎䈚䋬ᄙ᭽䈭ㇱຠ䉕⚵䉂ว䉒䈞䉎䊒䊨䉫䊤䊚䊮䉫䋬
‛ℂ䊒䊤䊮䈱ᕈ⢻䉼䊠䊷䊆䊮䉫䈏ⶄ㔀䈠䈉
Hadoop MapReduce 䈱䉮䊚䊠䊆䊁䉞ⷙᮨ䈱Ꮕ
䊒䊨䉫䊤䊚䊮䉫
䉴䉺䉟䊦
ታⴕ䊒䊤䊮 䊒䊨䉶䉴㑆ㅢା 䊂䊷䉺ಣℂ⸒⺆
㪤㪸㫇㪩㪼㪻㫌㪺㪼 㫄㪸㫇㪆㫉㪼㪻㫌㪺㪼 ⺰ℂ䊒䊤䊮䈫
‛ℂ䊒䊤䊮䈲ห䈛
䊐䉜䉟䊦⚻↱ 㪧㫀㪾㩷㪣㪸㫋㫀㫅㪃㩷
㪟㫀㫍㪼
㪛㫉㫐㪸㪻 㪛㪘㪞㩷䈮䉋䉎ᨵエ
䈭䊪䊷䉪䊐䊨䊷
⸥ㅀ
⺰ℂ䊒䊤䊮䉕ᦨㆡൻ䈚䋬
‛ℂ䊒䊤䊮䉕ዉ
㩿ᄙᲑ 㫉㪼㪻㫌㪺㪼 ಣℂઁ㪀
㪫㪚㪧㪄㫇㫀㫇㪼㩷㪸㫅㪻㩷
㫀㫅㪄㫄㪼㫄㫆㫉㫐㩷㪝㪠㪝㪦㪃
䊐䉜䉟䊦⚻↱
㪣㪠㪥㪨
Copyright 2011 NTT Corporation
MapReduce 䈱㜞ㅦൻ䈱⎇ⓥേะ
㪈㪏
㪤
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
ᮡḰᏅ
ᐔဋ୯
䊂䊷䉺ઙᢙ
1: ᮡḰᏅ䈱⸘▚
㪈㪐
䈬䈉䉇䈦䈩 MapReduce 䈮ታⵝ䈜䉏䈳⦟䈇䈎䋿
where ȝ is the mean value of
X = x 1 ,…,x i ,…, x N = ab = a ы b
2. 䉟䊮䉪䊥䊜䊮䉺䊦䈭ᣇᴺ: 1࿁䈱 iteration
(summation form, Ḱหဳ)
Calculating variance and mean with MapReduce (Python)
http://blog.cordiner.net/?p=764
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
1. ⋥ⷰ⊛䈭ᣇᴺ: 2࿁䈱 iteration
• 1
stiteration: ȝ = sum/count 䈫 N (count) 䉕⸘▚
• 2
nditeration: ȝ 䉕䈦䈩ቯ⟵ᑼ䈮ᓥ䈇 ı 䉕⸘▚
Copyright 2011 NTT Corporation
1: ᮡḰᏅ䈱⸘▚
㪉㪇
ı 0 ȝ 0 , n 0
ıȝQRI;
ı i , ȝ i , n i
ı N , ȝ N , n N
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
Summary form 䈱ᔕ↪
㪉㪈
䉳䊢䉟䊮Ṷ▚
䉟䊮䉪䊥䊜䊮䉺䊦䊎䊠䊷ᦝᣂ [Gupta et al., IEEE 1995]
R ڇ (S1 ыS2) = R ڇ S1 ы R ڇ S2
ᯏળቇ⠌ [Chu et al, NIPS 2006]
10 ⒳ 㘃 䈱 ઍ ⊛ 䈭 ᯏ ળ ቇ ⠌ 䈲 (naive bayse, k-
means, SVM,…) summation form 䈪น⢻
summation form 䈲 MapReduce 䈻䈱ᄌ឵䈏ኈᤃ
⹏ଔታ㛎䈱⚿ᨐ䉮䉝ᢙ䈮⦟䈒䉴䉬䊷䊦䈜䉎⚿ᨐ䈏ᓧ䉌䉏䈢
Copyright 2011 NTT Corporation
2: PageRank
㪉㪉
ૉ䈚䋬 A 䈲web䉫䊤䊐䈱㓞ធⴕ, c 䈲䊤䊮䉻䊛䉳䊞䊮䊒䈱⏕₸䋬
u 䈲䊤䊮䉻䊛䈭䊕䊷䉳䈱䊔䉪䊃䊦
page p 䈱 PageRank v 䈲↪⠪䈮䉋䈦䈩 p 䈏䉝䉪䉶䉴䈘䉏䉎⏕₸䋮
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
ዷ: The architecture of complexity, ASIS Keynote 2006
Copyright 2011 NTT Corporation
2: PageRank
㪉㪊
ૉ䈚䋬 A 䈲web䉫䊤䊐䈱㓞ធⴕ, c 䈲䊤䊮䉻䊛䉳䊞䊮䊒䈱⏕₸䋬
u 䈲䊤䊮䉻䊛䈭䊕䊷䉳䈱䊔䉪䊃䊦
v 䈲䊤䊮䉻䊛䉰䊷䊐䉜䊷䊝䊂䊦䈮䉋䉎᧤୯䈫䈚䈩ᓧ䉌䉏䉎
» »
»
¼
º
« «
«
¬
ª
v ( 1 c )
» »
»
¼
º
« «
«
¬
ª
A
» »
»
¼
º
« «
«
¬
ª
v c
» »
»
¼
º
« «
«
¬
ª
u
» »
»
¼
º
« «
«
¬
ª
v
» »
»
¼
º
« «
«
¬
ª
v
» »
»
¼
º
« «
«
¬
ª
v
page p 䈱 PageRank v 䈲↪⠪䈮䉋䈦䈩 p 䈏䉝䉪䉶䉴䈘䉏䉎⏕₸䋮
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
2: PageRank
㪉㪋
2. 䉅䈦䈫㜞ㅦ䈮ಣℂ䈜䉎䈮䈲䋿
iteration ᢙ䈱ᷫ: ⶄᢙ iteration ĺ 1 MR job
•PEGASUS [Kang et al., ICDM 2009]
㓞ធⴕ A 䈍䉋䈶䊔䉪䊃䊦 v 䉕䊑䊨䉾䉪ൻ
ⶄᢙ job 䈮䈍䈇䈩ᄌൻ䈚䈭䈇䊂䊷䉺䉕䉨䊞䉾䉲䊠䈜䉎
•Ha䌌oop [Bu et al, PVLDB 2010]
1. ⋥ⷰ⊛䈭ᣇᴺ: 1 iteration ĺ 1 MapReduce job
• ᧤䈜䉎䉁䈪䋬䈅䉎䈇䈲৻ቯ࿁ᢙ䈪⸘▚䉕ᱛ
䈬䈉䉇䈦䈩 MapReduce 䈮ታⵝ䈜䉏䈳⦟䈇䈎䋿
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
ૉ䈚䋬 A 䈲web䉫䊤䊐䈱㓞ធⴕ, c 䈲䊤䊮䉻䊛䉳䊞䊮䊒䈱⏕₸䋬
u 䈲䊤䊮䉻䊛䈭䊕䊷䉳䈱䊔䉪䊃䊦
Copyright 2011 NTT Corporation
HaLoop [Bu et al, PVLDB 2010]
㪉㪌
䉨䊞䉾䉲䊠+䉟䊮䊂䉾䉪䉴ൻ䈮䉋䉎ᄙᲑ iteration 㜞ㅦൻ
mapper input cache: HDFS read ㊂䉕ᷫ
k-means: ജ䊂䊷䉺䉕ో mapper node 䈮䉨䊞䉾䉲䊠
reducer input cache: shuffle ㊂䉕ᷫ
PageRank: web㓞ធⴕ䈱ฦ partition 䉕䋬ᜂᒰ reducer
node 䈮䉨䊞䉾䉲䊠 + 䉟䊮䊂䉾䉪䉴᭴▽
Copyright 2011 NTT Corporation
MapReduce 䈮㑐䈜䉎⎇ⓥ䈱േะ (2/2)
㪉㪐
㜞ㅦൻ
5. 䉳䊢䉟䊮ಣℂ:
•[Dittrich et al., PVLDB 2010]: co-partitioning join tables
•[Afrati et al., EDBT 2010]: shuffle 䉮䉴䊃ᦨㆡൻ
6. หᦼಣℂ䈱࿁ㆱ:
•MapReduce online [Condie et al., NSDI 2010]
7. 䉴䉬䉳䊠䊷䊤: LATE [Zaharia et al., OSDI 2008]
8. ᦨᣂ䊊䊷䊄䉡䉢䉝: Mars [He et al., PACT 2008]: GPUs
ಽᢔಣℂ䊝䊂䊦
• Pregel [Malewicz, SIGMOD 2010]: 䉫䊤䊐䊝䊂䊦
• Dryad [Isard et al., EuroSys 2007]: 䊂䊷䉺䊐䊨䊷䊝䊂䊦
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
⎇ⓥേะ䉁䈪䈱䉁䈫䉄
㪊㪇
MapReduce
䉴䉬䊷䊤䊑䊦 (1ਁบ, 䊕䉺䊋䉟䊃䉴䉬䊷䊦)
䉲䊮䊒䊦䈭API (map/reduce)
⎇ⓥ䈏⋓䉖䈮ⴕ䉒䉏䈩䈇䉎
䊂䊷䉺䊔䊷䉴♽ળ⼏ (SIGMOD, VLDB,…)
OS♽ળ⼏ (OSDI, SOSP, NSDI…)
NL/䊙䉟䊆䊮䉫♽ળ⼏ (NIPS, ICDM…)
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
SOCC
Copyright 2011 NTT Corporation
MapReduce 㜞 㜞ㅦൻ䈱ข䉍⚵䉂
1. PJoin:
Efficient join processing with MapReduce
for OLAP applications
㪊㪈
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
㪧㪡㫆㫀㫅 㩿㫇㫉㪼㪄㫇㪸㫉㫋㫀㫋㫀㫆㫅㪄㪹㪸㫊㪼㪻 㫁㫆㫀㫅㪀
㪲㆐ὐ㪴 ᄙᰴర䊂䊷䉺ಽᨆ㩿㪦㪣㪘㪧㪀ಣℂ䈮䈍䈇䈩䋬
䉲䊞䉾䊐䊦㊂䉕 㪈㪆㪊 䈮ᷫ䈜䉎䈖䈫䈪ಣℂᤨ㑆䉕䋬ᓥ
᧪ᛛⴚ䉋䉍 㪋㪉㪅㪏㩼 㜞ㅦൻ
㪲㪲ᚢ⇛㪴 ⶄᢙ䈱ಽᨆಣℂ䈮䈍䈇䈩ㅢ䈜䉎䉲䊞䉾䊐
䊦ಣℂ䉕೨䉅䈦䈩ታⴕ(೨ಣℂ)䈜䉎䈖䈫䈪䋬ಽᨆಣ
ℂᤨ䈱䉮䉴䊃䉕ᷫ
㪊㪉
Copyright 2011 NTT Corporation
⢛᥊: ᄙᰴర䊂䊷䉺ಽᨆ(OLAP)䈫䈲䋿
㪊㪊
⛔⸘⊛䈭ಽᨆಣℂ䈱ౖဳ⊛䈭ᚻᴺ
ጁᱧ䊶䊃䊤䊮䉱䉪䉲䊢䊮䊂䊷䉺䉕ᩰ⚊䈜䉎 㪽㪸㪺㫋䊁䊷䊑䊦㩿ᄢ㪀
㪽㪸㪺㫋 䊁䊷䊑䊦䉕ᄙⷺ⊛䈭ᰴర䈪㓸⚂Ṷ▚䉕ታᣉ䈜䉎䈢䉄
䈱䋬ⶄᢙ䈱 㪻㫀㫄㪼㫅㫊㫀㫆㫅 䊁䊷䊑䊦
PARTKEY NAME MFGR BRAND TYPE SIZE CONTAINER
COMMENT RETAILPRICE
PARTKEY SUPPKEY AVAILQTY SUPPLYCOST COMMENT
SUPPKEY NAME ADDRESS NATIONKEY PHONE ACCTBAL COMMENT
ORDERKEY PARTKEY SUPPKEY LINENUMBER
RETURNFLAG LINESTATUS SHIPDATE COMMITDATE RECEIPTDATE SHIPINSTRUCT SHIPMODE COMMENT
CUSTKEY ORDERSTATUS TOTALPRICE ORDERDATE ORDER- PRIORITY
SHIP- PRIORITY CLERK
COMMENT CUSTKEY
NAME ADDRESS
PHONE ACCTBAL MKTSEGMENT COMMENT PART (P_)
SF*200,000
PARTSUPP (PS_) SF*800,000
LINEITEM (L_) SF*6,000,000
ORDERS (O_) SF*1,500,000
CUSTOMER (C_) SF*150,000
SUPPLIER (S_) SF*10,000
ORDERKEY
NATIONKEY
EXTENDEDPRICE DISCOUNT TAX QUANTITY
NATIONKEY NAME REGIONKEY
NATION (N_) 25
COMMENT
REGIONKEY NAME COMMENT
REGION (R_) 5
䉴䉺䊷ဳ䉴䉨䊷䊙 (TPC-H) 䉪䉣䊥 9 (TPC-H)
㓸⚂Ṷ▚
ᄙ䊁䊷䊑䊦⚿วಣℂ
䉫䊦䊷䊒ൻ
Copyright 2011 NTT Corporation
䊊䉾䉲䊠⚿ว䈮䉋䉎ᣇᴺ:
• 䉲䊞䉾䊐䊦㊂䈮࿃䈜䉎ㅢା䉮䉴䊃䊶IO䉮䉴䊃䈏ᄢ
ㅢା䉮䉴䊃䊶IO䉮䉴䊃䈱ᷫ䈏ᦨ㊀ⷐ⺖㗴
⢛᥊: ᓥ᧪䈱 MapReduce ⚿วಣℂ䈱⺖㗴
㪊㪋
orders 1
hash(x)
mapper
…
lineitem n
…
lineitem 1
Join
reducer
…
hash(x)
Join
orders n
Join
…
䉲䊞䉾䊐䊦㊂䈏ᄢ
hash(y)
hash(y)
ຠන䈱
ᄁ䉍䈕䊨䉫
વ
Copyright 2011 NTT Corporation
[ᣇ ᣇ㊎] 䊁䊷䊑䊦⚿วಣℂᤨ䈱䉲䊞䉾䊐䊦㊂䉕ᷫ䈜䉎
[೨ឭ] OLAPಽᨆ䈪䈲䋬ᦝᣂ䉋䉍䉅ෳᾖಣℂ䈱ᕈ⢻䈏㊀ⷐ
1ኻᄙ䈱㑐ଥ䈪䊁䊷䊑䊦⚿วಣℂ䈜䉎
[ᚢ⇛]
ⶄᢙ䈱ಽᨆಣℂ䈮䈍䈇䈩ㅢ⊛䈭䉲䊞䉾䊐䊦ಣℂ䉕
೨ಣℂ䈜䉎䈖䈫䈪䋬ಽᨆಣℂᤨ䈱䉮䉴䊃䉕ᷫ
• ⚿ว᧦ઙ㩿ਥ䉨䊷㪀䈮䉋䉎䊁䊷䊑䊦䈱 ೨䉲䊞䉾䊐䊦
• Ḱ⚿ว䈱ᵴ↪ 㪂 ਛ㑆䊂䊷䉺䈱 ೨↢ᚑ
ⶄᢙMapReduce 㑆䈪䉲䊞䉾䊐䊦㊂䉕ᷫ䈜䉎⚿ว⸘↹
[ലᨐ]
ಽᨆಣℂᤨ䈱䉲䊞䉾䊐䊦㊂䉕ᷫ
N䊁䊷䊑䊦⚿วṶ▚䈮䉋䉍MapReduce jobᢙ䉕ᷫ
PJoin 䈱ⷐ
㪊㪌
㽲
㽳
㽴
Copyright 2011 NTT Corporation
㪊㪍
Query execution
lineitem a
orders processing
+
Ḱ
Ḱ⚿ว
mapper
…
lineitem_
orders n
orders n
…
lineitem_
orders 1
orders 1
orders processing
+
Ḱ⚿ว
Joining with
liteitem
reducer
…
Joining with
liteitem
lineitem z
䊁䊷䊑䊦䈱೨䉲䊞䉾䊐䊦ታⴕ䋬Ḱ⚿วਛ㑆䊂䊷䉺䈱೨↢ᚑ
mapper 䈪Ḱ⚿วಣℂᓟ䈮䋬reducer 䈪ᱷಣℂ䉕ታⴕ
Pre-computation
lineitem
orders
hash(x)
hash(y)
…
lineitem b
lineitem a
lineitem z
orders 1
orders n
…
lineitem_
orders n
lineitem_
orders 1
hash(y)
lineitem primary key &
foreign key (orders primary key)
DFS read
shuffle
વ
ຠන䈱
ᄁ䉍䈕䊨䉫
Copyright 2011 NTT Corporation
ᓥ᧪ᚻᴺ䈫 PJoin 䈱Ყセ
㪊㪎
PJoin
lineitem a
orders processing
+
Ḱ
Ḱ⚿ว
mapper
…
lineitem_
orders n
orders n
…
lineitem_
orders 1
orders 1
orders processing
+
Ḱ⚿ว
Joining with
liteitem
reducer
…
Joining with
liteitem
lineitem z
䉲䊞䉾䊐䊦㊂䈏ᷫ
䉲䊞䉾䊐䊦㊂䈏ᷫ
DFS read
shuffle
orders 1
hash(x)
mapper
…
lineitem z
…
lineitem a
Join
reducer
…
hash(x)
Join
orders n
Join
…
䉲䊞䉾䊐䊦㊂䈏ᄢ
hash(y)
hash(y)
ᓥ᧪ᚻᴺ: 䊊䉾䉲䊠⚿ว
DFS read 䈏Ⴧട
Copyright 2011 NTT Corporation
PJoin 䈱․ᓽ㽳: N䊁䊷䊑䊦⚿ว
㪊㪏
PJoin
䉴䉺䊷ဳ䉴䉨䊷䊙䈱႐ว
ⶄᢙ mapper 䉕ታⴕ
ో mapper ⚿ᨐ䈲 S 䈱ਥ䉨䊷䈪䉲䊞䉾䊐䊦
reducer 䈲䋱䈧䈪ಣℂน⢻
mapper (T 䈱ਥ䉨䊷䈪⚿วಣℂ) reducer (S 䈱ਥ䉨䊷䈪⚿วಣℂ)
T
one manyS
T 1
one manyS
T N
one many
T 䈱ਥ䉨䊷䈪⚿วಣℂ
N䊁䊷䊑䊦⚿ว䈏 1䈧䈱
MapReduce job 䈪ታน⢻
ຠන䈱
ᄁ䉍䈕䊨䉫
વ
ຠන䈱
ᄁ䉍䈕䊨䉫
વ
ዊᄁᐫ
Copyright 2011 NTT Corporation
PJoin 䈱․ᓽ㽴: 䉴䉺䊷ဳ䉪䉣䊥⚿ว⸘↹
㪊㪐
ᄙᰴర䊂䊷䉺ಽᨆ䈱䉴䉺䊷ဳ䉴䉨䊷䊙
dimension 䊁䊷䊑䊦䈎䉌ಣℂ䉕㐿ᆎ䈚䋬fact 䊁䊷䊑䊦䉕ᦨ
ᓟ䈮ታⴕ䈜䉎䈖䈫䈪䋬ਛ㑆䊂䊷䉺䉕ᷫ䈜䉎
㫃㫀㫅㪼㫀㫋㪼㫄
㫆㫉㪻㪼㫉㫊
㫇㪸㫉㫋㫊㫌㫇㫇 㫇㪸㫉㫋
㫊㫌㫇㫇㫃㫀㪼㫉
㫅㪸㫋㫀㫆㫅
*
*
* *
*
㫃㫀㫅㪼㫀㫋㪼㫄㪃
㫆㫉㪻㪼㫉㫊㪃㩷
㩻㪡㪉
㫇㪸㫉㫋㫊㫌㫇㫇㪃㩷
㫇㪸㫉㫋㪃㩷
㩻㪡㪈
㫊㫌㫇㫇㫃㫀㪼㫉㪃
㫅㪸㫋㫀㫆㫅
J1
J2
J3
Q9 䈪䉝䉪䉶䉴䈜䉎䊁䊷䊑䊦᭴ᚑ Q9 ⚿ว⸘↹
J1
J2
J3 fact 䊁䊷䊑䊦䈱⚿ว䉕ᦨᓟ䈮ታⴕ
Copyright 2011 NTT Corporation
⹏ଔⅣႺ
Linux (CentOS5.4, 2.8GHz, ਥ⸥ᙘ 8GB)㬍50บ
Java1.6, Hadoop 0.19.2
䊔䊮䉼䊙䊷䉪: TPC-H 䊔䊮䉼䊙䊷䉪
[䊂䊷䉺] 104GB, 207GB, 311GB
Ḱ⚿วਛ㑆䊂䊷䉺: 83GB, 167GB, 250GB
[ಽᨆ䉪䉣䊥] TPC-H䈮䈍䈔䉎⚿วṶ▚䉕↪䈜䉎 17䉪䉣䊥
⹏ଔⷰὐ: PJoin 䈫ᓥ᧪䈱 Join ᚻᴺ䈱Ყセ
• ో: PJoin 䈫䋬ᓥ᧪䈱2䈧䈱 Join ᚻᴺ䈱ᔕ╵ᕈ⢻Ყセ
• PJoin vs reduce-side join (䉴䉺䊷ဳ⚿ว⸘↹, Hive ⸘↹)
• ⚦
• PJoin 䈮䉋䉎 䉲䊞䉾䊐䊦㊂䋬HDFS read/write ㊂䈱ᓇ㗀
• 䉴䉬䊷䊦ᕈ⹏ଔ
㪋㪇
Copyright 2011 NTT Corporation
ᔕ╵ᤨ㑆 (104GB, 50บ)
ᓥ᧪ᚻᴺ䉋䉍33.4% (star planᲧ), 42.8% (Hive planᲧ)
ᡷༀ䈱ലᨐ: MapReduce job ᢙᷫ 䋬HDFS+䉲䊞䉾䊐䊦㊂ᷫ
ᕈ⢻ = {(HDFS read) + (HDFS write) + 2㬍(shuffle)}㬍ଥᢙ
㪋㪈
0
1
2
3
4
5
6
re sponse tim e (m in)
PJoin
reduce-side join (star plan)
reduce-side join (Hive plan)
Copyright 2011 NTT Corporation
䉲䊞䉾䊐䊦㊂ (104GB, 50บ)
ᓥ᧪ᚻᴺ䉋䉍62.6% (star plan), 62.2% (Hive plan)ᡷༀ
Q18 䈲ᖡൻ: WHERE᧦ઙ䈱ㆬᛯലᨐ䈱ૐਅ䈏ේ࿃
㪋㪉
0
5
10
15
20
25
30
35
40
45
Shuffle (GB)
PJoin
reduce-side join (star plan)
reduce-side join (Hive plan)
Copyright 2011 NTT Corporation
HDFS read㊂ (104GB, 50บ)
ᓥ᧪ᚻᴺ䉋䉍51.4% (star plan), 52.2 % (Hive plan)Ⴧട
PJoin 䈪䈲Ḱ⚿วਛ㑆䊂䊷䉺䉕ෳᾖ䈜䉎䈢䉄
㪋㪊
0
10
20
30
40
50
60
HDFS Read (GB)
PJoin
reduce-side join (star plan)
reduce-side join (Hive plan)
Copyright 2011 NTT Corporation
䊂䊷䉺㊂: PJoin 䈱ᣇ䈏䉴䉬䊷䊦ᕈ䈏⧯ᐓఝ䉏䉎
䊙䉲䊮ᢙ: PJoin䉅ᓥ᧪ᚻᴺ䉅บᢙലᨐ䈏䈩䈇䉎
㪋㪋
0
10
20
30
40
50
60
70
㪇 㪈㪇㪇 㪉㪇㪇 㪊㪇㪇 㪋㪇㪇
re sponse tim e (m in)
䊂䊷䉺㊂ GB (50บ)
PJoin
reduce-side join (star plan)
0
50
100
150
200
250
300
350
400
㪇 㪉㪇 㪋㪇 㪍㪇
re sponse tim e (m in)
䊙䉲䊮บᢙ (䊂䊷䉺㊂ 311GB)
PJoin
reduce-side join (star plan)
Copyright 2011 NTT Corporation
㪋㪌
PJoin 䉁䈫䉄
․ ․ᓽ:
䉲䊞䉾䊐䊦ಣℂ䈱೨ታⴕ(pre-partitioning)䋬Ḱ⚿
วਛ㑆䊂䊷䉺䈱೨↢ᚑ
Ḱ ⚿ ว 䉕 mapper 䈪 ታ ⴕ 䋬 ᱷ 䉍 䈱 ⚿ ว ಣ ℂ 䉕
reducer 䈪ታⴕ
ല ലᨐ:
TPC-H 䈮䈍䈇䈩 30-40%ᔕ╵ᕈ⢻䉕ᡷༀ
䉲䊞䉾䊐䊦㊂䈲1/3 䈮ᷫ
ᓟ䈱⺖㗴:
ᦨㆡ䊒䊤䊮ㆬᛯ
MapReduce 䈱⿷࿁䉍䈱⺖㗴 (HDFS read/write)
back
Copyright 2011 NTT Corporation
⎇ⓥ䈱ᣇะᕈ
㪋㪍
㪤
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
⎇ⓥ䈱ᣇะᕈ
㪋㪎
MapReduce 䈱䊜䊥䉾䊃
ૐ䉮䉴䊃䈪◲න䈮(ᄢⷙᮨ䈭)ಽᢔಣℂ䈏น⢻
Hadoop 䉮䊚䊠䊆䊁䉞䈮ෳട䈪䈐䉎
Amazon 䉕䈋䈳ᄢⷙᮨⅣႺ䉅↪䈚䉇䈜䈇
䉪䊤䉡䊄♽⎇ⓥ䈱㔍䈚䈘
Hadoop 䈲1000บⷙᮨ䈓䉌䈇䉁䈪䈲䉴䉬䊷䊦䈜䉎
ᄢⷙᮨ䈭䈾䈬䉴䉬䊷䊦䊜䊥䉾䊃䈏↢䈐䈩䈒䉎
ᄢⷙᮨ䈭ⅣႺ䈪䈱ᛛⴚ⊛䈭⺖㗴䈮ᚻ䈏ዯ䈎䈭䈇
• The Next Generation of Apache Hadoop MapReduce
㪤
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
Copyright 2011 NTT Corporation
⎇ⓥ䈱ᣇะᕈ
㪋㪏
⎇ⓥ䈱ᣇะᕈ
MR䈱ᡷༀ䈱䈏䈅䉍DBᛛⴚ䈱ㆡ↪䈲⛯䈒
• ข䉍⚵䉂䉇䈜䈇䈏ᄢ䈐䈇⽸₂䈲㔍䈚䈇
MapReduce ⁛⥄䈱ⷰὐ
• shuffle ㊂䈱ᷫ䋬ᄙᲑ job ಣℂ
• ᛩᯏታⴕ䋬㓚ኂ࿁ᓳ
ಽᢔಣℂ䈮䈍䈔䉎↢↥ᕈะ
• 䊄䊜䉟䊮․ൻဳ䈱ಽᢔ䊝䊂䊦 (Pregel, Parcolator)
• SQL䉇㑐ᢙဳ⸒⺆䈱ᦨㆡൻᛛⴚ䉕ᵴ↪
• ት⸒⊛⸒⺆䈎䉌ታⴕ䊒䊤䊮䉕ዉ
• 㑐ᢙဳ⸒⺆: ⚵ൻ䋬Ⲣว
Copyright 2011 NTT Corporation
䈍䉒䉍䈮
㪋㪐
Hadoop䈱OSS䉕䊔䊷䉴䈮ข䉍⚵䉄䉎䊜䊥䉾䊃
䉮䊚䊠䊆䊁䉞䈻䈱⽸₂
ᄙ䈒䈱↪⠪䈲 Hadoop䈏ਁ⢻䈣䈫⠨䈋䈩䈇䉎
PBⷙᮨ䈱䊂䊷䉺䈲䊙䉲䊮䈏ᄙ䈒䈩䉅ኈᤃ䈪䈲䈭䈇
NTT ⎇ⓥᚲ䈪䈱ข䉍⚵䉂
R&D 䉪䊤䉡䊄 1500บⅣႺ
ᔕ↪
ᄢⷙᮨ web䊂䊷䉺, 䉪䉣䊥䊨䉫䋬䉝䉪䉶䉴䊨䉫
䊃䊤䊐䉞䉾䉪䊂䊷䉺⸃ᨆ
㖸ᭉ≮ᴺ䊂䊷䉺䊶䊙䉟䊆䊮䉫
㪤㪸㫇㪩㪼㪻㫌㪺㪼㩷䈫䈲䋿 㜞ㅦൻ䈱⎇ⓥേะ 㜞ㅦൻ䈱ข䉍⚵䉂 ⎇ⓥ䈱ᣇะᕈ
⥝䈱䈅䉎ᣇ䋬ᤚ㕖⼏⺰䈚䉁䈚䉊䈉䋣
Copyright 2011 NTT Corporation
ᢥ₂
•
[Abouzeid et al., PVLDB 2009] HadoopDB: An Architectural Hybrid of MapReduce and DBMS
Technologies for Analytical Workloads, A. Abouzeid et al., In PVLDB 2(1), 2009
•
[Afrati et al., EDBT 2010] Optimizing joins in a map-reduce environment, F. N. Afrati, J. D. Ullman, In
EDBT 2010
•
[Agrawal et al., VLDB 2010] Big Data and Cloud Computing:
New Wine or just New Bottles?, D. Agrawal, S. Das, A. Abbadi, In VLDB tutorial 2010
•
[Blanas et al., SIGMOD 2010] A Comparison of Join Algorithms for Log Processing in MapReduce, S.
Blanas, J.M. Patel, V.Ercegovac, J. Rao, In SIGMOD 2010
•
[Bu et al., PVLDB 2010] HaLoop: Efficient Iterative Data Processing on Large Clusters, S. Blanas, J.M.
Patel, V.Ercegovac, J. Rao, In SIGMOD 2010
•
[Chen, PVLDB 2010] Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce, S.
Chen, In PVLDB 3(2), 2010
•
[Chu et al., NIPS 2006] Map-reduce for machine learning on multicore, C. Chu, S. Kim, Y. Lin, Y. Yu, G.
Bradski, A. Ng, K. Olukotun, In NIPS 2006
•
[Condie et al., NSDI 2010] MapReduce Online, T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, In
NSDI2010
•
[Dean et al., ODSI 2004] MapReduce: Simplified Data Processing on Large Clusters, J. Dean, S.
Ghemawat, In OSDI 2004
•
[Dean et al., CACM 2008] MapReduce: Simplified Data Processing on Large Clusters, J. Dean, S.
Ghemawat, In CACM Jan 2008
•
[Dean et al., CACM 2010] MapReduce: a flexible data processing tool, J. Dean, S. Ghemawat, In CACM
Jan 2010
•
[DeWitt, 2009] Mapreduce: A major step backwards, D. DeWitt, M. Stonebraker, The Database column
2009
㪌㪇
Copyright 2011 NTT Corporation
ᢥ₂
•
[Dittrich et al., PVLDB 2010] Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even
Noticing), J. Dittrich, J. Quiane-Ruiz, A. Jindal, Y. Kargin, V. Setty. J. Schad, In PVLDB 3(1), 2010
•
[Gupta et al., IEEE 1995] Maintenance of materialized views: Problems, techniques and applications, UA.
Gupta, I.S. Mumick, IEEE Data Eng. 18(2) 1995
•
[Isard et al., EuroSys 2007] Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks,
M. Budiu, Y. Yu, A. Birrell, D.Fetterly, In EuroSys 2007
•
[He et al., PACT 2008] Mars: A MapReduce framework on graphics processors, B. He, W. Fang, Q. Luo,
N.K. Govindaraju, T. Wang, In PACT 2008
•
[Kang et al., ICDM 2009] PEGASUS: A Peta-Scale Graph Mining System Implementation and
Observations, U. Kang, C. Tsourakakis, C. Faloutsos, In ICDM 2009
•
[Lang et al., PVLDB 2010] Energy Management for MapReduce Clusters, W. Lang, J. M. Patel, In VLDB
3(1), 2010
•
[Lin et al., 2010] Data intensive text processing with MapReduce, J. Lin and C. Dyer, Morgan & Claypool
2010
•
[Malewicz, SIGMOD 2010] Pregel: a system for large-scale graph processing, G. Malewicz, M. Austern, A.
Bik, J. Dehnert, I. Horn, N. Leiser, G. Czajkowski, In SIGMOD 2010
•
[Nykiel et al., PVLDB 2010] MRShare: Sharing across multiple queries in MapReduce, T. Nykiel, M.
Potamias, C. Mishra, G. Kollios, N. Kodas, In VLDB 3(1), 2010
•
[Panda et al., PVLDB 2009] PLANET: Massively parallel learning of tree ensembles with MapReduce, B.
Panda, J.S. Herbach, S. Basu, R.J. Bayardo, In PVLDB 2(2), 2009
•