MapReduce, YARN and now Spark –Hadoop competing with itself

Spark officially sets a new record in large-scale sorting

Reference:  http://sortbenchmark.org/ApacheSpark2014.pdf

Hadoop MR
Record
Spark
Record
Spark
1 PB
Data Size 102.5 TB 100 TB 1000 TB
Elapsed Time 72 mins 23 mins 234 mins
# Nodes 2100 206 190
# Cores 50400 physical 6592 virtualized 6080 virtualized
Cluster disk throughput 3150 GB/s
(est.)
618 GB/s 570 GB/s
Sort Benchmark Daytona Rules Yes Yes No
Network dedicated data center, 10Gbps virtualized (EC2) 10Gbps network virtualized (EC2) 10Gbps network
Sort rate 1.42 TB/min 4.27 TB/min 4.27 TB/min
Sort rate/node 0.67 GB/min 20.7 GB/min 22.5 GB/min
Advertisements