MapReduce, YARN and now Spark – Hadoop competing with itself

Spark officially sets a new record in large-scale sorting

Reference: http://sortbenchmark.org/ApacheSpark2014.pdf

| Metric | Hadoop MR Record | Spark Record | Spark 1 PB |
|---|---|---|---|
| Data Size | 102.5 TB | 100 TB | 1000 TB |
| Elapsed Time | 72 mins | 23 mins | 234 mins |
| # Nodes | 2100 | 206 | 190 |
| # Cores | 50400 physical | 6592 virtualized | 6080 virtualized |
| Cluster disk throughput | 3150 GB/s (est.) | 618 GB/s | 570 GB/s |
| Sort Benchmark Daytona Rules | Yes | Yes | No |
| Network | dedicated data center, 10Gbps | virtualized (EC2) 10Gbps network | virtualized (EC2) 10Gbps network |
| Sort rate | 1.42 TB/min | 4.27 TB/min | 4.27 TB/min |
| Sort rate/node | 0.67 GB/min | 20.7 GB/min | 22.5 GB/min |
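
The last two rows of the comparison follow from the first three: sort rate is data size divided by elapsed time, and sort rate per node divides that by the node count. Here is a minimal Python sketch of that arithmetic using the figures from the comparison above. The function and variable names are my own for illustration, and I am assuming decimal units (1 TB = 1000 GB). The results can differ from the published figures in the last digit because the elapsed times here are rounded to whole minutes; for the 100 TB Spark run in particular, a rounded 23 minutes gives a slightly higher rate than the listed 4.27 TB/min, which suggests the published rate was computed from an unrounded time.

```python
# Figures copied from the comparison above; all names are illustrative.
RUNS = {
    "Hadoop MR Record": {"data_tb": 102.5, "minutes": 72, "nodes": 2100},
    "Spark Record": {"data_tb": 100.0, "minutes": 23, "nodes": 206},
    "Spark 1 PB": {"data_tb": 1000.0, "minutes": 234, "nodes": 190},
}


def sort_rate_tb_per_min(data_tb, minutes):
    """Cluster-wide sort rate in TB per minute."""
    return data_tb / minutes


def sort_rate_gb_per_min_per_node(data_tb, minutes, nodes):
    """Per-node sort rate in GB per minute (assumes 1 TB = 1000 GB)."""
    return sort_rate_tb_per_min(data_tb, minutes) * 1000 / nodes


for name, run in RUNS.items():
    total = sort_rate_tb_per_min(run["data_tb"], run["minutes"])
    per_node = sort_rate_gb_per_min_per_node(
        run["data_tb"], run["minutes"], run["nodes"]
    )
    print(f"{name}: {total:.2f} TB/min, {per_node:.2f} GB/min per node")
```

The per-node figure is the one that makes the headline point: the cluster-wide rate is roughly three times higher for Spark, but it was achieved on about a tenth of the nodes.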

Author: rajukv

Hadoop (Big Data) Architect and Hadoop Security Architect who designs and builds Hadoop systems to support a range of data science projects.
