MapReduce, YARN and now Spark – Hadoop competing with itself

Spark officially sets a new record in large-scale sorting


| | Hadoop MR | Spark | Spark 1 PB |
|---|---|---|---|
| Data Size | 102.5 TB | 100 TB | 1000 TB |
| Elapsed Time | 72 mins | 23 mins | 234 mins |
| # Nodes | 2100 | 206 | 190 |
| # Cores | 50400 physical | 6592 virtualized | 6080 virtualized |
| Cluster disk throughput | 3150 GB/s | 618 GB/s | 570 GB/s |
| Sort Benchmark Daytona Rules | Yes | Yes | No |
| Network | dedicated data center, 10Gbps | virtualized (EC2) 10Gbps network | virtualized (EC2) 10Gbps network |
| Sort rate | 1.42 TB/min | 4.27 TB/min | 4.27 TB/min |
| Sort rate/node | 0.67 GB/min | 20.7 GB/min | 22.5 GB/min |
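
The two sort-rate rows follow from the rows above them: sort rate is data size divided by elapsed time, and sort rate per node divides that by the node count. Here is a minimal Python sketch of that arithmetic. The dictionary keys and function names are made up for illustration, and the conversion of 1 TB = 1000 GB is an assumption. Because the elapsed times in the table are rounded to whole minutes, the computed values can differ slightly from the published ones (most visibly for the 100 TB run).

```python
# Figures copied from the comparison table: (data size in TB, elapsed minutes, nodes).
# The keys are illustrative labels, not names used by the benchmark.
RUNS = {
    "hadoop_mr": (102.5, 72, 2100),
    "spark_100tb": (100.0, 23, 206),
    "spark_1pb": (1000.0, 234, 190),
}

GB_PER_TB = 1000  # assumption: decimal units


def sort_rate_tb_per_min(size_tb, minutes):
    """Cluster-wide sort rate in TB per minute."""
    return size_tb / minutes


def sort_rate_per_node_gb_per_min(size_tb, minutes, nodes):
    """Per-node sort rate in GB per minute."""
    return sort_rate_tb_per_min(size_tb, minutes) * GB_PER_TB / nodes


for name, (size_tb, minutes, nodes) in RUNS.items():
    rate = sort_rate_tb_per_min(size_tb, minutes)
    per_node = sort_rate_per_node_gb_per_min(size_tb, minutes, nodes)
    print(f"{name}: {rate:.2f} TB/min, {per_node:.2f} GB/min per node")
```

The per-node figure is the one that makes the comparison meaningful, since the Hadoop MR cluster used roughly ten times as many nodes as either Spark run.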

Author: rajukv

Hadoop (Big Data) Architect and Hadoop Security Architect who can design and build Hadoop systems to support various data science projects.
