Hadoop Terasort VirtualBox benchmarks

Setup: 3-node Hadoop cluster (HDP 2.6.1), installed via Ambari 2.5.1 and running in VirtualBox 5.1 on one physical machine.

Terasort is traditionally run on 1 terabyte of random data (SIZE=10,000,000,000, where SIZE is the number of 100-byte rows that teragen writes). In 2008, Yahoo! ran terasort on 1 TB of data in 209 seconds, on a cluster of 910 nodes.

However, this benchmark uses far less data (SIZE=10,000 rows of 100 bytes each, about 1 MB, or 1/1,000,000th of the traditional amount) and far less hardware (three virtual nodes on 1 physical machine).
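
The size comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes only teragen's fixed 100-byte row size; the function name dataset_bytes is made up for illustration.

```python
# TeraGen writes fixed-size rows of 100 bytes each.
ROW_BYTES = 100


def dataset_bytes(rows: int) -> int:
    """Return the total data size in bytes for a given teragen row count."""
    return rows * ROW_BYTES


full_run = dataset_bytes(10_000_000_000)  # the traditional 1 TB run
this_run = dataset_bytes(10_000)          # the run benchmarked here

print(full_run)              # 1000000000000 bytes = 1 TB (decimal)
print(this_run)              # 1000000 bytes = 1 MB (decimal)
print(full_run // this_run)  # 1000000, i.e. 1/1,000,000th of the data
```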

teragen: 340 seconds
terasort: 352 seconds
teravalidate: 6 seconds
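
For reference, the three stages timed above are normally launched from the Hadoop examples jar. The sketch below only builds and prints the command lines; the jar path is an assumption based on the usual HDP layout, the HDFS base directory is hypothetical, and the original post does not show the exact commands that were used.

```python
import shlex

# Assumption: usual location of the examples jar on an HDP install.
# Adjust this path for your own cluster.
EXAMPLES_JAR = (
    "/usr/hdp/current/hadoop-mapreduce-client/"
    "hadoop-mapreduce-examples.jar"
)


def build_commands(rows, base_dir, jar=EXAMPLES_JAR):
    """Build the teragen, terasort and teravalidate command lines.

    rows is the number of 100-byte rows to generate; base_dir is a
    hypothetical HDFS directory that holds the three output folders.
    """
    gen_dir = f"{base_dir}/teragen"
    sort_dir = f"{base_dir}/terasort"
    report_dir = f"{base_dir}/teravalidate"
    return [
        ["hadoop", "jar", jar, "teragen", str(rows), gen_dir],
        ["hadoop", "jar", jar, "terasort", gen_dir, sort_dir],
        ["hadoop", "jar", jar, "teravalidate", sort_dir, report_dir],
    ]


# Print the commands rather than running them, so the sketch works
# on a machine without Hadoop installed.
for cmd in build_commands(10_000, "/tmp/terasort-demo"):
    print(shlex.join(cmd))
```

To reproduce timings like the ones above, each command could be passed to subprocess.run(cmd, check=True) with time.monotonic() read before and after the call.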

Each (virtual) node:
8 GB RAM, 2 CPUs, separate 80 GB /hadoop/hdfs/data mount, Ubuntu 14.04.2 LTS

Physical machine:
32 GB RAM, 8-core AMD FX-8350 4.0GHz, 1TB WD Black HDD, Ubuntu 16.04 LTS xenial
