Microsoft Azure (HDInsight) Hadoop component versions

Component                 | HDInsight 3.5 | HDInsight 3.4 (default) | HDInsight 3.3 | HDInsight 3.2 | HDInsight 3.1 | HDInsight 3.0
--------------------------+---------------+-------------------------+---------------+---------------+---------------+--------------
Hortonworks Data Platform | 2.5           | 2.4                     | 2.3           | 2.2           | 2.1.7         | 2.0
Apache Hadoop & YARN      | 2.7.3         | 2.7.1                   | 2.7.1         | 2.6.0         | 2.4.0         | 2.2.0
Apache Tez                | 0.7.0         | 0.7.0                   | 0.7.0         | 0.5.2         | 0.4.0         | -
Apache Pig                | 0.16.0        | 0.15.0                  | 0.15.0        | 0.14.0        | 0.12.1        | 0.12.0
Apache Hive & HCatalog    | -             | 1.2.1                   | 1.2.1         | 0.14.0        | 0.13.1        | 0.12.0
Apache HBase              | 1.1.2         | 1.1.2                   | 1.1.1         | 0.98.4        | 0.98.0        | -
Apache Sqoop              | 1.4.6         | 1.4.6                   | 1.4.6         | 1.4.5         | 1.4.4         | 1.4.4
Apache Oozie              | 4.2.0         | 4.2.0                   | 4.2.0         | 4.1.0         | 4.0.0         | 4.0.0
Apache ZooKeeper          | 3.4.6         | 3.4.6                   | 3.4.6         | 3.4.6         | 3.4.5         | 3.4.5
Apache Storm              | 1.0.1         | 0.10.0                  | 0.10.0        | 0.9.3         | 0.9.1         | -
Apache Mahout             | 0.9.0+        | 0.9.0+                  | 0.9.0+        | 0.9.0         | 0.9.0         | -
Apache Phoenix            | 4.7.0         | 4.4.0                   | 4.4.0         | 4.2.0         | -             | -
Apache Spark              | 1.6.2 + 2.0   | 1.6.0                   | 1.5.2         | 1.3.1         | -             | -

Spark is Linux only on HDInsight 3.5, 3.4 and 3.3 (an experimental build on 3.3) and Windows only on HDInsight 3.2.




If possible, join this webinar:

How Johnson Controls Moved From Proof of Concept to a Global Big Data Solution

Install YARN on an Ubuntu Cluster via Scripts

There is an excellent blog post on using scripts to install YARN on Ubuntu.

You can also use the script to install Hadoop on CentOS systems.

I came across these steps while reading Arun C. Murthy and Vinod Kumar Vavilapalli's book Apache Hadoop YARN.



Extend AWS RHEL 6 instance root file system size

AWS: Two free-tier t2.micro instances, A and B

OS: RHEL 6.6

root FS: Ext4

Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      5.8G  4.1G  1.5G  74% /

Using: AWS CLI (the same steps can also be done easily from the AWS web console)

Task: Extend the root file system of instance B from 6G to the available space.

Disclaimer: Please be careful when performing these steps; they are risky and can cause the loss of the entire system.


  1. Stop instance B: $aws ec2 stop-instances --instance-ids <instance B ID>
  2. Detach the root volume from instance B: $aws ec2 detach-volume --volume-id "vol-xxxxxxxxx"
  3. Create a snapshot, so you can recover if you lose the file system during the steps below: $aws ec2 create-snapshot --volume-id "vol-xxxxxxxxx" --description "instance B root vol backup"

Note: Creating this snapshot is extremely important; it is what you recover from if anything goes wrong.
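Steps 1 to 3 can be scripted from the workstation where the AWS CLI is configured. The sketch below is an illustration, not the author's script: the function name `prepare_root_volume` is hypothetical, the instance and volume IDs are placeholders you pass in, and it adds `aws ec2 wait` calls so each step only starts once the previous one has finished.

```shell
#!/bin/sh
# Hypothetical helper: stop instance B, detach its root volume, snapshot it.
# Arguments are placeholders: $1 = instance B ID, $2 = root volume ID.
# Prints the new snapshot ID on success; returns non-zero on the first failure.
prepare_root_volume() {
  instance_id="$1"
  volume_id="$2"

  # Step 1: stop instance B and wait until EC2 reports it as "stopped".
  aws ec2 stop-instances --instance-ids "$instance_id" >/dev/null || return 1
  aws ec2 wait instance-stopped --instance-ids "$instance_id" || return 1

  # Step 2: detach the root volume and wait until it is "available".
  aws ec2 detach-volume --volume-id "$volume_id" >/dev/null || return 1
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1

  # Step 3: snapshot the volume, keep only the SnapshotId field,
  # then wait for the snapshot to complete before touching the volume.
  snapshot_id=$(aws ec2 create-snapshot --volume-id "$volume_id" \
    --description "instance B root vol backup" \
    --query SnapshotId --output text) || return 1
  aws ec2 wait snapshot-completed --snapshot-ids "$snapshot_id" || return 1

  echo "$snapshot_id"
}
```

Usage: `prepare_root_volume <instance B ID> vol-xxxxxxxxx`. Keep the printed snapshot ID; it is what you restore from if a later step goes wrong.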

  1. Attach the volume to instance A (inside the instance it appears as /dev/xvdf): $aws ec2 attach-volume --volume-id "vol-0d61acbe3bff8d115" --instance-id "i-05c133fd050dbd04d" --device "/dev/sdf"
  2. Connect to instance A through ssh: $ssh -i "mykey.pem" ec2-user@<public IP of instance A>
  3. Verify the list of block devices with lsblk: $sudo lsblk
    xvda    202:0    0  10G  0 disk
    └─xvda1 202:1    0   6G  0 part /
    xvdb    202:16   0  10G  0 disk /hadoop
    xvdf    202:80   0  30G  0 disk
    └─xvdf1 202:81   0  6G  0 part
  4. Using parted, recreate the 6G partition xvdf1 as a 10G partition: $sudo parted /dev/xvdf
  5. Change units to sectors: (parted) unit s
  6. Print the current partition:
    (parted) print
    Model: Xen Virtual Block Device (xvd)
    Disk /dev/xvdf: 62914560s
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Number Start End Size File system Name Flags
    1 2048s 12584959s 12582912s ext4 ext4 boot
  7. Delete the partition: (parted) rm 1
  8. Create the new partition at the same start sector (2048s), so the existing file system is preserved: (parted) mkpart ext4 2048s 100%
    Warning: You requested a partition from 2048s to 62914559s.
    The closest location we can manage is 2048s to 20971486s.
    Is this still acceptable to you?
    Yes/No? Yes
  9. Print the new partition and verify it: (parted) print
    Model: Xen Virtual Block Device (xvd)
    Disk /dev/xvdf: 62914560s
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt

    Number Start End Size File system Name Flags
    1 2048s 20971486s 20969439s ext4 ext4

  10. Make it bootable: (parted) set 1 boot on
  11. Quit parted: (parted) quit
  12. Check the file system with e2fsck: $sudo e2fsck -f /dev/xvdf1
    e2fsck 1.41.12 (17-May-2010)
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    /dev/xvdf1: 58700/393216 files (0.1% non-contiguous), 631205/1572864 blocks
  13. Extend the file system to fill the partition with resize2fs: $sudo resize2fs /dev/xvdf1
    resize2fs 1.41.12 (17-May-2010)
    Resizing the filesystem on /dev/xvdf1 to 2621179 (4k) blocks.
    The filesystem on /dev/xvdf1 is now 2621179 blocks long.
  14. Mount the extended partition to verify it: $sudo mount -t ext4 /dev/xvdf1 /mnt
  15. Verify the file system with ls and df: $sudo df -h /mnt
    Filesystem Size Used Avail Use% Mounted on
    /dev/xvdf1 9.8G 2.2G 7.1G 24% /mnt
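The numbers in the parted, e2fsck and resize2fs output above agree with each other: the old partition (2048s to 12584959s) is 12582912 sectors of 512 bytes, i.e. the 1572864 blocks of 4 KiB that e2fsck reports, and the new one (2048s to 20971486s) works out to the 2621179 blocks resize2fs reports. The small helper below reproduces that arithmetic as a cheap sanity check before resizing another volume; `partition_blocks` is a hypothetical name, and the 512-byte sector and 4 KiB block defaults are taken from the output above.

```shell
#!/bin/sh
# Hypothetical helper: size of an inclusive parted sector range.
# Prints: <sectors> <bytes> <whole file system blocks>
# $1 = start sector, $2 = end sector (both inclusive, as parted prints them),
# $3 = sector size in bytes (default 512), $4 = fs block size (default 4096).
partition_blocks() {
  start=$1
  end=$2
  sector_size=${3:-512}
  block_size=${4:-4096}

  sectors=$((end - start + 1))      # parted ranges include both end sectors
  bytes=$((sectors * sector_size))
  blocks=$((bytes / block_size))    # integer division drops a partial block
  echo "$sectors $bytes $blocks"
}
```

For example, `partition_blocks 2048 12584959` (old partition) prints `12582912 6442450944 1572864`, and `partition_blocks 2048 20971486` (new partition) prints `20969439 10736352768 2621179`.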

Now unmount the volume on instance A ($sudo umount /mnt), detach the EBS volume from instance A, and attach it to instance B as its root device, i.e. "/dev/sda1".

16. $aws ec2 detach-volume --volume-id "vol-*********"

17. $aws ec2 attach-volume --volume-id "vol-*********" --instance-id "<instance B ID>" --device "/dev/sda1"
Note: Attaching the device as /dev/sda1 is important, because AWS treats that as the root device.

18. Start instance B:

$aws ec2 start-instances --instance-ids "<instance B ID>"

19. Connect to instance B through ssh and check the file system size:

$ssh -i "mykey.pem" ec2-user@<public IP of instance B>

$df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 9.8G 2.2G 7.1G 24% /
tmpfs 498M 0 498M 0% /dev/shm
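Steps 16 to 18 can be chained the same way. This is a hedged sketch: `reattach_as_root` is a hypothetical name, the IDs are placeholders, and `/dev/sda1` is the root device name this post uses for the RHEL 6 instance, so check the root device name shown for your own instance before relying on it. It assumes you have already run `sudo umount /mnt` on instance A, because detaching a mounted volume risks corrupting the file system you just resized.

```shell
#!/bin/sh
# Hypothetical helper for steps 16-18, run where the AWS CLI is configured.
# Precondition (not checked here): /mnt is already unmounted on instance A.
# $1 = volume ID, $2 = instance B ID, $3 = root device name (default /dev/sda1).
reattach_as_root() {
  volume_id="$1"
  instance_b_id="$2"
  root_device="${3:-/dev/sda1}"

  # Step 16: detach from instance A and wait until the volume is "available".
  aws ec2 detach-volume --volume-id "$volume_id" >/dev/null || return 1
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1

  # Step 17: attach to instance B under its root device name and wait
  # until the volume is "in-use".
  aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$instance_b_id" --device "$root_device" >/dev/null || return 1
  aws ec2 wait volume-in-use --volume-ids "$volume_id" || return 1

  # Step 18: start instance B and wait until it reports "running".
  aws ec2 start-instances --instance-ids "$instance_b_id" >/dev/null || return 1
  aws ec2 wait instance-running --instance-ids "$instance_b_id"
}
```

Usage: `reattach_as_root vol-********* <instance B ID>`. Step 19 (ssh in and run `df -h`) remains a manual check.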




Hive Tuning

10 Best Practices for Apache Hive