TERASORT – benchmark using hadoop-mapreduce-examples.jar

hadoop jar /usr/hdp/ teragen 10000000000 /teraInput
# hdfs dfs -mv /teraInput /user/root/10000000
# hadoop jar /usr/hdp/ terasort 10000000 /teraInput /teraOutput
# hdfs dfs -mv /teraInput /teraOutput
# hadoop jar /usr/hdp/ teravalidate /teraOutput /teraValidate

REF: Running-TeraSort-MapReduce-Benchmark

Ambari LDAP setup and Kerberos Setup

 Ambari steps to configure LDAP

Note: These steps performed on Ambari version 2.2.2 with HDP 2.3.2 hortonworks hadoop version.
  1. Configure /etc/ambari-server/conf/ambari.properties

Used ambari-ldap-ad.sh to update ambari.properties file

cat <<-‘EOF’ | sudo tee -a /etc/ambari-server/conf/ambari.properties

Now run the command:

#ambari-server setup-ldap

It will read all required properties from the ambari.properties file which got setup above. Some important properties are:

Primary URL* (host:port): (activedirectory.example.com:389)

Base DN* (dc=example,dc=com)

Manager DN* (cn=ldap-connect,ou=users,ou=hdp,dc=example,dc=com)

Enter Manager Passwrod*: ******

Re-Enter passwrod: ******

ambari-server start

create users.csv or groups.csv with required users and groups to be sync with Ambari.

echo “user1,user2,user3” > users.txt

echo “group1,group2,group3” > groups.txt

ambari-server sync-ldap --user users.txt

ambari-server sync-ldap --group groups.txt

Enter Ambari Admin login: admin

Enter Ambari Admin password: *******


Pre requisite: Get the Service Principal (Ad service account if AD is configured for Kerberos)

Steps to create new service principal, setpassword and create keytab  (AD with centrify configuration)
  1.  Create Ambari Service Principal (Service account in Active directory, typically we take help of AD admin team to create this AD service account)


adkeytab --new --upn ambari-adm@example.com --keytab ambari-adm.keytab --container "OU=Hadoop,OU=Application,DC=example,DC=com" -V ambari-adm --user adadmin@example.com --ignore

3. Set passwod for the new principal (Ad service account)

adpasswd  -a adadmin@example.com ambari-adm@example.com

4.Generate Keytab file for this user account (Again AD admin will help)

adkeytab -A --ignore -u adadmin@example.com -K ambari-adm.keytab -e arcfour-hmac-md5 --fource --newpassword P@$$w0rd -S ambari-adm  ambari-adm -V
Now setup ambari with kerberos
 ambari-server setup-security

Select option: 3

Setup Ambari Kerberos JAAS configuration.

Enter Ambari Server’s kerberos Principal Name: amabri-adm@example.com

Enter keytab path: /root/ambari-adm@example.com

Note: keep 600 permissions the keytab file

Once setup is done, need to configure kerberos principal

Hive View configuration:

Hive Authentication=auth=KERBEROS;principal=hive/<hive host fqdn>@EXAMPLE.COM;hive.server2.proxy.user=$(username)

WebHDFS Authentication=auth=KERBEROS;proxyuser=ambari-adm@EXAMPLE.COM

It requires proxy user configuration (personification) in HADOOP configuration: setup_HDFS_proxy_user




HADOOP ISSUES – dncp_block_verification.log.prev too large

dncp_block_verification.log.prev too large

Seen this bug in hdp2.3.2

Using hadoop version : Hadoop 2.0.0-cdh4.7.0

can see on one datanode, dncp_block_verification.log.prev is too large.

Is it safe to delete this file?

-rw-r--r-- 1 hdfs hdfs 1166438426181 Oct 31 09:34 dncp_block_verification.log.prev
-rw-r--r-- 1 hdfs hdfs     138576163 Dec 15 22:16 dncp_block_verification.log.curr

This is similar to HDFS-6114 but that is for dncp_block_verification.log.curr file.





Microsoft cloud Azure hadoop component versions

Microsoft cloud Azure hadoop component versions

Component HDInsight version 3.5 HDInsight version 3.4 (Default) HDInsight Version 3.3 HDInsight Version 3.2 HDInsight Version 3.1 HDInsight Version 3.0
Hortonworks Data Platform 2.5 2.4 2.3 2.2 2.1.7 2.0
Apache Hadoop & YARN 2.7.3 2.7.1 2.7.1 2.6.0 2.4.0 2.2.0
Apache Tez 0.7.0 0.7.0 0.7.0 0.5.2 0.4.0
Apache Pig 0.16.0 0.15.0 0.15.0 0.14.0 0.12.1 0.12.0
Apache Hive & HCatalog 1.2.1 1.2.1 0.14.0 0.13.1 0.12.0
Apache HBase 1.1.2 1.1.2 1.1.1 0.98.4 0.98.0
Apache Sqoop 1.4.6 1.4.6 1.4.6 1.4.5 1.4.4 1.4.4
Apache Oozie 4.2.0 4.2.0 4.2.0 4.1.0 4.0.0 4.0.0
Apache Zookeeper 3.4.6 3.4.6 3.4.6 3.4.6 3.4.5 3.4.5
Apache Storm 1.0.1 0.10.0 0.10.0 0.9.3 0.9.1
Apache Mahout 0.9.0+ 0.9.0+ 0.9.0+ 0.9.0 0.9.0
Apache Phoenix 4.7.0 4.4.0 4.4.0 4.2.0
Apache Spark 1.6.2 + 2.0 (Linux only) 1.6.0 (Linux only) 1.5.2 (Linux only/Experimental build) 1.3.1 (Windows-only)



Install Yarn on Ubuntu Cluster via Scripts

There is beautiful blog on using scripts to install YARN on Ubuntu


You can use the script hadoop2-install-scripts.zip to install hadoop on centos systems.

I come across these steps when reading Arun C Murthy and Vinod Kumar Vavillapalli’s  book Apache Hadoop Yarn.



ambari-server setup-security

ambari-server setup-security
Using python  /usr/bin/python
Security setup options…
Choose one of the following options:
[1] Enable HTTPS for Ambari server.
[2] Encrypt passwords stored in ambari.properties file.
[3] Setup Ambari kerberos JAAS configuration.
[4] Setup truststore.
[5] Import certificate to truststore.
Enter choice, (1-5): 3
Setting up Ambari kerberos JAAS configuration to access secured Hadoop daemons…
Enter ambari server’s kerberos principal name (ambari@EXAMPLE.COM): qa-ambari-server@ABC.COM
Enter keytab path for ambari server’s kerberos principal: /root/qa-ambari-server.keytab
Ambari Server ‘setup-security’ completed successfully.