Hadoop Recovery

This blog covers the following Hadoop recovery topics. The steps are aimed at Hadoop administrators.


Excellent blog on HDFS (Hadoop Distributed File System)

The Architecture of Open Source Applications: The Hadoop Distributed File System

This article answers many questions on HDFS for those who want to understand HDFS in depth.

How to deploy custom jar files in Apache Hive (Hortonworks HDP)

Perform the steps below on all Hive server, Hive metastore, and Hive client nodes.

  1. Create the folder "/usr/hdp/" if it does not exist.
  2. Copy the custom-built jar "customserde.jar" into this folder.
  3. Restart the Hive service.
  4. Verify with "ps -ef | grep hive | grep customserde". The Hive process should show this file, with its path, in the "--hiveconf hive.aux.jars.path=" section of its command line.
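The verification in step 4 can also be scripted. The sketch below is only an illustration: aux_jars_has_jar is a hypothetical helper (not part of Hive or HDP), and the sample command line and its jar path are made up. It extracts the value of hive.aux.jars.path from a process command line and checks whether the named jar appears in it.

```shell
#!/usr/bin/env bash
# Sketch: check whether a process command line lists a jar in hive.aux.jars.path.
# aux_jars_has_jar is a hypothetical helper, not a Hive or HDP tool.
aux_jars_has_jar() {
  local cmdline="$1" jar="$2" value
  # Take everything after "hive.aux.jars.path=" up to the next space.
  value=$(printf '%s\n' "$cmdline" | sed -n 's/.*hive\.aux\.jars\.path=\([^ ]*\).*/\1/p')
  # The value is a comma-separated list; match the jar as a whole file name.
  case ",$value," in
    *[/,]"$jar",*) return 0 ;;
    *) return 1 ;;
  esac
}

# Made-up command line for illustration; on a real node, feed in a line from `ps -ef`.
sample='java org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///some/dir/customserde.jar'
if aux_jars_has_jar "$sample" customserde.jar; then
  echo "customserde.jar is on hive.aux.jars.path"
else
  echo "customserde.jar is NOT on hive.aux.jars.path"
fi
```

Matching on the whole file name avoids false positives from jars whose names merely contain "customserde".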

After a rolling upgrade to HDP 2.5.3, deleting files from HDFS does not free the space

We recently upgraded from HDP 2.3.2 to 2.5.3 using the rolling upgrade method. While the upgrade was paused, pending commit, HDFS kept all deleted data on disk, even when files were deleted with the -skipTrash option.

hdfs dfs -du -s -h gives the correct size after deletion, but hdfs dfsadmin -report shows a higher value.

Root cause, identified while working with Hortonworks (hwx): the DataNodes keep the deleted data in a "trash" folder on the disks allocated for DFS storage.

This space is reclaimed once the rolling upgrade is committed, either through Ambari or manually with "dfsadmin -finalizeUpgrade".
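To see how much space the DataNode trash is holding before the upgrade is committed, something like the following can be run on a DataNode. This is a sketch under assumptions: list_dn_trash is a hypothetical helper, /hadoop/hdfs/data is a placeholder for whatever dfs.datanode.data.dir points to, and the layout <data dir>/current/BP-*/trash should be verified on your own cluster.

```shell
#!/usr/bin/env bash
# Sketch: list "trash" directories under DataNode data dirs.
# Assumed layout (verify on your cluster): <data dir>/current/BP-*/trash
list_dn_trash() {
  local dir
  for dir in "$@"; do
    [ -d "$dir" ] || continue   # skip data dirs that do not exist on this node
    find "$dir" -maxdepth 3 -type d -name trash -path '*/current/BP-*/trash' 2>/dev/null
  done
}

# Placeholder path; pass every directory listed in dfs.datanode.data.dir.
list_dn_trash /hadoop/hdfs/data | while IFS= read -r t; do
  du -sh "$t"
done
```

If the directories it reports roughly account for the gap between "hdfs dfs -du" and "hdfs dfsadmin -report", the space should come back after the upgrade is committed.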



TERASORT – benchmark using hadoop-mapreduce-examples.jar

hadoop jar /usr/hdp/ teragen 10000000000 /teraInput
# hdfs dfs -mv /teraInput /user/root/10000000
# hadoop jar /usr/hdp/ terasort 10000000 /teraInput /teraOutput
# hdfs dfs -mv /teraInput /teraOutput
# hadoop jar /usr/hdp/ teravalidate /teraOutput /teraValidate

REF: Running-TeraSort-MapReduce-Benchmark
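When choosing the row count for teragen, it helps to know how much data it will produce. TeraGen writes 100-byte rows, so the size is simply rows x 100. The helper name teragen_bytes below is hypothetical, and this is only a quick arithmetic sketch.

```shell
#!/usr/bin/env bash
# Sketch: estimate TeraGen output size from the row count (100 bytes per row).
teragen_bytes() {
  echo $(( $1 * 100 ))
}

rows=10000000000
bytes=$(teragen_bytes "$rows")
# Report in decimal gigabytes (1 GB = 10^9 bytes) before replication.
echo "$rows rows -> $bytes bytes (~$(( bytes / 1000000000 )) GB)"
```

With the row count used above (10000000000), this works out to about 1 TB of input, before HDFS replication is taken into account.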

Ambari LDAP setup and Kerberos Setup

 Ambari steps to configure LDAP

Note: These steps were performed on Ambari 2.2.2 with Hortonworks HDP 2.3.2.
  1. Configure /etc/ambari-server/conf/ambari.properties

The ambari-ldap-ad.sh script was used to update the ambari.properties file:

cat <<-'EOF' | sudo tee -a /etc/ambari-server/conf/ambari.properties

Now run the command:

# ambari-server setup-ldap

It reads all required properties from the ambari.properties file that was set up above. Some important properties are:

Primary URL* (host:port): (activedirectory.example.com:389)

Base DN* (dc=example,dc=com)

Manager DN* (cn=ldap-connect,ou=users,ou=hdp,dc=example,dc=com)

Enter Manager Password*: ******

Re-enter password: ******
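For illustration, the values above map onto properties in ambari.properties roughly as in the sketch below. The property names are written from memory of Ambari 2.x and are assumptions here, as are the bindAnonymously and useSSL values; the original contents of ambari-ldap-ad.sh are not reproduced. append_ldap_props is a hypothetical helper, and the example writes to a scratch file rather than the live configuration.

```shell
#!/usr/bin/env bash
# Sketch: append LDAP settings to a properties file with a quoted heredoc.
# Property names are assumed from Ambari 2.x; confirm them for your version.
append_ldap_props() {
  local target="$1"
  # Quoting 'EOF' stops the shell from expanding anything inside the body.
  cat <<'EOF' >> "$target"
authentication.ldap.primaryUrl=activedirectory.example.com:389
authentication.ldap.baseDn=dc=example,dc=com
authentication.ldap.managerDn=cn=ldap-connect,ou=users,ou=hdp,dc=example,dc=com
authentication.ldap.bindAnonymously=false
authentication.ldap.useSSL=false
EOF
}

# Try it on a scratch file first, not on /etc/ambari-server/conf/ambari.properties.
scratch=$(mktemp)
append_ldap_props "$scratch"
cat "$scratch"
rm -f "$scratch"
```

Reviewing the scratch file before appending to the real ambari.properties avoids duplicate or conflicting entries, since tee -a and >> never replace existing lines.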

ambari-server start

Create users.txt and groups.txt containing the users and groups to be synced with Ambari, as comma-separated lists.

echo "user1,user2,user3" > users.txt

echo "group1,group2,group3" > groups.txt

ambari-server sync-ldap --users users.txt

ambari-server sync-ldap --groups groups.txt

Enter Ambari Admin login: admin

Enter Ambari Admin password: *******
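When the list of users or groups is long, the comma-separated files can be generated instead of typed by hand. A minimal sketch, where join_names is a hypothetical helper and the names are the placeholder ones used above:

```shell
#!/usr/bin/env bash
# Sketch: build the comma-separated list expected in users.txt / groups.txt.
join_names() {
  local IFS=,
  # "$*" joins all arguments using the first character of IFS (a comma here).
  printf '%s\n' "$*"
}

# Prints the list; redirect to users.txt or groups.txt as needed.
join_names user1 user2 user3
join_names group1 group2 group3
```

For example, "join_names user1 user2 user3 > users.txt" produces the same file as the echo command shown earlier.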


Prerequisite: obtain the service principal (an AD service account, if AD is configured for Kerberos).

Steps to create a new service principal, set its password, and create a keytab (AD with Centrify configuration)
  1. Create the Ambari service principal (a service account in Active Directory; typically the AD admin team helps create this account)


adkeytab --new --upn ambari-adm@example.com --keytab ambari-adm.keytab --container "OU=Hadoop,OU=Application,DC=example,DC=com" -V ambari-adm --user adadmin@example.com --ignore

3. Set the password for the new principal (AD service account)

adpasswd  -a adadmin@example.com ambari-adm@example.com

4. Generate the keytab file for this user account (again, the AD admin will help)

adkeytab -A --ignore -u adadmin@example.com -K ambari-adm.keytab -e arcfour-hmac-md5 --force --newpassword 'P@$$w0rd' -S ambari-adm ambari-adm -V

Now set up Ambari with Kerberos:

ambari-server setup-security

Select option: 3

Setup Ambari Kerberos JAAS configuration.

Enter Ambari Server's kerberos Principal Name: ambari-adm@example.com

Enter keytab path: /root/ambari-adm@example.com

Note: keep the keytab file at 600 permissions (readable and writable by its owner only).
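The permission requirement can be applied and checked in one step. This is a small sketch: secure_keytab is a hypothetical helper, the example runs on a scratch file standing in for the real keytab, and stat -c is the GNU coreutils form (BSD/macOS stat takes different flags).

```shell
#!/usr/bin/env bash
# Sketch: set a keytab to mode 600 and confirm the result.
secure_keytab() {
  local keytab="$1"
  [ -f "$keytab" ] || { echo "no such file: $keytab" >&2; return 1; }
  chmod 600 "$keytab" || return 1
  # GNU stat prints the octal mode with -c '%a'.
  [ "$(stat -c '%a' "$keytab")" = "600" ]
}

# Demonstration on a scratch file; point it at the real keytab path on the server.
demo=$(mktemp)
chmod 644 "$demo"
if secure_keytab "$demo"; then
  echo "keytab permissions are 600"
fi
rm -f "$demo"
```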

Once setup is done, the Kerberos principal needs to be configured.

Hive View configuration:

Hive Authentication=auth=KERBEROS;principal=hive/<hive host fqdn>@EXAMPLE.COM;hive.server2.proxy.user=${username}

WebHDFS Authentication=auth=KERBEROS;proxyuser=ambari-adm@EXAMPLE.COM

This requires proxy user configuration (impersonation) in the Hadoop configuration: setup_HDFS_proxy_user
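Proxy user (impersonation) settings in Hadoop follow the hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups naming pattern in core-site. The sketch below just prints the two property lines for a given account. proxyuser_props is a hypothetical helper, and the host and group values are placeholders; using the short name ambari-adm for the principal is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: print the core-site proxy user properties for one account.
# Values are placeholders; restrict hosts and groups as tightly as you can.
proxyuser_props() {
  local user="$1" hosts="$2" grps="$3"
  printf 'hadoop.proxyuser.%s.hosts=%s\n' "$user" "$hosts"
  printf 'hadoop.proxyuser.%s.groups=%s\n' "$user" "$grps"
}

# ambari.example.com and "users" are made-up values for illustration.
proxyuser_props ambari-adm ambari.example.com users
```

The printed pairs can then be entered as custom core-site properties through Ambari, followed by a restart of the affected services.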