Starting with HDP 2.6.3, Hortonworks no longer supports JDK 7, because JDK 7 has reached end of life.
Hortonworks support matrices for HDP 2.6.0
Hortonworks document for upgrading to JDK 8
Cloudera document for upgrading to JDK 8: cdh_cm_upgrading_to_jdk81
The activity below needs to be performed on all Hive server, Hive metastore and Hive client nodes.
We recently upgraded from HDP 2.3.2 to 2.5.3 using the rolling upgrade method. While the upgrade was in the paused state, pending commit, HDFS kept preserving all data on disk, even for files deleted with the -skipTrash option.
hdfs dfs -du -s -h gives the correct size after deletion, but hdfs dfsadmin -report shows a higher value.
The root cause, identified while working with Hortonworks (HWX) support: HDFS keeps the deleted data in a “trash” folder on the DataNode disks allocated for DFS storage.
This space is cleared once the rolling upgrade is committed, either through Ambari or manually with “dfsadmin -finalizeUpgrade”.
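To track how much space is being held back, the two numbers can be pulled out and compared over time. The following is a minimal sketch, assuming the report contains a cluster-wide summary line of the form "DFS Used: <bytes> (<human-readable size>)"; the helper name dfs_used_bytes is made up for illustration.

```shell
# Hypothetical helper: print the byte count from the first "DFS Used:" line on stdin.
# In `hdfs dfsadmin -report` the first such line is the cluster-wide summary;
# "DFS Used%:" lines are skipped because the pattern requires a colon after "Used".
dfs_used_bytes() {
  awk -F': ' '/^DFS Used:/ { split($2, parts, " "); print parts[1]; exit }'
}

# Usage on a cluster node (not executed here):
#   hdfs dfsadmin -report | dfs_used_bytes
#   hdfs dfs -du -s /
```

Note that DFS Used counts every replica on disk, so it is normally larger than the logical size from du; what matters here is whether it drops after deletions and after the upgrade is committed.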
mysql_config_editor is the answer for managing MySQL logins without typing the password at the prompt or keeping it in plain-text files.
The mysql_config_editor tool creates an encrypted .mylogin.cnf file, which resolves the age-old problem of MySQL scripts reading cleartext passwords.
The command has the following options:
set [command options]     Sets user name/password/host name/socket/port
                          for a given login path (section).
remove [command options]  Remove a login path from the login file.
print [command options]   Print all the options for a specified
                          login path.
reset [command options]   Deletes the contents of the login file.
help                      Display this usage/help information.
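A typical round trip with these subcommands might look like the sketch below. The login path name "local", the host, and the user are placeholders chosen for illustration, and the function names are hypothetical wrappers.

```shell
# Store credentials under a login path; the tool prompts for the password
# and writes it to ~/.mylogin.cnf instead of a plain-text file.
store_login() {
  mysql_config_editor set --login-path=local --host=localhost --user=root --password
}

# List all stored login paths (the password is not shown in clear text).
show_logins() {
  mysql_config_editor print --all
}

# Run a query using the stored login path, with no password on the command line.
run_query() {
  mysql --login-path=local -e 'SELECT CURRENT_USER();'
}
```

Scripts can then call the mysql client with --login-path and never see the password itself.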
hadoop jar /usr/hdp/188.8.131.52-1245/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 10000000000 /teraInput
# hdfs dfs -mv /teraInput /user/root/10000000
# hadoop jar /usr/hdp/184.108.40.206-1245/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort 10000000 /teraInput /teraOutput
# hdfs dfs -mv /teraInput /teraOutput
# hadoop jar /usr/hdp/220.127.116.11-1245/hadoop-mapreduce/hadoop-mapreduce-examples.jar teravalidate /teraOutput /teraValidate
Note: These steps were performed on Ambari 2.2.2 with Hortonworks HDP 2.3.2.
The ambari-ldap-ad.sh script was used to update the ambari.properties file:
cat <<-'EOF' | sudo tee -a /etc/ambari-server/conf/ambari.properties
Now run the command:
It reads all the required properties from the ambari.properties file that was set up above. Some important properties are:
Primary URL* (host:port): (activedirectory.example.com:389)
Base DN* (dc=example,dc=com)
Manager DN* (cn=ldap-connect,ou=users,ou=hdp,dc=example,dc=com)
Enter Manager Password*: ******
Re-enter password: ******
Create comma-separated users.txt and groups.txt files listing the users and groups to be synced with Ambari:
echo "user1,user2,user3" > users.txt
echo "group1,group2,group3" > groups.txt
ambari-server sync-ldap --users users.txt
ambari-server sync-ldap --groups groups.txt
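When the list of names comes from somewhere else (a script argument, another file), the comma-separated files can be generated rather than typed by hand. This is a small sketch; write_name_list and sync_selected are hypothetical helper names, and the user and group names are the placeholders from above.

```shell
# Hypothetical helper: write the given names to a file as one comma-separated line.
# The body runs in a subshell, so the IFS change does not leak to the caller.
write_name_list() (
  outfile=$1
  shift
  IFS=,
  printf '%s\n' "$*" > "$outfile"
)

# Build both files, then sync only those users and groups (run on the Ambari server).
sync_selected() {
  write_name_list users.txt user1 user2 user3
  write_name_list groups.txt group1 group2 group3
  ambari-server sync-ldap --users users.txt
  ambari-server sync-ldap --groups groups.txt
}
```

Each sync-ldap call still prompts for the Ambari admin login and password.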
Enter Ambari Admin login: admin
Enter Ambari Admin password: *******
Prerequisite: get the service principal (an AD service account, if AD is configured for Kerberos).
adkeytab --new --upn email@example.com --keytab ambari-adm.keytab --container "OU=Hadoop,OU=Application,DC=example,DC=com" -V ambari-adm --user firstname.lastname@example.org --ignore
3. Set the password for the new principal (AD service account)
adpasswd -a email@example.com firstname.lastname@example.org
4. Generate the keytab file for this user account (again, the AD admin will help)
adkeytab -A --ignore -u email@example.com -K ambari-adm.keytab -e arcfour-hmac-md5 --force --newpassword 'P@$$w0rd' -S ambari-adm ambari-adm -V
Select option: 3
Setup Ambari Kerberos JAAS configuration.
Enter Ambari Server’s kerberos Principal Name: firstname.lastname@example.org
Enter keytab path: /email@example.com
Note: keep 600 permissions on the keytab file.
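A quick way to apply and check the permissions is sketched below. The helper name secure_keytab is hypothetical, and the keytab path in the usage comment is an assumed location, not one taken from this setup.

```shell
# Hypothetical helper: restrict a keytab to its owner and print the resulting mode bits.
secure_keytab() {
  chmod 600 "$1" && ls -l "$1" | cut -c1-10
}

# Usage (path is an assumption):
#   secure_keytab /etc/security/keytabs/ambari-adm.keytab
# The principals stored in a keytab can be listed with:
#   klist -kt /etc/security/keytabs/ambari-adm.keytab
```

The file should also be owned by the account that runs the Ambari server, since 600 leaves it readable by the owner only.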
Once the setup is done, the Kerberos principal needs to be configured.
Hive View configuration:
Hive Authentication=auth=KERBEROS;principal=hive/<hive host fqdn>@EXAMPLE.COM;hive.server2.proxy.user=${username}
This requires proxy user (impersonation) configuration in the Hadoop configuration (see setup_HDFS_proxy_user):
hadoop.proxyuser.ambari-adm.hosts=*
hadoop.proxyuser.ambari-adm.groups=*
hadoop.proxyuser.ambari-adm.users=*
> When a single drive fails on a worker node in HDFS, can this adversely affect performance of jobs running on this node?
The answer to this question is: it depends. If this node is running a job that is accessing blocks on the failed volume, then yes. It is also possible that the job would be treated as failed if dfs.datanode.failed.volumes.tolerated is not greater than 0. If the value is not greater than zero, HDFS treats the loss of a volume as catastrophic and marks the DataNode as failed. If it is set to a value greater than zero, the node keeps working until it loses more volumes than that value allows.
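The rule described above can be written down as a one-line check. This is only a sketch of the decision, not DataNode code; datanode_survives is a hypothetical name, and it assumes the setting is a non-negative integer.

```shell
# Hypothetical helper: succeeds (exit 0) if the DataNode keeps running,
# fails (exit 1) if the volume loss is treated as fatal.
# $1 = number of failed volumes
# $2 = value of dfs.datanode.failed.volumes.tolerated
datanode_survives() {
  [ "$1" -le "$2" ]
}

# The value configured on a node can be read with:
#   hdfs getconf -confKey dfs.datanode.failed.volumes.tolerated
# Example: one failed disk with the setting at 0
#   datanode_survives 1 0 || echo "DataNode would shut down"
```

With the setting at 0, a single failed disk takes the whole DataNode out of service; with 1, the node survives the first failure and stops on the second.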
> If this could cause a performance impact, how can our customers monitor for these drive failures in order to take corrective action?
Now this is a hard question to answer without further details. I am tempted to answer that the performance benefit you are going to get by monitoring and relying on a human being to take a corrective action is very doubtful. YARN / MR or whatever execution engine you are using is probably going to be much more efficient at re-scheduling your jobs.
> Or does the DataNode process quickly mark the drive and its HDFS blocks as “unusable”?
The DataNode does mark the volume as failed, and the NameNode learns that the blocks on that failed volume are no longer available on that DataNode. This happens via something called “block reports”. Once the NameNode learns that the DataNode has lost its replica of a block, it initiates the appropriate re-replication. Since the NameNode knows about the loss of blocks, further jobs that need access to those blocks would most probably not be scheduled on that node. This again depends on the scheduler and its policies.