Hadoop: hdfs block size, replication

Block Size Parameters:

$ hdfs dfs -D dfs.blocksize=1024 -put lfile /user/test
put: Specified block size is less than configured minimum value (dfs.namenode.fs-limits.min-block-size): 1024 < 1048576
$ hdfs dfs -D dfs.blocksize=1048576 -put lfile /user/test

dfs.namenode.fs-limits.min-block-size 1048576 Minimum block size in bytes, enforced by the Namenode at create time. This prevents the accidental creation of files with tiny block sizes (and thus many blocks), which can degrade performance.
dfs.namenode.fs-limits.max-blocks-per-file 1048576 Maximum number of blocks per file, enforced by the Namenode on write. This prevents the creation of extremely large files which can degrade performance.
dfs.blocksize 134217728 The default block size for new files, in bytes. You can use the following suffixes (case insensitive): k (kilo), m (mega), g (giga), t (tera), p (peta), e (exa) to specify the size (such as 128k, 512m, 1g, etc.), or provide the complete size in bytes (such as 134217728 for 128 MB).
dfs.client.block.write.retries 3
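The size suffixes accepted by dfs.blocksize are binary (1024-based). A quick sanity check of the arithmetic in plain shell (no Hadoop needed; `to_bytes` is just a throwaway helper for illustration):

```shell
# Convert a human-readable size with a k/m/g suffix to bytes,
# mirroring the binary suffixes accepted by dfs.blocksize.
to_bytes() {
  local v=$1
  case $v in
    *k|*K) echo $(( ${v%?} * 1024 )) ;;
    *m|*M) echo $(( ${v%?} * 1024 * 1024 )) ;;
    *g|*G) echo $(( ${v%?} * 1024 * 1024 * 1024 )) ;;
    *)     echo "$v" ;;
  esac
}

to_bytes 128m   # 134217728, the default dfs.blocksize
to_bytes 1m     # 1048576, the dfs.namenode.fs-limits.min-block-size default
```

This is why the first `-put` above failed: 1024 bytes is below the 1m (1048576-byte) minimum the NameNode enforces.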

Replication Parameters:

dfs.replication 3 Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
dfs.replication.max 512 Maximal block replication.
dfs.namenode.replication.min 1 Minimal block replication.

HADOOP: hdfs quotas

Two Types of Quotas:

  1. Name Quota – limit number of files/directories
  2. Space Quota – limits size(storage) of the directory

$ hdfs dfsadmin -help | grep quota

(grep keeps only the lines containing "quota", so the help text below appears truncated.)
-setQuota <quota> <dirname>...<dirname>: Set the quota <quota> for each directory <dirName>.
The directory quota is a long integer that puts a hard limit
For each directory, attempt to set the quota. An error will be reported if
Note: A quota of 1 would force the directory to remain empty.
-clrQuota <dirname>...<dirname>: Clear the quota for each directory <dirName>.
For each directory, attempt to clear the quota. An error will be reported if
It does not fault if the directory has no quota.
-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>: Set the space quota <quota> for each directory <dirName>.
The space quota is a long integer that puts a hard limit
a 1GB file with replication of 3 consumes 3GB of the quota.
For each directory, attempt to set the quota. An error will be reported if
The storage type specific quota is set when -storageType option is specified.
-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>: Clear the space quota for each directory <dirName>.
For each directory, attempt to clear the quota. An error will be reported if
It does not fault if the directory has no quota.
The storage type specific quota is cleared when -storageType option is specified.
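As the help text notes, the space quota is charged for replicated bytes, so a 1 GB file with replication 3 consumes 3 GB of quota. The arithmetic, spelled out:

```shell
# Space-quota charge = logical file size x replication factor.
file_bytes=$(( 1 * 1024 * 1024 * 1024 ))   # 1 GB logical size
replication=3
consumed=$(( file_bytes * replication ))
echo "$consumed"   # 3221225472 bytes = 3 GB charged against the space quota
```

Keep this in mind when sizing quotas: a 300 MB space quota can be blown by a file much smaller than 300 MB once replication is counted.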

Examples:

NAME QUOTA

$ hdfs dfsadmin -setQuota 10 /user/test
$ hadoop fs -count -q /user/test
10               7            none             inf            2            1          209715200 /user/test
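The `-count -q` columns are, in order: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME. A small awk sketch that labels the sample line above (note remaining = 10 - (2 dirs + 1 file) = 7, since the name quota counts both directories and files):

```shell
# Label the fields of a `hadoop fs -count -q` output line.
line="10               7            none             inf            2            1          209715200 /user/test"
echo "$line" | awk '{
  printf "name quota=%s remaining=%s dirs=%s files=%s bytes=%s path=%s\n",
         $1, $2, $5, $6, $7, $8
}'
# name quota=10 remaining=7 dirs=2 files=1 bytes=209715200 path=/user/test
```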

for i in 1 2 3 4 5 6 7 8 9;do hdfs dfs -mkdir /user/test/file_$i; done
mkdir: The NameSpace quota (directories and files) of directory /user/test is exceeded: quota=10 file count=11
mkdir: The NameSpace quota (directories and files) of directory /user/test is exceeded: quota=10 file count=11
mkdir: The NameSpace quota (directories and files) of directory /user/test is exceeded: quota=10 file count=11

Let's clear the name quota:

$ hdfs dfsadmin -clrQuota /user/test
$ hadoop fs -count -q -h /user/test
none             inf            none             inf            9            1              200 M /user/test

SPACE QUOTA

$ hdfs dfsadmin -setSpaceQuota 300m /user/test
$ hadoop fs -count -q -h /user/test
none             inf           300 M          -100 M            9            1              200 M /user/test

$ hdfs dfs -put /tmp/lfile /user/test

put: The DiskSpace quota of /user/test is exceeded: quota = 314572800 B = 300 MB but diskspace consumed = 343932928 B = 328 MB

Let's clear the space quota:

$ hdfs dfsadmin -clrSpaceQuota /user/test
$ hadoop fs -count -q -h /user/test
none             inf            none             inf            2            0                  0 /user/test

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html

HADOOP: Kerberos Setup and Configuration using the Ambari Wizard

Install

yum install krb5-server krb5-libs krb5-auth-dialog krb5-workstation

Edit realm

# vi /etc/krb5.conf

[realms]
EXAMPLE.COM = {
kdc = kerberos.example.com
admin_server = kerberos.example.com
}

Create database

# kdb5_util create -s
Loading random data
Initializing database '/var/kerberos/krb5kdc/principal' for realm 'EXAMPLE.COM',
master key name 'K/M@EXAMPLE.COM'
You will be prompted for the database Master Password.
It is important that you NOT FORGET this password.
Enter KDC database master key:
Re-enter KDC database master key to verify:

Start KDC server and kdb admin

#/etc/rc.d/init.d/krb5kdc start && /etc/rc.d/init.d/kadmin start
Starting Kerberos 5 KDC:                                   [  OK  ]
Starting Kerberos 5 Admin Server:                          [  OK  ]

Create KDC Admin

$ kadmin.local -q "addprinc admin/admin"
Authenticating as principal root/admin@EXAMPLE.COM with password.
WARNING: no policy specified for admin/admin@EXAMPLE.COM; defaulting to no policy
Enter password for principal "admin/admin@EXAMPLE.COM":
Re-enter password for principal "admin/admin@EXAMPLE.COM":
Principal "admin/admin@EXAMPLE.COM" created.

more /var/kerberos/krb5kdc/kadm5.acl
*/admin@EXAMPLE.COM    *

Now create principals and keytabs manually in KDC using kadmin:

The Ambari Kerberos enable wizard provides a csv file listing principal names and keytab file names along with ownership details.

Create the principals using that csv file:

for PRN in `cut -d "," -f3 kerberos.csv`; do kadmin.local -q "addprinc -randkey $PRN"; done
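To see what that loop extracts, here is the same cut step run against a made-up two-row csv in the same general shape (the column layout here is an assumption for illustration; the real layout comes from Ambari's csv, with column 3 holding the principal). kadmin.local is echoed rather than run:

```shell
# Hypothetical kerberos.csv rows: host,description,principal,...,keytab path
cat > /tmp/kerberos_sample.csv <<'EOF'
sandbox.hortonworks.com,HDFS,nn/sandbox.hortonworks.com@EXAMPLE.COM,x,x,/etc/security/keytabs/nn.service.keytab
sandbox.hortonworks.com,HDFS,dn/sandbox.hortonworks.com@EXAMPLE.COM,x,x,/etc/security/keytabs/dn.service.keytab
EOF

# Column 3 holds the principal; each one would be fed to `addprinc -randkey`.
for PRN in $(cut -d "," -f3 /tmp/kerberos_sample.csv); do
  echo "kadmin.local -q \"addprinc -randkey $PRN\""   # echo instead of running kadmin
done
```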

Generate the keytab files

Generate a genkeytab.txt file from kerberos.csv using the awk step below, then pass it to kadmin.local:

awk -F"," '{print "xst -k",$6," ",$3}' kerberos.csv > genkeytab.txt
kadmin.local < genkeytab.txt

genkeytab.txt content:
xst -k /etc/security/keytabs/spnego.service.keytab   HTTP/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/smokeuser.headless.keytab   ambari-qa@EXAMPLE.COM
xst -k /etc/security/keytabs/ams.collector.keytab   amshbase/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/ams-hbase.master.keytab   amshbase/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/ams-hbase.regionserver.keytab   amshbase/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/ams-zk.service.keytab   amszk/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/atlas.service.keytab   atlas/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/dn.service.keytab   dn/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/falcon.service.keytab   falcon/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/hbase.service.keytab   hbase/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/hbase.headless.keytab   hbase@EXAMPLE.COM
xst -k /etc/security/keytabs/hdfs.headless.keytab   hdfs@EXAMPLE.COM
xst -k /etc/security/keytabs/hive.service.keytab   hive/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/jhs.service.keytab   jhs/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/kafka.service.keytab   kafka/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/knox.service.keytab   knox/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/nimbus.service.keytab   nimbus/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/nm.service.keytab   nm/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/nn.service.keytab   nn/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/oozie.service.keytab   oozie/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/rm.service.keytab   rm/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/spark.headless.keytab   spark@EXAMPLE.COM
xst -k /etc/security/keytabs/storm.service.keytab   storm@EXAMPLE.COM
xst -k /etc/security/keytabs/yarn.service.keytab   yarn/sandbox.hortonworks.com@EXAMPLE.COM
xst -k /etc/security/keytabs/zk.service.keytab   zookeeper/sandbox.hortonworks.com@EXAMPLE.COM
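The awk step can be tried safely against a sample row in the same shape (again assuming, for illustration, that column 6 is the keytab path and column 3 the principal); it produces xst lines like those above:

```shell
# One hypothetical kerberos.csv row to exercise the awk step.
cat > /tmp/kerberos_sample.csv <<'EOF'
sandbox.hortonworks.com,HDFS,nn/sandbox.hortonworks.com@EXAMPLE.COM,x,x,/etc/security/keytabs/nn.service.keytab
EOF

# Each comma in awk's print inserts the output field separator (a space),
# so $6 and $3 end up separated by three spaces.
awk -F"," '{print "xst -k",$6," ",$3}' /tmp/kerberos_sample.csv
# xst -k /etc/security/keytabs/nn.service.keytab   nn/sandbox.hortonworks.com@EXAMPLE.COM
```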

Change ownership of the keytab files:

chown root:hadoop   /etc/security/keytabs/spnego.service.keytab
chown ambari-qa:hadoop   /etc/security/keytabs/smokeuser.headless.keytab
chown ams:hadoop   /etc/security/keytabs/ams.collector.keytab
chown ams:hadoop   /etc/security/keytabs/ams-hbase.master.keytab
chown ams:hadoop   /etc/security/keytabs/ams-hbase.regionserver.keytab
chown ams:hadoop   /etc/security/keytabs/ams-zk.service.keytab
chown atlas:hadoop   /etc/security/keytabs/atlas.service.keytab
chown hdfs:hadoop   /etc/security/keytabs/dn.service.keytab
chown falcon:hadoop   /etc/security/keytabs/falcon.service.keytab
chown hbase:hadoop   /etc/security/keytabs/hbase.service.keytab
chown hbase:hadoop   /etc/security/keytabs/hbase.headless.keytab
chown hdfs:hadoop   /etc/security/keytabs/hdfs.headless.keytab
chown hive:hadoop   /etc/security/keytabs/hive.service.keytab
chown mapred:hadoop   /etc/security/keytabs/jhs.service.keytab
chown kafka:hadoop   /etc/security/keytabs/kafka.service.keytab
chown knox:hadoop   /etc/security/keytabs/knox.service.keytab
chown storm:hadoop   /etc/security/keytabs/nimbus.service.keytab
chown yarn:hadoop   /etc/security/keytabs/nm.service.keytab
chown hdfs:hadoop   /etc/security/keytabs/nn.service.keytab
chown oozie:hadoop   /etc/security/keytabs/oozie.service.keytab
chown yarn:hadoop   /etc/security/keytabs/rm.service.keytab
chown spark:hadoop   /etc/security/keytabs/spark.headless.keytab
chown storm:hadoop   /etc/security/keytabs/storm.service.keytab
chown yarn:hadoop   /etc/security/keytabs/yarn.service.keytab
chown zookeeper:hadoop   /etc/security/keytabs/zk.service.keytab

Change permissions:

chmod 440 /etc/security/keytabs/spnego.service.keytab
chmod 440 /etc/security/keytabs/smokeuser.headless.keytab
chmod 400 /etc/security/keytabs/ams.collector.keytab
chmod 400 /etc/security/keytabs/ams-hbase.master.keytab
chmod 400 /etc/security/keytabs/ams-hbase.regionserver.keytab
chmod 400 /etc/security/keytabs/ams-zk.service.keytab
chmod 400 /etc/security/keytabs/atlas.service.keytab
chmod 400 /etc/security/keytabs/dn.service.keytab
chmod 400 /etc/security/keytabs/falcon.service.keytab
chmod 400 /etc/security/keytabs/hbase.service.keytab
chmod 440 /etc/security/keytabs/hbase.headless.keytab
chmod 440 /etc/security/keytabs/hdfs.headless.keytab
chmod 400 /etc/security/keytabs/hive.service.keytab
chmod 400 /etc/security/keytabs/jhs.service.keytab
chmod 400 /etc/security/keytabs/kafka.service.keytab
chmod 400 /etc/security/keytabs/knox.service.keytab
chmod 400 /etc/security/keytabs/nimbus.service.keytab
chmod 400 /etc/security/keytabs/nm.service.keytab
chmod 400 /etc/security/keytabs/nn.service.keytab
chmod 400 /etc/security/keytabs/oozie.service.keytab
chmod 400 /etc/security/keytabs/rm.service.keytab
chmod 400 /etc/security/keytabs/spark.headless.keytab
chmod 400 /etc/security/keytabs/storm.service.keytab
chmod 400 /etc/security/keytabs/yarn.service.keytab
chmod 400 /etc/security/keytabs/zk.service.keytab
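Since the csv already carries the ownership details, the chown/chmod lines above could equally be generated from it rather than typed by hand. A sketch, assuming (hypothetically) that columns 4 and 5 hold owner and group; the commands are emitted for review rather than executed, and the real csv's column layout should be checked first:

```shell
# Hypothetical row: host,description,principal,owner,group,keytab path
cat > /tmp/kerberos_sample.csv <<'EOF'
sandbox.hortonworks.com,HDFS,nn/sandbox.hortonworks.com@EXAMPLE.COM,hdfs,hadoop,/etc/security/keytabs/nn.service.keytab
EOF

# Emit (rather than run) the ownership and permission commands.
awk -F"," '{print "chown " $4 ":" $5 " " $6; print "chmod 400 " $6}' /tmp/kerberos_sample.csv
```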

Continue Kerberos Enable Wizard in AMBARI.

I noticed it performing the checks below.

Performing kinit using ambari-qa@EXAMPLE.COM
2015-10-11 05:39:12,400 - Execute['/usr/bin/kinit -c /var/lib/ambari-agent/data/tmp/kerberos_service_check_cc_1f6a760d597577f9618bf539df44a098 -kt /etc/security/keytabs/smokeuser.headless.keytab ambari-qa@EXAMPLE.COM'] {}

After the restart, HDFS worked fine (though it initially gave some errors, which are listed below).

$hdfs dfsadmin -fs hdfs://sandbox.hortonworks.com:8020 -safemode get
Safe mode is OFF

Kerberos Troubleshooting checks:

ERRORS:

15/10/11 06:46:52 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
15/10/11 06:47:00 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

15/10/11 06:47:09 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Verify Keytab for hdfs service

$ /usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs

(If this returns no error, the keytab works.)

Check Ticket expiry

$kinit -R
kinit: Ticket expired while renewing credentials

klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs@EXAMPLE.COM

Valid starting     Expires            Service principal
10/11/15 06:50:49  10/12/15 06:50:49  krbtgt/EXAMPLE.COM@EXAMPLE.COM
renew until 10/11/15 06:50:49
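In the klist output above, "renew until" equals the ticket's "Valid starting" time, i.e. the renewable lifetime is zero, which is why `kinit -R` fails immediately with "Ticket expired while renewing credentials". A quick check in the same spirit, using the timestamps from the output above as fixed strings:

```shell
# Timestamps copied from the klist output above.
valid_starting="10/11/15 06:50:49"
renew_until="10/11/15 06:50:49"

# If the renew limit equals the start time, the ticket was issued
# with no renewable lifetime and `kinit -R` cannot extend it.
if [ "$renew_until" = "$valid_starting" ]; then
  echo "ticket is effectively non-renewable"
else
  echo "ticket can be renewed until $renew_until"
fi
```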

klist -e -k -t /etc/security/keytabs/hdfs.headless.keytab
Keytab name: FILE:/etc/security/keytabs/hdfs.headless.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
3 10/11/15 05:26:37 hdfs@EXAMPLE.COM (aes256-cts-hmac-sha1-96)
3 10/11/15 05:26:37 hdfs@EXAMPLE.COM (aes128-cts-hmac-sha1-96)
3 10/11/15 05:26:37 hdfs@EXAMPLE.COM (des3-cbc-sha1)
3 10/11/15 05:26:37 hdfs@EXAMPLE.COM (arcfour-hmac)
3 10/11/15 05:26:37 hdfs@EXAMPLE.COM (des-hmac-sha1)
3 10/11/15 05:26:37 hdfs@EXAMPLE.COM (des-cbc-md5)

Kerberos Configuration on HDFS:

$ egrep -A1 "security.authentication|security.authorization" /etc/hadoop/conf/core-site.xml
<name>hadoop.security.authentication</name>
<value>kerberos</value>

<name>hadoop.security.authorization</name>
<value>true</value>

$ egrep -A1 "kerberos.principal" /etc/hadoop/conf/hdfs-site.xml
<name>dfs.datanode.kerberos.principal</name>
<value>dn/_HOST@EXAMPLE.COM</value>

<name>dfs.namenode.kerberos.principal</name>
<value>nn/_HOST@EXAMPLE.COM</value>

<name>dfs.secondary.namenode.kerberos.principal</name>
<value>nn/_HOST@EXAMPLE.COM</value>

<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/_HOST@EXAMPLE.COM</value>
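The `_HOST` token in these principals is not literal: each daemon replaces it at login time with its own fully-qualified hostname, so one config works across the cluster. The substitution amounts to:

```shell
# _HOST is expanded to the daemon's FQDN when it logs in with its keytab.
pattern="nn/_HOST@EXAMPLE.COM"
host="sandbox.hortonworks.com"   # in practice this would be $(hostname -f)
principal=$(echo "$pattern" | sed "s/_HOST/$host/")
echo "$principal"   # nn/sandbox.hortonworks.com@EXAMPLE.COM
```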

Reference:

http://docs.hortonworks.com/HDPDocuments/Ambari-2.0.0.0/Ambari_Doc_Suite/ADS_v200.html#ref-de2249ba-7be6-4286-ae72-848b9d327e15

http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.3.1/bk_installing_manually_book/content/rpm-chap14.html

http://hortonworks.com/community/forums/topic/kerberos-security-in-hdp-gss-initiate-failed-for-the-hdfs-user/

Hadoop: Questions and Answers for Hadoop/HDFS/MapReduce/Pig

Questions and answers from the online training firm Edureka:

Map Reduce: http://www.edureka.co/blog/hadoop-interview-questions-mapreduce/

Hadoop Cluster: http://www.edureka.co/blog/hadoop-interview-questions-hadoop-cluster/

HDFS: http://www.edureka.co/blog/hadoop-interview-questions-hdfs-2/

PIG: http://www.edureka.co/blog/hadoop-interview-questions-pig/