Hadoop: FAQ on HDFS snapshots

$ hdfs dfs -mkdir /user/test
$ hdfs dfs -put output23.txt /user/test
$ hdfs dfs -du /user/test
67335  /user/test/output23.txt
$ hdfs dfsadmin -allowSnapshot /user/test
Allowing snaphot on /user/test succeeded
$ hdfs dfs -createSnapshot /user/test
Created snapshot /user/test/.snapshot/s20150924-050641.662

How to recover a file from the .snapshot directory?

$ hdfs dfs -rm /user/test/output23.txt
15/09/24 05:08:31 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://sandbox.hortonworks.com:8020/user/test/output23.txt' to trash at: hdfs://sandbox.hortonworks.com:8020/user/hdfs/.Trash/Current

Confirm the file is deleted:
$ hdfs dfs -ls /user/test/output23.txt
ls: `/user/test/output23.txt': No such file or directory

hdfs dfs -cp /user/test/.snapshot/s20150924-050641.662/output23.txt /user/test

15/09/24 05:12:02 WARN hdfs.DFSClient: DFSInputStream has been closed already
$ hdfs dfs -ls /user/test/output23.txt
-rw-r--r--   1 hdfs hdfs      67335 2015-09-24 05:12 /user/test/output23.txt

Can we delete files from a .snapshot copy?

Try to delete a file from .snapshot. Since .snapshot is a read-only copy, it will not allow files to be deleted.

hdfs dfs -rm /user/test/.snapshot/s20150924-050641.662/output23.txt
rm: Failed to move to trash: hdfs://sandbox.hortonworks.com:8020/user/test/.snapshot/s20150924-050641.662/output23.txt: ".snapshot" is a reserved name.

How to list snapshottable directories?

hdfs lsSnapshottableDir
drwxr-xr-x 0 hdfs hdfs 0 2015-09-24 05:12 1 65536 /user/test

How to rename a snapshot?

hdfs dfs -renameSnapshot /user/test s20150924-050641.662 b4patch.24sep15

$ hdfs dfs -ls /user/test/.snapshot
drwxr-xr-x   - hdfs hdfs          0 2015-09-24 05:06 /user/test/.snapshot/b4patch.24sep15

How to show the difference between two snapshots?

$ touch testfile
$ hdfs dfs -put testfile /user/test
$ hdfs dfs -ls /user/test
Found 2 items
-rw-r--r--   1 hdfs hdfs      67335 2015-09-24 05:12 /user/test/output23.txt
-rw-r--r--   1 hdfs hdfs          0 2015-09-24 05:32 /user/test/testfile

$ hdfs dfs -createSnapshot /user/test
Created snapshot /user/test/.snapshot/s20150924-053313.181
$ hdfs dfs -renameSnapshot /user/test s20150924-053313.181 afterPatch.24sep15

$ hdfs snapshotDiff /user/test b4patch.24sep15 afterPatch.24sep15
Difference between snapshot b4patch.24sep15 and snapshot afterPatch.24sep15 under directory /user/test:
M    .
+    ./output23.txt
+    ./testfile
-    ./output23.txt

In the diff report, M means the path was modified, + means it was created and - means it was deleted; output23.txt shows up as both + and - because it was deleted and then restored from the snapshot copy between the two snapshots.

How to delete a snapshot?

hdfs dfs -deleteSnapshot /user/test afterPatch.24sep15
hdfs dfs -deleteSnapshot /user/test b4patch.24sep15
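
Once both snapshots are deleted, snapshot creation on the directory can be disabled again if desired (this only succeeds when the directory has no remaining snapshots):

hdfs dfsadmin -disallowSnapshot /user/test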

Reference:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html#Delete_Snapshots

AIX HACMP commands to manage the cluster

HACMP binaries (commands) are located under /usr/es/sbin/cluster/sbin and /usr/es/sbin/cluster/utilities. Add both directories to your PATH if they are not already there, so you can avoid typing the full paths:

export PATH=$PATH:/usr/es/sbin/cluster/sbin:/usr/es/sbin/cluster/utilities

  1. "clRGinfo" - cluster resource group information
  2. "cllsres" - resource information
  3. "cllsserv" - list startup/shutdown scripts used by cluster resources
  4. "cllsfs" - list file systems used in the cluster
  5. "cltopinfo" - cluster topology information
  6. "clRGinfo -m" - verify cluster monitoring status for a service group
  7. "clshowres" - resource information
  8. "smitty cl_admin" - manage cluster service groups/resources
  9. "smitty hacmp" - manage the cluster, e.g. failovers

AIX Storage Device Management

How to find the WWN of HBA cards in AIX?

  1. Check the available FC adapters using "lsdev -Cc adapter | grep fcs"
  2. "lscfg -vl fcsX" (where X is the adapter number); see the sketch after this list for looping over all adapters
  3. You can also find it using "fcstat fcsX"
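
A minimal sketch to print the WWN of every FC adapter in one pass, assuming the adapters are named fcsN and that lscfg reports the WWN in its "Network Address" field:

# loop over all fcs adapters and print the WWN (Network Address) of each
for fcs in $(lsdev -Cc adapter | awk '/^fcs/ {print $1}'); do
  echo "== $fcs =="
  lscfg -vl $fcs | grep "Network Address"
done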

How to scan new LUN disks assigned by the storage team and identify them?

  1. lsdev -Cc disk (pre-scan output)
  2. cfgmgr (this will scan all hardware); to scan a specific FC adapter only, use "cfgmgr -l fcsX"
  3. lsdev -Cc disk (post-scan output); you will find the new LUN disks in the list (see the sketch after this list)
  4. lspv (the new storage LUN disks will show up without a PVID)
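
A minimal before/after comparison to spot the new LUNs, assuming the temporary file names below (any writable path will do):

# capture the disk list before and after the scan
lsdev -Cc disk > /tmp/disks.before
cfgmgr                                   # or: cfgmgr -l fcsX for a single adapter
lsdev -Cc disk > /tmp/disks.after
diff /tmp/disks.before /tmp/disks.after  # lines starting with ">" are the newly discovered disks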

How to verify the LUN device ID before adding the disk to a VG?

  1. lscfg -vpl <disk name> | grep VPD, e.g. lscfg -vpl hdiskpower0 | grep VPD
  2. If you are using PowerPath as the multipathing software, then: powermt display dev=all | grep "Logical device ID"

How to verify the storage disk size allocated to your AIX box?

  1. bootinfo -s hdiskpower3
  2. Output: the disk size in MB

How to extend a non-cluster file system using a newly allocated storage disk?

  1. Check the newly allocated disk (see the LUN scan steps above)
  2. Extend the VG by adding the disk: extendvg myvg01 hdiskpower3
  3. Verify the added free space: lsvg myvg01 | grep "FREE PPs"
  4. chfs -a size=+10G /app
  5. df -gt /app (to confirm the new size of the file system; see the consolidated sketch after this list)
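
The same steps end to end, using the example names from the list above (myvg01, hdiskpower3 and /app are placeholders for your own VG, disk and file system):

# add the new disk to the volume group
extendvg myvg01 hdiskpower3
# confirm the VG now has free physical partitions
lsvg myvg01 | grep "FREE PPs"
# grow the file system by 10 GB (chfs extends the underlying LV as needed)
chfs -a size=+10G /app
# confirm the new size
df -gt /app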

How to extend a cluster file system?

  1. To extend a file system which is in a cluster, a different procedure is needed (a consolidated sketch follows this list).
  2. Verify the disk's reserve policy and change it to no_reserve, so that the OS will not reserve the disk and it can fail over to another node.
  3. lspv (verify whether the new disk got a PVID; ideally it should be none as it's a new LUN)
  4. Verify the reserve policy:
  5. lsattr -El hdiskpower3 | grep reserve_policy. If it shows "no_reserve" it's good. If not, change the policy:
  6. chdev -l hdiskpower3 -a reserve_policy=no_reserve
  7. Make it persistent with -P (upper case):
  8. chdev -l hdiskpower3 -a reserve_policy=no_reserve -P
  9. Verify:
  10. lsattr -El hdiskpower3 | grep reserve_policy. It should now show "no_reserve"
  11. Assign a PVID:
  12. chdev -l hdiskpower3 -a pv=yes
  13. Verify with lspv
  14. As it's a shared disk between the cluster nodes, running this command on the other node should assign the same PVID.
  15. Add the disk to the cluster VG using smitty:
  16. smitty cl_admin ==> Storage ==> Volume Groups ==> Set Characteristics of a Volume Group ==> Add a Volume to a Volume Group ==> select the volume group to which you need to add the free disk ==> select the free disk by choosing its free PVID ==> [ENTER] to execute and Esc+0 to exit smitty
  17. lsvg <vgname> will show the free PPs and their size
  18. /usr/es/sbin/cluster/sbin/cl_chfs -a size=+5G /app (+5G indicates we are extending the file system with an additional 5 GB of space)
  19. df -gt /app
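
A consolidated sketch of the command-line portion, using the example disk hdiskpower3 and file system /app from the list above; the smitty step that adds the disk to the cluster VG is not shown:

# make sure the OS does not place a SCSI reserve on the shared disk
lsattr -El hdiskpower3 | grep reserve_policy
chdev -l hdiskpower3 -a reserve_policy=no_reserve        # change the active setting
chdev -l hdiskpower3 -a reserve_policy=no_reserve -P     # make it persistent across reboots
# assign a PVID and confirm the same PVID shows up on the other cluster node
chdev -l hdiskpower3 -a pv=yes
lspv | grep hdiskpower3
# after the disk has been added to the cluster VG via smitty, extend the file system cluster-wide
/usr/es/sbin/cluster/sbin/cl_chfs -a size=+5G /app
df -gt /app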

Some disk query commands in AIX

This query of a disk will give its vg information

#lqueryvg -Atp hdiskpower3

To see disk paths

#lspath

To see list of disks in a system

#lsdev -Cc disk  (you can replace disk with tape/adapter to see those devices)

To see vg information

#lsvg <vgname>

To see attributes of a disk

#lsattr -El hdiskpower3

Decommissioning a resource node (NodeManager) in a Hadoop cluster

Why do we need to decommission a resource node?

  1. For maintenance of the resource node host, like patching/hardware replacement, etc.
  2. To discard a resource node server when it completes its life cycle and is categorized as DNP (do not permit).
  3. To upgrade hardware from lower-configuration to higher-configuration servers.

How to decommission a resource node?

  1. The yarn-site.xml file has two relevant configurations: yarn.resourcemanager.nodes.include-path and yarn.resourcemanager.nodes.exclude-path.
  2. Add the name of the node to be decommissioned to <HADOOP_CONF_DIR>/yarn.exclude. If more than one node is to be decommissioned, list them separated by newlines. An example for <HADOOP_CONF_DIR> is /etc/hadoop/conf.
  3. If you have a <HADOOP_CONF_DIR>/yarn.include file in use for the YARN ResourceManager, then make sure the decommissioned nodes are removed from that list.
  4. su <YARN USER> (example: su - yarn); yarn rmadmin -refreshNodes (see the sketch after this list)
     Note: the user can vary for your environment; use the right user.
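
A minimal sketch of steps 2-4, assuming the hypothetical node name node05.example.com and /etc/hadoop/conf as <HADOOP_CONF_DIR>:

# add the host to the YARN exclude file (one hostname per line)
echo "node05.example.com" >> /etc/hadoop/conf/yarn.exclude
# tell the ResourceManager to re-read the include/exclude lists
su - yarn -c "yarn rmadmin -refreshNodes"
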
Reference:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Sys_Admin_Guides/content/ref-5981b2ae-bdc1-4eeb-8d01-fa2c088edf83.1.html

Decommissioning a DataNode in a Hadoop cluster

Why do we need to decommission a data node?

  1. For maintenance of the datanode host, like patching/hardware replacement, etc.
  2. To discard a datanode server when it completes its life cycle and is categorized as DNP (do not permit).

How to decommission a data node?

  1. Add the name of the node to be decommissioned to <HADOOP_CONF_DIR>/dfs.exclude. If more than one node is to be decommissioned, list them separated by newlines. An example for <HADOOP_CONF_DIR> is /etc/hadoop/conf.
  2. If you have a <HADOOP_CONF_DIR>/dfs.include file in use for HDFS, then make sure the decommissioned nodes are removed from that list.
  3. su <HDFS USER> (example: su - hdfs); hdfs dfsadmin -refreshNodes (see the sketch after this list)
     Note: the user can vary for your environment; use the right user.
  4. Monitor the decommissioning progress until the node changes its status to "Decommissioned". This state can be monitored from the DataNodes page of the NameNode UI (http://NameNode_FQDN:50070).
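
A minimal sketch of steps 1-3, assuming the hypothetical node name datanode05.example.com and /etc/hadoop/conf as <HADOOP_CONF_DIR>:

# add the host to the HDFS exclude file (one hostname per line)
echo "datanode05.example.com" >> /etc/hadoop/conf/dfs.exclude
# tell the NameNode to re-read the include/exclude lists and start decommissioning
su - hdfs -c "hdfs dfsadmin -refreshNodes"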

Reference:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Sys_Admin_Guides/content/ref-a179736c-eb7c-4dda-b3b4-6f3a778bd8c8.1.html