Why do we need to decommission a Resource node?
- For maintenance of the Resource node host, such as patching or hardware replacement.
- To discard a Resource node server when it completes its life cycle and is categorized as DNP (do not permit).
- To upgrade hardware from lower-configuration to higher-configuration servers.
How to decommission a DataNode?
- On the NameNode host machine, edit the <HADOOP_CONF_DIR>/dfs.exclude file and add the list of DataNode hostnames (separated by newline characters), where <HADOOP_CONF_DIR> is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.
- Update the NameNode with the new set of excluded DataNodes. On the NameNode host machine, execute the following commands:
su <HDFS_USER>
hdfs dfsadmin -refreshNodes
where <HDFS_USER> is the user owning the HDFS services. For example, hdfs.
- Open the NameNode web UI (http://<NameNode_FQDN>:50070) and navigate to the DataNodes page. Check whether the state has changed to "Decommission In Progress" for the DataNodes being decommissioned.
- When all the DataNodes report their state as "Decommissioned" (on the DataNodes page, or on the Decommissioned Nodes page at http://<NameNode_FQDN>:8088/cluster/nodes/decommissioned), all of the blocks have been replicated. You can then shut down the decommissioned nodes.
- If your cluster utilizes a dfs.include file, remove the decommissioned nodes from the <HADOOP_CONF_DIR>/dfs.include file on the NameNode host machine, then execute the following commands:
su <HDFS_USER>
hdfs dfsadmin -refreshNodes
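The first two steps above can be sketched as a short shell snippet. The hostnames (dn1.example.com, dn2.example.com) are placeholders, and the refreshNodes step is shown as a comment because it must run on the NameNode host as the HDFS service user; for safety the sketch defaults to a scratch directory rather than the real /etc/hadoop/conf.

```shell
#!/bin/sh
# Sketch of the first two decommission steps. Hostnames and paths are
# placeholders; adjust them for your cluster.

# Append each hostname to the exclude file, one per line, skipping duplicates.
add_to_exclude() {
  excl=$1; shift            # $1 = exclude file, remaining args = hostnames
  for host in "$@"; do
    grep -qxF "$host" "$excl" 2>/dev/null || printf '%s\n' "$host" >> "$excl"
  done
}

# In a real cluster HADOOP_CONF_DIR is typically /etc/hadoop/conf; here we
# default to a scratch directory so the sketch can be run safely anywhere.
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$(mktemp -d)}

add_to_exclude "$HADOOP_CONF_DIR/dfs.exclude" dn1.example.com dn2.example.com

# Then, on the NameNode host, tell the NameNode to re-read the
# include/exclude files as the HDFS service user:
#   su - hdfs -c "hdfs dfsadmin -refreshNodes"
```

Re-running the snippet is idempotent: hosts already present in dfs.exclude are not appended a second time.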
Note: If no dfs.include file is specified, all DataNodes are considered to be included in the cluster (unless excluded in the dfs.exclude file). The dfs.hosts and dfs.hosts.exclude properties in hdfs-site.xml are used to specify the paths of the dfs.include and dfs.exclude files.
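As one illustration of the note above, the two properties might be set in hdfs-site.xml as follows (the file paths are example values assuming a HADOOP_CONF_DIR of /etc/hadoop/conf):

```xml
<!-- hdfs-site.xml: point the NameNode at the include/exclude files. -->
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/dfs.include</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```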
How to decommission a NodeManager?
- The yarn-site.xml file has two relevant configurations: yarn.resourcemanager.nodes.include-path and yarn.resourcemanager.nodes.exclude-path.
- Add the hostname of the node to decommission to <HADOOP_CONF_DIR>/yarn.exclude. If more than one node is to be decommissioned, list them separated by newlines. An example for <HADOOP_CONF_DIR> is /etc/hadoop/conf.
- If the ResourceManager uses a <HADOOP_CONF_DIR>/yarn.include file, make sure the decommissioned nodes are removed from that list.
- As the YARN service user (for example: su - yarn), execute: yarn rmadmin -refreshNodes
- Note: the user can vary for your environment; use the right user.
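The two yarn-site.xml properties from the first bullet might be set as follows (a sketch; the file paths are example values assuming a HADOOP_CONF_DIR of /etc/hadoop/conf):

```xml
<!-- yarn-site.xml: point the ResourceManager at the include/exclude files. -->
<property>
  <name>yarn.resourcemanager.nodes.include-path</name>
  <value>/etc/hadoop/conf/yarn.include</value>
</property>
<property>
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value>/etc/hadoop/conf/yarn.exclude</value>
</property>
```

After editing yarn.exclude, yarn rmadmin -refreshNodes makes the ResourceManager re-read these files, analogous to hdfs dfsadmin -refreshNodes for the NameNode.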