HDFS: Using DistCp with cloud storage

Using DistCp with Amazon S3

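S3 credentials can be provided in a configuration file (for example, core-site.xml), or passed on the command line with -D options as in the following command. Note that credentials passed on the command line can be visible in shell history and process listings.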


hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey hdfs://MyNameservice-id/user/hdfs/mydata s3a://myBucket/mydata_backup
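The configuration-file form mentioned above might look like the output of the following sketch. It prints the two property elements that would go inside the <configuration> element of core-site.xml. The helper names hadoop_property and s3a_properties are hypothetical (they are not part of Hadoop), and the key values are the same placeholders used in the command above.

```shell
#!/usr/bin/env bash
# Print one Hadoop <property> element for core-site.xml.
# hadoop_property is a hypothetical helper, not a Hadoop command.
# Values are written as-is; characters such as & or < would need XML escaping.
hadoop_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Emit the S3A credential properties; the defaults are placeholders only.
s3a_properties() {
  hadoop_property fs.s3a.access.key "${1:-myAccessKey}"
  hadoop_property fs.s3a.secret.key "${2:-mySecretKey}"
}

s3a_properties
```

With the properties in core-site.xml, the -Dfs.s3a.access.key and -Dfs.s3a.secret.key options can be left off the distcp command.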


Using DistCp with Microsoft Azure (WASB)

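Configure connectivity to Azure by setting the storage account key property, fs.azure.account.key.<sample_account>.blob.core.windows.net, in core-site.xml. Then copy the data with DistCp: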

hadoop distcp wasb://<sample_container>@<sample_account>.blob.core.windows.net/ hdfs://hdfs_destination_path
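As a rough illustration of the core-site.xml entry for the storage account key, the sketch below prints a property element of the expected shape. The function name azure_key_property is hypothetical, and both arguments are placeholders to be replaced with a real account name and key.

```shell
#!/usr/bin/env bash
# Print the WASB account-key property for core-site.xml.
# azure_key_property is a hypothetical helper, not a Hadoop command.
# $1 = storage account name, $2 = storage account access key.
azure_key_property() {
  printf '<property>\n  <name>fs.azure.account.key.%s.blob.core.windows.net</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Placeholders only; substitute real values before using the output.
azure_key_property '<sample_account>' '<storage_account_key>'
```

The printed element goes inside the <configuration> element of core-site.xml on the hosts that run the DistCp job.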

Fix under-replicated blocks in HDFS manually


Short Description:

Quick instructions for fixing under-replicated blocks in HDFS manually.


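To fix under-replicated blocks in HDFS, run the steps below as the HDFS user. They collect every file that fsck reports as under-replicated and then set its replication factor to 3: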

#### Fix under-replicated blocks ####

  1. su <$hdfs_user>
  2. bash-4.1$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
  3. bash-4.1$ for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 $hdfsfile; done
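
The steps above can also be sketched as a small script that copes with file names containing spaces and removes duplicate paths. This is only a sketch under stated assumptions: the function names under_replicated_paths and fix_replication and the DRY_RUN variable are hypothetical, the sample input lines are made up and simplified (they are not real fsck output), and, like the one-liner above, it assumes the file path is everything before the first colon, so paths containing a colon would be truncated.

```shell
#!/usr/bin/env bash
# Extract the path (text before the first ':') from lines that mention
# 'Under replicated'. sort -u drops repeats when one file has several
# under-replicated blocks.
under_replicated_paths() {
  grep 'Under replicated' | awk -F':' '{print $1}' | sort -u
}

# Read paths one per line so names containing spaces stay intact.
# $1 = target replication factor (default 3).
# With DRY_RUN=1 (the default in this sketch) commands are printed, not run.
fix_replication() {
  local target="${1:-3}"
  while IFS= read -r hdfsfile; do
    [ -n "$hdfsfile" ] || continue
    echo "Fixing $hdfsfile :"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "hadoop fs -setrep $target $hdfsfile"
    else
      # </dev/null keeps hadoop from consuming the loop's input.
      hadoop fs -setrep "$target" "$hdfsfile" </dev/null
    fi
  done
}

# Real use, as the HDFS user:
#   hdfs fsck / | under_replicated_paths | DRY_RUN=0 fix_replication 3
# Made-up sample lines, shaped the way the pipeline above assumes:
printf '%s\n' \
  '/user/hdfs/my data/part-0: Under replicated blk_1. Target Replicas is 3 but found 1 replica(s).' \
  '/user/hdfs/ok.txt: OK' \
  | under_replicated_paths | fix_replication 3
```

Running it first with the default DRY_RUN=1 shows which setrep commands would be issued before anything is changed on the cluster.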