Hadoop: hdfs block size, replication

Block Size Parameters:

 hdfs dfs -D dfs.blocksize=1024 -put lfile /user/test
put: Specified block size is less than configured minimum value (dfs.namenode.fs-limits.min-block-size): 1024 < 1048576
$hdfs dfs -D dfs.blocksize=1048576 -put lfile /user/test

dfs.namenode.fs-limits.min-block-size 1048576 Minimum block size in bytes, enforced by the Namenode at create time. This prevents the accidental creation of files with tiny block sizes (and thus many blocks), which can degrade performance.
dfs.namenode.fs-limits.max-blocks-per-file 1048576 Maximum number of blocks per file, enforced by the Namenode on write. This prevents the creation of extremely large files which can degrade performance.
dfs.blocksize 134217728 The default block size for new files, in bytes. You can use the following suffix (case insensitive): k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.), Or provide complete size in bytes (such as 134217728 for 128 MB).
dfs.client.block.write.retries 3

Replication Parameters:

dfs.replication 3 Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
dfs.replication.max 512 Maximal block replication.
dfs.namenode.replication.min 1 Minimal block replication.

Author: rajukv

Hadoop(BigData) Architect and Hadoop Security Architect can design and build hadoop system to meet various data science projects.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s