HBase performance tuning

https://community.hortonworks.com/articles/184892/tuning-hbase-for-optimized-performance-part-1.html

https://community.hortonworks.com/articles/184957/tuning-hbase-for-optimized-performance-part-2.html

https://community.hortonworks.com/articles/185080/tuning-hbase-for-optimized-performance-part-3.html

https://community.hortonworks.com/articles/185082/tuning-hbase-for-optimized-performance-part-4.html

 


Fix Under-replicated blocks in HDFS manually

https://community.hortonworks.com/articles/4427/fix-under-replicated-blocks-in-hdfs-manually.html

Short Description:

Quick instructions to fix under-replicated blocks in HDFS manually.

Article

To fix under-replicated blocks in HDFS, use the quick steps below (a verification sketch follows the steps):

### Fix under-replicated blocks ###

  1. su <$hdfs_user>
  2. hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
  3. for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :"; hadoop fs -setrep 3 $hdfsfile; done
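
To confirm the fix took effect, a quick check like the sketch below can help. This is not part of the original article; the count only drops to zero once HDFS finishes re-replicating in the background, and the path in the second command is a hypothetical example:

$ hdfs fsck / | grep 'Under replicated' | wc -l      # should trend to 0 once re-replication completes
$ hadoop fs -stat %r /tmp/under_replicated_example   # hypothetical path: prints that file's current replication factor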

atime, ctime and mtime

atime: Access time (updated when the file content is read)
ctime: Change time (updated on any change, including metadata changes such as permissions)
mtime: Modification time (updated only when the file content changes)

Demonstration:

1. Create an empty file: $ touch testfile
2. List its three timestamps: all three are the same.
$ stat --format='AT:%x MT:%y CT:%z' testfile
AT:2018-01-18 16:49:41.888538164 +0000
MT:2018-01-18 16:49:41.888538164 +0000
CT:2018-01-18 16:49:41.888538164 +0000

3. Touch the file again ($ touch testfile): all three timestamps are updated.
$ stat --format='AT:%x MT:%y CT:%z' testfile
AT:2018-01-18 16:51:17.911062055 +0000
MT:2018-01-18 16:51:17.911062055 +0000
CT:2018-01-18 16:51:17.911062055 +0000

4. Update the file content:
$ echo "sample" > testfile
$ stat --format='AT:%x MT:%y CT:%z' testfile
AT:2018-01-18 16:51:39.003957564 +0000 -> not updated
MT:2018-01-18 16:52:27.125719302 +0000 -> updated
CT:2018-01-18 16:52:27.125719302 +0000 -> updated

5. Change permissions (inode change only, file content unchanged):
$ chmod u+x testfile
$ stat --format='AT:%x MT:%y CT:%z' testfile
AT:2018-01-19 00:26:49.948206613 +0000
MT:2018-01-19 00:26:49.948206613 +0000
CT:2018-01-19 00:28:03.607859122 +0000 -> updated
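
One more step that is not in the original demo but rounds it out: simply reading the file should bump only atime. Note that on filesystems mounted with noatime or relatime the atime update may be suppressed or deferred, so this assumes a default mount:

$ cat testfile
$ stat --format='AT:%x MT:%y CT:%z' testfile
(expect AT to move forward while MT and CT stay the same)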

What are containers and why do you need them?


Nicely articulated notes on containers. Highlights from this article are:

What are containers and why do you need them?

Containers are a solution to the problem of how to get software to run reliably when moved from one computing environment to another. This could be from a developer’s laptop to a test environment, from a staging environment into production, and perhaps from a physical machine in a data center to a virtual machine in a private or public cloud.

 

How do containers solve this problem?

Put simply, a container consists of an entire runtime environment: an application, plus all its dependencies, libraries and other binaries, and configuration files needed to run it, bundled into one package. By containerizing the application platform and its dependencies, differences in OS distributions and underlying infrastructure are abstracted away.
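
As a rough illustration of that "entire runtime environment in one package" idea, here is a minimal Docker sketch; the base image, file names and application are illustrative assumptions, not something from the article:

# Write a minimal Dockerfile (illustrative names)
cat > Dockerfile <<'EOF'
# Base OS layer plus language runtime
FROM python:3.11-slim
# Application dependencies baked into the image
COPY requirements.txt .
RUN pip install -r requirements.txt
# The application itself and its entry point
COPY app.py .
CMD ["python", "app.py"]
EOF

# Build once, then run the same package anywhere a compatible container runtime exists
docker build -t myapp:1.0 .
docker run --rm myapp:1.0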

 

Is there a standard container format?

An initiative called the Open Container Project was announced in 2015 and later renamed the Open Container Initiative (OCI). Run under the auspices of the Linux Foundation, the purpose of the OCI is to develop industry standards for a container format and container runtime software for all platforms. The starting point for the OCI standards was Docker technology, and Docker donated about 5 percent of its codebase to the project to get it off the ground.

The project’s sponsors include AWS, Google, IBM, HP, Microsoft, VMware, Red Hat, Oracle, and Twitter, as well as Docker and CoreOS.

Are there any free open source container management systems?

Yes. Probably the best known and most widely used free and open source container management system is Kubernetes, a software project that originated at Google. Kubernetes provides mechanisms for deploying, maintaining and scaling containerized applications.
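
As a minimal sketch of those three mechanisms with kubectl (the deployment name and image below are illustrative assumptions):

kubectl create deployment web --image=nginx   # deploy a containerized application
kubectl get deployments                       # observe and maintain its desired state
kubectl scale deployment web --replicas=3     # scale it out to three replicas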

Commercial container management solutions:

  • Docker Enterprise Edition
  • CoreOS’s Tectonic
  • Red Hat’s OpenShift Container Platform
  • Rancher Labs’ Rancher

Which Linux distributions are suitable for use as a container host?

  • Container Linux (formerly CoreOS Linux) — one of the first lightweight container operating systems built for containers
  • RancherOS — a simplified Linux distribution built from containers, specifically for running containers.
  • Photon OS — a minimal Linux container host, optimized to run on VMware platforms.
  • Project Atomic Host — Red Hat’s lightweight container OS has versions that are based on CentOS and Fedora, and there is also a downstream enterprise version in Red Hat Enterprise Linux.
  • Ubuntu Core — the smallest Ubuntu version, Ubuntu Core is designed as a host operating system for IoT devices and large-scale cloud container deployment

 

What if you are a Windows shop?

Windows users are not left out: in 2016 Microsoft introduced the ability to run Windows containers in Windows Server 2016 and Windows 10. These are Docker containers designed for Windows, and they can be managed from any Docker client or from Microsoft’s PowerShell.

(Microsoft also introduced Hyper-V containers, which are Windows containers running in a Hyper-V virtual machine for added isolation.)

Windows containers can be deployed on a standard install of Windows Server 2016, the streamlined Server Core install, or the Nano Server install option which is specifically designed for running applications inside containers or virtual machines.
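
For illustration, on a Windows host with Docker installed the commands look much the same as on Linux; the image name and tag below are assumptions and must match the host’s Windows build:

docker run --rm mcr.microsoft.com/windows/nanoserver:1809 cmd /c "echo Hello from a Windows container"
# Hyper-V containers (mentioned above) are selected with an isolation flag:
docker run --rm --isolation=hyperv mcr.microsoft.com/windows/nanoserver:1809 cmd /c "echo Hello"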

The article ends with a nice conclusion:

Both VMs and containers have their benefits, but what’s important is that rather than replacing virtual machines, it can often be useful to use containers within a virtualized infrastructure.

 

Fun with “hdfs dfs -stat”

hdfs dfs -stat "File %n is a %F, owned by %u and group %g, has block size %o, with replication %r, and was modified on %y" /tmp/testfile

File testfile is a regular file, owned by raju and group admin, has block size 134217728, with replication 3, and was modified on 2018-01-04 21:36:14

Here is what -help on stat says:

hdfs dfs -help stat
-stat [format] <path> … :
Print statistics about the file/directory at <path>
in the specified format. Format accepts filesize in
blocks (%b), type (%F), group name of owner (%g),
name (%n), block size (%o), replication (%r), user name
of owner (%u), modification date (%y, %Y).
%y shows UTC date as "yyyy-MM-dd HH:mm:ss" and
%Y shows milliseconds since January 1, 1970 UTC.
If the format is not specified, %y is used by default.
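
A couple more examples using the specifiers listed above (the paths here are hypothetical):

hdfs dfs -stat "%F" /tmp                     # prints "directory" for a directory path
hdfs dfs -stat "%n %o %r %Y" /tmp/testfile   # name, block size, replication, mtime as ms since epoch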

2016 Big Data Maturity Survey

2016 Big Data Maturity Survey PDF

Note: This survey was done by AtScale (an analytics platform for the Google Cloud).

 

How the survey was conducted: 2,550 people responded. AtScale partnered with Cloudera,
Hortonworks, MapR, Tableau, Trifacta and Cognizant to identify companies
that are working with Big Data or about to, and asked them how they got
value from it, what tools they are using, and the tactics they used to succeed.

Summary:

Big Data is growing fast: 97% will do as much or more with Big Data over the next 3 months.

Big Data Cloud is King: 72% of respondents plan on doing Big Data in the Cloud.

Governance is a growing concern: Governance is the fastest-growing area of concern year-over-year (21% YoY).

Business Intelligence is #1: 75% of respondents say they are planning on using BI on Big Data.