hive locks

How Table Locking Works in Hive

Exclusive locks are not acquired when using dynamic partition inserts, because the target partitions are not known until runtime.
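Current locks can be inspected from Beeline with SHOW LOCKS (this assumes concurrency is enabled via hive.support.concurrency and a lock manager is configured; the table name below is illustrative):

```shell
# List all locks currently held in the default database.
beeline -u jdbc:hive2://localhost:10000 -e "SHOW LOCKS;"

# Locks on a specific table; EXTENDED adds details such as the query holding the lock.
beeline -u jdbc:hive2://localhost:10000 -e "SHOW LOCKS mytable EXTENDED;"
```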



beeline commands for Hive

How do you use Beeline commands to access Hive databases and tables?

beeline commands

To connect to HiveServer2 on the Hive server:

beeline -u jdbc:hive2://localhost:10000

To run a query from shell prompt:

beeline -u jdbc:hive2://localhost:10000 -e "show databases;"

Run silent mode to suppress messages and just get query output:

beeline -u jdbc:hive2://localhost:10000 --silent -e "show databases;"

Change output format from table to csv:

beeline -u jdbc:hive2://localhost:10000 --silent --outputformat=csv2 -e "show databases;"

Turn off the header too:

beeline -u jdbc:hive2://localhost:10000 --silent --outputformat=csv2 --showheader=false -e "show databases;"
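Beeline can also run statements from a script file with -f, which is convenient for scheduled jobs (the script path below is illustrative):

```shell
# Run every statement in a local HiveQL file; -f is like -e but reads from a file.
beeline -u jdbc:hive2://localhost:10000 --silent --outputformat=csv2 -f /tmp/queries.hql
```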

More to come; keep watching this space… 🙂

Reference Outputs:

[cloudera@quickstart Downloads]$ beeline -u jdbc:hive2://localhost:10000 -e "show databases;" --silent

scan complete in 7ms

Connecting to jdbc:hive2://localhost:10000

Connected to: Apache Hive (version 1.1.0-cdh5.13.0)

Driver: Hive JDBC (version 1.1.0-cdh5.13.0)


INFO  : Compiling command(queryId=hive_20190601201515_a226e5a1-40d4-408e-b591-9d89877f25cc): show databases

INFO  : Semantic Analysis Completed

INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)

INFO  : Completed compiling command(queryId=hive_20190601201515_a226e5a1-40d4-408e-b591-9d89877f25cc); Time taken: 0.184 seconds

INFO  : Concurrency mode is disabled, not creating a lock manager

INFO  : Executing command(queryId=hive_20190601201515_a226e5a1-40d4-408e-b591-9d89877f25cc): show databases

INFO  : Starting task [Stage-0:DDL] in serial mode

INFO  : Completed executing command(queryId=hive_20190601201515_a226e5a1-40d4-408e-b591-9d89877f25cc); Time taken: 0.084 seconds



+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+

1 row selected (0.851 seconds)

Beeline version 1.1.0-cdh5.13.0 by Apache Hive

Closing: 0: jdbc:hive2://localhost:10000

$ beeline -u jdbc:hive2://localhost:10000 --silent -e "show databases;"


+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+


[cloudera@quickstart Downloads]$ beeline -u jdbc:hive2://localhost:10000 --silent --outputformat=csv2 -e "show databases;"

database_name
default

[cloudera@quickstart Downloads]$ beeline -u jdbc:hive2://localhost:10000 --silent --outputformat=csv2 --showheader=false -e "show databases;"

default


hdfs: distcp to cloud storage

Using DistCp with Amazon S3

S3 credentials can be supplied either in a configuration file (for example, core-site.xml) or on the command line with -D options:


hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey hdfs://MyNameservice-id/user/hdfs/mydata s3a://myBucket/mydata_backup
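Alternatively, the same credentials can live in core-site.xml so they do not appear on the command line (the values below are placeholders):

```xml
<!-- core-site.xml: S3A credentials (placeholder values) -->
<property>
  <name>fs.s3a.access.key</name>
  <value>myAccessKey</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>mySecretKey</value>
</property>
```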


Using DistCp with Microsoft Azure (WASB)

Configure connectivity to Azure by setting the storage-account key property in core-site.xml, then run distcp against the wasb:// URI:

hadoop distcp wasb://<sample_container>@<sample_account> hdfs://hdfs_destination_path
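The property in question is the storage-account access key (account name and key below are placeholders; the exact property name should be verified against your Hadoop version's hadoop-azure documentation):

```xml
<!-- core-site.xml: Azure storage account key (placeholder values) -->
<property>
  <name>fs.azure.account.key.sample_account.blob.core.windows.net</name>
  <value>your-storage-account-access-key</value>
</property>
```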

atime ctime and mtime

atime: access time (updated when file content is read)
ctime: change time (updated on any change, including metadata such as permissions)
mtime: modification time (updated on file content changes only)


1. Create an empty file:
$ touch testfile

2. List its three timestamps: all three are the same.
$ stat --format='AT:%x MT:%y CT:%z' testfile
AT:2018-01-18 16:49:41.888538164 +0000
MT:2018-01-18 16:49:41.888538164 +0000
CT:2018-01-18 16:49:41.888538164 +0000

3. Touch the file again; this updates all three timestamps.
$ touch testfile
$ stat --format='AT:%x MT:%y CT:%z' testfile
AT:2018-01-18 16:51:17.911062055 +0000
MT:2018-01-18 16:51:17.911062055 +0000
CT:2018-01-18 16:51:17.911062055 +0000

4. Update the file content: mtime and ctime change, atime does not.
$ echo "sample" > testfile
$ stat --format='AT:%x MT:%y CT:%z' testfile
AT:2018-01-18 16:51:39.003957564 +0000 -> not updated
MT:2018-01-18 16:52:27.125719302 +0000 -> updated
CT:2018-01-18 16:52:27.125719302 +0000 -> updated

5. Change permissions (inode update, file content unchanged): only ctime changes.
$ chmod u+x testfile
$ stat --format='AT:%x MT:%y CT:%z' testfile
AT:2018-01-19 00:26:49.948206613 +0000
MT:2018-01-19 00:26:49.948206613 +0000
CT:2018-01-19 00:28:03.607859122 +0000 -> updated
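The steps above can be consolidated into one small script that verifies the behaviour automatically (a sketch assuming GNU stat on Linux; %Y and %Z print mtime and ctime as epoch seconds):

```shell
# Create a file, then check which timestamps change on a content write vs. a chmod.
f=$(mktemp)
m1=$(stat --format='%Y' "$f")   # mtime, epoch seconds
c1=$(stat --format='%Z' "$f")   # ctime, epoch seconds
sleep 1
echo "sample" > "$f"            # content change: mtime and ctime both move
m2=$(stat --format='%Y' "$f")
c2=$(stat --format='%Z' "$f")
sleep 1
chmod u+x "$f"                  # metadata-only change: ctime moves, mtime does not
m3=$(stat --format='%Y' "$f")
c3=$(stat --format='%Z' "$f")
[ "$m2" -gt "$m1" ] && [ "$c2" -gt "$c1" ] && echo "write updated mtime and ctime"
[ "$m3" -eq "$m2" ] && [ "$c3" -gt "$c2" ] && echo "chmod updated only ctime"
rm -f "$f"
```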

Fun with “hdfs dfs -stat”

hdfs dfs -stat "File %n is a %F, owned by user %u and group %g, with block size %o, replication %r, modified on %y" /tmp/testfile

File testfile is a regular file, owned by user raju and group admin, with block size 134217728, replication 3, modified on 2018-01-04 21:36:14

Here is what -help on stat says:

hdfs dfs -help stat
-stat [format] <path> ... :
Print statistics about the file/directory at <path>
in the specified format. Format accepts filesize in
blocks (%b), type (%F), group name of owner (%g),
name (%n), block size (%o), replication (%r), user name
of owner (%u), modification date (%y, %Y).
%y shows UTC date as "yyyy-MM-dd HH:mm:ss" and
%Y shows milliseconds since January 1, 1970 UTC.
If the format is not specified, %y is used by default.