hive locks

https://cwiki.apache.org/confluence/display/Hive/Locking

How Table Locking Works in Hive

https://www.ericlin.me/2015/05/how-table-locking-works-in-hive/

Exclusive locks are not acquired when using dynamic partitions

https://issues.apache.org/jira/browse/HIVE-3509
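
To see locks in practice, the lock manager must be enabled (hive.support.concurrency=true). A minimal sketch from beeline, assuming a hypothetical table named mytable (the table name is not from the links above):

beeline -u jdbc:hive2://localhost:10000 -e "SHOW LOCKS mytable;"

beeline -u jdbc:hive2://localhost:10000 -e "LOCK TABLE mytable EXCLUSIVE; SHOW LOCKS mytable; UNLOCK TABLE mytable;"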

 

 


beeline commands for Hive

How to use beeline commands to access Hive databases and tables?

beeline commands

To connect to HiveServer2 on the Hive server:

beeline -u jdbc:hive2://localhost:10000

To run a query from the shell prompt:

beeline -u jdbc:hive2://localhost:10000 -e "show databases;"

Run in silent mode to suppress status messages and get just the query output:

beeline -u jdbc:hive2://localhost:10000 --silent -e "show databases;"

Change the output format from table to CSV:

beeline -u jdbc:hive2://localhost:10000 --silent --outputformat=csv2 -e "show databases;"

Turn off the header too:

beeline -u jdbc:hive2://localhost:10000 --silent --outputformat=csv2 --showheader=false -e "show databases;"
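
Run a saved script file instead of an inline query with -f (the file name here is just an example):

beeline -u jdbc:hive2://localhost:10000 --silent -f myqueries.hql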

More to come; keep watching this space … 🙂

Reference Outputs:

[cloudera@quickstart Downloads]$ beeline -u jdbc:hive2://localhost:10000 -e "show databases;" --silent

(Note: with --silent placed after -e, the INFO messages below are not suppressed; compare with the next run, where --silent comes before -e.)

scan complete in 7ms

Connecting to jdbc:hive2://localhost:10000

Connected to: Apache Hive (version 1.1.0-cdh5.13.0)

Driver: Hive JDBC (version 1.1.0-cdh5.13.0)

Transaction isolation: TRANSACTION_REPEATABLE_READ

INFO  : Compiling command(queryId=hive_20190601201515_a226e5a1-40d4-408e-b591-9d89877f25cc): show databases

INFO  : Semantic Analysis Completed

INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)

INFO  : Completed compiling command(queryId=hive_20190601201515_a226e5a1-40d4-408e-b591-9d89877f25cc); Time taken: 0.184 seconds

INFO  : Concurrency mode is disabled, not creating a lock manager

INFO  : Executing command(queryId=hive_20190601201515_a226e5a1-40d4-408e-b591-9d89877f25cc): show databases

INFO  : Starting task [Stage-0:DDL] in serial mode

INFO  : Completed executing command(queryId=hive_20190601201515_a226e5a1-40d4-408e-b591-9d89877f25cc); Time taken: 0.084 seconds

INFO  : OK

+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+

1 row selected (0.851 seconds)

Beeline version 1.1.0-cdh5.13.0 by Apache Hive

Closing: 0: jdbc:hive2://localhost:10000

$ beeline -u jdbc:hive2://localhost:10000 --silent -e "show databases;"

+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+

[cloudera@quickstart Downloads]$ beeline -u jdbc:hive2://localhost:10000 --silent --outputformat=csv2 -e "show databases;"

database_name

default

[cloudera@quickstart Downloads]$ beeline -u jdbc:hive2://localhost:10000 --silent --outputformat=csv2 --showheader=false -e "show databases;"

default

 

 

hdfs: distcp to cloud storage

Using DistCp with Amazon S3

S3 credentials can be provided in a configuration file (for example, core-site.xml):

<property>
    <name>fs.s3a.access.key</name>
    <value>...</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>...</value>
</property>

hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey hdfs://MyNameservice-id/user/hdfs/mydata s3a://myBucket/mydata_backup
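
For repeat backups, distcp's -update flag copies only files that are missing or differ on the target instead of recopying everything (the paths here reuse the example above):

hadoop distcp -update -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey hdfs://MyNameservice-id/user/hdfs/mydata s3a://myBucket/mydata_backup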

 

Using DistCp with Microsoft Azure (WASB)

Configure connectivity to Azure by setting the following property in core-site.xml.

<property>
  <name>fs.azure.account.key.youraccount.blob.core.windows.net</name>
  <value>your_access_key</value>
</property>

hadoop distcp wasb://<sample_container>@<sample_account>.blob.core.windows.net/ hdfs://hdfs_destination_path

hbase performance tuning

https://community.hortonworks.com/articles/184892/tuning-hbase-for-optimized-performance-part-1.html

https://community.hortonworks.com/articles/184957/tuning-hbase-for-optimized-performance-part-2.html

https://community.hortonworks.com/articles/185080/tuning-hbase-for-optimized-performance-part-3.html

https://community.hortonworks.com/articles/185082/tuning-hbase-for-optimized-performance-part-4.html

 

atime, ctime, and mtime

atime: access time (updated when the file is read)
ctime: change time (updated on any inode change, including permission changes)
mtime: modification time (updated only when file content changes)

Demonstration:

1. Create an empty file: $ touch testfile
2. List its three timestamps: all three are the same.
$ stat --format='AT:%x MT:%y CT:%z' testfile
AT:2018-01-18 16:49:41.888538164 +0000
MT:2018-01-18 16:49:41.888538164 +0000
CT:2018-01-18 16:49:41.888538164 +0000

3. Touch the file again ($ touch testfile); this updates all three timestamps.
$ stat --format='AT:%x MT:%y CT:%z' testfile
AT:2018-01-18 16:51:17.911062055 +0000
MT:2018-01-18 16:51:17.911062055 +0000
CT:2018-01-18 16:51:17.911062055 +0000

4. Update the file content:
$ echo "sample" > testfile
$ stat --format='AT:%x MT:%y CT:%z' testfile
AT:2018-01-18 16:51:39.003957564 +0000 -> not updated
MT:2018-01-18 16:52:27.125719302 +0000 -> updated
CT:2018-01-18 16:52:27.125719302 +0000 -> updated

5. Change permissions (an inode update; file content unchanged):
$ chmod u+x testfile
$ stat --format='AT:%x MT:%y CT:%z' testfile
AT:2018-01-19 00:26:49.948206613 +0000
MT:2018-01-19 00:26:49.948206613 +0000
CT:2018-01-19 00:28:03.607859122 +0000 -> updated
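
These are the same timestamps that find filters on. For example, to list files whose content changed within the last 24 hours:

$ find . -mtime -1

and files last accessed within the last 24 hours:

$ find . -atime -1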

Fun with "hdfs dfs -stat"

hdfs dfs -stat "File %n is a %F, owned by %u and group %g, with block size %o, replication %r, modified on %y" /tmp/testfile

File testfile is a regular file, owned by raju and group admin, with block size 134217728, replication 3, modified on 2018-01-04 21:36:14

Here is what -help on stat says:

hdfs dfs -help stat
-stat [format] <path> … :
Print statistics about the file/directory at <path>
in the specified format. Format accepts filesize in
blocks (%b), type (%F), group name of owner (%g),
name (%n), block size (%o), replication (%r), user name
of owner (%u), modification date (%y, %Y).
%y shows UTC date as "yyyy-MM-dd HH:mm:ss" and
%Y shows milliseconds since January 1, 1970 UTC.
If the format is not specified, %y is used by default.
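
Since %Y prints milliseconds since the epoch, it is handy for scripted file-age checks; a minimal sketch (the path is just an example):

hdfs dfs -stat "%n %Y" /tmp/testfile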