hadoop:Hive: HiveQL DDL

Create/Describe/ALTER/Drop database

  1. Create Database ‘test’ with db properties and location

hive> create database IF NOT EXISTS test
    > COMMENT “this is test database”
    > LOCATION ‘/tmp’
    > WITH DBPROPERTIES ( ‘createdBy’ = ‘RajuKV’, ‘createdOn’ = ‘2016-02-08’);
OK
Time taken: 2.277 seconds

2. To see databases

hive> show databases;
OK
default
test
twitterdb
xademo
Time taken: 5.082 seconds, Fetched: 4 row(s)

3. How to use a database ?

hive> use test;
OK
Time taken: 0.376 seconds

Note: There is no way to find out existing database, hence better to use this in every script to ensure you are using right database.

4. To see database properties like comment, location and privileges;

hive> describe database test;
OK
test    this is test database    hdfs://sandbox.hortonworks.com:8020/tmp    hive    USER  

Note: This one did not display DBProperties set while creating test database.

5. To get Database DB Properties use extended functionality in describe.

hive>describe database extended test;
OK
test    this is test database    hdfs://sandbox.hortonworks.com:8020/tmp    hive    USER    {createdOn=2016-02-08, createdBy=RajuKV}

6. Time to clean up, how to delete a database or how to drop a database ? (Do not test on real environments)

hive> drop database if exists test;
OK
Time taken: 0.737 seconds

7. How to alter/modify database properties ?

hive> ALTER DATABASE testdb
    > SET DBPROPERTIES ( ‘created’ = ‘RAJUKV’, ‘date’ = ‘2016-02-08’);
OK
Time taken: 0.564 seconds

Create/Describe/ALTER/Drop TABLES

  1. How to create Hive Manage table with CSV store format ?

hive> create table IF NOT EXISTS testdb.mytable ( Name string, ID int )
    > COMMENT ‘this is test table in testdb’
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ‘,’
    > STORED AS textfile;
OK
Time taken: 0.702 seconds

hive> use testdb;
OK
Time taken: 0.316 seconds
hive> show tables;
OK
mytable
Time taken: 0.188 seconds, Fetched: 1 row(s)

See table extended porperties (notice many default properties not specified while creating table)

hive> describe extended mytable;
OK
name                    string
id                      int

Detailed Table Information    Table(tableName:mytable, dbName:testdb, owner:hive, createTime:1454921870, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:id, type:int, comment:null)], location:hdfs://sandbox.hortonworks.com:8020/tmp/mytable, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1454921870, comment=this is test table in testdb}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.603 seconds, Fetched: 4 row(s)

As it’s empty table, folder created with table name under /tmp is empty

hive> dfs -ls /tmp/mytable/;

Load data into table

create/copy/download data file, here I created mytable.csv file with 4 rows delimited by ‘,’

hive> dfs -cat /tmp/mytable.csv;
Raju,90
Sateesh,92
Sreeni,99
Bairi,95
hive> load data inpath ‘/tmp/mytable.csv’
    > overwrite into table mytable;
Loading data to table testdb.mytable
Table testdb.mytable stats: [numFiles=1, numRows=0, totalSize=38, rawDataSize=0]
OK
Time taken: 1.165 seconds

Above command simply moved csv file from /tmp to /tmp/mytable folder in hdfs

hive> dfs -ls /tmp/mytable;
Found 1 items
-rwx-wx-wx   1 hive hadoop         38 2016-02-08 09:31 /tmp/mytable/mytable.csv

Run a query and see for the data

hive> select * from mytable;
OK
Raju    90
Sateesh    92
Sreeni    99
Bairi    95
Time taken: 0.231 seconds, Fetched: 4 row(s)

When we loaded with another csv file, it moved existing csv file and copied loaded file.

hive> load data inpath ‘/tmp/tsttable.csv’
> overwrite into table mytable;
Loading data to table testdb.mytable
Moved: ‘hdfs://sandbox.hortonworks.com:8020/tmp/mytable/mytable.csv’ to trash at: hdfs://sandbox.hortonworks.com:8020/user/hive/.Trash/Current
Table testdb.mytable stats: [numFiles=1, numRows=0, totalSize=38, rawDataSize=0]

 TO change delimter of the table;

hive> alter table mytable
    > set SERDEPROPERTIES (‘field.delim’ = ‘:’);
OK
Time taken: 0.456 seconds

What happend to existing csv data ? let’s check

hive> dfs -cat /tmp/mytable/tsttable.csv
> ;
Raju,80
Sateesh,82
Sreeni,89
Bairi,85

data file is still csv file. So Alter will modify only metadata not the data file. Can I access the data now as metadat delimiter modified to “:” but data file has “,” delimter ? let’s check

hive> select * from mytable;
OK
Raju,80    NULL
Sateesh,82    NULL
Sreeni,89    NULL
Bairi,85    NULL
Time taken: 0.304 seconds, Fetched: 4 row(s)

Data fetched is wrong as it’s looking for ‘colon’ delimiter in data file.

Hive has no intelligence to look for data integrity, it’s loading data without cross verification. When table metadata is defined with “:” delimiter, it accepted .csv file to load data.

hive> load data inpath ‘/tmp/mytable.csv’
> overwrite into table mytable;
Loading data to table testdb.mytable
Moved: ‘hdfs://sandbox.hortonworks.com:8020/tmp/mytable/tsttable.csv’ to trash at: hdfs://sandbox.hortonworks.com:8020/user/hive/.Trash/Current
Table testdb.mytable stats: [numFiles=1, numRows=0, totalSize=38, rawDataSize=0]
OK
Time taken: 1.536 seconds
hive> dfs -ls /tmp/mytable/
> ;
Found 1 items
-rwx-wx-wx   1 hive hadoop         38 2016-02-08 14:12 /tmp/mytable/mytable.csv

 

Advertisements

Author: rajukv

Hadoop(BigData) Architect and Hadoop Security Architect can design and build hadoop system to meet various data science projects.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s