The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL. Source: hive.apache.org
To install Apache Hive, you can follow the instructions in Hadoop Screencasts – Episode 4 – Installing Apache Hive.
This post is a fast-paced, instruction-based tutorial that dives directly into using Hive.
Creating a database
A database can be created using the CREATE DATABASE command at the hive prompt.
Syntax:
CREATE DATABASE <database_name>;
E.g.
hive> CREATE DATABASE test_hive_db;
OK
Time taken: 0.048 seconds
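By default, the CREATE DATABASE command creates a directory for the database in HDFS under the warehouse location /user/hive/warehouse, named <database_name>.db.
This can be verified using the DESCRIBE DATABASE command, which prints the database name and its HDFS location.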
Syntax:
DESCRIBE DATABASE <database_name>;
E.g.
hive> DESCRIBE DATABASE test_hive_db;
OK
test_hive_db hdfs://localhost:54310/user/hive/warehouse/test_hive_db.db
Time taken: 0.042 seconds, Fetched: 1 row(s)
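CREATE DATABASE also accepts optional clauses that make the DESCRIBE DATABASE output more informative. The following is a minimal sketch, assuming a reasonably recent Hive release. The database name test_hive_db2, the comment text and the property values are hypothetical and chosen only for illustration.

```sql
-- IF NOT EXISTS avoids an error when the database is already present.
CREATE DATABASE IF NOT EXISTS test_hive_db2
COMMENT 'Scratch database for this tutorial'
WITH DBPROPERTIES ('creator' = 'tutorial', 'purpose' = 'demo');

-- List all databases, then only those whose names start with "test".
SHOW DATABASES;
SHOW DATABASES LIKE 'test*';

-- EXTENDED additionally prints the DBPROPERTIES set above.
DESCRIBE DATABASE EXTENDED test_hive_db2;
```

SHOW DATABASES is a quick way to confirm that a database exists before describing or using it.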
Using a database
To switch to a database, use the USE command. Statements that follow run against that database until you switch again.
Syntax:
USE <database_name>;
E.g.
hive> USE test_hive_db;
OK
Time taken: 0.045 seconds
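USE prints only OK, so it can help to check which database the session is actually in. The statements below are a sketch under two assumptions: Hive 0.13 or later for the current_database() function, and the Hive CLI for the prompt setting.

```sql
-- Print the name of the database the session is currently using.
SELECT current_database();

-- Optionally show the current database in the Hive CLI prompt.
SET hive.cli.print.current.db=true;

-- Switch back to the built-in default database.
USE default;
```

With the prompt setting enabled, the CLI prompt includes the current database name, which makes it harder to run a statement against the wrong database.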
Dropping a database
To drop a database, use the DROP DATABASE command.
Syntax:
DROP DATABASE <database_name>;
E.g.
hive> DROP DATABASE test_hive_db;
OK
Time taken: 0.233 seconds
By default, DROP DATABASE fails if the database still contains tables. To drop a database that has tables within it, add the CASCADE keyword to the DROP DATABASE command.
Syntax:
DROP DATABASE <database_name> CASCADE;
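The difference between the default behavior and CASCADE can be seen in a short session. This is a sketch only: the database test_hive_db2 and the table sample_table are hypothetical names created just so there is something to drop, and the statements are meant to be entered one at a time at the hive prompt.

```sql
-- Hypothetical database and table, created only for this illustration.
CREATE DATABASE IF NOT EXISTS test_hive_db2;
CREATE TABLE IF NOT EXISTS test_hive_db2.sample_table (id INT);

-- RESTRICT is the default: this statement is expected to fail
-- because the database still contains a table.
DROP DATABASE IF EXISTS test_hive_db2 RESTRICT;

-- CASCADE drops the contained tables first, then the database itself.
DROP DATABASE IF EXISTS test_hive_db2 CASCADE;
```

IF EXISTS keeps the drop from raising an error when the database is already gone, which is convenient in cleanup scripts.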
In the next post, we will be creating tables with data and performing some basic queries on them.