1. What are the different modes of Hive?
Hive can operate in two modes, depending on the number of data nodes in the Hadoop cluster and the amount of data to process.
These modes are:
- Local mode
- MapReduce mode
2. Why is Hive not suitable for OLTP systems?
Hive is not suitable for OLTP systems because it is built for high-latency batch queries over large data sets and does not provide efficient row-level insert and update operations.
3. What is the difference between HBase and Hive?
The differences between HBase and Hive are:
- Hive supports most SQL-style queries, whereas HBase does not support SQL queries
- Hive does not support record-level insert, update, and delete operations on a table, whereas HBase does
- Hive is a data warehouse framework, whereas HBase is a NoSQL database
- Hive runs on top of MapReduce, whereas HBase runs on top of HDFS
4. What is a Hive variable, and what is it used for?
A Hive variable is created in the Hive environment and can be referenced by Hive scripts. It is used to pass values to Hive queries when the query starts executing.
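As a rough illustration of how a variable reference is resolved before a query runs, the sketch below imitates the `${hivevar:name}` substitution syntax in Python. In Hive itself the values would be supplied with `SET hivevar:name=value;` or the `--hivevar` command-line option. The function `substitute_hivevars`, the sample query, and the decision to leave unknown names untouched are assumptions made for this example, not Hive's actual implementation.

```python
import re

# Matches references of the form ${hivevar:name}.
_HIVEVAR = re.compile(r"\$\{hivevar:([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_hivevars(query, variables):
    """Replace each ${hivevar:name} with its value.

    Names missing from `variables` are left as-is (a choice made for this sketch).
    """
    def replace(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    return _HIVEVAR.sub(replace, query)


# Hypothetical query and variable values, for illustration only.
query = "SELECT * FROM sales WHERE region = '${hivevar:region}' AND year = ${hivevar:year}"
resolved = substitute_hivevars(query, {"region": "EU", "year": "2020"})
print(resolved)
```

The same script can then be reused with different values without editing the query text.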
5. What is the ObjectInspector functionality in Hive?
ObjectInspector is used to analyze the internal structure of columns, rows, and complex objects. It gives access to the internal fields inside those objects.
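The idea of reading fields through an inspector, rather than through the object itself, can be sketched with a small Python analogy. The class `StructInspector` and its method names are hypothetical and do not mirror Hive's Java API. The sketch only shows how one inspector can read the same logical row from two different in-memory representations.

```python
class StructInspector:
    """Describes a row layout and knows how to read fields from a row object."""

    def __init__(self, field_names):
        self.field_names = list(field_names)

    def get_all_field_names(self):
        # Expose the structure of the row without touching any data.
        return list(self.field_names)

    def get_field_data(self, row, field_name):
        # Rows stored as dicts are read by name; rows stored as
        # tuples or lists are read by position.
        if isinstance(row, dict):
            return row[field_name]
        return row[self.field_names.index(field_name)]


inspector = StructInspector(["id", "name", "tags"])

row_as_tuple = (7, "alice", ["a", "b"])
row_as_dict = {"id": 7, "name": "alice", "tags": ["a", "b"]}

for row in (row_as_tuple, row_as_dict):
    print([inspector.get_field_data(row, f) for f in inspector.get_all_field_names()])
```

Code that consumes rows only talks to the inspector, so it does not need to know how a row is laid out in memory.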
6. What is HiveServer2 (HS2)?
HiveServer2 is a server interface that performs the following functions:
- It allows remote clients to execute queries against Hive
- It retrieves the results of those queries
Its latest version, based on Thrift RPC, adds some advanced features, including:
- Multi-client concurrency
7. What does the Hive query processor do?
The Hive query processor converts a query into a graph of MapReduce jobs and hands that graph to the execution-time framework, so that the jobs can be executed in the order of their dependencies.
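Executing jobs "in the order of dependencies" amounts to a topological ordering of the job graph. The sketch below shows this with Python's standard `graphlib` module. The job names and the shape of the graph are made up for illustration and do not come from a real Hive plan.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each key is a job, each value is the set of
# jobs that must finish before it can start.
job_dependencies = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
    "write_output": {"aggregate"},
}


def execution_order(dependencies):
    """Return the jobs in an order where every job follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(job_dependencies)
print(order)
```

The two scan jobs have no dependency on each other, so their relative order is not fixed; a real engine could run them in parallel.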
8. What are the components of a Hive query processor?
The components of a Hive query processor include:
- Logical Plan Generation
- Physical Plan Generation
- Execution Engine
- UDFs and UDAFs
- Semantic Analyzer
- Type Checking
9. What are partitions in Hive?
Hive organizes tables into partitions.
- Partitioning is a way of dividing a table into parts based on the values of its partition keys.
- Partitioning is helpful when the table has one or more partition keys.
- Partition keys determine how the data is stored in the table.
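To make the role of partition keys concrete, the sketch below groups rows into `key=value` directories, which is the naming pattern Hive uses for partition directories. The table path, the column names, and the helper functions are assumptions for this example.

```python
from collections import defaultdict


def partition_path(table_dir, partition_keys, row):
    """Build a key=value directory path for a row, one level per partition key."""
    parts = [f"{key}={row[key]}" for key in partition_keys]
    return "/".join([table_dir] + parts)


def group_by_partition(table_dir, partition_keys, rows):
    """Map each partition directory to the rows that would be stored in it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[partition_path(table_dir, partition_keys, row)].append(row)
    return dict(grouped)


# Hypothetical rows of a sales table partitioned by country and year.
rows = [
    {"id": 1, "country": "US", "year": 2020},
    {"id": 2, "country": "US", "year": 2021},
    {"id": 3, "country": "IN", "year": 2020},
    {"id": 4, "country": "US", "year": 2020},
]

layout = group_by_partition("/warehouse/sales", ["country", "year"], rows)
for path, members in sorted(layout.items()):
    print(path, [r["id"] for r in members])
```

A query that filters on `country` and `year` only needs to read the matching directory instead of scanning the whole table.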
10. When should you choose an "Internal Table" and when an "External Table" in Hive?
Choose an internal table:
- If the data to be processed is available in the local file system
- If you want Hive to manage the complete lifecycle of the data, including its deletion
Choose an external table:
- If the data to be processed is already available in HDFS
- If the files are also used outside of Hive
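The lifecycle difference shows up most clearly when a table is dropped. The toy model below is a sketch under simplifying assumptions: `ToyWarehouse` and its methods are invented for illustration, and it keeps metadata and data files in two dictionaries. It shows that dropping an internal table removes its data, while dropping an external table removes only the metadata.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    location: str
    external: bool


class ToyWarehouse:
    """Toy stand-in for Hive: a metadata store plus a file store."""

    def __init__(self):
        self.metastore = {}  # table name -> Table metadata
        self.files = {}      # location -> list of data file names

    def create_table(self, name, location, external, data=None):
        self.metastore[name] = Table(name, location, external)
        # Keep data that is already at the location (typical for external tables).
        self.files.setdefault(location, list(data or []))

    def drop_table(self, name):
        table = self.metastore.pop(name)
        if not table.external:
            # Internal table: the data is deleted along with the metadata.
            self.files.pop(table.location, None)


wh = ToyWarehouse()
wh.create_table("staging", "/warehouse/staging", external=False, data=["part-0"])
wh.create_table("logs", "/data/shared/logs", external=True, data=["log-0", "log-1"])

wh.drop_table("staging")
wh.drop_table("logs")
print(sorted(wh.files))
```

After both drops, only the external table's location is left, which is why external tables suit data that other tools still need.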
11. What is Hive?
Hive is an ETL and data warehousing tool built on top of the Hadoop Distributed File System (HDFS). It provides a framework for querying and analyzing data stored in HDFS. Hive is open-source software that lets programmers analyze large data sets on Hadoop.
12. When should you use Hive?
- When building data warehouse applications
- When dealing with static data rather than dynamic data
- When the application can tolerate high latency (high response time)
- When a large data set is maintained
- When you prefer writing queries to writing scripts
13. What are the different modes of Hive?
Depending on the number of data nodes in the Hadoop cluster and the amount of data to process, Hive can operate in two modes.
These modes are:
- Local mode
- MapReduce mode
14. When should you use MapReduce mode?
MapReduce mode is used when:
- The query runs on a large amount of data and needs to execute in parallel
- Hadoop has multiple data nodes and the data is distributed across them
- Large data sets must be processed with better performance
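The choice between the two modes can be thought of as a simple size check. The function below is an illustrative sketch, not Hive's logic: `choose_mode` and its threshold values are assumptions picked for the example. Hive has an automatic local-mode setting, `hive.exec.mode.local.auto`, with its own configurable limits.

```python
def choose_mode(input_bytes, input_files,
                max_local_bytes=128 * 1024 * 1024, max_local_files=4):
    """Pick an execution mode from the input size.

    The default thresholds are assumptions for this sketch; small inputs
    run locally, anything larger goes to MapReduce.
    """
    if input_bytes <= max_local_bytes and input_files <= max_local_files:
        return "local"
    return "mapreduce"


small_job = choose_mode(input_bytes=10 * 1024 * 1024, input_files=2)
large_job = choose_mode(input_bytes=50 * 1024 ** 3, input_files=400)
print(small_job, large_job)
```

Small inputs avoid the overhead of launching distributed jobs, while large inputs benefit from parallel execution across data nodes.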
15. What are the key components of the Hive architecture?
Key components of the Hive architecture include:
- User Interface
- Execution Engine
16. What are the different types of tables available in Hive?
There are two types of tables available in Hive.
- Managed table: both the data and the schema are under the control of Hive
- External table: only the schema is under the control of Hive
17. What is the Metastore in Hive?
The Metastore is the central repository in Hive. It stores schema information and other metadata in an external database.
18. What is Hive composed of?
Hive consists of three main parts:
- Hive Clients
- Hive Services
- Hive Storage and Computing
19. Which types of databases does Hive support for metadata storage?
For single-user metadata storage, Hive uses a Derby database; for multi-user or shared metadata, Hive uses MySQL.
20. What are Hive's default read and write classes?
Hive's default read and write classes are