Getting Started with Big Data Query using Apache Impala
© 2021 Agus Kurniawan
PE Press
ISBN: 978-1-716-10839-6
Preface
This book provides an alternative approach to getting started with big data queries using Apache Impala. It describes how to work with Apache Impala and how to run queries in it.
Agus Kurniawan
Depok, February 2021
Table of Contents
Getting Started with Big Data Query using Apache Impala
Preface
1. Introduction to Apache Impala
1.1 Introduction
1.2 Installing Apache Impala
1.3 Setting up Lab Demo
2. Working with Apache Impala Shell
2.1 Introduction
2.2 Connecting to Apache Impala Service
2.3 Performing SQL Query with Apache Impala Service
2.4 Executing SQL Query on Apache Impala Shell in Non-Interactive Mode
2.5 Executing A SQL Query File with Apache Impala Shell
2.6 Quit from Apache Impala Shell
3. SQL Querying with Apache Hue and Apache Impala
3.1 Setting up Apache Hue
3.2 Connecting Apache Hue to Apache Impala
3.3 Performing SQL Query for Apache Impala
3.4 Working Apache Hue with GetHue Demo Website
4. Loading Dataset to Apache Impala
4.1 Introduction
4.2 Creating Table for Delimited Files
4.3 Testing Query
5. Basic SQL Query for Apache Impala
5.1 Introduction
5.2 Creating and Deleting Databases
5.3 Creating and Deleting Tables
5.4 Inserting and Selecting Data
5.5 Updating and Deleting Data
5.6 Truncating Table Data
5.7 Filtering Data
5.8 Calling Built-in Functions
5.9 Distinct
5.10 Ordering Data
5.11 Grouping
5.12 Having
5.13 Limit and Offset
5.14 Creating and Selecting Views
6. Joining Query and Subquery on Apache Impala
6.1 Introduction
6.2 Joining Query
6.2.1 Inner Join
6.2.2 Left Join
6.2.3 Right Join
6.2.4 Outer Join
6.3 Subquery
6.4 Union and Union All
6.5 With
7. Partition Data on Apache Impala
7.1 Introduction
7.2 Creating Partition Table
7.3 Exploring Partition Table Files on HDFS
8. Apache Impala Database Programming with Java
8.1 Introduction
8.2 Creating A Project
8.3 Connecting to Apache Impala
8.4 Getting All Data
8.5 Inserting Data
8.6 Completed Program
Source Code
Contact
1. Introduction to Apache Impala
1.1 Introduction
Apache Impala is a modern, open source, distributed SQL query engine for Apache Hadoop. With Impala, we can query data stored in HDFS, Apache Hive, or Apache HBase using SELECT, JOIN, and aggregate functions. You can find more information on the official project website. In this book, we learn how to perform queries on Apache Impala.
1.2 Installing Apache Impala
In this section, I use Cloudera Manager to install Apache Impala. You can also install Apache Impala on Linux manually. You can see my Cloudera Manager in the figure below.
To add a Hadoop service using Cloudera Manager, click Add Service on the context menu, as shown in the figure below.
After clicking it, you can install Apache Impala. Make sure you also install HDFS, HBase, and Hue.
Once it is installed, we can start working with Apache Impala.
1.3 Setting up Lab Demo
You can set up Apache Impala with Cloudera Manager or on your own Linux machine. For the demos, I use Apache Impala in a Cloudera environment deployed on Ubuntu Linux.
2. Working with Apache Impala Shell
2.1 Introduction
Apache Impala provides a service and a shell. In this chapter, we learn how to work with the Apache Impala shell. To show the Impala shell version, type this command.
$ impala-shell --version
You will see the Impala shell version in your terminal. My Impala shell version is shown in the figure below.
Next, we will work with Impala shell.
2.2 Connecting to Apache Impala Service
To start the Impala shell, open a terminal on your Apache Impala server. Then, type this command.
$ impala-shell
This connects the shell to your local Impala service. After that, you will see the Impala shell, as shown in the figure below.
If your Impala service is off, you will get an error message, as shown in the figure below.
How do we connect with the Impala shell from a remote machine? First, make sure the Impala server machine has opened the Impala port, which is usually port 21000. Then, start the Impala shell with the -i parameter followed by the IP address of the Impala server, as below.
$ impala-shell -i
My remote Impala server, accessed from another Apache Impala machine, is shown in the figure below.
Once connected, we can work in the Impala shell. To show all commands available in the Impala shell, press TAB on your keyboard. You should see a list of Impala commands, as shown in the figure below.
Next, we will learn how to perform queries on Apache Impala shell.
2.3 Performing SQL Query with Apache Impala Service
We can show all databases in Apache Impala using this command.
show databases;
Type this command in the Impala shell. You will see a list of Impala databases, as shown in the figure below.
We can also print all tables in an Impala database. First, we switch to the database, and then type the show tables command.
use ak_testdb;
show tables;
Here is my output. You can see a list of tables in the ak_testdb Impala database.
We can list all data from a table using a SELECT ... FROM statement. For instance, to print all data from the employees table, type this SQL statement.
select * from employees;
You should see the rows of the employees table in the Impala shell, as shown in the figure below.
You can run standard SQL queries in the Impala shell.
2.4 Executing SQL Query on Apache Impala Shell in Non-Interactive Mode
Previously, we ran SQL queries after entering the Impala shell. We can also run a SQL query without entering the shell by passing the query to the -q parameter. The -d parameter sets the Impala database. For instance, to run the SQL query "show tables" on the ak_testdb database, type this command in the Linux shell.
$ impala-shell -i localhost -d ak_testdb -q "show tables"
If it succeeds, you should see a list of Impala tables in the Linux terminal, as shown in the figure below.
As another example, we can execute the SQL query "select * from employees" by passing it to the -q parameter. Type this command, changing the -d parameter value to your database name.
$ impala-shell -i localhost -d ak_testdb -q "select * from employees"
You should see the rows of the employees table in the Linux terminal. My output is shown in the figure below.
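When such a non-interactive call has to be launched from a program, the command can be assembled as an argument list. The following is a minimal Java sketch and not part of the original demo: the class and method names are hypothetical, and it uses only the -i, -d and -q parameters shown above. Actually running the command is left commented out because it requires impala-shell to be installed.

```java
import java.util.ArrayList;
import java.util.List;

class ImpalaShellCommand {
    // Assembles: impala-shell -i <host> -d <database> -q <query>
    static List<String> build(String host, String database, String query) {
        List<String> command = new ArrayList<>();
        command.add("impala-shell");
        command.add("-i");
        command.add(host);
        command.add("-d");
        command.add(database);
        command.add("-q");
        command.add(query);
        return command;
    }

    public static void main(String[] args) {
        List<String> command = build("localhost", "ak_testdb", "show tables");
        // Each element is passed as one argument, so the query needs no shell quoting.
        System.out.println(command);
        // To actually run it (assumes impala-shell is on the PATH):
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Keeping the query as a single list element avoids quoting problems when the query text contains spaces.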
2.5 Executing A SQL Query File with Apache Impala Shell
If you have SQL queries in a file, you can run them with the Impala shell. For instance, we have the following queries.
use ak_testdb;
select * from employees;
In this case, I use the nano editor to write the SQL queries to a file, as shown in the figure below.
Save these queries into a file called demo.sql. Then, we can run this SQL file by passing it to the -f parameter. Make sure you set the Impala database with the -d parameter.
$ impala-shell -i localhost -d ak_testdb -f demo.sql
You should see the results of the queries from the file. My output from the demo.sql query file is shown in the figure below.
2.6 Quit from Apache Impala Shell
We can quit the Impala shell by typing exit;. My output in the Impala shell is shown in the figure below.
exit;
3. SQL Querying with Apache Hue and Apache Impala
3.1 Setting up Apache Hue
Apache Hue is a web tool that can be used to perform queries on Apache Impala. It plays a role similar to MySQL Workbench for MySQL or SQL Server Management Studio for SQL Server. We can use it to write queries for Apache Impala easily. Because it is a web application, we only need a browser to access it. If you have the Cloudera platform, you can install Apache Hue using Cloudera Manager. Add a new service in your existing Cloudera Manager, click Hue, and then install it, as shown in the figure below.
After the installation completes, you can open Apache Hue. The first time you open it, you will be asked to enter a username and password for the admin account. You can also add additional users who will access Apache Hue.
3.2 Connecting Apache Hue to Apache Impala
After installation, open Apache Hue. You will see the SQL editor, as shown in the figure below.
Apache Hue supports editors for various SQL database engines. Click the Query button to see a list of supported SQL editors. Select the Impala option to work with the Impala SQL editor, as shown in the figure below.
You will then see the SQL editor for Impala, as shown in the figure below.
Now you can write SQL queries in this editor.
3.3 Performing SQL Query for Apache Impala
We can write any SQL query on Apache Hue for Impala. For instance, we write the following query.
show databases;
Then, click the blue arrow button (Run button) to execute your query. You can also run specific query lines by highlighting them in the editor and then clicking the blue arrow button. You will see the output of your query, as shown in the figure below.
You can change the database in your editor. Click the Impala database list and select the database that you want to work with.
After selecting it, you will see your database in the Query editor, as shown in the figure below.
All queries written in the Query editor are executed on your selected database unless you select a database explicitly with a "use your_database;" statement.
For testing, write the following query to show all tables in the current database.
show tables;
Run this query to see the output, as shown in the figure below.
You can write any SQL query in the Query editor. For instance, we run a query to retrieve all data from the employees table.
select * from employees;
Query output:
3.4 Working Apache Hue with GetHue Demo Website
You may not have an Impala server and Apache Hue on your computer but still want to learn Apache Impala and Hue. In that case, you can use the Apache Hue demo on the GetHue website, shown in the figure below.
Click the TRY HUE NOW button. You will then see a login form, as shown in the figure below.
Type demo for both the username and the password, and click the Sign In button. After that, you will see the Hue editor. To work with Impala, change the editor by selecting the Table (Impala) option.
Now you can write SQL queries in this editor. This website is useful when you don't want to install Apache Impala and Hue.
4. Loading Dataset to Apache Impala
4.1 Introduction
We can create a table by loading dataset files. In this chapter, we create a table that maps to a dataset file. For the demo, we use an expense dataset, expense.csv. This file has the following headers:
Transaction ID, Age, Items, Monthly Income, Transaction Time, Record, Gender, City Tier, Total Spend
In this demo, we remove the header line from the expense.csv file so that the file contains only data.
4.2 Creating Table for Delimited Files
To implement this demo, we first copy the expense.csv file to HDFS. In this example, the target is a folder under /user/datasci/, which is my user folder on HDFS; change it to match your own HDFS account. We create the datasets/demo/exp/ folder if it does not exist yet. To copy a file from the local server, we use the -copyFromLocal parameter of the hdfs command. My bash commands are shown below.
$ hdfs dfs -mkdir -p /user/datasci/datasets/demo/exp/
$ hdfs dfs -chmod -R 777 /user/datasci/datasets/demo/exp/
$ hdfs dfs -copyFromLocal ./expense.csv /user/datasci/datasets/demo/exp/
Now we can create a table that points to the HDFS folder containing our CSV files. Create the table with the following SQL query, replacing /user/datasci/datasets/demo/exp/ with your own HDFS folder.
drop table if exists expense;
create table expense(
id string,
age int,
items int,
Monthly_Income int,
Transaction_Time string,
Record int,
Gender string,
City_Tier string,
Total_Spend double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/datasci/datasets/demo/exp/';
You can type these SQL queries in Apache Hue, as shown in the figure below.
After the queries run, our Impala table is mapped to the HDFS folder.
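To make the mapping between a delimited line and the table columns concrete, here is a small Java sketch. It is only an illustration of the idea, not how Impala reads files internally: the class names and the sample line are hypothetical, and the field order is assumed to follow the create table statement above.

```java
class ExpenseRowParser {
    // Fields follow the column order of the expense table definition.
    static final class ExpenseRow {
        final String id;
        final int age;
        final int items;
        final int monthlyIncome;
        final String transactionTime;
        final int record;
        final String gender;
        final String cityTier;
        final double totalSpend;

        ExpenseRow(String[] f) {
            id = f[0];
            age = Integer.parseInt(f[1]);
            items = Integer.parseInt(f[2]);
            monthlyIncome = Integer.parseInt(f[3]);
            transactionTime = f[4];
            record = Integer.parseInt(f[5]);
            gender = f[6];
            cityTier = f[7];
            totalSpend = Double.parseDouble(f[8]);
        }
    }

    // Splits one header-less line on ',' (the FIELDS TERMINATED BY character).
    static ExpenseRow parse(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 9) {
            throw new IllegalArgumentException("expected 9 fields, got " + fields.length);
        }
        return new ExpenseRow(fields);
    }

    public static void main(String[] args) {
        // Hypothetical sample line; real values in expense.csv will differ.
        ExpenseRow row = parse("TXN001,42,10,7313,22.5,5,Female,Tier 1,4198.38");
        System.out.println(row.id + " | " + row.cityTier + " | " + row.totalSpend);
    }
}
```

The sketch also shows why the header line is removed from the file: a header such as "Age" cannot be read as an int value.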
4.3 Testing Query
Now we can run a SQL query to retrieve data from the expense table. For instance, type this SQL query in Apache Hue.
select * from expense;
You should see the expense data in the output, as shown in the figure below.
5. Basic SQL Query for Apache Impala
5.1 Introduction
In this chapter, we learn how to build SQL queries on Apache Impala. These are basic SQL queries that let you process data with Apache Impala. Let's start.
5.2 Creating and Deleting Databases
First, we can create a database using the create database statement. For instance, type the following SQL query.
-- creating a database
create database if not exists ak_aa01;
The if not exists clause skips creation when the database already exists. To delete a database, use the drop database statement. Type the following SQL query.
-- deleting a database
drop database if exists ak_aa01;
5.3 Creating and Deleting Tables
We can create a table using the create table statement. Make sure you set the working database with the use statement. When creating a table, we define its columns and their types. For instance, we create a table with the columns id, first_name, last_name, email, and created.
-- use database
use ak_demo;
-- creating a table
create table if not exists employees(
id int,
first_name string,
last_name string,
email string,
created timestamp
);
After creating the table, we can verify it using the describe statement.
-- describe table
describe employees;
You should see the table information in the query output, as shown in the figure below. To delete a table, use the drop table statement.
-- drop a table
drop table if exists employees;
5.4 Inserting and Selecting Data
We can insert data using the insert into statement. For instance, we insert five rows into the employees table. Type these queries.
-- insert data
insert into Employees(id,first_name,last_name,email,created) values(1,'employee','1','[email protected]',now());
insert into Employees(id,first_name,last_name,email,created) values(2,'employee','2','[email protected]',now());
insert into Employees(id,first_name,last_name,email,created) values(3,'employee','3','[email protected]',now());
insert into Employees(id,first_name,last_name,email,created) values(4,'employee','4','[email protected]',now());
insert into Employees(id,first_name,last_name,email,created) values(5,'employee','5','[email protected]',now());
Now we can run a query to retrieve all data from the employees table.
-- select data
select * from employees;
Program output:
We can also limit the number of rows retrieved using the limit clause. For instance, we want to get 3 rows from the employees table.
-- select data with limit
select * from employees limit 3;
Query output:
5.5 Updating and Deleting Data
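We can update data using the update statement. For instance, we want to update the row of the employees table with id=3. Note that Impala supports update and delete only on tables whose storage layer allows row-level changes, such as Kudu tables. Type this query.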
-- update
update Employees set first_name='updated_emp',last_name='update_3'
where id = 3;
We can also delete data using the delete from statement. For instance, we want to delete the row of the employees table with id=3. Type this query.
-- delete
delete from Employees where id=3;
5.6 Truncating Table Data
We can delete all data using the delete from statement. We can also use the truncate statement, which removes all rows of a table while keeping the table itself. For instance, we delete all data in the employees table.
truncate employees;
Then, we can verify the result by retrieving all data from the employees table. You should get an empty result.
select * from employees;
5.7 Filtering Data
While retrieving data from a table, we can filter the result using the where clause. For instance, we retrieve rows from the expense table with gender 'Female' only.
-- filter
select * from expense where gender = 'Female' limit 5;
Query output:
When filtering data, we can build criteria based on several table columns, combining them with the AND and OR operators.
select * from expense where gender = 'Female' and monthly_income > 15000 limit 5;
Here is a sample query that combines several filtering criteria with parentheses.
select * from expense where (gender = 'Female' or city_tier like 'Tier 2') and monthly_income > 15000 limit 5;
Query output:
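The parentheses in the last query matter because AND binds more tightly than OR, in SQL as well as in Java. The following Java sketch mirrors the two readings of the condition with predicates. The Expense class and the sample row are hypothetical and only serve the illustration.

```java
import java.util.function.Predicate;

class ExpenseFilter {
    static final class Expense {
        final String gender;
        final String cityTier;
        final int monthlyIncome;

        Expense(String gender, String cityTier, int monthlyIncome) {
            this.gender = gender;
            this.cityTier = cityTier;
            this.monthlyIncome = monthlyIncome;
        }
    }

    // (gender = 'Female' or city_tier like 'Tier 2') and monthly_income > 15000
    static final Predicate<Expense> WITH_PARENTHESES =
            e -> (e.gender.equals("Female") || e.cityTier.equals("Tier 2")) && e.monthlyIncome > 15000;

    // gender = 'Female' or city_tier like 'Tier 2' and monthly_income > 15000
    // AND binds tighter than OR, so this reads as: gender or (tier and income).
    static final Predicate<Expense> WITHOUT_PARENTHESES =
            e -> e.gender.equals("Female") || e.cityTier.equals("Tier 2") && e.monthlyIncome > 15000;

    public static void main(String[] args) {
        // Hypothetical row: female, Tier 1, income below the threshold.
        Expense row = new Expense("Female", "Tier 1", 9000);
        System.out.println(WITH_PARENTHESES.test(row));    // false
        System.out.println(WITHOUT_PARENTHESES.test(row)); // true
    }
}
```

Without the parentheses, every 'Female' row would pass the filter regardless of income.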
5.8 Calling Built-in Functions
Apache Impala has built-in functions such as min(), max(), and count(). You can write this query to count the number of rows.
-- select count
select count(id) as total from employees;
Program output:
You can also find the minimum and maximum values of a column in the expense table. Type this query.
select max(age) as max, min(age) as min from expense;
Query output:
5.9 Distinct
We can get a list of unique values from a table column using the distinct keyword. Type this query for a demo.
-- distinct
select distinct city_tier from expense;
Program output:
5.10 Ordering Data
We can order our data based on table columns, in ascending or descending order. For instance, we order the expense table data by the monthly_income column, first ascending and then descending.
-- order
select * from expense order by monthly_income asc limit 5;
select * from expense order by monthly_income desc limit 5;
You can see the query output as shown in Figure below.
We can also order data by two or more columns. For instance, we order the expense table data by the monthly_income and record columns.
select * from expense order by monthly_income desc, record asc limit 5;
select * from expense order by monthly_income desc, record desc limit 5;
You can see the query output in Figure below.
You can also order data by other columns, such as the items and record columns.
select * from expense order by items desc, record desc limit 5;
5.11 Grouping
We can group data by a certain column. We should apply an aggregation to the grouped data. For instance, we want to sum the monthly_income column for each group. You can see the following query sample.
-- group by
select city_tier, sum(monthly_income) from expense group by city_tier;
Query output:
We can perform grouping and ordering in the same query. For instance, we group data by city_tier and order the result by the total, as in the following query.
select city_tier, sum(monthly_income) as total from expense group by city_tier order by total desc;
Query output:
5.12 Having
We can filter the grouped data using the having clause. For instance, we take the previous query and add a filter on sum(monthly_income), as follows.
select city_tier, sum(monthly_income) as total from expense group by city_tier having sum(monthly_income) > 12400000;
Query output:
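The difference between where and having is that having filters whole groups after aggregation. The following Java sketch reproduces the idea in memory with streams. The rows and the threshold of 15000 are hypothetical values chosen for the illustration, and the class and method names are not part of the original demo.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupByHaving {
    static final class Row {
        final String cityTier;
        final long monthlyIncome;

        Row(String cityTier, long monthlyIncome) {
            this.cityTier = cityTier;
            this.monthlyIncome = monthlyIncome;
        }
    }

    // Hypothetical rows standing in for the expense table.
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("Tier 1", 20000), new Row("Tier 1", 15000),
                new Row("Tier 2", 9000),
                new Row("Tier 3", 12000), new Row("Tier 3", 4000));
    }

    // group by city_tier with sum(monthly_income)
    static Map<String, Long> sumByTier(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Row r) -> r.cityTier,
                TreeMap::new,
                Collectors.summingLong((Row r) -> r.monthlyIncome)));
    }

    // having sum(monthly_income) > threshold: filters groups, not single rows
    static Map<String, Long> having(Map<String, Long> totals, long threshold) {
        Map<String, Long> kept = new TreeMap<>();
        totals.forEach((tier, total) -> {
            if (total > threshold) {
                kept.put(tier, total);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = sumByTier(sampleRows());
        System.out.println(totals);                // {Tier 1=35000, Tier 2=9000, Tier 3=16000}
        System.out.println(having(totals, 15000)); // {Tier 1=35000, Tier 3=16000}
    }
}
```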
5.13 Limit and Offset
We have already learned about limit. For instance, the following query shows 5 rows.
-- offset
select id,age,monthly_income from expense order by id desc limit 5;
Query output:
When we use the limit clause, the offset is 0 by default. The following query produces the same output as the previous one.
select id,age,monthly_income from expense order by id desc limit 5 offset 0;
Now we use offset 3 with the limit clause. We get 5 rows starting at index 3, that is, after skipping the first three rows.
select id,age,monthly_income from expense order by id desc limit 5 offset 3;
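The effect of limit and offset can be sketched in Java with the skip() and limit() stream operations. This is an in-memory illustration only: the class name, the page() helper, and the id values are hypothetical, and the list is assumed to be already sorted as by order by id desc.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LimitOffset {
    // Skips `offset` rows of an already ordered result, then keeps at most `limit` rows.
    static <T> List<T> page(List<T> orderedRows, int limit, int offset) {
        return orderedRows.stream().skip(offset).limit(limit).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical ids, already sorted in descending order.
        List<Integer> ids = Arrays.asList(10, 9, 8, 7, 6, 5, 4, 3, 2, 1);
        System.out.println(page(ids, 5, 0)); // [10, 9, 8, 7, 6]
        System.out.println(page(ids, 5, 3)); // [7, 6, 5, 4, 3]
    }
}
```

Combining a page size with an increasing offset is a common way to page through an ordered result.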
5.14 Creating and Selecting Views
Apache Impala supports views. We can create a view from a table. A view does not store data; it only keeps the query definition. You can create a view using the create view statement. For instance, we create a view that selects from the employees table.
-- view
Create View if not exists myview as select * from employees;
Now we can perform a query to retrieve all data from myview view.
select * from myview;
You should see the myview data in the query editor, as shown in the figure below.
We can drop a view using the drop view statement. For instance, to drop the myview view, type this query.
drop view if exists myview;
6. Joining Query and Subquery on Apache Impala
6.1 Introduction
In the previous chapter, we learned how to perform basic SQL queries on Apache Impala. Now we go further and involve two or more tables in a query. We cover the following topics: joining, subquery, and union.
Next, we learn about joining query in Apache Impala.
6.2 Joining Query
There are several ways to join two tables. We can perform the following joining queries on Apache Impala: inner join, left join, right join, and outer join.
You can see an illustration of these joining query models in the figure below.
For the demo, we use the userorder and userorderdetail tables. You can use the describe statement to show each table's schema.
---- joining
describe userorder;
describe userorderdetail;
6.2.1 Inner Join
For the first joining query model, we implement an inner join using the inner join .. on clause. For instance, we join userorder and userorderdetail by matching the userorder id column to the userorderdetail orderid column. You can see this sample query.
-- inner join
select t1.code, t1.username, t2.product, t2.quantity*t2.price as total from userorder t1
inner join userorderdetail t2 on t1.id = t2.orderid;
You can see the output of inner join query as shown in Figure below.
6.2.2 Left Join
Now we implement a left join for the same case as in the previous section. We change the inner join clause to a left join clause.
-- left join
select t1.code, t1.username, t2.product, t2.quantity*t2.price as total from userorder t1
left join userorderdetail t2 on t1.id = t2.orderid;
Program output:
We can also display only the userorder rows that have no matching rows in userorderdetail by filtering on t2.orderid is null.
select t1.code, t1.username, t2.product, t2.quantity*t2.price as total from userorder t1
left join userorderdetail t2 on t1.id = t2.orderid where t2.orderid is null;
Program output:
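To see how an inner join and a left join differ row by row, consider the following Java sketch. It uses a simple nested loop over two in-memory lists, which is only an illustration and not how Impala executes joins. The data and class names are hypothetical stand-ins for the userorder and userorderdetail tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class JoinSketch {
    static final class Order {
        final int id;
        final String code;

        Order(int id, String code) {
            this.id = id;
            this.code = code;
        }
    }

    static final class Detail {
        final int orderId;
        final String product;

        Detail(int orderId, String product) {
            this.orderId = orderId;
            this.product = product;
        }
    }

    // Hypothetical stand-ins for userorder and userorderdetail.
    static final List<Order> ORDERS = Arrays.asList(
            new Order(1, "A001"), new Order(2, "A002"), new Order(3, "A003"));
    static final List<Detail> DETAILS = Arrays.asList(
            new Detail(1, "pen"), new Detail(1, "book"), new Detail(2, "ink"));

    // join(false): inner join, only matching pairs are returned.
    // join(true): left join, unmatched orders are kept with a null product.
    static List<String> join(boolean keepUnmatchedOrders) {
        List<String> out = new ArrayList<>();
        for (Order o : ORDERS) {
            boolean matched = false;
            for (Detail d : DETAILS) {
                if (o.id == d.orderId) {
                    out.add(o.code + ":" + d.product);
                    matched = true;
                }
            }
            if (!matched && keepUnmatchedOrders) {
                out.add(o.code + ":null");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(join(false)); // [A001:pen, A001:book, A002:ink]
        System.out.println(join(true));  // [A001:pen, A001:book, A002:ink, A003:null]
    }
}
```

The extra A003:null row is the kind of row that the where t2.orderid is null filter keeps.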
6.2.3 Right Join
We can also implement a right join by changing the previous query to use the right join clause. You can write this query.
-- right join
select t1.code, t1.username, t2.product, t2.quantity*t2.price as total from userorder t1
right join userorderdetail t2 on t1.id = t2.orderid;
Program output:
We can get only the userorderdetail rows that have no matching rows in userorder by filtering on t1.id is null. You can write this query.
select t1.code, t1.username, t2.product, t2.quantity*t2.price as total from userorder t1
right join userorderdetail t2 on t1.id = t2.orderid where t1.id is null;
Program output:
6.2.4 Outer Join
We can merge two tables, similar to a union, using the full outer join clause. You can run this query sample.
select t1.code, t1.username, t2.product, t2.quantity*t2.price as total from userorder t1
full outer join userorderdetail t2 on t1.id = t2.orderid;
Program output:
We can also exclude the matched rows, keeping only rows that exist in one table but not the other, by filtering on id or orderid being null. You can write this query.
select t1.code, t1.username, t2.product, t2.quantity*t2.price as total from userorder t1
full outer join userorderdetail t2 on t1.id = t2.orderid where t1.id is null or t2.orderid is null;
Program output:
6.3 Subquery
We can perform a query inside a query. This is called a subquery. You can see the following query sample for subquery.
-- subquery
select * from userorder where id in
(select distinct orderid from userorderdetail where (quantity*price) > 10);
Program output:
6.4 Union and Union All
We can merge the results of two queries using the union operator. If the same row appears in both results, it is returned only once, so union produces no duplicate rows.
-- union
(select * from userorder limit 3)
union
(select * from userorder limit 2);
You can see my program output. We merge three rows and two rows; since some rows are duplicates, the query result shows only three rows.
We can merge the results while keeping duplicate rows using the union all operator. You can see the following sample query.
-- union all
(select * from userorder limit 3)
union all
(select * from userorder limit 2);
Here we merge 3 rows and 2 rows, and the query result shows 5 rows.
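The row counts above can be reproduced with a small Java sketch that treats query results as lists. It assumes, as in the demo, that the two rows of the second query also appear among the three rows of the first. The class name and the row values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class UnionSketch {
    // union all: keeps every row from both results, duplicates included.
    static <T> List<T> unionAll(List<T> first, List<T> second) {
        List<T> out = new ArrayList<>(first);
        out.addAll(second);
        return out;
    }

    // union: same rows, but each distinct row is returned only once.
    static <T> List<T> union(List<T> first, List<T> second) {
        LinkedHashSet<T> unique = new LinkedHashSet<>(unionAll(first, second));
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Hypothetical rows standing in for "limit 3" and "limit 2" on the same table.
        List<String> first = Arrays.asList("order-1", "order-2", "order-3");
        List<String> second = Arrays.asList("order-1", "order-2");
        System.out.println(union(first, second).size());    // 3
        System.out.println(unionAll(first, second).size()); // 5
    }
}
```

The sketch keeps insertion order only to make the output readable; a SQL union does not guarantee any row order unless an order by clause is added.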
6.5 With
We can use the with clause to define an alias for a query. For instance, we define t1 and t2 as queries and then run a query on t1 and t2. You can see the following sample query.
-- with
with t1 as (select * from userorderdetail where price > 5),
t2 as (select * from userorderdetail where price < 2)
(select * from t1 union select * from t2);
This is my query output.
7. Partition Data on Apache Impala
7.1 Introduction
If the data in a table grows large, we can partition the table. In this chapter, we create a partitioned table on Apache Impala. Let's start!
7.2 Creating Partition Table
We can create a partitioned table in Impala. For the demo, we use three partition columns: year, month, and day. In this case, we create the news table with the following partition definition.
create table if not exists news (title string, content string)
partitioned by (year int, month int , day int)
row format delimited fields terminated by ',';
After creating it, we can check that the news table exists. Type this command.
show tables;
You will see a list of tables as shown in Figure below.
For the demo, we insert some data, passing the three partition values (year, month, day). Type the following SQL statements.
insert into news partition(year=2021,month=1,day=8)
values('News 1','Lorem ipsum dolor sit amet, consectetur adipiscing elit.');
insert into news partition(year=2021,month=1,day=8)
values('News 2','Aenean erat eros, aliquam eu posuere at, placerat at justo');
insert into news partition(year=2021,month=1,day=7)
values('News 3','Nullam laoreet accumsan leo eu tristique');
insert into news partition(year=2020,month=12,day=5)
values('News 4','Proin auctor augue eget dictum interdum');
insert into news partition(year=2020,month=12,day=6)
values('News 5','In fermentum accumsan laoreet.');
Now we can verify by querying the news table. You should see a list of data from the news table.
Next, we explore how the data of the partitioned table is stored.
7.3 Exploring Partition Table Files on HDFS
Now we can explore how our data is stored in the Impala table. Impala database storage is usually located in the /user/hive/warehouse/ HDFS folder. You can list all database folders using the HDFS command below.
$ hdfs dfs -ls /user/hive/warehouse
We can also list all tables inside a database. For instance, we list the tables in the ak_demo.db database folder. Type this command.
$ hdfs dfs -ls /user/hive/warehouse/ak_demo.db/
After it runs, you can see a list of tables in the ak_demo.db database. My output is shown in the figure below.
Next, we list the contents of the news table folder.
$ hdfs dfs -ls /user/hive/warehouse/ak_demo.db/news/
You will see the folders of the news table, one for each value of the year partition: year=2020 and year=2021. For instance, we list the partition year=2021.
$ hdfs dfs -ls /user/hive/warehouse/ak_demo.db/news/year=2021
You can see the data is split further by the month partition. Let us list the partition month=1.
$ hdfs dfs -ls /user/hive/warehouse/ak_demo.db/news/year=2021/month=1
You will see a list of folders for the day partition. Now we list the partition day=8.
$ hdfs dfs -ls /user/hive/warehouse/ak_demo.db/news/year=2021/month=1/day=8
Now you can see the actual data files of the partition year=2021, month=1, day=8. As another example, we list the partition year=2021, month=1, day=7.
$ hdfs dfs -ls /user/hive/warehouse/ak_demo.db/news/year=2021/month=1/day=7
You should see the data files under the partition year=2021, month=1, day=7. You can practice by creating tables with your own partitions.
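The folder layout explored above follows a simple pattern: each partition column adds one name=value level under the table folder. As a minimal Java sketch, with a hypothetical class and method name, the path of a partition can be built like this:

```java
class PartitionPath {
    // Appends one name=value folder level per partition column (year, month, day).
    static String of(String tableDir, int year, int month, int day) {
        return tableDir + "/year=" + year + "/month=" + month + "/day=" + day;
    }

    public static void main(String[] args) {
        // Table folder taken from the hdfs commands in this section.
        String newsDir = "/user/hive/warehouse/ak_demo.db/news";
        System.out.println(of(newsDir, 2021, 1, 8));
        // /user/hive/warehouse/ak_demo.db/news/year=2021/month=1/day=8
    }
}
```

A query that filters on year, month, and day only needs to read the files under the matching folder.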
8. Apache Impala Database Programming with Java
8.1 Introduction
In this chapter, we learn how to access Apache Impala from a program. We use a Java application as a sample client application. To access Apache Impala, we can use the ODBC and JDBC drivers from Cloudera. You can find them at https://www.cloudera.com/downloads.html, as shown in the figure below.
For the demo, we use the JDBC driver for Impala. Download it and extract the zip file. You should see some JDBC driver files, as shown in the figure below.
We will use the JDBC 4.2 driver for our Java application. Next, we create a Java application project.
8.2 Creating A Project
You can create a Java application using any editor. In this book, I use JetBrains IntelliJ IDEA, which is available in a community edition. You can download it from the JetBrains website.
Now we can create a new project using IntelliJ IDEA. Select a Java application project, as shown in the figure below.
Then, click the Next button. You should see a dialog, as shown in the figure below.
Check the Create project from template option. After that, click the Next button. You should get a form, as shown in the figure below.
Fill in the project name and project folder. You can see my project name and folder in the figure above. Click the Finish button to complete the project creation. You will then get an editor with the following template code.
package id.ilmudata;
public class Main {
    public static void main(String[] args) {
    }
}
Now we open the project structure and add the JDBC driver file for Impala to our project. Select Libraries and add the JDBC driver file for Impala. My project structure is shown in the figure below.
Click the Apply and OK buttons to close the dialog. Next, we write Java code to connect to the Apache Impala server.
8.3 Connecting to Apache Impala
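To connect to the Apache Impala server, we create a Connection object from the JDBC API. We pass a JDBC URL, which is stored in the CONNECTION_URL variable in the program code of this section. In this URL, set the IP address or hostname of your Impala server. You can also change the database and the Impala port to match your environment.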
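As a small illustration, the URL can be assembled from its parts. This sketch assumes the URL has the shape jdbc:impala://host:port/database, which matches the CONNECTION_URL value used in this chapter once a host is filled in. The class and method names are hypothetical, and localhost and ak_demo are example values only.

```java
class ImpalaJdbcUrl {
    // Builds a URL of the assumed form jdbc:impala://<host>:<port>/<database>.
    static String build(String host, int port, String database) {
        return "jdbc:impala://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Example values: a local Impala server and the ak_demo database.
        System.out.println(build("localhost", 21050, "ak_demo"));
        // jdbc:impala://localhost:21050/ak_demo
    }
}
```

Building the URL from separate values makes it easier to switch between a local and a remote Impala server without editing the string by hand.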
We create a getConnection() function to connect to the Impala server. We use the com.cloudera.impala.jdbc.Driver class name for the Cloudera Impala JDBC driver. You can write the following code for the implementation.
package id.ilmudata;
import java.sql.*;
public class Main {
    // put the IP address or hostname of your Impala server before :21050
    static String CONNECTION_URL = "jdbc:impala://:21050/database";
    public static void main(String[] args) {
        try {
            Connection conn = getConnection();
            if (conn != null) {
                System.out.println("Connected");
            } else {
                System.out.println("Not connected");
                return;
            }
            conn.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
    public static Connection getConnection() throws Exception {
        Connection connection;
        // load the Cloudera Impala JDBC driver class
        Class.forName("com.cloudera.impala.jdbc.Driver");
        connection = DriverManager.getConnection(CONNECTION_URL);
        return connection;
    }
}
If we obtain a Connection object, our program has connected to the Apache Impala server. Next, we retrieve data from Apache Impala.
8.4 Getting All Data
In this section, we retrieve all data from Apache Impala. For the implementation, we create a showAllData() method that takes a Connection as a parameter. To retrieve data, we run a SELECT query. We create a Statement object and pass our query text to its executeQuery() method to obtain a ResultSet cursor. After that, we loop over the cursor to read the rows and print each row to the terminal using the println() method.
The following is the implementation of the showAllData() method.
public static Integer showAllData(Connection conn) throws Exception {
    System.out.println("------------show data-----------");
    String sql = "SELECT * FROM employees";
    Statement statement = conn.createStatement();
    ResultSet result = statement.executeQuery(sql);
    Integer total = 0;
    // loop over the cursor and print each row
    while (result.next()) {
        Integer id = result.getInt("id");
        String first_name = result.getString("first_name");
        String last_name = result.getString("last_name");
        String email = result.getString("email");
        Timestamp created = result.getTimestamp("created");
        String output = "Employee #%d: %s - %s - %s %s";
        System.out.println(String.format(output, id, first_name, last_name, email, created));
        total++;
    }
    // release the cursor and the statement
    result.close();
    statement.close();
    System.out.println("--------------------------");
    return total;
}
Save this code. Next, we insert data into Apache Impala.
8.5 Inserting Data
We can insert data into Apache Impala using the INSERT INTO SQL statement. Since the query takes parameters, we create a PreparedStatement object to bind them. We use the now() function from Apache Impala to insert the current date and time. We call the executeUpdate() method to execute our INSERT query.
We implement the insertion in the insertData() method. The following is the complete code for the insertData() method.
public static void insertData(Connection conn, Integer id, String first_name,
        String last_name, String email) throws Exception {
    System.out.println("------------inserting data-----------");
    String sql = "INSERT INTO employees (id, first_name, last_name, email, created) VALUES (?, ?, ?, ?, now())";
    PreparedStatement statement = conn.prepareStatement(sql);
    // bind the four query parameters in order
    statement.setInt(1, id);
    statement.setString(2, first_name);
    statement.setString(3, last_name);
    statement.setString(4, email);
    statement.executeUpdate();
    statement.close();
    System.out.println("--------------------------");
}
Save this code. Next, we call these methods from our main program.
8.6 Completed Program
In this scenario, we open a connection to Apache Impala. Once connected, we retrieve all data by calling the showAllData() method, which returns the number of rows. We save this number in the total variable and use it for the next insertion: we insert a new employee with id = total + 1. After that, we show all data again.
The following is the complete code for our program.
package id.ilmudata;
import java.sql.*;
public class Main {
    // put the IP address or hostname of your Impala server before :21050
    static String CONNECTION_URL = "jdbc:impala://:21050/database";
    public static void main(String[] args) {
        try {
            Connection conn = getConnection();
            if (conn != null) {
                System.out.println("Connected");
            } else {
                System.out.println("Not connected");
                return;
            }
            // show data
            Integer total = showAllData(conn);
            // insert data
            insertData(conn, total + 1, "new-first", "newlast", "[email protected]");
            // show data
            showAllData(conn);
            conn.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
    public static Connection getConnection() throws Exception {
        Connection connection;
        // load the Cloudera Impala JDBC driver class
        Class.forName("com.cloudera.impala.jdbc.Driver");
        connection = DriverManager.getConnection(CONNECTION_URL);
        return connection;
    }
    public static Integer showAllData(Connection conn) throws Exception {
        System.out.println("------------show data-----------");
        String sql = "SELECT * FROM employees";
        Statement statement = conn.createStatement();
        ResultSet result = statement.executeQuery(sql);
        Integer total = 0;
        // loop over the cursor and print each row
        while (result.next()) {
            Integer id = result.getInt("id");
            String first_name = result.getString("first_name");
            String last_name = result.getString("last_name");
            String email = result.getString("email");
            Timestamp created = result.getTimestamp("created");
            String output = "Employee #%d: %s - %s - %s %s";
            System.out.println(String.format(output, id, first_name, last_name, email, created));
            total++;
        }
        // release the cursor and the statement
        result.close();
        statement.close();
        System.out.println("---------------------------");
        return total;
    }
    public static void insertData(Connection conn, Integer id, String first_name,
            String last_name, String email) throws Exception {
        System.out.println("------------inserting data-----------");
        String sql = "INSERT INTO employees (id, first_name, last_name, email, created) VALUES (?, ?, ?, ?, now())";
        PreparedStatement statement = conn.prepareStatement(sql);
        // bind the four query parameters in order
        statement.setInt(1, id);
        statement.setString(2, first_name);
        statement.setString(3, last_name);
        statement.setString(4, email);
        statement.executeUpdate();
        statement.close();
        System.out.println("---------------------------");
    }
}
Now we can run this program. Make sure you have set the IP address of your Apache Impala server in the connection URL. You can use localhost if the Apache Impala server runs on the same machine.
After it runs, the program shows a list of employees. Then, it inserts data and shows all data again. My program output is shown in the figure below.
Source Code
You can download the source code for this book from http://www.makers.id/ak/impala2361.zip
Contact
If you have questions related to this book, please contact me at [email protected]. My blog: http://blog.aguskurniawan.net.