

Note: All posts are based on a practical approach, avoiding lengthy theory. All have been tested on development servers. Please don’t try any post on production servers until you are sure.

Friday, June 23, 2017

Working with PrestoDB Connectors



Prerequisite:
Complete my previous post Installing/Configuring PrestoDB


Presto lets you connect to other databases through connectors, so you can run queries and joins across several sources; each connector provides the metadata and data for queries. In this post we will work with some of these connectors. The coordinator (the master daemon) uses connectors to get the metadata (such as table schemas) needed to build a query plan, while the workers use connectors to fetch the actual data they process.









Using mysql Connector

The MySQL connector allows querying and creating tables in an external MySQL database. This can be used to join data between different systems, such as MySQL and Hive, or between two different MySQL instances. Remember that the MySQL connector does not support the following statements:

CREATE, INSERT, UPDATE, DELETE, DROP

Installing mysql (if not already installed)
1- Download MySQL 5.7 repo file :

[root@vbgeneric bin]# wget https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
--2017-06-15 22:20:40--  https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
Resolving dev.mysql.com (dev.mysql.com)... 137.254.60.11
Connecting to dev.mysql.com (dev.mysql.com)|137.254.60.11|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://repo.mysql.com//mysql57-community-release-el7-8.noarch.rpm [following]
--2017-06-15 22:20:41--  https://repo.mysql.com//mysql57-community-release-el7-8.noarch.rpm
Resolving repo.mysql.com (repo.mysql.com)... 104.117.138.198
Connecting to repo.mysql.com (repo.mysql.com)|104.117.138.198|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9116 (8.9K) [application/x-redhat-package-manager]
Saving to: ‘mysql57-community-release-el7-8.noarch.rpm’

100%[===========================================================================>] 9,116       --.-K/s   in 0s

2017-06-15 22:20:42 (22.7 MB/s) - ‘mysql57-community-release-el7-8.noarch.rpm’ saved [9116/9116]

2- Install MySQL 5.7 repo file :
[root@vbgeneric bin]# rpm -ivh mysql57-community-release-el7-8.noarch.rpm
warning: mysql57-community-release-el7-8.noarch.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql57-community-release-el7-8  ################################# [100%]

3- Install MySQL 5.7 database server :

[root@vbgeneric bin]# yum install mysql-server -y
Loaded plugins: langpacks, ulninfo
mysql-connectors-community                                                                    | 2.5 kB  00:00:00
mysql-tools-community                                                                         | 2.5 kB  00:00:00
mysql57-community                                                                             | 2.5 kB  00:00:00
(1/3): mysql-connectors-community/x86_64/primary_db                                           |  14 kB  00:00:00
(2/3): mysql-tools-community/x86_64/primary_db                                                |  33 kB  00:00:00
(3/3): mysql57-community/x86_64/primary_db                                                    | 106 kB  00:00:00
Resolving Dependencies
--> Running transaction check
---> Package mysql-community-server.x86_64 0:5.7.18-1.el7 will be installed
--> Processing Dependency: mysql-community-common(x86-64) = 5.7.18-1.el7 for package: mysql-community-server-5.7.18-1.el7.x86_64

....
....
Verifying  : mysql-community-common-5.6.23-3.el7.x86_64                                                        7/7

Installed:
  mysql-community-server.x86_64 0:5.7.18-1.el7

Dependency Installed:
  mysql-community-client.x86_64 0:5.7.18-1.el7           mysql-community-libs-compat.x86_64 0:5.7.18-1.el7

Dependency Updated:
  mysql-community-common.x86_64 0:5.7.18-1.el7               mysql-community-libs.x86_64 0:5.7.18-1.el7

Complete!

4- Start MySQL server
[root@vbgeneric bin]# systemctl start mysqld


5- Connecting to the MySQL Server with the mysql Client

Once your MySQL server is up and running, you can connect to it as the superuser root with the mysql client.

[root@vbgeneric bin]# mysql -u root -p
Enter password:
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)

For installations using the MySQL Yum repository, MySQL SUSE repository, or RPM packages directly downloaded from Oracle, the generated root password is in the error log. View it with, for example, the following command:

[root@vbgeneric bin]# grep 'temporary password' /var/log/mysqld.log

2017-06-16T02:31:25.052590Z 1 [Note] A temporary password is generated for root@localhost: -Ekw6uk3R;Yp

Now connect using the temporary password shown above

[root@vbgeneric bin]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.7.18

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>


Reset your root password with the following statement; until you do, you will not be able to exercise any superuser privileges, even though you are logged in as root.

mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'yourPassword';

Query OK, 0 rows affected (0.00 sec)

6- Create Database and objects in MySQL Server


mysql> create database iub;
mysql> use iub;
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| iub                |
| mysql              |
| performance_schema |
| sys                |
+--------------------+

mysql> create table friend(frnd_id int not null, frnd_name varchar(50),category varchar(100));
mysql> insert into friend values(1,'Zeeshan','SQL DBA') ;
mysql> insert into friend values(2,'Saeed','Java') ;
mysql> insert into friend values(3,'Ali','Oracle DBA');
mysql> insert into friend values(4,'Tanveer','Telend');
mysql> select * from friend;
+---------+-----------+------------+
| frnd_id | frnd_name | category   |
+---------+-----------+------------+
|       1 | Zeeshan   | SQL DBA    |
|       2 | Saeed     | Java       |
|       3 | Ali       | Oracle DBA |
|       4 | Tanveer   | Telend     |
+---------+-----------+------------+

Configure mysql Connector

To configure the MySQL connector, create a catalog properties file in etc/catalog named, for example, mysql.properties, to mount the MySQL connector as the mysql catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

[presto@vbgeneric bin]$ vi $PRESTO_HOME/etc/catalog/mysql.properties
[presto@vbgeneric bin]$ cat $PRESTO_HOME/etc/catalog/mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://localhost:3306
connection-user=root
connection-password=P@ssw0rd
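
To access a second MySQL server, add another catalog properties file under etc/catalog; the file name (minus the .properties extension) becomes the catalog name. A sketch, assuming a hypothetical second server on host db2 (adjust host, user, and password for your setup), saved as etc/catalog/mysql2.properties:

```properties
connector.name=mysql
connection-url=jdbc:mysql://db2:3306
connection-user=root
connection-password=P@ssw0rd
```

Restart the Presto server so the new catalog is picked up.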

Connect Presto CLI
Let’s connect the MySQL storage plugin to the Presto server from the command line. Here "iub" refers to the schema in the MySQL server.

[presto@vbgeneric bin]$ prestocli --server localhost:7070 --catalog mysql --schema iub
presto:iub>

List out all the schemas in mysql

presto:iub> show schemas from mysql;
       Schema
--------------------
 information_schema
 iub
 performance_schema
 sys
(4 rows)

Query 20170616_041856_00003_prm89, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:02 [4 rows, 62B] [1 rows/s, 28B/s]


List Tables from Schema
presto:iub> show tables from mysql.iub;
Table
--------
friend
(1 row)


Query 20170616_042135_00004_prm89, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:02 [1 rows, 19B] [0 rows/s, 8B/s]

Describe Table
presto:iub> describe mysql.iub.friend;
  Column   |     Type     | Extra | Comment
-----------+--------------+-------+---------
 frnd_id   | integer      |       |
 frnd_name | varchar(50)  |       |
 category  | varchar(100) |       |
(3 rows)

Query 20170616_042312_00006_prm89, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:03 [3 rows, 198B] [1 rows/s, 74B/s]

Show Columns from Table (the output is the same as DESCRIBE above)
presto:iub> show columns from mysql.iub.friend;

Access Table Records
presto:iub> select * from mysql.iub.friend;
 frnd_id | frnd_name |  category
---------+-----------+------------
       1 | Zeeshan   | SQL DBA
       2 | Saeed     | Java
       3 | Ali       | Oracle DBA
       4 | Tanveer   | Telend
(4 rows)

Query 20170616_042606_00008_prm89, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:03 [4 rows, 0B] [1 rows/s, 0B/s]

Create Table Using AS Command
The MySQL connector doesn’t support a plain CREATE TABLE query, but you can create a table using CREATE TABLE ... AS.
presto:iub> create table mysql.iub.frnd2 as select * from mysql.iub.friend;
CREATE TABLE: 4 rows

Query 20170616_042826_00009_prm89, FINISHED, 1 node
Splits: 35 total, 35 done (100.00%)
0:04 [4 rows, 0B] [0 rows/s, 0B/s]
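
As a quick sanity check, query the newly created table; since CREATE TABLE reported 4 rows, the count should match the source table:

```sql
select count(*) from mysql.iub.frnd2;
```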

presto:iub> quit;

Using JMX Connector


The JMX connector provides the ability to query JMX information from all nodes in a Presto cluster. This is very useful for monitoring or debugging. Java Management Extensions (JMX) provides information about the Java Virtual Machine and all of the software running inside it. Presto itself is heavily instrumented via JMX.


This connector can also be configured so that chosen JMX information will be periodically dumped and stored in memory for later access.
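
For example, the jmx.properties catalog file can be extended with dump settings. The sketch below (property names as in the Presto documentation) samples two MBeans every 10 seconds and keeps up to 86400 entries, queryable through the jmx.history schema:

```properties
connector.name=jmx
jmx.dump-tables=java.lang:type=Runtime,com.facebook.presto.execution.scheduler:name=NodeScheduler
jmx.dump-period=10s
jmx.max-entries=86400
```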

We have already created the “jmx.properties” file under the “etc/catalog” directory. Now connect the Presto CLI to use the JMX plugin.

[presto@vbgeneric bin]$ prestocli --server localhost:7070 --catalog jmx --schema jmx

List out all the schemas in jmx
presto:jmx> show schemas from jmx;
       Schema
--------------------
 current
 history
 information_schema
(3 rows)

presto:jmx> show tables from jmx.current;
                                                                      Table
----------------------------------------------------------------------------------------------------------
 com.facebook.presto.execution.executor:name=taskexecutor
 com.facebook.presto.execution.scheduler:name=nodescheduler
 com.facebook.presto.execution.scheduler:name=splitschedulerstats
 com.facebook.presto.execution:name=queryexecution
 com.facebook.presto.execution:name=querymanager
....

presto:jmx> select * from jmx.current."com.facebook.presto.execution.executor:name=taskexecutor";
                 node                 | blockedquantawalltime.alltime.count | blockedquantawalltime.alltim
--------------------------------------+-------------------------------------+-----------------------------
 ffffffff-ffff-ffff-ffff-ffffffffffff |                               403.0 |                        24413
(1 row)


Using HIVE Connector

The Hive connector allows querying data stored in a Hive data warehouse. The Hive metastore service (default port 9083) stores the metadata for Hive tables and partitions in a relational database, and provides clients (including Hive) access to this information via the metastore service API. Presto uses the Hive metastore service to get the details of Hive tables; it does not use HiveQL or any part of Hive’s execution environment.


The following file formats are supported by the Hive connector:
ORC, Parquet, Avro, RCFile, SequenceFile, JSON, Text


[hdpsysuser@vbgeneric ~]$ hive --service metastore
which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/hadoopsw/.local/bin:/usr/hadoopsw/bin:/usr/java/default/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/apache-hive-2.1.1-bin/bin:/usr/hadoopsw/db-derby-10.13.1.1-bin/bin:/usr/hadoopsw/apache-flume-1.7.0-bin/bin)
Starting Hive Metastore Server

Configuring Hive Connector
[presto@vbgeneric bin]$ vi $PRESTO_HOME/etc/catalog/hive.properties
[presto@vbgeneric bin]$ cat $PRESTO_HOME/etc/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
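
If your Hadoop cluster uses a non-default configuration (for example HA namenodes or federation), the connector can also be pointed at the Hadoop config files; a sketch, assuming the standard config location on this node:

```properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
```

Only specify hive.config.resources when really necessary; otherwise the defaults work fine.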

Check hive tables from hive prompt
I've already created a Hive database and tables, which we will query from Presto; but first, check them from Hive itself.

hive> use flume;
hive> show tables;
OK
raw_tweets
tweets_data2
twitterdata
vw_parsed_tweets
Time taken: 0.272 seconds, Fetched: 4 row(s)

hive> select * from raw_tweets limit 1;
OK
{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:39 +0000 2017","in_reply_to_user_id_str":null,"source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone<\/a>","retweeted_status":{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"coordinates":null,"created_at":"Sun Jun 11 10:16:58 +0000 2017","truncated":false,"in_reply_to_user_id_str":null,"source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone<\/a>","retweet_count":2,"retweeted":false,"geo":null,"filter_level":"low","in_reply_to_screen_name":null,"is_quote_status":false,"entities":{"urls":[],"hashtags":[{"indices":[74,95],"text":"شعب_قطر_في_قلب_سلمان"},{"indices":[96,117],"text":"_قطع_العلاقات_مع_قطر"}],"user_mentions":[],"symbols":[]},"id_str":"873846512484208640","in_reply_to_user_id":null,"favorite_count":5,"id":873846512484208640,"text":"UAE : Trouble maker  \nSaudi : Support UAE but not sure \nBahrain : Retweet #شعب_قطر_في_قلب_سلمان #_قطع_العلاقات_مع_قطر","place":null,"contributors":null,"lang":"en","user":{"utc_offset":10800,"friends_count":306,"profile_image_url_https":"https://pbs.twimg.com/profile_images/615523415617712128/DMuJuuUu_normal.jpg","listed_count":0,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","default_profile_image":false,"favourites_count":483,"description":"ربِ بي منك فضلٌ ونعمه عطاً غير مجذوذ ولا منقوص","created_at":"Sun Jul 29 17:41:11 +0000 
2012","is_translator":false,"profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","protected":false,"screen_name":"im_aisha92","id_str":"724360172","profile_link_color":"1DA1F2","id":724360172,"geo_enabled":true,"profile_background_color":"C0DEED","lang":"ar","profile_sidebar_border_color":"C0DEED","profile_text_color":"333333","verified":false,"profile_image_url":"http://pbs.twimg.com/profile_images/615523415617712128/DMuJuuUu_normal.jpg","time_zone":"Kuwait","url":null,"contributors_enabled":false,"profile_background_tile":false,"profile_banner_url":"https://pbs.twimg.com/profile_banners/724360172/1446770569","statuses_count":2895,"follow_request_sent":null,"followers_count":187,"profile_use_background_image":true,"default_profile":true,"following":null,"name":"Alsulaiti Aisha","location":"Cardiff-UK","profile_sidebar_fill_color":"DDEEF6","notifications":null},"favorited":false},"retweet_count":0,"retweeted":false,"geo":null,"filter_level":"low","in_reply_to_screen_name":null,"is_quote_status":false,"id_str":"874026367360409600","in_reply_to_user_id":null,"favorite_count":0,"id":874026367360409600,"text":"RT @im_aisha92: UAE : Trouble maker  \nSaudi : Support UAE but not sure \nBahrain : Retweet #شعب_قطر_في_قلب_سلمان #_قطع_العلاقات_مع_قطر","place":null,"lang":"en","favorited":false,"coordinates":null,"truncated":false,"timestamp_ms":"1497219099260","entities":{"urls":[],"hashtags":[{"indices":[90,111],"text":"شعب_قطر_في_قلب_سلمان"},{"indices":[112,133],"text":"_قطع_العلاقات_مع_قطر"}],"user_mentions":[{"indices":[3,14],"screen_name":"im_aisha92","id_str":"724360172","name":"Alsulaiti Aisha","id":724360172}],"symbols":[]},"contributors":null,"user":{"utc_offset":null,"friends_count":8,"profile_image_url_https":"https://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png","listed_count":0,"profile_background_image_url":"","default_profile_image":false,"favourites_count":0,"description":null,"created_at":"Wed Jun 
07 05:44:14 +0000 2017","is_translator":false,"profile_background_image_url_https":"","protected":false,"screen_name":"Reem01020398","id_str":"872328324122644482","profile_link_color":"1DA1F2","id":872328324122644482,"geo_enabled":false,"profile_background_color":"F5F8FA","lang":"en","profile_sidebar_border_color":"C0DEED","profile_text_color":"333333","verified":false,"profile_image_url":"http://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png","time_zone":null,"url":null,"contributors_enabled":false,"profile_background_tile":false,"statuses_count":65,"follow_request_sent":null,"followers_count":1,"profile_use_background_image":true,"default_profile":true,"following":null,"name":"Reem","location":null,"profile_sidebar_fill_color":"DDEEF6","notifications":null}}
Time taken: 11.043 seconds, Fetched: 1 row(s)
hive>


Start Presto CLI

Start the Presto CLI and connect to the Hive storage plugin using the following command.
[presto@vbgeneric bin]$ prestocli --server localhost:7070 --catalog hive --schema flume

presto:flume> show schemas from hive;
       Schema
--------------------
 default
 flume
 information_schema
(3 rows)

presto:flume> show tables from hive.flume;
      Table
------------------
 raw_tweets
 tweets_data2
 twitterdata
 vw_parsed_tweets
(4 rows)

presto:flume> select * from hive.flume.raw_tweets;

Query 20170616_051901_00005_aceay, RUNNING, 1 node, 16 splits
0:19 [    0 rows,     0B] [    0 rows/s,     0B/s] [                   <=>                    ]

     STAGES   ROWS  ROWS/s  BYTES  BYTES/s  QUEUED    RUN   DONE
0.........R      0       0     0B       0B       0      0      0
  1.......S      0       0     0B       0B       0      0      0


----------------------------------------------------------------------------------------------------------

 {"quoted_status":{"extended_tweet":{"extended_entities":{"media":[{"display_url":"pic.twitter.com/iG9xY6R

 {"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:50 +0000 20

 {"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:50 +0000 20

 {"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:50 +0000 20

 {"quoted_status":{"extended_tweet":{"entities":{"urls":[{"display_url":"twitter.com/ajenglish/stat\u2026"

 {"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:51 +0000 20

...

..


Test Performance
Run the query below in both Hive and Presto and observe the response time. I'm running an Oracle VM with a one-node cluster for this test; both Hive and Presto are running on the same VM.

presto:flume>  select count(*) from raw_tweets;
 _col0
-------
   428
(1 row)

Query 20170616_101712_00002_g7yqb, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:20 [428 rows, 2.4MB] [21 rows/s, 125KB/s]
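
For comparison, run the same aggregate from the hive> prompt. On a single-node VM like this, Hive's MapReduce-based execution typically takes noticeably longer than Presto's in-memory execution, though exact timings depend on your environment:

```sql
select count(*) from flume.raw_tweets;
```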





Using System Connector

The System connector provides information and metrics about the currently running Presto cluster. It makes this available via normal SQL queries. The System connector doesn’t need to be configured: it is automatically available via a catalog named system.

List the available system schemas:

presto:flume> SHOW SCHEMAS FROM system;

       Schema

--------------------

 information_schema

 jdbc

 metadata

 runtime

(4 rows)


List the tables in one of the schemas:

presto:flume> SHOW TABLES FROM system.runtime;
    Table
--------------
 nodes
 queries
 tasks
 transactions
(4 rows)

Query one of the tables:

presto:flume> SELECT * FROM system.runtime.nodes;

               node_id                |          http_uri          | node_version | coordinator | state

--------------------------------------+----------------------------+--------------+-------------+--------

 ffffffff-ffff-ffff-ffff-ffffffffffff | http://192.168.56.101:7070 | 0.179        | true        | active

(1 row)


Kill a running query:
presto:flume> CALL system.runtime.kill_query('20151207_215727_00146_tx3nr');
CALL
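
To find the id of a query you want to kill, query the runtime.queries table first (column names per the Presto documentation):

```sql
SELECT query_id, state, query FROM system.runtime.queries WHERE state = 'RUNNING';
```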


System Connector Tables

metadata.catalogs
The catalogs table contains the list of available catalogs.

metadata.schema_properties
The schema properties table contains the list of available properties that can be set when creating a new schema.

metadata.table_properties
The table properties table contains the list of available properties that can be set when creating a new table.

runtime.nodes
The nodes table contains the list of visible nodes in the Presto cluster along with their status.

runtime.queries
The queries table contains information about currently and recently running queries on the Presto cluster. From this table you can find out the original query text (SQL), the identity of the user who ran the query, and performance information about the query, including how long it was queued and analyzed.

runtime.tasks
The tasks table contains information about the tasks involved in a Presto query, including where they were executed and how many rows and bytes each task processed.

runtime.transactions
The transactions table contains the list of currently open transactions and related metadata. This includes information such as the create time, idle time, initialization parameters, and accessed catalogs.
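
For example, listing the catalogs mounted on this cluster should show the ones configured in this post (hive, jmx, localfile, mysql) along with system itself:

```sql
SELECT * FROM system.metadata.catalogs;
```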

Using Local File Connector

The local file connector allows querying data stored on the local file system of each worker. To configure the local file connector, create a catalog properties file under etc/catalog named, for example, localfile.properties with the following contents:


[presto@vbgeneric ~]$ mkdir /data/presto_data/localfiles

[presto@vbgeneric ~]$ vi $PRESTO_HOME/etc/catalog/localfile.properties
[presto@vbgeneric ~]$ cat $PRESTO_HOME/etc/catalog/localfile.properties
connector.name=localfile
presto-logs.http-request-log.location=/data/presto_data/var/log/http-request.log


The local file connector provides a single schema named logs and this connector can only process the http log format created by Presto itself.

[presto@vbgeneric ~]$ prestocli --server localhost:7070 --catalog localfile --schema logs
presto:logs> show tables;
      Table
------------------
 http_request_log
(1 row)

presto:logs> describe http_request_log;
      Column       |   Type    | Extra | Comment
-------------------+-----------+-------+---------
 server_address    | varchar   |       |
 timestamp         | timestamp |       |
 client_address    | varchar   |       |
 method            | varchar   |       |
 request_uri       | varchar   |       |
 user              | varchar   |       |
 agent             | varchar   |       |
 response_code     | bigint    |       |
 request_size      | bigint    |       |
 response_size     | bigint    |       |
 time_to_last_byte | bigint    |       |
 trace_token       | varchar   |       |
(12 rows)

presto:logs> select method,user,agent from http_request_log;
 method | user |                                                    agent
--------+------+-----------------------------------------------------------------------------------------------------
 GET    | NULL | ffffffff-ffff-ffff-ffff-ffffffffffff
 GET    | NULL | ffffffff-ffff-ffff-ffff-ffffffffffff
 POST   | NULL | NULL
 GET    | NULL | NULL
 POST   | NULL | NULL
 POST   | NULL | NULL
 POST   | NULL | NULL
 GET    | NULL | NULL
 POST   | NULL | NULL
 GET    | NULL | ffffffff-ffff-ffff-ffff-ffffffffffff
 GET    | NULL | ffffffff-ffff-ffff-ffff-ffffffffffff
 POST   | NULL | NULL
 GET    | NULL | NULL
 POST   | NULL | NULL
 POST   | NULL | NULL
 POST   | NULL | NULL
 GET    | NULL | NULL
 GET    | NULL | Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safar
...
...

Using Hive and MySQL as a Combined Data Source

Prepare data in mysql

Create a table in the MySQL database, e.g. iub.dept
mysql> CREATE  TABLE iub.dept( deptno INT,  dname  VARCHAR(10),  loc    varchar(10));

Load data into the MySQL table

insert into iub.dept values (10,'ACCOUNTING','NEW YORK');
insert into iub.dept values (20,'RESEARCH','DALLAS');
insert into iub.dept values (30,'SALES','CHICAGO');
insert into iub.dept values (40,'OPERATIONS','BOSTON');

mysql> select * from iub.dept;
+--------+------------+----------+
| deptno | dname      | loc      |
+--------+------------+----------+
|     10 | ACCOUNTING | NEW YORK |
|     20 | RESEARCH   | DALLAS   |
|     30 | SALES      | CHICAGO  |
|     40 | OPERATIONS | BOSTON   |
+--------+------------+----------+
4 rows in set (0.00 sec)

Prepare data in hive

Create the emp table in the Hive database, e.g. scott.emp

create table scott.emp(
  empno    STRING,
  ename    STRING,
  job      STRING,
  mgr      STRING,
  hiredate STRING,
  sal      STRING,
  comm     STRING,
  deptno   STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

Load data into the Hive table

emp.csv
7369,SMITH,CLERK,7902,17-DEC-80,800,,20
7499,ALLEN,SALESMAN,7698,20-FEB-81,1600,300,30
7521,WARD,SALESMAN,7698,22-FEB-81,1250,500,30
7566,JONES,MANAGER,7839,02-APR-81,2975,,20
7654,MARTIN,SALESMAN,7698,28-SEP-81,1250,1400,30
7698,BLAKE,MANAGER,7839,01-MAY-81,2850,,30
7782,CLARK,MANAGER,7839,09-JUN-81,2450,,10
7788,SCOTT,ANALYST,7566,19-APR-87,3000,,20
7839,KING,PRESIDENT,,17-NOV-81,5000,,10
7844,TURNER,SALESMAN,7698,08-SEP-81,1500,0,30
7876,ADAMS,CLERK,7788,23-MAY-87,1100,,20
7900,JAMES,CLERK,7698,03-DEC-81,950,,30
7902,FORD,ANALYST,7566,03-DEC-81,3000,,20
7934,MILLER,CLERK,7782,23-JAN-82,1300,,10

LOAD DATA LOCAL INPATH '/tmp/mydata/emp.csv' OVERWRITE INTO TABLE scott.emp;
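
After loading, you can verify the row count from the hive> prompt; emp.csv above contains 14 rows, so the count should come back as 14:

```sql
select count(*) from scott.emp;
```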

Process Data
Now that we have data in both databases, we are ready to process it through the hive and mysql catalogs defined earlier.
With Presto, to specify a table you wish to process, use the catalog.database.table naming convention, e.g. mysql.iub.dept or hive.scott.emp.

Connect with presto CLI

[presto@vbgeneric ~]$ prestocli --server localhost:7070
presto>

presto> SHOW TABLES FROM mysql.iub;
 Table
--------
 dept
 friend
 frnd2
(3 rows)

presto> SHOW TABLES FROM hive.scott;
 Table
-------
 emp
(1 row)

Run the join query now 

presto> SELECT d.dname,e.ename from mysql.iub.dept d JOIN hive.scott.emp e ON d.deptno = e.deptno;
Query 20170616_143710_00007_mgvsh failed: line 1:80: '=' cannot be applied to integer, varchar
SELECT d.dname,e.ename from mysql.iub.dept d JOIN hive.scott.emp e ON d.deptno = e.deptno

The query failed because in Hive the emp table's deptno column is defined as STRING, while in MySQL deptno is defined as INT. So we cast in Presto and run the query again.

presto> SELECT d.dname,e.ename from mysql.iub.dept d JOIN hive.scott.emp e ON d.deptno = cast(e.deptno as integer)
     -> ;
   dname    | ename
------------+--------
 ACCOUNTING | MILLER
 ACCOUNTING | KING
 ACCOUNTING | CLARK
 RESEARCH   | FORD
 RESEARCH   | ADAMS
 RESEARCH   | SCOTT
 RESEARCH   | JONES
 RESEARCH   | SMITH
 SALES      | JAMES
 SALES      | TURNER
 SALES      | BLAKE
 SALES      | MARTIN
 SALES      | WARD
 SALES      | ALLEN
(14 rows)

Query 20170616_144316_00008_mgvsh, FINISHED, 1 node
Splits: 66 total, 66 done (100.00%)
0:05 [18 rows, 617B] [3 rows/s, 132B/s]

Run another query with an aggregate function

select dname,count(*)  number_of_employees
from hive.scott.emp e join mysql.iub.dept d
on cast(e.deptno as integer) = d.deptno
group by dname
order by dname

presto> select dname,count(*)  number_of_employees
     -> from hive.scott.emp e join mysql.iub.dept d
     -> on cast(e.deptno as integer) = d.deptno
     -> group by dname
     -> order by dname;
   dname    | number_of_employees
------------+---------------------
 ACCOUNTING |                   3
 RESEARCH   |                   5
 SALES      |                   6
(3 rows)

Query 20170616_145245_00014_mgvsh, FINISHED, 1 node
Splits: 99 total, 99 done (100.00%)
0:05 [18 rows, 617B] [3 rows/s, 133B/s]


Using PostgreSQL Connector

The PostgreSQL connector allows querying and creating tables in an external PostgreSQL database. This can be used to join data between different systems, such as PostgreSQL and Hive, or between two different PostgreSQL instances.

Check PostgreSQL

I have an Ambari setup that includes PostgreSQL; first check that PostgreSQL is running fine and play with some commands.
1- Connect as the PostgreSQL owner and open a session using the psql utility
[root@te1-hdp-rp-en01 ~]# su - postgres

-bash-4.2$ psql

postgres=# ALTER USER postgres WITH PASSWORD 'postgres';

2- List the databases

postgres=# \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 ambari    | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres         +
           |          |          |             |             | postgres=CTc/postgres+
           |          |          |             |             | ambari=CTc/postgres
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)

3- Quit from psql (Ctrl+D also works)

postgres=# \q

4- Check the server version

postgres=# SELECT version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.13 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
(1 row)

5- List the existing users in the DB

postgres=# \du
                             List of roles
 Role name |                   Attributes                   | Member of
-----------+------------------------------------------------+-----------
 ambari    |                                                | {}
 postgres  | Superuser, Create role, Create DB, Replication | {}


postgres=# SELECT usename FROM pg_user;

 usename
----------
 postgres
 ambari

6- Connect as a specific user; the ambari user's password is bigdata

-bash-4.2$ psql -U ambari


7- If required, download pgAdmin III on the EN01 node to connect to the ambari database used by the Ambari UI.

8- Few more tests

ambari=> CREATE SCHEMA test;
ambari=> CREATE USER xxx PASSWORD 'yyy'
postgres=# CREATE USER xxx PASSWORD 'yyy';
ambari=> GRANT ALL ON SCHEMA test TO xxx;
ambari=> GRANT ALL ON ALL TABLES IN SCHEMA test TO xxx;
ambari=> CREATE TABLE test.test (coltest varchar(20));
ambari=> insert into test.test (coltest) values ('It works!');

ambari=> SELECT * from test.test;

Configuration


1- Create a catalog properties file (postgresql.properties) on the Presto coordinator (EN01) at /root/.prestoadmin/catalog

connector.name=postgresql
connection-url=jdbc:postgresql://en01:5432/ambari
connection-user=ambari
connection-password=bigdata

2- Check the status, add the new catalog, and restart the server so the configuration above takes effect

[root@te1-hdp-rp-en01 ~]# su - presto

[presto@te1-hdp-rp-en01 ~]$  sudo /usr/hadoopsw/prestoadmin/presto-admin server status -p presto

[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin topology show

[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin catalog add postgresql -p presto

[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin server restart -p presto

3- Connect to a particular schema of a particular catalog

[presto@te1-hdp-rp-en01 ~]$ /usr/hadoopsw/prestoadmin/prestocli --server te1-hdp-rp-en01:6060 --catalog postgresql --schema test

presto:test> select count(*) from postgresql.test.test;

4- Connect to the Presto server and query any catalog

[presto@te1-hdp-rp-en01 ~]$ /usr/hadoopsw/prestoadmin/prestocli --server te1-hdp-rp-en01:6060

presto> select count(*) from postgresql.test.test;

-- Query 2 catalogs at the same time

presto> select count(*) from postgresql.test.test,hive.flume.presto_vw_syslogs;
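
Note that listing two tables in the FROM clause is an implicit cross join, so the count returned is the product of the two tables' row counts. The same query can be written with an explicit CROSS JOIN to make the intent clearer:

```sql
select count(*) from postgresql.test.test cross join hive.flume.presto_vw_syslogs;
```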

