

Note: All posts are based on a practical approach, avoiding lengthy theory. All have been tested on development servers. Please don’t try any post on production servers until you are sure.

Friday, June 23, 2017

Working with PrestoDB Connectors



Prerequisite:
Complete my previous post Installing/Configuring PrestoDB


Presto lets you connect to other databases through connectors, so you can run queries and joins across several sources; each connector provides the metadata and data for queries. In this post we will work with some of these connectors. The coordinator (the master daemon) uses connectors to get the metadata (such as table schemas) needed to build a query plan, while the workers use connectors to fetch the actual data they process.









Using mysql Connector

The MySQL connector allows querying and creating tables in an external MySQL database. This can be used to join data between different systems, such as MySQL and Hive, or between two different MySQL instances. Remember that the MySQL connector does not support the following statements:

CREATE, INSERT, UPDATE, DELETE, DROP

Installing mysql (if not already installed)
1- Download MySQL 5.7 repo file :

[root@vbgeneric bin]# wget https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
--2017-06-15 22:20:40--  https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
Resolving dev.mysql.com (dev.mysql.com)... 137.254.60.11
Connecting to dev.mysql.com (dev.mysql.com)|137.254.60.11|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://repo.mysql.com//mysql57-community-release-el7-8.noarch.rpm [following]
--2017-06-15 22:20:41--  https://repo.mysql.com//mysql57-community-release-el7-8.noarch.rpm
Resolving repo.mysql.com (repo.mysql.com)... 104.117.138.198
Connecting to repo.mysql.com (repo.mysql.com)|104.117.138.198|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9116 (8.9K) [application/x-redhat-package-manager]
Saving to: ‘mysql57-community-release-el7-8.noarch.rpm’

100%[===========================================================================>] 9,116       --.-K/s   in 0s

2017-06-15 22:20:42 (22.7 MB/s) - ‘mysql57-community-release-el7-8.noarch.rpm’ saved [9116/9116]

2- Install MySQL 5.7 repo file :
[root@vbgeneric bin]# rpm -ivh mysql57-community-release-el7-8.noarch.rpm
warning: mysql57-community-release-el7-8.noarch.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql57-community-release-el7-8  ################################# [100%]

3- Install MySQL 5.7 database server :

[root@vbgeneric bin]# yum install mysql-server -y
Loaded plugins: langpacks, ulninfo
mysql-connectors-community                                                                    | 2.5 kB  00:00:00
mysql-tools-community                                                                         | 2.5 kB  00:00:00
mysql57-community                                                                             | 2.5 kB  00:00:00
(1/3): mysql-connectors-community/x86_64/primary_db                                           |  14 kB  00:00:00
(2/3): mysql-tools-community/x86_64/primary_db                                                |  33 kB  00:00:00
(3/3): mysql57-community/x86_64/primary_db                                                    | 106 kB  00:00:00
Resolving Dependencies
--> Running transaction check
---> Package mysql-community-server.x86_64 0:5.7.18-1.el7 will be installed
--> Processing Dependency: mysql-community-common(x86-64) = 5.7.18-1.el7 for package: mysql-community-server-5.7.18-1.el7.x86_64

....
....
Verifying  : mysql-community-common-5.6.23-3.el7.x86_64                                                        7/7

Installed:
  mysql-community-server.x86_64 0:5.7.18-1.el7

Dependency Installed:
  mysql-community-client.x86_64 0:5.7.18-1.el7           mysql-community-libs-compat.x86_64 0:5.7.18-1.el7

Dependency Updated:
  mysql-community-common.x86_64 0:5.7.18-1.el7               mysql-community-libs.x86_64 0:5.7.18-1.el7

Complete!

4- Start MySQL server
[root@vbgeneric bin]# systemctl start mysqld


5- Connecting to the MySQL Server with the mysql Client

Once your MySQL server is up and running, you can connect to it as the superuser root with the mysql client.

[root@vbgeneric bin]# mysql -u root -p
Enter password:
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)

For installations using the MySQL Yum repository, MySQL SUSE repository, or RPM packages directly downloaded from Oracle, the generated root password is in the error log. View it with, for example, the following command:

[root@vbgeneric bin]# grep 'temporary password' /var/log/mysqld.log

2017-06-16T02:31:25.052590Z 1 [Note] A temporary password is generated for root@localhost: -Ekw6uk3R;Yp

Now connect using the temporary password shown above

[root@vbgeneric bin]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.7.18

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>


Reset your root password with the following statement; until you do, you will not be able to exercise any superuser privileges, even though you are logged in as root.

mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'yourPassword';

Query OK, 0 rows affected (0.00 sec)

6- Create Database and objects in MySQL Server


mysql> create database iub;
mysql> use iub;
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| iub                |
| mysql              |
| performance_schema |
| sys                |
+--------------------+

mysql> create table friend(frnd_id int not null, frnd_name varchar(50),category varchar(100));
mysql> insert into friend values(1,'Zeeshan','SQL DBA') ;
mysql> insert into friend values(2,'Saeed','Java') ;
mysql> insert into friend values(3,'Ali','Oracle DBA');
mysql> insert into friend values(4,'Tanveer','Telend');
mysql> select * from friend;
+---------+-----------+------------+
| frnd_id | frnd_name | category   |
+---------+-----------+------------+
|       1 | Zeeshan   | SQL DBA    |
|       2 | Saeed     | Java       |
|       3 | Ali       | Oracle DBA |
|       4 | Tanveer   | Telend     |
+---------+-----------+------------+

Configure mysql Connector

To configure the MySQL connector, create a catalog properties file in etc/catalog named, for example, mysql.properties, to mount the MySQL connector as the mysql catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

[presto@vbgeneric bin]$ vi $PRESTO_HOME/etc/catalog/mysql.properties
[presto@vbgeneric bin]$ cat $PRESTO_HOME/etc/catalog/mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://localhost:3306
connection-user=root
connection-password=P@ssw0rd
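
To access a second MySQL server, add another catalog properties file under etc/catalog; the file name (minus the .properties extension) becomes the catalog name. A sketch, assuming a hypothetical second server on host db2 (adjust host, user, and password for your setup), saved as etc/catalog/mysql2.properties:

```properties
connector.name=mysql
connection-url=jdbc:mysql://db2:3306
connection-user=root
connection-password=P@ssw0rd
```

Restart the Presto server so the new catalog is picked up.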

Connect Presto CLI
Let’s connect the MySQL storage plugin to the Presto server from the command line. Here "iub" refers to the schema in the MySQL server.

[presto@vbgeneric bin]$ prestocli --server localhost:7070 --catalog mysql --schema iub
presto:iub>

List out all the schemas in mysql

presto:iub> show schemas from mysql;
       Schema
--------------------
 information_schema
 iub
 performance_schema
 sys
(4 rows)

Query 20170616_041856_00003_prm89, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:02 [4 rows, 62B] [1 rows/s, 28B/s]


List Tables from Schema
presto:iub> show tables from mysql.iub;
Table
--------
friend
(1 row)


Query 20170616_042135_00004_prm89, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:02 [1 rows, 19B] [0 rows/s, 8B/s]

Describe Table
presto:iub> describe mysql.iub.friend;
  Column   |     Type     | Extra | Comment
-----------+--------------+-------+---------
 frnd_id   | integer      |       |
 frnd_name | varchar(50)  |       |
 category  | varchar(100) |       |
(3 rows)

Query 20170616_042312_00006_prm89, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:03 [3 rows, 198B] [1 rows/s, 74B/s]

Show Columns from Table (the output is the same as DESCRIBE above)
presto:iub> show columns from mysql.iub.friend;

Access Table Records
presto:iub> select * from mysql.iub.friend;
 frnd_id | frnd_name |  category
---------+-----------+------------
       1 | Zeeshan   | SQL DBA
       2 | Saeed     | Java
       3 | Ali       | Oracle DBA
       4 | Tanveer   | Telend
(4 rows)

Query 20170616_042606_00008_prm89, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:03 [4 rows, 0B] [1 rows/s, 0B/s]

Create Table Using AS Command
The MySQL connector doesn’t support a plain CREATE TABLE query, but you can create a table using CREATE TABLE ... AS.
presto:iub> create table mysql.iub.frnd2 as select * from mysql.iub.friend;
CREATE TABLE: 4 rows

Query 20170616_042826_00009_prm89, FINISHED, 1 node
Splits: 35 total, 35 done (100.00%)
0:04 [4 rows, 0B] [0 rows/s, 0B/s]
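
As a quick sanity check, query the newly created table; since CREATE TABLE reported 4 rows, the count should match the source table:

```sql
select count(*) from mysql.iub.frnd2;
```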

presto:iub> quit;

Using JMX Connector


The JMX connector provides the ability to query JMX information from all nodes in a Presto cluster. This is very useful for monitoring or debugging. Java Management Extensions (JMX) provides information about the Java Virtual Machine and all of the software running inside it. Presto itself is heavily instrumented via JMX.


This connector can also be configured so that chosen JMX information will be periodically dumped and stored in memory for later access.
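
For example, the jmx.properties catalog file can be extended with dump settings. The sketch below (property names as in the Presto documentation) samples two MBeans every 10 seconds and keeps up to 86400 entries, queryable through the jmx.history schema:

```properties
connector.name=jmx
jmx.dump-tables=java.lang:type=Runtime,com.facebook.presto.execution.scheduler:name=NodeScheduler
jmx.dump-period=10s
jmx.max-entries=86400
```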

We have already created the “jmx.properties” file under the “etc/catalog” directory. Now connect the Presto CLI to use the JMX plugin.

[presto@vbgeneric bin]$ prestocli --server localhost:7070 --catalog jmx --schema jmx

List out all the schemas in jmx
presto:jmx> show schemas from jmx;
       Schema
--------------------
 current
 history
 information_schema
(3 rows)

presto:jmx> show tables from jmx.current;
                                                                      Table
----------------------------------------------------------------------------------------------------------
 com.facebook.presto.execution.executor:name=taskexecutor
 com.facebook.presto.execution.scheduler:name=nodescheduler
 com.facebook.presto.execution.scheduler:name=splitschedulerstats
 com.facebook.presto.execution:name=queryexecution
 com.facebook.presto.execution:name=querymanager
....

presto:jmx> select * from jmx.current."com.facebook.presto.execution.executor:name=taskexecutor";
                 node                 | blockedquantawalltime.alltime.count | blockedquantawalltime.alltim
--------------------------------------+-------------------------------------+-----------------------------
 ffffffff-ffff-ffff-ffff-ffffffffffff |                               403.0 |                        24413
(1 row)


Using HIVE Connector

The Hive connector allows querying data stored in a Hive data warehouse. The Hive metastore service (default port 9083) stores the metadata for Hive tables and partitions in a relational database, and provides clients (including Hive) access to this information via the metastore service API. Presto uses the Hive metastore service to get the details of Hive tables; it does not use HiveQL or any part of Hive’s execution environment.


The following file formats are supported by the Hive connector:
ORC, Parquet, Avro, RCFile, SequenceFile, JSON, Text


[hdpsysuser@vbgeneric ~]$ hive --service metastore
which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/hadoopsw/.local/bin:/usr/hadoopsw/bin:/usr/java/default/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/apache-hive-2.1.1-bin/bin:/usr/hadoopsw/db-derby-10.13.1.1-bin/bin:/usr/hadoopsw/apache-flume-1.7.0-bin/bin)
Starting Hive Metastore Server

Configuring Hive Connector
[presto@vbgeneric bin]$ vi $PRESTO_HOME/etc/catalog/hive.properties
[presto@vbgeneric bin]$ cat $PRESTO_HOME/etc/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
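
If your Hadoop cluster uses a non-default configuration (for example HA namenodes or federation), the connector can also be pointed at the Hadoop config files; a sketch, assuming the standard config location on this node:

```properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
```

Only specify hive.config.resources when really necessary; otherwise the defaults work fine.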

Check hive tables from hive prompt
I've already created a Hive database and tables, which we will query from Presto; but first, check them from Hive itself.

hive> use flume;
hive> show tables;
OK
raw_tweets
tweets_data2
twitterdata
vw_parsed_tweets
Time taken: 0.272 seconds, Fetched: 4 row(s)

hive> select * from raw_tweets limit 1;
OK
{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:39 +0000 2017","in_reply_to_user_id_str":null,"source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone<\/a>","retweeted_status":{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"coordinates":null,"created_at":"Sun Jun 11 10:16:58 +0000 2017","truncated":false,"in_reply_to_user_id_str":null,"source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone<\/a>","retweet_count":2,"retweeted":false,"geo":null,"filter_level":"low","in_reply_to_screen_name":null,"is_quote_status":false,"entities":{"urls":[],"hashtags":[{"indices":[74,95],"text":"شعب_قطر_في_قلب_سلمان"},{"indices":[96,117],"text":"_قطع_العلاقات_مع_قطر"}],"user_mentions":[],"symbols":[]},"id_str":"873846512484208640","in_reply_to_user_id":null,"favorite_count":5,"id":873846512484208640,"text":"UAE : Trouble maker  \nSaudi : Support UAE but not sure \nBahrain : Retweet #شعب_قطر_في_قلب_سلمان #_قطع_العلاقات_مع_قطر","place":null,"contributors":null,"lang":"en","user":{"utc_offset":10800,"friends_count":306,"profile_image_url_https":"https://pbs.twimg.com/profile_images/615523415617712128/DMuJuuUu_normal.jpg","listed_count":0,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","default_profile_image":false,"favourites_count":483,"description":"ربِ بي منك فضلٌ ونعمه عطاً غير مجذوذ ولا منقوص","created_at":"Sun Jul 29 17:41:11 +0000 
2012","is_translator":false,"profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","protected":false,"screen_name":"im_aisha92","id_str":"724360172","profile_link_color":"1DA1F2","id":724360172,"geo_enabled":true,"profile_background_color":"C0DEED","lang":"ar","profile_sidebar_border_color":"C0DEED","profile_text_color":"333333","verified":false,"profile_image_url":"http://pbs.twimg.com/profile_images/615523415617712128/DMuJuuUu_normal.jpg","time_zone":"Kuwait","url":null,"contributors_enabled":false,"profile_background_tile":false,"profile_banner_url":"https://pbs.twimg.com/profile_banners/724360172/1446770569","statuses_count":2895,"follow_request_sent":null,"followers_count":187,"profile_use_background_image":true,"default_profile":true,"following":null,"name":"Alsulaiti Aisha","location":"Cardiff-UK","profile_sidebar_fill_color":"DDEEF6","notifications":null},"favorited":false},"retweet_count":0,"retweeted":false,"geo":null,"filter_level":"low","in_reply_to_screen_name":null,"is_quote_status":false,"id_str":"874026367360409600","in_reply_to_user_id":null,"favorite_count":0,"id":874026367360409600,"text":"RT @im_aisha92: UAE : Trouble maker  \nSaudi : Support UAE but not sure \nBahrain : Retweet #شعب_قطر_في_قلب_سلمان #_قطع_العلاقات_مع_قطر","place":null,"lang":"en","favorited":false,"coordinates":null,"truncated":false,"timestamp_ms":"1497219099260","entities":{"urls":[],"hashtags":[{"indices":[90,111],"text":"شعب_قطر_في_قلب_سلمان"},{"indices":[112,133],"text":"_قطع_العلاقات_مع_قطر"}],"user_mentions":[{"indices":[3,14],"screen_name":"im_aisha92","id_str":"724360172","name":"Alsulaiti Aisha","id":724360172}],"symbols":[]},"contributors":null,"user":{"utc_offset":null,"friends_count":8,"profile_image_url_https":"https://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png","listed_count":0,"profile_background_image_url":"","default_profile_image":false,"favourites_count":0,"description":null,"created_at":"Wed Jun 
07 05:44:14 +0000 2017","is_translator":false,"profile_background_image_url_https":"","protected":false,"screen_name":"Reem01020398","id_str":"872328324122644482","profile_link_color":"1DA1F2","id":872328324122644482,"geo_enabled":false,"profile_background_color":"F5F8FA","lang":"en","profile_sidebar_border_color":"C0DEED","profile_text_color":"333333","verified":false,"profile_image_url":"http://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png","time_zone":null,"url":null,"contributors_enabled":false,"profile_background_tile":false,"statuses_count":65,"follow_request_sent":null,"followers_count":1,"profile_use_background_image":true,"default_profile":true,"following":null,"name":"Reem","location":null,"profile_sidebar_fill_color":"DDEEF6","notifications":null}}
Time taken: 11.043 seconds, Fetched: 1 row(s)
hive>


Start Presto CLI

Start the Presto CLI and connect to the Hive storage plugin using the following command.
[presto@vbgeneric bin]$ prestocli --server localhost:7070 --catalog hive --schema flume

presto:flume> show schemas from hive;
       Schema
--------------------
 default
 flume
 information_schema
(3 rows)

presto:flume> show tables from hive.flume;
      Table
------------------
 raw_tweets
 tweets_data2
 twitterdata
 vw_parsed_tweets
(4 rows)

presto:flume> select * from hive.flume.raw_tweets;

Query 20170616_051901_00005_aceay, RUNNING, 1 node, 16 splits
0:19 [    0 rows,     0B] [    0 rows/s,     0B/s] [                   <=>                    ]

     STAGES   ROWS  ROWS/s  BYTES  BYTES/s  QUEUED    RUN   DONE
0.........R      0       0     0B       0B       0      0      0
  1.......S      0       0     0B       0B       0      0      0


----------------------------------------------------------------------------------------------------------

 {"quoted_status":{"extended_tweet":{"extended_entities":{"media":[{"display_url":"pic.twitter.com/iG9xY6R

 {"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:50 +0000 20

 {"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:50 +0000 20

 {"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:50 +0000 20

 {"quoted_status":{"extended_tweet":{"entities":{"urls":[{"display_url":"twitter.com/ajenglish/stat\u2026"

 {"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:51 +0000 20

...

..


Test Performance
Run the query below in both Hive and Presto and observe the response time. I'm running an Oracle VM with a one-node cluster for this test; both Hive and Presto are running on the same VM.

presto:flume>  select count(*) from raw_tweets;
 _col0
-------
   428
(1 row)

Query 20170616_101712_00002_g7yqb, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:20 [428 rows, 2.4MB] [21 rows/s, 125KB/s]
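
For comparison, run the same aggregate from the hive> prompt. On a single-node VM like this, Hive's MapReduce-based execution typically takes noticeably longer than Presto's in-memory execution, though exact timings depend on your environment:

```sql
select count(*) from flume.raw_tweets;
```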





Using System Connector

The System connector provides information and metrics about the currently running Presto cluster. It makes this available via normal SQL queries. The System connector doesn’t need to be configured: it is automatically available via a catalog named system.

List the available system schemas:

presto:flume> SHOW SCHEMAS FROM system;

       Schema

--------------------

 information_schema

 jdbc

 metadata

 runtime

(4 rows)


List the tables in one of the schemas:

presto:flume> SHOW TABLES FROM system.runtime;
    Table
--------------
 nodes
 queries
 tasks
 transactions
(4 rows)

Query one of the tables:

presto:flume> SELECT * FROM system.runtime.nodes;

               node_id                |          http_uri          | node_version | coordinator | state

--------------------------------------+----------------------------+--------------+-------------+--------

 ffffffff-ffff-ffff-ffff-ffffffffffff | http://192.168.56.101:7070 | 0.179        | true        | active

(1 row)


Kill a running query:
presto:flume> CALL system.runtime.kill_query('20151207_215727_00146_tx3nr');
CALL
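
To find the id of a query you want to kill, query the runtime.queries table first (column names per the Presto documentation):

```sql
SELECT query_id, state, query FROM system.runtime.queries WHERE state = 'RUNNING';
```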


System Connector Tables

metadata.catalogs
The catalogs table contains the list of available catalogs.

metadata.schema_properties
The schema properties table contains the list of available properties that can be set when creating a new schema.

metadata.table_properties
The table properties table contains the list of available properties that can be set when creating a new table.

runtime.nodes
The nodes table contains the list of visible nodes in the Presto cluster along with their status.

runtime.queries
The queries table contains information about currently and recently running queries on the Presto cluster. From this table you can find out the original query text (SQL), the identity of the user who ran the query, and performance information about the query, including how long it was queued and analyzed.

runtime.tasks
The tasks table contains information about the tasks involved in a Presto query, including where they were executed and how many rows and bytes each task processed.

runtime.transactions
The transactions table contains the list of currently open transactions and related metadata. This includes information such as the create time, idle time, initialization parameters, and accessed catalogs.
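
For example, listing the catalogs mounted on this cluster should show the ones configured in this post (hive, jmx, localfile, mysql) along with system itself:

```sql
SELECT * FROM system.metadata.catalogs;
```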

Using Local File Connector

The local file connector allows querying data stored on the local file system of each worker. To configure the local file connector, create a catalog properties file under etc/catalog named, for example, localfile.properties with the following contents:


[presto@vbgeneric ~]$ mkdir /data/presto_data/localfiles

[presto@vbgeneric ~]$ vi $PRESTO_HOME/etc/catalog/localfile.properties
[presto@vbgeneric ~]$ cat $PRESTO_HOME/etc/catalog/localfile.properties
connector.name=localfile
presto-logs.http-request-log.location=/data/presto_data/var/log/http-request.log


The local file connector provides a single schema named logs and this connector can only process the http log format created by Presto itself.

[presto@vbgeneric ~]$ prestocli --server localhost:7070 --catalog localfile --schema logs
presto:logs> show tables;
      Table
------------------
 http_request_log
(1 row)

presto:logs> describe http_request_log;
      Column       |   Type    | Extra | Comment
-------------------+-----------+-------+---------
 server_address    | varchar   |       |
 timestamp         | timestamp |       |
 client_address    | varchar   |       |
 method            | varchar   |       |
 request_uri       | varchar   |       |
 user              | varchar   |       |
 agent             | varchar   |       |
 response_code     | bigint    |       |
 request_size      | bigint    |       |
 response_size     | bigint    |       |
 time_to_last_byte | bigint    |       |
 trace_token       | varchar   |       |
(12 rows)

presto:logs> select method,user,agent from http_request_log;
 method | user |                                                    agent
--------+------+-----------------------------------------------------------------------------------------------------
 GET    | NULL | ffffffff-ffff-ffff-ffff-ffffffffffff
 GET    | NULL | ffffffff-ffff-ffff-ffff-ffffffffffff
 POST   | NULL | NULL
 GET    | NULL | NULL
 POST   | NULL | NULL
 POST   | NULL | NULL
 POST   | NULL | NULL
 GET    | NULL | NULL
 POST   | NULL | NULL
 GET    | NULL | ffffffff-ffff-ffff-ffff-ffffffffffff
 GET    | NULL | ffffffff-ffff-ffff-ffff-ffffffffffff
 POST   | NULL | NULL
 GET    | NULL | NULL
 POST   | NULL | NULL
 POST   | NULL | NULL
 POST   | NULL | NULL
 GET    | NULL | NULL
 GET    | NULL | Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safar
...
...

Using Hive and MySQL as a Combined Data Source

Prepare data in mysql

Create a table in the MySQL database, e.g. iub.dept
mysql> CREATE  TABLE iub.dept( deptno INT,  dname  VARCHAR(10),  loc    varchar(10));

Load data into the MySQL table

insert into iub.dept values (10,'ACCOUNTING','NEW YORK');
insert into iub.dept values (20,'RESEARCH','DALLAS');
insert into iub.dept values (30,'SALES','CHICAGO');
insert into iub.dept values (40,'OPERATIONS','BOSTON');

mysql> select * from iub.dept;
+--------+------------+----------+
| deptno | dname      | loc      |
+--------+------------+----------+
|     10 | ACCOUNTING | NEW YORK |
|     20 | RESEARCH   | DALLAS   |
|     30 | SALES      | CHICAGO  |
|     40 | OPERATIONS | BOSTON   |
+--------+------------+----------+
4 rows in set (0.00 sec)

Prepare data in hive

Create the emp table in the Hive database, e.g. scott.emp

create table scott.emp(
  empno    STRING,
  ename    STRING,
  job      STRING,
  mgr      STRING,
  hiredate STRING,
  sal      STRING,
  comm     STRING,
  deptno   STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

Load data into the Hive table

emp.csv
7369,SMITH,CLERK,7902,17-DEC-80,800,,20
7499,ALLEN,SALESMAN,7698,20-FEB-81,1600,300,30
7521,WARD,SALESMAN,7698,22-FEB-81,1250,500,30
7566,JONES,MANAGER,7839,02-APR-81,2975,,20
7654,MARTIN,SALESMAN,7698,28-SEP-81,1250,1400,30
7698,BLAKE,MANAGER,7839,01-MAY-81,2850,,30
7782,CLARK,MANAGER,7839,09-JUN-81,2450,,10
7788,SCOTT,ANALYST,7566,19-APR-87,3000,,20
7839,KING,PRESIDENT,,17-NOV-81,5000,,10
7844,TURNER,SALESMAN,7698,08-SEP-81,1500,0,30
7876,ADAMS,CLERK,7788,23-MAY-87,1100,,20
7900,JAMES,CLERK,7698,03-DEC-81,950,,30
7902,FORD,ANALYST,7566,03-DEC-81,3000,,20
7934,MILLER,CLERK,7782,23-JAN-82,1300,,10

LOAD DATA LOCAL INPATH '/tmp/mydata/emp.csv' OVERWRITE INTO TABLE scott.emp;
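
After loading, you can verify the row count from the hive> prompt; emp.csv above contains 14 rows, so the count should come back as 14:

```sql
select count(*) from scott.emp;
```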

Process Data
Now that we have data in both databases, we are ready to process it through the hive and mysql catalogs defined earlier.
With Presto, to specify a table you wish to process, use the catalog.database.table naming convention, e.g. mysql.iub.dept or hive.scott.emp.

Connect with presto CLI

[presto@vbgeneric ~]$ prestocli --server localhost:7070
presto>

presto> SHOW TABLES FROM mysql.iub;
 Table
--------
 dept
 friend
 frnd2
(3 rows)

presto> SHOW TABLES FROM hive.scott;
 Table
-------
 emp
(1 row)

Run the join query now 

presto> SELECT d.dname,e.ename from mysql.iub.dept d JOIN hive.scott.emp e ON d.deptno = e.deptno;
Query 20170616_143710_00007_mgvsh failed: line 1:80: '=' cannot be applied to integer, varchar
SELECT d.dname,e.ename from mysql.iub.dept d JOIN hive.scott.emp e ON d.deptno = e.deptno

The query failed because in Hive the emp table's deptno column is defined as STRING, while in MySQL deptno is defined as INT. So we cast in Presto and run the query again.

presto> SELECT d.dname,e.ename from mysql.iub.dept d JOIN hive.scott.emp e ON d.deptno = cast(e.deptno as integer)
     -> ;
   dname    | ename
------------+--------
 ACCOUNTING | MILLER
 ACCOUNTING | KING
 ACCOUNTING | CLARK
 RESEARCH   | FORD
 RESEARCH   | ADAMS
 RESEARCH   | SCOTT
 RESEARCH   | JONES
 RESEARCH   | SMITH
 SALES      | JAMES
 SALES      | TURNER
 SALES      | BLAKE
 SALES      | MARTIN
 SALES      | WARD
 SALES      | ALLEN
(14 rows)

Query 20170616_144316_00008_mgvsh, FINISHED, 1 node
Splits: 66 total, 66 done (100.00%)
0:05 [18 rows, 617B] [3 rows/s, 132B/s]

Run another query with an aggregate function

select dname,count(*)  number_of_employees
from hive.scott.emp e join mysql.iub.dept d
on cast(e.deptno as integer) = d.deptno
group by dname
order by dname

presto> select dname,count(*)  number_of_employees
     -> from hive.scott.emp e join mysql.iub.dept d
     -> on cast(e.deptno as integer) = d.deptno
     -> group by dname
     -> order by dname;
   dname    | number_of_employees
------------+---------------------
 ACCOUNTING |                   3
 RESEARCH   |                   5
 SALES      |                   6
(3 rows)

Query 20170616_145245_00014_mgvsh, FINISHED, 1 node
Splits: 99 total, 99 done (100.00%)
0:05 [18 rows, 617B] [3 rows/s, 133B/s]


Using PostgreSQL Connector

The PostgreSQL connector allows querying and creating tables in an external PostgreSQL database. This can be used to join data between different systems, such as PostgreSQL and Hive, or between two different PostgreSQL instances.

Check PostgreSQL

I have an Ambari setup that includes PostgreSQL; first check that PostgreSQL is running fine and play with some commands.
1- Connect as the PostgreSQL owner and open a session using the psql utility
[root@te1-hdp-rp-en01 ~]# su - postgres

-bash-4.2$ psql

postgres=# ALTER USER postgres WITH PASSWORD 'postgres';

2- List the databases

postgres=# \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 ambari    | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres         +
           |          |          |             |             | postgres=CTc/postgres+
           |          |          |             |             | ambari=CTc/postgres
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)

3- Quit from psql (Ctrl+D also works)

postgres=# \q

4- Check the server version

postgres=# SELECT version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.13 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
(1 row)

5- List the existing users in the DB

postgres=# \du
                             List of roles
 Role name |                   Attributes                   | Member of
-----------+------------------------------------------------+-----------
 ambari    |                                                | {}
 postgres  | Superuser, Create role, Create DB, Replication | {}


postgres=# SELECT usename FROM pg_user;

 usename
----------
 postgres
 ambari

6- Connect as a specific user; the ambari user's password is bigdata

-bash-4.2$ psql -U ambari


7- If required, download pgAdmin III on the EN01 node to connect to the ambari database used by the Ambari UI.

8- Few more tests

ambari=> CREATE SCHEMA test;
ambari=> CREATE USER xxx PASSWORD 'yyy'
postgres=# CREATE USER xxx PASSWORD 'yyy';
ambari=> GRANT ALL ON SCHEMA test TO xxx;
ambari=> GRANT ALL ON ALL TABLES IN SCHEMA test TO xxx;
ambari=> CREATE TABLE test.test (coltest varchar(20));
ambari=> insert into test.test (coltest) values ('It works!');

ambari=> SELECT * from test.test;

Configuration


1- Create a catalog properties file (postgresql.properties) on the Presto coordinator (EN01) at /root/.prestoadmin/catalog

connector.name=postgresql
connection-url=jdbc:postgresql://en01:5432/ambari
connection-user=ambari
connection-password=bigdata

2- Check the status, add the new catalog, and restart the server so the configuration above takes effect

[root@te1-hdp-rp-en01 ~]# su - presto

[presto@te1-hdp-rp-en01 ~]$  sudo /usr/hadoopsw/prestoadmin/presto-admin server status -p presto

[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin topology show

[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin catalog add postgresql -p presto

[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin server restart -p presto

3- Connect to a particular schema of a particular catalog

[presto@te1-hdp-rp-en01 ~]$ /usr/hadoopsw/prestoadmin/prestocli --server te1-hdp-rp-en01:6060 --catalog postgresql --schema test

presto:test> select count(*) from postgresql.test.test;

4- Connect to the Presto server and query any catalog

[presto@te1-hdp-rp-en01 ~]$ /usr/hadoopsw/prestoadmin/prestocli --server te1-hdp-rp-en01:6060

presto> select count(*) from postgresql.test.test;

-- Query 2 catalogs at the same time

presto> select count(*) from postgresql.test.test,hive.flume.presto_vw_syslogs;
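
Note that listing two tables in the FROM clause is an implicit cross join, so the count returned is the product of the two tables' row counts. The same query can be written with an explicit CROSS JOIN to make the intent clearer:

```sql
select count(*) from postgresql.test.test cross join hive.flume.presto_vw_syslogs;
```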

