


Friday, June 23, 2017

Hive Streaming



Streaming offers an alternative way to transform data. During a streaming job, the Hadoop Streaming API opens an I/O pipe to an external process. Hive passes each input row to that process, which operates on the data it reads from standard input and writes its results to standard output, where the Streaming API job picks them up.
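
Hive serializes each row it sends to the external process as tab-separated fields terminated by a newline, and it parses the rows the process writes back the same way. Conceptually, every input row passes through something like the pipeline below (a sketch only; the sample row is taken from the emp output shown later in this post):

echo -e 'SMITH\t800.0' | /bin/cat
SMITH   800.0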


Identity Transformation
The most basic streaming job is an identity operation: the /bin/cat command simply echoes the data sent to it.

hive (scott)> SELECT TRANSFORM (ename, sal) USING '/bin/cat' AS (newEname, newSal) FROM scott.emp;
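
Any executable available on the worker nodes can take the place of /bin/cat, and the process is free to change the shape of the row. As an illustrative sketch (not from the original run), /bin/cut can drop the salary column, since cut splits on tabs by default and Hive feeds it tab-separated fields:

hive (scott)> SELECT TRANSFORM (ename, sal) USING '/bin/cut -f1' AS (newEname) FROM scott.emp;

Because the external process emits only a single field per row, only one column is declared in the AS clause.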

Changing Types
The return columns from TRANSFORM are typed as strings by default. An alternative syntax casts the results to different types; any value that cannot be cast to the declared type comes back as NULL.

hive (scott)> SELECT TRANSFORM (ename, sal) USING '/bin/cat' AS (newEname STRING, newSal DOUBLE) FROM scott.emp;
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.33 sec   HDFS Read: 5439 HDFS Write: 435 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 330 msec
OK
newEname        newSal
SMITH   800.0
ALLEN   1600.0
WARD    1250.0
JONES   2975.0
MARTIN  1250.0
BLAKE   2850.0
CLARK   2450.0
SCOTT   3000.0
KING    5000.0
TURNER  1500.0
ADAMS   1100.0
JAMES   950.0
FORD    3000.0
MILLER  1300.0
Time taken: 20.392 seconds, Fetched: 14 row(s)
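
The external command can also be a user-supplied script shipped to the cluster with ADD FILE, which places the file in every task's working directory so that USING can refer to it by its bare file name. The sketch below is hypothetical: the path /tmp/raise.sh, the 10% raise logic, and the availability of bc on the worker nodes are all assumptions for illustration. The script reads tab-separated ename/sal rows from standard input and writes the same two fields with the salary raised by 10%.

#!/bin/bash
# raise.sh (hypothetical): read tab-separated ename/sal rows from stdin,
# emit ename and sal increased by 10% (requires bc on the worker nodes)
while IFS=$'\t' read -r ename sal; do
  printf '%s\t%s\n' "$ename" "$(echo "$sal * 1.1" | bc)"
done

hive (scott)> ADD FILE /tmp/raise.sh;
hive (scott)> SELECT TRANSFORM (ename, sal) USING 'bash raise.sh' AS (newEname STRING, newSal DOUBLE) FROM scott.emp;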

