Please see my other blog for Oracle EBusiness Suite Posts - EBMentors

Search This Blog

Note: All the posts are based on practical approach avoiding lengthy theory. All have been tested on some development servers. Please don’t test any post on production servers until you are sure.

Monday, December 01, 2014

Exadata: Using Hybrid Columnar Compression

Scenario:

We have a table TRANSACTION_LOG with more than 10 million rows , estimated size of table is 640MB. We want to  examine the performance of Exadata Hybrid Columnar Compression and want compare predicted and actual compression ratios using an example dataset, so that we can use HCC for our logging tables. We also want to examine how bulk data loading and query operations are affected using the different compression modes.

Predicting Compression Ratio
Steps
1- Connect with your DB
C:\Users\ibukhary>sqlplus system/manager@iubdbt11
SQL> set serveroutput on
SQL> set timing on


2- Predict the expected compression ratio for the TRANSACTION_LOG table using all the different Exadata Hybrid Columnar Compression modes
Query Low
SQL> DECLARE
  2     b_cmp       NUMBER;
  3     b_ucmp      NUMBER;
  4     r_cmp       NUMBER;
  5     r_ucmp      NUMBER;
  6     cmp_ratio   NUMBER (6, 2);
  7     cmp_type    VARCHAR2 (1024);
  8  BEGIN
  9     dbms_compression.
 10      get_compression_ratio ('SYSTEM', -- Tablespace
 11                             'SYSTEM', -- Schema
 12                             'TRANSACTION_LOG', -- Table
 13                             NULL,
 14                             DBMS_COMPRESSION.COMP_FOR_QUERY_LOW,
 15                             b_cmp,
 16                             b_ucmp,
 17                             r_cmp,
 18                             r_ucmp,
 19                             cmp_ratio,
 20                             cmp_type);
 21     DBMS_OUTPUT.put_line ('Table: MYCUSTOMERS');
 22     DBMS_OUTPUT.put_line ('Compression Ratio: ' || cmp_ratio);
 23     DBMS_OUTPUT.put_line ('Compression Type: ' || cmp_type);
 24  END;
 25  /
Compression Advisor self-check validation successful. select count(*) on both
Uncompressed and EHCC Compressed format = 1000001 rows
Table: TRANSACTION_LOG
Compression Ratio: 7.8
Compression Type: "Compress For Query Low"

PL/SQL procedure successfully completed.


Query High
Elapsed: 00:00:16.25

SQL>

SQL> DECLARE
  2     b_cmp       NUMBER;
  3     b_ucmp      NUMBER;
  4     r_cmp       NUMBER;
  5     r_ucmp      NUMBER;
  6     cmp_ratio   NUMBER (6, 2);
  7     cmp_type    VARCHAR2 (1024);
  8  BEGIN
  9     dbms_compression.
 10      get_compression_ratio ('SYSTEM',
 11                             'SYSTEM',
 12                             'TRANSACTION_LOG',
 13                             NULL,
 14                             DBMS_COMPRESSION.COMP_FOR_QUERY_HIGH,
 15                             b_cmp,
 16                             b_ucmp,
 17                             r_cmp,
 18                             r_ucmp,
 19                             cmp_ratio,
 20                             cmp_type);
 21     DBMS_OUTPUT.put_line ('Table: TRANSACTION_LOG');
 22     DBMS_OUTPUT.put_line ('Compression Ratio: ' || cmp_ratio);
 23     DBMS_OUTPUT.put_line ('Compression Type: ' || cmp_type);
 24  END;
 25  /
Compression Advisor self-check validation successful. select count(*) on both
Uncompressed and EHCC Compressed format = 1000001 rows
Table: TRANSACTION_LOG
Compression Ratio: 135.8
Compression Type: "Compress For Query High"

PL/SQL procedure successfully completed.

Elapsed: 00:00:16.24
SQL>

Archive Low

SQL> DECLARE
  2     b_cmp       NUMBER;
  3     b_ucmp      NUMBER;
  4     r_cmp       NUMBER;
  5     r_ucmp      NUMBER;
  6     cmp_ratio   NUMBER (6, 2);
  7     cmp_type    VARCHAR2 (1024);
  8  BEGIN
  9     dbms_compression.
 10      get_compression_ratio ('SYSTEM',
 11                             'SYSTEM',
 12                             'TRANSACTION_LOG',
 13                             NULL,
 14                             DBMS_COMPRESSION.COMP_FOR_ARCHIVE_LOW,
 15                             b_cmp,
 16                             b_ucmp,
 17                             r_cmp,
 18                             r_ucmp,
 19                             cmp_ratio,
 20                             cmp_type);
 21     DBMS_OUTPUT.put_line ('Table: TRANSACTION_LOG');
 22     DBMS_OUTPUT.put_line ('Compression Ratio: ' || cmp_ratio);
 23     DBMS_OUTPUT.put_line ('Compression Type: ' || cmp_type);
 24  END;
 25  /
Compression Advisor self-check validation successful. select count(*) on both
Uncompressed and EHCC Compressed format = 1000001 rows
Table: TRANSACTION_LOG
Compression Ratio: 135.8
Compression Type: "Compress For Archive Low"

PL/SQL procedure successfully completed.

Elapsed: 00:00:16.12
SQL>

Archive High

SQL> DECLARE
  2     b_cmp       NUMBER;
  3     b_ucmp      NUMBER;
  4     r_cmp       NUMBER;
  5     r_ucmp      NUMBER;
  6     cmp_ratio   NUMBER (6, 2);
  7     cmp_type    VARCHAR2 (1024);
  8  BEGIN
  9     dbms_compression.
 10      get_compression_ratio ('SYSTEM',
 11                             'SYSTEM',
 12                             'TRANSACTION_LOG',
 13                             NULL,
 14                             DBMS_COMPRESSION.COMP_FOR_ARCHIVE_HIGH,
 15                             b_cmp,
 16                             b_ucmp,
 17                             r_cmp,
 18                             r_ucmp,
 19                             cmp_ratio,
 20                             cmp_type);
 21     DBMS_OUTPUT.put_line ('Table: TRANSACTION_LOG');
 22     DBMS_OUTPUT.put_line ('Compression Ratio: ' || cmp_ratio);
 23     DBMS_OUTPUT.put_line ('Compression Type: ' || cmp_type);
 24  END;
 25  /
Compression Advisor self-check validation successful. select count(*) on both
Uncompressed and EHCC Compressed format = 1000001 rows
Table: TRANSACTION_LOG
Compression Ratio: 135.8
Compression Type: "Compress For Archive High"

PL/SQL procedure successfully completed.

Elapsed: 00:00:24.87
SQL>


3- HCC achieves its highest levels of compression with data that is direct-path inserted. Execute the following ALTER SESSION commands to ensure the use of direct-path inserts.

SQL> alter session force parallel query;

Session altered.

Elapsed: 00:00:00.05
SQL> alter session force parallel ddl;

Session altered.

Elapsed: 00:00:00.59
SQL>
SQL> alter session force parallel dml;

Session altered.

Elapsed: 00:00:00.04
SQL>

4- Create compressed copies of the TRANSACTION_LOG table. Notice the relative
difference in the time taken to create each table by using the different compression modes.

SQL> CREATE TABLE TRANSACTION_LOG_query_low
  2  COMPRESS FOR QUERY LOW
  3  NOLOGGING
  4  PARALLEL 4
  5  AS
  6     SELECT * FROM TRANSACTION_LOG;

Table created.


Elapsed: 00:00:16.21


SQL> CREATE TABLE TRANSACTION_LOG_query_high
  2  COMPRESS FOR QUERY HIGH
  3  NOLOGGING
  4  PARALLEL 4
  5  AS
  6     SELECT * FROM TRANSACTION_LOG;

Table created.


Elapsed: 00:00:20.16


SQL> CREATE TABLE TRANSACTION_LOG_archive_low
  2  COMPRESS FOR ARCHIVE LOW
  3  NOLOGGING
  4  PARALLEL 4
  5  AS
  6     SELECT * FROM TRANSACTION_LOG
  7  ;

Table created.


Elapsed: 00:00:19.91


SQL> CREATE TABLE TRANSACTION_LOG_archive_high
  2  COMPRESS FOR ARCHIVE HIGH
  3  NOLOGGING
  4  PARALLEL 4
  5  AS
  6     SELECT * FROM TRANSACTION_LOG;

Table created.


Elapsed: 00:00:57.89


4-  Compare the size of the original uncompressed table with the newly created compressed copies. Calculate the compression ratios achieved using the formula:
Compression Ratio = Uncompressed Size / Compressed Size

SQL> col segment_name format a30
  SELECT segment_name, SUM (bytes) / 1024 / 1024 MB
    FROM dba_segments
   WHERE segment_name LIKE 'TRANSACTION_LOG%'
GROUP BY segment_name;

SEGMENT_NAME MB
TRANSACTION_LOG_QUERY_HIGH
5.015625
TRANSACTION_LOG
640.2890625
TRANSACTION_LOG_ARCHIVE_HIGH
5.015625
TRANSACTION_LOG_QUERY_LOW
84.4296875
TRANSACTION_LOG_ARCHIVE_LOW
5.015625


Direct path insert performance
Steps
1- Compare direct path insert performance for compressed and uncompressed tables. Perform the same transaction twice to ensure consistent results. First check the uncompressed TRANSACTION_LOG table. Pay attention to Elapsed time.

SQL> insert /*+APPEND */ into  TRANSACTION_LOG select * from TRANSACTION_LOG where rownum <=300000;

300000 rows created.

Elapsed: 00:00:11.50
SQL> commit;

Commit complete.

Elapsed: 00:00:00.06
SQL> insert /*+APPEND */ into  TRANSACTION_LOG select * from TRANSACTION_LOG where rownum <=300000;

300000 rows created.

Elapsed: 00:00:05.67
SQL> commit;

Commit complete.

Elapsed: 00:00:00.04

2-  Execute the same insert transactions against the COMPRESS FOR QUERY LOW copy of the table. Note the time taken to perform the second insert. 

SQL> insert /*+APPEND */ into  TRANSACTION_LOG_query_low select * from TRANSACTION_LOG where rownum <=300000;

300000 rows created.

Elapsed: 00:00:00.69
SQL> commit;

Commit complete.

Elapsed: 00:00:00.60
SQL> insert /*+APPEND */ into  TRANSACTION_LOG_query_low select * from TRANSACTION_LOG where rownum <=300000;

300000 rows created.

Elapsed: 00:00:00.63


You may observe that the time for this insert is better than the uncompressed insert in step 1. In this case, the cost of performing the compression is offset by the lower number of I/O operations that are required. This characteristic is one of the reasons why query compression is well suited to data warehouse environments where large data loads exist.


3- Execute the same insert transactions against the COMPRESS FOR QUERY HIGH copy of the table. Note the time taken to perform the second insert and compare it with the previous results.

SQL> insert /*+APPEND */ into  TRANSACTION_LOG_query_high select * from TRANSACTION_LOG where rownum <=300000;

300000 rows created.

Elapsed: 00:00:00.76
SQL> commit;

Commit complete.

Elapsed: 00:00:00.05
SQL> insert /*+APPEND */ into  TRANSACTION_LOG_query_high select * from TRANSACTION_LOG where rownum <=300000;

300000 rows created.

Elapsed: 00:00:00.78


4- Execute the same insert transactions against the COMPRESS FOR ARCHIVE LOW copy of the table. Note the time taken to perform the second insert. You should observe that the
load times are steadily increasing as more aggressive compression modes are used.

SQL> insert /*+APPEND */ into  TRANSACTION_LOG_archive_low select * from TRANSACTION_LOG where rownum <=300000;

300000 rows created.

Elapsed: 00:00:00.73
SQL> commit;

Commit complete.

Elapsed: 00:00:00.04
SQL> insert /*+APPEND */ into  TRANSACTION_LOG_archive_low select * from TRANSACTION_LOG where rownum <=300000;

300000 rows created.

Elapsed: 00:00:01.34

5- Execute the insert transaction against the COMPRESS FOR ARCHIVE HIGH copy of the
table. Note the time taken to perform the second insert. This time you should observe a
more substantial cost for the data compression. This is because COMPRESS FOR ARCHIVE HIGH uses a more costly compression algorithm to achieve higher levels of compression. This extra cost is generally acceptable in archiving situations because the data does not change (or changes very little) after it is loaded.

SQL> insert /*+APPEND */ into  TRANSACTION_LOG_archive_high select * from TRANSACTION_LOG where rownum <=300000;

300000 rows created.

Elapsed: 00:00:01.76
SQL> commit;

Commit complete.

Elapsed: 00:00:00.05
SQL> insert /*+APPEND */ into  TRANSACTION_LOG_archive_high select * from TRANSACTION_LOG where rownum <=300000;

300000 rows created.

Elapsed: 00:00:01.74

Query Performance

Compare query performance for compressed and uncompressed tables. 

1- Reconnect to your database user. This clears the session-level statistics, which will be used later to compare query performance. Execute the following test query against the uncompressed table. Note the time taken to execute the query.

SQL> select avg(nin) from TRANSACTION_LOG;

  AVG(NIN)
----------
320442.292

Elapsed: 00:00:00.74

Examine the I/O statistics for the query you just ran
SQL> SELECT a.name, b.VALUE / 1024 / 1024 MB
  2    FROM v$sysstat a, v$mystat b
  3   WHERE a.statistic# = b.statistic#
  4         AND (a.name IN
  5                 ('physical read total bytes',
  6                  'physical write total bytes',
  7                  'cell IO uncompressed bytes')
  8              OR a.name LIKE 'cell phy%');

NAME                                                                     MB
---------------------------------------------------------------- ----------
physical read total bytes                                         536.28125
physical write total bytes                                                0
cell physical IO interconnect bytes                              123.009117
cell physical IO bytes saved during optimized file creation               0
cell physical IO bytes saved during optimized RMAN file restore           0
cell physical IO bytes eligible for predicate offload            532.695313
cell physical IO bytes saved by storage index                             0
cell physical IO bytes sent directly to DB node to balance CPU            0
cell physical IO interconnect bytes returned by smart scan        119.42318
cell IO uncompressed bytes                                       533.554688

10 rows selected.
Elapsed: 00:00:00.06

2- Execute the test query against the COMPRESS FOR QUERY LOW copy of the
table. Compare the time taken to execute the query with the query performance observed in
step 1.

SQL> select avg(nin) from TRANSACTION_LOG_query_low;

  AVG(NIN)
----------
320581.419

Elapsed: 00:00:03.92
SQL>     SELECT a.name, b.VALUE / 1024 / 1024 MB
  2        FROM v$sysstat a, v$mystat b
  3       WHERE a.statistic# = b.statistic#
  4             AND (a.name IN
  5                     ('physical read total bytes',
  6                      'physical write total bytes',
  7                      'cell IO uncompressed bytes')
  8                  OR a.name LIKE 'cell phy%');

NAME                                                                     MB
---------------------------------------------------------------- ----------
physical read total bytes                                        67.0234375
physical write total bytes                                                0
cell physical IO interconnect bytes                              67.0234375
cell physical IO bytes saved during optimized file creation               0
cell physical IO bytes saved during optimized RMAN file restore           0
cell physical IO bytes eligible for predicate offload                     0
cell physical IO bytes saved by storage index                             0
cell physical IO bytes sent directly to DB node to balance CPU            0
cell physical IO interconnect bytes returned by smart scan                0
cell IO uncompressed bytes                                                0

10 rows selected.

Elapsed: 00:00:00.08


3- Execute the test query against the COMPRESS FOR QUERY HIGH copy of the
table. Compare the time taken to execute the query with the query performance observed
previously.

SQL> select avg(nin) from TRANSACTION_LOG_query_high;

  AVG(NIN)
----------
320568.281

Elapsed: 00:00:00.31
SQL>     SELECT a.name, b.VALUE / 1024 / 1024 MB
  2        FROM v$sysstat a, v$mystat b
  3       WHERE a.statistic# = b.statistic#
  4             AND (a.name IN
  5                     ('physical read total bytes',
  6                      'physical write total bytes',
  7                      'cell IO uncompressed bytes')
  8                  OR a.name LIKE 'cell phy%');

NAME                                                                     MB
---------------------------------------------------------------- ----------
physical read total bytes                                           3.78125
physical write total bytes                                                0
cell physical IO interconnect bytes                                 3.78125
cell physical IO bytes saved during optimized file creation               0
cell physical IO bytes saved during optimized RMAN file restore           0
cell physical IO bytes eligible for predicate offload                     0
cell physical IO bytes saved by storage index                             0
cell physical IO bytes sent directly to DB node to balance CPU            0
cell physical IO interconnect bytes returned by smart scan                0
cell IO uncompressed bytes                                                0

10 rows selected.

Elapsed: 00:00:00.08

4- Execute the test query against the COMPRESS FOR ARCHIVE LOW copy of the table. Compare the time taken to execute the query with the query performance observed in previously.

SQL> select avg(nin) from TRANSACTION_LOG_archive_low;

  AVG(NIN)
----------
320568.525

Elapsed: 00:00:01.11

SQL>     SELECT a.name, b.VALUE / 1024 / 1024 MB
  2        FROM v$sysstat a, v$mystat b
  3       WHERE a.statistic# = b.statistic#
  4             AND (a.name IN
  5                     ('physical read total bytes',
  6                      'physical write total bytes',
  7                      'cell IO uncompressed bytes')
  8                  OR a.name LIKE 'cell phy%');

NAME                                                                     MB
---------------------------------------------------------------- ----------
physical read total bytes                                            3.8125
physical write total bytes                                                0
cell physical IO interconnect bytes                                  3.8125
cell physical IO bytes saved during optimized file creation               0
cell physical IO bytes saved during optimized RMAN file restore           0
cell physical IO bytes eligible for predicate offload                     0
cell physical IO bytes saved by storage index                             0
cell physical IO bytes sent directly to DB node to balance CPU            0
cell physical IO interconnect bytes returned by smart scan                0
cell IO uncompressed bytes                                                0

10 rows selected.

Elapsed: 00:00:00.05

5-  Execute the test query against the COMPRESS FOR ARCHIVE HIGH copy of the
table. Compare the time taken to execute the query with the query performance observed
previously. Note that in all cases, the queries against the compressed tables outperformed
the query against the uncompressed table. With compression, you will often observe
improved query performance for scanning queries because less I/O is required.

SQL> select avg(nin) from TRANSACTION_LOG_archive_high;

  AVG(NIN)
----------
320553.509

Elapsed: 00:00:00.36

SQL>     SELECT a.name, b.VALUE / 1024 / 1024 MB
  2        FROM v$sysstat a, v$mystat b
  3       WHERE a.statistic# = b.statistic#
  4             AND (a.name IN
  5                     ('physical read total bytes',
  6                      'physical write total bytes',
  7                      'cell IO uncompressed bytes')
  8                  OR a.name LIKE 'cell phy%');

NAME                                                                     MB
---------------------------------------------------------------- ----------
physical read total bytes                                         3.7421875
physical write total bytes                                                0
cell physical IO interconnect bytes                               3.7421875
cell physical IO bytes saved during optimized file creation               0
cell physical IO bytes saved during optimized RMAN file restore           0
cell physical IO bytes eligible for predicate offload                     0
cell physical IO bytes saved by storage index                             0
cell physical IO bytes sent directly to DB node to balance CPU            0
cell physical IO interconnect bytes returned by smart scan                0
cell IO uncompressed bytes                                                0

10 rows selected.

Elapsed: 00:00:00.12

No comments: