Please see my other blog for Oracle EBusiness Suite Posts - EBMentors

Search This Blog

Note: All the posts are based on practical approach avoiding lengthy theory. All have been tested on some development servers. Please don’t test any post on production servers until you are sure.

Monday, October 28, 2013

Exadata: Monitoring Active Requests, Alerts and Wait Events

Active request provides a client-centric or application-centric view of client I/O requests that are currently being processed by a cell. An active request is characterized at all levels: instance, database, ASM, and cell.

Alerts represent events of importance occurring within the storage cell, typically indicating that storage cell functionality is either compromised or in danger of failure.An alert is automatically triggered when a predefined hardware or software issue is detected, or when a metric exceeds a threshold.
Alert history entries are retained for a maximum of 100 days. If the number of alert history entries exceeds 500, then the alert history entries are only retained for 7 days. 
Stateful alerts represent observable cell states that can be subsequently retested to detect whether the state has changed, indicating that a previously observed alert condition is no longer a problem.
Stateless alerts represent point-in-time events that do not represent a persistent condition; they simply show that something has occurred.
1- Active requests

CellCLI> LIST ACTIVEREQUEST  -- shows the active requests


CellCLI> LIST ACTIVEREQUEST WHERE IoType = 'predicate pushing' DETAIL
CellCLI> DESCRIBE ACTIVEREQUEST -- shows attributes

2- Alerts
CellCLI> LIST ALERTDEFINITION -- shows all the alerts
CellCLI> LIST ALERTDEFINITION HardwareAlert DETAIL -- shows detail of particular alert


CellCLI> LIST ALERTDEFINITION ATTRIBUTES name, metricName, description
ADRAlert "Incident Alert"
HardwareAlert "Hardware Alert"
StatefulAlert_CD_IO_ERRS_MIN CD_IO_ERRS_MIN "Threshold Alert"
StatefulAlert_CG_IO_RQ_LG CG_IO_RQ_LG "Threshold Alert"
StatefulAlert_CG_IO_RQ_LG_SEC CG_IO_RQ_LG_SEC "Threshold Alert"
StatefulAlert_CG_IO_RQ_SM CG_IO_RQ_SM "Threshold Alert"
....
The metric name identifies the metric on which the alert is based.

CellCLI> LIST ALERTHISTORY WHERE severity = 'critical' AND examinedBy = '' DETAIL
When stateful alerts are cleared, meaning that the underlying metric, hardware or software condition has returned to normal, then the stateful alert is retained either 100 or 7 days, depending on the number of alert history entries. Stateful alerts that are not cleared are retained, regardless of their age.
CellCLI> LIST ALERTHISTORY WHERE alertmessage LIKE  "Logical drive lost.*" DETAIL
CellCLI> ALTER ALERTHISTORY 13 examinedBy="InamBukhary" -- modify alert history

3- To determine if there are any metrics that are currently in an abnormal state.
CellCLI> list metriccurrent where alertState != normal

4- To show the current space utilization on the cell file systems that contain the cell operating
system and Exadata software binaries.
CellCLI> list metriccurrent cl_fsut
         CL_FSUT         "/"                     63 %
         CL_FSUT         "/boot"                 28 %
         CL_FSUT         "/dev/shm"              0 %
         CL_FSUT         "/opt/oracle"           36 %
         CL_FSUT         "/var/log/oracle"       7 %

5- LIST METRICHISTORY command shows historical metric observations that are maintained in an internal repository on the cell. The retention period for metric history observations is specified by the metricHistoryDays cell attribute. The default retention period is 7 days and you can modify this setting with the CellCLI ALTER CELL command. You can display all the retained history for all the metrics by using LIST METRICHISTORY; however, doing so will output vast amounts of data so you should always use a filter to return more specific output.
CellCLI> list metrichistory where alertState != normal

6- Use the DROP ALERTHISTORY command to clear out old or unwanted alerts.If you want to
focus on future alerts, you can clear the entire alert history.
CellCLI> drop alerthistory all

Monitoring using Views
V$CELL and V$CELL ==>   Provide identifying information about cells
V$BACKUP_DATAFILE ==> Contains columns (BLOCKS, BLOCKS_READ, BLOCKS_SKIPPED_IN_CELL, DATAFILE_BLOCKS) relevant to Exadata Cell during Oracle Recovery Manager (RMAN) incremental backups.
V$SYSSTAT ==> for statistics that can be used to compute Exadata Cell effectiveness.

SQL> SELECT name, value/1024/1024 MB from v$sysstat a WHERE
a.name = 'physical read total bytes' OR
a.name = 'physical write total bytes' OR
a.name = 'cell physical IO interconnect bytes' OR
a.name = 'cell physical IO bytes eligible for predicate offload' OR
a.name = 'cell physical IO bytes saved during optimized file creation' OR
a.name = 'cell physical IO bytes saved during optimized RMAN file restore'
OR
a.name = 'cell IO uncompressed bytes' OR
a.name = 'cell physical IO interconnect bytes returned by smart scan' OR
a.name = 'cell physical IO bytes saved by storage index';

V$SEGMENT_STATISTICS ==> Statistics on a per segment basis.
SELECT object_name, value FROM V$segment_statistics WHERE 
statistic_name=’optimized physical reads’ AND value>1000 ORDER BY value;

V$SQL ==> Statistics on shared SQL areas and contains one row for each child of the original SQL text entered.
SELECT sql_text,
io_cell_offload_eligible_bytes/1024/1024 cell_offload_eligible_mb,
io_cell_uncompressed_bytes/1024/1024 io_uncompressed_mb,
io_interconnect_bytes/1024/1024 io_interconnect_mb,
io_cell_offload_returned_bytes/1024/1024 cell_return_bytes_mb,
(physical_read_bytes + physical_write_bytes)/1024/1024 io_disk_mb
FROM v$sql WHERE
sql_text LIKE '%from mytable%';

V$CELL_STATE ==> State of all the cells accessible from the database client. The state includes key performance statistics, such as Cell Server (CELLSRV) network layer information, scheduling layer information, and Cell Server configuration information.

SQL> SPOOL /tmp/cell_state_20131028.log
SQL> SET PAGESIZE 10000
SQL> SET LONG 500000
SQL> SELECT statistics_type, XMLTYPE(statistics_value) FROM v$cell_state;
SQL> SPOOL OFF
Used XMLTYPE(statistics_value)to retrieve the information in the STATISTICS_VALUE column in a readable format.

$CELL_THREAD_HISTORY ==> Contains the samples that Cell Server takes of the threads in the cells visible to the database client.
V$CELL_REQUEST_TOTALS ==> Contains a historical view of the types and frequencies of the requests being run by a cell.

Monitor Cell Wait Events
Oracle uses a different set of wait events for disk I/O to Exadata Cell than disk I/O to conventional storage. 
V$SESSION_ACTIVE_HISTORY ==> Can be used to analyze wait events. This view shows what has happened, when a query was run, and how it ran. It also shows what events the query had to wait on.
V$SESSION_WAIT  ==>   Monitor Sessions
V$SYSTEM_EVENT ==>  Monitor Wait Events
V$SESSION_EVENT ==> Monitor Events by Sessions

Wait Events for Exadata Cell
cell interconnect retransmit during physical read ==> Appears during retransmission for an I/O of a single-block or multiblock read.
cell list of blocks physical read ==> Equivalent to db file parallel read for a cell
cell multiblock physical read ==> Equivalent to db file scattered read for a cell
cell single block physical read ==> equivalent to db file sequential read for a cell
cell smart file creation ==> Appears when the database is waiting for the completion of a file creation on a cell
cell smart index scan ==> Appears when the database is waiting for index or index-organized table (IOT) fast full scans
cell smart restore from backup ==> Appears when the database is waiting for the completion of a file initialization for restore from backup on a cell
cell smart table scan ==> appears when the database is waiting for table scans to complete on a cell
cell statistics gather ==> appears when a select is done on the V$CELL_STATE, V$CELL_THREAD_HISTORY, or V$CELL_REQUEST_TOTALS tables. During the select, data from the cells and any wait events are shown in this wait event.

Few useful queries 
-- Display the current or last wait for each session
SELECT w.event, w.p1, w.p2, w.p3 FROM V$SESSION_WAIT w, V$EVENT_NAME e
WHERE e.name LIKE 'cell%' AND e.wait_class_id = w.wait_class_id;

SELECT w.event, c.cell_path, d.name, w.p3 FROM V$SESSION_WAIT w,
V$EVENT_NAME e, V$ASM_DISK d, V$CELL c
WHERE e.name LIKE 'cell%' AND e.wait_class_id = w.wait_class_id
AND w.p1 = c.cell_hashval AND w.p2 = d.hash_value;

--Display information about the number of total waits for an event
SELECT s.event FROM V$SYSTEM_EVENT s, V$EVENT_NAME e WHERE e.name LIKE 'cell%'
AND e.event_id = s.event_id;

-- Display information about waits for an event by a session
SELECT s.event FROM V$SESSION_EVENT s, V$EVENT_NAME e WHERE e.name LIKE 'cell%'
AND e.event_id = s.event_id;

No comments: