Note: All posts take a practical approach and avoid lengthy theory. Everything has been tested on development servers. Please don’t try any post on a production server until you are sure of it.

Monday, November 04, 2013

Exadata: Diagnostics using sundiag/deaddisk

For Sun Oracle Exadata Environments:
On every Exadata compute node and storage cell, Oracle delivers a utility called sundiag.sh. By default, the sundiag.sh script is installed in /opt/oracle.SupportTools. When you log an Oracle Service Request, Oracle Support commonly asks for the output of the sundiag.sh utility.

When it completes, sundiag.sh stores its output in /tmp as a date-stamped, bzip2-compressed tar file. On both compute servers and storage cells, sundiag.sh collects the following:

• The output of the dmesg command, which contains kernel-level diagnostics from the kernel ring buffer
• The output of fdisk -l, which contains a list of all disk partitions
• The output of lspci, which contains a list of all PCI devices on the system
• The output of lsscsi, which contains a list of all SCSI devices on the system
• Various outputs of MegaCli64, which provides MegaRAID controller diagnostics
• The output of ipmitool sel elist, which lists the entries in the System Event Log (SEL) kept by the ILOM
• A copy of /var/log/messages
• A file called MegaSAS.log, which provides information about your SAS disks
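
Once the script has finished, the first practical step is usually to find the archive it just wrote. The following is a minimal sketch of that step, assuming the archive name follows the sundiag_*.tar.bz2 pattern that appears in the sample run in this post. The function name latest_sundiag is made up for illustration and is not part of sundiag.sh.

```shell
# Sketch: print the newest sundiag archive in a directory (default /tmp).
# Assumption: archives are named sundiag_*.tar.bz2, as in this post's sample run.
# latest_sundiag is a hypothetical helper, not something Oracle ships.
latest_sundiag() {
    dir="${1:-/tmp}"
    # ls -t sorts newest first; errors are silenced when nothing matches.
    ls -t "$dir"/sundiag_*.tar.bz2 2>/dev/null | head -n 1
}

# Typical use on a node, as root:
#   /opt/oracle.SupportTools/sundiag.sh
#   latest_sundiag /tmp
```

The function prints nothing when no archive is found, so an empty result is a quick sign that the run did not complete.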

When launched from an Exadata Storage Server, sundiag.sh also collects the following information:
• The output of cellcli list cell detail
• The output of cellcli list celldisk detail
• The output of cellcli list lun detail
• The output of cellcli list physicaldisk detail
• A cellcli listing of all physical disks that are not in a normal state
• The output of cellcli list griddisk detail
• The output of cellcli list flashcache detail
• The output of cellcli list alerthistory
• A copy of your storage cell alert.log, ms-odl.log, and ms-odl.trc files
• Information about your PCI flash modules, or FDOMs, by using the /usr/bin/flash_dom -l command
• The output of /opt/oracle/cell/cellsrv/deploy/scripts/unix/hwadapter/diskadp/scripts_aura.sh, which provides details about your disk adapters
• Additional information about your disk devices from the /opt/oracle/cell/cellsrv/deploy/scripts/unix/hwadapter/diskadp/get_disk_devices.pl script
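
If you only need the CellCLI listings from the list above and not the full sundiag archive, they can be gathered by hand. The sketch below runs the same list commands and saves each one to its own file. The CELLCLI variable, the function name collect_cell_listings, and the output file names are illustrative assumptions, not what sundiag.sh does internally. The filtered listing of disks that are not in a normal state is left out because its exact filter syntax is not given in this post.

```shell
# Sketch: save the CellCLI listings named above, one file per object type.
# Assumptions: CELLCLI, collect_cell_listings, and the *.out file names are
# hypothetical choices made for this example.
CELLCLI="${CELLCLI:-cellcli}"

collect_cell_listings() {
    outdir="$1"
    mkdir -p "$outdir" || return 1
    # Objects that sundiag.sh lists with the "detail" keyword.
    for obj in cell celldisk lun physicaldisk griddisk flashcache; do
        "$CELLCLI" -e "list $obj detail" > "$outdir/$obj-detail.out" 2>&1
    done
    # The alert history is listed without "detail".
    "$CELLCLI" -e "list alerthistory" > "$outdir/alerthistory.out" 2>&1
}

# Typical use on a storage cell, as root:
#   collect_cell_listings /tmp/cell_listings
```

Error output is redirected into the same files, so a failed command leaves its message where you would look for the listing.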

[root@exadb sundiag]# ./sundiag_v1.4.1.sh 

Oracle Exadata Database Machine - Diagnostics Collection Tool

Gathering Linux information

./sundiag_v1.4.1.sh: line 256: /opt/oracle.cellos/imageinfo: No such file or directory
./sundiag_v1.4.1.sh: line 257: /opt/oracle.cellos/imagehistory: No such file or directory
/bin/cp: cannot stat `/var/log/cellos/validations': No such file or directory
./sundiag_v1.4.1.sh: line 282: /usr/sbin/iblinkinfo.pl: No such file or directory
./sundiag_v1.4.1.sh: line 284: /usr/sbin/ibstatus: No such file or directory
./sundiag_v1.4.1.sh: line 331: /usr/bin/lsscsi: No such file or directory


Skipping ILOM collection. Use the ilom or snapshot options, or login to ILOM
over the network and run Snapshot separately if necessary.

./sundiag_v1.4.1.sh: line 594: [: =: unary operator expected
./sundiag_v1.4.1.sh: line 597: [: =: unary operator expected
./sundiag_v1.4.1.sh: line 613: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
./sundiag_v1.4.1.sh: line 615: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
./sundiag_v1.4.1.sh: line 616: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
./sundiag_v1.4.1.sh: line 617: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
./sundiag_v1.4.1.sh: line 618: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
./sundiag_v1.4.1.sh: line 619: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
./sundiag_v1.4.1.sh: line 620: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
./sundiag_v1.4.1.sh: line 91: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
./sundiag_v1.4.1.sh: line 93: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
./sundiag_v1.4.1.sh: line 95: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
./sundiag_v1.4.1.sh: line 98: [: : integer expression expected
./sundiag_v1.4.1.sh: line 99: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
./sundiag_v1.4.1.sh: line 102: [: : integer expression expected
./sundiag_v1.4.1.sh: line 622: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
./sundiag_v1.4.1.sh: line 627: /opt/MegaRAID/MegaCli/MegaCli64: No such file or directory
Generating diagnostics tarball and removing temp directory

=================================================================================
Done. The report files are bzip2 compressed in /tmp/sundiag_exadb_0_2013_10_31_19_41.tar.bz2
=================================================================================
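
Most of the errors in the run above are "No such file or directory" messages for tools the script expects to find. A quick pre-check shows in advance which of those tools are absent on a given server. This is a minimal sketch: the function name missing_tools is made up for illustration, and the paths are only the ones that appear in the error messages above, not a complete list of what sundiag.sh calls.

```shell
# Sketch: print every path from the argument list that is not an executable file.
# missing_tools is a hypothetical helper; the paths below come from the
# error messages in the sample run, not from a full inventory of sundiag.sh.
missing_tools() {
    for tool in "$@"; do
        [ -x "$tool" ] || echo "$tool"
    done
}

missing_tools \
    /opt/oracle.cellos/imageinfo \
    /opt/oracle.cellos/imagehistory \
    /usr/sbin/iblinkinfo.pl \
    /usr/sbin/ibstatus \
    /usr/bin/lsscsi \
    /opt/MegaRAID/MegaCli/MegaCli64
```

Any path it prints corresponds to a section of the sundiag output that will be empty or contain only an error message.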

For HP Oracle Exadata Environments:
The deaddisk utility can be used on an Exadata Storage Server that has disk failures. The script writes its output to a directory and then packs that directory into a zip file under /tmp, which you can upload to Oracle Support. The file name starts with info_deaddisk_ followed by a timestamp, for example: info_deaddisk_2013-10-31-19:52:14.zip

[root@exacell1 tmp]# ./deaddisk.sh

Stopping MS services...
The SHUTDOWN of MS services was successful.

Starting to collect disk information.....
./deaddisk.sh[20]: hpacucli: not found [No such file or directory]
./deaddisk.sh[21]: hpaducli: not found [No such file or directory]
./deaddisk.sh[22]: hpaducli: not found [No such file or directory]


Starting MS services...
The STARTUP of MS services was successful.
cp: cannot stat `/var/spool/compaq/hpasm/registry/serial_output/*': No such file or directory
adding: info_deaddisk_2013-10-31-20:14:58/ (stored 0%)
adding: info_deaddisk_2013-10-31-20:14:58/messages (deflated 89%)
adding: info_deaddisk_2013-10-31-20:14:58/localhost.localdomain localhost_2013-10-31-20:14:58.hpacucli.txt (stored 0%)


[root@exacell1 tmp]#


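Note: The errors in both runs above come from the nature of my environment. The test servers I used lack several of the tools the scripts call, such as MegaCli64, lsscsi, and hpacucli.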
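
The deaddisk zip file name contains colons in its timestamp. Colons can get in the way when the file is moved around: scp reads a colon in a file argument as a host separator, and Windows does not allow colons in file names. A minimal sketch of one workaround, assuming you are free to rename a copy before uploading, is shown below. The function name safe_name is made up for illustration and is not part of deaddisk.sh.

```shell
# Sketch: derive a colon-free file name from a deaddisk zip name.
# safe_name is a hypothetical helper, not part of deaddisk.sh.
safe_name() {
    # Replace every ':' with '-' and leave the rest of the name unchanged.
    printf '%s\n' "$1" | tr ':' '-'
}

# Typical use:
#   f=info_deaddisk_2013-10-31-19:52:14.zip
#   cp "/tmp/$f" "/tmp/$(safe_name "$f")"
```

Copying instead of renaming keeps the original file in place in case the original name is needed later.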

