# IBM(c) 2008 EPL license http://www.eclipse.org/legal/epl-v10.html

annotatelog.README

This README describes how to use the annotatelog script.

The syntax of the annotatelog command is:

annotatelog -f log_file [-s start_time] [-e end_time]
            { [-i -g guid_file -l link_file] [-S] [-c] [-u] | [-A -g guid_file -l link_file] }
            { [-n node_list -g guid_file] [-E] }
            [-h]

        -A       Output the combination of -i, -S, -c and -u. It should be used with -g and -l flags.
        -f log_file
                 Specifies the full path name of the log file to analyze.
                 Must be an xCAT consolidated log obtained from QLogic HSM or ESM.
        -s start_time
                 Specifies the start time for analysis, where the start_time
                 variable has the format ddmmyyhh:mm:ss (day, month, year,
                 hour, minute, and second). A time of 00:00:00 is valid.
        -e end_time
                 Specifies the end time for analysis, where the end_time
                 variable has the format ddmmyyhh:mm:ss (day, month, year,
                 hour, minute, and second). A time of 00:00:00 is valid.
        -l link_file
                 Specifies the full path name of a link file, which is the
                 concatenation of all
                 '/var/opt/iba/analysis/baseline/fabric*links' files from all
                 fabric management servers.
        -g guid_file
                 Specifies the full path name of a GUID file, which contains
                 the list of GUIDs obtained from the "getGuids" script.
        -E       Annotate with node ERRLOG_ON and ERRLOG_OFF information. This
                 can help determine if a disappearance was caused by a node
                 disappearing. It is for AIX nodes only and should be used with
                 the -i or -n flag.
        -S       Sort the log entries by subnet manager only.
        -i       Sort the log entries by IB node only.
        -c       Sort the log entries by chassis only.
        -u       Sort the log entries by FRU only.
        -n node_list
                 Specifies a comma-separated list of node host names or IP
                 addresses to look up in log entries.
        -h       Display usage information.
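
The ddmmyyhh:mm:ss format used by -s and -e can be checked before the
script is called. The following is a minimal Python sketch, not part of
annotatelog; the helper name parse_time_arg and the strptime format string
are assumptions derived from the format description above.

```python
from datetime import datetime

# Assumed mapping of "ddmmyyhh:mm:ss" onto strptime directives.
TIME_FORMAT = "%d%m%y%H:%M:%S"

def parse_time_arg(value):
    """Parse a -s/-e style value; raises ValueError if it is malformed."""
    return datetime.strptime(value, TIME_FORMAT)

# 5 May 2008, midnight, and 5 May 2008, 09:06:33.
start = parse_time_arg("05050800:00:00")
end = parse_time_arg("05050809:06:33")
print(start.isoformat(), end.isoformat())
```

A value that does not match the format raises ValueError, which makes a
wrongly ordered or truncated time stamp easy to catch early.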
        
In an xCAT cluster with QLogic IB switches, the switch logs and subnet
manager (ESM/HSM) logs are redirected to the xCAT Management Node through
the syslog protocol. The syslogd on the xCAT Management Node recognizes
the facility (local6) and priority (NOTICE and above) and puts the log
entries into a file/FIFO that is monitored by AIXSyslogSensor on AIX
systems or ErrorLogSensor on Linux systems. The condition-response setup
local to the xCAT Management Node then moves the log entries to the file
/var/log/xcat/errorlog/[xCAT Management Node]. As a result, this log file
contains a large number of entries and is difficult for the administrator
to look through.

annotatelog is a sample script that parses the QLogic log entries in the
file /var/log/xcat/errorlog/[xCAT Management Node] on the xCAT Management
Node and sorts them by subnet manager, IB node, chassis, FRU
(Field-Replaceable Unit) or a particular node. The script is supported on
both AIX and Linux Management Nodes. The log to analyze must be an xCAT
consolidated log, that is, it must come from the xCAT syslog/errorlog
monitoring mechanism, such as the /var/log/xcat/errorlog/[xCAT Management
Node] file. Because log formats vary, xCAT does not support other log
files.

This script provides several flags to specify the category criteria:
-S, -i, -c, -u, -n and -A.
If the -S flag is set, the output is sorted by subnet manager. Because an
SM may have multiple ports, the output is classified by
<subnet_manager_name: port number>, as shown in the example below:

############################################
Logs by Subnet Manager
############################################

----------------------------------------------
Report by subnet manager: 'c890f12ec07:port 2' 
----------------------------------------------
May  5 09:06:33 c890f12ec07 local6:notice c890f12ec07 iview_sm[5445]: 
c890f12ec07; MSG:NOTICE|SM:c890f12ec07:port 2|COND:#4 Disappearance 
from fabric|NODE:IBM G1 Logical HCA :port 2:0x000255007002651f|DETAIL:
Node type: hca
May  5 08:23:55 c890f12ec07 local6:notice c890f12ec07 iview_sm[5128]: 
c890f12ec07; MSG:NOTICE|SM:c890f12ec07:port 2|COND:#3 Appearance in 
fabric|NODE:IBM G1 Logical HCA :port 2:0x0002550070027f1f|DETAIL:Node 
type: hca

----------------------------------------------
Report by subnet manager: 'c890f12ec07:port 1'
----------------------------------------------

May  5 09:06:33 c890f12ec07 local6:notice c890f12ec07 iview_sm[5442]: 
c890f12ec07; MSG:NOTICE|SM:c890f12ec07:port 1|COND:#4 Disappearance 
from fabric|NODE:IBM G1 Logical HCA :port 1:0x000255007002650f|DETAIL:
Node type: hca
May  5 09:06:33 c890f12ec07 local6:notice c890f12ec07 iview_sm[5442]: 
c890f12ec07; MSG:NOTICE|SM:c890f12ec07:port 1|COND:#4 Disappearance 
from fabric|NODE:IBM G1 Logical HCA :port 1:0x0002550070027f0f|DETAIL:
Node type: hca
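
The grouping shown above relies on the pipe-delimited fields (MSG, SM,
COND, NODE, DETAIL) that appear in each entry. The following Python sketch
illustrates one way to split those fields and group entries by the SM
field. It is not the annotatelog implementation; the function names
parse_fields and group_by_sm are hypothetical, and the sample entries are
the ones from this README joined onto one line each.

```python
from collections import defaultdict

def parse_fields(entry):
    """Return the 'KEY:value' pairs of one entry as a dict."""
    start = entry.find("MSG:")
    if start == -1:
        return {}  # no pipe-delimited payload in this entry
    fields = {}
    for part in entry[start:].split("|"):
        # Split on the first ':' only; values may contain ':' themselves.
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields

def group_by_sm(entries):
    """Group whole entries under the value of their 'SM:' field."""
    groups = defaultdict(list)
    for entry in entries:
        sm = parse_fields(entry).get("SM")
        if sm is not None:
            groups[sm].append(entry)
    return dict(groups)

entries = [
    "May  5 09:06:33 c890f12ec07 local6:notice c890f12ec07 iview_sm[5445]: "
    "c890f12ec07; MSG:NOTICE|SM:c890f12ec07:port 2|COND:#4 Disappearance "
    "from fabric|NODE:IBM G1 Logical HCA :port 2:0x000255007002651f|"
    "DETAIL:Node type: hca",
    "May  5 09:06:33 c890f12ec07 local6:notice c890f12ec07 iview_sm[5442]: "
    "c890f12ec07; MSG:NOTICE|SM:c890f12ec07:port 1|COND:#4 Disappearance "
    "from fabric|NODE:IBM G1 Logical HCA :port 1:0x000255007002650f|"
    "DETAIL:Node type: hca",
]

groups = group_by_sm(entries)
for sm, items in groups.items():
    print("Report by subnet manager: '%s' (%d entries)" % (sm, len(items)))
```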


If the -i flag is set, the output is sorted by IB node and classified by
<node_name: port number: GUID number>. If a log entry includes the keyword
"IBM *Logical", the script also displays the link relationships between
the IBM G1 Logical HCA and the IBM G1 Logical Switch, and between the
IBM G1 Logical Switch and the QLogic switch. The result file of the
getGuids script is used to get the node name corresponding to the HCA port
number and GUID, and the fabric link files from the Fabric Management
Server (HSM/ESM) are used to get the QLogic switch connection information,
so the -i flag must be used with the -g and -l flags. If a log entry does
not include the keyword "IBM *Logical", the corresponding node name and
connection relationship cannot be determined, so the entry is displayed
directly and the string after "NODE:" is used as the IB node, as shown in
the examples below:

############################################
Logs by IB node
############################################

----------------------------------------------
Reported by IB node: 'c890f11ec06.ppd.pok.ibm.com: port 1: GUID 0x0002550070027f0f'
- ehca0 <-> IBM G1 Logical Switch 1 port 17 (GUID 0x0002550070027f20)
- Connector is C65-T1 (HV=Cx-T1)
- IBM G1 Logical Switch 1 <-> SilverStorm 9024 DDR port 16 (GUID 0x00066a00d900042d)
----------------------------------------------

May  5 09:06:33 c890f12ec07 local6:notice c890f12ec07 iview_sm[5442]: 
c890f12ec07; MSG:NOTICE|SM:c890f12ec07:port 1|COND:#4 Disappearance 
from fabric|NODE:IBM G1 Logical HCA :port 1:0x0002550070027f0f|DETAIL:
Node type: hca

----------------------------------------------
Reported by IB node: 'SilverStorm 9120 GUID=0x00066a0002000225 Leaf 8, Chip A:port 0:0x00066a0007001311'
----------------------------------------------
Mar 25 16:21:54 c890f12ec07 iview_sm[3725]: c890f12ec07; MSG:NOTICE|
SM:c890f12ec07:port 1|COND:#4 Disappearance from fabric|NODE:SilverStorm 
9120 GUID=0x00066a0002000225 Leaf 8, Chip A:port 0:0x00066a0007001311|
DETAIL:Node type: switch
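
The NODE field in the entries above has the shape
<description>:port <number>:<GUID>. The following Python sketch shows one
way to split that field and to test for the "IBM *Logical" keyword. It is
an illustration only: the names split_node and is_logical are
hypothetical, and treating "IBM *Logical" as the regular expression
"IBM .*Logical" is an assumption.

```python
import re

# description, then ':port N:', then a hexadecimal GUID
NODE_RE = re.compile(
    r"^(?P<name>.*?)\s*:port (?P<port>\d+):(?P<guid>0x[0-9a-fA-F]+)$"
)
# Assumed interpretation of the "IBM *Logical" keyword.
LOGICAL_RE = re.compile(r"IBM .*Logical")

def split_node(node_field):
    """Return (description, port, guid), or None if the shape differs."""
    match = NODE_RE.match(node_field)
    if match is None:
        return None
    return match.group("name"), int(match.group("port")), match.group("guid")

def is_logical(node_field):
    """True when the entry refers to an IBM logical HCA or switch."""
    return LOGICAL_RE.search(node_field) is not None

hca = "IBM G1 Logical HCA :port 1:0x0002550070027f0f"
leaf = ("SilverStorm 9120 GUID=0x00066a0002000225 Leaf 8, "
        "Chip A:port 0:0x00066a0007001311")

print(split_node(hca), is_logical(hca))
print(split_node(leaf), is_logical(leaf))
```

Only the first kind of entry would qualify for the GUID and link file
lookup; the second would be reported under its NODE string as is.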


If the -c flag is set, the output is sorted by chassis and classified by
<chassis: model type: GUID number>, as shown in the example below:

############################################################
Logs by CHASSIS
############################################################

----------------------------------------------
Reported by chassis: 'SilverStorm: model 9120: GUID 0x00066a0002000225' 
----------------------------------------------

Apr 23 09:16:56 qswitch slot101:9.114.80.179;MSG:WARNING|CHASSIS:SilverStorm 
9120 GUID=0x00066a0002000225|COND:#17 FRU state changed from online 
to offline|FRU:Power Supply 1|PN:200805-101
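
The report heading above can be derived from the CHASSIS field of the
entry. The following Python sketch shows one possible derivation. It is
not taken from the annotatelog source; the name chassis_key and the
assumption that the field always has the shape "<name> <model> GUID=<guid>"
are illustrative.

```python
import re

CHASSIS_FIELD_RE = re.compile(r"\|CHASSIS:([^|]+)")
CHASSIS_RE = re.compile(
    r"^(?P<vendor>\S+) (?P<model>\S+) GUID=(?P<guid>0x[0-9a-fA-F]+)$"
)

def chassis_key(entry):
    """Build a '<chassis>: model <type>: GUID <guid>' key for an entry."""
    field = CHASSIS_FIELD_RE.search(entry)
    if field is None:
        return None  # not a chassis entry
    text = field.group(1).strip()
    match = CHASSIS_RE.match(text)
    if match is None:
        return text  # unexpected shape: fall back to the raw field
    return "{vendor}: model {model}: GUID {guid}".format(**match.groupdict())

entry = (
    "Apr 23 09:16:56 qswitch slot101:9.114.80.179;MSG:WARNING|"
    "CHASSIS:SilverStorm 9120 GUID=0x00066a0002000225|COND:#17 FRU state "
    "changed from online to offline|FRU:Power Supply 1|PN:200805-101"
)
key = chassis_key(entry)
print("Reported by chassis: '%s'" % key)
```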


If the -u flag is set, the output is sorted by FRU, as shown in the
example below:

############################################################
Logs by FRUS from CHASSIS
############################################################

------------------------------------------------------------
Associated with FRU: 'Power Supply 1' 
------------------------------------------------------------

Apr 23 09:19:12 qswitch slot101:9.114.80.179;MSG:NOTICE|CHASSIS:SilverStorm 
9120 GUID=0x00066a0002000225|COND:#18 FRU state changed from offline 
to online|FRU:Power Supply 1|PN:200805-101


If the -n flag is set, the output is sorted by the specified node names,
as shown in the example below:

############################################################
Logs for special nodes
############################################################
------------------------------------------------------------
Reported by node: 'c890f11ec06.ppd.pok.ibm.com'
------------------------------------------------------------

May  5 08:23:55 c890f12ec07 local6:notice c890f12ec07 iview_sm[5128]: 
c890f12ec07; MSG:NOTICE|SM:c890f12ec07:port 2|COND:#3 Appearance in fabric|
NODE:IBM G1 Logical HCA :port 2:0x0002550070027f1f|DETAIL:Node type: hca
May  5 09:06:33 c890f12ec07 local6:notice c890f12ec07 iview_sm[5442]: 
c890f12ec07; MSG:NOTICE|SM:c890f12ec07:port 1|COND:#4 Disappearance from 
fabric|NODE:IBM G1 Logical HCA :port 1:0x0002550070027f0f|DETAIL:Node 
type: hca

------------------------------------------------------------
Reported by node: 'c890f11ec05.ppd.pok.ibm.com'
------------------------------------------------------------
May  5 09:06:33 c890f12ec07 local6:notice c890f12ec07 iview_sm[5445]: 
c890f12ec07; MSG:NOTICE|SM:c890f12ec07:port 2|COND:#4 Disappearance from 
fabric|NODE:IBM G1 Logical HCA :port 2:0x000255007002651f|DETAIL:Node 
type: hca


If the -E flag is set together with the -n or -i flag, the annotatelog
script uses dsh to access the IB nodes or the nodes specified by the -n
flag, runs the "errpt -J ERRLOG_ON ERRLOG_OFF" command to get the
corresponding timestamps, and adds these timestamps to the annotatelog
output, as shown below:

----------------------------------------------
Reported by IB node: 'c890f11ec06.ppd.pok.ibm.com: port 1: GUID 0x0002550070027f0f'
- ehca0 <-> IBM G1 Logical Switch 1 port 17 (GUID 0x0002550070027f20)
- Connector is C65-T1 (HV=Cx-T1)
- IBM G1 Logical Switch 1 <-> SilverStorm 9024 DDR port 16 (GUID 0x00066a00d900042d)
----------------------------------------------
# c890f11ec06.ppd.pok.ibm.com
# ERRLOG_ON: 01/05/08 09:23
# ERRLOG_OFF: 04/05/08 09:23
----------------------------------------------

May  5 09:06:33 c890f12ec07 local6:notice c890f12ec07 iview_sm[5442]: 
c890f12ec07; MSG:NOTICE|SM:c890f12ec07:port 1|COND:#4 Disappearance from 
fabric|NODE:IBM G1 Logical HCA :port 1:0x0002550070027f0f|DETAIL:Node 
type: hca


If the -A flag is set, the output is the combination of -i, -S, -c and -u.

A log entry that cannot be parsed into any of the types above is displayed
under "Logs by others". These entries are displayed only when the -A flag
is used, as shown in the example below:

############################################################
Logs by others
############################################################
May 14 11:44:22 qswitch slot101:9.114.80.179 FEtask[86f38fb8]: ESM: Embedded SM 
Error: rmsg_recv: output buffer[2016] too small for incoming data[2035] : 0
May 14 11:44:22 qswitch slot101:9.114.80.179 PM_task[86f08298]: ESM: Embedded SM 
Error: DoSendFeAsync  - message send failed  : 128
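
The split between the typed reports and "Logs by others" can be pictured
as a simple classification on the fields present in an entry. The
following Python sketch is an assumption inferred from the sample entries
in this README, not the rule set used by annotatelog; the function name
classify and the category labels are hypothetical.

```python
def classify(entry):
    """Return a coarse category for one consolidated log entry."""
    if "MSG:" not in entry:
        return "others"  # no structured payload to sort on
    if "|SM:" in entry:
        return "subnet manager / IB node"
    if "|CHASSIS:" in entry:
        return "chassis / FRU"
    return "others"

samples = [
    "May  5 09:06:33 c890f12ec07 local6:notice c890f12ec07 iview_sm[5442]: "
    "c890f12ec07; MSG:NOTICE|SM:c890f12ec07:port 1|COND:#4 Disappearance "
    "from fabric|NODE:IBM G1 Logical HCA :port 1:0x0002550070027f0f|"
    "DETAIL:Node type: hca",
    "Apr 23 09:19:12 qswitch slot101:9.114.80.179;MSG:NOTICE|"
    "CHASSIS:SilverStorm 9120 GUID=0x00066a0002000225|COND:#18 FRU state "
    "changed from offline to online|FRU:Power Supply 1|PN:200805-101",
    "May 14 11:44:22 qswitch slot101:9.114.80.179 FEtask[86f38fb8]: ESM: "
    "Embedded SM Error: rmsg_recv: output buffer[2016] too small for "
    "incoming data[2035] : 0",
]

categories = [classify(s) for s in samples]
for category in categories:
    print(category)
```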