OS-6448: fmdump(1m) should be more resilient in the face of missing message content

Details

Issue Type:Bug
Priority:5 - Low
Status:Resolved
Created at:2017-11-08T20:00:36.421Z
Updated at:2018-06-06T16:10:59.000Z

People

Created by:Rob Johnston [X]
Reported by:Rob Johnston [X]
Assigned to:Rob Johnston [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2017-12-11T19:09:13.689Z)

Fix Versions

2017-12-07 Gold Saucer (Release Date: 2017-12-07)

Related Links

Description

This issue was observed on illumos-2727bb055f.

fmdump(1m) implements a "-m" option which will dump the localized message content associated with each event in the fault logs (/var/fm/fmd/fltlog*).

Currently, if fmdump fails to look up the message content for an event, it exits immediately with an error. This behavior is undesirable for a couple of reasons:

1) The current error doesn't identify the affected event or the diagcode for which the message content lookup failed.

2) In such a case, it would be far more useful to dump as much as it successfully can rather than bailing out on the first error.

Given that fmd supports proxying events from other fault managers, potentially on other machines (or OS instances), it is quite conceivable to end up with a list.* FM event for which there is no associated message content on the local machine, so we should handle it better.

This CR is to change fmdump so that if it fails to look up the message content for an event, rather than exiting, it prints an error message indicating that it couldn't look up the content for that event (identifying the event by its uuid) and then forges on.
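
The intended control flow can be sketched roughly as follows. This is an illustration only, not the actual fmdump source: event_t, lookup_msg() and dump_events() are hypothetical names, the lookup is a hard-coded stand-in for the real dictionary lookup, and the error wording merely imitates the fixed output shown in the test results in this ticket.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event record; the real tool reads events from the fault log. */
typedef struct event {
	const char *ev_uuid;	/* event uuid */
	const char *ev_code;	/* diagcode, e.g. FMD-8000-4M */
} event_t;

/*
 * Hypothetical stand-in for the message content lookup. Returns the
 * message text, or NULL if there is no content for the diagcode.
 */
const char *
lookup_msg(const char *code)
{
	if (strcmp(code, "FMD-8000-4M") == 0)
		return ("All faults associated with an event id have "
		    "been addressed.");
	return (NULL);
}

/*
 * Dump every event. A failed lookup is reported on err, naming both the
 * diagcode and the event uuid, and is counted, but the loop carries on
 * with the next event. Returns the number of failed lookups.
 */
int
dump_events(const event_t *evs, size_t nevs, FILE *out, FILE *err)
{
	int nfail = 0;

	assert(evs != NULL || nevs == 0);

	for (size_t i = 0; i < nevs; i++) {
		const char *msg = lookup_msg(evs[i].ev_code);

		if (msg == NULL) {
			(void) fprintf(err, "fmdump: failed to format "
			    "message for diagcode %s, event %s\n",
			    evs[i].ev_code, evs[i].ev_uuid);
			nfail++;
			continue;	/* forge on instead of exiting */
		}
		(void) fprintf(out, "SUNW-MSG-ID: %s\nEVENT-ID: %s\n"
		    "DESC: %s\n\n", evs[i].ev_code, evs[i].ev_uuid, msg);
	}
	return (nfail);
}
```

Returning the failure count rather than exiting from inside the loop lets the caller dump everything it can and still report a non-zero exit status afterwards if any lookup failed.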

Comments

Comment by Rob Johnston [X]
Created at 2017-11-20T22:32:09.106Z
Updated at 2017-11-28T00:10:38.675Z
Manual Test Result
===============
Below are the contents of an fmd fltlog populated with some list.* events:

root@openindiana:/usr/bin# fmdump
TIME                 UUID                                 SUNW-MSG-ID EVENT
Nov 07 19:33:06.1791 dcce622d-886d-4157-ee31-aa25a052a949 SMF-8000-YX Diagnosed
Nov 07 19:35:06.2355 dcce622d-886d-4157-ee31-aa25a052a949 FMD-8000-4M Repaired
Nov 07 19:35:06.2406 dcce622d-886d-4157-ee31-aa25a052a949 FMD-8000-6U Resolved
Nov 07 21:29:06.2792 aa1aafcd-fe07-6485-92e4-f1da458ac23c SUNOS-8000-DM Diagnosed
Nov 07 21:31:12.9770 aa1aafcd-fe07-6485-92e4-f1da458ac23c FMD-8000-4M Repaired
Nov 07 21:31:13.0162 aa1aafcd-fe07-6485-92e4-f1da458ac23c FMD-8000-6U Resolved
Nov 07 21:34:33.5724 3e2cb549-54c1-e884-99bf-a31c900b99f7 SUNOS-8000-DM Diagnosed
Nov 07 21:36:10.6988 3e2cb549-54c1-e884-99bf-a31c900b99f7 FMD-8000-4M Repaired
Nov 07 21:36:10.7020 3e2cb549-54c1-e884-99bf-a31c900b99f7 FMD-8000-6U Resolved
Nov 07 21:36:11.0035 0c249033-d69d-ea19-a29e-aad8a02b80f1 SUNOS-8000-DM Diagnosed
Nov 07 21:51:16.5448 0c249033-d69d-ea19-a29e-aad8a02b80f1 FMD-8000-4M Repaired
Nov 07 21:51:16.5482 0c249033-d69d-ea19-a29e-aad8a02b80f1 FMD-8000-6U Resolved
Nov 07 21:52:26.4169 16509c32-cc54-4492-a4bb-83325f05d4c3 FMD-8000-11 Diagnosed

To test the fix, I replaced the following files with versions that lack the
content for the diagcodes SMF-8000-YX and SUNOS-8000-DM:

/usr/lib/fm/dicts/SUNOS.dict
/usr/lib/locale/C/LC_MESSAGES/SUNOS.mo
/usr/lib/fm/dicts/SMF.dict
/usr/lib/locale/C/LC_MESSAGES/SMF.mo


Unfixed fmdump:

root@openindiana:/usr/sbin# fmdump -m
fmdump: failed to format message: No such file or directory
fmdump: warning: failed to dump /var/fm/fmd/fltlog: Error 0
root@openindiana:/usr/sbin# echo $?
3

Fixed fmdump:

root@openindiana:/usr/sbin# fmdump -m
fmdump: failed to format message for diagcode SMF-8000-YX, event dcce622d-886d-4157-ee31-aa25a052a949: No such file or directory

SUNW-MSG-ID: FMD-8000-4M, TYPE: Repair, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Nov  7 19:33:06 PST 2017
PLATFORM: VirtualBox, CSN: 0, HOSTNAME: openindiana
SOURCE: fmd, REV: 1.2
EVENT-ID: dcce622d-886d-4157-ee31-aa25a052a949
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-4M for more information.
AUTO-RESPONSE: Some system components offlined because of the original fault may have been brought back online.
IMPACT: Performance degradation of the system due to the original fault may have been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Nov  7 19:33:06 PST 2017
PLATFORM: VirtualBox, CSN: 0, HOSTNAME: openindiana
SOURCE: fmd, REV: 1.2
EVENT-ID: dcce622d-886d-4157-ee31-aa25a052a949
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-6U for more information.
AUTO-RESPONSE: All system components offlined because of the original fault have been brought back online.
IMPACT: Performance degradation of the system due to the original fault has been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

fmdump: failed to format message for diagcode SUNOS-8000-DM, event aa1aafcd-fe07-6485-92e4-f1da458ac23c: No such file or directory

SUNW-MSG-ID: FMD-8000-4M, TYPE: Repair, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Nov  7 21:29:06 PST 2017
PLATFORM: VirtualBox, CSN: 0, HOSTNAME: openindiana
SOURCE: fmd, REV: 1.2
EVENT-ID: aa1aafcd-fe07-6485-92e4-f1da458ac23c
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-4M for more information.
AUTO-RESPONSE: Some system components offlined because of the original fault may have been brought back online.
IMPACT: Performance degradation of the system due to the original fault may have been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Nov  7 21:29:06 PST 2017
PLATFORM: VirtualBox, CSN: 0, HOSTNAME: openindiana
SOURCE: fmd, REV: 1.2
EVENT-ID: aa1aafcd-fe07-6485-92e4-f1da458ac23c
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-6U for more information.
AUTO-RESPONSE: All system components offlined because of the original fault have been brought back online.
IMPACT: Performance degradation of the system due to the original fault has been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

fmdump: failed to format message for diagcode SUNOS-8000-DM, event 3e2cb549-54c1-e884-99bf-a31c900b99f7: No such file or directory

SUNW-MSG-ID: FMD-8000-4M, TYPE: Repair, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Nov  7 21:34:33 PST 2017
PLATFORM: VirtualBox, CSN: 0, HOSTNAME: openindiana
SOURCE: software-diagnosis, REV: 0.1
EVENT-ID: 3e2cb549-54c1-e884-99bf-a31c900b99f7
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-4M for more information.
AUTO-RESPONSE: Some system components offlined because of the original fault may have been brought back online.
IMPACT: Performance degradation of the system due to the original fault may have been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Nov  7 21:34:33 PST 2017
PLATFORM: VirtualBox, CSN: 0, HOSTNAME: openindiana
SOURCE: software-diagnosis, REV: 0.1
EVENT-ID: 3e2cb549-54c1-e884-99bf-a31c900b99f7
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-6U for more information.
AUTO-RESPONSE: All system components offlined because of the original fault have been brought back online.
IMPACT: Performance degradation of the system due to the original fault has been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

fmdump: failed to format message for diagcode SUNOS-8000-DM, event 0c249033-d69d-ea19-a29e-aad8a02b80f1: No such file or directory

SUNW-MSG-ID: FMD-8000-4M, TYPE: Repair, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Nov  7 21:36:10 PST 2017
PLATFORM: VirtualBox, CSN: 0, HOSTNAME: openindiana
SOURCE: fmd, REV: 1.2
EVENT-ID: 0c249033-d69d-ea19-a29e-aad8a02b80f1
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-4M for more information.
AUTO-RESPONSE: Some system components offlined because of the original fault may have been brought back online.
IMPACT: Performance degradation of the system due to the original fault may have been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Nov  7 21:36:10 PST 2017
PLATFORM: VirtualBox, CSN: 0, HOSTNAME: openindiana
SOURCE: fmd, REV: 1.2
EVENT-ID: 0c249033-d69d-ea19-a29e-aad8a02b80f1
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-6U for more information.
AUTO-RESPONSE: All system components offlined because of the original fault have been brought back online.
IMPACT: Performance degradation of the system due to the original fault has been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

SUNW-MSG-ID: FMD-8000-11, TYPE: Defect, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Nov  7 21:52:26 PST 2017
PLATFORM: VirtualBox, CSN: 0, HOSTNAME: openindiana
SOURCE: software-diagnosis, REV: 0.1
EVENT-ID: 16509c32-cc54-4492-a4bb-83325f05d4c3
DESC: An illumos Fault Manager component generated a diagnosis for which no message summary exists.  Refer to http://illumos.org/msg/FMD-8000-11 for more information.
AUTO-RESPONSE: The diagnosis has been saved in the fault log for examination by your illumos distribution team.
IMPACT: The fault log will need to be manually examined using fmdump(1M) in order to determine if any human response is required.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to view the diagnosis result.  Ensure that fault management software is installed properly.

root@openindiana:/usr/sbin# echo $?
3
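
Because the fixed error line names both the diagcode and the event uuid, the affected events can be collected mechanically. Below is a minimal sketch, assuming the error line keeps exactly the format shown in the transcript above; parse_fmdump_error() and the buffer size macros are hypothetical names, not part of fmdump.

```c
#include <assert.h>
#include <stdio.h>

#define	CODE_BUFSZ	32	/* room for a diagcode plus terminator */
#define	UUID_BUFSZ	37	/* 36-character uuid plus terminator */

/*
 * Extract the diagcode and event uuid from one fmdump error line.
 * code must have room for CODE_BUFSZ bytes and uuid for UUID_BUFSZ
 * bytes. Returns 0 on success, or -1 if the line does not match the
 * expected format.
 */
int
parse_fmdump_error(const char *line, char *code, char *uuid)
{
	int n;

	assert(line != NULL && code != NULL && uuid != NULL);

	/* The field widths keep sscanf() within the buffer sizes above. */
	n = sscanf(line, "fmdump: failed to format message for "
	    "diagcode %31[^,], event %36[^:]:", code, uuid);

	return (n == 2 ? 0 : -1);
}
```

The error line printed by the unfixed fmdump carries neither field, so it does not match and the function returns -1.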

Comment by Jira Bot
Created at 2017-11-29T19:58:28.470Z

illumos-joyent commit 58853b2326b7e3a605e4e558d0af3e028c87f434 (branch master, by Rob Johnston)

OS-6448 fmdump(1m) should be more resilient in the face of missing message content
Reviewed by: Robert Mustacchi <robert.mustacchi@joyent.com>
Approved by: Robert Mustacchi <robert.mustacchi@joyent.com>


Comment by Rob Johnston [X]
Created at 2018-06-06T16:09:14.666Z
Updated at 2018-06-06T16:10:58.997Z

Upstreamed to illumos-gate via the following commit:

Author: Rob Johnston <rob.johnston@joyent.com>
Date:   Tue Jan 2 13:44:04 2018 -0800

    8946 fmdump(1m) should be more resilient in the face of missing message content
    Reviewed by: Robert Mustacchi <robert.mustacchi@joyent.com>
    Reviewed by: Andy Stormont <astormont@racktopsystems.com>
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Approved by: Gordon Ross <gwr@nexenta.com>