[Pacemaker] Failed stop of stonith resource
Andrew Beekhof
andrew at beekhof.net
Thu Aug 15 05:10:54 UTC 2013
On 14/08/2013, at 7:51 AM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> Hi,
>
> I just caught unexpected fencing of a node because (as I see from a very
> quick analysis, but I may be wrong) the stonith resource running on it
> (fence_ipmilan) failed to start and then failed to stop.
>
> Excerpt from logs:
>
> Aug 13 20:57:56 v03-a pengine[2329]: notice: stage6: Scheduling Node
> v03-a for shutdown
> Aug 13 20:57:56 v03-a pengine[2329]: notice: LogActions: Move
> stonith-ipmi-v03-b#011(Started v03-a -> mgmt01)
> Aug 13 20:57:56 v03-a crmd[2330]: notice: te_rsc_command: Initiating
> action 127: stop stonith-ipmi-v03-b_stop_0 on v03-a (local)
> Aug 13 20:57:56 v03-a crmd[2330]: notice: process_lrm_event: LRM
> operation stonith-ipmi-v03-b_stop_0 (call=992, rc=0, cib-update=415,
> confirmed=true)
> Aug 13 20:57:56 v03-a crmd[2330]: notice: te_rsc_command: Initiating
> action 128: start stonith-ipmi-v03-b_start_0 on mgmt01
> Aug 13 20:58:58 v03-a crmd[2330]: warning: status_from_rc: Action 128
> (stonith-ipmi-v03-b_start_0) on mgmt01 failed (target: 0 vs. rc: 1): Error
> Aug 13 20:58:58 v03-a crmd[2330]: warning: update_failcount: Updating
> failcount for stonith-ipmi-v03-b on mgmt01 after failed start: rc=1
> (update=INFINITY, time=1376427538)
> Aug 13 20:58:58 v03-a crmd[2330]: warning: update_failcount: Updating
> failcount for stonith-ipmi-v03-b on mgmt01 after failed start: rc=1
> (update=INFINITY, time=1376427538)
> Aug 13 20:58:58 v03-a pengine[2329]: warning: unpack_rsc_op: Processing
> failed op start for stonith-ipmi-v03-b on mgmt01: unknown error (1)
> Aug 13 20:58:58 v03-a pengine[2329]: notice: LogActions: Recover
> stonith-ipmi-v03-b#011(Started mgmt01)
> Aug 13 20:58:59 v03-a crmd[2330]: notice: te_rsc_command: Initiating
> action 1: stop stonith-ipmi-v03-b_stop_0 on mgmt01
> Aug 13 20:59:01 v03-a crmd[2330]: warning: status_from_rc: Action 1
> (stonith-ipmi-v03-b_stop_0) on mgmt01 failed (target: 0 vs. rc: 1): Error
> Aug 13 20:59:01 v03-a crmd[2330]: warning: update_failcount: Updating
> failcount for stonith-ipmi-v03-b on mgmt01 after failed stop: rc=1
> (update=INFINITY, time=1376427541)
> Aug 13 20:59:01 v03-a crmd[2330]: warning: update_failcount: Updating
> failcount for stonith-ipmi-v03-b on mgmt01 after failed stop: rc=1
> (update=INFINITY, time=1376427541)
> Aug 13 20:59:12 v03-a pengine[2329]: warning: unpack_rsc_op: Processing
> failed op stop for stonith-ipmi-v03-b on mgmt01: unknown error (1)
> Aug 13 20:59:12 v03-a pengine[2329]: warning: pe_fence_node: Node
> mgmt01 will be fenced because of resource failure(s)
> Aug 13 20:59:12 v03-a pengine[2329]: warning: common_apply_stickiness:
> Forcing stonith-ipmi-v03-b away from mgmt01 after 1000000 failures
> (max=1000000)
> Aug 13 20:59:12 v03-a pengine[2329]: warning: stage6: Scheduling Node
> mgmt01 for STONITH
>
> I would expect stonith resource failures not to cause fencing. Am I wrong?
Failures of fencing resources are handled just like failures of any other type of resource.
But a case could be made that stop operations for fencing resources should default to on-fail=block instead of on-fail=fence.
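
To make the current behaviour concrete, here is a rough sketch of how the recovery policy for a failed operation is chosen. It is a simplified model and not Pacemaker's actual code: the function name and arguments are made up for illustration, and it only encodes the documented on-fail defaults (a failed stop defaults to fence when STONITH is enabled and to block otherwise; other operations default to restart).

```python
from typing import Optional


def effective_on_fail(op: str, on_fail: Optional[str], stonith_enabled: bool) -> str:
    """Return the recovery policy applied to a failed operation.

    Hypothetical helper for illustration only; it mirrors the documented
    on-fail defaults and ignores everything else the policy engine does.
    """
    if on_fail is not None:
        # An explicit on-fail setting on the operation overrides the default.
        return on_fail
    if op == "stop":
        # Default for a failed stop: fence the node if STONITH is enabled,
        # otherwise block further actions on the resource.
        return "fence" if stonith_enabled else "block"
    # Default for other operations (start, monitor, ...).
    return "restart"


# The failed stop from the log: no explicit on-fail, STONITH enabled.
print(effective_on_fail("stop", None, stonith_enabled=True))
# The same failure with on-fail=block set on the stop operation.
print(effective_on_fail("stop", "block", stonith_enabled=True))
```

In this model the failed stop of stonith-ipmi-v03-b on mgmt01 resolves to fence, which matches the "will be fenced because of resource failure(s)" message in the log. Until the default changes, setting on-fail=block explicitly on the stop operation of the fencing resource should avoid that.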
>
> mgmt01 is running merge of latest ClusterLabs and beekhof trees
> (ClusterLabs/pacemaker/master 98aca50 + beekhof/pacemaker/master
> 86b339c), v03-a was running 2518fd0 when that happened (I was rebooting
> it in order to upgrade to the above version).
>
> Sure, the reason for the fence_ipmilan failure needs investigation
> too, but I think that is not important for the above issue.
>
> Vladislav