[ClusterLabs] STONITH forever?
Ken Gaillot
kgaillot at redhat.com
Tue Apr 10 10:52:17 EDT 2018
On Tue, 2018-04-10 at 07:26 +0000, Stefan Schlösser wrote:
> Hi,
>
> I have a 3 node setup on ubuntu 16.04. Corosync/Pacemaker services
> are not started automatically.
>
> If I put all 3 nodes to offline mode, with 1 node in an „unclean“
> state I get a never ending STONITH.
>
> What happens is that the STONITH causes a reboot of the unclean node.
>
> 1) I would have thought with all nodes in standby no STONITH can
> occur. Why does it?
Standby prevents a node from running resources, but it still
participates in quorum voting. I suspect *starting* a node in standby
mode would prevent it from using fence devices, but *changing* a node
to standby will have no effect on whether it can fence.
> 2) Why does it keep on killing the unclean node?
Good question. The DC's logs will have the most useful information --
each pengine run should say why fencing is being scheduled.
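
A minimal sketch for pulling those lines out of the DC's log, in Python. It assumes the log lines follow the same layout as the extract quoted below (timestamp, PID in brackets, host, daemon name, level, message). The function name fencing_reasons and the keyword match on "fenc"/"stonith" are illustrative choices, not anything Pacemaker provides, and the exact wording of the pengine messages will vary by version.

```python
import re

# Assumed line layout, taken from the log extract quoted in this thread:
#   "Apr 10 09:08:30 [2276] xxx-c stonith-ng: notice: log_operation: ..."
LINE_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \[(?P<pid>\d+)\] "
    r"(?P<host>\S+)\s+(?P<daemon>[\w-]+):\s+(?P<level>\w+):\s+(?P<msg>.*)$"
)

def fencing_reasons(lines):
    """Return (time, message) pairs for pengine lines that mention fencing.

    Hypothetical helper: the keyword filter is a guess at what is
    relevant, so widen it if it misses the lines you need.
    """
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("daemon") != "pengine":
            continue  # not a recognizable log line, or not from pengine
        msg = m.group("msg")
        if re.search(r"fenc|stonith", msg, re.IGNORECASE):
            hits.append((m.group("time"), msg))
    return hits
```

Passing an open log file from the DC to fencing_reasons() should work, since it only iterates over lines. Where that file lives depends on your logging configuration.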
>
> The only way to stop it is to temporarily disable stonith and bring
> the unclean node back online manually, and then enable it again.
>
> Here is a log extract of node c killing node a:
> Apr 10 09:08:30 [2276] xxx-c stonith-ng: notice: log_operation:
> Operation 'reboot' [2428] (call 5 from crmd.2175) for host 'xxx-a'
> with device 'stonith_a' returned: 0 (OK)
> Apr 10 09:08:30 [2276] xxx-c stonith-ng: notice: remote_op_done:
> Operation reboot of xxx-a by xxx-c for crmd.2175 at xxx-b.20531831: OK
> Apr 10 09:08:30 [2275] xxx-c cib: info:
> cib_process_request: Completed cib_modify operation for section
> status: OK (rc=0, origin=xxx-b/crmd/83, version=0.164.37)
> Apr 10 09:08:30 [2275] xxx-c cib: info:
> cib_process_request: Completed cib_delete operation for section
> //node_state[@uname='xxx-a']/lrm: OK (rc=0, origin=xxx-b/crmd/84,
> version=0.164.37)
> Apr 10 09:08:30 [2275] xxx-c cib: info:
> cib_process_request: Completed cib_delete operation for section
> //node_state[@uname='xxx-a']/transient_attributes: OK (rc=0,
> origin=xxx-b/crmd/85, version=0.164.37)
> Apr 10 09:08:30 [2275] xxx-c cib: info:
> cib_process_request: Completed cib_modify operation for section
> status: OK (rc=0, origin=xxx-b/crmd/86, version=0.164.37)
> Apr 10 09:08:30 [2275] xxx-c cib: info:
> cib_process_request: Completed cib_delete operation for section
> //node_state[@uname='xxx-a']/lrm: OK (rc=0, origin=xxx-b/crmd/87,
> version=0.164.37)
> Apr 10 09:08:30 [2275] xxx-c cib: info:
> cib_process_request: Completed cib_delete operation for section
> //node_state[@uname='xxx-a']/transient_attributes: OK (rc=0,
> origin=xxx-b/crmd/88, version=0.164.37)
>
> This then repeats forevermore ...
>
> Thanks for any hints,
>
> cheers,
>
> Stefan
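
To confirm how often the fencing repeats and against which host, the stonith-ng lines can be tallied. The sketch below assumes each log entry is on a single line (the extract above is wrapped by the mail client) and that the "Operation ... returned:" message has the same shape as in that extract. The name count_fence_ops is hypothetical.

```python
import re
from collections import Counter

# Message shape copied from the quoted stonith-ng line:
#   Operation 'reboot' [2428] (call 5 from crmd.2175) for host 'xxx-a'
#   with device 'stonith_a' returned: 0 (OK)
OP_RE = re.compile(
    r"Operation '(?P<action>\w+)' \[\d+\] \(call \d+ from \S+\) "
    r"for host '(?P<target>[^']+)' with device '(?P<device>[^']+)' "
    r"returned: (?P<rc>-?\d+)"
)

def count_fence_ops(lines):
    """Count fence operations, keyed by (target host, action, return code)."""
    counts = Counter()
    for line in lines:
        m = OP_RE.search(line)
        if m:
            key = (m.group("target"), m.group("action"), int(m.group("rc")))
            counts[key] += 1
    return counts
```

A steadily growing count for one key, such as the reboot of xxx-a returning 0, would match the loop described above. Comparing the timestamps of those lines with the pengine runs on the DC should show which scheduling decision triggers each one.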
--
Ken Gaillot <kgaillot at redhat.com>