[ClusterLabs] STONITH forever?

Stefan Schlösser sschloesser at enomic.com
Tue Apr 10 03:26:07 EDT 2018


Hi,

I have a 3-node setup on Ubuntu 16.04. The Corosync/Pacemaker services are not started automatically.

If I put all 3 nodes into offline mode while 1 node is in an "unclean" state, I get a never-ending STONITH loop.

What happens is that the STONITH causes a reboot of the unclean node.

1) I would have thought that with all nodes in standby no STONITH could occur. Why does it?

2) Why does it keep on killing the unclean node?

The only way to stop it is to temporarily disable STONITH, bring the unclean node back online manually, and then enable STONITH again.
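
The workaround above could be scripted roughly as follows. This is only a sketch under assumptions that are not in the original post: that the stonith-enabled cluster property is toggled with Pacemaker's crm_attribute tool, and that "bringing the node back online" means starting the corosync and pacemaker services by hand with systemctl. The function names and the DRY_RUN switch are hypothetical helpers for illustration.

```shell
#!/bin/sh
# Sketch of the manual workaround. With DRY_RUN=1 (the default) the
# commands are only printed, nothing is changed on the cluster.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1, on a surviving node: temporarily disable fencing cluster-wide.
disable_stonith() {
    run crm_attribute --type crm_config --name stonith-enabled --update false
}

# Step 2, on the unclean node itself: start the cluster stack manually
# (assumption: the services are managed by systemd under these unit names).
start_cluster_stack() {
    run systemctl start corosync pacemaker
}

# Step 3, once the node has rejoined: enable fencing again.
enable_stonith() {
    run crm_attribute --type crm_config --name stonith-enabled --update true
}
```

Calling disable_stonith, then start_cluster_stack on the affected node, then enable_stonith mirrors the three manual steps. Running with DRY_RUN=0 executes the commands for real, so the dry-run output is worth checking first.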

Here is a log extract of node c killing node a:
Apr 10 09:08:30 [2276] xxx-c stonith-ng:   notice: log_operation:   Operation 'reboot' [2428] (call 5 from crmd.2175) for host 'xxx-a' with device 'stonith_a' returned: 0 (OK)
Apr 10 09:08:30 [2276] xxx-c stonith-ng:   notice: remote_op_done:  Operation reboot of xxx-a by xxx-c for crmd.2175 at xxx-b.20531831: OK
Apr 10 09:08:30 [2275] xxx-c        cib:     info: cib_process_request:     Completed cib_modify operation for section status: OK (rc=0, origin=xxx-b/crmd/83, version=0.164.37)
Apr 10 09:08:30 [2275] xxx-c        cib:     info: cib_process_request:     Completed cib_delete operation for section //node_state[@uname='xxx-a']/lrm: OK (rc=0, origin=xxx-b/crmd/84, version=0.164.37)
Apr 10 09:08:30 [2275] xxx-c        cib:     info: cib_process_request:     Completed cib_delete operation for section //node_state[@uname='xxx-a']/transient_attributes: OK (rc=0, origin=xxx-b/crmd/85, version=0.164.37)
Apr 10 09:08:30 [2275] xxx-c        cib:     info: cib_process_request:     Completed cib_modify operation for section status: OK (rc=0, origin=xxx-b/crmd/86, version=0.164.37)
Apr 10 09:08:30 [2275] xxx-c        cib:     info: cib_process_request:     Completed cib_delete operation for section //node_state[@uname='xxx-a']/lrm: OK (rc=0, origin=xxx-b/crmd/87, version=0.164.37)
Apr 10 09:08:30 [2275] xxx-c        cib:     info: cib_process_request:     Completed cib_delete operation for section //node_state[@uname='xxx-a']/transient_attributes: OK (rc=0, origin=xxx-b/crmd/88, version=0.164.37)

This then repeats forevermore ...
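
To quantify how often the fencing repeats, the completed reboot operations for one host can be counted in the log. The sketch below matches the remote_op_done lines shown in the extract above. The function name is hypothetical, and the log file path is deliberately left as a parameter because the post does not say where the cluster logs are written.

```shell
#!/bin/sh
# Count successfully completed STONITH reboots of a given host.
# $1 = fenced host name (e.g. xxx-a), $2 = path to the cluster log file.
count_fence_reboots() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c "remote_op_done:.*Operation reboot of $1 by .*: OK" "$2"
}
```

For example, count_fence_reboots xxx-a followed by the log file path prints how many times xxx-a has been rebooted according to that file. A count that keeps growing while the node is down would confirm the loop.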

Thanks for any hints,

cheers,

Stefan
