[Pacemaker] resources do not start on surviving node after reboot

Tue Oct 29 14:12:51 UTC 2013

Hi!

I have a 2-node cluster with shared storage and SBD fencing.
One node (wcs1) was down for maintenance.
For external reasons, the second node (wcs2) was rebooted. After the reboot,
the services never came up:

Oct 29 13:04:21 wcs2 pengine[2362]:  warning: stage6: Scheduling Node wcs1
for STONITH
Oct 29 13:04:21 wcs2 crmd[2363]:   notice: te_fence_node: Executing reboot
fencing operation (53) on wcs1 (timeout=60000)
Oct 29 13:05:33 wcs2 stonith-ng[2359]:    error: remote_op_done: Operation
reboot of wcs1 by wcs2 for crmd.2363 at wcs2.4a3b045d: Timer expired
Oct 29 13:05:33 wcs2 crmd[2363]:   notice: tengine_stonith_callback:
Stonith operation 2/53:0:0:f56c4538-1ad8-4871-825e-167eb9304677: Timer
expired (-62)
Oct 29 13:05:33 wcs2 crmd[2363]:   notice: tengine_stonith_callback:
Stonith operation 2 for wcs1 failed (Timer expired): aborting transition.
Oct 29 13:05:33 wcs2 crmd[2363]:   notice: tengine_stonith_notify: Peer
wcs1 was not terminated (st_notify_fence) by wcs2 for wcs2: Timer expired
(ref=4a3b045d-cc08-4e2f-8279-a85d113781b2) by client crmd.2363
Oct 29 13:05:33 wcs2 crmd[2363]:   notice: run_graph: Transition 0
(Complete=20, Pending=0, Fired=0, Skipped=29, Incomplete=0,
Source=/usr/var/lib/pacemaker/pengine/pe-warn-54.bz2): Stopped
Oct 29 13:05:33 wcs2 pengine[2362]:   notice: unpack_config: On loss of CCM
Quorum: Ignore
Oct 29 13:05:33 wcs2 pengine[2362]:  warning: stage6: Scheduling Node wcs1
for STONITH

And this repeats in a loop forever...
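
For reference, here is a small Python sketch (not part of any cluster tooling)
that pulls the fencing attempt out of the log excerpt above and compares the
configured timeout with the time that actually elapsed before the failure.
The regular expressions are assumptions based only on the message formats
shown here, the function names are made up for illustration, and the year is
taken from the date of this post because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Two lines from the excerpt above, with the mail client's line wrapping undone.
LOG = """\
Oct 29 13:04:21 wcs2 crmd[2363]:   notice: te_fence_node: Executing reboot fencing operation (53) on wcs1 (timeout=60000)
Oct 29 13:05:33 wcs2 stonith-ng[2359]:    error: remote_op_done: Operation reboot of wcs1 by wcs2 for crmd.2363 at wcs2.4a3b045d: Timer expired
"""

# Assumed patterns, derived only from the messages quoted in this post.
FENCE_START = re.compile(
    r"te_fence_node: Executing \w+ fencing operation \(\d+\) on (\S+) \(timeout=(\d+)\)"
)
FENCE_TIMEOUT = re.compile(
    r"remote_op_done: Operation \w+ of (\S+) by .*: Timer expired"
)


def parse_time(line, year=2013):
    """Parse the leading syslog timestamp; the year is supplied by the caller."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def fencing_attempts(log):
    """Return (target, timeout_ms, elapsed_s) for each fencing attempt that timed out."""
    pending = {}   # target node -> (start time, configured timeout in ms)
    results = []
    for line in log.splitlines():
        m = FENCE_START.search(line)
        if m:
            pending[m.group(1)] = (parse_time(line), int(m.group(2)))
            continue
        m = FENCE_TIMEOUT.search(line)
        if m and m.group(1) in pending:
            started, timeout_ms = pending.pop(m.group(1))
            elapsed = (parse_time(line) - started).total_seconds()
            results.append((m.group(1), timeout_ms, elapsed))
    return results


for target, timeout_ms, elapsed in fencing_attempts(LOG):
    print(f"{target}: timeout={timeout_ms} ms, failed after {elapsed:.0f} s")
```

On the excerpt above this reports a 60000 ms timeout for wcs1 and 72 seconds
between the fencing request and the "Timer expired" failure.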

Node wcs1 is powered off. Shouldn't SBD detect that, and shouldn't the
cluster then start the resources on wcs2?

Best regards,
Alexandr A. Alexandrov

-- 
Best regards, AAA.