[Pacemaker] Fencing order
Pavel Levshin
pavel at levshin.spb.ru
Mon Mar 21 15:06:13 UTC 2011
Hi.
Today, we had a network outage. Quite a few problems suddenly arose in
our setup, including a crashed corosync, the known notify bug in the DRBD
RA, and a problem with the VirtualDomain RA timing out on stop.
But the fencing behaviour was particularly strange.
Initially, one node (wapgw1-1) was partitioned from the cluster. When
the connection was restored, corosync died on that node. The node was
considered "offline unclean" and was scheduled to be fenced. Fencing by
HP iLO did not work (I do not yet know why). The second-priority fencing
method is meatware, which naturally took some time.
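For reference, our two fencing methods are configured along these lines;
this is only a sketch, with illustrative resource names and the iLO login
parameters omitted, the idea being that the stonith "priority" attribute
makes iLO the preferred device and meatware the operator-confirmed fallback:

    primitive fence-ilo stonith:external/riloe \
            params hostlist="wapgw1-1" \
            meta priority="1"
    primitive fence-manual stonith:meatware \
            params hostlist="wapgw1-1 wapgw1-2" \
            meta priority="2"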
The second node, wapgw1-2, hit the DRBD notify bug and failed to stop
some resources. It was "online unclean" and was also scheduled to be
fenced. HP iLO was available for this node, but it was not STONITHed
until I manually confirmed the STONITH of wapgw1-1.
Once I confirmed the first node's restart, the second node was fenced
automatically.
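(The manual confirmation itself was the usual meatware acknowledgement,
i.e. something like:

    meatclient -c wapgw1-1

run on a cluster node, assuming the stock cluster-glue meatware client.)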
Is this ordering the intended behaviour, or a bug?
This is Pacemaker 1.0.10 with corosync 1.2.7, in a three-node cluster.
--
Pavel Levshin