[Pacemaker] [RFC PATCH] Try to fix startup-fencing not happening
simone.gotti at gmail.com
Fri Mar 25 06:41:14 EDT 2011
On 03/25/2011 11:10 AM, Andrew Beekhof wrote:
> On Thu, Mar 17, 2011 at 11:54 PM, Simone Gotti <simone.gotti at gmail.com> wrote:
>> When using corosync + pcmk v1 (starting both corosync and pacemakerd),
>> and I think also heartbeat or anything other than cman, as the quorum
>> provider, at startup there will be no <node_state/> entry in the CIB for
>> the nodes that are not in the cluster.
> No, I'm pretty sure heartbeat has the same behavior.
I didn't test it, but if it works like cman then I think that
startup-fencing won't work on it either. But that would be very strange.
>> Instead, when using cman as the quorum provider, there will be a
>> <node_state/> for every node known by cman, because
>> lib/common/ais.c:cman_event_callback calls crm_update_peer for every
>> node reported by cman_get_nodes.
>> Something similar will happen when using corosync+pcmkv1 if corosync is
>> started on N nodes but pacemakerd is started only on N-M nodes.
> Probably true.
>> All of this will break 'startup-fencing' because, from my understanding,
>> the logic is this:
>> 1) At startup all the nodes are marked (in
>> lib/pengine/unpack.c:unpack_node) as unclean.
>> 2) lib/pengine/unpack.c:unpack_status iterates only over the available
>> <node_state/> entries in the CIB status section, resetting them to clean
>> at the start and then marking them unclean if some conditions are met.
>> 3) In pengine/allocate.c:stage6, all the unclean nodes are fenced.
>> Under the above conditions you'll have a <node_state/> in the CIB status
>> section even for nodes without pacemakerd running, and startup fencing
>> won't happen because there isn't any condition in unpack_status that
>> marks them as unclean.
> But they're unclean by default... so the lack of a node_state
> shouldn't affect that.
> Or did you mean "clean" instead of "unclean"?
The problem is not the lack of a node_state but the opposite: the presence
of a node_state even for nodes that haven't joined the cluster. This
happens with the current cman integration.
The nodes known to pacemaker are all set as unclean by default (point 1).
But if their <node_state/> is available in the CIB, then in point 2 they
will be set as clean (unclean=false) and no condition check in
unpack_status will mark them as unclean=true again.
>> I'm not very expert in the code. I discarded the solution of registering
>> at startup only the active nodes instead of all the nodes known by cman,
>> as it won't fix the corosync+pcmkv1 case.
>> Instead I tried to understand when a node that has its status in the CIB
>> should be startup-fenced; a possible solution is in the attached patch.
>> I noticed that when crm_update_peer inserts a new node, this one doesn't
>> have the expected attribute set. So if startup-fencing is enabled, I
>> set the node as expected up.
> You lost me there... isn't this covered by just setting startup-fencing=false?
I lost you too :D. The problem is that startup-fencing is not working.
Anyway, this first patch is a sort of attempt to make startup-fencing
work when the CIB contains <node_state/> tags also for nodes not in
the cluster. But it was a quick attempt that I don't like, as my
intention was primarily to explain the actual problem. But probably I
wasn't very clear in doing this. Sorry.
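The idea behind this first patch can be modelled in a few lines. This is only a sketch under assumptions: the sketch_* names are hypothetical, the "member" value for the expected attribute is assumed, and it supposes that unpack_status treats a node that is expected up but absent from the membership as unclean, which is what setting the expected attribute relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical peer model; not Pacemaker's crm_node_t. */
typedef struct {
    const char *uname;
    const char *expected;  /* NULL when the attribute was never set */
    bool in_membership;    /* the node actually joined the cluster */
    bool unclean;
} sketch_peer_t;

/* Models crm_update_peer inserting a new node, plus the patch: with
 * startup-fencing enabled, the new node is marked as expected up. */
static void sketch_new_peer(sketch_peer_t *p, const char *uname,
                            bool in_membership, bool startup_fencing)
{
    assert(p != NULL && uname != NULL);
    p->uname = uname;
    p->in_membership = in_membership;
    p->expected = startup_fencing ? "member" : NULL;  /* assumed value */
    p->unclean = true;                                 /* point 1 */
}

/* Models point 2 for a node that has a <node_state/>: reset to clean, then
 * mark unclean if it was expected up but is not in the membership. */
static void sketch_unpack_peer_status(sketch_peer_t *p)
{
    p->unclean = false;
    if (!p->in_membership && p->expected != NULL
        && strcmp(p->expected, "member") == 0) {
        p->unclean = true;
    }
}
```

A node that never joined is left unclean (and so fenced) only when startup-fencing is enabled, while a joined node stays clean either way.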
In the mail I sent after this one, I tried to make a first step, changing
the behavior of the cman integration to make it work like the other
implementations: add a <node_state/> tag only for the hosts that joined
the cluster.
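The follow-up change amounts to filtering the node list before registering peers. Below is a minimal sketch of that filter. The node struct is a hypothetical stand-in for what the quorum provider returns (it is not the real libcman cman_node_t layout), and register_peer stands in for a crm_update_peer-style call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node data from the quorum provider. */
typedef struct {
    const char *name;
    bool member;   /* true only for nodes that have joined */
} sketch_quorum_node_t;

typedef void (*sketch_register_fn)(const char *name, void *ctx);

/* Register only joined nodes, so the others get no <node_state/> entry
 * and keep the default unclean status (point 1). Returns how many were
 * registered. */
static size_t sketch_register_joined(const sketch_quorum_node_t *nodes, size_t n,
                                     sketch_register_fn register_peer, void *ctx)
{
    size_t registered = 0;
    assert(register_peer != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].member) {
            continue;              /* skip nodes that haven't joined */
        }
        register_peer(nodes[i].name, ctx);
        registered++;
    }
    return registered;
}

/* Example callback: just counts the registrations. */
static void sketch_count_peer(const char *name, void *ctx)
{
    (void)name;
    (*(size_t *)ctx)++;
}
```

As noted above, this only covers the cman path; the corosync+pcmkv1 case with pacemakerd running on fewer nodes than corosync would still need its own handling.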