[Pacemaker] [RFC PATCH] Try to fix startup-fencing not happening
Andrew Beekhof
andrew at beekhof.net
Fri Mar 25 10:10:52 UTC 2011
On Thu, Mar 17, 2011 at 11:54 PM, Simone Gotti <simone.gotti at gmail.com> wrote:
> Hi,
>
> When using corosync + pcmk v1 (starting both corosync and pacemakerd) as
> the quorum provider, and I think also with heartbeat or anything other
> than cman, the CIB at startup will not contain a <node_state/> entry for
> nodes that are not in the cluster.
No, I'm pretty sure heartbeat has the same behavior.
>
> Instead, when using cman as the quorum provider, there will be a
> <node_state/> for every node known to cman, because
> lib/common/ais.c:cman_event_callback calls crm_update_peer for every
> node reported by cman_get_nodes.
Yep
> Something similar will happen when using corosync+pcmkv1 if corosync is
> started on N nodes but pacemakerd is started only on N-M nodes.
Probably true.
> All of this will break 'startup-fencing' because, from my understanding,
> the logic is this:
>
> 1) At startup, all nodes are marked as unclean (in
> lib/pengine/unpack.c:unpack_node).
> 2) lib/pengine/unpack.c:unpack_status iterates only over the
> <node_state/> entries present in the CIB status section, first resetting
> each node to clean and then marking it unclean if certain conditions are met.
> 3) In pengine/allocate.c:stage6, all unclean nodes are fenced.
>
> Under the above conditions, the CIB status section will also contain a
> <node_state/> for nodes where pacemakerd is not running, and startup
> fencing won't happen for them because no condition in unpack_status
> marks them as unclean.
But they're unclean by default... so the lack of a node_state
shouldn't affect that.
Or did you mean "clean" instead of "unclean"?
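
To make the two readings of the three-step logic concrete, here is a minimal Python sketch of it. It is a toy model, not the Pacemaker code: the function names only echo unpack_node, unpack_status and stage6, and the "expected_up"/"online" keys and the single unclean condition are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    unclean: bool


def unpack_nodes(names, startup_fencing=True):
    # Step 1: every configured node starts out unclean when startup-fencing is on.
    return {name: Node(name, unclean=startup_fencing) for name in names}


def unpack_status(nodes, node_states):
    # Step 2: only nodes that have a <node_state/> entry are visited.
    for name, state in node_states.items():
        node = nodes[name]
        node.unclean = False  # reset to clean first
        # Hypothetical condition (an assumption): expected up but not online.
        if state.get("expected_up") and not state.get("online"):
            node.unclean = True


def nodes_to_fence(nodes):
    # Step 3: everything still unclean is scheduled for fencing.
    return sorted(node.name for node in nodes.values() if node.unclean)


nodes = unpack_nodes(["a", "b", "c"])
unpack_status(nodes, {
    "a": {"expected_up": True, "online": True},
    "b": {},  # entry exists, but no "expected" value and not online
})
fence = nodes_to_fence(nodes)
```

In this model, node c has no entry and stays unclean by default, so it is fenced. Node b has an entry, is reset to clean, and matches no condition, so it is not fenced; that is the case the original report describes.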
>
> I'm no expert on the code. I discarded the option of registering only
> the active nodes at startup, rather than all the nodes known to cman,
> as it won't fix the corosync+pcmkv1 case.
>
> Instead, I tried to understand when a node that has its status in the CIB
> should be startup-fenced; a possible solution is in the attached patch.
> I noticed that when crm_update_peer inserts a new node, that node doesn't
> have the expected attribute set. So, if startup-fencing is enabled, I'm
> going to set the node as expected up.
You lost me there... isn't this covered by just setting startup-fencing=false?