[Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

Wed Apr 10 18:15:44 EDT 2013

Hi,

[I did go through the mail thread titled: "RHEL6 and clones: CMAN needed
anyway?", but was not sure about some answers there]

I recently moved from Pacemaker 1.1.7 to 1.1.8-7 on CentOS 6.2. I now see the
following in syslog:

corosync[2966]:   [pcmk  ] ERROR: process_ais_conf: You have configured a
cluster using the Pacemaker plugin for Corosync. The plugin is not
supported in this environment and will be removed very soon.
corosync[2966]:   [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8
of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on
using Pacemaker with CMAN

Does this mean that my current configuration is incorrect and will not work
as it used to with pacemaker 1.1.7/Corosync?

I looked at the "Clusters from Scratch" instructions, and they talk mostly
about GFS2. I don't have any filesystem requirements. In that case, can I
keep running Pacemaker with the Corosync plugin, without CMAN?

I understand that this configuration is not recommended, but I ask because
I am hitting an odd problem with this setup, which I explain below. I just
want to make sure that I am not starting from an erroneous setup.

I have a multi-state resource on a two-node cluster with the following
configuration:

[root@vsanqa4 ~]# crm configure show
node vsanqa3
node vsanqa4
primitive vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e ocf:heartbeat:vgc-cm-agent.ocf \
        params cluster_uuid="6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e" \
        op monitor interval="30s" role="Master" timeout="100s" \
        op monitor interval="31s" role="Slave" timeout="100s"
ms ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
        meta clone-max="2" globally-unique="false" target-role="Started"
location ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
        rule $id="ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes-rule" -inf: #uname ne vsanqa4 and #uname ne vsanqa3
property $id="cib-bootstrap-options" \
        dc-version="1.1.8-7.el6-394e906" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

With this config, if I simulate a crash on the master with "echo c >
/proc/sysrq-trigger", the slave does not get promoted for about 15 minutes.
It does detect the peer going down, but does not seem to issue the promote
immediately:

Apr 10 14:12:32 vsanqa4 corosync[2966]:   [TOTEM ] A processor failed,
forming new configuration.
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] notice:
pcmk_peer_update: Transitional membership event on ring 166060: memb=1,
new=0, lost=1
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: pcmk_peer_update:
memb: vsanqa4 1967394988
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: pcmk_peer_update:
lost: vsanqa3 1950617772
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] notice:
pcmk_peer_update: Stable membership event on ring 166060: memb=1, new=0,
lost=0
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: pcmk_peer_update:
MEMB: vsanqa4 1967394988
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info:
ais_mark_unseen_peer_dead: Node vsanqa3 was not seen in the previous
transition
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: update_member:
Node 1950617772/vsanqa3 is now: lost
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info:
send_member_notification: Sending membership update 166060 to 2 children
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Apr 10 14:12:38 vsanqa4 cib[3386]:   notice: ais_dispatch_message:
Membership 166060: quorum lost
Apr 10 14:12:38 vsanqa4 crmd[3391]:   notice: ais_dispatch_message:
Membership 166060: quorum lost
Apr 10 14:12:38 vsanqa4 cib[3386]:   notice: crm_update_peer_state:
crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 vsanqa4 crmd[3391]:   notice: crm_update_peer_state:
crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [CPG   ] chosen downlist: sender
r(0) ip(172.16.68.117) ; members(old:2 left:1)
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [MAIN  ] Completed service
synchronization, ready to provide service.

Then (after about 15 minutes), I see the following:

Apr 10 14:26:46 vsanqa4 crmd[3391]:   notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
origin=crm_timer_popped ]
Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: unpack_config: On loss of
CCM Quorum: Ignore
Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: LogActions: Promote
vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e:0#011(Slave -> Master vsanqa4)
Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: process_pe_message:
Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-392.bz2
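
To put a number on the delay, the gap between the "state is now lost"
message and the Promote decision can be worked out from the two syslog
timestamps. This is only a minimal sketch: the helper name seconds_between
is my own, the year 2013 is taken from the date of this mail (syslog does
not record it), and %b assumes an English locale.

```python
from datetime import datetime

# syslog timestamps carry no year, so one is prepended (assumed: 2013).
FMT = "%Y %b %d %H:%M:%S"

def seconds_between(start: str, end: str, year: int = 2013) -> int:
    """Return the number of whole seconds from start to end (same year)."""
    t0 = datetime.strptime(f"{year} {start}", FMT)
    t1 = datetime.strptime(f"{year} {end}", FMT)
    return int((t1 - t0).total_seconds())

# Timestamps copied from the two log excerpts above.
peer_lost = "Apr 10 14:12:38"   # crm_update_peer_state: vsanqa3 is now lost
promote   = "Apr 10 14:26:46"   # pengine LogActions: Promote ... on vsanqa4

gap = seconds_between(peer_lost, promote)
print(f"{gap} s (~{gap / 60:.1f} min)")  # 848 s (~14.1 min)
```

So the promote was calculated a little over 14 minutes after the peer was
marked lost, and the transition was triggered by a timer
(cause=C_TIMER_POPPED) rather than by the membership change itself.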

Thanks,
Pavan