[Pacemaker] trouble with quorum

Wed May 22 22:46:01 UTC 2013

On 22/05/2013, at 10:25 PM, Groshev Andrey <greenx at yandex.ru> wrote:

> Hello,
> 
> I am trying to build a cluster with 2 nodes + one quorum node (without pacemaker).

This is the root of your problem.

Your config has:

> service {
>         name: pacemaker
>         ver: 1
> }

So even though you thought you only started corosync, you also started part of pacemaker.
Specifically, the part of pacemaker that gets loaded into corosync to provide membership and _quorum_ APIs to the other daemons.

The output from corosync-quorumtool is completely irrelevant to pacemaker in this kind of setup.
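
If you want to see the disagreement mechanically, here is a rough sketch in Python that compares the two views. It assumes the output formats are exactly as quoted in this thread (corosync 1.4.5, pacemaker 1.1.9); the function names and the sample constants are made up for illustration and are not part of any corosync or pacemaker API.

```python
import re


def corosync_quorate(quorumtool_status: str) -> bool:
    """Read the 'Quorate:' line from `corosync-quorumtool -s` output."""
    match = re.search(r"^\s*Quorate:\s*(\S+)", quorumtool_status, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Quorate:' line found")
    return match.group(1).lower() == "yes"


def pacemaker_quorate(crm_mon_output: str) -> bool:
    """Read the 'partition with/WITHOUT quorum' text from `crm_mon -1` output."""
    if "partition WITHOUT quorum" in crm_mon_output:
        return False
    if "partition with quorum" in crm_mon_output:
        return True
    raise ValueError("no quorum statement found")


def views_agree(quorumtool_status: str, crm_mon_output: str) -> bool:
    """True when corosync and pacemaker report the same quorum state."""
    return corosync_quorate(quorumtool_status) == pacemaker_quorate(crm_mon_output)


# Excerpts of the outputs reported after corosync was stopped on one node.
FINAL_STATUS = """\
Nodes:            2
Quorate:          Yes
Expected votes:  3
Total votes:      2
Quorum:          2
Flags:            Quorate
"""
FINAL_CRM_MON = (
    "Current DC: dev-cluster2-node4.unix.tensor.ru - partition WITHOUT quorum"
)

if __name__ == "__main__":
    print("views agree:", views_agree(FINAL_STATUS, FINAL_CRM_MON))
```

With the plugin loaded, a mismatch here is expected rather than a bug in either tool.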

Since you're on a RHEL derivative, I highly suggest using Pacemaker with CMAN (and updating to 6.4 while you're there :-).
In this case, the pacemaker daemons DO see the same quorum as corosync-quorumtool and your expectations would be correct.

Check out the quickstart: http://clusterlabs.org/quickstart-redhat.html

> The sequence of actions is as follows:
> 
> 1. setup/start corosync on THREE nodes - all right.
> # corosync-quorumtool -l|sed 's/\..*$//'
> Nodeid    Votes  Name
> 295521290    1  dev-cluster2-node2
> 312298506    1  dev-cluster2-node3
> 329075722    1  dev-cluster2-node4
> 
> 2. start pacemaker on FIRST node.
> 3. write config with crmsh  .... stonith-enabled="false"
> 4. .... no-quorum-policy="ignore"
> 5. write main config ocf:heartbeat:pgsql
>     Like: https://github.com/t-matsuo/resource-agents/wiki/Resource-Agent-for-PostgreSQL-9.1-streaming-replication
>     But with one VIP on master PG
>     Resources are started on first node.
> 
> 6. Next, sync PG data to the SECOND node.
> 7. start pacemaker on the SECOND node. Resources are started there too.
> 8. no-quorum-policy="stop".
> 
> Ok. All resources work on two nodes.
> See # corosync-quorumtool -l|sed 's/\..*$//'
> Nodeid    Votes  Name
> 295521290    1  dev-cluster2-node2
> 312298506    1  dev-cluster2-node3
> 329075722    1  dev-cluster2-node4
> 
> # corosync-quorumtool -s
> Version:          1.4.5
> Nodes:            3
> Ring ID:          12440
> Quorum type:      corosync_votequorum
> Quorate:          Yes
> Node votes:      1
> Expected votes:  3
> Highest expected: 3
> Total votes:      3
> Quorum:          2
> Flags:            Quorate
> 
> See crm_mon.
> # crm_mon -1|grep quor
> Current DC: dev-cluster2-node3.unix.tensor.ru - partition with quorum
> 
> Now, stop pacemaker on one node.
> # service pacemaker stop
> 
> # corosync-quorumtool -s
> Version:          1.4.5
> Nodes:            3
> Ring ID:          12440
> Quorum type:      corosync_votequorum
> Quorate:          Yes
> Node votes:      1
> Expected votes:  3
> Highest expected: 3
> Total votes:      3
> Quorum:          2
> Flags:            Quorate
> 
> Now, stop corosync on that node as well.
> crm_mon says it has lost quorum, but the resources are not stopped.
> # crm_mon -1|grep quor
> Current DC: dev-cluster2-node4.unix.tensor.ru - partition WITHOUT quorum
> 
> But corosync says that everything is fine ....
> # corosync-quorumtool -l|sed 's/\..*$//'
> Nodeid    Votes  Name
> 295521290    1  dev-cluster2-node2
> 329075722    1  dev-cluster2-node4
> 
> # corosync-quorumtool -s
> Version:          1.4.5
> Nodes:            2
> Ring ID:          12440
> Quorum type:      corosync_votequorum
> Quorate:          Yes
> Node votes:      1
> Expected votes:  3
> Highest expected: 3
> Total votes:      2
> Quorum:          2
> Flags:            Quorate
> 
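
The numbers in that last output are at least self-consistent. A minimal sketch of the arithmetic, assuming the usual simple-majority rule (more than half of the expected votes); the function names are mine, not a corosync API, and this models only what corosync-quorumtool reports, not what the pacemaker plugin computes:

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be positive")
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes currently present reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Values taken from the quoted output: Expected votes 3, Total votes 2.
    print("threshold:", quorum_threshold(3))
    print("quorate with 2 of 3 votes:", is_quorate(2, 3))
    print("quorate with 1 of 3 votes:", is_quorate(1, 3))
```

So with expected_votes of 3, two remaining votes are enough for corosync to stay quorate, which matches "Quorum: 2" and "Total votes: 2" above.
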
> Corosync config:
> totem {
>         version: 2
>         secauth: off
>         clear_node_high_bit: yes
>         threads: 0
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 10.76.157.18
>                 mcastaddr: 239.94.1.56
>                 mcastport: 5405
>                 ttl: 1
>         }
> }
> logging {
>         fileline: off
>         to_stderr: no
>         to_logfile: yes
>         to_syslog: no
>         logfile: /var/log/cluster/corosync.log
>         debug: on
>         timestamp: on
>         logger_subsys {
>                 subsys: AMF
>                 debug: on
>         }
> }
> 
> amf {
>         mode: disabled
> }
> service {
>         name: pacemaker
>         ver: 1
> }
> quorum {
>         provider: corosync_votequorum
>         expected_votes: 3
>         votes:  1
> }
> 
> 
> Why is there this strange behavior?
> 
> My environment:
> CentOS 6.3
> corosync 1.4.5 from opensuse-ha
> pacemaker 1.1.9 from http://clusterlabs.org/rpm-next/rhel-6/