[Pacemaker] Getting Started

Wed Dec 5 09:44:03 UTC 2012

OK, almost there :)

I'm having some trouble with the VIPs: they either don't start or start on the wrong node (so something isn't right :)).

lab04 should be the master (vipMaster) and lab05 the slave (vipSlave).

(Postgres is up and running as a replication slave on lab05, although crm_mon reports it as stopped...)

Output from crm_mon -Af:

Last updated: Wed Dec  5 09:35:58 2012
Last change: Wed Dec  5 09:35:57 2012 via crm_attribute on lab04
Stack: openais
Current DC: lab04 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
6 Resources configured.
============

Online: [ lab05 lab04 ]

 Master/Slave Set: msPostgreSQL [pgsql]
     Masters: [ lab04 ]
     Stopped: [ pgsql:1 ]
vipSlave        (ocf::heartbeat:IPaddr2):       Started lab04
 Clone Set: clnPingCheck [pingCheck]
     Started: [ lab04 ]
     Stopped: [ pingCheck:1 ]
vipMaster       (ocf::heartbeat:IPaddr2):       Started lab04

Node Attributes:
* Node lab05:
    + master-pgsql:0                    : -INFINITY
    + master-pgsql:1                    : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : STOP
* Node lab04:
    + master-pgsql:0                    : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 000000000A000200
    + pgsql-status                      : PRI
    + pingNodes                         : 200

Migration summary:
* Node lab04:
* Node lab05:

How do I migrate vipSlave to node lab05?

I've tried:

  # crm resource migrate vipSlave lab05

I did find this in the corosync log:
Dec 05 09:35:58 [2064] lab04    pengine:   notice: unpack_rsc_op:       Operation monitor found resource vipMaster active on lab04
Dec 05 09:35:58 [2064] lab04    pengine:   notice: unpack_rsc_op:       Operation monitor found resource pgsql:0 active in master mode on lab04
Dec 05 09:35:58 [2064] lab04    pengine:   notice: unpack_rsc_op:       Operation monitor found resource vipSlave active on lab04
Dec 05 09:35:58 [2064] lab04    pengine:   notice: unpack_rsc_op:       Operation monitor found resource pingCheck:0 active on lab04
Dec 05 09:35:58 [2064] lab04    pengine:   notice: unpack_rsc_op:       Operation monitor found resource pgsql:1 active on lab05
Dec 05 09:35:58 [2064] lab04    pengine:  warning: common_apply_stickiness:     Forcing clnPingCheck away from lab05 after 1 failures (max=1)
Dec 05 09:35:58 [2064] lab04    pengine:  warning: common_apply_stickiness:     Forcing clnPingCheck away from lab05 after 1 failures (max=1)
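The "after 1 failures (max=1)" wording suggests pingCheck has failed once on lab05 and that its migration-threshold is 1. As I understand it, once a resource's fail count on a node reaches migration-threshold (a threshold of 0 means no limit), the policy engine gives that node a -INFINITY score for the resource, which would explain why the clone won't start there. A minimal Python sketch of that rule, assuming those semantics (the function name is my own, not part of Pacemaker):

```python
# Hypothetical sketch of the rule behind the pengine warning
# "Forcing <rsc> away from <node> after N failures (max=M)".
# Illustration only, not Pacemaker's actual code.

INFINITY = 1_000_000  # Pacemaker treats scores of this magnitude as infinite


def fail_limit_score(fail_count: int, migration_threshold: int) -> int:
    """Return the location score the fail limit adds for one node.

    A migration-threshold of 0 disables the limit; otherwise, once the
    node's fail count reaches the threshold, the node is banned.
    """
    if migration_threshold > 0 and fail_count >= migration_threshold:
        return -INFINITY
    return 0


# Values from the log line: 1 failure, max=1 -> lab05 is banned.
print(fail_limit_score(fail_count=1, migration_threshold=1))  # -1000000
```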

If it helps, here is the pingCheck config:

primitive pingCheck ocf:pacemaker:ping \
        params \
                name="pingNodes" \
                host_list="192.168.0.12 192.168.0.13" \
                multiplier="100" \
        op start interval="0" timeout="60s" on-fail="restart" \
        op monitor interval="10" timeout="60s" on-fail="restart" \
        op stop interval="0" timeout="60s" on-fail="ignore"
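For reference, my understanding is that the ping agent sets pingNodes to multiplier × the number of hosts in host_list that answer. So the 200 on lab04 means both 192.168.0.12 and 192.168.0.13 are reachable from it, while lab05 presumably has no pingNodes attribute because the clone instance there is stopped. A quick Python sketch of that arithmetic (the function name is my own, not part of the agent):

```python
# Sketch of how ocf:pacemaker:ping derives the "pingNodes" attribute:
# the multiplier times the number of hosts in host_list that answered.

def ping_attribute(reachable_hosts: int, multiplier: int) -> int:
    """Return the attribute value the ping agent would publish."""
    return reachable_hosts * multiplier


host_list = ["192.168.0.12", "192.168.0.13"]  # from the pingCheck config
multiplier = 100                               # from the pingCheck config

# Both hosts reachable from lab04 -> 200, matching "pingNodes : 200".
print(ping_attribute(len(host_list), multiplier))  # 200
```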

Thanks again,
Brett