[Pacemaker] pgsql stays in Disconnect after online node

Schaefer, Diane E diane.schaefer at unisys.com
Fri Nov 15 07:46:42 EST 2013


We are working with the pgsql RA and Pacemaker 1.1.9, and we are having trouble understanding why our slave resource doesn't come out of DISCONNECT after we bring the node back online. We did the following:

1) Put the node running the Master (usrv-tsegp8) in standby. The old slave became the master and the IP resources moved over.

2) Took the node (usrv-tsegp8) out of standby. Postgres started and went into the HS:alone state, but the data status stayed DISCONNECT.
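
As a rough sketch, the two steps above and a check of the resulting data status might look like the following. This assumes the crm shell was used for the standby/online transitions (the post does not say which tool was used). The helper function names are hypothetical, and the awk parsing assumes the crm_mon node-attribute layout shown in the output below.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical helpers, not part of
# Pacemaker or crmsh. Node names are taken from the post.

# Step 1: put a node in standby so its resources move to the peer.
standby_node() {
    crm node standby "$1"
}

# Step 2: take the node out of standby again.
online_node() {
    crm node online "$1"
}

# Print the "*-data-status" node attribute for the given node, as reported
# by a one-shot crm_mon run (-A includes node attributes, -1 runs once).
# Assumes attribute lines of the form "    + NAME   : VALUE" listed under
# a "* Node NAME:" header, as in the crm_mon output in this post.
data_status() {
    crm_mon -A1 | awk -v node="$1" '
        /^\* Node / { current = $3; sub(/:$/, "", current) }
        current == node && /-data-status/ { print $NF }
    '
}

# Example usage (commented out so that sourcing this file changes nothing):
#   standby_node usrv-tsegp8
#   online_node  usrv-tsegp8
#   data_status  usrv-tsegp8
```

With the cluster state shown in this post, `data_status usrv-tsegp8` would be expected to print DISCONNECT.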

How do we get back to the sync state?

usrv-tsegp9:~ # crm_mon -Afo1
Last updated: Fri Nov 15 07:43:59 2013
Last change: Fri Nov 15 07:43:46 2013 by root via crm_attribute on usrv-tsegp9
Stack: classic openais (with plugin)
Current DC: usrv-tsegp8 - partition with quorum
Version: 1.1.9-2db99f1
2 Nodes configured, 2 expected votes
5 Resources configured.


Online: [ usrv-tsegp8 usrv-tsegp9 ]

stonith-sbd    (stonith:external/sbd): Started usrv-tsegp9
Master/Slave Set: msPostgresql [FM_pgsql]
     Masters: [ usrv-tsegp9 ]
     Slaves: [ usrv-tsegp8 ]
Resource Group: master-group
     FM_access_ip       (ocf::unisys:IPaddr4):  Started usrv-tsegp9
     FM_pgsqlrep_ip     (ocf::unisys:IPaddr4):  Started usrv-tsegp9

Node Attributes:
* Node usrv-tsegp8:
    + FM_pgsql-data-status              : DISCONNECT
    + FM_pgsql-status                   : HS:alone
    + master-FM_pgsql                   : -INFINITY
* Node usrv-tsegp9:
    + FM_pgsql-data-status              : LATEST
    + FM_pgsql-master-baseline          : 0000000004000000
    + FM_pgsql-status                   : PRI
    + master-FM_pgsql                   : 1000

Operations:
* Node usrv-tsegp8:
   stonith-sbd: migration-threshold=1
    + (2825) stop: rc=0 (ok)
   FM_access_ip: migration-threshold=1
    + (2936) monitor: interval=10000ms rc=0 (ok)
    + (2984) stop: rc=0 (ok)
   FM_pgsqlrep_ip: migration-threshold=1
    + (2942) monitor: interval=10000ms rc=0 (ok)
    + (2978) stop: rc=0 (ok)
   FM_pgsql:0: migration-threshold=1
    + (2951) probe: rc=8 (master)
    + (2954) monitor: interval=3000ms rc=8 (master)
    + (2993) start: rc=0 (ok)
    + (2999) monitor: interval=4000ms rc=0 (ok)
* Node usrv-tsegp9:
   stonith-sbd: migration-threshold=1
    + (2075) start: rc=0 (ok)
   FM_access_ip: migration-threshold=1
    + (2226) start: rc=0 (ok)
    + (2230) monitor: interval=10000ms rc=0 (ok)
   FM_pgsqlrep_ip: migration-threshold=1
    + (2232) start: rc=0 (ok)
    + (2236) monitor: interval=10000ms rc=0 (ok)
   FM_pgsql:0: migration-threshold=1
    + (2218) promote: rc=0 (ok)
    + (2224) monitor: interval=3000ms rc=8 (master)

usrv-tsegp9:~ # crm configure show
node usrv-tsegp8 \
        attributes standby="off" FM_pgsql-data-status="LATEST"
node usrv-tsegp9 \
        attributes standby="off" FM_pgsql-data-status="STREAMING|SYNC"
primitive FM_access_ip ocf:unisys:IPaddr4 \
        params ip="172.32.229.101" dotted_netmask="255.255.0.0" nic="eth1" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block"
primitive FM_pgsql ocf:unisys:pgsql \
        params rep_mode="sync" start_opt="-p 5432" node_list="usrv-tsegp8 usrv-tsegp9" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip="172.32.229.102" logfile="/var/tmp/pslog" restart_on_promote="true" \
        op start interval="0s" timeout="60s" \
        op monitor interval="4s" timeout="60s" \
        op monitor interval="3s" role="Master" timeout="60s" \
        op demote interval="0s" timeout="60s" on-fail="stop" \
        op stop interval="0s" timeout="60s" on-fail="block"
primitive FM_pgsqlrep_ip ocf:unisys:IPaddr4 \
        params ip="172.32.229.102" dotted_netmask="255.255.0.0" nic="eth1" \
        op start interval="0s" timeout="60s" on-fail="stop" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="ignore"
primitive stonith-sbd stonith:external/sbd
group master-group FM_access_ip FM_pgsqlrep_ip
ms msPostgresql FM_pgsql \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation pgsql_colocation inf: master-group msPostgresql:Master
order pgsql_order-1 0: msPostgresql:promote master-group:start symmetrical=false
order pgsql_order-2 0: msPostgresql:demote master-group:stop symmetrical=false
property $id="cib-bootstrap-options" \
        stonith-enabled="true" \
        no-quorum-policy="ignore" \
        placement-strategy="balanced" \
        dc-version="1.1.9-2db99f1" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        crmd-transition-delay="0s" \
        last-lrm-refresh="1384519310"
rsc_defaults $id="rsc-options" \
        resource-stickiness="INFINITY" \
        migration-threshold="1" \
        failure-timeout="60s"
op_defaults $id="op-options" \
        timeout="600" \
        record-pending="false"

Thanks for any help
Diane Schaefer