[Pacemaker] pgsql stays in Disconnect after online node

Tue Nov 19 00:16:02 EST 2013

Hi,

(2013/11/15 21:46), Schaefer, Diane E wrote:
> We are working with the pgsql RA and pacemaker 1.1.9.  We are having trouble understanding why our slave resource doesn't come out of DISCONNECT after we online the node.  We did the following:
>
> 1)      Put node running the Master (usrv-tsegp8) in Standby.  The old slave became the master and the Ip resources moved over.
>
> 2)      Took node (usrv-tsegp8) out of standby.  Postgres started and went into the HS:alone state but data status stayed disconnect.
>
> How do we get back to the sync state?

Did you sync the data from the new master to the old master before bringing the old master back online?

The following document may be helpful.

https://github.com/t-matsuo/resource-agents/wiki/Operation-examples-for-none-shared-wal-archives-environment
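
As a rough sketch only, resyncing the old master usually means rebuilding its data directory from the new master before bringing the node online again. The helper below just prints the commands for review and does not run them. The node name and replication IP are taken from your configuration; the data directory, the lock file path, the replication user, and the function name print_resync_steps are assumptions for illustration, and "-X stream" assumes PostgreSQL 9.2 or later. Since you use a customized RA (ocf:unisys:pgsql), please check its tmpdir and pgdata settings first.

```shell
#!/bin/sh
# Sketch only: prints the resync steps for review instead of executing them.
# Assumed values (not in the original post): the data directory, the RA lock
# file path, and the replication user. Adjust them to your environment.

print_resync_steps() {
    node="$1"                                       # node to rebuild (the old master)
    master_ip="$2"                                  # replication IP held by the new master
    pgdata="${3:-/var/lib/pgsql/data}"              # assumed PostgreSQL data directory
    lockfile="${4:-/var/lib/pgsql/tmp/PGSQL.lock}"  # assumed lock file left by the RA
    repuser="${5:-postgres}"                        # assumed replication user

    # Order: keep Pacemaker from starting PostgreSQL, move the stale data
    # aside, copy a fresh base backup from the master, remove the lock file,
    # then let Pacemaker start the node as a slave.
    cat <<EOF
crm node standby $node
mv $pgdata $pgdata.old
pg_basebackup -h $master_ip -U $repuser -D $pgdata -X stream -P
rm -f $lockfile
crm node online $node
EOF
}

print_resync_steps usrv-tsegp8 172.32.229.102
```

The mv, pg_basebackup and rm steps would be run on usrv-tsegp8 as the PostgreSQL OS user. Afterwards, "crm_mon -Afo1" should show whether FM_pgsql-data-status on usrv-tsegp8 has moved from DISCONNECT to STREAMING|SYNC.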

> usrv-tsegp9:~ # crm_mon -Afo1
> Last updated: Fri Nov 15 07:43:59 2013
> Last change: Fri Nov 15 07:43:46 2013 by root via crm_attribute on usrv-tsegp9
> Stack: classic openais (with plugin)
> Current DC: usrv-tsegp8 - partition with quorum
> Version: 1.1.9-2db99f1
> 2 Nodes configured, 2 expected votes
> 5 Resources configured.
>
>
> Online: [ usrv-tsegp8 usrv-tsegp9 ]
>
> stonith-sbd    (stonith:external/sbd): Started usrv-tsegp9
> Master/Slave Set: msPostgresql [FM_pgsql]
>       Masters: [ usrv-tsegp9 ]
>       Slaves: [ usrv-tsegp8 ]
> Resource Group: master-group
>       FM_access_ip       (ocf::unisys:IPaddr4):  Started usrv-tsegp9
>       FM_pgsqlrep_ip     (ocf::unisys:IPaddr4):  Started usrv-tsegp9
>
> Node Attributes:
> * Node usrv-tsegp8:
>      + FM_pgsql-data-status              : DISCONNECT
>      + FM_pgsql-status                   : HS:alone
>      + master-FM_pgsql                   : -INFINITY
> * Node usrv-tsegp9:
>      + FM_pgsql-data-status              : LATEST
>      + FM_pgsql-master-baseline          : 0000000004000000
>      + FM_pgsql-status                   : PRI
>      + master-FM_pgsql                   : 1000
>
> Operations:
> * Node usrv-tsegp8:
>     stonith-sbd: migration-threshold=1
>      + (2825) stop: rc=0 (ok)
>     FM_access_ip: migration-threshold=1
>      + (2936) monitor: interval=10000ms rc=0 (ok)
>      + (2984) stop: rc=0 (ok)
>     FM_pgsqlrep_ip: migration-threshold=1
>      + (2942) monitor: interval=10000ms rc=0 (ok)
>      + (2978) stop: rc=0 (ok)
>     FM_pgsql:0: migration-threshold=1
>      + (2951) probe: rc=8 (master)
>      + (2954) monitor: interval=3000ms rc=8 (master)
>      + (2993) start: rc=0 (ok)
>      + (2999) monitor: interval=4000ms rc=0 (ok)
> * Node usrv-tsegp9:
>     stonith-sbd: migration-threshold=1
>      + (2075) start: rc=0 (ok)
>     FM_access_ip: migration-threshold=1
>      + (2226) start: rc=0 (ok)
>      + (2230) monitor: interval=10000ms rc=0 (ok)
>     FM_pgsqlrep_ip: migration-threshold=1
>      + (2232) start: rc=0 (ok)
>      + (2236) monitor: interval=10000ms rc=0 (ok)
>     FM_pgsql:0: migration-threshold=1
>      + (2218) promote: rc=0 (ok)
>      + (2224) monitor: interval=3000ms rc=8 (master)
>
> usrv-tsegp9:~ # crm configure show
> node usrv-tsegp8 \
>          attributes standby="off" FM_pgsql-data-status="LATEST"
> node usrv-tsegp9 \
>          attributes standby="off" FM_pgsql-data-status="STREAMING|SYNC"
> primitive FM_access_ip ocf:unisys:IPaddr4 \
>          params ip="172.32.229.101" dotted_netmask="255.255.0.0" nic="eth1" \
>          op start interval="0s" timeout="60s" on-fail="restart" \
>          op monitor interval="10s" timeout="60s" on-fail="restart" \
>          op stop interval="0s" timeout="60s" on-fail="block"
> primitive FM_pgsql ocf:unisys:pgsql \
>          params rep_mode="sync" start_opt="-p 5432" node_list="usrv-tsegp8 usrv-tsegp9" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip="172.32.229.102" logfile="/var/tmp/pslog" restart_on_promote="true" \
>          op start interval="0s" timeout="60s" \
>          op monitor interval="4s" timeout="60s" \
>          op monitor interval="3s" role="Master" timeout="60s" \
>          op demote interval="0s" timeout="60s" on-fail="stop" \
>          op stop interval="0s" timeout="60s" on-fail="block"
> primitive FM_pgsqlrep_ip ocf:unisys:IPaddr4 \
>          params ip="172.32.229.102" dotted_netmask="255.255.0.0" nic="eth1" \
>          op start interval="0s" timeout="60s" on-fail="stop" \
>          op monitor interval="10s" timeout="60s" on-fail="restart" \
>          op stop interval="0s" timeout="60s" on-fail="ignore"
> primitive stonith-sbd stonith:external/sbd
> group master-group FM_access_ip FM_pgsqlrep_ip
> ms msPostgresql FM_pgsql \
>          meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation pgsql_colocation inf: master-group msPostgresql:Master
> order pgsql_order-1 0: msPostgresql:promote master-group:start symmetrical=false
> order pgsql_order-2 0: msPostgresql:demote master-group:stop symmetrical=false
> property $id="cib-bootstrap-options" \
>          stonith-enabled="true" \
>          no-quorum-policy="ignore" \
>          placement-strategy="balanced" \
>          dc-version="1.1.9-2db99f1" \
>          cluster-infrastructure="classic openais (with plugin)" \
>          expected-quorum-votes="2" \
>          crmd-transition-delay="0s" \
>          last-lrm-refresh="1384519310"
> rsc_defaults $id="rsc-options" \
>          resource-stickiness="INFINITY" \
>          migration-threshold="1" \
>          failure-timeout="60s"
> op_defaults $id="op-options" \
>          timeout="600" \
>          record-pending="false"
>
> Thanks for any help
> Diane Schaefer

-- 
NTTデータ先端技術株式会社
中平 和友
TEL: 03-5860-5135 FAX: 03-5463-6490
Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp