[Pacemaker] pgsql RA - slave is in HS:ASYNC status and won't promote
東一彦
higashi.kazuhiko at lab.ntt.co.jp
Tue Jan 14 01:50:59 UTC 2014
Hi,
> but after some tests something went wrong and I don't know what, why, or how to get it working again ... now when I start crm, the master is PRI, but the slave gets into HS:ASYNC state .. and when the master fails, the slave gets into HS:alone state
PostgreSQL, not the RA, decides whether a standby node is "sync" or "async".
The pgsql RA only displays the result of the following SQL:
select application_name,upper(state),upper(sync_state) from pg_stat_replication;
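To see what the RA sees, the same query can be run by hand on the master. The following is only a sketch: the psql path is taken from the configuration quoted below, while the database user "postgres" and the helper name classify_row are assumptions made for illustration. The helper labels one row of psql's unaligned (-At) output.

```shell
# Hypothetical helper (not part of the pgsql RA): label one row of the
# query output. Expected input: application_name|STATE|SYNC_STATE,
# which is what "psql -At" prints for the three selected columns.
classify_row() {
    # Split the row on "|" into its three columns.
    IFS='|' read -r app state sync_state <<EOF
$1
EOF
    case "$state:$sync_state" in
        STREAMING:SYNC)      echo "$app: streaming, synchronous" ;;
        STREAMING:POTENTIAL) echo "$app: streaming, potential synchronous" ;;
        STREAMING:ASYNC)     echo "$app: streaming, asynchronous" ;;
        *)                   echo "$app: not streaming normally ($state/$sync_state)" ;;
    esac
}

# Example invocation on the master (user "postgres" is an assumption):
#   /opt/postgres/9.3/bin/psql -U postgres -At -c \
#     "select application_name,upper(state),upper(sync_state) from pg_stat_replication;" |
#   while read -r row; do classify_row "$row"; done
```

If the query returns no rows at all, no standby is currently connected to the master.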
So, first, please check PostgreSQL's log.
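For reference, PostgreSQL reports a standby as "sync" only when its application_name matches an entry in synchronous_standby_names on the master, so it may also help to compare that setting with the application_name shown by the query above. The sketch below illustrates the comparison in a simplified form (case-insensitive, spaces and double quotes ignored). is_sync_candidate is a hypothetical helper, and the database user "postgres" is an assumption.

```shell
# Hypothetical helper: succeed if the standby name ($1) appears in a
# synchronous_standby_names value ($2), or if the list contains "*".
is_sync_candidate() {
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    # Lower-case the list and drop spaces and double quotes.
    list=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' | tr -d ' "')
    case ",$list," in
        *",$name,"*|*",*,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on the master (psql path from the quoted configuration; the
# user "postgres" is an assumption):
#   names=$(/opt/postgres/9.3/bin/psql -U postgres -At -c "show synchronous_standby_names;")
#   is_sync_candidate jboss-test2 "$names" && echo "listed" || echo "not listed"
```

A standby that is not listed can only ever be reported as async, whatever its replication state.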
The data on the two nodes may have become inconsistent.
If so, you can resolve the inconsistency with the following procedure:
http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster#after_fail-over
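A rough outline of that kind of recovery is sketched below for a node whose PostgreSQL is stopped and which should rejoin as a slave. The steps are printed rather than executed because they are destructive, and the linked page remains the authoritative description. The outline assumes the usual approach for this RA: take a fresh base backup from the current master, remove the RA's lock file, then clear the failure in Pacemaker. The function name, the replication user "postgres", and the lock file path (believed to be the RA default when tmpdir is not set) are assumptions; the data directory and master address come from the quoted configuration.

```shell
# Hypothetical dry-run helper: print (do not run) the resync steps.
# $1 = data directory, $2 = address of the current master,
# $3 = lock file left by the RA (path is an assumption).
print_resync_steps() {
    pgdata=$1; master=$2; lockfile=$3
    echo "# run as the postgres OS user on the node to be resynchronized"
    echo "mv $pgdata $pgdata.old    # keep the old data until the resync is verified"
    echo "pg_basebackup -h $master -U postgres -D $pgdata -X stream -P"
    echo "rm -f $lockfile"
    echo "# then, as root:"
    echo "crm resource cleanup msPostgresql"
}

print_resync_steps /opt/postgres/9.3/data 172.16.111.120 /var/lib/pgsql/tmp/PGSQL.lock
```

Printing the commands first makes it possible to review the paths and the master address before anything is removed or overwritten.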
Regards,
Kazuhiko HIGASHI
(2014/01/10 17:48), Tomáš Vajrauch wrote:
> Hi,
>
> I am trying to run a PostgreSQL cluster with streaming replication using the pgsql RA and Pacemaker.
> I succeeded once: the master was PRI, the slave was HS:sync, and failover worked as it should (the slave became master).
> but after some tests something went wrong and I don't know what, why, or how to get it working again ... now when I start crm, the master is PRI, but the slave gets into HS:ASYNC state .. and when the master fails, the slave gets into HS:alone state
>
> Can somebody please give me a hint about what I should do or what I should look for?
>
> Thanks a lot for any help
> Tomas
>
> my configuration:
>
> node jboss-test \
> attributes pgsql-data-status="LATEST"
> node jboss-test2 \
> attributes pgsql-data-status="STREAMING|ASYNC"
> primitive pgsql ocf:heartbeat:pgsql \
> params pgctl="/opt/postgres/9.3/bin/pg_ctl" psql="/opt/postgres/9.3/bin/psql" pgdata="/opt/postgres/9.3/data/" rep_mode="sync" node_list="jboss-test jboss-test2" restore_command="cp /opt/postgres/9.3/data/pg_archive/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip="172.16.111.120" stop_escalate="0" \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="block" \
> op monitor interval="11s" timeout="60s" on-fail="restart" \
> op monitor interval="10s" role="Master" timeout="60s" on-fail="restart" \
> op promote interval="0s" timeout="60s" on-fail="restart" \
> op demote interval="0s" timeout="60s" on-fail="block" \
> op notify interval="0s" timeout="60s"
> primitive pingCheck ocf:pacemaker:ping \
> params name="default_ping_set" host_list="172.16.0.1" multiplier="100" \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op monitor interval="2s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="ignore"
> primitive vip-master ocf:heartbeat:IPaddr2 \
> params ip="172.16.111.110" nic="eth0" cidr_netmask="24" \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op monitor interval="10s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="block"
> primitive vip-rep ocf:heartbeat:IPaddr2 \
> params ip="172.16.111.120" nic="eth0" cidr_netmask="24" \
> meta migration-threshold="0" \
> op start interval="0s" timeout="60s" on-fail="stop" \
> op monitor interval="10s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="block"
> primitive vip-slave ocf:heartbeat:IPaddr2 \
> params ip="172.16.111.111" nic="eth0" cidr_netmask="24" \
> meta resource-stickiness="1" \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op monitor interval="10s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="block"
> group master-group vip-master vip-rep \
> meta ordered="false"
> ms msPostgresql pgsql \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> clone clnPingCheck pingCheck
> location rsc_location-1 vip-slave \
> rule $id="rsc_location-1-rule" 200: pgsql-status eq HS:sync \
> rule $id="rsc_location-1-rule-0" 190: pgsql-status eq HS:async \
> rule $id="rsc_location-1-rule-1" 100: pgsql-status eq PRI \
> rule $id="rsc_location-1-rule-2" -inf: not_defined pgsql-status \
> rule $id="rsc_location-1-rule-3" -inf: pgsql-status ne HS:sync and pgsql-status ne PRI and pgsql-status ne HS:async
> location rsc_location-2 msPostgresql \
> rule $id="rsc_location-3-rule" -inf: not_defined default_ping_set or default_ping_set lt 100
> colocation rsc_colocation-1 inf: msPostgresql clnPingCheck
> colocation rsc_colocation-2 inf: master-group msPostgresql:Master
> order rsc_order-1 0: clnPingCheck msPostgresql
> order rsc_order-2 0: msPostgresql:promote master-group:start symmetrical=false
> order rsc_order-3 0: msPostgresql:demote master-group:stop symmetrical=false
> property $id="cib-bootstrap-options" \
> no-quorum-policy="ignore" \
> stonith-enabled="false" \
> crmd-transition-delay="0s" \
> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> last-lrm-refresh="1389301940"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="INFINITY" \
> migration-threshold="1"
>
> crm_mon -Afr:
> ============
> Last updated: Fri Jan 10 09:46:29 2014
> Last change: Fri Jan 10 09:46:29 2014 by root via crm_attribute on jboss-test
> Stack: openais
> Current DC: jboss-test - partition with quorum
> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> 2 Nodes configured, 2 expected votes
> 7 Resources configured.
> ============
>
> Online: [ jboss-test jboss-test2 ]
>
> Full list of resources:
>
> Clone Set: clnPingCheck [pingCheck]
> Started: [ jboss-test jboss-test2 ]
> Master/Slave Set: msPostgresql [pgsql]
> Masters: [ jboss-test ]
> Slaves: [ jboss-test2 ]
> vip-slave (ocf::heartbeat:IPaddr2): Started jboss-test2
> Resource Group: master-group
> vip-master (ocf::heartbeat:IPaddr2): Started jboss-test
> vip-rep (ocf::heartbeat:IPaddr2): Started jboss-test
>
> Node Attributes:
> * Node jboss-test:
> + default_ping_set : 100
> + master-pgsql:0 : 1000
> + pgsql-data-status : LATEST
> + pgsql-master-baseline : 0000000039004DF0
> + pgsql-status : PRI
> * Node jboss-test2:
> + default_ping_set : 100
> + master-pgsql:1 : -INFINITY
> + pgsql-data-status : STREAMING|ASYNC
> + pgsql-status : HS:async
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
--
----------------------------------------------------
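Kazuhiko Higashi
High Reliability Group, Platform Technology Unit, NTT OSS Center
(OSS Promotion Project, Software Innovation Center, NTT Service Innovation Laboratory Group)
Mail:higashi.kazuhiko at lab.ntt.co.jp
Tel :03-5860-5135
NTT Shinagawa TWINS Building 11F, 1-9-1 Konan, Minato-ku, Tokyo 108-8019, Japan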
----------------------------------------------------