[Pacemaker] pgsql troubles.

Wed Jan 7 23:13:43 EST 2015

> On 5 Dec 2014, at 4:16 am, steve <steve at unliketea.com> wrote:
> 
> Good Afternoon,
> 
> 
> I am having loads of trouble with pacemaker/corosync/postgres. Defining the symptoms is rather difficult. The primary one is that postgres starts as a slave on both nodes. I have tested the pgsql RA start/stop/status/monitor actions and they work from the command line after I set up the environment. I have not been able to get promote/demote to work; there are issues with NODENAME not being defined.

You're trying to follow http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster ?

It's not being promoted because of:

>    + master-pgsql:0                    : -INFINITY

which is set as part of the start action.
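
If you want to spot this quickly in your own output, here is a minimal Python sketch that scans crm_mon "Node Attributes" text (e.g. from `crm_mon -1 -A`) and lists which nodes have a master-* score of -INFINITY. It is not part of Pacemaker or the pgsql RA; `blocked_masters` is a made-up helper, and it assumes the attribute lines are formatted exactly like the ones quoted in this thread.

```python
import re


def blocked_masters(crm_mon_text: str) -> dict[str, list[str]]:
    """Map each node name to its master-* attributes whose score is -INFINITY.

    Assumes crm_mon -A style lines such as:
      * Node tstdb03:
         + master-pgsql:0                    : -INFINITY
    """
    blocked: dict[str, list[str]] = {}
    node = None
    for line in crm_mon_text.splitlines():
        # Drop leading mail-quote markers, spaces and tabs, plus trailing blanks.
        stripped = line.lstrip("> \t").rstrip()
        m = re.match(r"\* Node (\S+):$", stripped)
        if m:
            node = m.group(1)  # attributes that follow belong to this node
            continue
        m = re.match(r"\+ (master-\S+)\s*:\s*(\S+)$", stripped)
        if m and node is not None and m.group(2) == "-INFINITY":
            blocked.setdefault(node, []).append(m.group(1))
    return blocked
```

Run against the status output below, it should flag master-pgsql:0 on tstdb03 and master-pgsql:1 on tstdb04, i.e. neither node is eligible for promotion.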

Typically I've seen that -INFINITY score as a result of 'node_list' being "wrong" in some way.
I'm no expert though.
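
One way to rule node_list in or out is to compare it with the node names the cluster itself reports (for example `crm_node -l`, or `uname -n` on each node). Below is a rough Python sketch of that comparison. `check_node_list` is a made-up helper, and it assumes the RA needs each node name to appear in node_list spelled exactly as the cluster spells it; the agent's real matching rules (case handling, etc.) may differ, so treat it as a sanity check only.

```python
def check_node_list(node_list: str, cluster_nodes: list[str]) -> list[str]:
    """Return problems found when comparing a space-separated node_list
    with the node names the cluster actually uses (exact, case-sensitive match)."""
    listed = node_list.split()
    problems = []
    if len(set(listed)) != len(listed):
        problems.append("node_list contains duplicate entries")
    # Every cluster member should be named in node_list...
    for node in cluster_nodes:
        if node not in listed:
            problems.append(f"cluster node {node!r} is missing from node_list")
    # ...and every node_list entry should be a real cluster member.
    for name in listed:
        if name not in cluster_nodes:
            problems.append(f"node_list entry {name!r} is not a cluster node")
    return problems
```

For instance, if a node had registered as tstdb03.example.com (a hypothetical FQDN), node_list="tstdb03 tstdb04" would be reported as not matching it.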

> 
> I am able to run postgres in master/slave mode outside of pacemaker.
> 
> I can provide additional logs but here is a start.
> 
> Distributor ID:	Ubuntu
> Description:	Ubuntu 12.04.3 LTS
> Release:	12.04
> Codename:	precise
> 
> Latest version of the pgsql RA (as of yesterday)
> pacemaker        1.1.6-2ubuntu3.1     HA cluster resource manager
> corosync         1.4.2-2              Standards-based cluster framework (daemon and module
> resource-agents  1:3.9.2-5ubuntu4.1   Cluster Resource Agents
> I have upgraded the pgsql RA to the latest from git.
> 
> 
> ============
> Last updated: Wed Nov 26 13:55:59 2014
> Last change: Wed Nov 26 13:55:58 2014 via crm_attribute on tstdb04
> Stack: openais
> Current DC: tstdb04 - partition with quorum
> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
> 
> Online: [ tstdb03 tstdb04 ]
> 
> Full list of resources:
> 
> Resource Group: master-group
>     vip-master (ocf::heartbeat:IPaddr2):       Stopped
>     vip-rep    (ocf::heartbeat:IPaddr2):       Stopped
> Master/Slave Set: msPostgresql [pgsql]
>     Slaves: [ tstdb04 ]
>     Stopped: [ pgsql:0 ]
> 
> Node Attributes:
> * Node tstdb03:
>    + master-pgsql:0                    : -INFINITY
>    + pgsql-data-status                 : DISCONNECT
> * Node tstdb04:
>    + master-pgsql:1                    : -INFINITY
>    + pgsql-data-status                 : DISCONNECT
> 
> Migration summary:
> * Node tstdb04:
> * Node tstdb03:
>   pgsql:0: migration-threshold=1 fail-count=1000000
> 
> Failed actions:
>    pgsql:0_start_0 (node=tstdb03, call=5, rc=1, status=complete): unknown error
> 
> 
> config:
> property \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     crmd-transition-delay="0"
> 
> rsc_defaults \
>     resource-stickiness="INFINITY" \
>     migration-threshold="1"
> 
> group master-group \
>       vip-master \
>       vip-rep
> 
> primitive vip-master ocf:heartbeat:IPaddr2 \
>     params \
>         ip="10.132.101.95" \
>         nic="eth0" \
>         cidr_netmask="24" \
>     op start   timeout="60s" interval="0"  on-fail="restart" \
>     op monitor timeout="60s" interval="10s" on-fail="restart" \
>     op stop    timeout="60s" interval="0"  on-fail="block"
> 
> primitive vip-rep ocf:heartbeat:IPaddr2 \
>     params \
>         ip="10.132.101.96" \
>         nic="eth0" \
>         cidr_netmask="24" \
>     meta \
>         migration-threshold="0" \
>     op start   timeout="60s" interval="0"  on-fail="stop" \
>     op monitor timeout="60s" interval="10s" on-fail="restart" \
>     op stop    timeout="60s" interval="0"  on-fail="ignore"
> 
> master msPostgresql pgsql \
>     meta \
>         master-max="1" \
>         master-node-max="1" \
>         clone-max="2" \
>         clone-node-max="1" \
>         notify="true"
> 
> primitive pgsql ocf:heartbeat:pgsql \
>     params \
>         pgctl="/usr/bin/pg_ctl" \
>         psql="/usr/bin/psql" \
>         pgdata="/database/9.3" \
>         config="/etc/postgresql/9.3/main/postgresql.conf" \
>         socketdir="/var/run/postgresql" \
>         rep_mode="sync" \
>         node_list="tstdb03 tstdb04" \
>         restore_command="cp /database/archive/%f %p" \
>         primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
>         master_ip="10.132.101.95" \
>         restart_on_promote="true" \
>         logfile="/var/log/postgresql/postgresql-9.3-main.log" \
>     op start   timeout="60s" interval="0"  on-fail="restart" \
>     op monitor timeout="60s" interval="4s" on-fail="restart" \
>     op monitor timeout="60s" interval="3s"  on-fail="restart" role="Master" \
>     op promote timeout="60s" interval="0"  on-fail="restart" \
>     op demote  timeout="60s" interval="0"  on-fail="stop" \
>     op stop    timeout="60s" interval="0"  on-fail="block" \
>     op notify  timeout="60s" interval="0"
> 
> #colocation rsc_colocation-1 inf: vip-master msPostgresql:Master
> #order rsc_order-1 0: msPostgresql:promote  vip-master:start  symmetrical=false
> #order rsc_order-2 0: msPostgresql:demote   vip-rep:stop   symmetrical=false
> 
> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
> order rsc_order-1 0: msPostgresql:promote  master-group:start  symmetrical=false
> order rsc_order-2 0: msPostgresql:demote   master-group:stop   symmetrical=false
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org