[Pacemaker] pgsql troubles.

Thu Dec 4 17:16:16 UTC 2014

Good Afternoon,

I am having a lot of trouble with pacemaker/corosync/postgres, and the
symptoms are hard to pin down. The main one is that postgres starts as
a slave on both nodes. I have tested the pgsql RA
start/stop/status/monitor actions, and they work from the command line
after I set up the environment. I have not been able to get
promote/demote to work; there are issues with NODENAME not being defined.
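
A manual test of this kind can be sketched as follows. This is only a sketch under assumptions, not the exact commands used: the OCF root /usr/lib/ocf is the usual default for the resource-agents package, the OCF_RESKEY_* values are copied from the primitive definition further down, and run_ra is a hypothetical helper name. Promote and demote expect additional variables that pacemaker normally supplies, so they are not covered here.

```shell
#!/bin/sh
# Sketch: run the pgsql resource agent by hand, outside pacemaker.
# Assumption: resource agents are installed under /usr/lib/ocf.
export OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}"

# Resource parameters are passed to an OCF agent as OCF_RESKEY_<name>.
# Values mirror the "primitive pgsql" definition in the config below.
export OCF_RESKEY_pgctl="/usr/bin/pg_ctl"
export OCF_RESKEY_psql="/usr/bin/psql"
export OCF_RESKEY_pgdata="/database/9.3"
export OCF_RESKEY_config="/etc/postgresql/9.3/main/postgresql.conf"
export OCF_RESKEY_socketdir="/var/run/postgresql"
export OCF_RESKEY_rep_mode="sync"
export OCF_RESKEY_node_list="tstdb03 tstdb04"
export OCF_RESKEY_master_ip="10.132.101.95"

RA="$OCF_ROOT/resource.d/heartbeat/pgsql"

# run_ra is a hypothetical helper: $1 is the OCF action to invoke.
run_ra() {
    if [ -x "$RA" ]; then
        rc=0
        "$RA" "$1" || rc=$?
        echo "action=$1 rc=$rc"
    else
        echo "resource agent not found at $RA; skipping $1"
    fi
}

run_ra monitor
```

An exit code of 0 from monitor means the resource is running, and 7 means it is cleanly stopped, per the OCF return code conventions.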

I am able to run postgres in master/slave mode outside of pacemaker.
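
One way to confirm which role each node has outside of pacemaker is to ask postgres whether it is in recovery. The sketch below assumes a local connection as the postgres user through the socket directory from the config; role_from_recovery_flag is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether the local postgres instance is a master or a slave.
# Assumption: socket directory matches the socketdir parameter in the config.
SOCKETDIR="/var/run/postgresql"

# pg_is_in_recovery() returns t on a standby and f on a primary.
role_from_recovery_flag() {
    case "$1" in
        t) echo "slave" ;;
        f) echo "master" ;;
        *) echo "unknown" ;;
    esac
}

if command -v psql >/dev/null 2>&1; then
    # -w never prompts for a password; -At prints the bare value only.
    flag=$(psql -w -h "$SOCKETDIR" -U postgres -Atc "SELECT pg_is_in_recovery();" 2>/dev/null)
    echo "local role: $(role_from_recovery_flag "$flag")"
else
    echo "psql not found; skipping check"
fi
```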

I can provide additional logs but here is a start.

Distributor ID:	Ubuntu
Description:	Ubuntu 12.04.3 LTS
Release:	12.04
Codename:	precise

Package versions (the pgsql RA itself was upgraded to the latest from git yesterday):
pacemaker          1.1.6-2ubuntu3.1      HA cluster resource manager
corosync           1.4.2-2               Standards-based cluster framework (daemon and module
resource-agents    1:3.9.2-5ubuntu4.1    Cluster Resource Agents

============
Last updated: Wed Nov 26 13:55:59 2014
Last change: Wed Nov 26 13:55:58 2014 via crm_attribute on tstdb04
Stack: openais
Current DC: tstdb04 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ tstdb03 tstdb04 ]

Full list of resources:

  Resource Group: master-group
      vip-master (ocf::heartbeat:IPaddr2):       Stopped
      vip-rep    (ocf::heartbeat:IPaddr2):       Stopped
  Master/Slave Set: msPostgresql [pgsql]
      Slaves: [ tstdb04 ]
      Stopped: [ pgsql:0 ]

Node Attributes:
* Node tstdb03:
     + master-pgsql:0                    : -INFINITY
     + pgsql-data-status                 : DISCONNECT
* Node tstdb04:
     + master-pgsql:1                    : -INFINITY
     + pgsql-data-status                 : DISCONNECT

Migration summary:
* Node tstdb04:
* Node tstdb03:
    pgsql:0: migration-threshold=1 fail-count=1000000

Failed actions:
     pgsql:0_start_0 (node=tstdb03, call=5, rc=1, status=complete): unknown error

config:
property \
      no-quorum-policy="ignore" \
      stonith-enabled="false" \
      crmd-transition-delay="0"

rsc_defaults \
      resource-stickiness="INFINITY" \
      migration-threshold="1"

group master-group \
        vip-master \
        vip-rep

primitive vip-master ocf:heartbeat:IPaddr2 \
      params \
          ip="10.132.101.95" \
          nic="eth0" \
          cidr_netmask="24" \
      op start   timeout="60s" interval="0"  on-fail="restart" \
      op monitor timeout="60s" interval="10s" on-fail="restart" \
      op stop    timeout="60s" interval="0"  on-fail="block"

primitive vip-rep ocf:heartbeat:IPaddr2 \
      params \
          ip="10.132.101.96" \
          nic="eth0" \
          cidr_netmask="24" \
      meta \
              migration-threshold="0" \
      op start   timeout="60s" interval="0"  on-fail="stop" \
      op monitor timeout="60s" interval="10s" on-fail="restart" \
      op stop    timeout="60s" interval="0"  on-fail="ignore"

master msPostgresql pgsql \
      meta \
          master-max="1" \
          master-node-max="1" \
          clone-max="2" \
          clone-node-max="1" \
          notify="true"

primitive pgsql ocf:heartbeat:pgsql \
      params \
          pgctl="/usr/bin/pg_ctl" \
          psql="/usr/bin/psql" \
          pgdata="/database/9.3" \
          config="/etc/postgresql/9.3/main/postgresql.conf" \
          socketdir="/var/run/postgresql" \
          rep_mode="sync" \
          node_list="tstdb03 tstdb04" \
          restore_command="cp /database/archive/%f %p" \
          primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
          master_ip="10.132.101.95" \
          restart_on_promote="true" \
          logfile="/var/log/postgresql/postgresql-9.3-main.log" \
      op start   timeout="60s" interval="0"  on-fail="restart" \
      op monitor timeout="60s" interval="4s" on-fail="restart" \
      op monitor timeout="60s" interval="3s"  on-fail="restart" role="Master" \
      op promote timeout="60s" interval="0"  on-fail="restart" \
      op demote  timeout="60s" interval="0"  on-fail="stop" \
      op stop    timeout="60s" interval="0"  on-fail="block" \
      op notify  timeout="60s" interval="0"

#colocation rsc_colocation-1 inf: vip-master msPostgresql:Master
#order rsc_order-1 0: msPostgresql:promote  vip-master:start  symmetrical=false
#order rsc_order-2 0: msPostgresql:demote   vip-rep:stop   symmetrical=false

colocation rsc_colocation-1 inf: master-group msPostgresql:Master
order rsc_order-1 0: msPostgresql:promote  master-group:start  symmetrical=false
order rsc_order-2 0: msPostgresql:demote   master-group:stop   symmetrical=false