[Pacemaker] Postgresql Replication

Thu Sep 12 10:00:31 UTC 2013

Hi Eloy

2013/9/12 Eloy Coto Pereiro <eloy.coto at gmail.com>:
> Hi,
>
> I have issues with this config, for example when master is running corosync
> service use pg_ctl. But in the slave pg_ctl doesn't start and replication
> doesn't work.
>
> This is my data:
>
>
> Online: [ master slave ]
> OFFLINE: [ ]
>
> Full list of resources:
>
> ClusterIP (ocf::heartbeat:IPaddr2): Started master
> KAMAILIO        (lsb:kamailio): Started master
>  Master/Slave Set: msPostgresql [pgsql]
>      Masters: [ master ]
>      Stopped: [ pgsql:1 ]
>
> Node Attributes:
> * Node master:
>     + maintenance                       : off
>     + master-pgsql                      : 1000
>     + pgsql-data-status                 : LATEST
>     + pgsql-master-baseline             : 0000000019000080
>     + pgsql-status                      : PRI
> * Node slave:
>     + pgsql-data-status                 : DISCONNECT
>     + pgsql-status                      : HS:sync
>
>
> In my crm configure show is this:
> node master \
> attributes maintenance="off" pgsql-data-status="LATEST"
> node slave \
> attributes pgsql-data-status="DISCONNECT"
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
> params ip="10.1.1.1" cidr_netmask="24" \
> op monitor interval="15s" \
> op start timeout="60s" interval="0s" on-fail="stop" \
> op monitor timeout="60s" interval="10s" on-fail="restart" \
> op stop timeout="60s" interval="0s" on-fail="block"
> primitive KAMAILIO lsb:kamailio \
> op monitor interval="10s" \
> op start interval="0" timeout="120s" \
> op stop interval="0" timeout="120s" \
> meta target-role="Started"
> primitive pgsql ocf:heartbeat:pgsql \
> params pgctl="/usr/pgsql-9.2/bin/pg_ctl" psql="/usr/pgsql-9.2/bin/psql"
> pgdata="/var/lib/pgsql/9.2/data/" rep_mode="sync" node_list="master slave"
> restore_command="cp /var/lib/pgsql/9.2/pg_archive/%f %p"
> primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5
> keepalives_count=5" master_ip="10.1.1.1" restart_on_promote="true" \
> op start timeout="60s" interval="0s" on-fail="restart" \
> op monitor timeout="60s" interval="4s" on-fail="restart" \
> op monitor timeout="60s" interval="3s" on-fail="restart" role="Master" \
> op promote timeout="60s" interval="0s" on-fail="restart" \
> op demote timeout="60s" interval="0s" on-fail="stop" \
> op stop timeout="60s" interval="0s" on-fail="block" \
> op notify timeout="60s" interval="0s"
> ms msPostgresql pgsql \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> notify="true" target-role="Started"
> location cli-prefer-KAMAILIO KAMAILIO \
> rule $id="cli-prefer-rule-KAMAILIO" inf: #uname eq master
> location cli-prefer-pgsql msPostgresql \
> rule $id="cli-prefer-rule-pgsql" inf: #uname eq master
> location cli-standby-ClusterIP ClusterIP \
> rule $id="cli-standby-rule-ClusterIP" -inf: #uname eq slave

This location constraint is a problem.
It means that ClusterIP can never run on slave.
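
If that constraint is only a leftover from moving the resource by hand
(the "cli-" prefix suggests so, but that is an assumption on my side),
something like the following should remove it. This is a sketch; check
"crm configure show" before and after.

  # delete the constraint by the id shown in "crm configure show"
  crm configure delete cli-standby-ClusterIP

  # or let crm drop the constraints that an earlier move of ClusterIP created
  crm resource unmigrate ClusterIP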

> colocation colocation-1 inf: ClusterIP msPostgresql KAMAILIO

Does PostgreSQL need KAMAILIO to start?
This colocation ties every instance of msPostgresql, not only the Master,
to ClusterIP and KAMAILIO, so Pacemaker can't start PostgreSQL on slave.

The sample setting is:
  colocation rsc_colocation-1 inf: master-group msPostgresql:Master

To begin with, you might want to start from the sample settings and customize them.
http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster#sample_settings_for_crm_command
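
As a rough sketch only, the sample constraints adapted to your resource
names might look like the lines below. The group name "master-group" comes
from the sample, and putting ClusterIP and KAMAILIO into it is my assumption
about what you want (both following the PostgreSQL master). These lines
would take the place of your colocation-1 and order-1.

  # hypothetical group: the resources that must follow the PostgreSQL master
  group master-group ClusterIP KAMAILIO
  # tie the group to the Master role only, so the slave instance can still start
  colocation rsc_colocation-1 inf: master-group msPostgresql:Master
  # start the group after promote, stop it after demote
  order rsc_order-1 0: msPostgresql:promote master-group:start symmetrical=false
  order rsc_order-2 0: msPostgresql:demote master-group:stop symmetrical=false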

Also, please check the logs, because the pgsql RA writes useful messages there.

> order order-1 inf: ClusterIP msPostgresql KAMAILIO
> property $id="cib-bootstrap-options" \
> dc-version="1.1.8-7.el6-394e906" \
> cluster-infrastructure="classic openais (with plugin)" \
> expected-quorum-votes="2" \
> stonith-enabled="false"
>
> Any idea why doesn't start on the second slave?
>
> More info:
>
> Master:
>
> [root at master ~]# netstat -putan | grep 5432 | grep LISTEN
> tcp        0      0 0.0.0.0:5432                0.0.0.0:*
> LISTEN      3241/postgres
> tcp        0      0 :::5432                     :::*
> LISTEN      3241/postgres
> [root at master ~]# ps axu | grep postgres
> postgres  3241  0.0  0.0  97072  7692 ?        S    11:41   0:00
> /usr/pgsql-9.2/bin/postgres -D /var/lib/pgsql/9.2/data -c
> config_file=/var/lib/pgsql/9.2/data//postgresql.conf
> postgres  3293  0.0  0.0  97072  1556 ?        Ss   11:41   0:00 postgres:
> checkpointer process
> postgres  3294  0.0  0.0  97072  1600 ?        Ss   11:41   0:00 postgres:
> writer process
> postgres  3295  0.0  0.0  97072  1516 ?        Ss   11:41   0:00 postgres:
> wal writer process
> postgres  3296  0.0  0.0  97920  2760 ?        Ss   11:41   0:00 postgres:
> autovacuum launcher process
> postgres  3297  0.0  0.0  82712  1500 ?        Ss   11:41   0:00 postgres:
> archiver process   failed on 000000010000000000000001
> postgres  3298  0.0  0.0  82872  1568 ?        Ss   11:41   0:00 postgres:
> stats collector process
> root     10901  0.0  0.0 103232   852 pts/0    S+   11:44   0:00 grep
> postgres
>
>
> On slave:
>
> [root at slave ~]# ps axu | grep postgre
> root      3332  0.0  0.0 103232   856 pts/0    S+   11:45   0:00 grep
> postgre
> [root at slave ~]# netstat -putan | grep 5432
> [root at slave ~]#
>
>
> If I make pg_ctl /var/lib/pgsql/9.2/data/ start work ok
>
> Any idea?
>
>
> 2013/9/11 Takatoshi MATSUO <matsuo.tak at gmail.com>
>>
>> Hi Eloy
>>
>> Please see http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster .
>> In the document, it uses virtual IP to receive connection,
>> so it doesn't need to change recovery.conf.
>>
>> Thanks,
>> Takatoshi MATSUO
>>
>>
>> 2013/9/11 Eloy Coto Pereiro <eloy.coto at gmail.com>:
>> > Hi,
>> >
>> > In Postgresql if you use wal replication
>> > <http://wiki.postgresql.org/wiki/Streaming_Replication> when the master
>> > servers fails need to change the recovery.conf on the slave server.
>> >
>> > In this case any tool, when the master is down, execute a command and
>> > get
>> > this info?
>> > Is this the right tool for postgresql's replication?
>> >
>> > Cheers