[Pacemaker] Postgresql streaming replication failover - RA needed

Wed Nov 23 10:53:25 EST 2011

Hi Takatoshi, All,

Thanks for your reply.
I see that you have invested significant effort in the development of the RA. I spent the last day trying to set up the RA, but without much success.

My infrastructure is very similar to yours, except for the fact that currently I am testing with a single network adapter.

Replication works nicely when I start the databases manually, not using corosync.

When I try to start using corosync, I see that the ping resources start normally, but msPostgresql starts on both nodes in slave mode, and I see "HS:alone".

In the wiki you state that if I start on a single node only, PostgreSQL should start in Master mode (PRI), but this is not the case.

The recovery.conf file is created immediately, and from the logs I see no attempt at all to promote the node.
In the postgres logs I see that node1, which is supposed to be a master, tries to connect to the vip-rep IP address, which is NOT brought up, because it depends on the Master role...
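
To make the dependency above concrete, here is a minimal sketch of the kind of recovery.conf a 9.1 standby uses, built from the master_ip and restore_command values in my config below. The function name render_recovery_conf is hypothetical, and the exact lines the RA writes may differ; standby_mode, primary_conninfo and restore_command are the standard PostgreSQL 9.1 settings.

```python
# Hypothetical sketch: renders a recovery.conf similar to what a 9.1
# standby needs. This is NOT the RA's code; the real agent may write
# additional or different settings.

def render_recovery_conf(master_ip, restore_command):
    """Return recovery.conf text pointing the standby at master_ip."""
    lines = [
        "standby_mode = 'on'",
        f"primary_conninfo = 'host={master_ip}'",
        f"restore_command = '{restore_command}'",
    ]
    return "\n".join(lines) + "\n"


# Values taken from the crm config in this message.
conf = render_recovery_conf(
    master_ip="10.12.1.28",
    restore_command="cp /var/lib/postgresql/9.1/main/pg_archive/%f %p",
)
print(conf)
```

With a file like this in place, PostgreSQL starts as a standby and keeps trying to reach 10.12.1.28, which is the vip-rep address that only comes up after a promote.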

Do you have any idea?

My environment:
Debian Squeeze, with backported pacemaker (Version: 1.1.5) - official pacemaker in debian is rather old and buggy
Postgres 9.1, streaming replication, sync mode
Node1: psql1, 10.12.1.21
Node2: psql2, 10.12.1.22

Crm config:

node psql1 \
        attributes standby="off"
node psql2 \
        attributes standby="off"
primitive pingCheck ocf:pacemaker:ping \
        params name="default_ping_set" host_list="10.12.1.1" multiplier="100" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="ignore"
primitive postgresql ocf:heartbeat:pgsql \
        params pgctl="/usr/lib/postgresql/9.1/bin/pg_ctl" psql="/usr/bin/psql" pgdata="/var/lib/postgresql/9.1/main" config="/etc/postgresql/9.1/main/postgresql.conf" pgctldata="/usr/lib/postgresql/9.1/bin/pg_controldata" rep_mode="sync" node_list="psql1 psql2" restore_command="cp /var/lib/postgresql/9.1/main/pg_archive/%f %p" master_ip="10.12.1.28" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="7s" timeout="60s" on-fail="restart" \
        op monitor interval="2s" role="Master" timeout="60s" on-fail="restart" \
        op promote interval="0s" timeout="60s" on-fail="restart" \
        op demote interval="0s" timeout="60s" on-fail="block" \
        op stop interval="0s" timeout="60s" on-fail="block" \
        op notify interval="0s" timeout="60s"
primitive vip-master ocf:heartbeat:IPaddr2 \
        params ip="10.12.1.20" nic="eth0" cidr_netmask="24" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block" \
        meta target-role="Started"
primitive vip-rep ocf:heartbeat:IPaddr2 \
        params ip="10.12.1.28" nic="eth0" cidr_netmask="24" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block" \
        meta target-role="Started"
primitive vip-slave ocf:heartbeat:IPaddr2 \
        params ip="10.12.1.27" nic="eth0" cidr_netmask="24" \
        meta resource-stickiness="1" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block"
group master-group vip-master vip-rep
ms msPostgresql postgresql \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
clone clnPingCheck pingCheck
location rsc_location-1 vip-slave \
        rule $id="rsc_location-1-rule" 200: pgsql-status eq HS:sync \
        rule $id="rsc_location-1-rule-0" 100: pgsql-status eq PRI \
        rule $id="rsc_location-1-rule-1" -inf: not_defined pgsql-status \
        rule $id="rsc_location-1-rule-2" -inf: pgsql-status ne HS:sync and pgsql-status ne PRI
location rsc_location-2 msPostgresql \
        rule $id="rsc_location-2-rule" $role="master" 200: #uname eq psql1 \
        rule $id="rsc_location-2-rule-0" $role="master" 100: #uname eq psql2 \
        rule $id="rsc_location-2-rule-1" $role="master" -inf: defined fail-count-vip-master \
        rule $id="rsc_location-2-rule-2" $role="master" -inf: defined fail-count-vip-rep \
        rule $id="rsc_location-2-rule-3" -inf: not_defined default_ping_set or default_ping_set lt 100
colocation rsc_colocation-1 inf: msPostgresql clnPingCheck
colocation rsc_colocation-2 inf: master-group msPostgresql:Master
order rsc_order-1 0: clnPingCheck msPostgresql
order rsc_order-2 0: msPostgresql:promote master-group:start symmetrical=false
order rsc_order-3 0: msPostgresql:demote master-group:stop symmetrical=false
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="INFINITY" \
        migration-threshold="1"
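
As a rough illustration of how the rsc_location-1 rules above treat the "HS:alone" state, the sketch below mirrors the four rules in plain Python. It is not Pacemaker's policy engine; it assumes that the scores of all matching rules are summed and that -INFINITY dominates. The function name vip_slave_score is hypothetical.

```python
# Hypothetical model of the four rsc_location-1 rules for vip-slave.
# Assumption: scores of matching rules add up, and -inf dominates.

NEG_INF = float("-inf")


def vip_slave_score(pgsql_status):
    """Return the summed rule score; None means the attribute is not defined."""
    score = 0
    if pgsql_status == "HS:sync":      # rule: 200 if pgsql-status eq HS:sync
        score += 200
    if pgsql_status == "PRI":          # rule: 100 if pgsql-status eq PRI
        score += 100
    if pgsql_status is None:           # rule: -inf if not_defined pgsql-status
        score += NEG_INF
    if pgsql_status != "HS:sync" and pgsql_status != "PRI":
        score += NEG_INF               # rule: -inf if neither HS:sync nor PRI
    return score


for status in ("HS:sync", "PRI", "HS:alone", None):
    print(status, vip_slave_score(status))
```

Under these assumptions, a node reporting "HS:alone" gets -inf for vip-slave, so that address cannot run anywhere while both nodes stay in that state.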

Regards,
Attila

-----Original Message-----
From: Takatoshi MATSUO [mailto:matsuo.tak at gmail.com] 
Sent: 2011. november 17. 8:04
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed

Hi All,

I created an RA for PostgreSQL 9.1 Streaming Replication based on pgsql.

RA
  https://github.com/t-matsuo/resource-agents/blob/pgsql91/heartbeat/pgsql
Documents
  https://github.com/t-matsuo/resource-agents/wiki

It is almost totally changed from the previous patch:
  http://lists.linux-ha.org/pipermail/linux-ha-dev/2011-February/018193.html
It creates recovery.conf and promotes PostgreSQL automatically.
Additionally, it can switch between synchronous and asynchronous replication automatically.

Please try them and send your comments.

Regards,
Takatoshi MATSUO

2011/11/17 Serge Dubrouski <sergeyfd at gmail.com>:
>
>
> On Wed, Nov 16, 2011 at 12:55 PM, Attila Megyeri 
> <amegyeri at minerva-soft.com>
> wrote:
>>
>> Hi Florian,
>>
>> -----Original Message-----
>> From: Florian Haas [mailto:florian at hastexo.com]
>> Sent: 2011. november 16. 11:49
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Postgresql streaming replication failover - 
>> RA needed
>>
>> Hi Attila,
>>
>> On 2011-11-16 10:27, Attila Megyeri wrote:
>> > Hi All,
>> >
>> >
>> >
>> > We have a two-node postgresql 9.1 system configured using streaming 
>> > replication (active/active with a read-only slave).
>> >
>> > We want to automate the failover process and I couldn't really find 
>> > a resource agent that could do the job.
>>
>> That is correct; the pgsql resource agent (unlike its mysql 
>> counterpart) does not support streaming replication. We've had a 
>> contributor submit a patch at one point, but it was somewhat 
>> ill-conceived and thus did not make it into the upstream repo. The relevant thread is here:
>>
>> http://lists.linux-ha.org/pipermail/linux-ha-dev/2011-February/018195.html
>>
>> Would you feel comfortable modifying the pgsql resource agent to 
>> support replication? If so, we could revisit this issue and 
>> potentially add streaming replication support to pgsql.
>>
>>
>> Well I'm not sure I would be able to do that change. Failover is 
>> relatively easy to do but I really have no idea how to do the failback part.
>
> And that's exactly the reason why I haven't implemented it yet. With
> the way replication is currently done in PostgreSQL, there is no easy
> way to switch between roles, or at least I don't know of one.
> Implementing just fail-over functionality by creating a trigger file 
> on a slave server in the case of failure on master side doesn't create 
> a full master-slave implementation in my opinion.
>
>>
>> I will definitely have to sort this out somehow; I am just unsure
>> whether I will try to use the repmgr mentioned in the video, or 
>> pacemaker with some level of customization...
>>
>> Is the resource agent that you mentioned available somewhere?
>>
>> Thanks.
>> Attila
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org 
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org Getting started: 
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>
>
> --
> Serge Dubrouski.
>
