[Pacemaker] Postgresql streaming replication failover - RA needed

Fri Nov 25 04:07:53 EST 2011

Here is a quick snippet from corosync.log:

Nov 23 05:43:05 psql1 pgsql[2845]: DEBUG: Checking right of master.
Nov 23 05:43:05 psql1 pgsql[2845]: INFO: My data status=.
Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql1 xlog location : 000000000D000000
Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql2 xlog location : 0000000008000000

As you can see, "My data status" is logged as an empty string.
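
For reference, the two xlog locations above can be compared by reading them as hexadecimal numbers. The following is only a small Python sketch of that comparison, not the RA's own code; the function names are hypothetical, and it assumes the 16-digit hex format shown in the log.

```python
def xlog_to_int(location):
    """Interpret an xlog location string from the log as a hexadecimal integer."""
    return int(location, 16)


def newest_node(locations):
    """Return the node name whose xlog location is numerically largest."""
    return max(locations, key=lambda node: xlog_to_int(locations[node]))


# Values copied from the corosync.log snippet above.
locations = {
    "psql1": "000000000D000000",
    "psql2": "0000000008000000",
}
print(newest_node(locations))  # psql1
```

By this comparison psql1 is ahead of psql2.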

-----Original Message-----
From: Attila Megyeri [mailto:amegyeri at minerva-soft.com] 
Sent: 2011. november 25. 9:28
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed

Hi Takatoshi,

I have restored the PSQL to run without corosync so I cannot send you the crm_mon output now.

What I can tell for sure:
- The RA never promoted either node, no matter what the status was. It also did not promote a node when it was the only one running.
- I believe the issue is in the comparison of the xlogs. How could I troubleshoot that? From the logs I see that crm NEVER tried to invoke pgsql with "promote".
- I previously tried the crm_mon -A option, but there was never a "pgsql-data-status" attribute. The other attributes were there, including HS:alone.
- In the corosync log, the only relevant RA message I see is "Master is not exist." I never saw a message like "My data is out-of-date".

Thank you!

Attila

-----Original Message-----
From: Takatoshi MATSUO [mailto:matsuo.tak at gmail.com]
Sent: 2011. november 25. 8:56
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed

Hi Attila

2011/11/24 Attila Megyeri <amegyeri at minerva-soft.com>:
> Hi Takatoshi, All,
>
> Thanks for your reply.
> I see that you have invested significant effort in the development of the RA. I spent the last day trying to set up the RA, but without much success.
>
> My infrastructure is very similar to yours, except for the fact that currently I am testing with a single network adapter.
>
> Replication works nicely when I start the databases manually, not using corosync.
>
> When I try to start using corosync, I see that the ping resources start normally, but the msPostgresql starts on both nodes in slave mode, and I see "HS:alone"

Seeing "HS:alone" is normal.
The RA compares xlog locations and promotes the PostgreSQL instance that has the newest data.

> In the Wiki you state that if I start on a single node only, PSQL should start in Master mode (PRI), but this is not the case.

If the data is old, the node can't be master.
To become master, a node needs pgsql-data-status="LATEST" or "STREAMING|SYNC".
Please check it using "crm_mon -A".
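
As a rough illustration of the condition above, here is a Python sketch (not the RA's code; `can_be_master` and `PROMOTABLE_STATUSES` are hypothetical names). It is a literal reading of that sentence: a node qualifies only when its pgsql-data-status is one of the two listed values. How the RA itself treats an unset attribute may differ.

```python
# Status values named above as allowing promotion.
PROMOTABLE_STATUSES = {"LATEST", "STREAMING|SYNC"}


def can_be_master(data_status):
    """Return True only if the pgsql-data-status value is one of the listed ones."""
    return data_status.strip() in PROMOTABLE_STATUSES


print(can_be_master("LATEST"))          # True
print(can_be_master("STREAMING|SYNC"))  # True
# Assumption of this sketch only: an empty value is treated as not qualifying.
print(can_be_master(""))                # False
```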

Also, becoming master from a stopped state takes a few minutes, because the RA compares xlog locations during the monitor operation.

> The recovery.conf file is created immediately, and from the logs I see no attempt at all to promote the node.
> In the postgres logs I see that node1, which is supposed to be a master, tries to connect to the vip-rep IP address, which is NOT brought up, because it depends on the Master role...
>
> Do you have any idea?

Please check the HA log.
My RA outputs "My data is out-of-date. status=********" to the log if the data is old.

Regards,
Takatoshi MATSUO

_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
