[Pacemaker] Postgresql streaming replication failover - RA needed

Serge Dubrouski sergeyfd at gmail.com
Sun Dec 11 18:51:03 CET 2011


On Thu, Dec 8, 2011 at 10:34 PM, Takatoshi MATSUO <matsuo.tak at gmail.com> wrote:

> Hi Attila
>
> 2011/12/8 Attila Megyeri <amegyeri at minerva-soft.com>:
> > Hi Takatoshi,
> >
> > One strange thing I noticed and could probably be improved.
> > When there is data inconsistency, I have the following node properties:
> >
> > * Node psql2:
> >    + default_ping_set                  : 100
> >    + master-postgresql:1               : -INFINITY
> >    + pgsql-data-status                 : DISCONNECT
> >    + pgsql-status                      : HS:alone
> > * Node psql1:
> >    + default_ping_set                  : 100
> >    + master-postgresql:0               : 1000
> >    + master-postgresql:1               : -INFINITY
> >    + pgsql-data-status                 : LATEST
> >    + pgsql-master-baseline             : 58:000000004B000020
> >    + pgsql-status                      : PRI
> >
> > This is fine and understandable, but I can only see it if I run
> > crm_mon -A.
> >
> > My problem is that CRM shows the following:
> >
> > Master/Slave Set: db-ms-psql [postgresql]
> >     Masters: [ psql1 ]
> >     Slaves: [ psql2 ]
> >
> > So if I monitor the system from crm_mon, HAWK or other tools, I have no
> > indication at all that the slave is running in an inconsistent mode.
> >
> > I would expect the RA to stop the psql2 node in such cases, because:
> > - It is running but has out-of-date data, so no one will use it (the
> >   slave IP points to the master as well, which is good).
> > - In the CRM status everything looks perfect, even though it is NOT
> >   perfect and admin intervention is required.
> >
> >
> > Shouldn't the disconnected PSQL server be stopped instead?
>
> Hmm...
> It is better not to stop the PGSQL server.
> The RA cannot know whether PGSQL is disconnected because of
> data inconsistency, a network outage, startup
> and so on.
>

Why does it matter? If the state is degraded and inconsistent and there is
no way to fix it from inside the RA, the RA should probably stop it. Say
there is pgpool running in front of the cluster: keeping an inconsistent
node up would lead to SQL queries being routed to it and possibly returning
wrong results.
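To surface the degraded state without reading crm_mon -A by hand, an external monitoring check could flag nodes whose pgsql-data-status is DISCONNECT. The Python sketch below is only an illustration: it assumes the attribute listing is laid out like the crm_mon -A excerpt quoted above (the exact output format differs between Pacemaker versions), and parse_node_attributes and disconnected_nodes are hypothetical helper names, not part of the pgsql RA or of Pacemaker.

```python
import re

# Assumed format, copied from the crm_mon -A excerpt in this thread:
#   "* Node psql2:" starts a node, "+ key : value" lines are its attributes.
NODE_RE = re.compile(r"^\*\s*Node\s+(\S+?):\s*$")
ATTR_RE = re.compile(r"^\+\s*(\S+)\s*:\s*(.*?)\s*$")


def parse_node_attributes(text):
    """Return {node: {attribute: value}} from a crm_mon -A style listing."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = NODE_RE.match(line)
        if m:
            current = nodes.setdefault(m.group(1), {})
            continue
        m = ATTR_RE.match(line)
        if m and current is not None:
            # Keys such as "master-postgresql:1" keep their colon because
            # the key pattern stops at the first whitespace.
            current[m.group(1)] = m.group(2)
    return nodes


def disconnected_nodes(nodes):
    """Names of nodes reporting pgsql-data-status DISCONNECT, sorted."""
    return sorted(
        name for name, attrs in nodes.items()
        if attrs.get("pgsql-data-status") == "DISCONNECT"
    )


# The node attributes quoted earlier in this thread.
sample = """
* Node psql2:
   + default_ping_set                  : 100
   + master-postgresql:1               : -INFINITY
   + pgsql-data-status                 : DISCONNECT
   + pgsql-status                      : HS:alone
* Node psql1:
   + default_ping_set                  : 100
   + master-postgresql:0               : 1000
   + master-postgresql:1               : -INFINITY
   + pgsql-data-status                 : LATEST
   + pgsql-master-baseline             : 58:000000004B000020
   + pgsql-status                      : PRI
"""

nodes = parse_node_attributes(sample)
print(disconnected_nodes(nodes))  # ['psql2']
```

A check like this would report psql2 even while the Master/Slave summary shows it as a healthy slave, which is exactly the gap described above; whether the node should also be stopped remains the RA design question being debated.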


>
>
> How about using a Dummy RA, like vip-slave?
> -------------------------------------------
> primitive runningSlaveOK ocf:heartbeat:Dummy
> .....(snip)
>
> location rsc_location-dummy runningSlaveOK \
>     rule  200: pgsql-status eq "HS:sync"
> -------------------------------------------
>

That probably fixes the visibility issue. What about notifications on the
DISCONNECT state? How would the administrator know that the cluster is
inconsistent? Maybe a better option in this case would be colocating a
MailTo resource with "HS:alone"?
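For illustration, the kind of alert such a MailTo resource would be expected to produce can be sketched in Python. This is not the ocf:heartbeat:MailTo agent and does not show how Pacemaker would place it; it only builds a message for nodes whose pgsql-status is HS:alone. The node dictionary, build_alert, and both e-mail addresses are hypothetical, and actual delivery via smtplib is left commented out.

```python
from email.message import EmailMessage

# Hypothetical addresses; substitute the real sender and on-call mailbox.
ALERT_FROM = "pacemaker@example.com"
ALERT_TO = "dba@example.com"


def alone_nodes(nodes):
    """Names of nodes whose pgsql-status is HS:alone, sorted."""
    return sorted(
        name for name, attrs in nodes.items()
        if attrs.get("pgsql-status") == "HS:alone"
    )


def build_alert(nodes, sender=ALERT_FROM, to=ALERT_TO):
    """Return an EmailMessage listing HS:alone nodes, or None if there are none."""
    bad = alone_nodes(nodes)
    if not bad:
        return None
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = "pgsql replication degraded on: " + ", ".join(bad)
    body = [
        f"{n}: pgsql-status={nodes[n].get('pgsql-status')}, "
        f"pgsql-data-status={nodes[n].get('pgsql-data-status', 'unknown')}"
        for n in bad
    ]
    body.append("Administrator intervention is required.")
    msg.set_content("\n".join(body) + "\n")
    return msg


# Attribute values taken from the node listing quoted in this thread.
nodes = {
    "psql1": {"pgsql-status": "PRI", "pgsql-data-status": "LATEST"},
    "psql2": {"pgsql-status": "HS:alone", "pgsql-data-status": "DISCONNECT"},
}
msg = build_alert(nodes)
print(msg["Subject"])  # pgsql replication degraded on: psql2

# To actually deliver it (assumes a local MTA is listening):
# import smtplib
# with smtplib.SMTP("localhost") as smtp:
#     smtp.send_message(msg)
```

Returning None when every node is healthy keeps the check quiet in the normal case, so it could run periodically without flooding the administrator.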


>
>
> Regards,
> Takatoshi MATSUO
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
Serge Dubrouski.

