[Pacemaker] OCF Resource agent promote question

Tue Mar 26 12:19:28 UTC 2013

Excellent, thanks so much for the clarification.  I'll drop this new RA in and see if I can get things working.

STEVE

On Mar 26, 2013, at 7:38 AM, Rainer Brestan <rainer.brestan at gmx.net> wrote:

Hi Steve,
The pgsql RA does the same: it compares the last_xlog_replay_location of all nodes to select the master for promotion.
Promoting via a restart instead of the promote command, to preserve the timeline ID, is also a configurable option (restart_on_promote) of the current RA.
And the RA is definitely capable of handling more than two instances. It goes through the node_list parameter and performs its actions for every member of the list.
It may originally have been planned for only one slave, but the current implementation does not have this limitation. It has code for synchronous replication with more than two nodes, so that nodes that have fallen back to async are not promoted.
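
A minimal sketch of that comparison, in Python for illustration only; it is not the RA's actual code. The function names and the sample locations are assumptions. Each location is treated as the two hex fields returned by pg_last_xlog_replay_location(), and the resulting integer is used only as an ordering key, not as an exact byte offset.

```python
def xlog_to_int(location: str) -> int:
    """Convert an xlog location such as '0/3000060' to a comparable integer."""
    high, low = location.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_promotion_candidate(replay_locations: dict) -> str:
    """Return the node whose replay location is furthest ahead.

    Ties are broken by node name so the result is deterministic.
    """
    if not replay_locations:
        raise ValueError("no nodes reported a replay location")
    # max() keeps the first maximum it sees, so sorting the names first
    # makes the alphabetically first node win a tie.
    return max(
        sorted(replay_locations),
        key=lambda node: xlog_to_int(replay_locations[node]),
    )


# Hypothetical sample data: one replay location per node in node_list.
locations = {
    "p1.example.net": "0/3000060",
    "p2.example.net": "0/30000A8",
}
print(pick_promotion_candidate(locations))  # prints p2.example.net
```

Comparing the text values directly would be wrong ("0/FF" sorts after "0/100" as a string), which is why both fields are parsed as hexadecimal first.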

Of course, I will share the extensions with the community when they are ready for use. The ability to have more than two instances is not removed. I am not running more than two instances on one site; the current setup is two sites with two instances each, with the master managed by booth. But having more than two instances on one site is also under discussion, simply to avoid an availability interruption when one server is down and the other is promoted with a restart.
The implementation is nearly finished; stress tests of the failure scenarios come next.

Rainer
Sent: Tuesday, 26 March 2013 at 11:55
From: "Steven Bambling" <smbambling at arin.net>
To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] OCF Resource agent promote question

On Mar 26, 2013, at 6:32 AM, Rainer Brestan <rainer.brestan at gmx.net> wrote:

Hi Steve,
When Pacemaker performs a promotion, it has already selected a specific node to become master.
At that stage it is far too late to try to update master scores.

But there is another problem with xlog in PostgreSQL.

According to some discussions on the PostgreSQL mailing lists, xlog entries that are not relevant do not count toward the xlog counter during redo and/or start. This is especially true for CHECKPOINT xlog records, with which the situation can easily be reproduced.
This can lead to a situation where replication is up to date but the slave shows a lower xlog value.
This issue was solved in 9.2.3, where the WAL receiver always counts the end of applied records.

We are currently testing with 9.2.3.  I'm using the functions from http://www.databasesoup.com/2012/10/determining-furthest-ahead-replica.html, along with a tweaked function that returns the replay_lag in bytes, to get a more accurate measurement.

There is also a second boring issue. The timeline change is replicated to the slaves, but they do not save it anywhere. If a slave starts up again and does not have access to the WAL archive, it cannot start any more. This was also addressed by a patch in the 9.2 branch, but I haven't tested whether it is fixed in 9.2.3 as well.

After talking with one of the Postgres guys, it was recommended that we look at an alternative to the built-in trigger file, which makes the new master jump to a new timeline.  Instead, we are moving recovery.conf to recovery.done in place via the resource agent and then restarting the postgresql service on the "new" master, so that it keeps the original timeline that the slaves will recognize.
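
A rough sketch of that promote-by-restart step, written in Python purely for illustration (it is not the code from our resource agent). The function name and the restart_cmd argument are hypothetical; the only things taken from the description above are the rename of recovery.conf to recovery.done and the restart that follows it.

```python
import subprocess
from pathlib import Path


def promote_by_restart(pgdata, restart_cmd=None):
    """Turn a standby into a master without using the trigger file.

    Renames recovery.conf to recovery.done inside the data directory, then
    runs restart_cmd (a hypothetical argument list) if one was given.
    """
    conf = Path(pgdata) / "recovery.conf"
    done = Path(pgdata) / "recovery.done"
    if not conf.exists():
        raise FileNotFoundError(f"{conf} not found; is this node a standby?")
    # replace() overwrites a stale recovery.done left over from an earlier run.
    conf.replace(done)
    if restart_cmd is not None:
        # check=True raises CalledProcessError if the restart fails.
        subprocess.run(restart_cmd, check=True)
    return done


# Example call; the data directory path is an assumption:
# promote_by_restart(
#     "/var/lib/pgsql/9.2/data",
#     ["pg_ctl", "restart", "-D", "/var/lib/pgsql/9.2/data", "-w"],
# )
```

Doing the rename before the restart matters: PostgreSQL only enters recovery at startup when recovery.conf is present, so the restarted instance comes up as a normal master.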

For data replication, whether with PostgreSQL or any other database, you always have two choices of how to operate.
- Data consistency is the topmost priority. Don't go into operation unless everything is fine.
- Availability is the topmost priority. Always try to have at least one running instance, even if its data might not be the latest.
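
The two choices can be sketched as a promotion rule. This is only an illustration in Python; the policy names, the function and its arguments are hypothetical and not part of the pgsql RA.

```python
def may_promote(policy, data_is_current, other_instance_running):
    """Decide whether a standby may be promoted.

    policy: "consistency" or "availability" (hypothetical names).
    data_is_current: True only if the standby is known to hold the latest data.
    other_instance_running: True if another instance already acts as master.
    """
    if other_instance_running:
        # Under either policy, never create a second master.
        return False
    if policy == "consistency":
        # Stay down rather than serve data that might be stale.
        return data_is_current
    if policy == "availability":
        # Accept possibly stale data to keep one instance running.
        return True
    raise ValueError(f"unknown policy: {policy!r}")
```

The two policies only differ when the standby's data may be stale: "consistency" refuses to promote in that case, "availability" promotes anyway.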

The current pgsql RA does quite a good job for the first choice.

It currently has some limitations.
- After a switchover, whether manual or automatic, it needs some work from maintenance personnel.
- Some failure scenarios involving a series of faults leave the cluster without a master until someone intervenes manually.
- Geo-redundant replication with a multi-site cluster ticket system (booth) does not work.
- If availability or unattended operation is the priority, it cannot be used out of the box.

But it has a very good structure to be extended for other needs.

And this is what I am currently implementing:
extending the RA to support both choices and preparing it for a multi-site cluster ticket system.

Would you be willing to share your extended RA?  Also, do you run a cluster with more than 2 nodes?

v/r

STEVE

Regards, Rainer
Sent: Tuesday, 26 March 2013 at 00:01
From: "Andreas Kurz" <andreas at hastexo.com>
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] OCF Resource agent promote question
Hi Steve,

On 2013-03-25 18:44, Steven Bambling wrote:
> All,
>
> I'm trying to work on an OCF resource agent that uses postgresql
> streaming replication. I'm running into a few issues that I hope might
> be answered or at least some pointers given to steer me in the right
> direction.

Why are you not using the existing pgsql RA? It is capable of doing
synchronous and asynchronous replication and it is known to work fine.

Best regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now

>
> 1. A quick way of obtaining a list of "Online" nodes in the cluster
> that a resource will be able to migrate to. I've accomplished it with
> some grep and sed but it's not pretty or fast.
>
> # time pcs status | grep Online | sed -e "s/.*\[\(.*\)\]/\1/" | sed 's/ //'
> p1.example.net p2.example.net
>
> real 0m2.797s
> user 0m0.084s
> sys 0m0.024s
>
> Once I get a list of active/online nodes in the cluster my thinking was
> to use psql to get the current xlog location and lag of each of the
> remaining nodes and compare them. If the node has a greater log
> position and/or less lag it will be given a greater master preference.
>
> 2. How to force a monitor/probe before a promote is run on ALL nodes to
> make sure that the master preference is up to date before
> migrating/failing over the resource.
> - I was thinking that maybe during the promote call it could get the log
> location and lag from each of the nodes via a psql call (like above)
> and then force the resource to a specific node. Is there a way to do
> this and does it sound like a sane idea?
>
>
> The start of my RA is located here; suggestions and comments are 100%
> welcome: https://github.com/smbambling/pgsqlsr/blob/master/pgsqlsr
>
> v/r
>
> STEVE
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

