[Pacemaker] Human confirmation of dead node?

J Brack jbrack6 at gmail.com
Tue Oct 13 15:57:25 UTC 2009


On 10/13/09, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> Hi,
>
> On Tue, Oct 13, 2009 at 03:23:11PM +0200, J Brack wrote:
>> Hi,
>>
>> I'm currently using heartbeat. I heard that I'm meant to be using
>> pacemaker. I will switch in a heartbeat (sorry) if I can get pacemaker
>> to do what I need.
>
> http://clusterlabs.org/wiki/Project_History
>
>> I have a clustered nfs server, primary is in datacenter1 close to the
>> users, secondary is in datacenter2 not close to the users. There is
>> only an ethernet connection between the two data centers.
>>
>> In the event of a failure of the primary in datacenter1 (or of
>> datacenter1 itself), I would like to switch to the secondary in
>> datacenter2. The catch? I want a human to confirm that the primary is
>> really dead.
>>
>> My current heartbeat setup uses meatclient to confirm that a node has
>> been reset. This happens to do the same thing as confirming primary is
>> really dead for when primary's hardware dies - but for a network
>> outage I see the service bounce between the servers after the network
>> comes back up again. This is not ideal. I'm kind of hoping the
>> pacemaker can handle this more gracefully.
>
> It can't. The meatware/meatclient combination replaces a fencing
> operation. It is even expected that the node fenced is going to
> come up after a while.
>
>> Can pacemaker be configured to allow manual (human) confirmation that
>> the primary node is dead before ever switching services? (i.e. require
>> human confirmation for all cases when it cannot talk to the other
>> node).
>
> If your network goes yo-yo, the cluster will follow. The only
> way is to remove a node from the configuration or put it into
> standby.

What is the reasoning for this though?

Here I have pri and sec, both with meatware.

My expectation:
Network dies, pri stays primary, sec waits for confirmation that pri
is dead. It never gets it.
Network comes back, sec sees pri is primary. All is well with the world.

What really happens:
Same, but when the network comes back, sec gets pri's resources, then
pri gets them back again.

This seems wrong.
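
The expected behaviour described above can be sketched as a toy state
machine. This is only an illustration, not Pacemaker or heartbeat code:
the Node class, its method names and the event strings are all
hypothetical, and the "human confirms" event stands in for the
meatclient step.

```python
# Toy model of the *expected* behaviour; not Pacemaker or heartbeat code.
# Node, its methods and the event names are hypothetical.

class Node:
    def __init__(self, name, active=False):
        self.name = name
        self.active = active                # True if this node holds the resources
        self.awaiting_confirmation = False  # True while waiting for a human

    def on_peer_lost(self):
        # Lost contact with the peer: a passive node must not take over
        # on its own, it only starts waiting for human confirmation.
        if not self.active:
            self.awaiting_confirmation = True

    def confirm_peer_dead(self):
        # A human confirmed that the peer is dead: only now take over.
        if self.awaiting_confirmation:
            self.awaiting_confirmation = False
            self.active = True

    def on_peer_back(self):
        # Peer is reachable again: drop the pending request, no takeover.
        self.awaiting_confirmation = False


def run(events):
    """Replay a list of events and return the names of the active nodes."""
    pri = Node("pri", active=True)
    sec = Node("sec")
    for event in events:
        if event == "network_down":
            pri.on_peer_lost()
            sec.on_peer_lost()
        elif event == "network_up":
            pri.on_peer_back()
            sec.on_peer_back()
        elif event == "human_confirms_pri_dead":
            pri.active = False      # in this scenario pri really is dead
            sec.confirm_peer_dead()
    return [node.name for node in (pri, sec) if node.active]


if __name__ == "__main__":
    # Network outage without confirmation: pri should simply stay primary.
    print(run(["network_down", "network_up"]))
    # Real failure, confirmed by a human: sec takes over.
    print(run(["network_down", "human_confirms_pri_dead"]))
```

In this model the resources never move unless a human confirms the
failure, so an outage followed by recovery leaves pri as primary
throughout, which is the behaviour I was expecting.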



