[Pacemaker] Behavior of Corosync+Pacemaker with DRBD primary power loss
Andrew Martin
amartin at xes-inc.com
Tue Oct 23 15:04:31 UTC 2012
Hello,
In the Clusters from Scratch documentation, allow-two-primaries is set in the DRBD configuration for an active/passive cluster:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html-single/Clusters_from_Scratch/index.html#_write_the_drbd_config
"TODO: Explain the reason for the allow-two-primaries option"
Is allow-two-primaries set in this active/passive cluster (which uses ext4, a non-cluster filesystem) to allow failover in the situation I described, where the old primary/master suddenly goes offline, for example because of a power supply failure? Is split-brain prevented because Pacemaker ensures that only one node is promoted to Primary at any time?
Is it possible to recover from such a failure without allow-two-primaries?
Thanks,
Andrew
----- Original Message -----
From: "Andrew Martin" <amartin at xes-inc.com>
To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
Sent: Friday, October 19, 2012 10:45:04 AM
Subject: [Pacemaker] Behavior of Corosync+Pacemaker with DRBD primary power loss
Hello,
I have a 3-node Pacemaker + Corosync cluster with 2 "real" nodes, node0 and node1, running a DRBD resource (single-primary), and a 3rd node in standby acting as a quorum node. If node0 is running the DRBD resource, and is thus DRBD primary, and its power supply fails, will the DRBD resource be promoted to primary on node1?
If I simply cut the DRBD replication link, node1 reports the following state:
Role: Secondary/Unknown
Disk State: UpToDate/DUnknown
Connection State: WFConnection
I cannot manually promote the DRBD resource because the peer is not outdated:
0: State change failed: (-7) Refusing to be Primary while peer is not outdated
Command 'drbdsetup 0 primary' terminated with exit code 11
I have configured the CIB-based crm-fence-peer.sh utility in my drbd.conf:
fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
but I do not believe it would be applicable in this scenario.
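A minimal sketch of where this handler typically sits in drbd.conf, assuming DRBD 8.x syntax. The resource name r0, the resource-only fencing policy, and the unfence handler are illustrative assumptions, not settings quoted from this configuration:

```
resource r0 {                      # hypothetical resource name
  disk {
    # assumed policy: call the fence-peer handler when the peer becomes unreachable
    fencing resource-only;
  }
  handlers {
    # handler quoted above
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # assumed companion handler that removes the constraint once resync finishes
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

With a setup along these lines, the handler is expected to run on the surviving node when the replication link is lost, so whether it helps after a full power loss on the primary is the open question here.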
If node0 goes offline like this and doesn't come back (e.g. after a STONITH), does Pacemaker have a way to tell node1 that its peer is outdated and to proceed with promoting the resource to primary?
Thanks,
Andrew