[Pacemaker] Behavior of Corosync+Pacemaker with DRBD primary power loss

Andreas Kurz andreas at hastexo.com
Wed Oct 24 05:13:03 EDT 2012


On 10/23/2012 05:04 PM, Andrew Martin wrote:
> Hello,
> 
> Under the Clusters from Scratch documentation, allow-two-primaries is
> set in the DRBD configuration for an active/passive cluster:
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html-single/Clusters_from_Scratch/index.html#_write_the_drbd_config
> 
> "TODO: Explain the reason for the allow-two-primaries option"
> 
> Is the reason for allow-two-primaries in this active/passive cluster
> (using ext4, a non-cluster filesystem) to allow for failover in the type
> of situation I have described (where the old primary/master is suddenly
> offline like with a power supply failure)? Are split-brains prevented
> because Pacemaker ensures that only one node is promoted to Primary at
> any time?

No, "allow-two-primaries" is not needed in an active/passive setup. The
fence-peer handler (executed on the Primary when the connection to the
Secondary is lost) inserts a location constraint into the Pacemaker
configuration, so the cluster will not even consider promoting an
outdated Secondary.

> 
> Is it possible to recover from such a failure without allow-two-primaries?

Yes. If you only disconnect DRBD, as in the test you describe below, and
cluster communication over a redundant network is still possible (and
Pacemaker is up and running), the Primary inserts that location
constraint. Once the constraint is in place, it prevents the Secondary
from being promoted. If Pacemaker is _not_ running during your
disconnection test, you also get an error, because the constraint
obviously cannot be placed either.
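
For reference, a minimal sketch of the resource-level fencing setup this
relies on, assuming DRBD 8.x syntax. The resource name "r0" is
hypothetical, and the unfence handler path is an assumption based on where
the DRBD packages usually install it next to crm-fence-peer.sh; check your
own installation before copying it.

```
# Sketch only: "r0" is a placeholder resource name.
resource r0 {
  disk {
    # Call the fence-peer handler when the replication link is lost.
    fencing resource-only;
  }
  handlers {
    # Places the location constraint in the CIB (path as in your drbd.conf).
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Assumed path: removes the constraint again once the peer has resynced.
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

Without a "fencing" policy other than the default "dont-care", the
fence-peer handler is not invoked at all, so both settings belong together.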

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now
> 
> Thanks,
> 
> Andrew
> 
> ------------------------------------------------------------------------
> *From: *"Andrew Martin" <amartin at xes-inc.com>
> *To: *"The Pacemaker cluster resource manager"
> <pacemaker at oss.clusterlabs.org>
> *Sent: *Friday, October 19, 2012 10:45:04 AM
> *Subject: *[Pacemaker] Behavior of Corosync+Pacemaker with DRBD primary
> power loss
> 
> Hello,
> 
> I have a 3 node Pacemaker + Corosync cluster with 2 "real" nodes, node0
> and node1, running a DRBD resource (single-primary) and the 3rd node in
> standby acting as a quorum node. If node0 were running the DRBD
> resource, and thus is DRBD primary, and its power supply fails, will the
> DRBD resource be promoted to primary on node1?
> 
> If I simply cut the DRBD replication link, node1 reports the following
> state:
> Role:
> Secondary/Unknown
> 
> Disk State:
> UpToDate/DUnknown
> 
> Connection State:
> WFConnection
> 
> 
> I cannot manually promote the DRBD resource because the peer is not
> outdated:
> 0: State change failed: (-7) Refusing to be Primary while peer is not
> outdated
> Command 'drbdsetup 0 primary' terminated with exit code 11
> 
> I have configured the CIB-based crm-fence-peer.sh utility in my drbd.conf
> fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> but I do not believe it would be applicable in this scenario.
> 
> If node0 goes offline like this and doesn't come back (e.g. after a
> STONITH), does Pacemaker have a way to tell node1 that its peer is
> outdated and to proceed with promoting the resource to primary?
> 
> Thanks,
> 
> Andrew
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
> 
> 



