[Pacemaker] Getting split brain after all reboot of a cluster node
Gianluca Cecchi
gianluca.cecchi at gmail.com
Thu Mar 6 09:12:35 UTC 2014
On Wed, Mar 5, 2014 at 9:28 AM, Anne Nicolas wrote:
> Hi
>
> I'm having trouble setting a very simple cluster with 2 nodes. After all
> reboot I'm getting split brain that I have to solve by hand then.
> Looking for a solution for that one...
>
> Both nodes have 4 network interfaces. We use 3 of them: one for an IP
> cluster, one for a bridge for a vm and the last one for the private
> network of the cluster
>
> I'm using
> drbd : 8.3.9
> drbd-utils: 8.3.9
>
> DRBD configuration:
> ============
> $ cat global_common.conf
> global {
> usage-count no;
> disable-ip-verification;
> }
> common { syncer { rate 500M; } }
>
> cat server.res
> resource server {
> protocol C;
> net {
> cram-hmac-alg sha1;
> shared-secret "eafcupps";
> }
> on dzacupsvr {
> device /dev/drbd0;
> disk /dev/vg0/server;
> address 172.16.1.1:7788;
> flexible-meta-disk internal;
> }
> on dzacupsvr2 {
> device /dev/drbd0;
> disk /dev/vg0/server;
> address 172.16.1.2:7788;
> flexible-meta-disk internal;
> }
> }
>
[snip]
>
> After looking for more information, I've added fences in drbd configuration
>
> handlers {
> fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
> }
> but still without any success...
>
> Any help appreciated
>
> Cheers
>
> --
> Anne
Hello Anne,
definitely follow the STONITH advice from digimer and emmanuel.
As a starting point, I think you can add this part to your resource
definition, which seems to be missing at the moment:
resource <resource> {
  disk {
    fencing resource-only;
    ...
  }
}
This should handle a clean shutdown of the cluster nodes, and some
failure scenarios, without problems.
But it doesn't completely protect you from data corruption in some
cases (such as the intercommunication network suddenly going down and
coming back up while both nodes are active, where both could become
primary for a moment).
At least this worked for me during initial tests, before STONITH was
configured, on:
SLES 11 SP2 (corosync/pacemaker)
CentOS 6.5 (cman/pacemaker)
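For reference, here is a rough sketch of how your server.res might look once
the disk section and the handlers you already added are merged in. It assumes
DRBD 8.3 syntax, where fencing is a disk option. Hostnames, devices,
addresses and handler paths are copied from your post, and I have not tested
this exact file:

```
# server.res (sketch only, assuming DRBD 8.3 syntax; values copied from the original post)
resource server {
  protocol C;

  disk {
    # call the fence-peer handler when the replication link is lost
    fencing resource-only;
  }

  handlers {
    # scripts shipped with DRBD; paths as given in the original post
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }

  net {
    cram-hmac-alg sha1;
    shared-secret "eafcupps";
  }

  on dzacupsvr {
    device /dev/drbd0;
    disk /dev/vg0/server;
    address 172.16.1.1:7788;
    flexible-meta-disk internal;
  }

  on dzacupsvr2 {
    device /dev/drbd0;
    disk /dev/vg0/server;
    address 172.16.1.2:7788;
    flexible-meta-disk internal;
  }
}
```

As far as I understand it, with resource-only DRBD calls crm-fence-peer.sh
when it loses the peer, and the script adds a constraint to the CIB so that
Pacemaker does not promote the outdated node. crm-unfence-peer.sh removes
that constraint once the resync has finished.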
What exactly do you mean by
"
After all reboot I'm getting split brain that I have to solve by hand then
"
?
HIH,
Gianluca