[Pacemaker] Failover after fail on Ethernet Fails (unSolved again)

Wed May 19 03:20:36 EDT 2010

Hi,

I also faced a similar situation.

The real problem was with DRBD. It was in a split-brain condition, and that
was the reason the ping/pingd resource was not functioning properly.

Check whether your DRBD is also in split brain by looking at the connection
state (cs:) in

        # cat /proc/drbd

and by running

        # drbdadm verify <your drbd resource name>
        # echo $?

If the output is '0', DRBD is healthy. Otherwise, resolve it manually as
follows.

Select a bad node (that is up to you!) and run on it:

        # drbdadm disconnect all
        # drbdadm -- --discard-my-data connect all

Now on the good node:

        # drbdadm connect all
        # drbdadm -- --overwrite-data-of-peer primary <your drbd resource name>

This will enable DRBD synchronization again. Now try pulling out the cable
and check whether failover works properly. *It worked for me.*

I have one more question: in your rule you have specified the RA name, not
your "resource id":

primitive pri_pingsys ocf:pacemaker:ping \
        params host_list="192.168.1.1 / 192.168.4.10" multiplier="100" dampen="5" \
        op monitor interval="15"

clone clo_ping pri_pingsys \
        meta globally_unique="false" interleave="true" target-role="Started"
location loc_drbd_on_connected_node ms_drbd_service \
        rule $id="loc_group_t3_on_connected_node-rule" ping: defined ping

I have given the resource id in my rule, which in your case would be
'pri_pingsys'. I shall also test this by changing my configuration to use
the RA name.
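
As far as I know, the rule is matched against the node attribute that the
ping agent writes (set by its "name" parameter, "pingd" by default if I
remember correctly), so it may be worth checking which attribute names
really exist in the cluster status before deciding what to put in the rule.
A minimal sketch for that follows; list_attr_names is a helper name I made
up, not a Pacemaker command, and it only assumes that the status section
stores attributes as <nvpair ... name="..."> elements.

```shell
#!/bin/sh
# Sketch: list node attribute names found in a CIB status dump, to see
# whether the ping agent writes "pingd", "ping", or something else.
# list_attr_names is a hypothetical helper, not part of Pacemaker.
list_attr_names() {
    # Reads CIB XML on stdin and prints each distinct nvpair name once.
    grep -o '<nvpair [^>]*name="[^"]*"' \
        | sed 's/.*name="\([^"]*\)".*/\1/' \
        | sort -u
}

# Usage on a cluster node:
#   cibadmin -Q -o status | list_attr_names
```

If the name used after "defined" in the rule does not show up in that list,
the rule probably never matches.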

But it would be better if the Pacemaker cluster itself had some intelligent
way to detect DRBD split brain. I am searching for that.
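
In the meantime, one rough approach is to have a script look at the
connection state in /proc/drbd. The sketch below assumes the DRBD 8.x
layout, where each device line starts like " 0: cs:StandAlone ...";
check_drbd_cs is a name I made up for illustration. Note that WFConnection
also appears when the peer is simply down, so a hit only means "possible
split brain", not proof of it.

```shell
#!/bin/sh
# Sketch: flag DRBD devices whose connection state suggests split brain.
# Assumes the DRBD 8.x /proc/drbd layout, where each device line looks like
#   " 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown ..."
# check_drbd_cs is a hypothetical helper, not part of DRBD or Pacemaker.
check_drbd_cs() {
    # Reads /proc/drbd-style text on stdin. Prints "<minor> <state>" for
    # every device in StandAlone or WFConnection, and returns 1 if any
    # such device was found, 0 otherwise.
    awk '
        /^ *[0-9]+: cs:/ {
            minor = $1; sub(/:$/, "", minor)
            state = $2; sub(/^cs:/, "", state)
            if (state == "StandAlone" || state == "WFConnection") {
                print minor, state
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}

# Usage on a DRBD node:
#   check_drbd_cs < /proc/drbd || echo "possible split brain"
```

Something like this could be run from cron or a monitoring check on both
nodes; it does not repair anything, it only reports.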

-- 
Regards,

Jayakrishnan. L

Visit:
www.foralllinux.blogspot.com
www.jayakrishnan.bravehost.com

On Fri, Apr 23, 2010 at 5:21 PM, Andrew Beekhof <andrew at beekhof.net> wrote:

> On Wed, Apr 21, 2010 at 9:41 AM, Stefan Kelemen <Stefan.Kelemen at gmx.de>
> wrote:
> > So I made a complete hb_report (attachment) over ten minutes, during
> > which I disconnected the Ethernet (no failover happened) and then
> > reconnected it.
>
> Could you reproduce with "debug 1" in ha.cf please?
> There's not enough detail in the lrmd logs to know what the problem is :-(
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>
