[Pacemaker] Reason for cluster resource migration

Fri Feb 1 22:32:26 UTC 2013

----- Original Message -----
> From: "Andrew Beekhof" <andrew at beekhof.net>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Thursday, December 6, 2012 8:36:27 PM
> Subject: Re: [Pacemaker] Reason for cluster resource migration
> 
> On Wed, Dec 5, 2012 at 8:29 AM, Andrew Martin <amartin at xes-inc.com> wrote:
> > Hello,
> >
> > I am running a 3-node Pacemaker cluster (2 "real" nodes and 1 quorum
> > node in standby) on Ubuntu 12.04 server (amd64) with Pacemaker 1.1.8
> > and Corosync 2.1.0. My cluster configuration is:
> > http://pastebin.com/6TPkWtbt
> >
> > Recently, pengine died on storage0 (where the resources were running),
> > which also happened to be the DC at the time. Consequently, Pacemaker
> > went into recovery mode and released its role as DC, at which point
> > storage1 took over the DC role and migrated the resources away from
> > storage0 and onto storage1. Looking through the logs, it seems that
> > storage0 came back into the cluster before the migration of the
> > resources began:
> > Dec 03 08:31:20 [3165] storage1       crmd:     info: peer_update_callback: Client storage0/peer now has status [online] (DC=true)
> > ...
> > Dec 03 08:31:20 [3164] storage1    pengine:   notice: LogActions: Start   rscXXX    (storage1)
> >
> > So why did the migration go ahead, rather than being aborted so that
> > the resources simply remained running on storage0? Here are the logs
> > from each of the nodes:
> > storage0: http://pastebin.com/ZqqnH9uf
> > storage1: http://pastebin.com/rvSLVcZs
> 
> Hmm, that's an interesting one.
> Can you provide this file?  It will hold the answer:
> 
> Dec 03 08:31:31 [3164] storage1    pengine:   notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-28.bz2
> 
> 
> >
> > Thanks,
> >
> > Andrew
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started:
> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
> 

Andrew,

Sorry for the delayed response. Here is the file you requested:
http://sources.xes-inc.com/downloads/pe-input-28.bz2

This same condition just occurred again on storage1 today (pengine died, and then storage1 was STONITHed).

Thanks,

Andrew