[Pacemaker] How to perform a clean shutdown of Pacemaker in the event of network connection loss

Wed Jul 24 22:39:16 EDT 2013

On 24/07/2013, at 6:40 PM, Tan Tai hock <taihock at gmail.com> wrote:

> 
> I did not enable fencing. When the node is up, I see the following processes running:
> 
> Corosync
> --------------
> /usr/sbin/corosync
> 
> Pacemaker
> -----------------
> /usr/libexec/pacemaker/lrmd
> /usr/libexec/pacemaker/pengine
> pacemakerd
> /usr/libexec/pacemaker/stonith
> /usr/libexec/pacemaker/cib
> /usr/libexec/pacemaker/crmd
> 
> If I shut down the network connection of any node and then list the processes, "/usr/sbin/corosync" is no longer running, and for Pacemaker only the following processes are left:
> 
> Pacemaker
> -----------------
> /usr/libexec/pacemaker/lrmd
> /usr/libexec/pacemaker/pengine

corosync has probably crashed and taken most of pacemaker with it (all the bits that connect to corosync).
What versions are you running? Neither "Pacemaker 2.3" nor "Corosync 1.19" exists.
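
To see at a glance which daemons died, something like the following could help. It is only a minimal sketch: the daemon names are taken from the process list quoted above (adjust them to whatever ps shows for your version), and the helper name `missing_daemons` is hypothetical. On a live node you would pass it the output of `ps -eo args`.

```python
import os

# Daemon names as they appear in the process list quoted in this thread.
# Assumption: your ps output uses the same names; adjust if it does not.
EXPECTED_DAEMONS = ["corosync", "pacemakerd", "cib", "stonith",
                    "lrmd", "pengine", "crmd"]


def missing_daemons(ps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemons that do not appear in ps_output.

    ps_output is text with one command line per row, for example the
    output of `ps -eo args`. Only the basename of the first field of
    each row is compared against the expected names.
    """
    running = set()
    for line in ps_output.splitlines():
        fields = line.split()
        if fields:
            running.add(os.path.basename(fields[0]))
    return [name for name in expected if name not in running]


if __name__ == "__main__":
    # The two processes reported as left over after the network was cut.
    after_outage = "/usr/libexec/pacemaker/lrmd\n/usr/libexec/pacemaker/pengine\n"
    print("missing:", ", ".join(missing_daemons(after_outage)))
```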

In any case, the resources are still running there - even if pacemaker isn't.

If you want to simulate an outage, use firewall rules to block traffic to/from the node.
If you want to simulate someone opening the box and pulling out its NIC - keep using ifconfig down.
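
For the firewall approach, here is a minimal sketch, assuming iptables is the firewall in use and the cluster traffic goes over a single interface. The interface name `eth0` and the helper name `outage_rules` are assumptions for illustration. The script only prints the commands; run them as root on the node you want to isolate.

```python
def outage_rules(interface, action="-A"):
    """Return iptables commands that drop all traffic on one interface.

    action is "-A" to append the rules (start the simulated outage)
    or "-D" to delete the same rules again (end the outage).
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' or '-D'")
    return [
        # Drop everything arriving on the interface.
        ["iptables", action, "INPUT", "-i", interface, "-j", "DROP"],
        # Drop everything leaving through the interface.
        ["iptables", action, "OUTPUT", "-o", interface, "-j", "DROP"],
    ]


if __name__ == "__main__":
    iface = "eth0"  # assumption: the interface carrying cluster traffic

    print("# start the simulated outage")
    for cmd in outage_rules(iface, "-A"):
        print(" ".join(cmd))

    print("# end the simulated outage")
    for cmd in outage_rules(iface, "-D"):
        print(" ".join(cmd))
```

Unlike ifconfig down, the interface and its address stay in place, so corosync just sees the other nodes disappear. Note that these rules also cut any SSH session over the same interface, so work from a console.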

> 
> If there is no network connectivity loss and I perform a clean shutdown, none of the Corosync and Pacemaker processes listed above remain. I tried killing the remaining processes after the network connection was lost, but that does not prevent the fallen node from taking back the resource it held before going down.
> 
> Is there a way to perform a clean shutdown if Pacemaker was shut down improperly?
> 
> 
> On Wed, Jul 24, 2013 at 8:21 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> 
> On 24/07/2013, at 9:54 AM, Tan Tai hock <taihock at gmail.com> wrote:
> 
> > No, I did not. It seems that corosync and pacemaker stop running when the network connection is lost.
> 
> Do you have fencing enabled?
> If not, I'd be surprised if corosync or pacemaker stopped running.
> 
> > I am trying to simulate a scenario in which the node that started the resource loses its network connection, and to observe how it reacts upon rejoining the cluster. Is there a proper way to shut down both corosync and pacemaker in such a scenario?
> 
> They are not supposed to stop running just because connectivity was lost.
> 
> >
> > On Jul 24, 2013 6:55 AM, "Andrew Beekhof" <andrew at beekhof.net> wrote:
> >
> > On 23/07/2013, at 11:28 AM, Tan Tai hock <taihock at gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have set up 3 machines with Pacemaker 2.3 and Corosync 1.19. I have tested some scenarios and have encountered some problems which I hope to get some advice on.
> > >
> > > My scenario is as follows:
> > >
> > > The 3 machines, named A, B, and C, are all running, with A being the node which started the resource as seen in crm_mon. If I cut off the network connection for A, B takes over as the node running the resource. I then restore the network connection and start both corosync and pacemaker on A again
> >
> > Did you stop it there first?
> >
> > > and the node which started the resource now returns to node A.
> > > I have set stickiness and performed an identical test, but with a proper shutdown of pacemaker and corosync, and it works fine.
> > > Is there any way to perform a clean shutdown when a node loses its network connection, so that it will not attempt to take back the resource it held before it was uncleanly shut down?
> > >
> > > Thanks
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org