[Pacemaker] Master/Slave not failing over

Fri Jun 25 12:27:15 EDT 2010

After looking at the drbd master/slave RA, I think it is now clear. crm_master, being a wrapper for crm_attribute, already specifies most of what I need. All I have to add on the command line are a few options: the lifetime of the attribute modification, the value to set it to, or a flag to delete the attribute.

So, if I delete the attribute when a STOP is issued and keep the attribute's lifetime set to "reboot", it should be sufficient to cause a failover, correct?

Also, in my START action, once I have monitored the resource enough to be sure that everything came up correctly, I should issue crm_master again with the -v option to set a score for the node so that it is a good candidate to become master, correct?
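
The START/STOP handling described above could look roughly like the following in a shell-based OCF resource agent. This is only a sketch: the function names (myra_start, myra_stop, myra_is_healthy) and the score of 100 are hypothetical, and the real start/stop logic is left as comments. It assumes crm_master is called from inside the agent, where Pacemaker sets OCF_RESOURCE_INSTANCE, so no attribute name has to be given explicitly.

```shell
#!/bin/sh
# Sketch only: the myra_* names and the score value are hypothetical.

# OCF return codes (normally provided by the OCF shell function library).
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"

# Hypothetical health check; replace with real checks for the service.
myra_is_healthy() {
    return 0
}

myra_start() {
    # ... start the service here ...
    if myra_is_healthy; then
        # Set a promotion score for this node, valid until the node reboots.
        crm_master -l reboot -v 100
        return "$OCF_SUCCESS"
    fi
    return "$OCF_ERR_GENERIC"
}

myra_stop() {
    # ... stop the service here ...
    # Delete the promotion score so this node is no longer a candidate.
    crm_master -l reboot -D
    return "$OCF_SUCCESS"
}
```

Setting the score only after the health check passes means a node that starts but is not yet usable never advertises itself as a promotion candidate.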

Eliot Gable
Senior Product Developer
1228 Euclid Ave, Suite 390
Cleveland, OH 44115

Direct: 216-373-4808
Fax: 216-373-4657
egable at broadvox.net

CONFIDENTIAL COMMUNICATION.  This e-mail and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom it is addressed. If you are not the intended recipient, please call me immediately.  BROADVOX is a registered trademark of Broadvox, LLC.

-----Original Message-----
From: Eliot Gable [mailto:egable at broadvox.com]
Sent: Friday, June 25, 2010 12:17 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Master/Slave not failing over

Thanks. Should I update my RA to use crm_master when it detects the resource in FAILED_MASTER state, or should I put it in the demote action or something else?

What's the command line needed to "reduce the promotion score"? I looked at the Pacemaker_Explained.pdf document, and while it mentions using crm_master to provide a promotion score, it does not tell me what actual attribute it is that needs to be modified. Is there another command that can print out all available attributes, or a document somewhere that lists them?

-----Original Message-----
From: Andrew Beekhof [mailto:andrew at beekhof.net]
Sent: Friday, June 25, 2010 8:26 AM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Master/Slave not failing over

On Fri, Jun 25, 2010 at 12:43 AM, Eliot Gable <egable at broadvox.com> wrote:
> Thanks for pointing that out.
>
> I am still having issues with the master/slave resource. When I cause one of the monitoring actions to fail,

As well as failing, the RA should also use crm_master to reduce the promotion score.
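
One way to read this advice, as a rough sketch for a shell-based agent: when the monitor of the master instance detects a failure, lower the node's promotion score before returning the failure code. The function names (myra_monitor_master, myra_check_master) and the reduced score of 5 are hypothetical; the return codes 8 and 9 are the standard OCF values for a running and a failed master.

```shell
#!/bin/sh
# Sketch only: the myra_* names and the score value are hypothetical.

# OCF return codes for master/slave resources.
: "${OCF_RUNNING_MASTER:=8}"
: "${OCF_FAILED_MASTER:=9}"

# Hypothetical check of the master instance; replace with real checks.
myra_check_master() {
    return 0
}

myra_monitor_master() {
    if myra_check_master; then
        return "$OCF_RUNNING_MASTER"
    fi
    # Lower this node's promotion score so that the peer is preferred
    # when the cluster next decides which instance to promote.
    crm_master -l reboot -v 5
    return "$OCF_FAILED_MASTER"
}
```

Whether to lower the score or delete it outright with -D is a design choice; a low but non-zero score keeps the node eligible as a last resort.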

> the master node gets a DEMOTE, STOP, START, PROMOTE and the slave resource just sits there. I want to see DEMOTE on the failed master, then PROMOTE on the slave, then STOP on the failed master, followed by START on the failed master.

The stop will always happen before the promote. Regardless of which
instance is being promoted.

> How can I achieve this? Is there some sort of constraint or something I can put in place to make it happen?
>
> Thanks again for any insights.
>
>
>
> -----Original Message-----
> From: Dejan Muhamedagic [mailto:dejanmm at fastmail.fm]
> Sent: Thursday, June 24, 2010 12:37 PM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Master/Slave not failing over
>
> Hi,
>
> On Thu, Jun 24, 2010 at 12:12:34PM -0400, Eliot Gable wrote:
>> On another note, I cannot seem to get Pacemaker to monitor the master node. It monitors the slave node just fine. These are the operations I have defined:
>>
>>         op monitor interval="5" timeout="30s" \
>>         op monitor interval="10" timeout="30s" OCF_CHECK_LEVEL="10" \
>>         op monitor interval="5" role="Master" timeout="30s" \
>>         op monitor interval="10" role="Master" timeout="30s" OCF_CHECK_LEVEL="10" \
>>         op start interval="0" timeout="40s" \
>>         op stop interval="0" timeout="20s"
>>
>> Did I do something wrong?
>
> Yes, all monitor intervals have to be different. I can't tell
> what happened without looking at the logs, but you should set
> something like this:
>
>         op monitor interval="6" role="Master" timeout="30s" \
>         op monitor interval="11" role="Master" timeout="30s" OCF_CHECK_LEVEL="10" \
>
> Thanks,
>
> Dejan
>
>>
>> From: Eliot Gable [mailto:egable at broadvox.com]
>> Sent: Thursday, June 24, 2010 11:55 AM
>> To: The Pacemaker cluster resource manager
>> Subject: [Pacemaker] Master/Slave not failing over
>>
>> I am using the latest CentOS 5.5 packages for pacemaker/corosync. I have a master/slave resource up and running, and when I make the master fail, instead of immediately promoting the slave, it restarts the failed master and re-promotes it back to master. This takes longer than immediately promoting the slave would. I can understand it waiting for a DEMOTE action to succeed on the failed master before it promotes the slave, but that should be all it needs to wait for. Is there any way I can change this behavior? Am I missing some key point in the process?
>>
>>
>> ________________________________
>> CONFIDENTIAL. This e-mail and any attached files are confidential and should be destroyed and/or returned if you are not the intended and proper recipient.
>>
>
>
>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>