[Pacemaker] Service restoration in clone resource group
Sean Lutner
sean at rentul.net
Tue Oct 15 23:39:13 UTC 2013
On Oct 15, 2013, at 6:21 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
>
> On 10/10/2013, at 12:52 PM, Sean Lutner <sean at rentul.net> wrote:
>
>>
>> On Oct 8, 2013, at 9:45 AM, Sean Lutner <sean at rentul.net> wrote:
>>
>>>
>>> On Oct 8, 2013, at 9:33 AM, Lars Marowsky-Bree <lmb at suse.com> wrote:
>>>
>>>> On 2013-10-08T09:29:14, Sean Lutner <sean at rentul.net> wrote:
>>>>
>>>>> The clone was created using the interleave=true option, yes.
>
> You might want to trawl the raw xml to make sure pcs did the right thing.
> cibadmin -Ql | grep interleave
>
> would tell you.
Thanks, that's very helpful. I'll have a look.
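For anyone who wants something more targeted than grep, here is a rough Python sketch that parses the XML from `cibadmin -Ql` and reports the `interleave` meta attribute for each clone. It assumes the usual CIB layout (a `<clone>` element with `<meta_attributes>`/`<nvpair>` children). It only looks at meta attributes set directly on the clone, not on the group or primitive inside it. `clone_interleave` is a hypothetical helper, not part of any Pacemaker tool.

```python
import xml.etree.ElementTree as ET


def clone_interleave(cib_xml: str) -> dict:
    """Map each clone id to its interleave meta attribute value (None if unset).

    Assumes the usual CIB layout: <clone> elements whose <meta_attributes>
    children hold <nvpair name=... value=...> entries.
    """
    root = ET.fromstring(cib_xml.strip())
    result = {}
    for clone in root.iter("clone"):
        value = None
        # Only direct meta_attributes children of the clone are checked,
        # so settings on the nested group/primitive are not picked up.
        for meta in clone.findall("meta_attributes"):
            for nvpair in meta.findall("nvpair"):
                if nvpair.get("name") == "interleave":
                    value = nvpair.get("value")
        result[clone.get("id")] = value
    return result


if __name__ == "__main__":
    sample = """
    <cib>
      <configuration>
        <resources>
          <clone id="svc-clone">
            <meta_attributes id="svc-clone-meta">
              <nvpair id="svc-clone-interleave" name="interleave" value="true"/>
            </meta_attributes>
            <group id="svc-group"/>
          </clone>
          <clone id="other-clone">
            <primitive id="other" class="ocf" provider="heartbeat" type="Dummy"/>
          </clone>
        </resources>
      </configuration>
    </cib>
    """
    print(clone_interleave(sample))
```

If you feed it the saved output of `cibadmin -Ql > cib.xml`, it should report `'true'` for a clone that pcs really created with interleave=true.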
>
>>>>
>>>> Ok, so pcs hides that (interesting to know).
>>>>
>>>>> Does this have an effect on what I'm trying to accomplish?
>>>>
>>>> Yes, if you hadn't set that, it might have been an explanation. My best
>>>> guess right now would be to upgrade first; the PE has gotten quite a few
>>>> fixes since 1.1.8 again.
>>>
>>> Are you indicating that the behavior I expect to see, which is the resource being marked as Started on the now-passive node, is what pacemaker should be doing, and that this could be a bug?
>>>
>>> If it would help, I can provide the full cib configuration and logs while I run the tests I've been doing. I won't be able to do that until tonight (EST), though.
>>>
>>> Thanks
>>> Sean
>>
>> Sorry for following up on my own post but I have a follow-up question about the failcount for a resource. Does a crm_resource --cleanup erase the failcount on the resource it's run against?
>
> Older versions didn't but I don't exactly recall when we started doing that.
In practice, that's what I'm observing, so it seems that 1.1.8 does.
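You can check what a cleanup actually removed by comparing the fail-count attributes in the status section before and after running `crm_resource --cleanup`. The sketch below assumes the 1.1.x naming, where each node's `transient_attributes` hold `fail-count-<resource>` nvpairs (values may be `INFINITY`). Later releases append `#<op>_<interval>` to the name, and this sketch folds those into a single per-resource total. `fail_counts` is a hypothetical helper. `crm_mon --failcounts` shows the same information interactively.

```python
import xml.etree.ElementTree as ET

PCMK_INFINITY = 1000000  # the score Pacemaker uses to represent INFINITY
PREFIX = "fail-count-"


def _to_int(value: str) -> int:
    """Convert a fail-count value, which may be INFINITY, to an int."""
    v = value.strip().lstrip("+")
    if v.upper() == "INFINITY":
        return PCMK_INFINITY
    return int(v)


def fail_counts(cib_xml: str) -> dict:
    """Return {(node, resource): failcount} read from the CIB status section."""
    root = ET.fromstring(cib_xml.strip())
    counts = {}
    for node_state in root.iter("node_state"):
        node = node_state.get("uname")
        for attrs in node_state.findall("transient_attributes/instance_attributes"):
            for nvpair in attrs.findall("nvpair"):
                name = nvpair.get("name", "")
                if not name.startswith(PREFIX):
                    continue  # e.g. last-failure-<rsc> is skipped
                # 1.1.x: fail-count-<rsc>; newer: fail-count-<rsc>#<op>_<interval>
                rsc = name[len(PREFIX):].split("#", 1)[0]
                key = (node, rsc)
                total = counts.get(key, 0) + _to_int(nvpair.get("value", "0"))
                counts[key] = min(PCMK_INFINITY, total)
    return counts
```

For example, save `cibadmin -Ql > before.xml`, run the cleanup, save `after.xml`, and compare `fail_counts(open("before.xml").read())` with the same call on the second file.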
>
>> I'm looking at making changes to failure-timeout and cluster-recheck-interval, which, combined with my values of resource-stickiness=100 and migration-threshold=1, should allow the services on the now-failed node to be restarted and marked as Started in the cluster without causing an unnecessary failover.
>>
>> Does this make sense?
>
> yes
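As I understand the interaction, a rough model works like this. While a node's unexpired failures reach migration-threshold, the resource is banned on that node. Once failure-timeout has passed, those failures are ignored, so a clone instance may start there again. Meanwhile, resource-stickiness keeps a non-cloned resource on the node where it currently runs. This is a simplified scoring sketch, not the policy engine's real algorithm. `NodeState`, the node names and `base_score` are hypothetical, and ties are broken by node name here, which may not match Pacemaker.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeState:
    base_score: int = 0                   # hypothetical location preference
    fail_count: int = 0                   # failures of this resource on the node
    last_failure: Optional[float] = None  # time of the last failure, in seconds


def failures_in_effect(state: NodeState, now: float, failure_timeout: float) -> int:
    """Failures older than failure-timeout are ignored (0 disables expiry)."""
    if (failure_timeout > 0 and state.last_failure is not None
            and now - state.last_failure >= failure_timeout):
        return 0
    return state.fail_count


def allowed(state: NodeState, now: float, migration_threshold: int,
            failure_timeout: float) -> bool:
    """A node is banned once its unexpired failures reach migration-threshold."""
    if migration_threshold <= 0:  # treated as "no threshold" in this model
        return True
    return failures_in_effect(state, now, failure_timeout) < migration_threshold


def clone_nodes(nodes: Dict[str, NodeState], now: float,
                migration_threshold: int = 1,
                failure_timeout: float = 600) -> List[str]:
    """Nodes where a clone instance could be started again."""
    return sorted(n for n, s in nodes.items()
                  if allowed(s, now, migration_threshold, failure_timeout))


def place_primitive(nodes: Dict[str, NodeState], current: str, now: float,
                    stickiness: int = 100, migration_threshold: int = 1,
                    failure_timeout: float = 600) -> Optional[str]:
    """Pick the highest-scoring allowed node; stickiness favours staying put."""
    best, best_score = None, None
    for name in sorted(nodes):
        state = nodes[name]
        if not allowed(state, now, migration_threshold, failure_timeout):
            continue
        score = state.base_score + (stickiness if name == current else 0)
        if best_score is None or score > best_score:
            best, best_score = name, score
    return best


if __name__ == "__main__":
    # node1 failed at t=0 and the resource moved to node2.
    nodes = {"node1": NodeState(base_score=50, fail_count=1, last_failure=0.0),
             "node2": NodeState()}
    print(clone_nodes(nodes, now=300))               # node1 still banned
    print(clone_nodes(nodes, now=700))               # node1's failure expired
    print(place_primitive(nodes, "node2", now=700))  # stickiness keeps node2
    print(place_primitive(nodes, "node2", now=700, stickiness=0))
```

With stickiness at 0, the same expired failure lets the higher base score pull the resource back to node1. That is the unnecessary failback the stickiness value is meant to prevent.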
I currently have failure-timeout and cluster-recheck-interval both set to 10m, but I'm not seeing the failcount clear. If I trigger a failover by stopping the resource/service, the failover works as expected. But if I then manually restart the services on a previously failed node, pacemaker never marks the resources as Started again.
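With both values at 10m, keep in mind that the policy engine only notices an expired failure the next time it runs. With no other cluster activity, that happens at the next cluster-recheck-interval tick. Assuming the recheck timer counts from the last transition, the wait can approach failure-timeout plus cluster-recheck-interval, about 20 minutes here. The helper below is a hypothetical sketch of that arithmetic. It only estimates when the expiry could be acted on, not whether a blocked resource will then be restarted.

```python
from datetime import datetime, timedelta


def expiry_acted_on(failure_at: datetime, failure_timeout: timedelta,
                    last_transition: datetime,
                    recheck_interval: timedelta) -> datetime:
    """Earliest policy engine run at or after the failure expires.

    Assumes no other cluster events and a recheck every recheck_interval
    counted from the last transition (a simplification).
    """
    expires_at = failure_at + failure_timeout
    if expires_at <= last_transition:
        return last_transition  # the last transition already saw it expired
    # Ceiling division on timedeltas (timedelta // timedelta returns an int):
    # the number of recheck periods needed to reach expires_at.
    periods = -((last_transition - expires_at) // recheck_interval)
    return last_transition + periods * recheck_interval


if __name__ == "__main__":
    t0 = datetime(2013, 10, 15, 12, 0)
    ten = timedelta(minutes=10)
    # Failover finished a minute after the failure.
    print(expiry_acted_on(t0, ten, t0 + timedelta(minutes=1), ten))
    # Last transition just before expiry: close to the 20 minute worst case.
    print(expiry_acted_on(t0, ten, t0 + timedelta(minutes=9, seconds=59), ten))
```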
I think I may be hitting this bug you fixed back in May. The commit for the fix is https://github.com/beekhof/pacemaker/commit/d87de1b and the thread discussing the issue is http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg15979.html.
I think that fits what I'm seeing, because the default on-fail behavior for a stop operation is block (at least with STONITH disabled; with STONITH enabled, the default is fence).
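The documented default on-fail for a failed stop depends on the `stonith-enabled` cluster property. It is fence when the property is true (the default) and block when it is false. Other operations default to restart. One way to see which default applies to each primitive is to read it out of the CIB. This sketch assumes the usual CIB layout and honours only an explicit `on-fail` on the primitive's own stop op; it ignores `op_defaults`. `stop_on_fail` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TRUE_VALUES = {"true", "on", "yes", "y", "1"}


def default_on_fail(op_name: str, stonith_enabled: bool) -> str:
    """Documented default on-fail: stop -> fence or block, others -> restart."""
    if op_name == "stop":
        return "fence" if stonith_enabled else "block"
    return "restart"


def stonith_enabled(root: ET.Element) -> bool:
    """Read stonith-enabled from crm_config; Pacemaker defaults it to true."""
    for crm_config in root.iter("crm_config"):
        for nvpair in crm_config.iter("nvpair"):
            if nvpair.get("name") == "stonith-enabled":
                return nvpair.get("value", "true").strip().lower() in TRUE_VALUES
    return True


def stop_on_fail(cib_xml: str) -> dict:
    """Map each primitive id to the on-fail that applies to a failed stop.

    Only an explicit on-fail on the primitive's own stop op is considered;
    op_defaults are ignored in this sketch.
    """
    root = ET.fromstring(cib_xml.strip())
    fencing = stonith_enabled(root)
    result = {}
    for primitive in root.iter("primitive"):
        on_fail = default_on_fail("stop", fencing)
        for op in primitive.findall("operations/op"):
            if op.get("name") == "stop" and op.get("on-fail"):
                on_fail = op.get("on-fail")
        result[primitive.get("id")] = on_fail
    return result
```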
I will be pulling a newer version of pacemaker from git and building an RPM to test with.
>
>>
>>>
>>>>
>>>>
>>>> Regards,
>>>> Lars
>>>>
>>>> --
>>>> Architect Storage/HA
>>>> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
>>>> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>>>>
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org