[Pacemaker] killing corosync leaves crmd, stonithd, lrmd, cib and attrd to hog up the cpu

Andreas Kurz andreas at hastexo.com
Mon Nov 14 09:30:27 EST 2011


On 11/14/2011 02:36 PM, ihjaz Mohamed wrote:
> Yes, I have neither STONITH nor MCP configured.
> 
> I just changed the pacemaker version to 1 in corosync.conf and tried
> the same thing (i.e. killed corosync). I still see the same issue as
> before. Is there anything else I need to do to enable MCP?

MCP has been available since Pacemaker 1.1.3. Besides changing the version
for the pacemaker plugin in corosync.conf, you need to start the pacemaker
init script after starting corosync.
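
As a rough sketch of those two steps, assuming a Corosync 1.x setup that
loads the Pacemaker plugin and init scripts named corosync and pacemaker
(names and paths may differ on your distribution), the service stanza in
corosync.conf would look something like this:

```
service {
        # Load the Pacemaker plugin
        name: pacemaker
        # ver: 1 means the plugin does not spawn the Pacemaker daemons
        # itself; they are started separately by pacemakerd (MCP)
        ver: 1
}
```

With that in place, start corosync first and then pacemaker, for example
with "/etc/init.d/corosync start" followed by "/etc/init.d/pacemaker start".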

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> ------------------------------------------------------------------------
> *From:* Florian Haas <florian at hastexo.com>
> *To:* The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> *Sent:* Monday, 14 November 2011 6:22 PM
> *Subject:* Re: [Pacemaker] killing corosync leaves crmd, stonithd, lrmd,
> cib and attrd to hog up the cpu
> 
> On 2011-11-14 13:18, Dan Frincu wrote:
>> Hi,
>>
>> On Mon, Nov 14, 2011 at 1:32 PM, ihjaz Mohamed <ihjazmohamed at yahoo.co.in> wrote:
>>> Hi All,
>>> As part of some robustness test for my cluster, I tried killing the corosync
>>> process using kill -9 <pid>. After this I see that the pacemakerd service is
>>> stopped but the processes crmd, stonithd, lrmd, cib and attrd are still
>>> running and are hogging up the cpu.
>>
>> I have seen this kind of testing before and I have to say I don't
>> consider it the recommended way of testing the cluster stack's
>> "robustness". Pacemaker processes rely on corosync for proper
>> functioning. You kill corosync and then want to "cleanup" the
>> processes? You have to go through a lot more literature in order to
>> understand how this cluster stack works.
> 
> Well I, for my part, don't consider this kind of testing unreasonable at
> all. If Corosync dies, say due to a segfault, then the cluster had
> better recover to a consistent state.
> 
> Thus, this (very valid) testing highlights that the cluster is evidently
> misconfigured; it's either not using Pacemaker MCP at all, or doesn't
> have STONITH configured, or both.
> 
> Florian
> 
> -- 
> Need help with High Availability?
> http://www.hastexo.com/now
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

