[Pacemaker] hangs pending

Andrey Groshev greenx at yandex.ru
Thu Mar 20 00:56:21 EDT 2014



20.03.2014, 07:13, "Andrew Beekhof" <andrew at beekhof.net>:
> On 19 Mar 2014, at 4:00 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>
>>  19.03.2014, 03:29, "Andrew Beekhof" <andrew at beekhof.net>:
>>>  On 19 Mar 2014, at 6:19 am, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>   12.03.2014, 02:53, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>   Sorry for the delay, sometimes it takes a while to rebuild the necessary context
>>>>   I'm sorry for the delayed answer too.
>>>>   I switched to using "upstart" for initializing corosync and pacemaker (with respawn).
>>>>   Now the behavior of the system has changed and it suits me (for now :) ).
>>>>   I have to kill crmd/lrmd in an infinite loop before STONITH shoots the node.
>>>>   Otherwise they respawn very quickly and nothing happens.
>>>>
>>>>   Of course, I still found another way to hang the system.
>>>>   This requires only one idiot.
>>>>   1. He decides to update pacemaker (and/or remove a service he does not understand).
>>>>   2. Then he kills the corosync process or simply reboots the server.
>>>>   That's all! The node will remain hanging in "pending".
>>>  While trying to shutdown?
>>>  Our spec files shut pacemaker down prior to upgrades FWIW.
>>  Not so simple ... we have a national tradition: to care for and cherish idiots.
>>  Therefore, they are clever, quirky and unpredictable. ;)
>>  He can simply delete the package's files without uninstalling it.
>>  (In reality, it may just be a file system crash.)
>
> Dunno, I think hanging is somewhat reasonable behaviour if parts of pacemaker have been removed :)
> It didn't get fenced though?

Yes, it is logical.
But I have another idea.

The "client part" of the STONITH agent does not depend on pacemaker
(at least in my case).
On the other nodes, STONITH should still work,
i.e. they can restart or power off the node.
And I think we should be able to configure different cluster behavior here,
i.e. if STONITH can shut down / restart the node, but the resource does not start,
DO NOT consider the node UNCLEAN. Treat it as present, but without resources.
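
The proposed behavior could be sketched roughly as follows. This is only an
illustration in Python, not pacemaker code: NodeStatus, classify(), the
"lenient" switch and the ONLINE_NO_RESOURCES state are all hypothetical names
invented for this example. Only UNCLEAN corresponds to an existing node state.

```python
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    ONLINE = "online"
    # Hypothetical state: node is a known member but may not run resources.
    ONLINE_NO_RESOURCES = "online-no-resources"
    UNCLEAN = "unclean"


@dataclass
class NodeStatus:
    # Hypothetical inputs; a real cluster would derive these differently.
    pacemaker_responding: bool  # does the cluster stack on the node answer?
    fencing_reachable: bool     # can the peers still restart / power off the node?


def classify(node: NodeStatus, lenient: bool) -> NodeState:
    """Decide how the rest of the cluster treats a node.

    lenient=False mimics the strict behavior (a silent node is UNCLEAN).
    lenient=True is the proposed alternative: if fencing of the node still
    works, keep it as a member without resources instead of UNCLEAN.
    """
    if node.pacemaker_responding:
        return NodeState.ONLINE
    if lenient and node.fencing_reachable:
        return NodeState.ONLINE_NO_RESOURCES
    return NodeState.UNCLEAN


if __name__ == "__main__":
    broken = NodeStatus(pacemaker_responding=False, fencing_reachable=True)
    print(classify(broken, lenient=False).value)
    print(classify(broken, lenient=True).value)
```

The point of the switch is that the strict behavior stays the default, and the
relaxed one applies only when the peers can still fence the node.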

Next there is a chicken-and-egg problem!
Suppose I remove PCMK myself, or the file system crashes, on the node that held the "master" STONITH.
In theory the STONITH resource should then start on another node, but this does not happen, because one node is "hanging pending".
I have reproduced this behavior.

> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



