[Pacemaker] hangs pending

Fri Feb 21 01:21:09 EST 2014

21.02.2014, 10:18, "Andrew Beekhof" <andrew at beekhof.net>:
> Btw, what's with all these entries:
>
> Feb 19 10:49:27 [1641] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
> Feb 19 10:49:27 [1641] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_xml_cleanup: Cleaning up memory from libxml2
> Feb 19 10:49:27 [1772] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/hacluster
> Feb 19 10:49:27 [1772] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_xml_cleanup: Cleaning up memory from libxml2
> Feb 19 10:49:29 [1851] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
> Feb 19 10:49:29 [1851] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_xml_cleanup: Cleaning up memory from libxml2
> Feb 19 10:49:35 [2130] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
> Feb 19 10:49:35 [2130] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_xml_cleanup: Cleaning up memory from libxml2
> Feb 19 10:49:35 [2191] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
> Feb 19 10:49:35 [2191] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_xml_cleanup: Cleaning up memory from libxml2
> Feb 19 10:49:40 [2288] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
> Feb 19 10:49:40 [2288] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_xml_cleanup: Cleaning up memory from libxml2
> Feb 19 10:49:45 [2388] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
> Feb 19 10:49:45 [2388] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_xml_cleanup: Cleaning up memory from libxml2
> Feb 19 10:49:51 [2468] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
> Feb 19 10:49:51 [2468] dev-cluster2-node2.unix.tensor.ru pacemakerd:     info: crm_xml_cleanup: Cleaning up memory from libxml2
>
> Are you calling pacemakerd for some reason?
>

No, in this test I did not touch pacemakerd.
I only ran kill -4 `lrmd.pid`
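
As a minimal sketch of what such a test does at the shell level (not the exact commands used here): the snippet below sends signal 4, which is SIGILL on Linux, to a throwaway sleep process standing in for lrmd, then reads the exit status. The stand-in process and the variable names are assumptions for illustration only.

```shell
# Stand-in for lrmd: a throwaway background process (assumption for this demo).
sleep 60 &
target_pid=$!

# Signal 4 is SIGILL on Linux; same form as the kill -4 used in this test.
kill -4 "$target_pid"

# For a child killed by a signal, wait returns 128 + the signal number.
wait "$target_pid"
status=$?
echo "exit status: $status"
```

On a cluster node the target would be the lrmd PID rather than a child of the shell, so wait would not apply and the effect would have to be read from the Pacemaker logs instead.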

> On 19 Feb 2014, at 7:53 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>
>>  19.02.2014, 09:49, "Andrew Beekhof" <andrew at beekhof.net>:
>>>  On 19 Feb 2014, at 4:18 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>   19.02.2014, 09:08, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>   On 19 Feb 2014, at 4:00 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>    19.02.2014, 06:48, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>    On 18 Feb 2014, at 11:05 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>     Hi, ALL and Andrew!
>>>>>>>>
>>>>>>>>     Today is a good day - I killed a lot, and got shot at a lot.
>>>>>>>>     In general, I am happy (almost like an elephant)   :)
>>>>>>>>     Apart from the resources, eight processes on the node are important to me: corosync, pacemakerd, cib, stonithd, lrmd, attrd, pengine, crmd.
>>>>>>>>     I killed them with different signals (4, 6, 11 and even 9).
>>>>>>>>     The behavior does not depend on the signal number, which is good.
>>>>>>>>     If STONITH sends a reboot to the node, it reboots and rejoins the cluster, which is also good.
>>>>>>>>     But the behavior differs depending on which daemon is killed.
>>>>>>>>
>>>>>>>>     They fall into four groups:
>>>>>>>>     1. corosync, cib - STONITH works 100% of the time.
>>>>>>>>     A kill with any signal triggers STONITH and a reboot.
>>>>>>>>
>>>>>>>>     2. lrmd, crmd - strange STONITH behavior.
>>>>>>>>     Sometimes STONITH is called, with the corresponding reaction.
>>>>>>>>     Sometimes the daemon restarts and the resources restart, with a long delay for MS:pgsql.
>>>>>>>>     Once, after crmd restarted, pgsql did not restart.
>>>>>>>>
>>>>>>>>     3. stonithd, attrd, pengine - STONITH is not needed.
>>>>>>>>     These daemons simply restart; the resources stay running.
>>>>>>>>
>>>>>>>>     4. pacemakerd - nothing happens.
>>>>>>>>     After that I can kill any process of the third group, and they do not restart.
>>>>>>>>     Generally, don't touch corosync, cib and maybe lrmd, crmd.
>>>>>>>>
>>>>>>>>     What do you think about this?
>>>>>>>>     The main question of this thread is resolved.
>>>>>>>>     But this varied behavior is another big problem.
>>>>>>>>
>>>>>>>>     I forgot the logs: http://send2me.ru/pcmk-Tue-18-Feb-2014.tar.bz2
>>>>>>>    Which of the various conditions above do the logs cover?
>>>>>>    All of them, over the course of the day.
>>>>>   Are you trying to torture me?
>>>>>   Can you give me a rough idea what happened when?
>>>>   No, it is 8 processes times 4 signals, plus repeated experiments with unknown outcomes :)
>>>>   It is easier to run new experiments and collect separate new logs.
>>>>   Which variant is more interesting?
>>>  The long delay in restarting pgsql.
>>>  Everything else seems correct.
>>  It didn't even try to start pgsql.
>>  The logs contain three tests.
>>  kill -s4 lrmd pid.
>>  1. STONITH
>>  2. STONITH
>>  3. hangs
>>  http://send2me.ru/pcmk-Wed-19-Feb-2014.tar.bz2
>>>  _______________________________________________
>>>  Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>>  Project Home: http://www.clusterlabs.org
>>>  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>  Bugs: http://bugs.clusterlabs.org
>