[Pacemaker] hangs pending
Andrey Groshev
greenx at yandex.ru
Wed Feb 19 07:05:44 UTC 2014
19.02.2014, 09:49, "Andrew Beekhof" <andrew at beekhof.net>:
> On 19 Feb 2014, at 4:18 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>
>> 19.02.2014, 09:08, "Andrew Beekhof" <andrew at beekhof.net>:
>>> On 19 Feb 2014, at 4:00 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>> 19.02.2014, 06:48, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>> On 18 Feb 2014, at 11:05 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>> Hi, ALL and Andrew!
>>>>>>
>>>>>> Today is a good day - I killed a lot, and got shot at a lot in return.
>>>>>> In general - I am happy (almost like an elephant) :)
>>>>>> Besides the resources, eight processes on the node are important to me: corosync, pacemakerd, cib, stonithd, lrmd, attrd, pengine, crmd.
>>>>>> I killed them with different signals (4, 6, 11 and even 9).
>>>>>> The behavior does not depend on the signal number - that's good.
>>>>>> If STONITH sends a reboot to the node, it reboots and rejoins the cluster - that's good too.
>>>>>> But the behavior differs depending on which daemon is killed.
>>>>>>
>>>>>> They fall into four groups:
>>>>>> 1. corosync, cib - STONITH works 100% of the time.
>>>>>> Killing with any signal triggers STONITH and a reboot.
>>>>>>
>>>>>> 2. lrmd, crmd - STONITH behaves strangely.
>>>>>> Sometimes STONITH is called, with the corresponding reaction.
>>>>>> Sometimes the daemon restarts and the resources restart, with a long delay for MS:pgsql.
>>>>>> Once, after crmd restarted, pgsql did not restart.
>>>>>>
>>>>>> 3. stonithd, attrd, pengine - STONITH is not needed.
>>>>>> These daemons simply restart; the resources stay running.
>>>>>>
>>>>>> 4. pacemakerd - nothing happens.
>>>>>> And after that I can kill any process of the third group; they do not restart.
>>>>>> Generally, don't touch corosync, cib and maybe lrmd, crmd.
>>>>>>
>>>>>> What do you think about this?
>>>>>> The main question of this thread is resolved.
>>>>>> But this varied behavior is another big problem.
>>>>>>
>>>>>> Forgot the logs: http://send2me.ru/pcmk-Tue-18-Feb-2014.tar.bz2
>>>>> Which of the various conditions above do the logs cover?
>>>> All of them, over the course of the day.
>>> Are you trying to torture me?
>>> Can you give me a rough idea what happened when?
>> No, there are 8 processes times 4 signals, plus repeated experiments with unknown outcomes :)
>> It is easier to run new experiments and collect separate new logs.
>> Which variant is more interesting?
>
> The long delay in restarting pgsql.
> Everything else seems correct.
>
I have now run the tests: the first and second ended in STONITH; in the third, lrmd restarted and I am waiting.
clonePing works - but it is "stateless".
I'll wait for pgsql to start and then build a crm_report (it has already been 10 minutes).
Meanwhile, this is what I see in crm_simulate -sLVVVV:
.......
debug: native_assign_node: All nodes for resource pgsql:3 are unavailable, unclean or shutting down (dev-cluster2-node2: 1, -1000000)
debug: native_assign_node: Could not allocate a node for pgsql:3
info: native_color: Resource pgsql:3 cannot run anywhere
debug: clone_color: Allocated 3 msPostgresql instances of a possible 4
......
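
To make output like the above easier to scan while waiting, here is a minimal sketch that pulls out the resources that could not be placed and the clones that got fewer instances than possible. It assumes the crm_simulate output has been captured as text. The regular expressions are modeled only on the four lines quoted above, so other Pacemaker versions may word these messages differently. The function name summarize and the dictionary keys are hypothetical, and the meaning of the two numbers in parentheses is not asserted here (the second looks like a score).

```python
import re

# Patterns modeled only on the log lines quoted above (assumption: the
# wording may differ between Pacemaker versions).
UNAVAILABLE = re.compile(
    r"native_assign_node: All nodes for resource (?P<rsc>\S+) are unavailable, "
    r"unclean or shutting down \((?P<node>[^:]+): (?P<first>-?\d+), (?P<second>-?\d+)\)"
)
CANNOT_RUN = re.compile(r"native_color: Resource (?P<rsc>\S+) cannot run anywhere")
ALLOCATED = re.compile(
    r"clone_color: Allocated (?P<got>\d+) (?P<clone>\S+) instances of a possible (?P<max>\d+)"
)


def summarize(text):
    """Hypothetical helper: collect unplaceable resources and short clone allocations."""
    summary = {"unavailable": [], "cannot_run": [], "short_clones": []}
    for line in text.splitlines():
        m = UNAVAILABLE.search(line)
        if m:
            # Keep the resource, the node named in parentheses, and the second number.
            summary["unavailable"].append((m["rsc"], m["node"], int(m["second"])))
            continue
        m = CANNOT_RUN.search(line)
        if m:
            summary["cannot_run"].append(m["rsc"])
            continue
        m = ALLOCATED.search(line)
        # Report only clones that received fewer instances than possible.
        if m and int(m["got"]) < int(m["max"]):
            summary["short_clones"].append((m["clone"], int(m["got"]), int(m["max"])))
    return summary


# The four lines quoted above, used as sample input.
SAMPLE = """\
debug: native_assign_node: All nodes for resource pgsql:3 are unavailable, unclean or shutting down (dev-cluster2-node2: 1, -1000000)
debug: native_assign_node: Could not allocate a node for pgsql:3
info: native_color: Resource pgsql:3 cannot run anywhere
debug: clone_color: Allocated 3 msPostgresql instances of a possible 4
"""

if __name__ == "__main__":
    for key, items in summarize(SAMPLE).items():
        print(key, items)
```

Running it against a saved copy of the full output, instead of SAMPLE, would list every instance that is stuck in one pass rather than requiring a read through the debug stream.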
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org