[ClusterLabs] [OCF] Pacemaker reports a multi-state clone resource instance as running while it is not in fact
Bogdan Dobrelya
bdobrelia at mirantis.com
Thu Feb 4 14:43:29 UTC 2016
Hello.
Regarding the original issue, the good news is that the resource-agents
ocf-shellfuncs no longer causes fork bombs in the dummy OCF RA [0]
now that the fix [1] is in. The bad news is that the "self-forking"
monitors issue seems to remain for the rabbitmq OCF RA [2], and I can
reproduce it with another custom agent [3], so I'd guess it may affect
other agents as well.
IIUC, the issue seems related to how lrmd forks monitor actions.
I tried to debug both pacemaker 1.1.10 and 1.1.12 with gdb as follows:
# cat ./cmds
set follow-fork-mode child
set detach-on-fork off
set follow-exec-mode new
catch fork
catch vfork
cont
# gdb -x cmds /usr/lib/pacemaker/lrmd `pgrep lrmd`
I can confirm it catches the forked monitors and that they make nested
forks as well. But I have *many* debug symbols missing, the bt is full of
question marks and, honestly, I'm not a gdb guru and do not know what to
check in the reproduced cases.
So any help with how to troubleshoot this further would be very appreciated!
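
Without debug symbols, a cheaper check than gdb may be to scan the process
table for monitor processes whose parent is itself a monitor rather than
lrmd. What follows is only a minimal sketch. It assumes a procps-style `ps`
and that the agent is invoked with `monitor` as its last argument, as in the
listing quoted further down. `find_nested_monitors` is a hypothetical helper
name, not part of pacemaker or resource-agents.

```shell
# Hypothetical helper (not part of pacemaker or resource-agents).
# Reads "PID PPID ARGS..." lines on stdin and prints, in numeric order,
# the PIDs of "monitor" processes whose parent is also a "monitor" process.
find_nested_monitors() {
    awk '
        # Remember every process whose last argument is "monitor".
        $NF == "monitor" { is_mon[$1] = 1; parent[$1] = $2 }
        END {
            # A monitor whose parent is also a monitor is a nested fork.
            for (pid in is_mon)
                if (parent[pid] in is_mon)
                    print pid
        }
    ' | sort -n
}

# Assumed usage against the live process table:
#   ps -eo pid=,ppid=,args= | find_nested_monitors
```

An empty result would mean every monitor is a direct child of some
non-monitor process (normally lrmd). Any PID printed is a monitor that was
forked from another monitor.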
[0] https://github.com/bogdando/dummy-ocf-ra
[1] https://github.com/ClusterLabs/resource-agents/issues/734
[2] https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
[3] https://git.openstack.org/cgit/openstack/fuel-library/tree/files/fuel-ha-utils/ocf/ns_vrouter
On 04.01.2016 17:33, Bogdan Dobrelya wrote:
> On 04.01.2016 17:14, Dejan Muhamedagic wrote:
>> Hi,
>>
>> On Mon, Jan 04, 2016 at 04:52:43PM +0100, Bogdan Dobrelya wrote:
>>> On 04.01.2016 16:36, Ken Gaillot wrote:
>>>> On 01/04/2016 09:25 AM, Bogdan Dobrelya wrote:
>>>>> On 04.01.2016 15:50, Bogdan Dobrelya wrote:
>> [...]
>>>>> Also note that lrmd spawns *many* monitors like:
>>>>> root 6495 0.0 0.0 70268 1456 ? Ss 2015 4:56 \_ /usr/lib/pacemaker/lrmd
>>>>> root 31815 0.0 0.0 4440 780 ? S 15:08 0:00 | \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>> root 31908 0.0 0.0 4440 388 ? S 15:08 0:00 | \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>> root 31910 0.0 0.0 4440 384 ? S 15:08 0:00 | \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>> root 31915 0.0 0.0 4440 392 ? S 15:08 0:00 | \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>> ...
>>>>
>>>> At first glance, that looks like your monitor action is calling itself
>>>> recursively, but I don't see how in your code.
>>>
>>> Yes, it should be a bug in the ocf-shellfuncs's ocf_log().
>>
>> If you're sure about that, please open an issue at
>> https://github.com/ClusterLabs/resource-agents/issues
>
> Submitted [0]. Thank you!
> Note that it seems the import itself causes the issue, not the
> ocf_run or ocf_log code.
>
> [0] https://github.com/ClusterLabs/resource-agents/issues/734
>
>>
>> Thanks,
>>
>> Dejan
>>
--
Best regards,
Bogdan Dobrelya,
Irc #bogdando