[Pacemaker] Signal hangup handling for pacemaker and corosync

Fri Jul 25 09:53:02 CEST 2014

25.07.2014 02:20, Andrew Beekhof wrote:

...

>>>>>>>>>>> On 15 Jul 2014, at 8:00 pm, Arjun Pandey <apandepublic at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Right. Actually, the issue I am facing is that I am starting the
>>>>>>>>>>>> pacemaker service remotely from a wrapper, and thus pacemakerd dies
>>>>>>>>>>>> when the wrapper exits. nohup solves the problem, but then HUP cannot
>>>>>>>>>>>> be used by pacemaker. Is this workaround OK?
>>>>>>>>>>>
>>>>>>>>>>> I guess. How are you starting pacemaker?  Usually it's with some variant of 'service pacemaker start'.
>>>>>>>>>> I am using 'service pacemaker start'. However, this is being called
>>>>>>>>>> from my script, so when the script exits, pacemaker gets SIGHUP.
>>>>>>>>>
>>>>>>>>> Release testing starts clusters as:
>>>>>>>>>
>>>>>>>>> ssh -l root somenode -- service pacemaker start
>>>>>>>>
>>>>>>>> It could depend on what "service" is.
>>>>>>>> It would either ask systemd to run the job (el7/fc18+), or just run
>>>>>>>> the init script itself (el6). In the latter case, if the process has
>>>>>>>> not detached from its controlling terminal when that terminal goes
>>>>>>>> away, it will be sent a SIGHUP.
>>>>>>>
>>>>>>> Except we test rhel6 the same way...
>>>>>>
>>>>>> I understand. This issue belongs in the "sometimes happens on some
>>>>>> systems" folder. I recall I had problems ages ago with a daemon run from
>>>>>> rc.local sometimes exiting with HUP. 'sleep 1' after its launch was the
>>>>>> easiest fix.
>>>>>
>>>>> How about: https://github.com/beekhof/pacemaker/commit/95175f5
>>>>
>>>> That doesn't hurt, but it may not be enough, as pacemakerd does not
>>>> daemonize itself but is put into the background by shell means. Thus, by
>>>> the time the signal handler is added, pacemakerd has already been running
>>>> in the background for some time. If the terminal (ssh session) disconnects
>>>> before the signal handler is installed, the process exits anyway.
>>>
>>> Just moving it earlier would seem the simplest option
>>
>> Yes, but it still remains racy. Strictly speaking, the handler should be
>> installed in the child before the parent process exits, and you cannot
>> control this when the shell does the fork.
> 
> I guess this is one of the few times I've been grateful for systemd

The main question for me is "Why does it work at all on EL6?" ;)
In particular, why doesn't it stop when the init sequence finishes...
Probably because of ' > /dev/null 2>&1 ', but stdin is still not closed
there (e.g. with '  0<&- '). See
http://stackoverflow.com/questions/3430330/best-way-to-make-a-shell-script-daemon,
answers 2 and 3 are relevant for daemonization in shell.
Or it may be a side effect of the init (upstart) boot-sequence implementation.

Actually, daemonization code in C is not so hard to write properly.
corosync_tty_detach() is a pretty good example.
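
To illustrate what I mean by the double-fork approach, here is a minimal
sketch. It is not the corosync code, and the name daemonize_sketch() is made
up for this mail; a real implementation would likely also deal with pid files
and logging.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from Pacemaker or corosync).
 * Returns 0 in the detached daemon process and -1 on error.
 * The calling process and the intermediate child exit and never return.
 */
int daemonize_sketch(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(EXIT_SUCCESS);    /* caller exits, the shell/ssh can carry on */

    /* First child: start a new session, dropping the controlling terminal. */
    if (setsid() < 0)
        return -1;

    /* Second fork: the daemon is not a session leader, so opening a tty
     * later cannot give it a controlling terminal again. */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(EXIT_SUCCESS);

    /* Common cleanups: sane umask, neutral cwd, stdio on /dev/null. */
    umask(022);
    if (chdir("/") < 0)
        return -1;

    int fd = open("/dev/null", O_RDWR);
    if (fd < 0)
        return -1;
    if (dup2(fd, STDIN_FILENO) < 0 ||
        dup2(fd, STDOUT_FILENO) < 0 ||
        dup2(fd, STDERR_FILENO) < 0)
        return -1;
    if (fd > STDERR_FILENO)
        close(fd);

    return 0;
}
```

Since the daemon ends up in its own session without a controlling terminal,
a closing ssh session should no longer deliver SIGHUP to it.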

> 
>>
>>>
>>>> I'd suggest adding a '-d' option and daemonizing (double-fork or fork+setsid
>>>> plus the common daemonization cleanups) if it is set, after the signal
>>>> handlers are installed but before the main loop is run.
>>>> Also, SIGTTIN and SIGTTOU could be added to the ignore list for daemon mode.
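
The ordering suggested in the quoted text above could look roughly like the
sketch below. Signal dispositions are inherited across fork(), so handlers
installed before daemonizing are already in place in the child from its first
instruction. The names setup_daemon_signals(), on_sighup() and got_sighup are
made up for illustration and are not Pacemaker's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, meant to be polled by the main loop (hypothetical). */
static volatile sig_atomic_t got_sighup = 0;

static void on_sighup(int signo)
{
    (void)signo;
    got_sighup = 1;     /* only record the event, act on it later */
}

/*
 * Hypothetical helper: call this before forking into the background.
 * Returns 0 on success and -1 on error.
 */
int setup_daemon_signals(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);

    /* Catch SIGHUP instead of dying from the default action. */
    sa.sa_handler = on_sighup;
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGHUP, &sa, NULL) < 0)
        return -1;

    /* Ignore terminal job-control signals in daemon mode. */
    sa.sa_handler = SIG_IGN;
    sa.sa_flags = 0;
    if (sigaction(SIGTTIN, &sa, NULL) < 0 ||
        sigaction(SIGTTOU, &sa, NULL) < 0)
        return -1;

    return 0;
}
```

The intended call order would then be: parse '-d', call
setup_daemon_signals(), daemonize, and only then enter the main loop.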