[Pacemaker] node status does not change even if pacemakerd dies
Andrew Beekhof
andrew at beekhof.net
Tue Jan 8 00:16:10 UTC 2013
On Wed, Dec 19, 2012 at 8:15 PM, Kazunori INOUE
<inouekazu at intellilink.co.jp> wrote:
> (12.12.13 08:26), Andrew Beekhof wrote:
>>
>> On Wed, Dec 12, 2012 at 8:02 PM, Kazunori INOUE
>> <inouekazu at intellilink.co.jp> wrote:
>>>
>>>
>>> Hi,
>>>
>>> I recognize that pacemakerd is much less likely to crash.
>>> However, the possibility of it being killed by the OOM killer etc. is
>>> not 0%.
>>
>>
>> True. Although we just established in another thread that we don't
>> have any leaks :)
>>
>>> So I think that a user may get confused, since the behavior at the
>>> time of process death differs even while pacemakerd is running.
>>>
>>> case A)
>>> When pacemakerd and the other processes (crmd etc.) are in a
>>> parent-child relationship.
>>>
>>
>> [snip]
>>
>>>
>>> For example, if crmd dies it is relaunched, so the state of the
>>> cluster is not affected.
>>
>>
>> Right.
>>
>> [snip]
>>
>>>
>>> case B)
>>> When pacemakerd and the other processes are NOT in a parent-child
>>> relationship. Here pacemakerd was killed and then respawned by
>>> Upstart.
>>>
>>> $ service corosync start ; service pacemaker start
>>> $ pkill -9 pacemakerd
>>> $ ps -ef|egrep 'corosync|pacemaker|UID'
>>> UID        PID  PPID  C STIME TTY          TIME CMD
>>> root     21091     1  1 14:52 ?        00:00:00 corosync
>>> 496      21099     1  0 14:52 ?        00:00:00 /usr/libexec/pacemaker/cib
>>> root     21100     1  0 14:52 ?        00:00:00 /usr/libexec/pacemaker/stonithd
>>> root     21101     1  0 14:52 ?        00:00:00 /usr/libexec/pacemaker/lrmd
>>> 496      21102     1  0 14:52 ?        00:00:00 /usr/libexec/pacemaker/attrd
>>> 496      21103     1  0 14:52 ?        00:00:00 /usr/libexec/pacemaker/pengine
>>> 496      21104     1  0 14:52 ?        00:00:00 /usr/libexec/pacemaker/crmd
>>> root     21128     1  1 14:53 ?        00:00:00 /usr/sbin/pacemakerd
>>
>>
>> Yep, looks right.
>>
>
> Hi Andrew,
>
> We discussed this behavior, and concluded that the behavior when
> pacemakerd and the other processes are not in a parent-child
> relationship (case B) has room for improvement.
>
> Since not all users are experts, they may kill pacemakerd accidentally.
> Such a user will be confused if the behavior after crmd death depends
> on the following conditions:
> case A: pacemakerd and the others (crmd etc.) are in a parent-child relationship.
> case B: pacemakerd and the others are not in a parent-child relationship.
>
> So, we want to *always* obtain the same behavior as the case where
> there is a parent-child relationship.
> That is, when crmd etc. die, we want pacemaker to always relaunch
> the process immediately.
No. Sorry.
Writing features to satisfy an artificial test case is not a good practice.
We can speed up the failure detection for case B (I'll agree that 60s
is way too long; 5s or 2s might be better, depending on the load it
creates), but causing downtime now to _maybe_ avoid downtime in the
future makes no sense.
Especially when you consider that the node will likely be fenced if
the crmd fails anyway.
Take a look at the logs from some ComponentFail test runs and you'll
see that the parent-child relationship regularly _fails_ to prevent
downtime.
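For what it's worth, the kind of liveness check being discussed can be
sketched coarsely in shell (illustrative only; the real detection would
happen in the cluster membership layer, and the function name and the
2s interval are examples, not anything pacemaker provides):

```shell
# Illustrative sketch only: poll for a named daemon at a short interval.
alive() {
    # pgrep -o -x: oldest process whose name matches $1 exactly;
    # kill -0 tests for existence/permission without sending a signal.
    pid=$(pgrep -o -x "$1") || return 1
    kill -0 "$pid" 2>/dev/null
}

if alive pacemakerd; then
    echo "pacemakerd is running"
else
    echo "pacemakerd is gone - a 2s poll would notice within 2s"
fi
```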
>
> Regards,
> Kazunori INOUE
>
>
>>> In this case, the node will be set to UNCLEAN if crmd dies.
>>> That is, the node will be fenced if there is a stonith resource.
>>
>>
>> Which is exactly what happens if only pacemakerd is killed with your
>> proposal.
>> Except now you have time to do a graceful pacemaker restart to
>> re-establish the parent-child relationship.
>>
>> If you want to compare B with something, it needs to be with the old
>> "children terminate if pacemakerd dies" strategy.
>> Which is:
>>
>>> $ service corosync start ; service pacemaker start
>>> $ pkill -9 pacemakerd
>>> ... the node will be set to UNCLEAN
>>
>>
>> Old way: always downtime, because children terminate, which triggers fencing.
>> Our way: no downtime unless there is an additional failure (of the cib
>> or crmd).
>>
>> Given that we're trying for HA, the second seems preferable.
>>
>>>
>>> $ pkill -9 crmd
>>> $ crm_mon -1
>>> Last updated: Wed Dec 12 14:53:48 2012
>>> Last change: Wed Dec 12 14:53:10 2012 via crmd on dev2
>>>
>>> Stack: corosync
>>> Current DC: dev2 (2472913088) - partition with quorum
>>> Version: 1.1.8-3035414
>>>
>>> 2 Nodes configured, unknown expected votes
>>> 0 Resources configured.
>>>
>>> Node dev1 (2506467520): UNCLEAN (online)
>>> Online: [ dev2 ]
>>>
>>>
>>> How about making behavior selectable with an option?
>>
>>
>> MORE_DOWNTIME_PLEASE=(true|false) ?
>>
>>>
>>> When pacemakerd dies,
>>> mode A) behave in the existing way (default).
>>> mode B) make the node UNCLEAN.
>>>
>>> Best Regards,
>>> Kazunori INOUE
>>>
>>>
>>>
>>>> Making stop work when there is no pacemakerd process is a different
>>>> matter. We can make that work.
>>>>
>>>>>
>>>>> Though the best solution is to relaunch pacemakerd, if that is
>>>>> difficult, I think a shortcut is to make the node unclean.
>>>>>
>>>>>
>>>>> And now, I tried Upstart a little bit.
>>>>>
>>>>> 1) Started corosync and pacemaker.
>>>>>
>>>>> $ cat /etc/init/pacemaker.conf
>>>>> respawn
>>>>> script
>>>>>     [ -f /etc/sysconfig/pacemaker ] && {
>>>>>         . /etc/sysconfig/pacemaker
>>>>>     }
>>>>>     exec /usr/sbin/pacemakerd
>>>>> end script
>>>>>
>>>>> $ service co start
>>>>> Starting Corosync Cluster Engine (corosync): [ OK ]
>>>>> $ initctl start pacemaker
>>>>> pacemaker start/running, process 4702
>>>>>
>>>>>
>>>>> $ ps -ef|egrep 'corosync|pacemaker'
>>>>> root      4695     1  0 17:21 ?        00:00:00 corosync
>>>>> root      4702     1  0 17:21 ?        00:00:00 /usr/sbin/pacemakerd
>>>>> 496       4703  4702  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/cib
>>>>> root      4704  4702  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/stonithd
>>>>> root      4705  4702  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/lrmd
>>>>> 496       4706  4702  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/attrd
>>>>> 496       4707  4702  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/pengine
>>>>> 496       4708  4702  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/crmd
>>>>>
>>>>> 2) Killed pacemakerd.
>>>>>
>>>>> $ pkill -9 pacemakerd
>>>>>
>>>>> $ ps -ef|egrep 'corosync|pacemaker'
>>>>> root      4695     1  0 17:21 ?        00:00:01 corosync
>>>>> 496       4703     1  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/cib
>>>>> root      4704     1  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/stonithd
>>>>> root      4705     1  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/lrmd
>>>>> 496       4706     1  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/attrd
>>>>> 496       4707     1  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/pengine
>>>>> 496       4708     1  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/crmd
>>>>> root      4760     1  1 17:24 ?        00:00:00 /usr/sbin/pacemakerd
>>>>>
>>>>> 3) Then I stopped pacemakerd; however, some processes did not stop.
>>>>>
>>>>> $ initctl stop pacemaker
>>>>> pacemaker stop/waiting
>>>>>
>>>>>
>>>>> $ ps -ef|egrep 'corosync|pacemaker'
>>>>> root      4695     1  0 17:21 ?        00:00:01 corosync
>>>>> 496       4703     1  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/cib
>>>>> root      4704     1  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/stonithd
>>>>> root      4705     1  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/lrmd
>>>>> 496       4706     1  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/attrd
>>>>> 496       4707     1  0 17:21 ?        00:00:00 /usr/libexec/pacemaker/pengine
>>>>>
>>>>> Best Regards,
>>>>> Kazunori INOUE
>>>>>
>>>>>
>>>>>>>> This isn't the case when the plugin is in use, though; but then
>>>>>>>> I'd also have expected most of the processes to die.
>>>>>>>>
>>>>>>> Since the node status would also change if that were the result,
>>>>>>> that is the behavior we would like.
>>>>>>>
>>>>>>>>>
>>>>>>>>> ----
>>>>>>>>> $ cat /etc/redhat-release
>>>>>>>>> Red Hat Enterprise Linux Server release 6.3 (Santiago)
>>>>>>>>>
>>>>>>>>> $ ./configure --sysconfdir=/etc --localstatedir=/var --without-cman --without-heartbeat
>>>>>>>>> -snip-
>>>>>>>>> pacemaker configuration:
>>>>>>>>>   Version  = 1.1.8 (Build: 9c13d14)
>>>>>>>>>   Features = generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc lha-fencing corosync-native snmp
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> $ cat config.log
>>>>>>>>> -snip-
>>>>>>>>> 6000 | #define BUILD_VERSION "9c13d14"
>>>>>>>>> 6001 | /* end confdefs.h. */
>>>>>>>>> 6002 | #include <gio/gio.h>
>>>>>>>>> 6003 |
>>>>>>>>> 6004 | int
>>>>>>>>> 6005 | main ()
>>>>>>>>> 6006 | {
>>>>>>>>> 6007 | if (sizeof (GDBusProxy))
>>>>>>>>> 6008 | return 0;
>>>>>>>>> 6009 | ;
>>>>>>>>> 6010 | return 0;
>>>>>>>>> 6011 | }
>>>>>>>>> 6012 configure:32411: result: no
>>>>>>>>> 6013 configure:32417: WARNING: Unable to support systemd/upstart. You need to use glib >= 2.26
>>>>>>>>> -snip-
>>>>>>>>> 6286 | #define BUILD_VERSION "9c13d14"
>>>>>>>>> 6287 | #define SUPPORT_UPSTART 0
>>>>>>>>> 6288 | #define SUPPORT_SYSTEMD 0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Kazunori INOUE
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> related bugzilla:
>>>>>>>>>>> http://bugs.clusterlabs.org/show_bug.cgi?id=5064
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Kazunori INOUE
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>>>>
>>>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>>>> Getting started:
>>>>>>>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>>>> Bugs: http://bugs.clusterlabs.org
>
>