[Pacemaker] node status does not change even if pacemakerd dies

Kazunori INOUE inouekazu at intellilink.co.jp
Thu Apr 11 05:24:11 EDT 2013


Hi Andrew,

(13.03.01 11:10), Andrew Beekhof wrote:
> On Wed, Feb 13, 2013 at 8:14 PM, Kazunori INOUE
> <inouekazu at intellilink.co.jp> wrote:
>> Hi Andrew,
>>
>> Yes, please see attached pacemaker.conf. It controls only pacemakerd.
>
> I've pushed up the basic one in
> https://github.com/beekhof/pacemaker/commit/4bd8ac3
>
> Once you're happy with the pacemaker-corosync.conf version, let me
> know and we can update it.
>

I have attached two Upstart job files for Pacemaker.

- pacemaker.conf.in
   The basic job. I have reviewed its settings.
   Please use it to replace mcp/pacemaker.upstart.

- pacemaker-corosync.conf.in
   Since Upstart jobs have been added to Corosync (*), this job uses them.

   * https://github.com/corosync/corosync/commit/ca389c3c598105223f30e2e760f92aa105e1c9b3
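Both attachments are `.in` templates that use autoconf-style `@sysconfdir@` and `@localstatedir@` placeholders, which the build is expected to fill in. To try a job by hand before it is wired into the build, the substitution can be done with `sed`. This is a minimal sketch; the `render_job` helper and the `/etc` and `/var` defaults are assumptions, not something the Pacemaker build defines.

```shell
#!/bin/sh
# Hypothetical helper: expand the autoconf-style placeholders in one of the
# job templates. /etc and /var are assumed defaults; a real build gets
# these values from configure.
render_job() {
    template=$1
    sysconfdir=${2:-/etc}
    localstatedir=${3:-/var}
    # '|' is used as the sed delimiter because the values contain '/'.
    sed -e "s|@sysconfdir@|$sysconfdir|g" \
        -e "s|@localstatedir@|$localstatedir|g" \
        "$template"
}

# Example (the install location of the job is an assumption):
# render_job pacemaker.conf.in > /etc/init/pacemaker.conf
```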

----
Best regards,
Kazunori INOUE


>>
>> Furthermore, I'm evaluating pacemaker-corosync.conf (a prototype), which
>> also controls corosync.
>> This job starts the corosync service before starting pacemakerd, and stops
>> the corosync service after pacemakerd has stopped. [1]
>>
>> - pacemaker-corosync.conf
>>    17
>>    18  pre-start script
>>    19      modprobe softdog soft_margin=60
>>    20      service corosync start               [1]
>>    21  end script
>>    22
>>    23  post-start script
>>    24      touch $LOCK_FILE
>>    25      pidof $prog > /var/run/$prog.pid
>>    26  end script
>>    27
>>    28  post-stop script
>>    29      rm -f $LOCK_FILE
>>    30      rm -f /var/run/$prog.pid
>>    31
>>    32      pidof crmd && killall -q -9 corosync
>>    33      pidof crmd || service corosync stop  [1]
>>    34  end script
>>
>> Line 32 is a somewhat tricky design.
>> If only pacemakerd has disappeared (crmd is still running), corosync is
>> killed immediately, so the machine is rebooted by corosync's watchdog.
>> We do this because we want the machine to be powered off/reset *reliably*
>> in this case.
>>
>> Best Regards,
>> Kazunori INOUE
>>
>>
>> (13.02.08 10:03), Andrew Beekhof wrote:
>>> On Tue, Jan 22, 2013 at 9:09 PM, Kazunori INOUE
>>> <inouekazu at intellilink.co.jp> wrote:
>>>>
>>>> Hi Andrew,
>>>>
>>>> I understand that pacemakerd is not killed by the OOM killer.
>>>> However, because the process may still fail under unexpected
>>>> circumstances, we let Upstart manage pacemakerd.
>>>
>>> This is an excellent idea.
>>> Do you have an upstart job for pacemaker that we can include in the source?
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
-------------- next part --------------
# pacemaker - High-Availability cluster resource manager
#
# Starts pacemakerd

stop on runlevel [0123456]
kill timeout 3600
respawn

env prog=pacemakerd
env rpm_sysconf=@sysconfdir@/sysconfig/pacemaker
env rpm_lockfile=@localstatedir@/lock/subsys/pacemaker
env deb_sysconf=@sysconfdir@/default/pacemaker
env deb_lockfile=@localstatedir@/lock/pacemaker

script
    [ -f "$rpm_sysconf" ] && . $rpm_sysconf
    [ -f "$deb_sysconf" ] && . $deb_sysconf
    exec $prog
end script

post-start script
    [ -f "$rpm_sysconf" ] && . $rpm_sysconf
    [ -f "$deb_sysconf" ] && . $deb_sysconf
    [ -z "$LOCK_FILE" -a -d @sysconfdir@/sysconfig ] && LOCK_FILE="$rpm_lockfile"
    [ -z "$LOCK_FILE" -a -d @sysconfdir@/default ] && LOCK_FILE="$deb_lockfile"
    touch $LOCK_FILE
    pidof $prog > @localstatedir@/run/$prog.pid
end script

post-stop script
    [ -f "$rpm_sysconf" ] && . $rpm_sysconf
    [ -f "$deb_sysconf" ] && . $deb_sysconf
    [ -z "$LOCK_FILE" -a -d @sysconfdir@/sysconfig ] && LOCK_FILE="$rpm_lockfile"
    [ -z "$LOCK_FILE" -a -d @sysconfdir@/default ] && LOCK_FILE="$deb_lockfile"
    rm -f $LOCK_FILE
    rm -f @localstatedir@/run/$prog.pid
end script
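The post-start and post-stop stanzas above choose the lock file with the same rule: a `LOCK_FILE` set in the sourced sysconfig/default file wins; otherwise the RPM path is used when `@sysconfdir@/sysconfig` exists, and the Debian path when `@sysconfdir@/default` exists. The sketch below isolates that rule as a function so it can be checked outside Upstart. The `pick_lock_file` name and its two directory arguments (standing in for the expanded `@sysconfdir@` and `@localstatedir@`) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper mirroring the LOCK_FILE selection in the job above.
# $1 = expanded @sysconfdir@, $2 = expanded @localstatedir@.
# A LOCK_FILE already set (e.g. by the sourced sysconfig file) is kept.
pick_lock_file() {
    lock=${LOCK_FILE:-}
    # RPM-style layout: <sysconfdir>/sysconfig exists.
    if [ -z "$lock" ] && [ -d "$1/sysconfig" ]; then
        lock="$2/lock/subsys/pacemaker"
    fi
    # Debian-style layout: <sysconfdir>/default exists.
    if [ -z "$lock" ] && [ -d "$1/default" ]; then
        lock="$2/lock/pacemaker"
    fi
    # Empty output means neither layout was detected.
    printf '%s\n' "$lock"
}
```

Note that an empty result would make the job's unguarded `touch $LOCK_FILE` fail, so this sketch can also be used to check a host's layout before installing the job.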

-------------- next part --------------
# pacemaker-corosync - High-Availability cluster
#
# Starts Corosync cluster engine and Pacemaker cluster manager.

kill timeout 3600

env prog=pacemakerd
env rpm_sysconf=@sysconfdir@/sysconfig/pacemaker
env rpm_lockfile=@localstatedir@/lock/subsys/pacemaker
env deb_sysconf=@sysconfdir@/default/pacemaker
env deb_lockfile=@localstatedir@/lock/pacemaker

script
    [ -f "$rpm_sysconf" ] && . $rpm_sysconf
    [ -f "$deb_sysconf" ] && . $deb_sysconf
    exec $prog
end script

pre-start script
    # Set up the software watchdog that corosync relies on (see post-stop).
    # Adjust this to your environment.
    modprobe softdog soft_margin=60
    start corosync

    # if you use corosync-notifyd, uncomment the line below.
    #start corosync-notifyd

    # give it time to fail.
    sleep 2
    pidof corosync || exit 1
end script

post-start script
    [ -f "$rpm_sysconf" ] && . $rpm_sysconf
    [ -f "$deb_sysconf" ] && . $deb_sysconf
    [ -z "$LOCK_FILE" -a -d @sysconfdir@/sysconfig ] && LOCK_FILE="$rpm_lockfile"
    [ -z "$LOCK_FILE" -a -d @sysconfdir@/default ] && LOCK_FILE="$deb_lockfile"
    touch $LOCK_FILE
    pidof $prog > @localstatedir@/run/$prog.pid
end script

post-stop script
    [ -f "$rpm_sysconf" ] && . $rpm_sysconf
    [ -f "$deb_sysconf" ] && . $deb_sysconf
    [ -z "$LOCK_FILE" -a -d @sysconfdir@/sysconfig ] && LOCK_FILE="$rpm_lockfile"
    [ -z "$LOCK_FILE" -a -d @sysconfdir@/default ] && LOCK_FILE="$deb_lockfile"
    rm -f $LOCK_FILE
    rm -f @localstatedir@/run/$prog.pid

    # If pacemakerd disappeared unexpectedly (crmd is still running),
    # kill corosync so that its watchdog reboots the machine.
    pidof crmd && killall -q -9 corosync
    stop corosync || true

    # if you use corosync-notifyd, uncomment the line below.
    #stop corosync-notifyd || true
end script
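The post-stop stanza above encodes the design discussed earlier in the thread: if `crmd` is still running when the job stops, pacemakerd must have disappeared on its own, so corosync is killed with SIGKILL and the softdog watchdog loaded in pre-start is relied on to reset the node; in either case `stop corosync` is attempted afterwards. The sketch below separates that decision from the actions so it can be dry-run. The `corosync_stop_actions` name and its yes/no argument are hypothetical, and the function only prints the commands the job would run.

```shell
#!/bin/sh
# Hypothetical dry run of the post-stop decision in pacemaker-corosync.conf:
# prints the commands the job would run instead of running them.
# $1 = "yes" if crmd is still running, anything else otherwise.
corosync_stop_actions() {
    if [ "$1" = "yes" ]; then
        # pacemakerd vanished while crmd survives: SIGKILL corosync so
        # that its watchdog resets the node.
        echo "killall -q -9 corosync"
    fi
    # Always attempted afterwards; the job ignores failures with '|| true'.
    echo "stop corosync"
}

# On a live node the argument would come from pidof, as in the job:
# if pidof crmd >/dev/null; then crmd_up=yes; else crmd_up=no; fi
# corosync_stop_actions "$crmd_up"
```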


