[Pacemaker] chkconfig values in MCP init script (again)

Vladislav Bogdanov bubble at hoster-ok.com
Tue Sep 21 12:24:14 UTC 2010


Hi Andrew, hi all.

I decided to revisit this topic because of problems with
libvirt/KVM virtual domains controlled by pacemaker.

The libvirt package on Fedora 13 ships two init scripts, libvirtd and
libvirt-guests, with the following chkconfig (start/stop) priorities:
libvirtd: 97 03
libvirt-guests: 98 02

The pacemaker MCP init script currently uses 90 10.
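For anyone less familiar with the notation: both numbers come from the "# chkconfig: <runlevels> <start> <stop>" header comment in each init script. A lower start number means the service is started earlier, and a lower stop number means it is stopped earlier. A minimal sketch for reading them out of a script (chkconfig_prio is just an illustrative name, not an existing tool):

```shell
#!/bin/sh
# Print "<start> <stop>" from an init script's chkconfig header.
# Header format: "# chkconfig: <runlevels> <start> <stop>"
# The last two fields are used, so the runlevel field may be "-" or "345".
chkconfig_prio() {
    # Only the first header line counts; stop reading after it.
    awk '/^#[ \t]*chkconfig:/ { print $(NF-1), $NF; exit }' "$1"
}
```

Given the values above, chkconfig_prio /etc/init.d/libvirtd should print "97 03".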

If one wants to control libvirtd and virtual domains as HA resources
from within pacemaker, the first solution that comes to mind is to
disable both libvirt init scripts (set them to 'off').

That is:
chkconfig libvirtd off
chkconfig libvirt-guests off

Then add a cloned lsb:libvirtd resource to pacemaker, followed by the
VirtualDomain resources. I haven't actually tried moving libvirtd
control to pacemaker yet; I'm just looking for possible pitfalls.

Unfortunately, this will probably not work as seamlessly as expected.
libvirtd will be skipped during the init start sequence and started by
pacemaker instead, which is fine, but I expect problems during the stop
sequence:
1) (02) init stops libvirt-guests (saves the guests' state and powers
them off)
2) (03) init stops libvirtd
3) (10) init sends the stop signal to the pacemaker MCP
4) pacemaker performs unnecessary actions trying to recover the
resources (I suppose)
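The ordering problem above can be checked mechanically: init runs the K scripts in ascending numeric order, so every service with a stop priority below pacemaker's is stopped while pacemaker is still running. A rough sketch, assuming each init script carries a "# chkconfig:" header; stopped_before is a hypothetical helper, not part of chkconfig:

```shell
#!/bin/sh
# List init scripts that init stops *before* the given stop priority
# (lower K number = stopped earlier during shutdown).
# Usage: stopped_before <initdir> <stop-priority>
stopped_before() {
    for script in "$1"/*; do
        [ -f "$script" ] || continue
        # The stop priority is the last field of the chkconfig header.
        stop=$(awk '/^#[ \t]*chkconfig:/ { print $NF; exit }' "$script")
        case $stop in
            ''|*[!0-9]*) continue ;;  # no header or non-numeric: skip
        esac
        # test's -lt compares in decimal, so "03" is treated as 3.
        if [ "$stop" -lt "$2" ]; then
            printf '%s %s\n' "$stop" "${script##*/}"
        fi
    done | sort -n
}
```

With the current value, stopped_before /etc/init.d 10 would list libvirt-guests (02) and libvirtd (03). With a stop priority of 01 nothing numerically lower remains, although other scripts that also use 01 are still ordered among themselves by name.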

What I see with libvirtd started from init is this: the virtual domains
are hibernated, and pacemaker starts them again right afterwards (it
doesn't know yet that the system is shutting down). Then libvirtd is
stopped and pacemaker loses control of the VirtualDomain resources,
which move to the 'Started (unmanaged)' state. Then pacemaker hangs (for
a long time, at least) trying to stop all resources. I suppose this is
where stonith should do the trick (it is still disabled here). I
understand that my setup could be considered "broken" in its current
state, but the problem is a bit wider: init should not stop any LSB
resources while pacemaker is running, because init could (and will)
incorrectly treat those resources as its own to control.

The next thing one can do is remove such LSB resources from init's
"service zone" entirely with "chkconfig --del <service>". That works,
but if an RPM package has a "broken" (actually not) 'post' script that
unconditionally adds the service back to init's service zone, then after
an upgrade of that package the system returns to the same state as
before.
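To notice when a package upgrade has quietly re-added a service, one can look for its S/K links in the runlevel directories ("chkconfig --list <service>" is the usual way to check; this just looks at the underlying links). A sketch assuming the Fedora layout /etc/rc.d/rc[0-6].d; rc_links is a hypothetical helper name:

```shell
#!/bin/sh
# Print any S/K runlevel links that exist for a service.
# Usage: rc_links <service> [rc-root]   (rc-root defaults to /etc/rc.d)
rc_links() {
    root=${2:-/etc/rc.d}
    for link in "$root"/rc[0-6].d/[SK][0-9][0-9]"$1"; do
        # An unmatched glob stays literal; keep real files and dangling symlinks only.
        [ -e "$link" ] || [ -L "$link" ] || continue
        printf '%s\n' "$link"
    done
}
```

Empty output after "chkconfig --del libvirtd" means the service is out of init's hands; any output after a package upgrade means a 'post' script has put it back.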

So the next solution would be to make pacemaker start really last (99)
and stop really first (01). This is what Vadim Chepkov suggested earlier
and what I am inclined to do (at least for my RPM packages). Of course,
there are services that also use 99 01, but I'd turn a blind eye to them.
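Until the packaged script changes, the 99 01 ordering can be tried locally by editing the header and letting chkconfig renumber the links. A sketch, assuming the MCP script lives at /etc/init.d/pacemaker and has a standard "# chkconfig:" header; set_chkconfig_prio is a made-up helper, and "sed -i" assumes GNU sed:

```shell
#!/bin/sh
# Rewrite the start/stop priorities in an init script's chkconfig header,
# keeping its runlevel field as-is.
# Usage: set_chkconfig_prio <script> <start> <stop>
set_chkconfig_prio() {
    # Group 1 captures "# chkconfig: <runlevels>"; the rest of the line is replaced.
    re='^\(#[[:blank:]]*chkconfig:[[:blank:]]\{1,\}[^[:blank:]]\{1,\}\)[[:blank:]].*'
    sed -i "s/$re/\1 $2 $3/" "$1"
}
```

After set_chkconfig_prio /etc/init.d/pacemaker 99 01, running "chkconfig pacemaker resetpriorities" should renumber the existing links from the new header without changing the on/off state ("chkconfig --del" followed by "chkconfig --add" also works, but resets the service to the header defaults). A package upgrade would overwrite a locally edited script, which is why changing the values in the package itself is the more durable fix.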
