[Pacemaker] limiting number of simultaneous actions for particular resource type

Dejan Muhamedagic dejanmm at fastmail.fm
Wed Nov 25 14:45:03 UTC 2009


Hi,

On Wed, Nov 25, 2009 at 02:37:23PM +0100, Nikola Ciprich wrote:
> Hello everybody,
> I'm trying to solve the following issue:
> I've got a specific resource type (virtual machines in particular)
> which takes quite long to start/stop, and those actions cause
> considerable load on the hosting system. On my cluster we're
> running tens of instances of VM resources, and shutting down
> pacemaker on a node makes it try to stop many of those
> resources in parallel, which heavily overloads the machine.
> Then operations start timing out and the whole cluster
> goes nuts. Is it possible to set some kind of constraint so
> that no more than e.g. 2 actions are executed in parallel
> for VM class resources? I can't group them using a group resource,
> because some of them can have target-role set to stopped if
> they're not needed...  Or how can I at least set some global
> limit on the number of simultaneous actions in general? If
> possible, I'd like to limit even the monitor actions so they
> run serially...

Somebody else (Dominik, I think) had a similar issue, but I can't
recall the outcome now. At any rate, it's possible to set a
global limit on parallel actions per node in lrmd. Support for
this is included in /etc/init.d/openais, but probably not in
/etc/init.d/heartbeat. This is how it's set:

# lrmadmin -p max-children $LRMD_MAX_CHILDREN

The default is 4. A child of the lrmd is actually an RA (resource
agent) process running some action (monitor, start, etc.).

It's a bit more complicated in the init script since we have to
make sure that lrmd is ready to serve requests. This is the
relevant part:

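# Wait up to $maxwait seconds for lrmd's command socket to appear.
wait_for_lrmd() {
        local maxwait=30
        local i=0
        while [ $i -lt $maxwait ]; do
                # -S is true once the path exists and is a socket
                test -S /var/run/heartbeat/lrm_cmd_sock >/dev/null 2>&1 &&
                        break
                sleep 1
                i=$(($i+1))
        done
        # i reaches maxwait only if the socket never showed up
        if [ $i -lt $maxwait ]; then
                return 0
        else
                echo "lrmd apparently didn't start"
                return 1
        fi
}

# Apply LRMD_MAX_CHILDREN (if set) once lrmd is ready to serve requests.
set_lrmd_options() {
        if [ -n "$LRMD_MAX_CHILDREN" ]; then
                wait_for_lrmd || return
                $LRMADMIN -p max-children "$LRMD_MAX_CHILDREN"
        fi
}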
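
If you want to do the same from a standalone script rather than the
init script, the idea could be sketched roughly as below. This is only
a sketch under assumptions: the helper names (is_positive_int,
wait_for_socket, apply_max_children) and the LRM_SOCK and LRMD_WAIT
variables are made up for illustration and are not part of any shipped
script; only the socket path and the lrmadmin invocation are taken from
the excerpt above. It also adds a sanity check on the value before
handing it to lrmadmin.

```shell
#!/bin/sh
# Hypothetical standalone variant; names below are illustrative only.
LRMADMIN=${LRMADMIN:-lrmadmin}
LRM_SOCK=${LRM_SOCK:-/var/run/heartbeat/lrm_cmd_sock}

# Return 0 only if $1 is a non-empty string of digits greater than 0.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -gt 0 ]
}

# Poll once per second for a UNIX socket at path $1, for up to $2 seconds.
# (Uses plain global variables to stay within POSIX sh.)
wait_for_socket() {
    wfs_sock=$1
    wfs_max=$2
    wfs_i=0
    while [ "$wfs_i" -lt "$wfs_max" ]; do
        [ -S "$wfs_sock" ] && return 0
        sleep 1
        wfs_i=$((wfs_i + 1))
    done
    return 1
}

# Validate the requested limit, wait for lrmd, then set max-children.
# Returns 2 on a bad value, 1 if the socket never appeared.
apply_max_children() {
    if ! is_positive_int "$1"; then
        echo "invalid max-children value: $1" >&2
        return 2
    fi
    if ! wait_for_socket "$LRM_SOCK" "${LRMD_WAIT:-30}"; then
        echo "lrmd apparently didn't start" >&2
        return 1
    fi
    "$LRMADMIN" -p max-children "$1"
}
```

Sourced from a wrapper, "apply_max_children 2" would then ask lrmd for
the limit of 2 parallel actions you mentioned, provided lrmd is up.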

I'll have that bit added to heartbeat for the next release.

Thanks,

Dejan

> Thanks a lot in advance!
> with best regards
> nik
> 
> -- 
> -------------------------------------
> Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28. rijna 168, 709 01 Ostrava
> 
> tel.:   +420 596 603 142
> fax:    +420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: servis at linuxbox.cz
> -------------------------------------