[Pacemaker] Managing Virtual Machine's resource
Lars Marowsky-Bree
lmb at suse.de
Fri May 16 18:15:08 UTC 2008
On 2008-05-16T12:15:03, Lon Hohberger <lhh at redhat.com> wrote:
> rgmanager:
> * parent/child relationships for implicit start-after/stop-before
> * attribute inheritance (we have talked about this in the past;
> it isn't hard, and may be beneficial)
> * specification of child resource type ordering to prevent major
> "gotchas" when defining resource groups (e.g. putting a
> script on a file system but putting them in the wrong order,
> causing errors)
> * 'primary' attribute specification (not OCF compliant) is used to
> identify resource instances
That's all just meta-data, right?
> * use of LSB 'status' to implement OCF 'monitor' function (status isn't
> specified in the RA API, but the monitor function as specified appears
> to map to the LSB status function... so most of our agents do
> monitor->status, though depth is still supported - maybe yours are the
> same; haven't fully investigated)
monitor is _not_ a 1:1 match for the LSB status action. That's exactly
why we're not using status. ;-)
http://www.linux-foundation.org/spec/refspecs/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
In particular, 3 vs 7 is a crucial difference (LSB status exits with 3
for "program is not running", whereas OCF monitor returns 7,
OCF_NOT_RUNNING, for that case), and we didn't want to have to
special-case the exit codes depending on the action being called.
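Spelled out as code, that's the translation an agent wrapping an LSB
init script has to do before its result can be used as a monitor
result (a minimal sketch; the function name is made up):

```shell
#!/bin/sh
# Sketch: mapping LSB "status" exit codes to OCF "monitor" codes.
# LSB status: 0 = running, 3 = program is not running
# OCF monitor: 0 = OCF_SUCCESS, 7 = OCF_NOT_RUNNING
# Passing a raw LSB 3 through would collide with OCF's meaning of 3
# (OCF_ERR_UNIMPLEMENTED), which is the special-casing mentioned above.

lsb_status_to_ocf_monitor() {
    # $1 is the exit code of the wrapped LSB status action
    case "$1" in
        0) return 0 ;;   # running     -> OCF_SUCCESS
        3) return 7 ;;   # stopped     -> OCF_NOT_RUNNING
        *) return 1 ;;   # anything else -> OCF_ERR_GENERIC
    esac
}
```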
> * multiple references to the same resource instance - reference counts
> are used to prevent starting the same resource on the same node multiple
> times
We use explicit dependencies and thus can reference the same
primitive/clone/group in as many places as needed.
> * rgmanager allows reconfiguration of resource parameters without
> restarting the resource; maybe pacemaker does too; haven't checked; uses
> <parameter name="xxx" reconfig="1" .../> in the meta-data to enable it.
Our instance_attributes support a "reload" setting.
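For reference, the rgmanager convention quoted above looks like this
in the agent meta-data (the surrounding content element is a sketch,
not taken from a real agent):

```xml
<!-- rgmanager meta-data: this parameter may be changed on a running
     resource without a stop/start cycle -->
<parameter name="xxx" reconfig="1">
    <content type="string"/>
</parameter>
```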
> pacemaker:
> * promote / demote resource operations
> * UUIDs used to identify resource instances (I like this better than
> what we do with type:primary_attr in rgmanager)
Yeah, well, the UUIDs are not the grandest idea we ever had - nowadays
at least the GUI tries to generate a shorter unique id w/o the full
cumbersomeness of UUIDs.
> * clone resources and operations used to start (more or less) the same
> resource on multiple nodes
> General:
> * resource migrate is likely done differently; not sure though (maybe
> you can tell me?):
> <resource-agent> migrate <target_host_name>
Our model is both push and pull compatible. On the source, we execute a
"migrate_to" command (the target_host is passed via the environment),
and on the target, a "migrate_from". (That makes sense if you consider
this as _commands_ given to the nodes, otherwise it seems kind of the
wrong way around ;-)
The migrate_from is also our way of checking whether the migration
succeeded; I guess in your case you then run a monitor/status on the
target?
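The push/pull model above can be sketched as an action dispatcher in
the agent. This is illustrative only: the helper bodies are
placeholders (a real Xen agent would call e.g. "xm migrate"), and the
environment variable name is an assumption about what the CRM sets:

```shell
#!/bin/sh
# Sketch of the push/pull migration model: migrate_to runs on the
# source node, migrate_from on the target node.

do_migrate()     { echo "pushing domain to $1"; }    # placeholder
verify_running() { echo "domain is running here"; }  # placeholder

ra_action() {
    case "$1" in
    migrate_to)
        # Runs on the SOURCE node: push the VM toward the target,
        # whose name arrives via the environment (variable name assumed).
        do_migrate "${OCF_RESKEY_CRM_meta_migrate_target:-otherhost}"
        ;;
    migrate_from)
        # Runs on the TARGET node: confirm the VM arrived; the exit
        # code doubles as the post-migration success check.
        verify_running || return 7   # 7 = OCF_NOT_RUNNING
        ;;
    *)
        return 3                     # OCF_ERR_UNIMPLEMENTED
        ;;
    esac
}
```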
> There will be more that I will come across, no doubt. Those are just
> the ones on the surface. I do not believe any of them are hard to deal
> with.
Right. I was in particular interested in understanding those differences
which affect the RA API, as that could possibly affect the usability of
RAs written for RHCS vs those written for ours. I think it's probably a
good idea to find some time to sit down and chat how to resolve these.
I've got a presentation from last year's BrainShare on what our scripts
do; that should be a usable starting point. Not much has changed since.
A further matter might be the shell scripts calling out to various
scripts which assume things in the environment - i.e., we supply
ocf-shellfuncs (a shell source file) which defines ocf_log() and a few
others.
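Concretely, an agent written against our environment starts roughly
like this (the install path varies by distro and version, so it is an
assumption here; the fallback definition exists only so the sketch runs
outside a cluster node):

```shell
#!/bin/sh
# Sketch: how an RA picks up the shared ocf-shellfuncs helpers.

OCF_FUNCS="${OCF_ROOT:-/usr/lib/ocf}/resource.d/heartbeat/.ocf-shellfuncs"
if [ -r "$OCF_FUNCS" ]; then
    . "$OCF_FUNCS"           # provides ocf_log() and friends
else
    # Minimal stand-in for illustration outside a cluster node
    ocf_log() { level=$1; shift; echo "$level: $*"; }
fi

# Typical usage inside an agent action:
ocf_log info "starting resource"
```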
> I think we both diverged in a compatible way here:
> * <parameter ... required="1" .../> means this parameter must be
> specified for a given resource instance.
A compatible divergence can't possibly be a divergence ;-)
> I believe the idea was to use virtual machines resources, with those
> virtual machines in a cluster of their own.
Ah, OK.
> To clarify the requirements as stated: they were in the context of an
> existing implementation.
>
> Generally, with clustered virtual machines that can run on more than one
> physical node, at a bare minimum, you need to know only a few things on
> the physical hosts in order to implement fencing:
>
> * where a particular vm is and its current state, or
> * where that vm "was", and
> * the state of the host running the vm, and
> * if "bad" or "Dead", whether fencing has completed
>
> Certainly, pacemaker knows all of the above!
Right, of course. The external/xen STONITH script which we already have
could likely use crm_resource to find out and/or control the state of
the resource representing the DomU in the Dom0 cluster.
Now I see what you're saying.
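As a sketch of that idea (the resource name is made up, and option
spellings vary between crm_resource versions, so treat these
invocations as illustrative):

```shell
# Where in the Dom0 cluster is the DomU's resource currently running?
crm_resource --resource vmguest1 --locate

# Ask the CRM to stop it, so fencing "kills" the VM through the
# cluster rather than behind its back.
crm_resource --resource vmguest1 --meta \
             --set-parameter target-role --parameter-value Stopped
```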
> I doubt it would be difficult to make the existing fence agent/host
> preferentially use pacemaker to locate & kill VMs when possible (as
> opposed to simply talking to libvirt + AIS Checkpoint APIs as it does
> now).
I think at least some interaction here would be needed, because
otherwise, pacemaker/LRM would eventually run the monitor action, find
out that it's gone and restart it, which might not be what is desired
;-)
Regards,
Lars
--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde