[Pacemaker] Managing Virtual Machine's resource
Lars Marowsky-Bree
lmb at suse.de
Fri May 16 18:15:08 UTC 2008
On 2008-05-16T12:15:03, Lon Hohberger <lhh at redhat.com> wrote:
> rgmanager:
> * parent/child relationships for implicit start-after/stop-before
> * attribute inheritance (we have talked about this in the past;
> it isn't hard, and may be beneficial)
> * specification of child resource type ordering to prevent major
> "gotchas" when defining resource groups (e.g. putting a
> script on a file system but putting them in the wrong order,
> causing errors)
> * 'primary' attribute specification (not OCF compliant) is used to
> identify resource instances
That's all just meta-data, right?
> * use of LSB 'status' to implement OCF 'monitor' function (status isn't
> specified in the RA API, but the monitor function as specified appears
> to map to the LSB status function... so most of our agents do
> monitor->status, though depth is still supported - maybe yours are the
> same; haven't fully investigated)
monitor is _not_ a 1:1 match for the LSB status action. That's exactly
why we're not using status. ;-)
http://www.linux-foundation.org/spec/refspecs/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
In particular, 3 vs 7 is a crucial difference (LSB status exits with 3
for "program is not running", whereas OCF monitor returns 7,
OCF_NOT_RUNNING, for that case), and we didn't want to have to
special-case the exit codes depending on the action being called.
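Spelled out as code, that's the translation an agent wrapping an LSB
init script has to do before its result can be used as a monitor
result (a minimal sketch; the function name is made up):

```shell
#!/bin/sh
# Sketch: mapping LSB "status" exit codes to OCF "monitor" codes.
# LSB status: 0 = running, 3 = program is not running
# OCF monitor: 0 = OCF_SUCCESS, 7 = OCF_NOT_RUNNING
# Passing a raw LSB 3 through would collide with OCF's meaning of 3
# (OCF_ERR_UNIMPLEMENTED), which is the special-casing mentioned above.

lsb_status_to_ocf_monitor() {
    # $1 is the exit code of the wrapped LSB status action
    case "$1" in
        0) return 0 ;;   # running     -> OCF_SUCCESS
        3) return 7 ;;   # stopped     -> OCF_NOT_RUNNING
        *) return 1 ;;   # anything else -> OCF_ERR_GENERIC
    esac
}
```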
> * multiple references to the same resource instance - reference counts
> are used to prevent starting the same resource on the same node multiple
> times
We use explicit dependencies and thus can reference the same
primitive/clone/group in as many places as needed.
> * rgmanager allows reconfiguration of resource parameters without
> restarting the resource; maybe pacemaker does too; haven't checked; uses
> <parameter name="xxx" reconfig="1" .../> in the meta-data to enable it.
Our instance_attributes support a "reload" setting.
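For reference, the rgmanager convention quoted above looks like this
in the agent meta-data (the surrounding content element is a sketch,
not taken from a real agent):

```xml
<!-- rgmanager meta-data: this parameter may be changed on a running
     resource without a stop/start cycle -->
<parameter name="xxx" reconfig="1">
    <content type="string"/>
</parameter>
```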
> pacemaker:
> * promote / demote resource operations
> * UUIDs used to identify resource instances (I like this better than
> what we do with type:primary_attr in rgmanager)
Yeah, well, the UUIDs are not the grandest idea we ever had - nowadays
at least the GUI tries to generate a shorter unique id w/o the full
cumbersomeness of UUIDs.
> * clone resources and operations used to start (more or less) the same
> resource on multiple nodes
> General:
> * resource migrate is likely done differently; not sure though (maybe
> you can tell me?):
> <resource-agent> migrate <target_host_name>
Our model is both push and pull compatible. On the source, we execute a
"migrate_to" command (the target_host is passed via the environment),
and on the target, a "migrate_from". (That makes sense if you consider
this as _commands_ given to the nodes, otherwise it seems kind of the
wrong way around ;-)
The migrate_from is also our way of checking whether the migration
succeeded; I guess in your case you then run a monitor/status on the
target?
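The push/pull model above can be sketched as an action dispatcher in
the agent. This is illustrative only: the helper bodies are
placeholders (a real Xen agent would call e.g. "xm migrate"), and the
environment variable name is an assumption about what the CRM sets:

```shell
#!/bin/sh
# Sketch of the push/pull migration model: migrate_to runs on the
# source node, migrate_from on the target node.

do_migrate()     { echo "pushing domain to $1"; }    # placeholder
verify_running() { echo "domain is running here"; }  # placeholder

ra_action() {
    case "$1" in
    migrate_to)
        # Runs on the SOURCE node: push the VM toward the target,
        # whose name arrives via the environment (variable name assumed).
        do_migrate "${OCF_RESKEY_CRM_meta_migrate_target:-otherhost}"
        ;;
    migrate_from)
        # Runs on the TARGET node: confirm the VM arrived; the exit
        # code doubles as the post-migration success check.
        verify_running || return 7   # 7 = OCF_NOT_RUNNING
        ;;
    *)
        return 3                     # OCF_ERR_UNIMPLEMENTED
        ;;
    esac
}
```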
> There will be more that I will come across, no doubt. Those are just
> the ones on the surface. I do not believe any of them are hard to deal
> with.
Right. I was in particular interested in understanding those differences
which affect the RA API, as that could possibly affect the usability of
RAs written for RHCS vs those written for ours. I think it's probably a
good idea to find some time to sit down and chat how to resolve these.
I've got a presentation from last year's BrainShare on what our scripts
do; that should be a usable starting point. Not much has changed since.
A further matter might be the shell scripts calling out to various
scripts which assume things in the environment - i.e., we supply
ocf-shellfuncs (a shell source file) which defines ocf_log() and a few
others.
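Concretely, an agent written against our environment starts roughly
like this (the install path varies by distro and version, so it is an
assumption here; the fallback definition exists only so the sketch runs
outside a cluster node):

```shell
#!/bin/sh
# Sketch: how an RA picks up the shared ocf-shellfuncs helpers.

OCF_FUNCS="${OCF_ROOT:-/usr/lib/ocf}/resource.d/heartbeat/.ocf-shellfuncs"
if [ -r "$OCF_FUNCS" ]; then
    . "$OCF_FUNCS"           # provides ocf_log() and friends
else
    # Minimal stand-in for illustration outside a cluster node
    ocf_log() { level=$1; shift; echo "$level: $*"; }
fi

# Typical usage inside an agent action:
ocf_log info "starting resource"
```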
> I think we both diverged in a compatible way here:
> * <parameter ... required="1" .../> means this parameter must be
> specified for a given resource instance.
A compatible divergence can't possibly be a divergence ;-)
> I believe the idea was to use virtual machines resources, with those
> virtual machines in a cluster of their own.
Ah, OK.
> To clarify the requirements as stated: they were in the context of an
> existing implementation.
>
> Generally, with clustered virtual machines that can run on more than one
> physical node, at a bare minimum, you need to know only a few things on
> the physical hosts in order to implement fencing:
>
> * where a particular vm is and its current state, or
> * where that vm "was", and
> * the state of the host running the vm, and
> * if "bad" or "Dead", whether fencing has completed
>
> Certainly, pacemaker knows all of the above!
Right, of course. The external/xen STONITH script which we already have
could likely use crm_resource to find out and/or control the state of
the resource representing the DomU in the Dom0 cluster.
Now I see what you're saying.
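As a sketch of that idea (the resource name is made up, and option
spellings vary between crm_resource versions, so treat these
invocations as illustrative):

```shell
# Where in the Dom0 cluster is the DomU's resource currently running?
crm_resource --resource vmguest1 --locate

# Ask the CRM to stop it, so fencing "kills" the VM through the
# cluster rather than behind its back.
crm_resource --resource vmguest1 --meta \
             --set-parameter target-role --parameter-value Stopped
```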
> I doubt it would be difficult to make the existing fence agent/host
> preferentially use pacemaker to locate & kill VMs when possible (as
> opposed to simply talking to libvirt + AIS Checkpoint APIs as it does
> now).
I think at least some interaction here would be needed, because
otherwise, pacemaker/LRM would eventually run the monitor action, find
out that it's gone and restart it, which might not be what is desired
;-)
Regards,
Lars
--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde