[Pacemaker] Pacemaker remote nodes, naming, and attributes

Lindsay Todd rltodd.ml1 at gmail.com
Tue Jul 2 17:05:22 EDT 2013


Sorry for the delayed response, but I was out last week.  I've applied this
patch to 1.1.10-rc5 and have been testing:

# crm_attribute --type status --node "db02" \
    --name "service_postgresql" --update "true"
# crm_attribute --type status --node "db02" --name "service_postgresql"
scope=status  name=service_postgresql value=true
# crm resource stop vm-db02
# crm resource start vm-db02
### Wait a bit
# crm_attribute --type status --node "db02" --name "service_postgresql"
scope=status  name=service_postgresql value=(null)
Error performing operation: No such device or address
# crm_attribute --type status --node "db02" \
    --name "service_postgresql" --update "true"
# crm_attribute --type status --node "db02" --name "service_postgresql"
scope=status  name=service_postgresql value=true


Good so far.  But now look at this (every node was clean, and all services
were running, before we started):

# crm status
Last updated: Tue Jul  2 16:15:14 2013
Last change: Tue Jul  2 16:15:12 2013 via crmd on cvmh02
Stack: cman
Current DC: cvmh02 - partition with quorum
Version: 1.1.10rc5-1.el6.ccni-2718638
9 Nodes configured, unknown expected votes
59 Resources configured.


Node db02: UNCLEAN (offline)
Online: [ cvmh01 cvmh02 cvmh03 cvmh04 db02:vm-db02 ldap01:vm-ldap01
ldap02:vm-ldap02 ]
OFFLINE: [ swbuildsl6:vm-swbuildsl6 ]

Full list of resources:

 fence-cvmh01   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh02   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh03   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh04   (stonith:fence_ipmilan):        Started cvmh01
 Clone Set: c-fs-libvirt-VM-xcm [fs-libvirt-VM-xcm]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-p-libvirtd [p-libvirtd]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-fs-bind-libvirt-VM-cvmh [fs-bind-libvirt-VM-cvmh]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-watch-ib0 [p-watch-ib0]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-fs-gpfs [p-fs-gpfs]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 vm-compute-test        (ocf::ccni:xcatVirtualDomain):  Started cvmh03
 vm-swbuildsl6  (ocf::ccni:xcatVirtualDomain):  Stopped
 vm-db02        (ocf::ccni:xcatVirtualDomain):  Started cvmh02
 vm-ldap01      (ocf::ccni:xcatVirtualDomain):  Started cvmh03
 vm-ldap02      (ocf::ccni:xcatVirtualDomain):  Started cvmh04
 DummyOnVM      (ocf::pacemaker:Dummy): Started cvmh01


Not so good, and I'm not sure how to clean this up.  I can't seem to stop
vm-db02 any more, even after I've entered:

# crm_node -R db02 --force
# crm resource start vm-db02

### Wait a bit

# crm status
Last updated: Tue Jul  2 16:32:38 2013
Last change: Tue Jul  2 16:27:28 2013 via cibadmin on cvmh01
Stack: cman
Current DC: cvmh02 - partition with quorum
Version: 1.1.10rc5-1.el6.ccni-2718638
8 Nodes configured, unknown expected votes
54 Resources configured.


Online: [ cvmh01 cvmh02 cvmh03 cvmh04 ldap01:vm-ldap01 ldap02:vm-ldap02
swbuildsl6:vm-swbuildsl6 ]
OFFLINE: [ db02:vm-db02 ]

 fence-cvmh01   (stonith:fence_ipmilan):        Started cvmh03
 fence-cvmh02   (stonith:fence_ipmilan):        Started cvmh03
 fence-cvmh03   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh04   (stonith:fence_ipmilan):        Started cvmh01
 Clone Set: c-fs-libvirt-VM-xcm [fs-libvirt-VM-xcm]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-p-libvirtd [p-libvirtd]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-fs-bind-libvirt-VM-cvmh [fs-bind-libvirt-VM-cvmh]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-watch-ib0 [p-watch-ib0]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-fs-gpfs [p-fs-gpfs]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 vm-compute-test        (ocf::ccni:xcatVirtualDomain):  Started cvmh02
 vm-swbuildsl6  (ocf::ccni:xcatVirtualDomain):  Started cvmh01
 vm-ldap01      (ocf::ccni:xcatVirtualDomain):  Started cvmh03
 vm-ldap02      (ocf::ccni:xcatVirtualDomain):  Started cvmh04
 DummyOnVM      (ocf::pacemaker:Dummy): Started cvmh01
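
(In case it helps anyone reproduce this, the generic cleanup commands I
would normally reach for are sketched below.  This is only a sketch -- I
have not verified that either of them can clear the UNCLEAN state on a
remote node:)

# crm resource cleanup vm-db02
# crm node clearstate db02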

My only recourse has been to reboot the cluster.  So let's do that, and
then try setting a location constraint on DummyOnVM to force it onto db02...

Last updated: Tue Jul  2 16:43:46 2013
Last change: Tue Jul  2 16:27:28 2013 via cibadmin on cvmh01
Stack: cman
Current DC: cvmh02 - partition with quorum
Version: 1.1.10rc5-1.el6.ccni-2718638
8 Nodes configured, unknown expected votes
54 Resources configured.


Online: [ cvmh01 cvmh02 cvmh03 cvmh04 db02:vm-db02 ldap01:vm-ldap01
ldap02:vm-ldap02 swbuildsl6:vm-swbuildsl6 ]

 fence-cvmh01   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh02   (stonith:fence_ipmilan):        Started cvmh03
 fence-cvmh03   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh04   (stonith:fence_ipmilan):        Started cvmh01
 Clone Set: c-fs-libvirt-VM-xcm [fs-libvirt-VM-xcm]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-p-libvirtd [p-libvirtd]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-fs-bind-libvirt-VM-cvmh [fs-bind-libvirt-VM-cvmh]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-watch-ib0 [p-watch-ib0]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-fs-gpfs [p-fs-gpfs]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 vm-compute-test        (ocf::ccni:xcatVirtualDomain):  Started cvmh01
 vm-swbuildsl6  (ocf::ccni:xcatVirtualDomain):  Started cvmh01
 vm-db02        (ocf::ccni:xcatVirtualDomain):  Started cvmh02
 vm-ldap01      (ocf::ccni:xcatVirtualDomain):  Started cvmh03
 vm-ldap02      (ocf::ccni:xcatVirtualDomain):  Started cvmh04
 DummyOnVM      (ocf::pacemaker:Dummy): Started cvmh03

# pcs constraint location DummyOnVM prefers db02
# crm status
...
Online: [ cvmh01 cvmh02 cvmh03 cvmh04 db02:vm-db02 ldap01:vm-ldap01
ldap02:vm-ldap02 swbuildsl6:vm-swbuildsl6 ]
...
 DummyOnVM      (ocf::pacemaker:Dummy): Started db02
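
(Aside: if you are using the crm shell rather than pcs, the equivalent of
that constraint should be something like the following -- untested sketch,
constraint id made up:)

# crm configure location loc-dummy-on-db02 DummyOnVM inf: db02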


That (DummyOnVM started on db02) is what we want to see.  Now it would be
interesting to stop vm-db02; I expect DummyOnVM to stop along with it.

# crm resource stop vm-db02
# crm status
...
Online: [ cvmh01 cvmh02 cvmh03 cvmh04 ldap01:vm-ldap01 ldap02:vm-ldap02 ]
OFFLINE: [ db02:vm-db02 swbuildsl6:vm-swbuildsl6 ]
...
 DummyOnVM      (ocf::pacemaker:Dummy): Started cvmh02

Failed actions:
    vm-compute-test_migrate_from_0 (node=cvmh02, call=147, rc=1,
        status=Timed Out, last-rc-change=Tue Jul  2 16:48:17 2013,
        queued=20003ms, exec=0ms): unknown error


Well, that is odd.  (It is the case that vm-swbuildsl6 has an order
dependency on vm-compute-test, since I was trying to understand how
migrations work with order dependencies -- not very well, it turns out --
and once vm-compute-test recovers, vm-swbuildsl6 does come back up.)  More
to the point, DummyOnVM is now running on cvmh02 instead of stopping.  This
isn't really very good: if I am running services in VMs or other
containers, I need them to run only in those containers!
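
(The closest thing I know of to "run this only on db02, or not at all" is a
-INFINITY location rule, roughly as sketched below.  I have not tested
this, and I am not certain the #uname node attribute is populated for
remote nodes:)

# crm configure location loc-dummy-only-db02 DummyOnVM \
    rule -inf: #uname ne db02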

If I start vm-db02 back up, I see that DummyOnVM is stopped and then moved
back to db02.
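
(To close the loop on why I care about attributes on remote nodes at all:
the end goal is to steer services with location rules keyed on those
attributes, roughly like the sketch below.  The resource name is
hypothetical and the rule is untested:)

# crm configure location loc-needs-postgresql SomeService \
    rule 100: service_postgresql eq true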



On Thu, Jun 20, 2013 at 4:16 PM, David Vossel <dvossel at redhat.com> wrote:

> ----- Original Message -----
> > From: "David Vossel" <dvossel at redhat.com>
> > To: "The Pacemaker cluster resource manager" <
> pacemaker at oss.clusterlabs.org>
> > Sent: Thursday, June 20, 2013 1:35:44 PM
> > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> >
> > ----- Original Message -----
> > > From: "David Vossel" <dvossel at redhat.com>
> > > To: "The Pacemaker cluster resource manager"
> > > <pacemaker at oss.clusterlabs.org>
> > > Sent: Wednesday, June 19, 2013 4:47:58 PM
> > > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> > >
> > > ----- Original Message -----
> > > > From: "Lindsay Todd" <rltodd.ml1 at gmail.com>
> > > > To: "The Pacemaker cluster resource manager"
> > > > <Pacemaker at oss.clusterlabs.org>
> > > > Sent: Wednesday, June 19, 2013 4:11:58 PM
> > > > Subject: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> > > >
> > > > I built a set of rpms for pacemaker 1.1.10-rc4 and updated my test
> cluster
> > > > (hopefully won't be a "test" cluster forever), as well as my VMs
> running
> > > > pacemaker-remote. The OS everywhere is Scientific Linux 6.4. I am
> wanting
> > > > to
> > > > set some attributes on remote nodes, which I can use to control where
> > > > services run.
> > > >
> > > > The first deviation I note from the documentation is the naming of
> the
> > > > remote
> > > > nodes. I see:
> > > >
> > > >
> > > >
> > > >
> > > > Last updated: Wed Jun 19 16:50:39 2013
> > > > Last change: Wed Jun 19 16:19:53 2013 via cibadmin on cvmh04
> > > > Stack: cman
> > > > Current DC: cvmh02 - partition with quorum
> > > > Version: 1.1.10rc4-1.el6.ccni-d19719c
> > > > 8 Nodes configured, unknown expected votes
> > > > 49 Resources configured.
> > > >
> > > >
> > > > Online: [ cvmh01 cvmh02 cvmh03 cvmh04 db02:vm-db02 ldap01:vm-ldap01
> > > > ldap02:vm-ldap02 swbuildsl6:vm-swbuildsl6 ]
> > > >
> > > > Full list of resources:
> > > >
> > > > and so forth. The "remote-node" names are simply the hostname, so the
> > > > vm-db02
> > > > VirtualDomain resource has a remote-node name of db02. The "Pacemaker
> > > > Remote" manual suggests this should be displayed as "db02", not
> > > > "db02:vm-db02", although I can see how the latter format would be
> useful.
> > >
> > > Yep, this got changed since the documentation was published.  We wanted
> > > people to be able to recognize which remote-node went with which
> resource
> > > easily.
> > >
> > > >
> > > > So now let's set an attribute on this remote node. What name do I
> use?
> > > > How
> > > > about:
> > > >
> > > >
> > > >
> > > >
> > > > # crm_attribute --node "db02:vm-db02" \
> > > > --name "service_postgresql" \
> > > > --update "true"
> > > > Could not map name=db02:vm-db02 to a UUID
> > > > Please choose from one of the matches above and suppy the 'id' with
> > > > --attr-id
> > > >
> > > > Perhaps not the most informative output, but obviously it fails.
> Let's
> > > > try
> > > > the unqualified name:
> > > >
> > > >
> > > >
> > > >
> > > > # crm_attribute --node "db02" \
> > > > --name "service_postgresql" \
> > > > --update "true"
> > > > Remote-nodes do not maintain permanent attributes,
> > > > 'service_postgresql=true'
> > > > will be removed after db02 reboots.
> > > > Error setting service_postgresql=true (section=status,
> set=status-db02):
> > > > No
> > > > such device or address
> > > > Error performing operation: No such device or address
> >
> > I just tested this and ran into the same errors you did.  Turns out this
> > happens when the remote-node's status section is empty.  If you start a
> > resource on the node and then set the attribute it will work... obviously
> > this is a bug. I'm working on a fix.
>
> This should help with the attributes bit.
>
>
> https://github.com/ClusterLabs/pacemaker/commit/26d34a9171bddae67c56ebd8c2513ea8fa770204
>
> -- Vossel
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>