[Pacemaker] None of the standard agents in ocf:heartbeat are working in centos 6
David Barchas
dave at barchas.com
Mon Jul 23 05:06:42 UTC 2012
Hello.
I have been working on this for three days now, and I am probably so stressed out that I am blind to an obvious cause. In a word: HELP.
I am specifically trying to use ocf:heartbeat:IPaddr2, but this issue seems to occur with any of the ocf:heartbeat agents. I will focus on IPaddr2 for the purposes of figuring this out, but it happens exactly the same way with any of the default agents. However, I can successfully use ocf:linbit:drbd, for example. The problem seems to be limited to the RAs that are installed along with corosync/pacemaker in the resource-agents package.
I am using CentOS 6.3, fully updated (though this also happens on 6.2 with no updates), with pacemaker and corosync installed from the default repo. To narrow this down I have stripped everything back in VMware: I install CentOS, update it, install pacemaker/corosync (no DRBD for this discussion), configure corosync, and then start it. Pacemaker starts up fine (or at least I think it's fine). For example, I can set the quorum policy from crm (crm configure property no-quorum-policy="ignore").
Here is the process list:
root 1447 0.3 0.6 556080 6636 ? Ssl 21:09 0:00 corosync
499 1453 0.0 0.5 88720 5556 ? S 21:09 0:00 \_ /usr/libexec/pacemaker/cib
root 1454 0.0 0.3 86968 3488 ? S 21:09 0:00 \_ /usr/libexec/pacemaker/stonithd
root 1455 0.0 0.2 76188 2492 ? S 21:09 0:00 \_ /usr/lib64/heartbeat/lrmd
499 1456 0.0 0.3 91160 3432 ? S 21:09 0:00 \_ /usr/libexec/pacemaker/attrd
499 1457 0.0 0.3 87440 3824 ? S 21:09 0:00 \_ /usr/libexec/pacemaker/pengine
499 1458 0.0 0.3 91312 3884 ? S 21:09 0:00 \_ /usr/libexec/pacemaker/crmd
499 is hacluster btw.
***BUT***
When I run as root the following:
# crm ra meta ocf:heartbeat:IPaddr2
I get this response:
lrmadmin[1484]: 2012/07/22_13:28:23 ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
ERROR: ocf:heartbeat:IPaddr2: could not parse meta-data:
And this is in /var/log/messages:
Jul 22 16:35:14 MST lrmd: [48093]: ERROR: get_resource_meta: pclose failed: Resource temporarily unavailable
Jul 22 16:35:14 MST lrmd: [48093]: WARN: on_msg_get_metadata: empty metadata for ocf::heartbeat::IPaddr2.
Jul 22 16:35:14 MST lrmd: [48093]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 200 ms (> 100 ms) before being called (GSource: 0x187df10)
Jul 22 16:35:14 MST lrmd: [48093]: info: G_SIG_dispatch: started at 429616889 should have started at 429616869
Jul 22 16:35:14 MST lrmadmin: [48254]: ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
I am using crm ra meta as a test, but crm also refuses to let me add the resource as a primitive.
In my research, I have found that the cause is often permissions. Just to rule that out, I set my entire system to 777 permissions. No joy.
Another suggestion I often find is to set OCF_ROOT (export OCF_ROOT=/usr/lib/ocf) and then run /usr/lib/ocf/resource.d/heartbeat/IPaddr2 meta-data.
That produces the desired output, but only after the export; it does not work before I export.
And crm still does not accept my meta request.
Another suggestion I find is to make sure that shellfuncs exists in the agents folder. The soft links exist:
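The manual check described above can be wrapped in a small shell function. This is only a sketch: the helper name check_agent_metadata is made up for illustration and is not part of any package, and the default OCF_ROOT is simply the path quoted above.

```shell
#!/bin/sh
# Sketch: run a resource agent's meta-data action by hand.
# check_agent_metadata is a hypothetical helper name used for illustration only.
check_agent_metadata() {
    agent="$1"

    # The agent locates its helper files relative to OCF_ROOT, so the variable
    # must be exported. The default below is the path used in this post.
    OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}"
    export OCF_ROOT

    if [ ! -x "$agent" ]; then
        echo "not executable: $agent" >&2
        return 2
    fi

    # Capture the agent's output and check that it looks like OCF metadata XML.
    output=$("$agent" meta-data) || return 1
    case "$output" in
        *"<resource-agent"*)
            echo "metadata OK: $agent"
            return 0
            ;;
        *)
            echo "empty or unexpected metadata: $agent" >&2
            return 1
            ;;
    esac
}

# Example call:
#   check_agent_metadata /usr/lib/ocf/resource.d/heartbeat/IPaddr2
```

This only confirms that the agent itself can print its metadata when called directly from a shell. It says nothing about what lrmd sees when it runs the same agent.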
lrwxrwxrwx. 1 root root 32 Jul 22 04:08 .ocf-binaries -> ../../lib/heartbeat/ocf-binaries
lrwxrwxrwx. 1 root root 35 Jul 22 04:08 .ocf-directories -> ../../lib/heartbeat/ocf-directories
lrwxrwxrwx. 1 root root 35 Jul 22 04:08 .ocf-returncodes -> ../../lib/heartbeat/ocf-returncodes
lrwxrwxrwx. 1 root root 34 Jul 22 04:08 .ocf-shellfuncs -> ../../lib/heartbeat/ocf-shellfuncs
Just to make sure, I also created non-hidden copies of the soft links (without the leading dot), with no joy.
I have used assorted how-tos to troubleshoot and to make sure I'm not missing something simple:
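A listing shows that the links exist, but not that their targets do. A quick way to confirm that each link actually resolves is sketched below. The helper name check_ocf_links is made up for illustration; the four link names are the ones from the listing above.

```shell
#!/bin/sh
# Sketch: confirm that each helper symlink in an agent directory resolves.
# check_ocf_links is a hypothetical helper name used for illustration only.
check_ocf_links() {
    dir="$1"
    status=0
    for name in .ocf-binaries .ocf-directories .ocf-returncodes .ocf-shellfuncs; do
        link="$dir/$name"
        # -e follows symlinks, so it is false for a missing or dangling link.
        if [ -e "$link" ]; then
            echo "ok: $link"
        else
            echo "missing or dangling: $link" >&2
            status=1
        fi
    done
    return $status
}

# Example call:
#   check_ocf_links /usr/lib/ocf/resource.d/heartbeat
```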
http://www.server-world.info/en/note?os=CentOS_6&p=pacemaker&f=1
http://snozberry.org/blog/2012/05/02/corosync-slash-pacemaker-on-centos-6/
One other strange (but possibly normal) behavior is that I cannot manually start pacemaker via "service pacemaker start".
It fails, but I get no information in the logs. I get the feeling this is normal behavior now?
# service pacemaker start
Starting Pacemaker Cluster Manager: [FAILED]
The log shows one entry:
Jul 22 22:00:50 MST pacemakerd[1511]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
I have run through it about 30 times at this point.
I have tried CentOS 6.2 without updates and CentOS 6.3 fully updated, both on a physical server (just in case my VM is doing something weird) and in VMs.
Frankly I am so baffled by this, and have been working so intensely on it, that I am hoping that I am just missing something subtle because of freaking out.
This should be very straightforward. No magic, but obviously "something" is amiss.
But what's really weird is that I cannot find a single post online of anyone having issues with the standard RAs like this.
I can try anything suggested, except moving away from CentOS 6. This is all being done in a pair of virtual machines.
Any help or suggestions at all will be greatly appreciated.
I am a bit desperate now.
Thanks.