[Pacemaker] None of the standard agents in ocf:heartbeat are working in centos 6

Mon Jul 23 05:06:42 UTC 2012

Hello. 

I have been working on this for 3 days now, and must be so stressed out that I am being blinded to what is probably an obvious cause of this. In a word, HELP.

I am trying specifically to use ocf:heartbeat:IPaddr2, but this issue seems to occur with any of the ocf:heartbeat agents. I will focus on IPaddr2 for the purposes of figuring this out, but it happens exactly the same way with any of the default agents. However, I can successfully use ocf:linbit:drbd, for example. It seems to be limited to the RAs that are installed along with corosync/pacemaker in the resource-agents package.

I am using CentOS 6.3, fully updated (though this also happens on 6.2 with no updates), with pacemaker and corosync installed from the default repo. To isolate the problem I have stripped everything down in VMware: install CentOS, update it, install pacemaker and corosync (no DRBD for this discussion), configure corosync, and then start it. Pacemaker starts up fine (or at least I think it's fine). For example, I can set the quorum policy to ignore from crm. (crm configure property no-quorum-policy="ignore")

Here is the process list:
root      1447  0.3  0.6 556080  6636 ?        Ssl  21:09   0:00 corosync
499       1453  0.0  0.5  88720  5556 ?        S    21:09   0:00  \_ /usr/libexec/pacemaker/cib
root      1454  0.0  0.3  86968  3488 ?        S    21:09   0:00  \_ /usr/libexec/pacemaker/stonithd
root      1455  0.0  0.2  76188  2492 ?        S    21:09   0:00  \_ /usr/lib64/heartbeat/lrmd
499       1456  0.0  0.3  91160  3432 ?        S    21:09   0:00  \_ /usr/libexec/pacemaker/attrd
499       1457  0.0  0.3  87440  3824 ?        S    21:09   0:00  \_ /usr/libexec/pacemaker/pengine
499       1458  0.0  0.3  91312  3884 ?        S    21:09   0:00  \_ /usr/libexec/pacemaker/crmd

UID 499 is hacluster, btw.

***BUT***

When I run as root the following:
# crm ra meta ocf:heartbeat:IPaddr2

I get this response:
lrmadmin[1484]: 2012/07/22_13:28:23 ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
ERROR: ocf:heartbeat:IPaddr2: could not parse meta-data: 

And this is in /var/log/messages:
Jul 22 16:35:14 MST lrmd: [48093]: ERROR: get_resource_meta: pclose failed: Resource temporarily unavailable
Jul 22 16:35:14 MST lrmd: [48093]: WARN: on_msg_get_metadata: empty metadata for ocf::heartbeat::IPaddr2.
Jul 22 16:35:14 MST lrmd: [48093]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 200 ms (> 100 ms) before being called (GSource: 0x187df10)
Jul 22 16:35:14 MST lrmd: [48093]: info: G_SIG_dispatch: started at 429616889 should have started at 429616869
Jul 22 16:35:14 MST lrmadmin: [48254]: ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
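
To pull only the lrmd and lrmadmin errors and warnings out of a busy /var/log/messages, something like the following sketch could be used. The function name lrmd_problems is made up for illustration, and the pattern assumes the syslog line format shown in the excerpt above.

```shell
#!/bin/sh
# Print only ERROR/WARN lines logged by lrmd or lrmadmin.
# Assumes lines shaped like: "<timestamp> lrmd: [<pid>]: ERROR: <message>"
lrmd_problems() {
    logfile=$1    # e.g. /var/log/messages
    grep -E '(lrmd|lrmadmin): \[[0-9]+\]: (ERROR|WARN):' "$logfile"
}
```

Usage would be "lrmd_problems /var/log/messages"; it returns non-zero when nothing matches, which is how grep behaves.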

I am using crm ra meta as a way to test, but crm also refuses to let me add the resource as a primitive.

In my research, I have found that the cause is often permissions. So just to rule that out, I set my entire system to 777 permissions. No joy.

Another suggestion I often find is to set OCF_ROOT (export OCF_ROOT=/usr/lib/ocf) and then run /usr/lib/ocf/resource.d/heartbeat/IPaddr2 meta-data.
That produces the desired output, but it does not work before I export the variable.
And crm still does not accept my meta request.
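
For anyone who wants to repeat that manual test, here is a minimal sketch that sets OCF_ROOT for a single call instead of exporting it into the whole shell. The function name ocf_metadata is hypothetical, and the default of /usr/lib/ocf is just the value used above; it only reproduces the manual check and says nothing about how lrmd itself invokes the agent.

```shell
#!/bin/sh
# Run an OCF agent's meta-data action with OCF_ROOT set for that one call only.
# ocf_metadata is an illustrative helper, not part of any package.
ocf_metadata() {
    agent=$1                   # e.g. /usr/lib/ocf/resource.d/heartbeat/IPaddr2
    root=${2:-/usr/lib/ocf}    # value passed to the agent as OCF_ROOT
    if [ ! -x "$agent" ]; then
        echo "not executable: $agent" >&2
        return 2
    fi
    # The assignment prefix applies to this command's environment only.
    OCF_ROOT=$root "$agent" meta-data
}
```

Usage would be "ocf_metadata /usr/lib/ocf/resource.d/heartbeat/IPaddr2", which should print the agent's XML if the agent itself is healthy.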

Another suggestion I find is to make sure that shellfuncs exists in the agents folder. The soft links exist:
lrwxrwxrwx. 1 root root    32 Jul 22 04:08 .ocf-binaries -> ../../lib/heartbeat/ocf-binaries
lrwxrwxrwx. 1 root root    35 Jul 22 04:08 .ocf-directories -> ../../lib/heartbeat/ocf-directories
lrwxrwxrwx. 1 root root    35 Jul 22 04:08 .ocf-returncodes -> ../../lib/heartbeat/ocf-returncodes
lrwxrwxrwx. 1 root root    34 Jul 22 04:08 .ocf-shellfuncs -> ../../lib/heartbeat/ocf-shellfuncs
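
The listing only shows that the links exist, not that their targets do. A small sketch for checking that each .ocf-* link actually resolves follows; check_links is a made-up helper name, and the directory argument is assumed to be the agents folder.

```shell
#!/bin/sh
# Report .ocf-* symlinks in a directory whose targets do not exist.
# check_links is an illustrative helper; returns 1 if any link is dangling.
check_links() {
    dir=$1    # e.g. /usr/lib/ocf/resource.d/heartbeat
    broken=0
    for link in "$dir"/.ocf-*; do
        [ -L "$link" ] || continue      # skip non-links and an unmatched glob
        if [ ! -e "$link" ]; then       # -e follows the link; false if dangling
            echo "dangling: $link -> $(readlink "$link")"
            broken=1
        fi
    done
    return $broken
}
```

Usage would be "check_links /usr/lib/ocf/resource.d/heartbeat"; no output and a zero exit status means every link resolves.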

Just to make sure, I also created non-hidden soft links (without the leading dot), again with no joy.

I have used assorted how-tos to troubleshoot and make sure I'm not missing something simple.
http://www.server-world.info/en/note?os=CentOS_6&p=pacemaker&f=1
http://snozberry.org/blog/2012/05/02/corosync-slash-pacemaker-on-centos-6/

One other strange (but possibly normal) behavior is that I cannot manually start pacemaker via "service pacemaker start".
It fails, but I get almost no information in the logs. I get the feeling this is normal behavior now?
# service pacemaker start
Starting Pacemaker Cluster Manager:                        [FAILED]

The log shows one entry: Jul 22 22:00:50 MST pacemakerd[1511]:     info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root

I have run through the whole procedure about 30 times at this point.
I have tried CentOS 6.2 with no updates and CentOS 6.3 fully updated, both on a physical server (just in case my VM is doing something weird) and in VMs.

Frankly I am so baffled by this, and have been working so intensely on it, that I am hoping that I am just missing something subtle because of freaking out.
This should be very straightforward. No magic, but obviously "something" is amiss. 
But what's really weird is that I cannot find a single post online of anyone having issues with the standard RAs like this.

I can try anything suggested, except moving away from CentOS 6. This is all being done on a pair of virtual machines.

Any help or suggestions at all will be greatly appreciated.
I am a bit desperate now.
Thanks.