[Pacemaker] None of the standard agents in ocf:heartbeat are working in centos 6

Mon Jul 23 06:16:20 EDT 2012

On 07/23/2012 07:06 AM, David Barchas wrote:
> Hello.
> 
> I have been working on this for 3 days now, and must be so stressed out
> that I am being blinded to what is probably an obvious cause of this. In
> a word, HELP.
> 
> I am trying specifically to utilize ocf:heartbeat:IPaddr2, but this
> issue seems to occur with any of the ocf:heartbeat agents. I will just
> focus on IPaddr2 for purposes of figuring this out, but it happens
> exactly the same with any of the default agents. However, I can
> successfully use ocf:linbit:drbd, for example. It seems to be limited to
> the RAs that are installed along with Corosync/Pacemaker in the
> resource-agents package.

What are the exact package versions you have installed?

pacemaker*
resource-agents
cluster-glue*
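
One way to collect those versions in a single step is sketched below. It assumes an RPM-based system such as CentOS 6; list_cluster_packages is only an illustrative helper name, not part of any package.

```shell
# Sketch: print the installed versions of the packages asked about above.
# list_cluster_packages is a hypothetical helper name.
list_cluster_packages() {
    # rpm -qa accepts shell-style globs; they are quoted so that the shell
    # passes them to rpm instead of expanding them against local files.
    rpm -qa 'pacemaker*' 'resource-agents' 'cluster-glue*' | sort
}
```

Running list_cluster_packages on each node and pasting the output into the reply should answer the question.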

> 
> I am using CentOS 6.3, fully updated (though this happens in 6.2 with no
> updates as well), with Pacemaker and Corosync installed from the default
> repo. I have stripped everything down to figure this out in VMware: I
> just install CentOS, update it, install Pacemaker/Corosync (no DRBD for
> this discussion), configure Corosync, and then start it. Pacemaker starts
> up fine (or at least I think it's fine). For example, I can set the
> quorum policy to ignore from crm (crm configure property
> no-quorum-policy="ignore").
> 
> Here is the process list:
> root      1447  0.3  0.6 556080  6636 ?        Ssl  21:09   0:00 corosync
> 499       1453  0.0  0.5  88720  5556 ?        S    21:09   0:00  \_ /usr/libexec/pacemaker/cib
> root      1454  0.0  0.3  86968  3488 ?        S    21:09   0:00  \_ /usr/libexec/pacemaker/stonithd
> root      1455  0.0  0.2  76188  2492 ?        S    21:09   0:00  \_ /usr/lib64/heartbeat/lrmd
> 499       1456  0.0  0.3  91160  3432 ?        S    21:09   0:00  \_ /usr/libexec/pacemaker/attrd
> 499       1457  0.0  0.3  87440  3824 ?        S    21:09   0:00  \_ /usr/libexec/pacemaker/pengine
> 499       1458  0.0  0.3  91312  3884 ?        S    21:09   0:00  \_ /usr/libexec/pacemaker/crmd

So you are using plugin version 0 ("ver: 0" in the Corosync service
stanza) to start Pacemaker. That would explain why /etc/init.d/pacemaker
is unable to start: Pacemaker has already been started by Corosync.
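
To confirm which plugin version is configured, a small check along these lines may help. It is only a sketch: pacemaker_plugin_ver is a hypothetical helper, and it assumes the usual "service { name: pacemaker ... ver: N }" stanza layout with a space after each colon.

```shell
# pacemaker_plugin_ver FILE...
# Sketch: print the "ver:" value of the service stanza named "pacemaker".
# Assumes one setting per line and a closing brace on its own line.
pacemaker_plugin_ver() {
    awk '
        /^[ \t]*service[ \t]*[{]/ { in_svc = 1; name = ""; ver = "" }
        in_svc && /^[ \t]*name:/  { name = $2 }
        in_svc && /^[ \t]*ver:/   { ver = $2 }
        in_svc && /[}]/ {
            # Only report stanzas that actually load the pacemaker plugin.
            if (name == "pacemaker" && ver != "") print ver
            in_svc = 0
        }
    ' "$@"
}
```

The stanza is commonly found in /etc/corosync/corosync.conf or in a file under /etc/corosync/service.d/ (both paths are assumptions about this particular setup), so those are the files to pass as arguments. An output of 0 would match the process list above, where the Pacemaker daemons are children of corosync.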

> 
> 499 is hacluster btw.
> 
> ***BUT***
> 
> When I run as root the following:
> # crm ra meta ocf:heartbeat:IPaddr2
> 
> I get this response:
> lrmadmin[1484]: 2012/07/22_13:28:23 ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
> ERROR: ocf:heartbeat:IPaddr2: could not parse meta-data: 
> 
> And this is in /var/log/messages:
> Jul 22 16:35:14 MST lrmd: [48093]: ERROR: get_resource_meta: pclose failed: Resource temporarily unavailable
> Jul 22 16:35:14 MST lrmd: [48093]: WARN: on_msg_get_metadata: empty metadata for ocf::heartbeat::IPaddr2.
> Jul 22 16:35:14 MST lrmd: [48093]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 200 ms (> 100 ms) before being called (GSource: 0x187df10)
> Jul 22 16:35:14 MST lrmd: [48093]: info: G_SIG_dispatch: started at 429616889 should have started at 429616869
> Jul 22 16:35:14 MST lrmadmin: [48254]: ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
> 
> I am using crm ra meta as a way to test, but crm will not accept the
> resource when I try to add it as a primitive either.
> 
> In my research, I have found that it is often a permissions problem. So
> just to rule that out I set my entire system to 777 permissions. No joy.
> 
> Another suggestion I often find is to set OCF_ROOT (export
> OCF_ROOT=/usr/lib/ocf) and then run
> /usr/lib/ocf/resource.d/heartbeat/IPaddr2 meta-data.
> That produces the desired output, but it does not work before I export
> OCF_ROOT. And crm still does not accept my meta request.
> 
> Another suggestion I find is to make sure that shellfuncs exists in the
> agents folder. The soft links exist:
> lrwxrwxrwx. 1 root root    32 Jul 22 04:08 .ocf-binaries -> ../../lib/heartbeat/ocf-binaries
> lrwxrwxrwx. 1 root root    35 Jul 22 04:08 .ocf-directories -> ../../lib/heartbeat/ocf-directories
> lrwxrwxrwx. 1 root root    35 Jul 22 04:08 .ocf-returncodes -> ../../lib/heartbeat/ocf-returncodes
> lrwxrwxrwx. 1 root root    34 Jul 22 04:08 .ocf-shellfuncs -> ../../lib/heartbeat/ocf-shellfuncs
> 
> And just to make sure, I also created non-hidden soft links, with no joy.

Strange. Those errors are typically related to wrong paths used when
initializing the environment and helper functions:

# Initialization:

: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
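
To make the effect of that initialization concrete, here is a minimal sketch that reuses the same default-assignment line and prints the directory it resolves to. resolve_functions_dir is a hypothetical helper written for illustration, not part of resource-agents.

```shell
# resolve_functions_dir: mimic the agent's initialization line and print the
# directory from which ocf-shellfuncs would be sourced. Hypothetical helper.
resolve_functions_dir() {
    (
        # ${VAR=default} assigns the default only when VAR is unset.
        : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
        printf '%s\n' "${OCF_FUNCTIONS_DIR}"
    )
}

# Without OCF_ROOT the path collapses to a directory directly under "/".
( unset OCF_ROOT OCF_FUNCTIONS_DIR; resolve_functions_dir )
# prints /lib/heartbeat

# With OCF_ROOT set as in the manual test, the expected directory is used.
( unset OCF_FUNCTIONS_DIR; OCF_ROOT=/usr/lib/ocf; resolve_functions_dir )
# prints /usr/lib/ocf/lib/heartbeat
```

This would be consistent with the agent printing meta-data by hand only after OCF_ROOT is exported. It does not by itself explain the lrmd failure.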

The DRBD agent has an extra fallback check, which may be the reason that
it still works:

# Resource-agents have moved their ocf-shellfuncs file around.
# There are supposed to be symlinks or wrapper files in the old location,
# pointing to the new one, but people seem to get it wrong all the time.
# Try several locations.

if test -n "${OCF_FUNCTIONS_DIR}" ; then
	if test -e "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs" ; then
		. "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs"
	elif test -e "${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs" ; then
		. "${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs"
	fi
else
	if test -e "${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs" ; then
		. "${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs"
	elif test -e "${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"; then
		. "${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"
	fi
fi
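
The same lookup order can be turned into a quick diagnostic. The following is a sketch only: find_shellfuncs is a hypothetical helper that reports the first candidate that really exists for the case where OCF_FUNCTIONS_DIR is unset.

```shell
# find_shellfuncs ROOT
# Sketch: print the first existing ocf-shellfuncs candidate under ROOT, in
# the order the fallback above tries them when OCF_FUNCTIONS_DIR is unset.
# Returns 1 if neither candidate exists. Hypothetical helper.
find_shellfuncs() {
    root=$1
    for candidate in \
        "$root/lib/heartbeat/ocf-shellfuncs" \
        "$root/resource.d/heartbeat/.ocf-shellfuncs"
    do
        # test -e follows symlinks, so a dangling link counts as missing.
        if test -e "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

For example, find_shellfuncs /usr/lib/ocf || echo "no ocf-shellfuncs found" would show whether the symlinks listed earlier actually resolve to a file.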

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> I have used assorted how-tos to troubleshoot and make sure I'm not
> missing something simple.
> http://www.server-world.info/en/note?os=CentOS_6&p=pacemaker&f=1
> http://snozberry.org/blog/2012/05/02/corosync-slash-pacemaker-on-centos-6/
> 
> One other strange (but possibly normal) behavior is that I cannot
> manually start Pacemaker via "service pacemaker start".
> It fails, but I get no information in the logs. I get the feeling
> this is normal behavior now?
> # service pacemaker start
> Starting Pacemaker Cluster Manager:                        [FAILED]
> The log shows one entry:
> Jul 22 22:00:50 MST pacemakerd[1511]:     info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
> 
> 
> I have run through it about 30 times at this point.
> I have tried CentOS 6.2 not updated and CentOS 6.3 fully updated, on a
> physical server (just in case my VM is doing something weird) and in VMs.
> 
> Frankly I am so baffled by this, and have been working so intensely on
> it, that I am hoping that I am just missing something subtle because of
> freaking out.
> This should be very straightforward. No magic, but obviously "something"
> is amiss. 
> But what's really weird is that I cannot find a single post online of
> anyone having issues with the standard RAs like this.
> 
> I can try anything suggested, except changing from CentOS 6. This is all
> being done in a pair of virtual machines.
> 
> Any help or suggestions at all will be greatly appreciated.
> I am a bit desperate now.
> Thanks.
