[Pacemaker] WARN: do_lrm_control: Failed to sign on to the LRM 1 (30 max) times

chajo srichandu2007 at yahoo.co.in
Fri Apr 9 08:49:28 EDT 2010


Hi

We have Pacemaker 1.0.6 running on corosync 1.2.1 on RHEL x86_64.

While building the code there were errors related to pointer types
(GPOINTER_TO_INT in pacemaker/lib/common/remote.c:295). I changed the include
references from /usr/lib/glib-2.0/include to /usr/lib64/glib-2.0/include to
get rid of the compilation errors.


After starting corosync, crmd fails and the local node is always shown as
offline in the two-node cluster. The following error is logged repeatedly in
/var/log/messages:


crmd: [3180]: info: do_cib_control: CIB connection established
.
.
.
.
.
crmd: [3180]: WARN:lrm_signon: can not initiate connection
crmd: [3180]: WARN: do_lrm_control: Failed to sign on to the LRM 3 (30 max) times
.


crmd gets restarted after 30 tries.
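
To see how far the retry counter gets before crmd restarts, the sign-on warnings can be pulled out of the log. This is a minimal sketch: the pattern is based only on the log line quoted above, and the function name `signon_attempts` is hypothetical.

```python
import re

# Pattern taken from the crmd warning quoted above; the exact wording may
# differ between Pacemaker versions, so treat it as an assumption.
SIGNON_RE = re.compile(r"Failed to sign on to the LRM (\d+) \((\d+) max\)")


def signon_attempts(lines):
    """Return (attempt, max_attempts) pairs found in the given log lines."""
    attempts = []
    for line in lines:
        match = SIGNON_RE.search(line)
        if match:
            attempts.append((int(match.group(1)), int(match.group(2))))
    return attempts


if __name__ == "__main__":
    sample = [
        "crmd: [3180]: info: do_cib_control: CIB connection established",
        "crmd: [3180]: WARN:lrm_signon: can not initiate connection",
        "crmd: [3180]: WARN: do_lrm_control: Failed to sign on to the LRM 3 (30 max) times",
    ]
    for attempt, limit in signon_attempts(sample):
        print(f"attempt {attempt} of {limit}")
```

In practice the lines would come from the syslog file rather than a hard-coded sample.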


While debugging crmd I found that the connect() call returns -1 when
connecting to the socket file /usr/var/run/heartbeat/lrm_cmd_sock.

fileName: ./lib/clplumbing/ipcsocket.c <Reusable-Cluster-Components-6c8645d6a4c2 Cluster Glue>
line number: 962

connect(<fd>,
        {sun_family = 1,
         sun_path = "/usr/var/run/heartbeat/lrm_cmd_sock", '\0' <repeats 72 times>})

This call returns -1.
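
A return value of -1 alone does not say why the connection failed; the errno value does. The sketch below repeats the same connect() outside crmd and reports the error name. It assumes the LRM socket is an AF_UNIX stream socket (sun_family = 1 is AF_UNIX on Linux; the stream type is an assumption), and `probe_unix_socket` is a hypothetical helper, not part of Cluster Glue.

```python
import errno
import os
import socket


def probe_unix_socket(path):
    """Try to connect to a Unix stream socket.

    Returns (ok, errno_name, message). errno_name is None on success.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return True, None, "connected"
    except OSError as exc:
        if exc.errno is None:
            # Some failures (e.g. an over-long path) carry no errno value.
            return False, "UNKNOWN", str(exc)
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, name, os.strerror(exc.errno)
    finally:
        sock.close()


if __name__ == "__main__":
    # Path taken from the connect() call shown above.
    print(probe_unix_socket("/usr/var/run/heartbeat/lrm_cmd_sock"))
```

ECONNREFUSED would suggest that the socket file exists but nothing is listening on it, ENOENT that the path is missing, and EACCES a permission problem.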



Further info:
# ls -l /usr/var/run/heartbeat/lrm_cmd_sock
srwxrwxrwx 1 root root 0 Apr  9 19:49 /usr/var/run/heartbeat/lrm_cmd_sock


# cat /etc/passwd | grep hacluster
hacluster:x:501:501::/home/hacluster:/bin/bash

[root@IbHost common]# cat /etc/group | grep ha
haldaemon:x:68:
hacluster:x:501:
haclient:x:502:hacluster
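
The group output above can also be checked programmatically. This is a minimal sketch that parses /etc/group-style text and reports whether a user is listed as a member of a group; `group_members` and `is_listed_member` are hypothetical helper names, and the sample text is the output quoted above.

```python
def group_members(group_text):
    """Parse /etc/group-style text into {group_name: (gid, [members])}."""
    groups = {}
    for line in group_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed entries
        name, _password, gid, members = fields[:4]
        groups[name] = (int(gid), [m for m in members.split(",") if m])
    return groups


def is_listed_member(groups, user, group):
    """True if the user appears in the group's member list."""
    return group in groups and user in groups[group][1]


if __name__ == "__main__":
    sample = "haldaemon:x:68:\nhacluster:x:501:\nhaclient:x:502:hacluster\n"
    groups = group_members(sample)
    print(is_listed_member(groups, "hacluster", "haclient"))
```

With the data shown here, hacluster is listed as a member of haclient (GID 502), while its primary GID in /etc/passwd is 501, the hacluster group.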


I am trying to find out why the local node is shown as offline by the
crm status command. Any help would be appreciated.

thanks
chajo







