[Pacemaker] WARN: do_lrm_control: Failed to sign on to the LRM 1 (30 max) times

chajo srichandu2007 at yahoo.co.in
Mon Apr 12 09:49:13 UTC 2010


Andrew Beekhof <andrew at ...> writes:

> 
> On Fri, Apr 9, 2010 at 2:49 PM, chajo <srichandu2007 at ...> wrote:
> > Hi
> >
> >     We have Corosync (1.2.1) running with Pacemaker 1.0.6 on RHEL x86_64.
> 
> I'm pretty sure that version combination won't work.
> IIRC Corosync support was added in 1.0.7.
> 
> >
> >     While building the code there were errors related to pointer types
> > (GPOINTER_TO_INT in pacemaker/lib/common/remote.c:295). I changed the
> > references from /usr/lib/glib-2.0/include to /usr/lib64/glib-2.0/include to
> > get rid of the compilation errors.
> >
> >
> > After starting corosync, crmd fails and the local node is always shown as
> > offline in the two-node cluster. The following error is logged repeatedly in
> > the /var/log/messages file:
> >
> >
> > crmd: [3180]: info: do_cib_control: CIB connection established
> > .
> > .
> > .
> > .
> > .
> > crmd: [3180]: WARN:lrm_signon: can not initiate connection
> > crmd: [3180]: WARN: do_lrm_control: Failed to sign on to the LRM 3 (30 max)
> > times
> > .
> >
> >
> > crmd is getting restarted after 30 tries
> >
> >
> > Debugging crmd, I found that the connect() call returns -1 while connecting
> > to the socket file /usr/var/run/heartbeat/lrm_cmd_sock
> >
> > File name:   ./lib/clplumbing/ipcsocket.c
> >              <Reusable-Cluster-Components-6c8645d6a4c2, Cluster Glue>
> > Line number: 962
> >
> > connect(<fd>,
> >        {sun_family = 1,
> >         sun_path = "/usr/var/run/heartbeat/lrm_cmd_sock", '\0' <repeats 72 times>}
> >        )
> >
> > For this call connect() returns -1.
> >
> >
> >
> > further info
> > # ls -l /usr/var/run/heartbeat/lrm_cmd_sock
> > srwxrwxrwx 1 root root 0 Apr  9 19:49 /usr/var/run/heartbeat/lrm_cmd_sock
> >
> >
> > # cat /etc/passwd | grep hacluster
> > hacluster:x:501:501::/home/hacluster:/bin/bash
> >
> > [root <at> IbHost common]# cat /etc/group | grep ha
> > haldaemon:x:68:
> > hacluster:x:501:
> > haclient:x:502:hacluster
> >
> >
> > Any help in finding out why the local node is shown as offline by the
> > <crm status> command would be appreciated.
> >
> > thanks
> > chajo
> 


Hi Andrew,
   Thanks for the reply.

    I tried with the newer versions:
    Corosync version 1.2.1 and
    Pacemaker version 1.0.8

    but I am still facing the same issue.
Is there a specific set of instructions to follow when compiling 
Corosync + Pacemaker on 64-bit machines? Please let me know. I am ready to 
start from scratch, but this looks like an issue that could be solved.
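
A return value of -1 from connect() does not say much by itself; the errno value tells a missing socket file (ENOENT) apart from a permission problem (EACCES) or a socket file with nothing listening on it (ECONNREFUSED). Below is a minimal sketch for probing the socket path from the quoted message outside of crmd. The function name probe_unix_socket is made up for this illustration, and the sketch assumes the LRM command socket is a stream-type Unix socket; it only tests whether a connection can be opened and does not speak the LRM protocol.

```python
import errno
import os
import socket
import stat


def probe_unix_socket(path):
    """Try to connect to a Unix stream socket and describe the outcome.

    Hypothetical helper for illustration; SOCK_STREAM is an assumption
    about how the daemon created its socket.
    """
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        # Typically ENOENT (no such file) or EACCES (directory not searchable).
        return "stat failed: %s" % errno.errorcode.get(exc.errno, exc.errno)
    if not stat.S_ISSOCK(mode):
        return "not a socket"
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except OSError as exc:
        # ECONNREFUSED usually means the file exists but nothing is listening.
        return "connect failed: %s" % errno.errorcode.get(exc.errno, exc.errno)
    finally:
        sock.close()
    return "connected"
```

Calling probe_unix_socket("/usr/var/run/heartbeat/lrm_cmd_sock") both as root and as the user crmd runs under would show whether the failure depends on the caller or on the socket itself.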

Posting the log with the new versions:

Apr 12 19:25:32 lHost corosync[24808]:   [MAIN  ] Corosync Cluster Engine 
('1.2.1'): started and ready to provide service.
Apr 12 19:25:32 lHost corosync[24808]:   [MAIN  ] Corosync built-in features:
Apr 12 19:25:32 lHost corosync[24808]:   [MAIN  ] Successfully read main 
configuration file '/usr/etc/corosync/corosync.conf'.
Apr 12 19:25:32 lHost corosync[24808]:   [TOTEM ] Initializing transport 
(UDP/IP).
Apr 12 19:25:32 lHost corosync[24808]:   [TOTEM ] Initializing 
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr 12 19:25:32 lHost corosync[24808]:   [TOTEM ] The network interface 
[172.28.14.19] is now up.
Apr 12 19:25:32 lHost corosync[24808]:   [pcmk  ] info: process_ais_conf: 
Reading configure
Apr 12 19:25:32 lHost corosync[24808]:   [pcmk  ] info: config_find_init: 
Local handle: 4730966301143465986 for logging
Apr 12 19:25:32 lHost corosync[24808]:   [pcmk  ] info: config_find_next: 
Processing additional logging options...
Apr 12 19:25:32 lHost corosync[24808]:   [pcmk  ] info: get_config_opt: 
Found 'on' for option: debug
Apr 12 19:25:32 lHost corosync[24808]:   [pcmk  ] info: get_config_opt: 
Found 'yes' for option: to_logfile
Apr 12 19:25:32 lHost corosync[24808]:   [pcmk  ] info: get_config_opt: 
Found '/tmp/corosync.log' for option: logfile
Apr 12 19:25:32 lHost cib: [24816]: info: Invoked: /usr/lib64/heartbeat/cib
Apr 12 19:25:32 lHost pengine: [24819]: info: 
Invoked: /usr/lib64/heartbeat/pengine
Apr 12 19:25:32 lHost crmd: [24820]: info: Invoked: /usr/lib64/heartbeat/crmd
Apr 12 19:25:32 lHost attrd: [24818]: info: Invoked: /usr/lib64/heartbeat/attrd
Apr 12 19:25:32 lHost stonithd: [24815]: info: G_main_add_SignalHandler: Added 
signal handler for signal 10
Apr 12 19:25:32 lHost stonithd: [24815]: info: G_main_add_SignalHandler: Added 
signal handler for signal 12
Apr 12 19:25:32 lHost corosync[24808]:   [pcmk  ] info: get_config_opt: 
Found 'yes' for option: to_syslog
Apr 12 19:25:32 lHost cib: [24816]: info: G_main_add_TriggerHandler: Added 
signal manual handler
Apr 12 19:25:32 lHost crmd: [24820]: info: main: CRM Hg Version: 
13c87913dfe66cf7963552db095a918f4ebfbb2b
Apr 12 19:25:32 lHost attrd: [24818]: info: main: Starting up
Apr 12 19:25:32 lHost lrmd: [24817]: info: G_main_add_SignalHandler: Added 
signal handler for signal 15
Apr 12 19:25:32 lHost corosync[24808]:   [pcmk  ] info: get_config_opt: 
Defaulting to 'daemon' for option: syslog_facility
Apr 12 19:25:32 lHost cib: [24816]: info: G_main_add_SignalHandler: Added 
signal handler for signal 17
Apr 12 19:25:33 lHost crmd: [24820]: info: crmd_init: Starting crmd
Apr 12 19:25:33 lHost attrd: [24818]: info: crm_cluster_connect: Connecting to 
OpenAIS
Apr 12 19:25:33 lHost lrmd: [24817]: info: G_main_add_SignalHandler: Added 
signal handler for signal 17
Apr 12 19:25:33 lHost stonithd: [24815]: info: crm_cluster_connect: Connecting 
to OpenAIS
Apr 12 19:25:33 lHost corosync[24808]:   [pcmk  ] info: config_find_init: 
Local handle: 7739444317642555395 for service
Apr 12 19:25:33 lHost cib: [24816]: info: retrieveCib: Reading cluster 
configuration from: /usr/var/lib/heartbeat/crm/cib.xml (digest: 
/usr/var/lib/heartbeat/crm/cib.xml.sig)
Apr 12 19:25:33 lHost attrd: [24818]: info: init_ais_connection: Creating 
connection to our AIS plugin
Apr 12 19:25:33 lHost lrmd: [24817]: info: enabling coredumps
Apr 12 19:25:33 lHost corosync[24808]:   [pcmk  ] info: config_find_next: 
Processing additional service options...
Apr 12 19:25:33 lHost stonithd: [24815]: info: init_ais_connection: Creating 
connection to our AIS plugin
Apr 12 19:25:33 lHost attrd: [24818]: info: init_ais_connection: AIS 
connection established
Apr 12 19:25:33 lHost lrmd: [24817]: info: G_main_add_SignalHandler: Added 
signal handler for signal 10
Apr 12 19:25:33 lHost corosync[24808]:   [pcmk  ] info: get_config_opt: 
Defaulting to 'pcmk' for option: clustername
Apr 12 19:25:33 lHost stonithd: [24815]: info: init_ais_connection: AIS 
connection established
Apr 12 19:25:33 lHost pengine: [24819]: info: main: Starting pengine
Apr 12 19:25:33 lHost attrd: [24818]: info: get_ais_nodeid: Server details: 
id=2 uname=lHost cname=pcmk
Apr 12 19:25:33 lHost lrmd: [24817]: info: G_main_add_SignalHandler: Added 
signal handler for signal 12
Apr 12 19:25:33 lHost corosync[24808]:   [pcmk  ] info: get_config_opt: 
Defaulting to 'no' for option: use_logd
Apr 12 19:25:33 lHost lrmd: [24817]: info: Started.
Apr 12 19:25:33 lHost corosync[24808]:   [pcmk  ] info: get_config_opt: 
Defaulting to 'no' for option: use_mgmtd
Apr 12 19:25:33 lHost attrd: [24818]: info: crm_new_peer: Node lHost now has 
id: 2
Apr 12 19:25:33 lHost corosync[24808]:   [pcmk  ] info: pcmk_startup: CRM: 
Initialized
Apr 12 19:25:33 lHost stonithd: [24815]: info: get_ais_nodeid: Server details: 
id=2 uname=lHost cname=pcmk
Apr 12 19:25:33 lHost crmd: [24820]: info: G_main_add_SignalHandler: Added 
signal handler for signal 17
Apr 12 19:25:33 lHost attrd: [24818]: info: crm_new_peer: Node 2 is now known 
as lHost
Apr 12 19:25:33 lHost corosync[24808]:   [pcmk  ] Logging: Initialized 
pcmk_startup
Apr 12 19:25:33 lHost stonithd: [24815]: info: crm_new_peer: Node lHost  now 
has id: 2
Apr 12 19:25:33 lHost stonithd: [24815]: info: crm_new_peer: Node 2 is now 
known as lHost
Apr 12 19:25:33 lHost corosync[24808]:   [pcmk  ] info: pcmk_startup: Maximum 
core file size is: 18446744073709551615
Apr 12 19:25:33 lHost attrd: [24818]: info: main: Cluster connection active
Apr 12 19:25:33 lHost corosync[24808]:   [pcmk  ] info: pcmk_startup: Service: 
9
Apr 12 19:25:34 lHost attrd: [24818]: info: main: Accepting attribute updates
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: pcmk_startup: Local 
hostname: lHost
Apr 12 19:25:34 lHost attrd: [24818]: info: main: Starting mainloop...
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: pcmk_update_nodeid: 
Local node id: 2
Apr 12 19:25:34 lHost stonithd: [24815]: notice: /usr/lib64/heartbeat/stonithd 
start up successfully.
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: update_member: 
Creating entry for node 2 born on 0
Apr 12 19:25:34 lHost stonithd: [24815]: info: G_main_add_SignalHandler: Added 
signal handler for signal 17
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: update_member: 
0x666370 Node 2 now known as lHost  (was: (null))
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: update_member: Node 
lHost  now has 1 quorum votes (was 0)
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: update_member: Node 
2/lHost  is now: member
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: spawn_child: Forked 
child 24815 for process stonithd
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: spawn_child: Forked 
child 24816 for process cib
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: spawn_child: Forked 
child 24817 for process lrmd
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: spawn_child: Forked 
child 24818 for process attrd
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: spawn_child: Forked 
child 24819 for process pengine
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: spawn_child: Forked 
child 24820 for process crmd
Apr 12 19:25:34 lHost corosync[24808]:   [SERV  ] Service engine loaded: 
Pacemaker Cluster Manager 1.0.8
Apr 12 19:25:34 lHost corosync[24808]:   [SERV  ] Service engine loaded: 
corosync extended virtual synchrony service
Apr 12 19:25:34 lHost corosync[24808]:   [SERV  ] Service engine loaded: 
corosync configuration service
Apr 12 19:25:34 lHost corosync[24808]:   [SERV  ] Service engine loaded: 
corosync cluster closed process group service v1.01
Apr 12 19:25:34 lHost corosync[24808]:   [SERV  ] Service engine loaded: 
corosync cluster config database access v1.01
Apr 12 19:25:34 lHost corosync[24808]:   [SERV  ] Service engine loaded: 
corosync profile loading service
Apr 12 19:25:34 lHost corosync[24808]:   [SERV  ] Service engine loaded: 
corosync cluster quorum service v0.1
Apr 12 19:25:34 lHost corosync[24808]:   [MAIN  ] Compatibility mode set to 
none.  Using V2 of the synchronization engine.
Apr 12 19:25:34 lHost cib: [24816]: info: startCib: CIB Initialization 
completed successfully
Apr 12 19:25:34 lHost cib: [24816]: info: crm_cluster_connect: Connecting to 
OpenAIS
Apr 12 19:25:34 lHost cib: [24816]: info: init_ais_connection: Creating 
connection to our AIS plugin
Apr 12 19:25:34 lHost cib: [24816]: info: init_ais_connection: AIS connection 
established
Apr 12 19:25:34 lHost cib: [24816]: info: get_ais_nodeid: Server details: id=2 
uname=lHost  cname=pcmk
Apr 12 19:25:34 lHost cib: [24816]: info: crm_new_peer: Node lHost  now has 
id: 2
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] notice: pcmk_peer_update: 
Transitional membership event on ring 2712: memb=0, new=0, lost=0
Apr 12 19:25:34 lHost cib: [24816]: info: crm_new_peer: Node 2 is now known as 
lHost 
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] notice: pcmk_peer_update: 
Stable membership event on ring 2712: memb=1, new=1, lost=0
Apr 12 19:25:34 lHost corosync[24808]:   [pcmk  ] info: pcmk_peer_update: 
NEW:  lHost  2
Apr 12 19:25:34 lHost cib: [24816]: info: cib_init: Starting cib mainloop
Apr 12 19:25:35 lHost corosync[24808]:   [pcmk  ] info: pcmk_peer_update: 
MEMB: lHost  2
Apr 12 19:25:35 lHost corosync[24808]:   [pcmk  ] info: update_member: Node 
lHost  now has process list: 00000000000000000000000000013312 (78610)
Apr 12 19:25:35 lHost corosync[24808]:   [TOTEM ] A processor joined or left 
the membership and a new membership was formed.
Apr 12 19:25:35 lHost cib: [24816]: info: ais_dispatch: Membership 2712: 
quorum still lost
Apr 12 19:25:35 lHost cib: [24816]: info: crm_update_peer: Node lHost : id=2 
state=member (new) addr=r(0) ip(172.28.14.19)  (new) votes=1 (new) born=0 
seen=2712 proc=00000000000000000000000000013312 (new)
Apr 12 19:25:35 lHost cib: [24828]: info: write_cib_contents: Archived 
previous version as /usr/var/lib/heartbeat/crm/cib-56.raw
Apr 12 19:25:35 lHost cib: [24828]: info: write_cib_contents: Wrote version 
0.154.0 of the CIB to disk (digest: b73c9c6de21721151d6170a67d2b7b2a)
Apr 12 19:25:35 lHost cib: [24828]: info: retrieveCib: Reading cluster 
configuration from: /usr/var/lib/heartbeat/crm/cib.kRAzsp (digest: 
/usr/var/lib/heartbeat/crm/cib.EuYX6R)
Apr 12 19:25:35 lHost cib: [24816]: info: Managed write_cib_contents process 
24828 exited with return code 0.
Apr 12 19:25:35 lHost crmd: [24820]: info: do_cib_control: CIB connection 
established
Apr 12 19:25:35 lHost crmd: [24820]: info: crm_cluster_connect: Connecting to 
OpenAIS
Apr 12 19:25:35 lHost crmd: [24820]: info: init_ais_connection: Creating 
connection to our AIS plugin
Apr 12 19:25:35 lHost crmd: [24820]: info: init_ais_connection: AIS connection 
established
Apr 12 19:25:35 lHost crmd: [24820]: info: get_ais_nodeid: Server details: 
id=2 uname=lHost  cname=pcmk
Apr 12 19:25:35 lHost crmd: [24820]: info: crm_new_peer: Node lHost  now has 
id: 2
Apr 12 19:25:35 lHost corosync[24808]:   [MAIN  ] Completed service 
synchronization, ready to provide service.
Apr 12 19:25:35 lHost crmd: [24820]: info: crm_new_peer: Node 2 is now known 
as lHost 
Apr 12 19:25:35 lHost crmd: [24820]: info: do_ha_control: Connected to the 
cluster
Apr 12 19:25:35 lHost corosync[24808]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x6740d0 for attrd/24818
Apr 12 19:25:35 lHost corosync[24808]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x674b60 for stonithd/24815
Apr 12 19:25:35 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:35 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 1 (30 max) times
Apr 12 19:25:35 lHost corosync[24808]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x6769f0 for cib/24816
Apr 12 19:25:35 lHost corosync[24808]:   [pcmk  ] info: pcmk_ipc: Sending 
membership update 2712 to cib
Apr 12 19:25:35 lHost corosync[24808]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x677780 for crmd/24820
Apr 12 19:25:35 lHost corosync[24808]:   [pcmk  ] info: pcmk_ipc: Sending 
membership update 2712 to crmd
Apr 12 19:25:35 lHost crmd: [24820]: info: crmd_init: Starting crmd's mainloop
Apr 12 19:25:35 lHost crmd: [24820]: info: config_query_callback: Checking for 
expired actions every 900000ms
Apr 12 19:25:35 lHost crmd: [24820]: info: config_query_callback: Sending 
expected-votes=3 to corosync
Apr 12 19:25:35 lHost crmd: [24820]: info: ais_dispatch: Membership 2712: 
quorum still lost
Apr 12 19:25:35 lHost corosync[24808]:   [pcmk  ] info: update_expected_votes: 
Expected quorum votes 2 -> 3
Apr 12 19:25:35 lHost crmd: [24820]: info: crm_update_peer: Node lHost : id=2 
state=member (new) addr=r(0) ip(172.28.14.19)  (new) votes=1 (new) born=0 
seen=2712 proc=00000000000000000000000000013312 (new)
Apr 12 19:25:35 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:35 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 2 (30 max) times
Apr 12 19:25:36 lHost crmd: [24820]: info: ais_dispatch: Membership 2712: 
quorum still lost
Apr 12 19:25:36 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:36 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 3 (30 max) times
Apr 12 19:25:37 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:37 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:37 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 4 (30 max) times
Apr 12 19:25:38 lHost OpenSM[4938]: SM port is down
Apr 12 19:25:39 lHost attrd: [24818]: info: cib_connect: Connected to the CIB 
after 1 signon attempts
Apr 12 19:25:39 lHost attrd: [24818]: info: cib_connect: Sending full refresh
Apr 12 19:25:39 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:39 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:39 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 5 (30 max) times
Apr 12 19:25:41 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:41 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:41 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 6 (30 max) times
Apr 12 19:25:43 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:43 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:43 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 7 (30 max) times
Apr 12 19:25:45 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:45 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:45 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 8 (30 max) times
Apr 12 19:25:47 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:47 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:47 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 9 (30 max) times
Apr 12 19:25:48 lHost OpenSM[4938]: SM port is down
Apr 12 19:25:49 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:49 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:49 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 10 (30 max) times
Apr 12 19:25:51 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:51 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:51 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 11 (30 max) times
Apr 12 19:25:53 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:53 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:53 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 12 (30 max) times
Apr 12 19:25:55 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:55 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:55 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:55 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 13 (30 max) times
Apr 12 19:25:57 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:57 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:57 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 14 (30 max) times
Apr 12 19:25:58 lHost OpenSM[4938]: SM port is down
Apr 12 19:25:59 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:25:59 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:25:59 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 15 (30 max) times
Apr 12 19:26:01 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:01 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:01 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 16 (30 max) times
Apr 12 19:26:03 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:03 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:03 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 17 (30 max) times
Apr 12 19:26:05 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:05 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:05 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 18 (30 max) times
Apr 12 19:26:07 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:07 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:07 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 19 (30 max) times
Apr 12 19:26:08 lHost OpenSM[4938]: SM port is down
Apr 12 19:26:09 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:09 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:09 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 20 (30 max) times
Apr 12 19:26:11 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:11 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:11 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 21 (30 max) times
Apr 12 19:26:13 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:13 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:13 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 22 (30 max) times
Apr 12 19:26:15 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:15 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:15 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 23 (30 max) times
Apr 12 19:26:17 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:17 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:17 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 24 (30 max) times
Apr 12 19:26:18 lHost OpenSM[4938]: SM port is down
Apr 12 19:26:19 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:19 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:19 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 25 (30 max) times
Apr 12 19:26:21 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:21 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:21 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 26 (30 max) times
Apr 12 19:26:23 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:23 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:23 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 27 (30 max) times
Apr 12 19:26:25 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:25 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:25 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 28 (30 max) times
Apr 12 19:26:27 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:27 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:27 lHost crmd: [24820]: WARN: do_lrm_control: Failed to sign on 
to the LRM 29 (30 max) times
Apr 12 19:26:28 lHost OpenSM[4938]: SM port is down
Apr 12 19:26:29 lHost crmd: [24820]: info: crm_timer_popped: Wait Timer 
(I_NULL) just popped!
Apr 12 19:26:29 lHost crmd: [24820]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:29 lHost crmd: [24820]: ERROR: do_lrm_control: Failed to sign on 
to the LRM 30 (max) times
Apr 12 19:26:29 lHost crmd: [24820]: ERROR: do_log: FSA: Input I_ERROR from 
do_lrm_control() received in state S_STARTING
Apr 12 19:26:29 lHost crmd: [24820]: info: do_state_transition: State 
transition S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL 
origin=do_lrm_control ]
Apr 12 19:26:29 lHost crmd: [24820]: ERROR: do_recover: Action A_RECOVER 
(0000000001000000) not supported
Apr 12 19:26:29 lHost crmd: [24820]: ERROR: do_started: Start cancelled... 
S_RECOVERY
Apr 12 19:26:29 lHost crmd: [24820]: ERROR: do_log: FSA: Input I_TERMINATE 
from do_recover() received in state S_RECOVERY
Apr 12 19:26:29 lHost crmd: [24820]: info: do_state_transition: State 
transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL 
origin=do_recover ]
Apr 12 19:26:29 lHost crmd: [24820]: info: do_ha_control: Disconnected from 
OpenAIS
Apr 12 19:26:29 lHost crmd: [24820]: info: do_cib_control: Disconnecting CIB
Apr 12 19:26:29 lHost crmd: [24820]: info: crmd_cib_connection_destroy: 
Connection to the CIB terminated...
Apr 12 19:26:29 lHost crmd: [24820]: info: do_exit: Performing A_EXIT_0 - 
gracefully exiting the CRMd
Apr 12 19:26:29 lHost crmd: [24820]: ERROR: do_exit: Could not recover from 
internal error
Apr 12 19:26:29 lHost crmd: [24820]: info: free_mem: Dropping I_TERMINATE: [ 
state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
Apr 12 19:26:29 lHost crmd: [24820]: info: do_exit: [crmd] stopped (2)
Apr 12 19:26:30 lHost corosync[24808]:   [pcmk  ] info: pcmk_ipc_exit: Client 
crmd (conn=0x677780, async-conn=0x677780) left
Apr 12 19:26:30 lHost corosync[24808]:   [pcmk  ] ERROR: pcmk_wait_dispatch: 
Child process crmd exited (pid=24820, rc=2)
Apr 12 19:26:30 lHost corosync[24808]:   [pcmk  ] notice: pcmk_wait_dispatch: 
Respawning failed child process: crmd
Apr 12 19:26:30 lHost corosync[24808]:   [pcmk  ] info: spawn_child: Forked 
child 24918 for process crmd
Apr 12 19:26:30 lHost crmd: [24918]: info: Invoked: /usr/lib64/heartbeat/crmd
Apr 12 19:26:30 lHost crmd: [24918]: info: main: CRM Hg Version: 
13c87913dfe66cf7963552db095a918f4ebfbb2b
Apr 12 19:26:30 lHost crmd: [24918]: info: crmd_init: Starting crmd
Apr 12 19:26:30 lHost crmd: [24918]: info: G_main_add_SignalHandler: Added 
signal handler for signal 17
Apr 12 19:26:31 lHost crmd: [24918]: info: do_cib_control: CIB connection 
established
Apr 12 19:26:31 lHost crmd: [24918]: info: crm_cluster_connect: Connecting to 
OpenAIS
Apr 12 19:26:31 lHost crmd: [24918]: info: init_ais_connection: Creating 
connection to our AIS plugin
Apr 12 19:26:31 lHost crmd: [24918]: info: init_ais_connection: AIS connection 
established
Apr 12 19:26:31 lHost corosync[24808]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x677780 for crmd/24918
Apr 12 19:26:31 lHost crmd: [24918]: info: get_ais_nodeid: Server details: 
id=2 uname=lHost  cname=pcmk
Apr 12 19:26:31 lHost corosync[24808]:   [pcmk  ] info: pcmk_ipc: Sending 
membership update 2712 to crmd
Apr 12 19:26:31 lHost crmd: [24918]: info: crm_new_peer: Node lHost  now has 
id: 2
Apr 12 19:26:31 lHost crmd: [24918]: info: crm_new_peer: Node 2 is now known 
as lHost 
Apr 12 19:26:31 lHost crmd: [24918]: info: do_ha_control: Connected to the 
cluster
Apr 12 19:26:31 lHost crmd: [24918]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:31 lHost crmd: [24918]: WARN: do_lrm_control: Failed to sign on 
to the LRM 1 (30 max) times
Apr 12 19:26:31 lHost crmd: [24918]: info: crmd_init: Starting crmd's mainloop
Apr 12 19:26:31 lHost crmd: [24918]: info: ais_dispatch: Membership 2712: 
quorum still lost
Apr 12 19:26:31 lHost crmd: [24918]: info: crm_update_peer: Node lHost : id=2 
state=member (new) addr=r(0) ip(172.28.14.19)  (new) votes=1 (new) born=0 
seen=2712 proc=00000000000000000000000000013312 (new)
Apr 12 19:26:31 lHost crmd: [24918]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:31 lHost crmd: [24918]: WARN: do_lrm_control: Failed to sign on 
to the LRM 2 (30 max) times
Apr 12 19:26:31 lHost crmd: [24918]: info: config_query_callback: Checking for 
expired actions every 900000ms
Apr 12 19:26:31 lHost crmd: [24918]: info: config_query_callback: Sending 
expected-votes=3 to corosync
Apr 12 19:26:31 lHost crmd: [24918]: WARN: lrm_signon: can not initiate 
connection
Apr 12 19:26:31 lHost crmd: [24918]: WARN: do_lrm_control: Failed to sign on 
to the LRM 3 (30 max) times
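
Because the mail client wrapped each syslog record across several lines, a plain grep for the retry counter misses most of the warnings. The following is a rough sketch, not part of any Pacemaker tooling, that rejoins wrapped records and reports the highest LRM sign-on attempt seen per crmd PID. The names unwrap and max_signon_attempts are invented for this example, and the patterns are based only on the message format visible in the log above.

```python
import re
from collections import defaultdict

# Retry counter in "Failed to sign on to the LRM 3 (30 max) times" and in the
# final "Failed to sign on to the LRM 30 (max) times".
SIGNON_RE = re.compile(
    r"crmd: \[(\d+)\]: (?:WARN|ERROR): do_lrm_control: "
    r"Failed to sign on to the LRM (\d+) \("
)
# A new syslog record starts with a timestamp such as "Apr 12 19:25:35 ".
RECORD_START_RE = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")


def unwrap(lines):
    """Rejoin records that were wrapped across several lines.

    Any non-blank line that does not start with a timestamp is treated as a
    continuation of the previous record.
    """
    record = None
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START_RE.match(line):
            if record is not None:
                yield record
            record = line
        elif record is not None and line.strip():
            record = record.rstrip() + " " + line.strip()
    if record is not None:
        yield record


def max_signon_attempts(lines):
    """Return {crmd pid: highest LRM sign-on attempt number seen}."""
    attempts = defaultdict(int)
    for record in unwrap(lines):
        match = SIGNON_RE.search(record)
        if match:
            pid, count = int(match.group(1)), int(match.group(2))
            attempts[pid] = max(attempts[pid], count)
    return dict(attempts)
```

Passing an open log file to max_signon_attempts() gives one entry per crmd instance, which makes it easy to see how many times crmd was respawned after exhausting its 30 attempts.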



thanks
chajo




