[Pacemaker] pacemaker-1.0.6 + corosync 1.1.2 crashing
Steven Dake
sdake at redhat.com
Wed Nov 11 00:27:18 UTC 2009
Nikola,
yet another possibility is your box doesn't have any/enough shared
memory available. Usually this is in the directory /dev/shm.
Unfortunately bad things happen and error handling around this condition
needs some work. Its hard to tell because the signal delivered to the
application on failure is not shown in your backtrace.
For example I have plenty of shared memory available (command is from
df).
tmpfs 1027020 3560 1023460 1% /dev/shm
Regards
-steve
On Tue, 2009-11-10 at 10:28 +0100, Nikola Ciprich wrote:
> Hello Andrew et al,
> few days ago, I asked about pacemaker + corosync + clvmd etc. With Your advice, I got this working well.
> It was in testing virtual machines, I'm now trying to install similar setup on raw hardware but for some
> reasong attrd and cib seem to be crashing.
>
> here's snippet from corosync log:
> Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync built-in features: nss rdma
> Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transport (UDP/IP).
> Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] The network interface [10.58.0.1] is now up.
> Nov 10 14:12:21 vbox3 corosync[4299]: [pcmk ] info: process_ais_conf: Reading configure
> Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync built-in features: nss rdma
> Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transport (UDP/IP).
> Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] The network interface [10.58.0.1] is now up.
> Nov 10 14:13:16 vbox3 corosync[4348]: [pcmk ] info: process_ais_conf: Reading configure
> Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Corosync built-in features: nss rdma
> Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] Initializing transport (UDP/IP).
> Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] The network interface [10.58.0.1] is now up.
> Nov 10 14:13:24 vbox3 corosync[4357]: [pcmk ] info: process_ais_conf: Reading configure
> Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Corosync built-in features: nss rdma
> Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Nov 10 14:13:57 vbox3 corosync[4380]: [TOTEM ] Initializing transport (UDP/IP).
> Nov 10 14:13:57 vbox3 corosync[4380]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Nov 10 14:13:58 vbox3 corosync[4380]: [TOTEM ] The network interface [10.58.0.1] is now up.
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: process_ais_conf: Reading configure
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_init: Local handle: 9213452461992312833 for logging
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_next: Processing additional logging options...
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Found 'off' for option: debug
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'off' for option: to_file
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_init: Local handle: 2013064636357672962 for service
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_next: Processing additional service options...
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Found 'no' for option: use_mgmtd
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: CRM: Initialized
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] Logging: Initialized pcmk_startup
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Service: 9
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Local hostname: vbox3
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_update_nodeid: Local node id: 16792074
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Creating entry for node 16792074 born on 0
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: 0x260ee10 Node 16792074 now known as vbox3 (was: (null))
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node vbox3 now has 1 quorum votes (was 0)
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node 16792074/vbox3 is now: member
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4384 for process stonithd
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4385 for process cib
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4386 for process lrmd
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4387 for process attrd
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4388 for process pengine
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4389 for process crmd
> Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.0.6
> Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
> Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync configuration service
> Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
> Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
> Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync profile loading service
> Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 4: memb=0, new=0, lost=0
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: Stack hogger failed 0xffffffff
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 4: memb=1, new=1, lost=0
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_peer_update: NEW: vbox3 16792074
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_peer_update: MEMB: vbox3 16792074
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node vbox3 now has process list: 00000000000000000000000000013312 (78610)
> Nov 10 14:13:58 vbox3 corosync[4380]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Nov 10 14:13:58 vbox3 corosync[4380]: [MAIN ] Completed service synchronization, ready to provide service.
> Nov 10 14:13:58 vbox3 cib: [4385]: info: Invoked: /usr/lib64/heartbeat/cib
> Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_TriggerHandler: Added signal manual handler
> Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:58 vbox3 cib: [4385]: info: retrieveCib: Reading c
> Nov 10 14:13:58 vbox3 cib: [4385]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Continuing with an empty configuration.
> Nov 10 14:13:58 vbox3 cib: [4385]: info: startCib: CIB Initialization completed successfully
> Nov 10 14:13:58 vbox3 cib: [4385]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:58 vbox3 cib: [4385]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:13:58 vbox3 crmd: [4389]: info: Invoked: /usr/lib64/heartbeat/crmd
> Nov 10 14:13:58 vbox3 crmd: [4389]: info: main: CRM Hg Version: cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
> Nov 10 14:13:58 vbox3 crmd: [4389]: info: crmd_init: Starting crmd
> Nov 10 14:13:58 vbox3 crmd: [4389]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:58 vbox3 pengine: [4388]: info: Invoked: /usr/lib64/heartbeat/pengine
> Nov 10 14:13:58 vbox3 pengine: [4388]: info: main: Starting pengine
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 15
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: Started.
> Nov 10 14:13:58 vbox3 attrd: [4387]: info: Invoked: /usr/lib64/heartbeat/attrd
> Nov 10 14:13:58 vbox3 attrd: [4387]: info: main: Starting up
> Nov 10 14:13:58 vbox3 attrd: [4387]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:58 vbox3 attrd: [4387]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: AIS connection established
> Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_ipc: Recorded connection 0x2615120 for stonithd/4384
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: get_ais_nodeid: Server details: id=16792074 uname=vbox3
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node vbox3 now has id: 16792074
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node 16792074 is now known as vbox3
> Nov 10 14:13:58 vbox3 stonithd: [4384]: notice: /usr/lib64/heartbeat/stonithd start up successfully.
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4385, core=false)
> Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4391 for process cib
> Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4387, core=false)
> Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
> Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4392 for process attrd
> Nov 10 14:13:59 vbox3 crmd: [4389]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Nov 10 14:13:59 vbox3 crmd: [4389]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> Nov 10 14:13:59 vbox3 crmd: [4389]: info: crmd_init: Starting crmd's mainloop
> Nov 10 14:13:59 vbox3 cib: [4391]: info: Invoked: /usr/lib64/heartbeat/cib
> Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_TriggerHandler: Added signal manual handler
> Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:59 vbox3 cib: [4391]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/
> Nov 10 14:13:59 vbox3 cib: [4391]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Continuing with an empty configuration.
> Nov 10 14:13:59 vbox3 cib: [4391]: info: startCib: CIB Initialization completed successfully
> Nov 10 14:13:59 vbox3 cib: [4391]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:59 vbox3 cib: [4391]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:13:59 vbox3 attrd: [4392]: info: Invoked: /usr/lib64/heartbeat/attrd
> Nov 10 14:13:59 vbox3 attrd: [4392]: info: main: Starting up
> Nov 10 14:13:59 vbox3 attrd: [4392]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:59 vbox3 attrd: [4392]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4391, core=false)
> Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4393 for process cib
> Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4392, core=false)
> Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
> Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4394 for process attrd
> and last few lines then keep repeating...
>
> here's gdb backtrace obtained from core files:
> cib:
> #0 0x00007f9f07218f48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> #1 0x00007f9f0949bf06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
> #2 0x00007f9f096a5c37 in init_ais_connection (dispatch=0x40d516 <cib_ais_dispatch>, destroy=0x40d658 <cib_ais_destroy>, our_uuid=0x0,
> our_uname=0x616f28, nodeid=0x0) at ais.c:588
> #3 0x00007f9f096a1576 in crm_cluster_connect (our_uname=0x616f28, our_uuid=0x0, dispatch=0x40d516, destroy=0x40d658, hb_conn=0x0)
> at cluster.c:56
> #4 0x000000000040d753 in cib_init () at main.c:424
> #5 0x000000000040d08e in main (argc=1, argv=0x7fff9ec48f98) at main.c:218
>
>
> attrd:
> #0 0x00007f194ea0cf48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> #1 0x00007f1950c8ff06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
> #2 0x00007f1950e99c37 in init_ais_connection (dispatch=0x402891 <attrd_ais_dispatch>, destroy=0x402af3 <attrd_ais_destroy>,
> our_uuid=0x605918, our_uname=0x605910, nodeid=0x0) at ais.c:588
> #3 0x00007f1950e95576 in crm_cluster_connect (our_uname=0x605910, our_uuid=0x605918, dispatch=0x402891, destroy=0x402af3, hb_conn=0x0)
> at cluster.c:56
> #4 0x0000000000403185 in main (argc=1, argv=0x7fffd3548b38) at attrd.c:569
>
> Unfortunately I'm not 100% sure that all the packages I installed on those machines are compiled the same way, as I
> deleted old (testing) packages. But the versions are the same.
> Any idea where I should look for possible culprit?
> thanks a lot for reply!
> with best regards
> nik
>
>
> --
> -------------------------------------
> Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28. rijna 168, 709 01 Ostrava
>
> tel.: +420 596 603 142
> fax: +420 596 621 273
> mobil: +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: servis at linuxbox.cz
> -------------------------------------
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
More information about the Pacemaker
mailing list