[Pacemaker] pacemaker-1.0.6 + corosync 1.1.2 crashing
Nikola Ciprich
extmaillist at linuxbox.cz
Wed Nov 11 08:56:30 UTC 2009
Hi Steve,
I'm running a CentOS 5 based x86_64 system with a 2.6.31.6 kernel. SELinux is disabled,
the corosync libraries seem to be properly installed, and I've got a big enough /dev/shm
ramdisk. libc should be OK as well.
I just tried rebuilding all packages from scratch and the problem persists :(
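
For reference, the library check Steve suggests below can be scripted. This is only a minimal sketch under some assumptions: the names carry the "lib" prefix seen in the backtrace (libcoroipcc.so.4), the directory is /usr/lib64 as in that same backtrace, and resolve_names, all_same_file and report are hypothetical helper names, not part of corosync.

```python
import os

# Assumptions: directory and "lib" prefix are taken from the backtrace
# (/usr/lib64/libcoroipcc.so.4); adjust both for other installs.
DEFAULT_LIBDIR = "/usr/lib64"
DEFAULT_NAMES = ("libcoroipcc.so", "libcoroipcc.so.4", "libcoroipcc.so.4.0.0")


def resolve_names(libdir, names):
    """Map each library name to the real file it points at, or None if missing."""
    resolved = {}
    for name in names:
        path = os.path.join(libdir, name)
        # realpath follows the whole symlink chain down to the final file.
        resolved[name] = os.path.realpath(path) if os.path.exists(path) else None
    return resolved


def all_same_file(resolved):
    """True only if every name exists and all of them resolve to one real file."""
    targets = set(resolved.values())
    return None not in targets and len(targets) == 1


def report(libdir=DEFAULT_LIBDIR, names=DEFAULT_NAMES):
    """Print where each name resolves and return the overall verdict."""
    resolved = resolve_names(libdir, names)
    for name, target in resolved.items():
        print(f"{name} -> {target}")
    return all_same_file(resolved)
```

Calling report() prints where each name ends up and returns False if a name is missing or if the names resolve to more than one file. It only inspects one directory, so a duplicate copy elsewhere on the loader path (for example under /usr/local/lib64) would still need a separate look.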
regards
nik
On Tue, Nov 10, 2009 at 05:21:32PM -0700, Steven Dake wrote:
> One possibility is that selinux is enabled and your selinux policies are
> outdated.
>
> Another possibility is you have improper coroipcc libraries (duplicates)
> installed on your system.
>
> Check your installed lib dir for coroipcc.so.4 and 4.0.0 and
> coroipcc.so. They should all link to the same file.
>
> Another possibility is that you're compiling on a libc which does not
> support posix semaphores.
>
> Could you explain more of your platform?
>
> regards
> -steve
>
> On Tue, 2009-11-10 at 21:48 -0200, Mark Horton wrote:
> > Nikola,
> > Sorry, I don't have a solution, but I'm curious about your setup.
> > Which version of DLM are you using? Did you have to compile it
> > yourself?
> >
> > Regards,
> > Mark
> >
> > On Tue, Nov 10, 2009 at 7:28 AM, Nikola Ciprich <extmaillist at linuxbox.cz> wrote:
> > > Hello Andrew et al,
> > > a few days ago, I asked about pacemaker + corosync + clvmd etc. With your advice, I got this working well.
> > > That was in testing virtual machines. I'm now trying to install a similar setup on raw hardware, but for some
> > > reason attrd and cib seem to be crashing.
> > >
> > > here's snippet from corosync log:
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync built-in features: nss rdma
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transport (UDP/IP).
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [pcmk ] info: process_ais_conf: Reading configure
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync built-in features: nss rdma
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transport (UDP/IP).
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [pcmk ] info: process_ais_conf: Reading configure
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Corosync built-in features: nss rdma
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] Initializing transport (UDP/IP).
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [pcmk ] info: process_ais_conf: Reading configure
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Corosync built-in features: nss rdma
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [TOTEM ] Initializing transport (UDP/IP).
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: process_ais_conf: Reading configure
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_init: Local handle: 9213452461992312833 for logging
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_next: Processing additional logging options...
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Found 'off' for option: debug
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'off' for option: to_file
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_init: Local handle: 2013064636357672962 for service
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_next: Processing additional service options...
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Found 'no' for option: use_mgmtd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: CRM: Initialized
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] Logging: Initialized pcmk_startup
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Service: 9
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Local hostname: vbox3
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_update_nodeid: Local node id: 16792074
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Creating entry for node 16792074 born on 0
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: 0x260ee10 Node 16792074 now known as vbox3 (was: (null))
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node vbox3 now has 1 quorum votes (was 0)
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node 16792074/vbox3 is now: member
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4384 for process stonithd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4385 for process cib
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4386 for process lrmd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4387 for process attrd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4388 for process pengine
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4389 for process crmd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.0.6
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync configuration service
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync profile loading service
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 4: memb=0, new=0, lost=0
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: Stack hogger failed 0xffffffff
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 4: memb=1, new=1, lost=0
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_peer_update: NEW: vbox3 16792074
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_peer_update: MEMB: vbox3 16792074
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node vbox3 now has process list: 00000000000000000000000000013312 (78610)
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [MAIN ] Completed service synchronization, ready to provide service.
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: Invoked: /usr/lib64/heartbeat/cib
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_TriggerHandler: Added signal manual handler
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: retrieveCib: Reading c
> > > Nov 10 14:13:58 vbox3 cib: [4385]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> > > Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> > > Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Continuing with an empty configuration.
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: startCib: CIB Initialization completed successfully
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:13:58 vbox3 crmd: [4389]: info: Invoked: /usr/lib64/heartbeat/crmd
> > > Nov 10 14:13:58 vbox3 crmd: [4389]: info: main: CRM Hg Version: cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
> > > Nov 10 14:13:58 vbox3 crmd: [4389]: info: crmd_init: Starting crmd
> > > Nov 10 14:13:58 vbox3 crmd: [4389]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:58 vbox3 pengine: [4388]: info: Invoked: /usr/lib64/heartbeat/pengine
> > > Nov 10 14:13:58 vbox3 pengine: [4388]: info: main: Starting pengine
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 15
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: Started.
> > > Nov 10 14:13:58 vbox3 attrd: [4387]: info: Invoked: /usr/lib64/heartbeat/attrd
> > > Nov 10 14:13:58 vbox3 attrd: [4387]: info: main: Starting up
> > > Nov 10 14:13:58 vbox3 attrd: [4387]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:58 vbox3 attrd: [4387]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: AIS connection established
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_ipc: Recorded connection 0x2615120 for stonithd/4384
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: get_ais_nodeid: Server details: id=16792074 uname=vbox3
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node vbox3 now has id: 16792074
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node 16792074 is now known as vbox3
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: notice: /usr/lib64/heartbeat/stonithd start up successfully.
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4385, core=false)
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4391 for process cib
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4387, core=false)
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4392 for process attrd
> > > Nov 10 14:13:59 vbox3 crmd: [4389]: info: do_cib_control: Could not connect to the CIB service: connection failed
> > > Nov 10 14:13:59 vbox3 crmd: [4389]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> > > Nov 10 14:13:59 vbox3 crmd: [4389]: info: crmd_init: Starting crmd's mainloop
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: Invoked: /usr/lib64/heartbeat/cib
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_TriggerHandler: Added signal manual handler
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/
> > > Nov 10 14:13:59 vbox3 cib: [4391]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> > > Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> > > Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Continuing with an empty configuration.
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: startCib: CIB Initialization completed successfully
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:13:59 vbox3 attrd: [4392]: info: Invoked: /usr/lib64/heartbeat/attrd
> > > Nov 10 14:13:59 vbox3 attrd: [4392]: info: main: Starting up
> > > Nov 10 14:13:59 vbox3 attrd: [4392]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:59 vbox3 attrd: [4392]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4391, core=false)
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4393 for process cib
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4392, core=false)
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4394 for process attrd
> > > and the last few lines then keep repeating...
> > >
> > > here are the gdb backtraces obtained from the core files:
> > > cib:
> > > #0 0x00007f9f07218f48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> > > #1 0x00007f9f0949bf06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
> > > #2 0x00007f9f096a5c37 in init_ais_connection (dispatch=0x40d516 <cib_ais_dispatch>, destroy=0x40d658 <cib_ais_destroy>, our_uuid=0x0,
> > > our_uname=0x616f28, nodeid=0x0) at ais.c:588
> > > #3 0x00007f9f096a1576 in crm_cluster_connect (our_uname=0x616f28, our_uuid=0x0, dispatch=0x40d516, destroy=0x40d658, hb_conn=0x0)
> > > at cluster.c:56
> > > #4 0x000000000040d753 in cib_init () at main.c:424
> > > #5 0x000000000040d08e in main (argc=1, argv=0x7fff9ec48f98) at main.c:218
> > >
> > >
> > > attrd:
> > > #0 0x00007f194ea0cf48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> > > #1 0x00007f1950c8ff06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
> > > #2 0x00007f1950e99c37 in init_ais_connection (dispatch=0x402891 <attrd_ais_dispatch>, destroy=0x402af3 <attrd_ais_destroy>,
> > > our_uuid=0x605918, our_uname=0x605910, nodeid=0x0) at ais.c:588
> > > #3 0x00007f1950e95576 in crm_cluster_connect (our_uname=0x605910, our_uuid=0x605918, dispatch=0x402891, destroy=0x402af3, hb_conn=0x0)
> > > at cluster.c:56
> > > #4 0x0000000000403185 in main (argc=1, argv=0x7fffd3548b38) at attrd.c:569
> > >
> > > Unfortunately I'm not 100% sure that all the packages I installed on those machines were compiled the same way, as I
> > > deleted the old (testing) packages. But the versions are the same.
> > > Any idea where I should look for a possible culprit?
> > > thanks a lot for any reply!
> > > with best regards
> > > nik
> > >
> > >
> > > --
> > > -------------------------------------
> > > Nikola CIPRICH
> > > LinuxBox.cz, s.r.o.
> > > 28. rijna 168, 709 01 Ostrava
> > >
> > > tel.: +420 596 603 142
> > > fax: +420 596 621 273
> > > mobil: +420 777 093 799
> > > www.linuxbox.cz
> > >
> > > mobil servis: +420 737 238 656
> > > email servis: servis at linuxbox.cz
> > > -------------------------------------
> > >
> > > _______________________________________________
> > > Pacemaker mailing list
> > > Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> >
>
>
>
--
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava
tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz
mobil servis: +420 737 238 656
email servis: servis at linuxbox.cz
-------------------------------------