[Pacemaker] corosync pacemaker exit after some time

Thu Jan 31 22:34:16 EST 2013

On Wed, Jan 23, 2013 at 12:53 AM, E-Blokos <infos at e-blokos.com> wrote:
>
> Hi,
>
> On Fedora 17 with corosync and pacemaker (version 1.1.7, from the Fedora
> updates), corosync and pacemaker exit on all nodes after a while.
>
> [root@node140 ~]# systemctl status corosync
> corosync.service - Corosync Cluster Engine
>           Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled)
>           Active: failed (Result: exit-code) since Tue, 22 Jan 2013 08:42:47 -0500; 5min ago
>          Process: 13152 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
>          Process: 26754 ExecStart=/usr/share/corosync/corosync start (code=exited, status=1/FAILURE)
>         Main PID: 1442 (code=dumped, signal=BUS)

It looks like corosync crashed (the main PID dumped core on SIGBUS) and
took pacemaker with it.
It's hard to say why without a backtrace :(
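
As a rough illustration of what the status output above already tells us,
here is a minimal Python sketch that pulls the main PID's exit code and
signal out of `systemctl status` text. The names `parse_main_pid` and
`crashed_on_signal` and the regular expression are my own assumptions,
written against the single "Main PID" line format quoted above; none of
this is part of corosync, pacemaker, or systemd.

```python
import re

# Matches a line such as: "Main PID: 1442 (code=dumped, signal=BUS)"
# (pattern is an assumption based on the one line quoted in this thread)
MAIN_PID_RE = re.compile(
    r"Main PID:\s*(\d+)\s*\(code=(\w+),\s*(?:signal|status)=([^)]+)\)"
)


def parse_main_pid(status_text):
    """Return (pid, code, detail) from systemctl status text, or None."""
    for line in status_text.splitlines():
        # Drop mail quoting ("> ") and indentation before matching.
        match = MAIN_PID_RE.search(line.lstrip("> "))
        if match:
            return int(match.group(1)), match.group(2), match.group(3)
    return None


def crashed_on_signal(status_text):
    """True if the main process was terminated by a signal."""
    parsed = parse_main_pid(status_text)
    return parsed is not None and parsed[1] in ("dumped", "killed")


sample = ">         Main PID: 1442 (code=dumped, signal=BUS)"
print(parse_main_pid(sample))
```

Here `code=dumped` together with `signal=BUS` means the process was
terminated by SIGBUS and dumped core; that core file is where a backtrace
would come from.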

>           CGroup: name=systemd:/system/corosync.service
>
> Jan 22 08:42:47 node140 corosync[26754]: [62B blob data]
> Jan 22 08:42:47 node140 corosync[26761]: [SERV  ] Unloading all Corosync service engines.
> Jan 22 08:42:47 node140 corosync[26761]: [QB    ] withdrawing server sockets
> Jan 22 08:42:47 node140 corosync[26761]: [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
> Jan 22 08:42:47 node140 corosync[26761]: [QB    ] withdrawing server sockets
> Jan 22 08:42:47 node140 corosync[26761]: [SERV  ] Service engine unloaded: corosync configuration map access
> Jan 22 08:42:47 node140 corosync[26761]: [QB    ] withdrawing server sockets
> Jan 22 08:42:47 node140 corosync[26761]: [SERV  ] Service engine unloaded: corosync configuration service
> Jan 22 08:42:47 node140 corosync[26761]: [QB    ] withdrawing server sockets
> Jan 22 08:42:47 node140 corosync[26761]: [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
>
> In the log:
>
> Jan 22 08:42:23 node140 pacemakerd[26540]:     info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
> Jan 22 08:42:23 node140 pacemakerd[26540]: Could not initialize Cluster Configuration Database API instance, error 2
> Jan 22 08:42:23 node140 systemd[1]: pacemaker.service: main process exited, code=exited, status=1
> Jan 22 08:42:23 node140 systemd[1]: Unit pacemaker.service entered failed state.
> Jan 22 08:42:23 node140 systemd[1]: pacemaker.service holdoff time over, scheduling restart.
>
> Is this a permission problem? If so, should cores/root be owned by
> something other than hacluster.root?
>
> Thanks
>
> Franck