[Pacemaker] corosync/openais fails to start
Gianluca Cecchi
gianluca.cecchi at gmail.com
Thu May 27 17:20:38 UTC 2010
On Thu, May 27, 2010 at 5:50 PM, Steven Dake <sdake at redhat.com> wrote:
> On 05/27/2010 08:40 AM, Diego Remolina wrote:
>
>> Is there any workaround for this? Perhaps a slightly older version of
>> the rpms? If so where do I find those?
>>
>>
> Corosync 1.2.1 apparently doesn't have this issue. With corosync 1.2.1,
> please don't use the "debug: on" keyword in your config options. I am not
> sure where Andrew has corosync 1.2.1 rpms available.
>
> The corosync project itself doesn't release rpms. See our policy on this
> topic:
>
> http://www.corosync.org/doku.php?id=faq:release_binaries
>
> Regards
> -steve
>
>
>
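Regarding the "debug: on" trigger Steve mentions: it is a keyword in the
logging section of /etc/corosync/corosync.conf, so the workaround on 1.2.1
is simply to keep it off. A minimal sketch of that stanza (the other values
are just common defaults, not necessarily what you have):

    logging {
            to_syslog: yes
            debug: off      # "on" reportedly triggers the failure on 1.2.1
            timestamp: on
    }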
In my case, using pacemaker/corosync from the clusterlabs repo on RHEL 5.5
32-bit, I had:
- both nodes ha1 and ha2 with
[root at ha1 ~]# rpm -qa corosync\* pacemaker\*
pacemaker-1.0.8-6.el5
corosynclib-1.2.1-1.el5
corosync-1.2.1-1.el5
pacemaker-libs-1.0.8-6.el5
- stop of corosync on node ha1
- update (applying the packages the clusterlabs repo proposed; pacemaker
keeps the same version... dunno if the same bits). This takes corosync to
1.2.2
- start of corosync on ha1 and successful join with the node still running
corosync 1.2.1 (a command sketch of these per-node steps follows the log
excerpt below)
May 27 18:59:23 ha1 corosync[5136]: [MAIN ] Corosync Cluster Engine
exiting with status -1 at main.c:160.
May 27 19:06:19 ha1 yum: Updated: corosynclib-1.2.2-1.1.el5.i386
May 27 19:06:19 ha1 yum: Updated: pacemaker-libs-1.0.8-6.1.el5.i386
May 27 19:06:19 ha1 yum: Updated: corosync-1.2.2-1.1.el5.i386
May 27 19:06:20 ha1 yum: Updated: pacemaker-1.0.8-6.1.el5.i386
May 27 19:06:20 ha1 yum: Updated: corosynclib-devel-1.2.2-1.1.el5.i386
May 27 19:06:22 ha1 yum: Updated: pacemaker-libs-devel-1.0.8-6.1.el5.i386
May 27 19:06:59 ha1 corosync[7442]: [MAIN ] Corosync Cluster Engine
('1.2.2'): started and ready to provide service.
May 27 19:06:59 ha1 corosync[7442]: [MAIN ] Corosync built-in features:
nss rdma
May 27 19:06:59 ha1 corosync[7442]: [MAIN ] Successfully read main
configuration file '/etc/corosync/corosync.conf'.
May 27 19:06:59 ha1 corosync[7442]: [TOTEM ] Initializing transport
(UDP/IP).
May 27 19:06:59 ha1 corosync[7442]: [TOTEM ] Initializing transmit/receive
security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
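For reference, the per-node sequence boils down to something like this
(just a sketch; init script and package names as on my RHEL 5.5 boxes,
adjust to your setup):

    # on the node being upgraded (ha1 first, then ha2);
    # stopping corosync lets its resources fail over to the peer
    /etc/init.d/corosync stop
    yum update corosync corosynclib pacemaker pacemaker-libs
    # after the restart the node rejoins the still-1.2.1 cluster
    /etc/init.d/corosync start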
The rejoin also implies the start of resources on ha1 (nfsclient and apache
in my case).
- move of resources from ha2 to the updated node ha1 (nfs-group in my
case), then an unmove so they can be taken again later; see the command
sketch after the status output below:
Resource Group: nfs-group
     lv_drbd0   (ocf::heartbeat:LVM):        Started ha1
     ClusterIP  (ocf::heartbeat:IPaddr2):    Started ha1
     NfsFS      (ocf::heartbeat:Filesystem): Started ha1
     nfssrv     (ocf::heartbeat:nfsserver):  Started ha1
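For the move/unmove step I used the crm shell along these lines (a sketch;
nfs-group is the name of my resource group):

    # push the group to the freshly updated node; this adds a location
    # constraint preferring ha1
    crm resource move nfs-group ha1
    # once it is running on ha1, clear that constraint again so the
    # group is free to fail back to ha2 later
    crm resource unmove nfs-group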
- stop of corosync 1.2.1 on ha2
- update of pacemaker and corosync on ha2
- start of corosync on ha2 and a correct join to the cluster, with the start
of its resources (nfsclient and apache in my case)
May 27 19:14:42 ha2 corosync[30954]: [pcmk ] notice: pcmk_shutdown: cib
confirmed stopped
May 27 19:14:42 ha2 corosync[30954]: [pcmk ] notice: stop_child: Sent -15
to stonithd: [30961]
May 27 19:14:42 ha2 stonithd: [30961]: notice: /usr/lib/heartbeat/stonithd
normally quit.
May 27 19:14:42 ha2 corosync[30954]: [pcmk ] info: pcmk_ipc_exit: Client
stonithd (conn=0x82aee48, async-conn=0x82aee48) left
May 27 19:14:43 ha2 corosync[30954]: [pcmk ] notice: pcmk_shutdown:
stonithd confirmed stopped
May 27 19:14:43 ha2 corosync[30954]: [pcmk ] info: update_member: Node
ha2 now has process list: 00000000000000000000000000000002 (2)
May 27 19:14:43 ha2 corosync[30954]: [pcmk ] notice: pcmk_shutdown:
Shutdown complete
May 27 19:14:43 ha2 corosync[30954]: [SERV ] Service engine unloaded:
Pacemaker Cluster Manager 1.0.8
May 27 19:14:43 ha2 corosync[30954]: [SERV ] Service engine unloaded:
corosync extended virtual synchrony service
May 27 19:14:43 ha2 corosync[30954]: [SERV ] Service engine unloaded:
corosync configuration service
May 27 19:14:43 ha2 corosync[30954]: [SERV ] Service engine unloaded:
corosync cluster closed process group service v1.01
May 27 19:14:43 ha2 corosync[30954]: [SERV ] Service engine unloaded:
corosync cluster config database access v1.01
May 27 19:14:43 ha2 corosync[30954]: [SERV ] Service engine unloaded:
corosync profile loading service
May 27 19:14:43 ha2 corosync[30954]: [SERV ] Service engine unloaded:
corosync cluster quorum service v0.1
May 27 19:14:43 ha2 corosync[30954]: [MAIN ] Corosync Cluster Engine
exiting with status -1 at main.c:160.
May 27 19:15:51 ha2 yum: Updated: corosynclib-1.2.2-1.1.el5.i386
May 27 19:15:51 ha2 yum: Updated: pacemaker-libs-1.0.8-6.1.el5.i386
May 27 19:15:52 ha2 yum: Updated: corosync-1.2.2-1.1.el5.i386
May 27 19:15:52 ha2 yum: Updated: pacemaker-1.0.8-6.1.el5.i386
May 27 19:17:00 ha2 corosync[3430]: [MAIN ] Corosync Cluster Engine
('1.2.2'): started and ready to provide service.
May 27 19:17:00 ha2 corosync[3430]: [MAIN ] Corosync built-in features:
nss rdma
May 27 19:17:00 ha2 corosync[3430]: [MAIN ] Successfully read main
configuration file '/etc/corosync/corosync.conf'.
May 27 19:17:00 ha2 corosync[3430]: [TOTEM ] Initializing transport
(UDP/IP).
May 27 19:17:00 ha2 corosync[3430]: [TOTEM ] Initializing transmit/receive
security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
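After each restart, a one-shot status check confirms that both nodes are
online and all resources started (not shown in my logs above, but worth
doing):

    crm_mon -1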
So in my case the software upgrade was successful, with no downtime.
Gianluca