[Pacemaker] Problems with Pacemaker + Corosync after reboot
Daniel Bareiro
daniel-listas at gmx.net
Thu Dec 23 23:12:25 UTC 2010
On Wednesday, 22 December 2010 08:29:02 -0500,
Shravan Mishra wrote:
> Hi,
Hi, Shravan.
> What's happening is that corosync is forking but the exec is not
> happening.
Do you think that what the logs show is consistent with what ps
shows?
> I used to see this problem in my case when syslog-ng process was not
> running.
>
> Try checking that and starting it and then start corosync.
Now I see that if I shut down the node that holds the resource
(failover-ip), the resource does not migrate to the other node. Before
running the test I made sure that Pacemaker + Corosync were working
correctly on both nodes.
Before shutting down Atlantis:
-----------------------------------------------------------------------
daedalus:~# crm_mon --one-shot
============
Last updated: Thu Dec 23 19:24:09 2010
Stack: openais
Current DC: atlantis - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ atlantis daedalus ]
failover-ip (ocf::heartbeat:IPaddr): Started atlantis
-----------------------------------------------------------------------
After shutting down Atlantis:
-----------------------------------------------------------------------
daedalus:~# crm_mon --one-shot
============
Last updated: Thu Dec 23 19:25:44 2010
Stack: openais
Current DC: daedalus - partition WITHOUT quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ daedalus ]
OFFLINE: [ atlantis ]
-----------------------------------------------------------------------
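The two fields that change between these outputs are the quorum state on the "Current DC" line and the "Started" line for failover-ip. A minimal sketch for pulling both out of the crm_mon text from a script follows. It assumes the output format shown above, and the function names quorum_state and resource_node are made up for illustration; they are not part of Pacemaker.

```shell
# Hypothetical helper: reads `crm_mon --one-shot` output on stdin and prints
# "with", "without", or "unknown" based on the "Current DC" line.
quorum_state() {
    awk '
        /^Current DC:/ {
            if ($0 ~ /partition WITHOUT quorum/)        { state = "without" }
            else if ($0 ~ /partition with quorum/)      { state = "with" }
        }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Hypothetical helper: reads the same output on stdin and prints the node on
# which the named resource is reported as "Started", or nothing if the
# resource is not listed as started.
resource_node() {
    awk -v name="$1" '
        $1 == name && $(NF - 1) == "Started" { print $NF }
    '
}
```

Possible usage on a live node: `crm_mon --one-shot | quorum_state` and `crm_mon --one-shot | resource_node failover-ip`. For the second output above, this would report "without" and print no node for failover-ip.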
Here I'm using a configuration like the one presented in the wiki [1].
I also notice that after Atlantis boots, corosync forks without doing
the exec (as we assumed from what I showed in the previous mail), and
only at that point does the resource migrate to Daedalus:
-----------------------------------------------------------------------
daedalus:~# crm_mon --one-shot
============
Last updated: Thu Dec 23 19:49:11 2010
Stack: openais
Current DC: daedalus - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ daedalus ]
OFFLINE: [ atlantis ]
failover-ip (ocf::heartbeat:IPaddr): Started daedalus
-----------------------------------------------------------------------
-----------------------------------------------------------------------
atlantis:~# crm_mon --one-shot
Connection to cluster failed: connection failed
-----------------------------------------------------------------------
I tried a "corosync stop", but the processes do not exit:
atlantis:~# ps auxf
[...]
root 1564 0.0 1.2 168144 3240 ? S 19:38 0:00 /usr/sbin/corosync
root 1565 0.0 1.2 168144 3240 ? S 19:38 0:00 /usr/sbin/corosync
root 1566 0.0 1.2 168144 3240 ? S 19:38 0:00 /usr/sbin/corosync
root 1567 0.0 1.2 168144 3240 ? S 19:38 0:00 /usr/sbin/corosync
root 1568 0.0 1.2 168144 3240 ? S 19:38 0:00 /usr/sbin/corosync
root 1569 0.0 1.2 168144 3240 ? S 19:38 0:00 /usr/sbin/corosync
The only way I found to start corosync correctly is to run "pkill -9
corosync" followed by "corosync start":
atlantis:~# ps auxf
[...]
root 2120 0.2 1.9 134288 5060 ? Ssl 19:59 0:00 /usr/sbin/corosync
root 2128 0.0 4.5 76028 11600 ? SLs 19:59 0:00 \_ /usr/lib/heartbeat/stonithd
105 2129 0.1 2.0 79104 5120 ? S 19:59 0:00 \_ /usr/lib/heartbeat/cib
root 2130 0.0 0.8 71580 2108 ? S 19:59 0:00 \_ /usr/lib/heartbeat/lrmd
105 2131 0.0 1.3 79968 3340 ? S 19:59 0:00 \_ /usr/lib/heartbeat/attrd
105 2132 0.0 1.1 80332 2892 ? S 19:59 0:00 \_ /usr/lib/heartbeat/pengine
105 2133 0.0 1.4 86216 3764 ? S 19:59 0:00 \_ /usr/lib/heartbeat/crmd
After this, the resource automatically migrates back to Atlantis:
-----------------------------------------------------------------------
daedalus:~# crm_mon --one-shot
============
Last updated: Thu Dec 23 20:03:18 2010
Stack: openais
Current DC: daedalus - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ atlantis daedalus ]
failover-ip (ocf::heartbeat:IPaddr): Started atlantis
-----------------------------------------------------------------------
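The two ps listings above differ in a way that is easy to check from a script: the broken state shows several /usr/sbin/corosync processes and no Pacemaker children, while the working state shows a single corosync process with crmd and the other daemons below it. A rough sketch of that check follows. The function name corosync_state, the three labels it prints, and the choice of crmd as the marker for a working start are assumptions made for illustration, based only on the listings in this mail.

```shell
# Hypothetical helper: reads `ps auxf` output on stdin and classifies it.
#   "stopped" - no /usr/sbin/corosync process found
#   "ok"      - exactly one corosync process and at least one crmd
#   "stuck"   - anything else (e.g. several corosync processes, no crmd)
corosync_state() {
    awk '
        $NF == "/usr/sbin/corosync"      { corosync++ }
        $NF == "/usr/lib/heartbeat/crmd" { crmd++ }
        END {
            if (corosync == 0) { print "stopped" }
            else if (corosync == 1 && crmd > 0) { print "ok" }
            else { print "stuck" }
        }
    '
}
```

Possible usage: `ps auxf | corosync_state`. Fed the first listing from Atlantis it would print "stuck", and fed the listing taken after the pkill and restart it would print "ok".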
Any idea how to fix this problem with Corosync?
Why does the resource not migrate to Daedalus when I shut down
Atlantis?
Thanks for your reply.
Regards,
Daniel
[1] http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo
--
Daniel Bareiro - GNU/Linux registered user #188.598
Proudly running Debian GNU/Linux with uptime:
17:52:45 up 71 days, 18:19, 10 users, load average: 0.00, 0.01, 0.03