[Pacemaker] pacemaker node stuck offline
Andreas Kurz
andreas at hastexo.com
Thu Mar 21 15:15:29 UTC 2013
On 2013-03-21 14:31, Patrick Hemmer wrote:
> I've got a 2-node cluster where it seems last night one of the nodes
> went offline, and I can't see any reason why.
>
> Attached are the logs from the 2 nodes (the relevant timeframe seems to
> be 2013-03-21 between 06:05 and 06:10).
> This is on Ubuntu 12.04.
It looks like your non-redundant cluster communication was interrupted
around that time, for whatever reason, and your cluster split-brained.
Does the DRBD replication use a different network connection? If so,
why not use it for a redundant ring setup? You should also use STONITH.
I also wonder why you have defined "expected_votes='1'" in your
cluster.conf.
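
A rough sketch of what the redundant ring suggestion could look like in
cluster.conf: cman allows each clusternode to carry an altname element
naming a second hostname for that node on another network. The "-drbd"
hostnames below are hypothetical and would have to resolve to each
node's address on the replication network. Whether altname is honoured
together with transport='udpu' on the cman version shipped with
Ubuntu 12.04 is an assumption that should be checked before relying on it.

```xml
<clusternodes>
  <clusternode nodeid='181480898' name='i-3307d96b'>
    <!-- hypothetical hostname on the DRBD replication network (second ring) -->
    <altname name='i-3307d96b-drbd' />
    <fence>
      <method name='pcmk-redirect'>
        <device name='pcmk' port='i-3307d96b' />
      </method>
    </fence>
  </clusternode>
  <clusternode nodeid='181481642' name='i-a706d8ff'>
    <!-- hypothetical hostname on the DRBD replication network (second ring) -->
    <altname name='i-a706d8ff-drbd' />
    <fence>
      <method name='pcmk-redirect'>
        <device name='pcmk' port='i-a706d8ff' />
      </method>
    </fence>
  </clusternode>
</clusternodes>
```

If something along these lines is applied, config_version in the cluster
element needs to be incremented, and the fence_pcmk redirect only helps
once a real STONITH device is configured in Pacemaker.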
Regards,
Andreas
>
> # crm status
> ============
> Last updated: Thu Mar 21 13:17:21 2013
> Last change: Thu Mar 14 14:42:18 2013 via crm_shadow on i-a706d8ff
> Stack: cman
> Current DC: i-a706d8ff - partition WITHOUT quorum
> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> 2 Nodes configured, unknown expected votes
> 5 Resources configured.
> ============
>
> Online: [ i-a706d8ff ]
> OFFLINE: [ i-3307d96b ]
>
> dns-postgresql (ocf::cloud:route53): Started i-a706d8ff
> Master/Slave Set: ms-drbd-postgresql [drbd-postgresql]
>     Masters: [ i-a706d8ff ]
>     Stopped: [ drbd-postgresql:0 ]
> fs-drbd-postgresql (ocf::heartbeat:Filesystem): Started i-a706d8ff
> postgresql (ocf::heartbeat:pgsql): Started i-a706d8ff
>
>
> # cman_tool nodes
> Node       Sts  Inc   Joined               Name
> 181480898  M    4     2013-03-14 14:25:27  i-3307d96b
> 181481642  M    5132  2013-03-21 06:07:40  i-a706d8ff
>
>
> # cman_tool status
> Version: 6.2.0
> Config Version: 1
> Cluster Name: cloudapp-servic
> Cluster Id: 63629
> Cluster Member: Yes
> Cluster Generation: 5132
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 1
> Total votes: 2
> Node votes: 1
> Quorum: 2
> Active subsystems: 4
> Flags:
> Ports Bound: 0
> Node name: i-3307d96b
> Node ID: 181480898
> Multicast addresses: 255.255.255.255
> Node addresses: 10.209.45.194
>
>
>
> # cat /etc/cluster/cluster.conf
> <?xml version="1.0" ?>
> <cluster name='cloudapp-servic' config_version='1'>
>   <logging to_logfile='no' syslog_facility='local2' syslog_priority='debug' />
>   <cman expected_votes='1' transport='udpu' />
>   <clusternodes>
>     <clusternode nodeid='181480898' name='i-3307d96b'>
>       <fence>
>         <method name='pcmk-redirect'>
>           <device name='pcmk' port='i-3307d96b' />
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode nodeid='181481642' name='i-a706d8ff'>
>       <fence>
>         <method name='pcmk-redirect'>
>           <device name='pcmk' port='i-a706d8ff' />
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>
>   <fencedevices>
>     <fencedevice name="pcmk" agent="fence_pcmk" />
>   </fencedevices>
> </cluster>