[Pacemaker] pacemaker node stuck offline

Andreas Kurz andreas at hastexo.com
Thu Mar 21 11:15:29 EDT 2013


On 2013-03-21 14:31, Patrick Hemmer wrote:
> I've got a 2-node cluster where it seems last night one of the nodes
> went offline, and I can't see any reason why.
> 
> Attached are the logs from the 2 nodes (the relevant timeframe seems to
> be 2013-03-21 between 06:05 and 06:10).
> This is on ubuntu 12.04

It looks like your non-redundant cluster communication link was
interrupted around that time, for whatever reason, and the cluster
split-brained.

Does the DRBD replication use a different network connection? If so,
why not use it for a redundant ring setup? You should also use STONITH.
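
As a rough sketch of the redundant ring idea: cman lets each node declare
a second interface with an altname element inside its clusternode entry.
The names i-3307d96b-drbd and i-a706d8ff-drbd are hypothetical hostnames
that would have to resolve to each node's address on the replication
network. The fence blocks from the existing cluster.conf are left out for
brevity, and altname support should be checked against the installed cman
version.

```xml
<!-- Sketch only: the *-drbd hostnames are assumed, not taken from the posted setup -->
<clusternodes>
    <clusternode nodeid='181480898' name='i-3307d96b'>
        <!-- second ring over the replication network (hypothetical name) -->
        <altname name='i-3307d96b-drbd' />
    </clusternode>
    <clusternode nodeid='181481642' name='i-a706d8ff'>
        <!-- second ring over the replication network (hypothetical name) -->
        <altname name='i-a706d8ff-drbd' />
    </clusternode>
</clusternodes>
```

config_version in the cluster element has to be incremented whenever
cluster.conf changes.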

I also wonder why you have defined "expected_votes='1'" in your
cluster.conf.
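
For comparison, cman's documented two-node special case pairs
expected_votes='1' with two_node='1'. A sketch of that line, keeping the
transport setting from the posted config; whether this is what was
intended here is an assumption:

```xml
<!-- Sketch: two_node='1' is not in the posted config; it lets a single node keep quorum in a two-node cluster -->
<cman two_node='1' expected_votes='1' transport='udpu' />
```

Without two_node, cman_tool status reports "Quorum: 2" as in the quoted
output, so a lone node is left without quorum. Two-node mode relies on
working fencing to resolve a split.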

Regards,
Andreas


> 
> # crm status
> ============
> Last updated: Thu Mar 21 13:17:21 2013
> Last change: Thu Mar 14 14:42:18 2013 via crm_shadow on i-a706d8ff
> Stack: cman
> Current DC: i-a706d8ff - partition WITHOUT quorum
> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> 2 Nodes configured, unknown expected votes
> 5 Resources configured.
> ============
> 
> Online: [ i-a706d8ff ]
> OFFLINE: [ i-3307d96b ]
> 
>  dns-postgresql    (ocf::cloud:route53):    Started i-a706d8ff
>  Master/Slave Set: ms-drbd-postgresql [drbd-postgresql]
>      Masters: [ i-a706d8ff ]
>      Stopped: [ drbd-postgresql:0 ]
>  fs-drbd-postgresql    (ocf::heartbeat:Filesystem):    Started i-a706d8ff
>  postgresql    (ocf::heartbeat:pgsql):    Started i-a706d8ff
> 
> 
> # cman_tool nodes
> Node       Sts    Inc   Joined               Name
> 181480898   M       4   2013-03-14 14:25:27  i-3307d96b
> 181481642   M    5132   2013-03-21 06:07:40  i-a706d8ff
> 
> 
> # cman_tool status
> Version: 6.2.0
> Config Version: 1
> Cluster Name: cloudapp-servic
> Cluster Id: 63629
> Cluster Member: Yes
> Cluster Generation: 5132
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 1
> Total votes: 2
> Node votes: 1
> Quorum: 2 
> Active subsystems: 4
> Flags:
> Ports Bound: 0 
> Node name: i-3307d96b
> Node ID: 181480898
> Multicast addresses: 255.255.255.255
> Node addresses: 10.209.45.194
> 
> 
> 
> # cat /etc/cluster/cluster.conf
> <?xml version="1.0" ?>
> <cluster name='cloudapp-servic' config_version='1'>
>     <logging to_logfile='no' syslog_facility='local2' syslog_priority='debug' />
>     <cman expected_votes='1' transport='udpu' />
>     <clusternodes>
>         <clusternode nodeid='181480898' name='i-3307d96b'>
>             <fence>
>                 <method name='pcmk-redirect'>
>                     <device name='pcmk' port='i-3307d96b' />
>                 </method>
>             </fence>
>         </clusternode>
>         <clusternode nodeid='181481642' name='i-a706d8ff'>
>             <fence>
>                 <method name='pcmk-redirect'>
>                     <device name='pcmk' port='i-a706d8ff' />
>                 </method>
>             </fence>
>         </clusternode>
>     </clusternodes>
> 
>     <fencedevices>
>         <fencedevice name="pcmk" agent="fence_pcmk" />
>     </fencedevices>
> </cluster>


