[Pacemaker] Master/Slave resource cannot start
Diego Remolina
diego.remolina at physics.gatech.edu
Wed Aug 12 12:13:23 UTC 2009
> Can you define "not correctly" please?
> I'd rather not ignore such behavior.
The machine would come up but not join the cluster. Checking the status
of openais would show it as "Running", but crm status would show:
Connection to cluster failed: connection failed
A look at the log file shows:
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] AIS Executive Service
RELEASE 'subrev 1152 version 0.80'
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] Copyright (C)
2002-2006 MontaVista Software, Inc and contributors.
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] Copyright (C) 2006
Red Hat, Inc.
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] AIS Executive
Service: started and ready to provide service.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Token Timeout (3000
ms) retransmit timeout (294 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] token hold (225 ms)
retransmits before loss (10 retrans)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] join (60 ms)
send_join (0 ms) consensus (1500 ms) merge (200 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] downcheck (1000 ms)
fail to recv const (50 msgs)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] seqno unchanged const
(30 rotations) Maximum network MTU 1500
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] window size per
rotation (50 messages) maximum messages per rotation (20 messages)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] send threads (0 threads)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP token expired
timeout (294 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP token problem
counter (2000 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP threshold (10
problem count)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP mode set to passive.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM]
heartbeat_failures_allowed (0)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] max_network_delay (50 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] HeartBeat is
Disabled. To enable set heartbeat_failures_allowed > 0
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Receive multicast
socket recv buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Transmit multicast
socket send buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] The network interface
[10.0.0.22] is now up.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Created or loaded
sequence id 112.10.0.0.22 for this ring.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Receive multicast
socket recv buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Transmit multicast
socket send buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] The network interface
[10.0.1.22] is now up.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] entering GATHER state
from 15.
Aug 12 07:57:17 phys-file02 openais[9380]: [crm ] info:
process_ais_conf: Reading configure
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info:
config_find_next: Processing additional logging options...
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: get_config_opt:
Found 'on' for option: debug
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: get_config_opt:
Defaulting to 'off' for option: to_file
Aug 12 07:57:21 phys-file02 crm_shadow: [9396]: info: Invoked: crm_shadow
I try to stop openais, but the stop never completes; the progress dots
just keep appearing:
[root at phys-file02 log]# /etc/init.d/openais stop
Stopping OpenAIS daemon (aisexec):
..............................................
I have to Ctrl+C out of it and then kill the daemon by hand:
[root at phys-file02 log]# pkill -9 aisexec
[root at phys-file02 log]# ps -ef | grep ais
root 9639 5760 0 08:01 pts/1 00:00:00 grep ais
Then I start openais again and crm starts correctly.
[root at phys-file02 log]# /etc/init.d/openais start
Starting OpenAIS daemon (aisexec): starting... rc=0: OK
[root at phys-file02 log]# crm status
============
Last updated: Wed Aug 12 08:01:33 2009
Stack: openais
Current DC: phys-file01.physics.gatech.edu - partition with quorum
Version: 1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Online: [ phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu ]
Master/Slave Set: ms-drbd_export
Masters: [ phys-file01.physics.gatech.edu ]
Slaves: [ phys-file02.physics.gatech.edu ]
Master/Slave Set: ms-drbd_scratch
Masters: [ phys-file01.physics.gatech.edu ]
Slaves: [ phys-file02.physics.gatech.edu ]
Resource Group: fileserver
fs_export (ocf::heartbeat:Filesystem): Started
phys-file01.physics.gatech.edu
fs_scratch (ocf::heartbeat:Filesystem): Started
phys-file01.physics.gatech.edu
virtual-ip-1 (ocf::heartbeat:IPaddr2): Started
phys-file01.physics.gatech.edu
nfs (lsb:nfs): Started phys-file01.physics.gatech.edu
samba (lsb:smb): Started phys-file01.physics.gatech.edu
Clone Set: pingd-clone
Started: [ phys-file01.physics.gatech.edu
phys-file02.physics.gatech.edu ]
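For reference, the manual recovery above (notice that crm cannot connect, kill aisexec, start openais again) could be scripted roughly as follows. This is only a sketch of the workaround, not a fix: the function names crm_output_ok and force_restart_openais are made up for illustration, the 2-second pause is an arbitrary assumption, and the init script path and process name are the ones shown in this message.

```shell
#!/bin/sh
# Sketch only: automates the manual workaround described in this message.
# Assumes /etc/init.d/openais and the aisexec process name shown above.

# Reads `crm status` output on stdin.
# Returns 1 if it contains the "Connection to cluster failed" message,
# 0 otherwise.
crm_output_ok() {
    if grep -q "Connection to cluster failed"; then
        return 1
    fi
    return 0
}

# Hypothetical helper: force-kill the hung daemon and start it again.
# The sleep length is an assumption, not a tested value.
force_restart_openais() {
    pkill -9 aisexec
    sleep 2
    /etc/init.d/openais start
}

# Only act when invoked with "run", so sourcing this file has no side effects.
if [ "${1:-}" = "run" ]; then
    if crm status 2>&1 | crm_output_ok; then
        echo "crm is connected; nothing to do"
    else
        force_restart_openais
    fi
fi
```

Saved under a name of your choice and invoked with the single argument "run", it checks crm status once and restarts openais only when the connection failure message appears. It does not explain why aisexec hangs in the first place.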
I am not quite sure how to fix this to guarantee that openais always
starts crm correctly. My drbd interfaces are bonded, but they are set to
mode 1 (active-backup), which is failover only: no round robin, no
teaming, etc.
[root at phys-file02 log]# cat /proc/net/bonding/bond1 | grep Mode
Bonding Mode: fault-tolerance (active-backup)
[root at phys-file02 log]# cat /proc/net/bonding/bond2 | grep Mode
Bonding Mode: fault-tolerance (active-backup)
[root at phys-file02 log]# ifconfig bond1 | grep "inet addr"
inet addr:10.0.0.22 Bcast:10.0.0.255 Mask:255.255.255.0
[root at phys-file02 log]# ifconfig bond2 | grep "inet addr"
inet addr:10.0.1.22 Bcast:10.0.1.255 Mask:255.255.255.0
[root at phys-file02 log]# grep addr /etc/ais/openais.conf
bindnetaddr: 10.0.0.0
mcastaddr: 226.94.0.1
bindnetaddr: 10.0.1.0
mcastaddr: 226.94.1.1
On the other node:
[root at phys-file01 ~]# /etc/init.d/openais restart
Stopping OpenAIS daemon (aisexec): ..........OK
Starting OpenAIS daemon (aisexec): starting... rc=0: OK
[root at phys-file01 ~]# crm status
Connection to cluster failed: connection failed
Diego