[Pacemaker] Two cloned VMs, only one of them shows online when starting corosync/pacemaker
Guillaume Chanaud
guillaume.chanaud at connecting-nature.com
Fri Jul 16 15:44:27 UTC 2010
Hello,
I have a working config for a DRBD/NFS active/passive cluster.
I added a webserver to the cluster. This webserver is hosted in a Fedora
13 VM.
Nodes communicate over a VLAN (physical and virtual nodes communicate
directly through the VLAN).
I configured a first webserver node, and everything went fine. It comes
online and starts httpd (and pureftpd). As I said, this httpd node is a
KVM virtual machine.
The DRBD/NFS active/passive pair are both physical nodes (running Gentoo).
Every node runs corosync 1.2.3; Fedora uses pacemaker 1.1.1 and Gentoo
pacemaker 1.0.8.
Here is the CIB:
node filer1.connecting-nature.com \
attributes standby="off"
node filer2.connecting-nature.com \
attributes standby="off"
node www01.connecting-nature.com \
attributes standby="off"
node www02.connecting-nature.com \
attributes standby="off"
primitive drbd_nfs ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="59s" role="Master" timeout="60s" \
op monitor interval="60s" role="Slave" timeout="60s"
primitive fs_nfs ocf:heartbeat:Filesystem \
op monitor interval="120s" timeout="60s" \
params device="/dev/drbd0" directory="/data" fstype="ext4"
primitive httpd_server ocf:heartbeat:apache \
op monitor interval="40s" timeout="20s" \
params configfile="/etc/httpd/conf/httpd.conf"
primitive ip_failover heartbeat:OVHfailover.py \
op monitor interval="120s" timeout="60s" \
params 1="xxxxxx-ovh" 2="xxxxxx" 3="nsxxxxx.ovh.net"
4="ip.fail.ov.er" \
meta target-role="Started"
primitive ip_nfs ocf:heartbeat:IPaddr2 \
op monitor interval="60s" timeout="20s" \
params ip="192.168.0.20" cidr_netmask="24" nic="vlan2019"
primitive nfs_server lsb:nfs \
op monitor interval="120s" timeout="60s"
primitive pureftpd_server ocf:heartbeat:Pure-FTPd \
op monitor interval="30s" timeout="10s" \
params script="/usr/sbin/pure-config.pl" daemon_type="ldap"
group group_nfs ip_nfs fs_nfs nfs_server ip_failover \
meta target-role="Started"
ms ms_drbd_nfs drbd_nfs \
meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true" tar
ms ms_pureftpd_server pureftpd_server \
meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true" tar
clone httpd-clone httpd_server
location drbd-filer1 ms_drbd_nfs \
rule $id="drbd-filer1-rule" 90: #uname eq
filer1.connecting-nature.com
location drbd-filer2 ms_drbd_nfs \
rule $id="drbd-filer2-rule" 90: #uname eq
filer2.connecting-nature.com
location httpd-www01 httpd-clone \
rule $id="httpd-www01-rule" 90: #uname eq
www01.connecting-nature.com
location httpd-www02 httpd-clone \
rule $id="httpd-www02-rule" 90: #uname eq
www02.connecting-nature.com
location nfs-filer1 group_nfs \
rule $id="nfs-filer1-rule" 90: #uname eq
filer1.connecting-nature.com
location nfs-filer2 group_nfs \
rule $id="nfs-filer2-rule" 90: #uname eq
filer2.connecting-nature.com
location pureftpd_www01 ms_pureftpd_server \
rule $id="pureftpd-www01-rule" 90: #uname eq
www01.connecting-nature.com
location pureftpd_www02 ms_pureftpd_server \
rule $id="pureftpd-www02-rule" 90: #uname eq
www02.connecting-nature.com
colocation nfs_on_drbd inf: group_nfs ms_drbd_nfs:Master
order nfs_after_drbd inf: ms_drbd_nfs:promote group_nfs:start
property $id="cib-bootstrap-options" \
symmetric-cluster="false" \
no-quorum-policy="ignore" \
default-resource-stickiness="100" \
default-resource-failure-stickiness="10" \
stonith-enabled="false" \
stonith-action="reboot" \
stop-orphan-resources="true" \
stop-orphan-actions="true" \
remove-after-stop="false" \
short-resource-names="true" \
transition-idle-timeout="3min" \
default-action-timeout="30s" \
is-managed-default="true" \
startup-fencing="true" \
cluster-delay="60s" \
expected-nodes="1" \
expected-quorum-votes="4" \
dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
cluster-infrastructure="openais" \
last-lrm-refresh="1279241420" \
election-timeout="50s"
OK, when starting www01 (for the first time), crm_mon shows:
Online: [ filer2.connecting-nature.com filer1.connecting-nature.com
www01.connecting-nature.com ]
Resource Group: group_nfs
ip_nfs (ocf::heartbeat:IPaddr2): Started
filer1.connecting-nature.com
fs_nfs (ocf::heartbeat:Filesystem): Started
filer1.connecting-nature.com
nfs_server (lsb:nfs): Started filer1.connecting-nature.com
ip_failover (heartbeat:OVHfailover.py): Started
filer1.connecting-nature.com
Master/Slave Set: ms_drbd_nfs
Masters: [ filer1.connecting-nature.com ]
Slaves: [ filer2.connecting-nature.com ]
Clone Set: httpd-clone
Started: [ www01.connecting-nature.com ]
Stopped: [ httpd_server:1 ]
Master/Slave Set: ms_pureftpd_server
Slaves: [ www01.connecting-nature.com ]
Stopped: [ pureftpd_server:1 ]
Nice, everything's ok.
Now, I want to start www02, which is a VM cloned from www01. Only the
hostname and IP address changed!
Online: [ filer2.connecting-nature.com filer1.connecting-nature.com
www01.connecting-nature.com ]
OFFLINE: [ www02.connecting-nature.com ]
...
The machine correctly joins the cluster, but it always stays stopped... So I
did a test: stop www02, stop www01, and then start www02... It works,
www02 comes online, but now, if I want to start www01, it stays offline.
I can't start both nodes; each time, one or the other stays offline.
If I shut down the online node, after some time the offline node comes
online, and when I restart the node I shut down, it stays offline.
Sometimes services are started on the offline node, but it remains
offline for corosync! (This happens when I shut down/start the www nodes
in random order to see what happens.)
I tried without any services; that doesn't change anything.
Here is my corosync.conf:
# Please read the corosync.conf.5 manual page
compatibility: none
aisexec {
# Run as root - this is necessary to be able to manage resources with Pacemaker
user: root
group: root
}
service {
# Load the Pacemaker Cluster Resource Manager
name: pacemaker
ver: 0
use_logd: yes
}
totem {
version: 2
# How long before declaring a token lost (ms)
token: 5000
hold: 40
# How many token retransmits before forming a new configuration
token_retransmits_before_loss_const: 30
# How long to wait for join messages in the membership protocol (ms)
join: 1000
# How long to wait for consensus to be achieved before
# starting a new round of membership configuration (ms)
consensus: 7500
# Turn off the virtual synchrony filter
vsftype: none
# Number of messages that may be sent by one processor on receipt of the token
max_messages: 20
# Limit generated nodeids to 31-bits (positive signed integers)
clear_node_high_bit: yes
# Enable encryption
secauth: off
threads: 0
# Optionally assign a fixed node id (integer)
nodeid: 30283707487
interface {
ringnumber: 0
bindnetaddr: 192.168.0.60
mcastport: 5406
mcastaddr: 225.0.0.1
}
}
logging {
fileline: on
to_stderr: yes
to_logfile: yes
to_syslog: yes
logfile: /var/log/corosync.log
debug: off
timestamp: on
logger_subsys {
subsys: AMF
debug: off
}
}
amf {
mode: disabled
}
The "bindnetaddr" changes for every node, and the "nodeid" does too (I
tried putting a fixed value to see whether the problem was with the
auto-generated node id, but it didn't change anything).
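One thing that may be worth double-checking: corosync's nodeid is a 32-bit
value (and clear_node_high_bit further restricts it to 31 bits). The fixed
nodeid above, 30283707487, does not fit in 32 bits. Assuming corosync ends up
keeping only the low 32 bits (an assumption on my part, not something I have
verified in the source), the wrapped value is exactly the 218936415 that
appears as "Local node id" in the log below, so the fixed value may not be
taking effect as intended:

```shell
# Configured nodeid from corosync.conf -- too large for 32 bits.
configured=30283707487

# Assuming an unsigned 32-bit wrap (2^32 = 4294967296):
effective=$(( configured % 4294967296 ))

echo "$effective"   # prints 218936415, matching "Local node id" in the log
```

If that is what is happening, two clones whose configured nodeids differ only
above bit 31 would silently collapse to the same effective id.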
After some time, on the offline node, the only two processes running are :
/usr/lib64/heartbeat/lrmd
/usr/lib64/heartbeat/pengine
(there should be
corosync
/usr/lib64/heartbeat/stonithd
/usr/lib64/heartbeat/cib
/usr/lib64/heartbeat/lrmd
/usr/lib64/heartbeat/attrd
/usr/lib64/heartbeat/pengine
/usr/lib64/heartbeat/crmd
)
I attach the log here:
Jul 16 17:37:57 www01 corosync[25654]: [MAIN ] main.c:1307 Corosync
Cluster Engine ('1.2.3'): started and ready to provide service.
Jul 16 17:37:57 www01 corosync[25654]: [MAIN ] main.c:1308 Corosync
built-in features: nss rdma
Jul 16 17:37:57 www01 corosync[25654]: [MAIN ] main.c:1383
Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jul 16 17:37:57 www01 corosync[25654]: [TOTEM ] totemnet.c:204
Initializing transport (UDP/IP).
Jul 16 17:37:57 www01 corosync[25654]: [TOTEM ] totemudp.c:321
Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC
(mode 0).
Jul 16 17:37:58 www01 corosync[25654]: [TOTEM ] totemudp.c:1378 The
network interface [192.168.0.60] is now up.
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:330 info:
process_ais_conf: Reading configure
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:556 info:
config_find_init: Local handle: 2013064636357672962 for logging
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:582 info:
config_find_next: Processing additional logging options...
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:608 info:
get_config_opt: Found 'on' for option: debug
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:622 info:
get_config_opt: Defaulting to 'off' for option: to_file
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:608 info:
get_config_opt: Found 'yes' for option: to_syslog
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:622 info:
get_config_opt: Defaulting to 'daemon' for option: syslog_facility
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:556 info:
config_find_init: Local handle: 4730966301143465987 for service
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:582 info:
config_find_next: Processing additional service options...
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:622 info:
get_config_opt: Defaulting to 'pcmk' for option: clustername
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:608 info:
get_config_opt: Found 'yes' for option: use_logd
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:622 info:
get_config_opt: Defaulting to 'no' for option: use_mgmtd
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:554 info:
pcmk_startup: CRM: Initialized
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:555 Logging:
Initialized pcmk_startup
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:573 info:
pcmk_startup: Maximum core file size is: 18446744073709551615
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:594 info:
pcmk_startup: Service: 9
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:595 info:
pcmk_startup: Local hostname: www01.connecting-nature.com
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:510 info:
pcmk_update_nodeid: Local node id: 218936415
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:243 info:
update_member: Creating entry for node 218936415 born on 0
Jul 16 17:37:58 www01 stonith-ng: [25661]: WARN: Initializing connection
to logging daemon failed. Logging daemon may not be running
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: Invoked:
/usr/lib64/heartbeat/stonithd
Jul 16 17:37:58 www01 stonith-ng: [25661]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: crm_cluster_connect:
Connecting to OpenAIS
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: init_ais_connection:
Creating connection to our AIS plugin
Jul 16 17:37:58 www01 lrmd: [25663]: WARN: Initializing connection to
logging daemon failed. Logging daemon may not be running
Jul 16 17:37:58 www01 lrmd: [25663]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Jul 16 17:37:58 www01 cib: [25662]: WARN: Initializing connection to
logging daemon failed. Logging daemon may not be running
Jul 16 17:37:58 www01 cib: [25662]: info: Invoked: /usr/lib64/heartbeat/cib
Jul 16 17:37:58 www01 cib: [25662]: info: G_main_add_TriggerHandler:
Added signal manual handler
Jul 16 17:37:58 www01 cib: [25662]: info: G_main_add_SignalHandler:
Added signal handler for signal 17
Jul 16 17:37:58 www01 lrmd: [25663]: info: G_main_add_SignalHandler:
Added signal handler for signal 17
Jul 16 17:37:58 www01 attrd: [25664]: WARN: Initializing connection to
logging daemon failed. Logging daemon may not be running
Jul 16 17:37:58 www01 pengine: [25665]: WARN: Initializing connection to
logging daemon failed. Logging daemon may not be running
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:270 info:
update_member: 0x1259e10 Node 218936415 now known as
www01.connecting-nature.com (was: (null))
Jul 16 17:37:58 www01 crmd: [25666]: WARN: Initializing connection to
logging daemon failed. Logging daemon may not be running
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: init_ais_connection:
AIS connection established
Jul 16 17:37:58 www01 cib: [25662]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Jul 16 17:37:58 www01 attrd: [25664]: info: Invoked:
/usr/lib64/heartbeat/attrd
Jul 16 17:37:58 www01 pengine: [25665]: info: Invoked:
/usr/lib64/heartbeat/pengine
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:286 info:
update_member: Node www01.connecting-nature.com now has 1 quorum votes
(was 0)
Jul 16 17:37:58 www01 crmd: [25666]: info: Invoked:
/usr/lib64/heartbeat/crmd
Jul 16 17:37:58 www01 cib: [25662]: WARN: retrieveCib: Cluster
configuration not found: /var/lib/heartbeat/crm/cib.xml
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: get_ais_nodeid: Server
details: id=218936415 uname=www01.connecting-nature.com cname=pcmk
Jul 16 17:37:58 www01 lrmd: [25663]: info: G_main_add_SignalHandler:
Added signal handler for signal 10
Jul 16 17:37:58 www01 attrd: [25664]: info: main: Starting up
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:296 info:
update_member: Node 218936415/www01.connecting-nature.com is now: member
Jul 16 17:37:58 www01 crmd: [25666]: info: main: CRM Hg Version:
972b9a5f68606f632893fceed658efa085062f55
Jul 16 17:37:58 www01 cib: [25662]: WARN: readCibXmlFile: Primary
configuration corrupt or unusable, trying backup...
Jul 16 17:37:58 www01 lrmd: [25663]: info: G_main_add_SignalHandler:
Added signal handler for signal 12
Jul 16 17:37:58 www01 attrd: [25664]: info: crm_cluster_connect:
Connecting to OpenAIS
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:132 info:
spawn_child: Forked child 25661 for process stonith-ng
Jul 16 17:37:58 www01 crmd: [25666]: info: crmd_init: Starting crmd
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: crm_new_peer: Node
www01.connecting-nature.com now has id: 218936415
Jul 16 17:37:58 www01 attrd: [25664]: info: init_ais_connection:
Creating connection to our AIS plugin
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:132 info:
spawn_child: Forked child 25662 for process cib
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: crm_new_peer: Node
218936415 is now known as www01.connecting-nature.com
Jul 16 17:37:58 www01 lrmd: [25663]: info: Started.
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:132 info:
spawn_child: Forked child 25663 for process lrmd
Jul 16 17:37:58 www01 attrd: [25664]: info: init_ais_connection: AIS
connection established
Jul 16 17:37:58 www01 cib: [25662]: WARN: readCibXmlFile: Continuing
with an empty configuration.
Jul 16 17:37:58 www01 stonith-ng: [25661]: info: main: Starting
stonith-ng mainloop
Jul 16 17:37:58 www01 pengine: [25665]: info: main: Starting pengine
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:132 info:
spawn_child: Forked child 25664 for process attrd
Jul 16 17:37:58 www01 attrd: [25664]: info: get_ais_nodeid: Server
details: id=218936415 uname=www01.connecting-nature.com cname=pcmk
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:132 info:
spawn_child: Forked child 25665 for process pengine
Jul 16 17:37:58 www01 attrd: [25664]: info: crm_new_peer: Node
www01.connecting-nature.com now has id: 218936415
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:132 info:
spawn_child: Forked child 25666 for process crmd
Jul 16 17:37:58 www01 attrd: [25664]: info: crm_new_peer: Node 218936415
is now known as www01.connecting-nature.com
Jul 16 17:37:58 www01 corosync[25654]: [SERV ] service.c:265 Service
engine loaded: Pacemaker Cluster Manager 1.1.1
Jul 16 17:37:58 www01 attrd: [25664]: info: main: Cluster connection active
Jul 16 17:37:58 www01 corosync[25654]: [SERV ] service.c:265 Service
engine loaded: corosync extended virtual synchrony service
Jul 16 17:37:58 www01 crmd: [25666]: info: G_main_add_SignalHandler:
Added signal handler for signal 17
Jul 16 17:37:58 www01 attrd: [25664]: info: main: Accepting attribute
updates
Jul 16 17:37:58 www01 corosync[25654]: [SERV ] service.c:265 Service
engine loaded: corosync configuration service
Jul 16 17:37:58 www01 attrd: [25664]: info: main: Starting mainloop...
Jul 16 17:37:58 www01 corosync[25654]: [SERV ] service.c:265 Service
engine loaded: corosync cluster closed process group service v1.01
Jul 16 17:37:58 www01 corosync[25654]: [SERV ] service.c:265 Service
engine loaded: corosync cluster config database access v1.01
Jul 16 17:37:58 www01 corosync[25654]: [SERV ] service.c:265 Service
engine loaded: corosync profile loading service
Jul 16 17:37:58 www01 corosync[25654]: [SERV ] service.c:265 Service
engine loaded: corosync cluster quorum service v0.1
Jul 16 17:37:58 www01 corosync[25654]: [MAIN ] main.c:1220
Compatibility mode set to none. Using V2 of the synchronization engine.
Jul 16 17:37:58 www01 cib: [25662]: info: startCib: CIB Initialization
completed successfully
Jul 16 17:37:58 www01 cib: [25662]: info: crm_cluster_connect:
Connecting to OpenAIS
Jul 16 17:37:58 www01 cib: [25662]: info: init_ais_connection: Creating
connection to our AIS plugin
Jul 16 17:37:58 www01 cib: [25662]: info: init_ais_connection: AIS
connection established
Jul 16 17:37:58 www01 cib: [25662]: info: get_ais_nodeid: Server
details: id=218936415 uname=www01.connecting-nature.com cname=pcmk
Jul 16 17:37:58 www01 cib: [25662]: info: crm_new_peer: Node
www01.connecting-nature.com now has id: 218936415
Jul 16 17:37:58 www01 cib: [25662]: info: crm_new_peer: Node 218936415
is now known as www01.connecting-nature.com
Jul 16 17:37:58 www01 cib: [25662]: info: cib_init: Starting cib mainloop
Jul 16 17:37:58 www01 cib: [25662]: info: ais_dispatch: Membership
145204: quorum still lost
Jul 16 17:37:58 www01 cib: [25662]: info: crm_update_peer: Node
www01.connecting-nature.com: id=218936415 state=member (new) addr=r(0)
ip(192.168.0.60) (new) votes=1 (new) born=0 seen=145204
proc=00000000000000000000000000111312 (new)
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:699 notice:
pcmk_peer_update: Transitional membership event on ring 145204: memb=0,
new=0, lost=0
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:699 notice:
pcmk_peer_update: Stable membership event on ring 145204: memb=1, new=1,
lost=0
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:727 info:
pcmk_peer_update: NEW: www01.connecting-nature.com 218936415
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:743 info:
pcmk_peer_update: MEMB: www01.connecting-nature.com 218936415
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] utils.c:279 info:
update_member: Node www01.connecting-nature.com now has process list:
00000000000000000000000000111312 (1118994)
Jul 16 17:37:58 www01 corosync[25654]: [TOTEM ] totemsrp.c:1786 A
processor joined or left the membership and a new membership was formed.
Jul 16 17:37:58 www01 corosync[25654]: [MAIN ] main.c:261 Completed
service synchronization, ready to provide service.
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:973 info:
pcmk_ipc: Recorded connection 0x1263600 for stonith-ng/25661
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:973 info:
pcmk_ipc: Recorded connection 0x1265f70 for attrd/25664
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:973 info:
pcmk_ipc: Recorded connection 0x1266a20 for cib/25662
Jul 16 17:37:58 www01 corosync[25654]: [pcmk ] plugin.c:982 info:
pcmk_ipc: Sending membership update 145204 to cib
Jul 16 17:37:58 www01 cib: [25670]: info: write_cib_contents: Wrote
version 0.0.0 of the CIB to disk (digest: d73fab8054ae39ca7743184041837550)
Jul 16 17:37:58 www01 cib: [25670]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.zo9HRa (digest:
/var/lib/heartbeat/crm/cib.2sBGDK)
Jul 16 17:37:58 www01 cib: [25662]: info: Managed write_cib_contents
process 25670 exited with return code 0.
Jul 16 17:37:59 www01 crmd: [25666]: info: do_cib_control: CIB
connection established
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_cluster_connect:
Connecting to OpenAIS
Jul 16 17:37:59 www01 crmd: [25666]: info: init_ais_connection: Creating
connection to our AIS plugin
Jul 16 17:37:59 www01 crmd: [25666]: info: init_ais_connection: AIS
connection established
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:973 info:
pcmk_ipc: Recorded connection 0x12659e0 for crmd/25666
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:982 info:
pcmk_ipc: Sending membership update 145204 to crmd
Jul 16 17:37:59 www01 crmd: [25666]: info: get_ais_nodeid: Server
details: id=218936415 uname=www01.connecting-nature.com cname=pcmk
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_new_peer: Node
www01.connecting-nature.com now has id: 218936415
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_new_peer: Node 218936415
is now known as www01.connecting-nature.com
Jul 16 17:37:59 www01 crmd: [25666]: info: do_ha_control: Connected to
the cluster
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:699 notice:
pcmk_peer_update: Transitional membership event on ring 145212: memb=1,
new=0, lost=0
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:710 info:
pcmk_peer_update: memb: www01.connecting-nature.com 218936415
Jul 16 17:37:59 www01 cib: [25662]: info: ais_dispatch: Membership
145212: quorum still lost
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:699 notice:
pcmk_peer_update: Stable membership event on ring 145212: memb=4, new=3,
lost=0
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] utils.c:243 info:
update_member: Creating entry for node 83929280 born on 145212
Jul 16 17:37:59 www01 cib: [25662]: info: crm_new_peer: Node <null> now
has id: 83929280
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] utils.c:296 info:
update_member: Node 83929280/unknown is now: member
Jul 16 17:37:59 www01 cib: [25662]: info: crm_update_peer: Node (null):
id=83929280 state=member (new) addr=r(0) ip(192.168.0.5) votes=0 born=0
seen=145212 proc=00000000000000000000000000000000
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:727 info:
pcmk_peer_update: NEW: .pending. 83929280
Jul 16 17:37:59 www01 crmd: [25666]: info: do_started: Delaying start,
CCM (0000000000100000) not connected
Jul 16 17:37:59 www01 cib: [25662]: info: crm_new_peer: Node <null> now
has id: 100706496
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] utils.c:243 info:
update_member: Creating entry for node 100706496 born on 145212
Jul 16 17:37:59 www01 cib: [25662]: info: crm_update_peer: Node (null):
id=100706496 state=member (new) addr=r(0) ip(192.168.0.6) votes=0
born=0 seen=145212 proc=00000000000000000000000000000000
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] utils.c:296 info:
update_member: Node 100706496/unknown is now: member
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:727 info:
pcmk_peer_update: NEW: .pending. 100706496
Jul 16 17:37:59 www01 cib: [25662]: info: crm_new_peer: Node <null> now
has id: 218437511
Jul 16 17:37:59 www01 crmd: [25666]: info: crmd_init: Starting crmd's
mainloop
Jul 16 17:37:59 www01 cib: [25662]: info: crm_update_peer: Node (null):
id=218437511 state=member (new) addr=r(0) ip(192.168.0.70) votes=0
born=0 seen=145212 proc=00000000000000000000000000000000
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] utils.c:243 info:
update_member: Creating entry for node 218437511 born on 145212
Jul 16 17:37:59 www01 crmd: [25666]: info: ais_dispatch: Membership
145204: quorum still lost
Jul 16 17:37:59 www01 cib: [25662]: notice: ais_dispatch: Membership
145212: quorum acquired
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] utils.c:296 info:
update_member: Node 218437511/unknown is now: member
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_update_peer: Node
www01.connecting-nature.com: id=218936415 state=member (new) addr=r(0)
ip(192.168.0.60) (new) votes=1 (new) born=0 seen=145204
proc=00000000000000000000000000111312 (new)
Jul 16 17:37:59 www01 cib: [25662]: info: crm_get_peer: Node 100706496
is now known as filer2.connecting-nature.com
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:727 info:
pcmk_peer_update: NEW: .pending. 218437511
Jul 16 17:37:59 www01 cib: [25662]: info: crm_update_peer: Node
filer2.connecting-nature.com: id=100706496 state=member addr=r(0)
ip(192.168.0.6) votes=1 (new) born=15948 seen=145212
proc=00000000000000000000000000013312 (new)
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:743 info:
pcmk_peer_update: MEMB: .pending. 83929280
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:743 info:
pcmk_peer_update: MEMB: .pending. 100706496
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:743 info:
pcmk_peer_update: MEMB: www01.connecting-nature.com 218936415
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:743 info:
pcmk_peer_update: MEMB: .pending. 218437511
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:1288 info:
send_member_notification: Sending membership update 145212 to 2 children
Jul 16 17:37:59 www01 crmd: [25666]: info: do_started: Delaying start,
Config not read (0000000000000040)
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] utils.c:259 info:
update_member: 0x1259e10 Node 218936415 ((null)) born on: 145212
Jul 16 17:37:59 www01 corosync[25654]: [TOTEM ] totemsrp.c:1786 A
processor joined or left the membership and a new membership was formed.
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] utils.c:259 info:
update_member: 0x1265450 Node 100706496 (filer2.connecting-nature.com)
born on: 15948
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] utils.c:270 info:
update_member: 0x1265450 Node 100706496 now known as
filer2.connecting-nature.com (was: (null))
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] utils.c:279 info:
update_member: Node filer2.connecting-nature.com now has process list:
00000000000000000000000000013312 (78610)
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] utils.c:286 info:
update_member: Node filer2.connecting-nature.com now has 1 quorum votes
(was 0)
Jul 16 17:37:59 www01 crmd: [25666]: info: config_query_callback:
Checking for expired actions every 900000ms
Jul 16 17:37:59 www01 corosync[25654]: [pcmk ] plugin.c:1288 info:
send_member_notification: Sending membership update 145212 to 2 children
Jul 16 17:37:59 www01 crmd: [25666]: info: config_query_callback:
Sending expected-votes=2 to corosync
Jul 16 17:37:59 www01 crmd: [25666]: info: ais_dispatch: Membership
145212: quorum still lost
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_new_peer: Node <null> now
has id: 83929280
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_update_peer: Node (null):
id=83929280 state=member (new) addr=r(0) ip(192.168.0.5) votes=0 born=0
seen=145212 proc=00000000000000000000000000000000
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_new_peer: Node <null> now
has id: 100706496
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_update_peer: Node (null):
id=100706496 state=member (new) addr=r(0) ip(192.168.0.6) votes=0
born=0 seen=145212 proc=00000000000000000000000000000000
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_new_peer: Node <null> now
has id: 218437511
Jul 16 17:37:59 www01 crmd: [25666]: info: crm_update_peer: Node (null):
id=218437511 state=member (new) addr=r(0) ip(192.168.0.70) votes=0
born=0 seen=145212 proc=00000000000000000000000000000000
Jul 16 17:37:59 www01 crmd: [25666]: info: do_started: The local CRM is
operational
Jul 16 17:37:59 www01 crmd: [25666]: info: do_state_transition: State
transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Jul 16 17:38:00 www01 crmd: [25666]: notice: ais_dispatch: Membership
145212: quorum acquired
Jul 16 17:38:00 www01 crmd: [25666]: info: crm_get_peer: Node 100706496
is now known as filer2.connecting-nature.com
Jul 16 17:38:00 www01 crmd: [25666]: info: crm_update_peer: Node
filer2.connecting-nature.com: id=100706496 state=member addr=r(0)
ip(192.168.0.6) votes=1 (new) born=15948 seen=145212
proc=00000000000000000000000000013312 (new)
Jul 16 17:38:00 www01 crmd: [25666]: info: ais_dispatch: Membership
145212: quorum retained
Jul 16 17:38:00 www01 crmd: [25666]: info: te_connect_stonith:
Attempting connection to fencing daemon...
Jul 16 17:38:01 www01 crmd: [25666]: info: te_connect_stonith: Connected
Jul 16 17:38:03 www01 attrd: [25664]: info: cib_connect: Connected to
the CIB after 1 signon attempts
Jul 16 17:38:03 www01 attrd: [25664]: info: cib_connect: Sending full
refresh
Jul 16 17:38:08 www01 cib: [25662]: info: ais_dispatch: Membership
145212: quorum retained
Jul 16 17:38:08 www01 crmd: [25666]: info: ais_dispatch: Membership
145212: quorum retained
Jul 16 17:38:08 www01 cib: [25662]: info: crm_get_peer: Node 218437511
is now known as www02.connecting-nature.com
Jul 16 17:38:08 www01 crmd: [25666]: info: crm_get_peer: Node 218437511
is now known as www02.connecting-nature.com
Jul 16 17:38:08 www01 cib: [25662]: info: crm_update_peer: Node
www02.connecting-nature.com: id=218437511 state=member addr=r(0)
ip(192.168.0.70) votes=1 (new) born=145144 seen=145212
proc=00000000000000000000000000111312 (new)
Jul 16 17:38:08 www01 crmd: [25666]: info: crm_update_peer: Node
www02.connecting-nature.com: id=218437511 state=member addr=r(0)
ip(192.168.0.70) votes=1 (new) born=145144 seen=145212
proc=00000000000000000000000000111312 (new)
Jul 16 17:38:08 www01 cib: [25662]: info: ais_dispatch: Membership
145212: quorum retained
Jul 16 17:38:08 www01 cib: [25662]: info: crm_get_peer: Node 83929280 is
now known as filer1.connecting-nature.com
Jul 16 17:38:08 www01 cib: [25662]: info: crm_update_peer: Node
filer1.connecting-nature.com: id=83929280 state=member addr=r(0)
ip(192.168.0.5) votes=1 (new) born=15948 seen=145212
proc=00000000000000000000000000013312 (new)
Jul 16 17:38:08 www01 cib: [25662]: info: cib_process_diff: Diff
0.755.35 -> 0.755.36 not applied to 0.0.0: current "epoch" is less than
required
Jul 16 17:38:08 www01 cib: [25662]: info: cib_server_process_diff:
Requesting re-sync from peer
Jul 16 17:38:08 www01 cib: [25662]: WARN: cib_diff_notify: Local-only
Change (client:crmd, call: 5456): 0.0.0 (Application of an update diff
failed, requesting a full refresh)
Jul 16 17:38:08 www01 crmd: [25666]: info: ais_dispatch: Membership
145212: quorum retained
Jul 16 17:38:08 www01 crmd: [25666]: info: crm_get_peer: Node 83929280
is now known as filer1.connecting-nature.com
Jul 16 17:38:08 www01 crmd: [25666]: info: crm_update_peer: Node
filer1.connecting-nature.com: id=83929280 state=member addr=r(0)
ip(192.168.0.5) votes=1 (new) born=15948 seen=145212
proc=00000000000000000000000000013312 (new)
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] utils.c:259 info:
update_member: 0x1265700 Node 218437511 (www02.connecting-nature.com)
born on: 145144
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] utils.c:270 info:
update_member: 0x1265700 Node 218437511 now known as
www02.connecting-nature.com (was: (null))
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] utils.c:279 info:
update_member: Node www02.connecting-nature.com now has process list:
00000000000000000000000000111312 (1118994)
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] utils.c:286 info:
update_member: Node www02.connecting-nature.com now has 1 quorum votes
(was 0)
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:317 info:
update_expected_votes: Expected quorum votes 2 -> 3
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:1288 info:
send_member_notification: Sending membership update 145212 to 2 children
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] utils.c:259 info:
update_member: 0x1265140 Node 83929280 (filer1.connecting-nature.com)
born on: 15948
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] utils.c:270 info:
update_member: 0x1265140 Node 83929280 now known as
filer1.connecting-nature.com (was: (null))
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] utils.c:279 info:
update_member: Node filer1.connecting-nature.com now has process list:
00000000000000000000000000013312 (78610)
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] utils.c:286 info:
update_member: Node filer1.connecting-nature.com now has 1 quorum votes
(was 0)
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:317 info:
update_expected_votes: Expected quorum votes 3 -> 4
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:1288 info:
send_member_notification: Sending membership update 145212 to 2 children
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:699 notice:
pcmk_peer_update: Transitional membership event on ring 145224: memb=3,
new=0, lost=1
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:710 info:
pcmk_peer_update: memb: filer2.connecting-nature.com 100706496
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:710 info:
pcmk_peer_update: memb: www01.connecting-nature.com 218936415
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:710 info:
pcmk_peer_update: memb: www02.connecting-nature.com 218437511
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:715 info:
pcmk_peer_update: lost: filer1.connecting-nature.com 83929280
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:699 notice:
pcmk_peer_update: Stable membership event on ring 145224: memb=4, new=1,
lost=0
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:727 info:
pcmk_peer_update: NEW: filer1.connecting-nature.com 83929280
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:743 info:
pcmk_peer_update: MEMB: filer1.connecting-nature.com 83929280
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:743 info:
pcmk_peer_update: MEMB: filer2.connecting-nature.com 100706496
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:743 info:
pcmk_peer_update: MEMB: www01.connecting-nature.com 218936415
Jul 16 17:38:08 www01 corosync[25654]: [pcmk ] plugin.c:743 info:
pcmk_peer_update: MEMB: www02.connecting-nature.com 218437511
Jul 16 17:38:08 www01 corosync[25654]: [TOTEM ] totemsrp.c:1786 A
processor joined or left the membership and a new membership was formed.
Jul 16 17:38:11 www01 stonith-ng: [25661]: ERROR: ais_dispatch:
Receiving message body failed: (2) Library error: Resource temporarily
unavailable (11)
Jul 16 17:38:11 www01 cib: [25662]: ERROR: ais_dispatch: Receiving
message body failed: (2) Library error: Resource temporarily unavailable
(11)
Jul 16 17:38:11 www01 stonith-ng: [25661]: ERROR: ais_dispatch: AIS
connection failed
Jul 16 17:38:11 www01 cib: [25662]: ERROR: ais_dispatch: AIS connection
failed
Jul 16 17:38:11 www01 cib: [25662]: ERROR: cib_ais_destroy: AIS
connection terminated
Jul 16 17:38:11 www01 stonith-ng: [25661]: ERROR:
stonith_peer_ais_destroy: AIS connection terminated
Jul 16 17:38:11 www01 attrd: [25664]: ERROR: ais_dispatch: Receiving
message body failed: (2) Library error: Resource temporarily unavailable
(11)
Jul 16 17:38:11 www01 attrd: [25664]: ERROR: ais_dispatch: AIS
connection failed
Jul 16 17:38:11 www01 attrd: [25664]: CRIT: attrd_ais_destroy: Lost
connection to OpenAIS service!
Jul 16 17:38:11 www01 attrd: [25664]: info: main: Exiting...
Jul 16 17:38:11 www01 crmd: [25666]: info: cib_native_msgready: Lost
connection to the CIB service [25662].
Jul 16 17:38:11 www01 crmd: [25666]: CRIT: cib_native_dispatch: Lost
connection to the CIB service [25662/callback].
Jul 16 17:38:11 www01 crmd: [25666]: CRIT: cib_native_dispatch: Lost
connection to the CIB service [25662/command].
Jul 16 17:38:11 www01 attrd: [25664]: ERROR:
attrd_cib_connection_destroy: Connection to the CIB terminated...
Jul 16 17:38:11 www01 crmd: [25666]: ERROR: crmd_cib_connection_destroy:
Connection to the CIB terminated...
Jul 16 17:38:11 www01 crmd: [25666]: ERROR: ais_dispatch: Receiving
message body failed: (2) Library error: Invalid argument (22)
Jul 16 17:38:11 www01 crmd: [25666]: ERROR: ais_dispatch: AIS connection
failed
Jul 16 17:38:11 www01 crmd: [25666]: ERROR: crm_ais_destroy: AIS
connection terminated
Sorry for this long post, but I tried to be as exhaustive as possible.
Thanks in advance for any help.
Guillaume