[Pacemaker] Issue with the latest Pacemaker revision
Borislav Borisov
b.borisov at teracomm.bg
Thu Jan 26 14:27:07 UTC 2012
Greetings,
After upgrading Pacemaker from 1.1.6 (build 9971ebba4494) to 1.1.6 (build
2a6b296), crm fails to connect to the CIB. My setup is a two-node cluster
running corosync with cman. Both versions were compiled with the same
options, and my cluster.conf looks like this:
<?xml version="1.0"?>
<cluster name="pcmk" config_version="6">
  <cman>
  </cman>
  <dlm>
  </dlm>
  <clusternodes>
    <clusternode name="kvm1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="kvm1" />
        </method>
      </fence>
    </clusternode>
    <clusternode name="kvm2" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="kvm2" />
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk" />
  </fencedevices>
</cluster>
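For what it's worth, the file itself seems fine; here is a quick sanity check I ran over a copy of it — a minimal sketch using python3's stdlib XML parser (the /tmp path is just for illustration), verifying well-formedness and that each fence device's "port" matches its node name, a common fence_pcmk misconfiguration:

```shell
# Copy of the cluster.conf above, written to /tmp for the check.
cat > /tmp/cluster.conf <<'EOF'
<?xml version="1.0"?>
<cluster name="pcmk" config_version="6">
  <cman></cman>
  <dlm></dlm>
  <clusternodes>
    <clusternode name="kvm1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="kvm1" />
        </method>
      </fence>
    </clusternode>
    <clusternode name="kvm2" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="kvm2" />
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk" />
  </fencedevices>
</cluster>
EOF
# Parse it and confirm each fence "port" matches its node name.
python3 - <<'EOF'
import xml.etree.ElementTree as ET
root = ET.parse("/tmp/cluster.conf").getroot()
for node in root.iter("clusternode"):
    port = node.find(".//device").get("port")
    assert port == node.get("name"), (node.get("name"), port)
print("cluster.conf: well-formed, fence ports match node names")
EOF
```

(cman's own ccs_config_validate would additionally check it against the schema, but the point here is only that the config is not obviously broken.)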
Here is the output of the log file:
Jan 24 13:48:59 corosync [MAIN ] Corosync Cluster Engine ('1.4.2'):
started and ready to provide service.
Jan 24 13:48:59 corosync [MAIN ] Corosync built-in features: nss
Jan 24 13:48:59 corosync [MAIN ] Successfully read config from
/usr/local/etc/cluster/cluster.conf
Jan 24 13:48:59 corosync [MAIN ] Successfully parsed cman config
Jan 24 13:48:59 corosync [MAIN ] Successfully configured openais
services to load
Jan 24 13:48:59 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Jan 24 13:48:59 corosync [TOTEM ] Initializing transmit/receive
security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jan 24 13:48:59 corosync [TOTEM ] The network interface [192.168.99.11]
is now up.
Jan 24 13:48:59 corosync [QUORUM] Using quorum provider quorum_cman
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: corosync
cluster quorum service v0.1
Jan 24 13:48:59 corosync [CMAN ] CMAN 1327149211 (built Jan 21 2012
14:33:53) started
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: corosync CMAN
membership service 2.90
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: openais cluster
membership service B.01.01
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: openais event
service B.01.01
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: openais
checkpoint service B.01.01
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: openais message
service B.03.01
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: openais
distributed locking service B.03.01
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: openais timer
service A.01.01
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: corosync
extended virtual synchrony service
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: corosync
configuration service
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: corosync
cluster closed process group service v1.01
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: corosync
cluster config database access v1.01
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: corosync
profile loading service
Jan 24 13:48:59 corosync [QUORUM] Using quorum provider quorum_cman
Jan 24 13:48:59 corosync [SERV ] Service engine loaded: corosync
cluster quorum service v0.1
Jan 24 13:48:59 corosync [MAIN ] Compatibility mode set to whitetank.
Using V1 and V2 of the synchronization engine.
Jan 24 13:48:59 corosync [CLM ] CLM CONFIGURATION CHANGE
Jan 24 13:48:59 corosync [CLM ] New Configuration:
Jan 24 13:48:59 corosync [CLM ] Members Left:
Jan 24 13:48:59 corosync [CLM ] Members Joined:
Jan 24 13:48:59 corosync [CLM ] CLM CONFIGURATION CHANGE
Jan 24 13:48:59 corosync [CLM ] New Configuration:
Jan 24 13:48:59 corosync [CLM ] r(0) ip(192.168.99.11)
Jan 24 13:48:59 corosync [CLM ] Members Left:
Jan 24 13:48:59 corosync [CLM ] Members Joined:
Jan 24 13:48:59 corosync [CLM ] r(0) ip(192.168.99.11)
Jan 24 13:48:59 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jan 24 13:48:59 corosync [QUORUM] Members[1]: 1
Jan 24 13:48:59 corosync [QUORUM] Members[1]: 1
Jan 24 13:48:59 corosync [CPG ] chosen downlist: sender r(0)
ip(192.168.99.11) ; members(old:0 left:0)
Jan 24 13:48:59 corosync [MAIN ] Completed service synchronization,
ready to provide service.
Jan 24 13:48:59 corosync [CLM ] CLM CONFIGURATION CHANGE
Jan 24 13:48:59 corosync [CLM ] New Configuration:
Jan 24 13:48:59 corosync [CLM ] r(0) ip(192.168.99.11)
Jan 24 13:48:59 corosync [CLM ] Members Left:
Jan 24 13:48:59 corosync [CLM ] Members Joined:
Jan 24 13:48:59 corosync [CLM ] CLM CONFIGURATION CHANGE
Jan 24 13:48:59 corosync [CLM ] New Configuration:
Jan 24 13:48:59 corosync [CLM ] r(0) ip(192.168.99.11)
Jan 24 13:48:59 corosync [CLM ] r(0) ip(192.168.99.12)
Jan 24 13:48:59 corosync [CLM ] Members Left:
Jan 24 13:48:59 corosync [CLM ] Members Joined:
Jan 24 13:48:59 corosync [CLM ] r(0) ip(192.168.99.12)
Jan 24 13:48:59 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jan 24 13:48:59 corosync [CMAN ] quorum regained, resuming activity
Jan 24 13:48:59 corosync [QUORUM] This node is within the primary
component and will provide service.
Jan 24 13:48:59 corosync [QUORUM] Members[2]: 1 2
Jan 24 13:48:59 corosync [QUORUM] Members[2]: 1 2
Jan 24 13:48:59 corosync [CPG ] chosen downlist: sender r(0)
ip(192.168.99.11) ; members(old:1 left:0)
Jan 24 13:48:59 corosync [MAIN ] Completed service synchronization,
ready to provide service.
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: crm_log_init_worker:
Changed active directory to /usr/local/var/lib/heartbeat/cores/root
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: main: Starting Pacemaker
1.1.6 (Build: 2a6b296): ncurses corosync-plugin cman
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: main: Maximum core file
size is: 4294967295
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: debug: cluster_connect_cfg:
Our nodeid: 1
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: debug: cluster_connect_cfg:
Adding fd=7 to mainloop
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: debug: cluster_connect_cpg:
Our nodeid: 1
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: debug: cluster_connect_cpg:
Adding fd=8 to mainloop
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: get_local_node_name:
Using CMAN node name: kvm1
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
0x8d06548 Node 1 now known as kvm1 (was: (null))
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
Node kvm1 now has process list: 00000000000000000000000000000002 (was
00000000000000000000000000000000)
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: start_child: Forked
child 28681 for process stonith-ng
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
Node kvm1 now has process list: 00000000000000000000000000100002 (was
00000000000000000000000000000002)
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: start_child: Forked
child 28682 for process cib
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
Node kvm1 now has process list: 00000000000000000000000000100102 (was
00000000000000000000000000100002)
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: start_child: Forked
child 28683 for process lrmd
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
Node kvm1 now has process list: 00000000000000000000000000100112 (was
00000000000000000000000000100102)
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: start_child: Forked
child 28684 for process attrd
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
Node kvm1 now has process list: 00000000000000000000000000101112 (was
00000000000000000000000000100112)
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: start_child: Forked
child 28685 for process pengine
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
Node kvm1 now has process list: 00000000000000000000000000111112 (was
00000000000000000000000000101112)
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: start_child: Forked
child 28686 for process crmd
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
Node kvm1 now has process list: 00000000000000000000000000111312 (was
00000000000000000000000000111112)
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: main: Starting mainloop
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
0x8d083f0 Node 2 now known as kvm2 (was: (null))
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
Node kvm2 now has process list: 00000000000000000000000000111312 (was
00000000000000000000000000000000)
Jan 24 13:49:35 kvm1 cib: [28682]: info: crm_log_init_worker: Changed
active directory to /usr/local/var/lib/heartbeat/cores/hacluster
Jan 24 13:49:35 kvm1 cib: [28682]: info: retrieveCib: Reading cluster
configuration from: /usr/local/var/lib/heartbeat/crm/cib.xml (digest:
/usr/local/var/lib/heartbea /crm/cib.xml.sig)
Jan 24 13:49:35 kvm1 cib: [28682]: WARN: retrieveCib: Cluster
configuration not found: /usr/local/var/lib/heartbeat/crm/cib.xml
Jan 24 13:49:35 kvm1 cib: [28682]: WARN: readCibXmlFile: Primary
configuration corrupt or unusable, trying backup...
Jan 24 13:49:35 kvm1 cib: [28682]: debug: get_last_sequence: Series file
/usr/local/var/lib/heartbeat/crm/cib.last does not exist
Jan 24 13:49:35 kvm1 cib: [28682]: debug: readCibXmlFile: Backup file
/usr/local/var/lib/heartbeat/crm/cib-99.raw not found
Jan 24 13:49:35 kvm1 cib: [28682]: WARN: readCibXmlFile: Continuing with
an empty configuration.
Jan 24 13:49:35 kvm1 stonith-ng: [28681]: info: Invoked:
/usr/local/libexec/pacemaker/stonithd
Jan 24 13:49:35 kvm1 attrd: [28684]: info: Invoked:
/usr/local/libexec/pacemaker/attrd
Jan 24 13:49:35 kvm1 attrd: [28684]: info: crm_log_init_worker: Changed
active directory to /usr/local/var/lib/heartbeat/cores/hacluster
Jan 24 13:49:35 kvm1 attrd: [28684]: info: main: Starting up
Jan 24 13:49:35 kvm1 attrd: [28684]: info: get_cluster_type: Cluster
type is: 'cman'
Jan 24 13:49:35 kvm1 attrd: [28684]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jan 24 13:49:35 kvm1 pengine: [28685]: info: Invoked:
/usr/local/libexec/pacemaker/pengine
Jan 24 13:49:35 kvm1 cib: [28682]: debug: readCibXmlFile: [on-disk] <cib
epoch="0" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" >
Jan 24 13:49:35 kvm1 cib: [28682]: debug: readCibXmlFile: [on-disk]
<configuration >
Jan 24 13:49:35 kvm1 cib: [28682]: debug: readCibXmlFile: [on-disk]
<crm_config />
Jan 24 13:49:35 kvm1 cib: [28682]: debug: readCibXmlFile: [on-disk]
<nodes />
Jan 24 13:49:35 kvm1 cib: [28682]: debug: readCibXmlFile: [on-disk]
<resources />
Jan 24 13:49:35 kvm1 cib: [28682]: debug: readCibXmlFile: [on-disk]
<constraints />
Jan 24 13:49:35 kvm1 cib: [28682]: debug: readCibXmlFile: [on-disk]
</configuration>
Jan 24 13:49:35 kvm1 cib: [28682]: debug: readCibXmlFile: [on-disk]
<status />
Jan 24 13:49:35 kvm1 cib: [28682]: debug: readCibXmlFile: [on-disk] </cib>
Jan 24 13:49:35 kvm1 cib: [28682]: info: validate_with_relaxng: Creating
RNG parser context
Jan 24 13:49:35 kvm1 lrmd: [28683]: info: enabling coredumps
Jan 24 13:49:35 kvm1 lrmd: [28683]: WARN: Core dumps could be lost if
multiple dumps occur.
Jan 24 13:49:35 kvm1 stonith-ng: [28681]: info: crm_log_init_worker:
Changed active directory to /usr/local/var/lib/heartbeat/cores/root
Jan 24 13:49:35 kvm1 stonith-ng: [28681]: info: get_cluster_type:
Cluster type is: 'cman'
Jan 24 13:49:35 kvm1 stonith-ng: [28681]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jan 24 13:49:35 kvm1 stonith-ng: [28681]: ERROR: init_cpg_connection:
corosync qorum is not supported in this build
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: ERROR: pcmk_child_exit: Child
process stonith-ng exited (pid=28681, rc=100)
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: notice: pcmk_child_exit: Child
process stonith-ng no longer wishes to be respawned
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
Node kvm1 now has process list: 00000000000000000000000000011312 (was
00000000000000000000000000111312)
Jan 24 13:49:35 kvm1 attrd: [28684]: ERROR: init_cpg_connection:
corosync qorum is not supported in this build
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: ERROR: pcmk_child_exit: Child
process attrd exited (pid=28684, rc=100)
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: notice: pcmk_child_exit: Child
process attrd no longer wishes to be respawned
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
Node kvm1 now has process list: 00000000000000000000000000010312 (was
00000000000000000000000000011312)
Jan 24 13:49:35 kvm1 crmd: [28686]: info: Invoked:
/usr/local/libexec/pacemaker/crmd
Jan 24 13:49:35 kvm1 lrmd: [28683]: WARN: Consider setting non-default
value in /proc/sys/kernel/core_pattern (or equivalent) for maximum
supportability
Jan 24 13:49:35 kvm1 lrmd: [28683]: WARN: Consider setting
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Jan 24 13:49:35 kvm1 lrmd: [28683]: debug: main: run the loop...
Jan 24 13:49:35 kvm1 lrmd: [28683]: info: Started.
Jan 24 13:49:35 kvm1 pengine: [28685]: info: crm_log_init_worker:
Changed active directory to /usr/local/var/lib/heartbeat/cores/hacluster
Jan 24 13:49:35 kvm1 pengine: [28685]: debug: main: Checking for old
instances of pengine
Jan 24 13:49:35 kvm1 pengine: [28685]: debug:
init_client_ipc_comms_nodispatch: Attempting to talk on:
/usr/local/var/run/crm/pengine
Jan 24 13:49:35 kvm1 pengine: [28685]: debug:
init_client_ipc_comms_nodispatch: Could not init comms on:
/usr/local/var/run/crm/pengine
Jan 24 13:49:35 kvm1 pengine: [28685]: debug: main: Init server comms
Jan 24 13:49:35 kvm1 pengine: [28685]: info: main: Starting pengine
Jan 24 13:49:35 kvm1 crmd: [28686]: info: crm_log_init_worker: Changed
active directory to /usr/local/var/lib/heartbeat/cores/root
Jan 24 13:49:35 kvm1 crmd: [28686]: info: main: CRM Hg Version: 2a6b296
Jan 24 13:49:35 kvm1 crmd: [28686]: info: crmd_init: Starting crmd
Jan 24 13:49:35 kvm1 crmd: [28686]: debug: s_crmd_fsa: Processing
I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
Jan 24 13:49:35 kvm1 crmd: [28686]: debug: do_fsa_action:
actions:trace: // A_LOG
Jan 24 13:49:35 kvm1 crmd: [28686]: debug: do_fsa_action:
actions:trace: // A_STARTUP
Jan 24 13:49:35 kvm1 crmd: [28686]: debug: do_startup: Registering
Signal Handlers
Jan 24 13:49:35 kvm1 crmd: [28686]: debug: do_startup: Creating CIB and
LRM objects
Jan 24 13:49:35 kvm1 crmd: [28686]: debug: do_fsa_action:
actions:trace: // A_CIB_START
Jan 24 13:49:35 kvm1 crmd: [28686]: debug:
init_client_ipc_comms_nodispatch: Attempting to talk on:
/usr/local/var/run/crm/cib_rw
Jan 24 13:49:35 kvm1 crmd: [28686]: debug:
init_client_ipc_comms_nodispatch: Could not init comms on:
/usr/local/var/run/crm/cib_rw
Jan 24 13:49:35 kvm1 crmd: [28686]: debug: cib_native_signon_raw:
Connection to command channel failed
Jan 24 13:49:35 kvm1 crmd: [28686]: debug:
init_client_ipc_comms_nodispatch: Attempting to talk on:
/usr/local/var/run/crm/cib_callback
Jan 24 13:49:35 kvm1 crmd: [28686]: debug:
init_client_ipc_comms_nodispatch: Could not init comms on:
/usr/local/var/run/crm/cib_callback
Jan 24 13:49:35 kvm1 crmd: [28686]: debug: cib_native_signon_raw:
Connection to callback channel failed
Jan 24 13:49:35 kvm1 crmd: [28686]: debug: cib_native_signon_raw:
Connection to CIB failed: connection failed
Jan 24 13:49:35 kvm1 crmd: [28686]: debug: cib_native_signoff: Signing
out of the CIB Service
Jan 24 13:49:35 kvm1 cib: [28682]: debug: activateCibXml: Triggering CIB
write for start op
Jan 24 13:49:35 kvm1 cib: [28682]: info: startCib: CIB Initialization
completed successfully
Jan 24 13:49:35 kvm1 cib: [28682]: info: get_cluster_type: Cluster type
is: 'cman'
Jan 24 13:49:35 kvm1 cib: [28682]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jan 24 13:49:35 kvm1 cib: [28682]: ERROR: init_cpg_connection: corosync
qorum is not supported in this build
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: ERROR: pcmk_child_exit: Child
process cib exited (pid=28682, rc=100)
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: notice: pcmk_child_exit: Child
process cib no longer wishes to be respawned
Jan 24 13:49:35 kvm1 pacemakerd: [28677]: info: update_node_processes:
Node kvm1 now has process list: 00000000000000000000000000010212 (was
00000000000000000000000000010312)
Jan 24 13:49:36 kvm1 crmd: [28686]: debug:
init_client_ipc_comms_nodispatch: Attempting to talk on:
/usr/local/var/run/crm/cib_rw
Jan 24 13:49:36 kvm1 crmd: [28686]: debug:
init_client_ipc_comms_nodispatch: Could not init comms on:
/usr/local/var/run/crm/cib_rw
Jan 24 13:49:36 kvm1 crmd: [28686]: debug: cib_native_signon_raw:
Connection to command channel failed
Jan 24 13:49:36 kvm1 crmd: [28686]: debug:
init_client_ipc_comms_nodispatch: Attempting to talk on:
/usr/local/var/run/crm/cib_callback
Jan 24 13:49:36 kvm1 crmd: [28686]: debug:
init_client_ipc_comms_nodispatch: Could not init comms on:
/usr/local/var/run/crm/cib_callback
Jan 24 13:49:36 kvm1 crmd: [28686]: debug: cib_native_signon_raw:
Connection to callback channel failed
Jan 24 13:49:36 kvm1 crmd: [28686]: debug: cib_native_signon_raw:
Connection to CIB failed: connection failed
Jan 24 13:49:36 kvm1 crmd: [28686]: debug: cib_native_signoff: Signing
out of the CIB Service
Jan 24 13:49:36 kvm1 crmd: [28686]: info: do_cib_control: Could not
connect to the CIB service: connection failed
Jan 24 13:49:36 kvm1 crmd: [28686]: WARN: do_cib_control: Couldn't
complete CIB registration 1 times... pause and retry
The last bit repeats until crmd gives up after 30 or so tries. If I launch
corosync and pacemaker as standalone daemons (without cman), there is no
such issue. Obviously I am missing something in the cman setup, and I just
can't figure out what. Any help is appreciated.
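In case it is useful, this is consistent with the cib daemon dying before it ever creates its IPC sockets, so crmd's retries can never succeed; a trivial check (socket paths taken from the log above, /usr/local prefix from my build):

```shell
# crmd retries connecting to these sockets; since the cib process
# exited (rc=100), they are never created and every retry fails.
for s in /usr/local/var/run/crm/cib_rw /usr/local/var/run/crm/cib_callback; do
    [ -S "$s" ] && echo "$s: present" || echo "$s: missing"
done
```

Both come back missing for as long as pacemakerd is in this state.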
Regards,
Borislav Borisov