[Pacemaker] Corosync / Pacemaker Cluster crashing

Andreas Kurz andreas at hastexo.com
Fri Apr 20 11:22:18 UTC 2012


On 04/20/2012 12:08 PM, Bensch, Kobus wrote:
> Hi
> 
> I have the following cluster setup:
> 
> 2 physical Dell servers with RHEL6.2 with all the latest patches.
> 
> Each server has 3 network connections that look like this:
> 
> BOND0 2 NICs
> 
> ETH4 for Corosync
> ETH6 for corosync
> 
> This is the corosync config:
> corosync.conf
> aisexec {
> group:root
> user:root
> }
> 
> compatibility: whitetank
> service {
> use_mgmtd:yes
> use_logd:yes
> ver:0
> name:pacemaker
> }

You also specified that service in /etc/corosync/service.d/pcmk, so it
is defined twice. Remove one of the two definitions. Even better: remove
the definition above and install the Pacemaker 1.1.6 and Corosync 1.4.x
packages that are available as a technology preview in RHEL 6.2.
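
A minimal sketch, in Python, of how the double definition could be
spotted mechanically. The two strings are copied from the quoted
corosync.conf and /etc/corosync/service.d/pcmk; the helper names
(service_names, duplicated_services) are made up for illustration and
are not part of corosync or Pacemaker. The parser is deliberately naive
and assumes that service blocks contain no nested braces.

```python
import re
from collections import Counter

# Captures the body of each "service { ... }" block (no nested braces assumed).
SERVICE_BLOCK = re.compile(r"service\s*\{([^{}]*)\}")
# Captures the value of a "name: value" line inside a block body.
NAME_LINE = re.compile(r"^\s*name\s*:\s*(\S+)", re.MULTILINE)


def service_names(config_text):
    """Return the 'name' value of every service block in config_text."""
    names = []
    for body in SERVICE_BLOCK.findall(config_text):
        body = re.sub(r"#.*", "", body)  # ignore comments
        match = NAME_LINE.search(body)
        if match:
            names.append(match.group(1))
    return names


def duplicated_services(*config_texts):
    """Return service names defined more than once across all inputs."""
    counts = Counter(
        name for text in config_texts for name in service_names(text)
    )
    return sorted(name for name, seen in counts.items() if seen > 1)


# Copied from the quoted corosync.conf.
COROSYNC_CONF = """
service {
use_mgmtd:yes
use_logd:yes
ver:0
name:pacemaker
}
"""

# Copied from the quoted /etc/corosync/service.d/pcmk.
PCMK_SNIPPET = """
service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver:  1
}
"""

print(duplicated_services(COROSYNC_CONF, PCMK_SNIPPET))
```

On a real node the same functions could be fed the contents of the two
files instead of the inline strings.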

> totem {
> rrp_mode:active
> join:180
> max_messages:20
> vsftype:none
> token:5000
> consensus:6000
> secauth:on
> token_retransmits_before_loss_const:10
> threads:0
> #threads:16
> version:2
> interface {
> bindnetaddr:10.255.1.0
> mcastaddr:232.10.1.1
> mcastport:5405
> ringnumber:0
> ttl:1
> }
> interface {
> bindnetaddr:10.255.2.0
> mcastaddr:232.10.2.1
> mcastport:5405
> ringnumber:1
> ttl:1
> }
> clear_node_high_bit:yes
> }
> logging {
> to_logfile:yes
> to_syslog:yes
> debug:off
> timestamp:on
> logfile: /var/log/cluster/corosync.log
> to_stderr:no
> fileline:off
> syslog_facility:daemon
> }
> amf {
> mode:disabled
> }
> 
> The pacemaker plugin:
> /etc/corosync/service.d/pcmk
> service {
>         # Load the Pacemaker Cluster Resource Manager
>         name: pacemaker
>         ver:  1
> }
> 
> Corosync keeps crashing when I try to do anything in the crm cli.
> Whether it is moving resources, creating resources, it does not matter.
> 
> The crm config for now is very simple and looks like this:
> node lxdcv01nd01
> node lxdcv01nd02
> primitive lcdcv01 ocf:heartbeat:IPaddr2 \
> params ip="10.1.0.95" cidr_netmask="32" \
> op monitor interval="30s"
> primitive local-manage ocf:heartbeat:IPaddr2 \
> params ip="127.0.2.1" cidr_netmask="32" \
> op monitor interval="30s"
> location cli-prefer-lcdcv01 lcdcv01 \
> rule $id="cli-prefer-rule-lcdcv01" inf: #uname eq lxdcv01nd02
> location cli-prefer-local-manage local-manage \
> rule $id="cli-prefer-rule-local-manage" inf: #uname eq lxdcv01nd02
> property $id="cib-bootstrap-options" \
> dc-version="1.0.12-unknown" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore"
> 
> I tried to disable various config lines but still no joy. Any help would
> be appreciated.
> 
> When the server crashes I get this in the log:
> Apr 20 10:54:17 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:17 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:17 corosync [TOTEM ] FAILED TO RECEIVE

Delayed mcast messages have been known to cause such errors, but that
was in older corosync versions and should not happen in recent ones. See
http://answerpot.com/showthread.php?1361794-corosync+crashes

Another reason to upgrade to recent versions ;-)
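
A rough Python sketch for checking a log against the points above: it
counts the "FAILED TO RECEIVE" lines per second and reads the corosync
version from the startup banner. The sample lines are taken from the
quoted log; the function names and the SUGGESTED_MINIMUM constant are
illustrative assumptions (the constant simply encodes the "Corosync
1.4.x" suggestion), not anything corosync itself provides.

```python
import re
from collections import Counter

# Timestamp of each TOTEM "FAILED TO RECEIVE" line, e.g. "Apr 20 10:54:17".
FAILURE = re.compile(
    r"(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) corosync \[TOTEM\s*\] FAILED TO RECEIVE"
)
# Version string from the startup banner, e.g. ('1.2.7').
ENGINE = re.compile(r"Corosync Cluster Engine \('([\d.]+)'\)")

# Assumption: encodes the "Corosync 1.4.x" suggestion from this thread.
SUGGESTED_MINIMUM = (1, 4)


def failures_per_second(log_text):
    """Count FAILED TO RECEIVE lines, keyed by their timestamp."""
    return Counter(FAILURE.findall(log_text))


def corosync_version(log_text):
    """Return the logged corosync version as a tuple of ints, or None."""
    match = ENGINE.search(log_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Lines copied from the quoted log.
SAMPLE = """\
> Apr 20 10:54:17 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:17 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:17 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:18 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:36 corosync [MAIN  ] Corosync Cluster Engine ('1.2.7'):
"""

for timestamp, count in sorted(failures_per_second(SAMPLE).items()):
    print(timestamp, count)

running = corosync_version(SAMPLE)
print("running:", running, "older than suggested:", running < SUGGESTED_MINIMUM)
```

Tuples compare element by element, so (1, 2, 7) sorts before (1, 4)
without any extra version-parsing library.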

Regards,
Andreas


> Apr 20 10:54:18 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:18 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:18 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:18 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:19 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:19 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:19 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:19 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:20 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:20 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:20 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:20 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:21 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:21 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:21 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:21 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:22 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:22 corosync [TOTEM ] FAILED TO RECEIVE
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [21259]:
> ERROR: ais_dispatch: Receiving message body failed: (2) Library error:
> Resource temporarily unavailable (11)
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [21254]:
> ERROR: ais_dispatch: Receiving message body failed: (2) Library error:
> Invalid argument (22)
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [21257]:
> ERROR: ais_dispatch: Receiving message body failed: (2) Library error:
> Resource temporarily unavailable (11)
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group cib: [21255]:
> ERROR: ais_dispatch: Receiving message body failed: (2) Library error:
> Resource temporarily unavailable (11)
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [21259]:
> ERROR: ais_dispatch: AIS connection failed
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [21254]:
> ERROR: ais_dispatch: AIS connection failed
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [21257]:
> ERROR: ais_dispatch: AIS connection failed
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [21259]:
> ERROR: crm_ais_destroy: AIS connection terminated
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group cib: [21255]:
> ERROR: ais_dispatch: AIS connection failed
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [21254]:
> ERROR: AIS connection terminated
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [21257]:
> CRIT: attrd_ais_destroy: Lost connection to OpenAIS service!
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group cib: [21255]:
> ERROR: cib_ais_destroy: AIS connection terminated
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [21257]:
> info: main: Exiting...
> Apr 20 10:54:22 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [21257]:
> ERROR: attrd_cib_connection_destroy: Connection to the CIB terminated...
> Apr 20 10:54:36 corosync [MAIN  ] Corosync Cluster Engine ('1.2.7'):
> started and ready to provide service.
> Apr 20 10:54:36 corosync [MAIN  ] Corosync built-in features: nss rdma
> Apr 20 10:54:36 corosync [MAIN  ] Successfully read main configuration
> file '/etc/corosync/corosync.conf'.
> Apr 20 10:54:36 corosync [TOTEM ] Initializing transport (UDP/IP).
> Apr 20 10:54:36 corosync [TOTEM ] Initializing transmit/receive
> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Apr 20 10:54:36 corosync [TOTEM ] Initializing transport (UDP/IP).
> Apr 20 10:54:36 corosync [TOTEM ] Initializing transmit/receive
> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Apr 20 10:54:36 corosync [TOTEM ] The network interface [10.255.1.1] is
> now up.
> Apr 20 10:54:36 corosync [pcmk  ] info: process_ais_conf: Reading configure
> Set r/w permissions for uid=0, gid=0 on /var/log/cluster/corosync.log
> Apr 20 10:54:36 corosync [pcmk  ] info: config_find_init: Local handle:
> 5650605097994944514 for logging
> Apr 20 10:54:36 corosync [pcmk  ] info: config_find_next: Processing
> additional logging options...
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found 'off' for
> option: debug
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found 'yes' for
> option: to_logfile
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found
> '/var/log/cluster/corosync.log' for option: logfile
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found 'yes' for
> option: to_syslog
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found 'daemon'
> for option: syslog_facility
> Apr 20 10:54:36 corosync [pcmk  ] info: config_find_init: Local handle:
> 2730409743423111171 for service
> Apr 20 10:54:36 corosync [pcmk  ] info: config_find_next: Processing
> additional service options...
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Defaulting to
> 'pcmk' for option: clustername
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found 'yes' for
> option: use_logd
> Apr 20 10:54:36 corosync [pcmk  ] info: get_config_opt: Found 'yes' for
> option: use_mgmtd
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
> Apr 20 10:54:36 corosync [pcmk  ] Logging: Initialized pcmk_startup
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_startup: Maximum core file
> size is: 18446744073709551615
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_startup: Service: 9
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_startup: Local hostname:
> lxdcv01nd01.bauer-uk.bauermedia.group
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_update_nodeid: Local node
> id: 16908042
> Apr 20 10:54:36 corosync [pcmk  ] info: update_member: Creating entry
> for node 16908042 born on 0
> Apr 20 10:54:36 corosync [pcmk  ] info: update_member: 0x18db8e0 Node
> 16908042 now known as lxdcv01nd01.bauer-uk.bauermedia.group (was: (null))
> Apr 20 10:54:36 corosync [pcmk  ] info: update_member: Node
> lxdcv01nd01.bauer-uk.bauermedia.group now has 1 quorum votes (was 0)
> Apr 20 10:54:36 corosync [pcmk  ] info: update_member: Node
> 16908042/lxdcv01nd01.bauer-uk.bauermedia.group is now: member
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22445
> for process stonithd
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22446
> for process cib
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22447
> for process lrmd
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [21256]:
> info: lrmd is shutting down
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> WARN: Initializing connection to logging daemon failed. Logging daemon
> may not be running
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]:
> WARN: Initializing connection to logging daemon failed. Logging daemon
> may not be running
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22448
> for process attrd
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> WARN: Initializing connection to logging daemon failed. Logging daemon
> may not be running
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: G_main_add_SignalHandler: Added signal handler for signal 10
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]:
> info: Signal sent to pid=21256, waiting for process to exit
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22449
> for process pengine
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> WARN: Initializing connection to logging daemon failed. Logging daemon
> may not be running
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: Invoked: /usr/lib64/heartbeat/cib 
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: G_main_add_SignalHandler: Added signal handler for signal 12
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group pengine: [22449]:
> WARN: Initializing connection to logging daemon failed. Logging daemon
> may not be running
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22450
> for process crmd
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: Invoked: /usr/lib64/heartbeat/attrd 
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: G_main_add_TriggerHandler: Added signal manual handler
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group pengine: [22449]:
> info: Invoked: /usr/lib64/heartbeat/pengine 
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> WARN: Initializing connection to logging daemon failed. Logging daemon
> may not be running
> Apr 20 10:54:36 corosync [pcmk  ] info: spawn_child: Forked child 22451
> for process mgmtd
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: main: Starting up
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: G_main_add_SignalHandler: Added signal handler for signal 17
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group pengine: [22449]:
> WARN: main: Terminating previous PE instance
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: retrieveCib: Reading cluster configuration from:
> /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: crm_cluster_connect: Connecting to OpenAIS
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: Pacemaker
> Cluster Manager 1.0.12
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: crm_cluster_connect: Connecting to OpenAIS
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group pengine: [21258]:
> WARN: process_pe_message: Received quit message, terminating
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: init_ais_connection_once: Creating connection to our AIS plugin
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: Invoked: /usr/lib64/heartbeat/crmd 
> Apr 20 10:54:36 corosync [SERV  ] Service failed to load 'pacemaker'.
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: init_ais_connection_once: Creating connection to our AIS plugin
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: main: CRM Hg Version: unknown
> 
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: corosync
> extended virtual synchrony service
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: corosync
> configuration service
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: crmd_init: Starting crmd
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: corosync
> cluster closed process group service v1.01
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: G_main_add_SignalHandler: Added signal handler for signal 17
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: corosync
> cluster config database access v1.01
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: corosync
> profile loading service
> Apr 20 10:54:36 corosync [SERV  ] Service engine loaded: corosync
> cluster quorum service v0.1
> Apr 20 10:54:36 corosync [MAIN  ] Compatibility mode set to whitetank.
>  Using V1 and V2 of the synchronization engine.
> Apr 20 10:54:36 corosync [TOTEM ] The network interface [10.255.2.1] is
> now up.
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: init_ais_connection_once: AIS connection established
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: startCib: CIB Initialization completed successfully
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: crm_cluster_connect: Connecting to OpenAIS
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: init_ais_connection_once: Creating connection to our AIS plugin
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: init_ais_connection_once: AIS connection established
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_ipc: Recorded connection
> 0x18e7150 for stonithd/22445
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_ipc: Recorded connection
> 0x18eb4b0 for attrd/22448
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: get_ais_nodeid: Server details: id=16908042
> uname=lxdcv01nd01.bauer-uk.bauermedia.group cname=pcmk
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: get_ais_nodeid: Server details: id=16908042
> uname=lxdcv01nd01.bauer-uk.bauermedia.group cname=pcmk
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: crm_new_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group now has
> id: 16908042
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: crm_new_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group now has
> id: 16908042
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: crm_new_peer: Node 16908042 is now known as
> lxdcv01nd01.bauer-uk.bauermedia.group
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: main: Cluster connection active
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: crm_new_peer: Node 16908042 is now known as
> lxdcv01nd01.bauer-uk.bauermedia.group
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: main: Accepting attribute updates
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: main: Starting mainloop...
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> notice: /usr/lib64/heartbeat/stonithd start up successfully.
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group stonithd: [22445]:
> info: G_main_add_SignalHandler: Added signal handler for signal 17
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: init_ais_connection_once: AIS connection established
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_ipc: Recorded connection
> 0x18ef810 for cib/22446
> Apr 20 10:54:36 corosync [pcmk  ] info: update_member: Node
> lxdcv01nd01.bauer-uk.bauermedia.group now has process list:
> 00000000000000000000000000053312 (340754)
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_ipc: Sending membership
> update 0 to cib
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: get_ais_nodeid: Server details: id=16908042
> uname=lxdcv01nd01.bauer-uk.bauermedia.group cname=pcmk
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: crm_new_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group now has
> id: 16908042
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: crm_new_peer: Node 16908042 is now known as
> lxdcv01nd01.bauer-uk.bauermedia.group
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: cib_init: Starting cib mainloop
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: ais_dispatch: Membership 0: quorum still lost
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: crm_update_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group:
> id=16908042 state=member (new) addr=(null) votes=1 (new) born=0 seen=0
> proc=00000000000000000000000000053312 (new)
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22455]:
> info: write_cib_contents: Archived previous version as
> /var/lib/heartbeat/crm/cib-80.raw
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22455]:
> info: write_cib_contents: Wrote version 0.89.0 of the CIB to disk
> (digest: e15d151e0fed09d1d411b21b345a8952)
> Apr 20 10:54:36 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22455]:
> info: retrieveCib: Reading cluster configuration from:
> /var/lib/heartbeat/crm/cib.cZHXQX (digest:
> /var/lib/heartbeat/crm/cib.U3NqAd)
> Apr 20 10:54:36 corosync [TOTEM ] Incrementing problem counter for seqid
> 1 iface 10.255.2.1 to [1 of 10]
> Apr 20 10:54:36 corosync [pcmk  ] notice: pcmk_peer_update: Transitional
> membership event on ring 208: memb=0, new=0, lost=0
> Apr 20 10:54:36 corosync [pcmk  ] notice: pcmk_peer_update: Stable
> membership event on ring 208: memb=1, new=1, lost=0
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_peer_update: NEW:
>  lxdcv01nd01.bauer-uk.bauermedia.group 16908042
> Apr 20 10:54:36 corosync [pcmk  ] info: pcmk_peer_update: MEMB:
> lxdcv01nd01.bauer-uk.bauermedia.group 16908042
> Apr 20 10:54:36 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Apr 20 10:54:36 corosync [MAIN  ] Completed service synchronization,
> ready to provide service.
> Apr 20 10:54:37 corosync [pcmk  ] notice: pcmk_peer_update: Transitional
> membership event on ring 212: memb=1, new=0, lost=0
> Apr 20 10:54:37 corosync [pcmk  ] info: pcmk_peer_update: memb:
> lxdcv01nd01.bauer-uk.bauermedia.group 16908042
> Apr 20 10:54:37 corosync [pcmk  ] notice: pcmk_peer_update: Stable
> membership event on ring 212: memb=2, new=1, lost=0
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: Creating entry
> for node 33685258 born on 212
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: Node
> 33685258/unknown is now: member
> Apr 20 10:54:37 corosync [pcmk  ] info: pcmk_peer_update: NEW:
>  .pending. 33685258
> Apr 20 10:54:37 corosync [pcmk  ] info: pcmk_peer_update: MEMB:
> lxdcv01nd01.bauer-uk.bauermedia.group 16908042
> Apr 20 10:54:37 corosync [pcmk  ] info: pcmk_peer_update: MEMB:
> .pending. 33685258
> Apr 20 10:54:37 corosync [pcmk  ] info: send_member_notification:
> Sending membership update 212 to 1 children
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: 0x18db8e0 Node
> 16908042 (lxdcv01nd01.bauer-uk.bauermedia.group) born on: 212
> Apr 20 10:54:37 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: ais_dispatch: Membership 212: quorum still lost
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: 0x18e6ac0 Node
> 33685258 ((null)) born on: 196
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: crm_new_peer: Node <null> now has id: 33685258
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: 0x18e6ac0 Node
> 33685258 now known as lxdcv01nd02.bauer-uk.bauermedia.group (was: (null))
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: crm_update_peer: Node (null): id=33685258 state=member (new)
> addr=r(0) ip(10.255.1.2) r(1) ip(10.255.2.2)  votes=0 born=0 seen=212
> proc=00000000000000000000000000000000
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: Node
> lxdcv01nd02.bauer-uk.bauermedia.group now has process list:
> 00000000000000000000000000013312 (78610)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: crm_update_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group:
> id=16908042 state=member addr=r(0) ip(10.255.1.1) r(1) ip(10.255.2.1)
>  (new) votes=1 born=0 seen=212 proc=00000000000000000000000000053312
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: Node
> lxdcv01nd02.bauer-uk.bauermedia.group now has 1 quorum votes (was 0)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> notice: ais_dispatch: Membership 212: quorum acquired
> Apr 20 10:54:37 corosync [pcmk  ] info: send_member_notification:
> Sending membership update 212 to 1 children
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: crm_get_peer: Node 33685258 is now known as
> lxdcv01nd02.bauer-uk.bauermedia.group
> Apr 20 10:54:37 corosync [pcmk  ] WARN: route_ais_message: Sending
> message to local.crmd failed: unknown (rc=-2)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: crm_update_peer: Node lxdcv01nd02.bauer-uk.bauermedia.group:
> id=33685258 state=member addr=r(0) ip(10.255.1.2) r(1) ip(10.255.2.2)
>  votes=1 (new) born=196 seen=212 proc=00000000000000000000000000013312 (new)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: cib_process_diff: Diff 0.91.3 -> 0.91.4 not applied to 0.89.0:
> current "epoch" is less than required
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: cib_server_process_diff: Requesting re-sync from peer
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> WARN: cib_diff_notify: Local-only Change (client:crmd, call: 77):
> -1.-1.-1 (Application of an update diff failed, requesting a full refresh)
> Apr 20 10:54:37 corosync [MAIN  ] Completed service synchronization,
> ready to provide service.
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> WARN: cib_server_process_diff: Not applying diff 0.91.4 -> 0.91.5 (sync
> in progress)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> WARN: cib_server_process_diff: Not applying diff 0.91.5 -> 0.91.6 (sync
> in progress)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> WARN: cib_server_process_diff: Not applying diff 0.91.6 -> 0.92.1 (sync
> in progress)
> Apr 20 10:54:37 corosync [pcmk  ] WARN: route_ais_message: Sending
> message to local.crmd failed: unknown (rc=-2)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22446]:
> info: cib_replace_notify: Local-only Replace: -1.-1.-1 from
> lxdcv01nd02.bauer-uk.bauermedia.group
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22456]:
> info: write_cib_contents: Archived previous version as
> /var/lib/heartbeat/crm/cib-81.raw
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22456]:
> info: write_cib_contents: Wrote version 0.92.0 of the CIB to disk
> (digest: 65cf2f5895618dbd08c40b8c39a479c5)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group cib: [22456]:
> info: retrieveCib: Reading cluster configuration from:
> /var/lib/heartbeat/crm/cib.8nhla0 (digest:
> /var/lib/heartbeat/crm/cib.nUDbdi)
> Apr 20 10:54:37 corosync [pcmk  ] WARN: route_ais_message: Sending
> message to local.crmd failed: unknown (rc=-2)
> Apr 20 10:54:37 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child
> process mgmtd exited (pid=22451, rc=100)
> Apr 20 10:54:37 corosync [pcmk  ] notice: pcmk_wait_dispatch: Child
> process mgmtd no longer wishes to be respawned
> Apr 20 10:54:37 corosync [pcmk  ] info: update_member: Node
> lxdcv01nd01.bauer-uk.bauermedia.group now has process list:
> 00000000000000000000000000013312 (78610)
> Apr 20 10:54:37 corosync [pcmk  ] WARN: route_ais_message: Sending
> message to local.crmd failed: unknown (rc=-2)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]:
> info: G_main_add_SignalHandler: Added signal handler for signal 15
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]:
> info: G_main_add_SignalHandler: Added signal handler for signal 17
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]:
> info: enabling coredumps
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]:
> info: G_main_add_SignalHandler: Added signal handler for signal 10
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]:
> info: G_main_add_SignalHandler: Added signal handler for signal 12
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]:
> info: Started.
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: do_cib_control: CIB connection established
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: crm_cluster_connect: Connecting to OpenAIS
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: init_ais_connection_once: Creating connection to our AIS plugin
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: init_ais_connection_once: AIS connection established
> Apr 20 10:54:37 corosync [pcmk  ] info: pcmk_ipc: Recorded connection
> 0x18f53c0 for crmd/22450
> Apr 20 10:54:37 corosync [pcmk  ] info: pcmk_ipc: Sending membership
> update 212 to crmd
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: get_ais_nodeid: Server details: id=16908042
> uname=lxdcv01nd01.bauer-uk.bauermedia.group cname=pcmk
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: crm_new_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group now has
> id: 16908042
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: crm_new_peer: Node 16908042 is now known as
> lxdcv01nd01.bauer-uk.bauermedia.group
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: do_ha_control: Connected to the cluster
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: do_started: Delaying start, CCM (0000000000100000) not connected
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: crmd_init: Starting crmd's mainloop
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: config_query_callback: Checking for expired actions every 900000ms
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: config_query_callback: Sending expected-votes=2 to corosync
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> notice: ais_dispatch: Membership 212: quorum acquired
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: crm_new_peer: Node lxdcv01nd02.bauer-uk.bauermedia.group now has
> id: 33685258
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: crm_new_peer: Node 33685258 is now known as
> lxdcv01nd02.bauer-uk.bauermedia.group
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: crm_update_peer: Node lxdcv01nd02.bauer-uk.bauermedia.group:
> id=33685258 state=member (new) addr=r(0) ip(10.255.1.2) r(1)
> ip(10.255.2.2)  votes=1 born=196 seen=212
> proc=00000000000000000000000000013312
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: crm_update_peer: Node lxdcv01nd01.bauer-uk.bauermedia.group:
> id=16908042 state=member (new) addr=r(0) ip(10.255.1.1) r(1)
> ip(10.255.2.1)  (new) votes=1 (new) born=212 seen=212
> proc=00000000000000000000000000013312 (new)
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: do_started: The local CRM is operational
> Apr 20 10:54:37 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: do_state_transition: State transition S_STARTING -> S_PENDING [
> input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group pengine: [22449]:
> info: main: Starting pengine
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: ais_dispatch: Membership 212: quorum retained
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: update_dc: Set DC to lxdcv01nd02.bauer-uk.bauermedia.group (3.0.1)
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: update_attrd: Connecting to attrd...
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: find_hash_entry: Creating hash entry for terminate
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: do_state_transition: State transition S_PENDING -> S_NOT_DC [
> input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: find_hash_entry: Creating hash entry for shutdown
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: attrd_local_callback: Sending full refresh (origin=crmd)
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: erase_xpath_callback: Deletion of
> "//node_state[@uname='lxdcv01nd01.bauer-uk.bauermedia.group']/transient_attributes":
> ok (rc=0)
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: crm_new_peer: Node lxdcv01nd02.bauer-uk.bauermedia.group now has
> id: 33685258
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: crm_new_peer: Node 33685258 is now known as
> lxdcv01nd02.bauer-uk.bauermedia.group
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: find_hash_entry: Creating hash entry for probe_complete
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: attrd_perform_update: Delaying operation probe_complete=<null>:
> cib not connected
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: do_lrm_rsc_op: Performing
> key=6:4:7:e6a3b9c7-c24d-497a-9c07-d6082ee231a9 op=local-manage_monitor_0 )
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]:
> info: rsc:local-manage:2: probe
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: do_lrm_rsc_op: Performing
> key=7:4:7:e6a3b9c7-c24d-497a-9c07-d6082ee231a9 op=lcdcv01_monitor_0 )
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]:
> info: rsc:lcdcv01:3: probe
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: process_lrm_event: LRM operation lcdcv01_monitor_0 (call=3, rc=0,
> cib-update=7, confirmed=true) ok
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: process_lrm_event: LRM operation local-manage_monitor_0 (call=2,
> rc=7, cib-update=8, confirmed=true) not running
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: attrd_trigger_update: Sending flush op to all hosts for:
> probe_complete (true)
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: attrd_perform_update: Delaying operation probe_complete=true: cib
> not connected
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: do_lrm_rsc_op: Performing
> key=9:5:0:e6a3b9c7-c24d-497a-9c07-d6082ee231a9 op=lcdcv01_stop_0 )
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]:
> info: rsc:lcdcv01:4: stop
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: attrd_trigger_update: Sending flush op to all hosts for:
> probe_complete (true)
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: attrd_perform_update: Delaying operation probe_complete=true: cib
> not connected
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]:
> info: RA output: (lcdcv01:stop:stderr) logd is not running
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]:
> info: process_lrm_event: LRM operation lcdcv01_stop_0 (call=4, rc=0,
> cib-update=9, confirmed=true) ok
> Apr 20 10:54:38 corosync [TOTEM ] ring 1 active with no faults
> Apr 20 10:54:41 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: cib_connect: Connected to the CIB after 1 signon attempts
> Apr 20 10:54:41 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: cib_connect: Sending full refresh
> Apr 20 10:54:41 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: attrd_trigger_update: Sending flush op to all hosts for:
> probe_complete (true)
> Apr 20 10:54:41 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]:
> info: attrd_perform_update: Sent update 4: probe_complete=true
> 



