[Pacemaker] pacemaker 1.1.6

emmanuel segura emi2fast at gmail.com
Tue Jun 3 11:06:26 CEST 2014


I was doing a NIC firmware upgrade and forgot to stop the cluster on the
node where I was working. Something strange happened: both nodes were
fenced at the same time.

I'm using sbd as the stonith device, with the following parameters:

watchdog timeout = 10; msgwait = 20; stonith-timeout = 40 (Pacemaker property)
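
For context, a minimal sketch of how these timeouts would typically be set;
the device path /dev/mapper/sbd0 is a hypothetical placeholder, not my
actual configuration:

# Initialize the sbd device header with watchdog timeout 10s (-1) and
# msgwait 20s (-4); msgwait is conventionally twice the watchdog timeout.
sbd -d /dev/mapper/sbd0 -1 10 -4 20 create

# stonith-timeout is a Pacemaker cluster property and should stay above
# msgwait, so the fencing operation isn't given up before sbd delivers.
crm configure property stonith-timeout=40s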
Log from node01:
May 31 14:41:48 node01 cluster-dlm: stop_kernel: clvmd stop_kernel cg 2
May 31 14:41:48 node01 corosync[76539]:  [CPG   ] chosen downlist: sender r(0) ip(191.255.5.201) ; members(old:2 left:1)
May 31 14:41:48 node01 cluster-dlm: do_sysfs: write "0" to "/sys/kernel/dlm/clvmd/control"
May 31 14:41:48 node01 crmd: [76549]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ]
May 31 14:41:48 node01 crmd: [76549]: info: update_dc: Unset DC node02
May 31 14:41:48 node01 corosync[76539]:  [MAIN  ] Completed service synchronization, ready to provide service.
May 31 14:41:48 node01 crmd: [76549]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
May 31 14:41:48 node01 crmd: [76549]: info: do_te_control: Registering TE UUID: 3f4ffc02-37c8-471d-bb82-43b23b6c96c4
May 31 14:41:48 node01 crmd: [76549]: info: set_graph_functions: Setting custom graph functions
May 31 14:41:48 node01 crmd: [76549]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
May 31 14:41:48 node01 crmd: [76549]: info: do_dc_takeover: Taking over DC status for this partition
May 31 14:41:48 node01 cib: [76545]: info: cib_process_readwrite: We are now in R/W mode
May 31 14:41:48 node01 cluster-dlm: fence_node_time: Node 1241907135/node02 has not been shot yet
May 31 14:41:48 node01 cib: [76545]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/179, version=0.1600.32): ok (rc=0)
May 31 14:41:48 node01 cib: [76545]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/180, version=0.1600.33): ok (rc=0)
May 31 14:41:48 node01 cib: [76545]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/182, version=0.1600.34): ok (rc=0)
May 31 14:41:48 node01 crmd: [76549]: info: join_make_offer: Making join offers based on membership 1356
May 31 14:41:48 node01 crmd: [76549]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
May 31 14:41:48 node01 crmd: [76549]: info: ais_dispatch_message: Membership 1356: quorum still lost
Log from node02:
May 31 14:41:48 node02 kernel: [905880.644815] qlcnic 0000:08:00.1: phy port: 1 switch_mode: 0,
May 31 14:41:48 node02 kernel: [905880.644818]     max_tx_q: 1 max_rx_q: 16 min_tx_bw: 0x0,
May 31 14:41:48 node02 kernel: [905880.644820]     max_tx_bw: 0x64 max_mtu:0x2580, capabilities: 0xdeea0fae
May 31 14:41:48 node02 crmd: [16192]: info: crmd_ais_dispatch: Setting expected votes to 2
May 31 14:41:48 node02 sbd: [36423]: WARN: CIB: We do NOT have quorum!
May 31 14:41:48 node02 sbd: [36420]: WARN: Pacemaker health check: UNHEALTHY
May 31 14:41:48 node02 crmd: [16192]: WARN: match_down_event: No match for shutdown action on node01
May 31 14:41:48 node02 crmd: [16192]: info: te_update_diff: Stonith/shutdown of node01 not matched
May 31 14:41:48 node02 crmd: [16192]: info: abort_transition_graph: te_update_diff:234 - Triggered transition abort (complete=1, tag=node_state, id=node01, magic=NA, cib=0.1600.33) : Node failure
May 31 14:41:48 node02 crmd: [16192]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
May 31 14:41:48 node02 crmd: [16192]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
May 31 14:41:48 node02 crmd: [16192]: info: do_pe_invoke: Query 1676: Requesting the current CIB: S_POLICY_ENGINE
May 31 14:41:48 node02 cib: [16188]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/1675, version=0.1600.35): ok (rc=0)
May 31 14:41:48 node02 cluster-dlm: fence_node_time: Node 1225129919/node01 has not been shot yet
May 31 14:41:48 node02 cluster-dlm: check_fencing_done: clvmd check_fencing 1225129919 wait add 1400654144 fail 1401540108 last 0
May 31 14:41:48 node02 kernel: [905880.676719] qlcnic 0000:08:00.1: Supports FW dump capability
May 31 14:41:48 node02 kernel: [905880.676728] qlcnic 0000:08:00.1: firmware v4.14.26
May 31 14:41:48 node02 crmd: [16192]: info: do_pe_invoke_callback: Invoking the PE: query=1676, ref=pe_calc-dc-1401540108-4630, seq=1356, quorate=0
May 31 14:41:48 node02 pengine: [16191]: notice: unpack_config: On loss of CCM Quorum: Ignore
May 31 14:41:48 node02 pengine: [16191]: WARN: pe_fence_node: Node node01 will be fenced because it is un-expectedly down
May 31 14:41:48 node02 pengine: [16191]: WARN: determine_online_status: Node node01 is unclean
May 31 14:41:48 node02 pengine: [16191]: WARN: custom_action: Action dlm:1_stop_0 on node01 is unrunnable (offline)
May 31 14:41:48 node02 pengine: [16191]: WARN: custom_action: Marking node node01 unclean
May 31 14:41:48 node02 pengine: [16191]: WARN: custom_action: Action clvm:1_stop_0 on node01 is unrunnable (offline)
May 31 14:41:48 node02 pengine: [16191]: WARN: custom_action: Marking node node01 unclean