[Pacemaker] Issue with an isolated node overriding CIB after rejoining main cluster

Howley, Tom tom.howley at hp.com
Fri Jul 12 08:49:41 EDT 2013


Hi,

pacemaker:1.1.6-2ubuntu3, corosync:1.4.2-2, drbd8-utils 2:8.3.11-0ubuntu1

I have a three-node setup, with two nodes running DRBD, resource-level fencing enabled ('resource-and-stonith') and, obviously, stonith configured for each node. In my current test case, I bring down the network interface on the DRBD Primary/Master node (using 'ifdown eth0', for example), which sometimes leads to split-brain when the isolated node rejoins the cluster. The serious problem is that upon rejoining, the isolated node is promoted to DRBD Primary (despite the original fencing constraint), which opens us up to data loss for updates that occurred while that node was down.
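
For reference, the relevant DRBD fencing configuration is roughly the following (a minimal sketch; the resource name 'tomtest' is taken from the logs below, and the handler paths are the stock scripts shipped with drbd8-utils):

    resource tomtest {
      disk {
        fencing resource-and-stonith;
      }
      handlers {
        # called when the peer becomes unreachable; adds a constraint
        # blocking promotion elsewhere until resync completes
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
      # ... devices, addresses, etc. omitted
    }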

The exact problem scenario is as follows:

- Alice: DRBD Primary/Master, Bob: Secondary/Slave, Jim: quorum node, Epoch=100

- ifdown eth0 on Alice

- Alice detects the loss of its network interface, elects itself DC and carries out some CIB updates (see log snippet below) that raise the epoch level, say to Epoch=102

- Alice is shot via stonith.

- Bob adds a fencing rule to the CIB to prevent promotion of DRBD on any other node, Epoch=101 (see the constraint sketch after this list)

- When Alice comes back and rejoins the cluster, the DC decides to sync to Alice's CIB, thereby removing the fencing rule prematurely (i.e. before the DRBD devices have resynced).

- In some cases, Alice is then promoted to Primary/Master and fences the resource to prevent promotion on any other node.

- We now have split-brain and potential loss of data.
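
For context, the fencing rule that Bob adds in the scenario above is the location constraint created by crm-fence-peer.sh, which looks along these lines (a sketch; the ids are illustrative, the master resource name is taken from the logs below, and 'bob' stands for the surviving node's uname):

    <rsc_location rsc="ms_drbd_tomtest" id="drbd-fence-by-handler-ms_drbd_tomtest">
      <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-rule-ms_drbd_tomtest">
        <!-- forbid the Master role on every node other than the one with good data -->
        <expression attribute="#uname" operation="ne" value="bob"
                    id="drbd-fence-by-handler-expr-ms_drbd_tomtest"/>
      </rule>
    </rsc_location>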

So some questions on the above:

1. My initial feeling was that the isolated node, Alice (which has no quorum), should not be updating a CIB that could potentially override the sane part of the cluster. Is that a fair comment?

2. Is this issue just particular to my use of 'ifdown ethX' to disable the network? This is hinted at here: https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface. Has this issue been addressed, or will it be in the future?

3. If 'ifdown ethX' is not valid, what is the best alternative that mimics what might happen in the real world? I have tried blocking connections using iptables rules, dropping all incoming and outgoing packets (roughly as sketched after these questions); initial testing appears to show different corosync behaviour that would hopefully not lead to my problem scenario, but I'm still in the process of confirming. I have also carried out some cable pulls and not run into issues yet, but this problem can be intermittent, so it really needs an automated way to test many times.

4. The log snippet below from the isolated node shows that it updates the CIB twice shortly after detecting the loss of the network interface. Why does this happen? I believe that ultimately it is these CIB updates that increment the epoch, which leads to this CIB overriding the cluster later.
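
Regarding question 3 above, the iptables-based isolation mentioned there is roughly the following (a sketch, assuming the cluster and DRBD traffic all run over eth0):

    # isolate the node without taking the interface down
    iptables -A INPUT  -i eth0 -j DROP
    iptables -A OUTPUT -o eth0 -j DROP

    # later, restore connectivity by removing the same rules
    iptables -D INPUT  -i eth0 -j DROP
    iptables -D OUTPUT -o eth0 -j DROP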

I have also tried a no-quorum-policy of 'suicide' in an attempt to prevent CIB updates by Alice, but it didn't make a difference. Note that, to facilitate log collection and analysis, I have added a delay to the stonith reset operation, but I have also set the timeout on the crm-fence-peer script to ensure that it is greater than this 'deadtime'.
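
For completeness, the two settings referred to above were along these lines (a sketch; the timeout value is illustrative and the exact option spelling should be checked against the crm-fence-peer.sh usage):

    # cluster property tried in an attempt to stop the isolated node updating the CIB
    crm configure property no-quorum-policy=suicide

    # DRBD handler invoked with a timeout greater than the stonith delay ("deadtime")
    handlers {
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh --timeout 90";
    }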

Any advice on this would be greatly appreciated.

Thanks,

Tom

Log snippet showing the isolated node updating the CIB, which results in the epoch being incremented twice:

Jul 10 13:42:54 stratus18 corosync[1268]:   [TOTEM ] A processor failed, forming new configuration.
Jul 10 13:42:54 stratus18 corosync[1268]:   [TOTEM ] The network interface is down.
Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20758]: TOMTEST-DEBUG: modified version
Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20758]: invoked for tomtest
Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20761]: TOMTEST-DEBUG: modified version
Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20761]: invoked for tomtest
Jul 10 13:42:55 stratus18 stonith-ng: [1276]: info: stonith_command: Processed st_execute from lrmd: rc=-1
Jul 10 13:42:55 stratus18 external/ipmi[20806]: [20816]: ERROR: error executing ipmitool: Connect failed: Network is unreachable#015 Unable to get Chassis Power Status#015
Jul 10 13:42:55 stratus18 crm-fence-peer.sh[20758]: Call cib_query failed (-41): Remote node did not respond
Jul 10 13:42:55 stratus18 crm-fence-peer.sh[20761]: Call cib_query failed (-41): Remote node did not respond
Jul 10 13:42:55 stratus18 ntpd[1062]: Deleting interface #7 eth0, 192.168.185.150#123, interface stats: received=0, sent=0, dropped=0, active_time=912 secs
Jul 10 13:42:55 stratus18 ntpd[1062]: Deleting interface #4 eth0, fe80::7ae7:d1ff:fe22:5270#123, interface stats: received=0, sent=0, dropped=0, active_time=6080 secs
Jul 10 13:42:55 stratus18 ntpd[1062]: Deleting interface #3 eth0, 192.168.185.118#123, interface stats: received=52, sent=53, dropped=0, active_time=6080 secs
Jul 10 13:42:55 stratus18 ntpd[1062]: 192.168.8.97 interface 192.168.185.118 -> (none)
Jul 10 13:42:55 stratus18 ntpd[1062]: peers refreshed
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 2728: memb=1, new=0, lost=2
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: pcmk_peer_update: memb: .unknown. 16777343
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: pcmk_peer_update: lost: stratus18 1991878848
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: pcmk_peer_update: lost: stratus20 2025433280
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 2728: memb=1, new=0, lost=0
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Creating entry for node 16777343 born on 2728
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Node 16777343/unknown is now: member
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: pcmk_peer_update: MEMB: .pending. 16777343
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] ERROR: pcmk_peer_update: Something strange happened: 1
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: ais_mark_unseen_peer_dead: Node stratus17 was not seen in the previous transition
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Node 1975101632/stratus17 is now: lost
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: ais_mark_unseen_peer_dead: Node stratus18 was not seen in the previous transition
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Node 1991878848/stratus18 is now: lost
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: ais_mark_unseen_peer_dead: Node stratus20 was not seen in the previous transition
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Node 2025433280/stratus20 is now: lost
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] WARN: pcmk_update_nodeid: Detected local node id change: 1991878848 -> 16777343
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: destroy_ais_node: Destroying entry for node 1991878848
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] notice: ais_remove_peer: Removed dead peer 1991878848 from the membership list
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: ais_remove_peer: Sending removal of 1991878848 to 2 children
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: 0x13d9520 Node 16777343 now known as stratus18 (was: (null))
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Node stratus18 now has 1 quorum votes (was 0)
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Node stratus18 now has process list: 00000000000000000000000000111312 (1118994)
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: send_member_notification: Sending membership update 2728 to 2 children
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: 0x13d9520 Node 16777343 ((null)) born on: 2708
Jul 10 13:42:55 stratus18 corosync[1268]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_get_peer: Node stratus18 now has id: 16777343
Jul 10 13:42:55 stratus18 cib: [1277]: info: ais_dispatch_message: Membership 2728: quorum retained
Jul 10 13:42:55 stratus18 cib: [1277]: info: ais_dispatch_message: Removing peer 1991878848/1991878848
Jul 10 13:42:55 stratus18 cib: [1277]: info: reap_crm_member: Peer 1991878848 is unknown
Jul 10 13:42:55 stratus18 cib: [1277]: notice: ais_dispatch_message: Membership 2728: quorum lost
Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_update_peer: Node stratus17: id=1975101632 state=lost (new) addr=r(0) ip(192.168.185.117)  votes=1 born=2724 seen=2724 proc=00000000000000000000000000111312
Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_update_peer: Node stratus20: id=2025433280 state=lost (new) addr=r(0) ip(192.168.185.120)  votes=1 born=4 seen=2724 proc=00000000000000000000000000111312
Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_get_peer: Node stratus18 now has id: 1991878848
Jul 10 13:42:55 stratus18 corosync[1268]:   [CPG   ] chosen downlist: sender r(0) ip(127.0.0.1) ; members(old:3 left:3)
Jul 10 13:42:55 stratus18 corosync[1268]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_get_peer: Node stratus18 now has id: 16777343
Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Membership 2728: quorum retained
Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Removing peer 1991878848/1991878848
Jul 10 13:42:55 stratus18 crmd: [1281]: info: reap_crm_member: Peer 1991878848 is unknown
Jul 10 13:42:55 stratus18 crmd: [1281]: notice: ais_dispatch_message: Membership 2728: quorum lost
Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_status_callback: status: stratus17 is now lost (was member)
Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_update_peer: Node stratus17: id=1975101632 state=lost (new) addr=r(0) ip(192.168.185.117)  votes=1 born=2724 seen=2724 proc=00000000000000000000000000111312
Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_status_callback: status: stratus20 is now lost (was member)
Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_update_peer: Node stratus20: id=2025433280 state=lost (new) addr=r(0) ip(192.168.185.120)  votes=1 born=4 seen=2724 proc=00000000000000000000000000111312
Jul 10 13:42:55 stratus18 crmd: [1281]: WARN: check_dead_member: Our DC node (stratus20) left the cluster
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ]
Jul 10 13:42:55 stratus18 crmd: [1281]: info: update_dc: Unset DC stratus20
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_te_control: Registering TE UUID: 6e335eff-5e48-4fc1-9003-0537ae948dfd
Jul 10 13:42:55 stratus18 crmd: [1281]: info: set_graph_functions: Setting custom graph functions
Jul 10 13:42:55 stratus18 crmd: [1281]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_takeover: Taking over DC status for this partition
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_readwrite: We are now in R/W mode
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/57, version=0.76.46): ok (rc=0)
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/58, version=0.76.47): ok (rc=0)
Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_get_peer: Node stratus18 now has id: 16777343
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/60, version=0.76.48): ok (rc=0)
Jul 10 13:42:55 stratus18 crmd: [1281]: info: join_make_offer: Making join offers based on membership 2728
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Membership 2728: quorum still lost
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/62, version=0.76.49): ok (rc=0)
Jul 10 13:42:55 stratus18 crmd: [1281]: info: crmd_ais_dispatch: Setting expected votes to 2
Jul 10 13:42:55 stratus18 crmd: [1281]: info: update_dc: Set DC to stratus18 (3.0.5)
Jul 10 13:42:55 stratus18 crmd: [1281]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms
Jul 10 13:42:55 stratus18 crmd: [1281]: info: config_query_callback: Checking for expired actions every 900000ms
Jul 10 13:42:55 stratus18 crmd: [1281]: info: config_query_callback: Sending expected-votes=3 to corosync
Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Membership 2728: quorum still lost
Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_expected_votes: Expected quorum votes 2 -> 3
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <cib admin_epoch="0" epoch="76" num_updates="49" >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -   <configuration >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -     <crm_config >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -       <cluster_property_set id="cib-bootstrap-options" >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -         <nvpair value="3" id="cib-bootstrap-options-expected-quorum-votes" />
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -       </cluster_property_set>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -     </crm_config>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -   </configuration>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </cib>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <cib admin_epoch="0" cib-last-written="Wed Jul 10 13:25:58 2013" crm_feature_set="3.0.5" epoch="77" have-quorum="1" num_updates="1" update-client="crmd" update-origin="stratus17" validate-with="pacemaker-1.2" dc-uuid="stratus20" >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +   <configuration >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +     <crm_config >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +       <cluster_property_set id="cib-bootstrap-options" >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +         <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2" />
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +       </cluster_property_set>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +     </crm_config>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +   </configuration>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </cib>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/65, version=0.77.1): ok (rc=0)
Jul 10 13:42:55 stratus18 crmd: [1281]: info: crmd_ais_dispatch: Setting expected votes to 3
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_finalize: join-1: Syncing the CIB from stratus18 to the rest of the cluster
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <cib admin_epoch="0" epoch="77" num_updates="1" >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -   <configuration >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -     <crm_config >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -       <cluster_property_set id="cib-bootstrap-options" >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -         <nvpair value="2" id="cib-bootstrap-options-expected-quorum-votes" />
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -       </cluster_property_set>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -     </crm_config>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -   </configuration>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </cib>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <cib admin_epoch="0" cib-last-written="Wed Jul 10 13:42:55 2013" crm_feature_set="3.0.5" epoch="78" have-quorum="1" num_updates="1" update-client="crmd" update-origin="stratus18" validate-with="pacemaker-1.2" dc-uuid="stratus20" >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +   <configuration >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +     <crm_config >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +       <cluster_property_set id="cib-bootstrap-options" >
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +         <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="3" />
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +       </cluster_property_set>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +     </crm_config>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +   </configuration>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </cib>
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/68, version=0.78.1): ok (rc=0)
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/69, version=0.78.1): ok (rc=0)
Jul 10 13:42:55 stratus18 lrmd: [1278]: info: stonith_api_device_metadata: looking up external/ipmi/heartbeat metadata
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/70, version=0.78.2): ok (rc=0)
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_ack: join-1: Updating node state to member for stratus18
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='stratus18']/lrm (origin=local/crmd/71, version=0.78.3): ok (rc=0)
Jul 10 13:42:55 stratus18 crmd: [1281]: info: erase_xpath_callback: Deletion of "//node_state[@uname='stratus18']/lrm": ok (rc=0)
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_update_quorum: Updating quorum status to false (call=75)
Jul 10 13:42:55 stratus18 crmd: [1281]: info: abort_transition_graph: do_te_invoke:167 - Triggered transition abort (complete=1) : Peer Cancelled
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_pe_invoke: Query 76: Requesting the current CIB: S_POLICY_ENGINE
Jul 10 13:42:55 stratus18 attrd: [1279]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jul 10 13:42:55 stratus18 attrd: [1279]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/73, version=0.78.5): ok (rc=0)
Jul 10 13:42:55 stratus18 crmd: [1281]: WARN: match_down_event: No match for shutdown action on stratus17
Jul 10 13:42:55 stratus18 crmd: [1281]: info: te_update_diff: Stonith/shutdown of stratus17 not matched
Jul 10 13:42:55 stratus18 crmd: [1281]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=stratus17, magic=NA, cib=0.78.6) : Node failure
Jul 10 13:42:55 stratus18 crmd: [1281]: WARN: match_down_event: No match for shutdown action on stratus20
Jul 10 13:42:55 stratus18 crmd: [1281]: info: te_update_diff: Stonith/shutdown of stratus20 not matched
Jul 10 13:42:55 stratus18 crmd: [1281]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=stratus20, magic=NA, cib=0.78.6) : Node failure
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_pe_invoke: Query 77: Requesting the current CIB: S_POLICY_ENGINE
Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_pe_invoke: Query 78: Requesting the current CIB: S_POLICY_ENGINE
Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/75, version=0.78.7): ok (rc=0)
Jul 10 13:42:56 stratus18 crmd: [1281]: info: do_pe_invoke_callback: Invoking the PE: query=78, ref=pe_calc-dc-1373460176-49, seq=2728, quorate=0
Jul 10 13:42:56 stratus18 attrd: [1279]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_tomtest:0 (10000)
Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: cluster_status: We do not have quorum - fencing and resource management disabled
Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: pe_fence_node: Node stratus17 will be fenced because it is un-expectedly down
Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: determine_online_status: Node stratus17 is unclean
Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: pe_fence_node: Node stratus20 will be fenced because it is un-expectedly down
Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: determine_online_status: Node stratus20 is unclean
Jul 10 13:42:56 stratus18 pengine: [1280]: notice: unpack_rsc_op: Hard error - drbd_tomtest:0_last_failure_0 failed with rc=5: Preventing ms_drbd_tomtest from re-starting on stratus20
Jul 10 13:42:56 stratus18 pengine: [1280]: notice: unpack_rsc_op: Hard error - tomtest_mysql_SERVICE_last_failure_0 failed with rc=5: Preventing tomtest_mysql_SERVICE from re-starting on stratus20
