[Pacemaker] Issue with an isolated node overriding CIB after rejoining main cluster

Andrew Beekhof andrew at beekhof.net
Sun Jul 14 20:52:05 EDT 2013


On 12/07/2013, at 10:49 PM, "Howley, Tom" <tom.howley at hp.com> wrote:

> Hi,
>  
> pacemaker:1.1.6-2ubuntu3,

ouch

> corosync:1.4.2-2, drbd8-utils 2:8.3.11-0ubuntu1
>  
> I have a three-node setup, with two nodes running DRBD, resource-level fencing enabled (‘resource-and-stonith’) and, obviously, stonith configured for each node. In my current test case, I bring down the network interface on the DRBD primary/master node (using ifdown eth0, for example), which sometimes leads to split-brain when the isolated node rejoins the cluster. The serious problem is that upon rejoining, the isolated node is promoted to DRBD primary (despite the original fencing constraint), which opens us up to data loss for updates that occurred while that node was down.
>  
> The exact problem scenario is as follows:
> - Alice: DRBD Primary/Master, Bob: Secondary/Slave, Jim: Quorum node, Epoch=100
> - ifdown eth0 on Alice
> - Alice detects the loss of its network interface, sets itself up as DC, and carries out some CIB updates (see log snippet below) that raise the epoch, say to Epoch=102

The epoch is bumped after an election and a configuration change, but NOT after a status change,
so it shouldn't be making it to 102.
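If you want to confirm where the bump happens, the version fields on the <cib> element are easy to watch while reproducing. A minimal sketch (assumes cibadmin is in the PATH on a cluster node; the parsing itself is just a grep over the header line):

```shell
# Pull the three version fields out of a <cib ...> element.
# Configuration changes bump "epoch"; status-only changes bump "num_updates".
parse_cib_version() {
    head -n1 | grep -o -E '(admin_epoch|epoch|num_updates)="[0-9]+"'
}

# On a live cluster you would feed it the real CIB (hypothetical usage):
#   cibadmin --query | parse_cib_version
```

Run it before and after each step of the scenario and you can see exactly which action incremented the epoch.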

> - Alice is shot via stonith.
> - Bob adds a fencing rule to the CIB to prevent promotion of DRBD on any other node, Epoch=101
> - When Alice comes back and rejoins the cluster, the DC decides to sync to Alice's CIB, thereby removing the fencing rule prematurely (i.e. before the DRBD devices have resynced).
> - In some cases: Alice is promoted to Primary/Master and fences the resource to prevent promotion on any other node.
> - We now have split-brain and potential loss of data.
>  
> So some questions on the above:
> 1. My initial feeling was that the isolated node, Alice (which has no quorum), should not be updating a CIB that could potentially override the sane part of the cluster. Is that a fair comment?

Not as currently designed.  Although there may be some improvements we can make in that area.

> 2. Is this issue just particular to my use of ‘ifdown ethX’ to disable the network? This is hinted at in the Corosync wiki (https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface). Has this issue been addressed, or will it be in the future?
> 3. If ‘ifdown ethX’ is not valid, what is the best alternative that mimics what might happen in the real world? I have tried blocking connections using iptables rules, dropping all incoming and outgoing packets; initial testing appears to show different corosync behaviour that would hopefully not lead to my problem scenario, but I'm still in the process of confirming. I have also carried out some cable pulls without running into issues yet, but this problem can be intermittent, so it really needs an automated way to test many times.
> 4. The log snippet below from the isolated node shows that it updates the CIB twice sometime after detecting the loss of its network interface. Why does this happen? I believe it is ultimately these CIB updates that increment the epoch, which leads to this CIB overriding the cluster's copy later.
>  
> I have also tried a no-quorum-policy of ‘suicide’ in an attempt to prevent CIB updates by Alice, but it didn't make a difference.

Why isn't your normal fencing device working?
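On the ifdown question: blocking traffic with iptables is generally a closer simulation of a real failure, because the interface (and the address corosync binds to) stays up, so the node id does not flip to 127.0.0.1 the way it does in your logs. A rough sketch, assuming eth0 is the cluster interface (hypothetical helpers; run as root on the node to isolate):

```shell
# Simulate a partition without taking the interface down, so corosync
# keeps its bound address and node id (unlike ifdown, which makes it
# rebind to 127.0.0.1).
CLUSTER_IF=eth0   # assumption: adjust to your ring interface

isolate_node() {
    # Drop all traffic in both directions on the cluster interface
    iptables -A INPUT  -i "$CLUSTER_IF" -j DROP
    iptables -A OUTPUT -o "$CLUSTER_IF" -j DROP
}

rejoin_node() {
    # Remove the rules added above
    iptables -D INPUT  -i "$CLUSTER_IF" -j DROP
    iptables -D OUTPUT -o "$CLUSTER_IF" -j DROP
}
```

Wrapping isolate_node/rejoin_node in a loop would also give you the automated repeated testing you mention in question 3.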

> Note that to facilitate log collection and analysis, I have added a delay to the stonith reset operation, but I have also set the timeout on the crm-fence-peer script to ensure that it is greater than this ‘deadtime’.
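For comparison, the DRBD side of that arrangement usually looks something like this (a sketch only: the resource name is taken from your logs, and the paths and the 90-second value are assumptions, illustrating keeping the fence-peer timeout above the stonith delay):

```
resource tomtest {
  disk {
    fencing resource-and-stonith;
  }
  handlers {
    # keep --timeout greater than any delay added to the stonith reset
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh --timeout 90";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```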
>  
> Any advice on this would be greatly appreciated.
>  
> Thanks,
>  
> Tom
>  
> Log snippet showing isolated node updating the CIB, which results in epoch being incremented two times:
>  
> Jul 10 13:42:54 stratus18 corosync[1268]:   [TOTEM ] A processor failed, forming new configuration.
> Jul 10 13:42:54 stratus18 corosync[1268]:   [TOTEM ] The network interface is down.
> Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20758]: TOMTEST-DEBUG: modified version
> Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20758]: invoked for tomtest
> Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20761]: TOMTEST-DEBUG: modified version
> Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20761]: invoked for tomtest
> Jul 10 13:42:55 stratus18 stonith-ng: [1276]: info: stonith_command: Processed st_execute from lrmd: rc=-1
> Jul 10 13:42:55 stratus18 external/ipmi[20806]: [20816]: ERROR: error executing ipmitool: Connect failed: Network is unreachable#015 Unable to get Chassis Power Status#015
> Jul 10 13:42:55 stratus18 crm-fence-peer.sh[20758]: Call cib_query failed (-41): Remote node did not respond
> Jul 10 13:42:55 stratus18 crm-fence-peer.sh[20761]: Call cib_query failed (-41): Remote node did not respond
> Jul 10 13:42:55 stratus18 ntpd[1062]: Deleting interface #7 eth0, 192.168.185.150#123, interface stats: received=0, sent=0, dropped=0, active_time=912 secs
> Jul 10 13:42:55 stratus18 ntpd[1062]: Deleting interface #4 eth0, fe80::7ae7:d1ff:fe22:5270#123, interface stats: received=0, sent=0, dropped=0, active_time=6080 secs
> Jul 10 13:42:55 stratus18 ntpd[1062]: Deleting interface #3 eth0, 192.168.185.118#123, interface stats: received=52, sent=53, dropped=0, active_time=6080 secs
> Jul 10 13:42:55 stratus18 ntpd[1062]: 192.168.8.97 interface 192.168.185.118 -> (none)
> Jul 10 13:42:55 stratus18 ntpd[1062]: peers refreshed
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 2728: memb=1, new=0, lost=2
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: pcmk_peer_update: memb: .unknown. 16777343
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: pcmk_peer_update: lost: stratus18 1991878848
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: pcmk_peer_update: lost: stratus20 2025433280
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 2728: memb=1, new=0, lost=0
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Creating entry for node 16777343 born on 2728
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Node 16777343/unknown is now: member
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: pcmk_peer_update: MEMB: .pending. 16777343
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] ERROR: pcmk_peer_update: Something strange happened: 1
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: ais_mark_unseen_peer_dead: Node stratus17 was not seen in the previous transition
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Node 1975101632/stratus17 is now: lost
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: ais_mark_unseen_peer_dead: Node stratus18 was not seen in the previous transition
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Node 1991878848/stratus18 is now: lost
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: ais_mark_unseen_peer_dead: Node stratus20 was not seen in the previous transition
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Node 2025433280/stratus20 is now: lost
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] WARN: pcmk_update_nodeid: Detected local node id change: 1991878848 -> 16777343
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: destroy_ais_node: Destroying entry for node 1991878848
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] notice: ais_remove_peer: Removed dead peer 1991878848 from the membership list
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: ais_remove_peer: Sending removal of 1991878848 to 2 children
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: 0x13d9520 Node 16777343 now known as stratus18 (was: (null))
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Node stratus18 now has 1 quorum votes (was 0)
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: Node stratus18 now has process list: 00000000000000000000000000111312 (1118994)
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: send_member_notification: Sending membership update 2728 to 2 children
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_member: 0x13d9520 Node 16777343 ((null)) born on: 2708
> Jul 10 13:42:55 stratus18 corosync[1268]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_get_peer: Node stratus18 now has id: 16777343
> Jul 10 13:42:55 stratus18 cib: [1277]: info: ais_dispatch_message: Membership 2728: quorum retained
> Jul 10 13:42:55 stratus18 cib: [1277]: info: ais_dispatch_message: Removing peer 1991878848/1991878848
> Jul 10 13:42:55 stratus18 cib: [1277]: info: reap_crm_member: Peer 1991878848 is unknown
> Jul 10 13:42:55 stratus18 cib: [1277]: notice: ais_dispatch_message: Membership 2728: quorum lost
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_update_peer: Node stratus17: id=1975101632 state=lost (new) addr=r(0) ip(192.168.185.117)  votes=1 born=2724 seen=2724 proc=00000000000000000000000000111312
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_update_peer: Node stratus20: id=2025433280 state=lost (new) addr=r(0) ip(192.168.185.120)  votes=1 born=4 seen=2724 proc=00000000000000000000000000111312
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_get_peer: Node stratus18 now has id: 1991878848
> Jul 10 13:42:55 stratus18 corosync[1268]:   [CPG   ] chosen downlist: sender r(0) ip(127.0.0.1) ; members(old:3 left:3)
> Jul 10 13:42:55 stratus18 corosync[1268]:   [MAIN  ] Completed service synchronization, ready to provide service.
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_get_peer: Node stratus18 now has id: 16777343
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Membership 2728: quorum retained
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Removing peer 1991878848/1991878848
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: reap_crm_member: Peer 1991878848 is unknown
> Jul 10 13:42:55 stratus18 crmd: [1281]: notice: ais_dispatch_message: Membership 2728: quorum lost
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_status_callback: status: stratus17 is now lost (was member)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_update_peer: Node stratus17: id=1975101632 state=lost (new) addr=r(0) ip(192.168.185.117)  votes=1 born=2724 seen=2724 proc=00000000000000000000000000111312
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_status_callback: status: stratus20 is now lost (was member)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_update_peer: Node stratus20: id=2025433280 state=lost (new) addr=r(0) ip(192.168.185.120)  votes=1 born=4 seen=2724 proc=00000000000000000000000000111312
> Jul 10 13:42:55 stratus18 crmd: [1281]: WARN: check_dead_member: Our DC node (stratus20) left the cluster
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ]
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: update_dc: Unset DC stratus20
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_te_control: Registering TE UUID: 6e335eff-5e48-4fc1-9003-0537ae948dfd
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: set_graph_functions: Setting custom graph functions
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_takeover: Taking over DC status for this partition
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_readwrite: We are now in R/W mode
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/57, version=0.76.46): ok (rc=0)
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/58, version=0.76.47): ok (rc=0)
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_get_peer: Node stratus18 now has id: 16777343
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/60, version=0.76.48): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: join_make_offer: Making join offers based on membership 2728
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Membership 2728: quorum still lost
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/62, version=0.76.49): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crmd_ais_dispatch: Setting expected votes to 2
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: update_dc: Set DC to stratus18 (3.0.5)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: config_query_callback: Checking for expired actions every 900000ms
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: config_query_callback: Sending expected-votes=3 to corosync
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Membership 2728: quorum still lost
> Jul 10 13:42:55 stratus18 corosync[1268]:   [pcmk  ] info: update_expected_votes: Expected quorum votes 2 -> 3
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <cib admin_epoch="0" epoch="76" num_updates="49" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -   <configuration >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -     <crm_config >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -       <cluster_property_set id="cib-bootstrap-options" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -         <nvpair value="3" id="cib-bootstrap-options-expected-quorum-votes" />
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -       </cluster_property_set>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -     </crm_config>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -   </configuration>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </cib>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <cib admin_epoch="0" cib-last-written="Wed Jul 10 13:25:58 2013" crm_feature_set="3.0.5" epoch="77" have-quorum="1" num_updates="1" update-client="crmd" update-origin="stratus17" validate-with="pacemaker-1.2" dc-uuid="stratus20" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +   <configuration >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +     <crm_config >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +       <cluster_property_set id="cib-bootstrap-options" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +         <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2" />
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +       </cluster_property_set>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +     </crm_config>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +   </configuration>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </cib>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/65, version=0.77.1): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crmd_ais_dispatch: Setting expected votes to 3
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_finalize: join-1: Syncing the CIB from stratus18 to the rest of the cluster
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <cib admin_epoch="0" epoch="77" num_updates="1" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -   <configuration >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -     <crm_config >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -       <cluster_property_set id="cib-bootstrap-options" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -         <nvpair value="2" id="cib-bootstrap-options-expected-quorum-votes" />
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -       </cluster_property_set>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -     </crm_config>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: -   </configuration>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </cib>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <cib admin_epoch="0" cib-last-written="Wed Jul 10 13:42:55 2013" crm_feature_set="3.0.5" epoch="78" have-quorum="1" num_updates="1" update-client="crmd" update-origin="stratus18" validate-with="pacemaker-1.2" dc-uuid="stratus20" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +   <configuration >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +     <crm_config >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +       <cluster_property_set id="cib-bootstrap-options" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +         <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="3" />
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +       </cluster_property_set>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +     </crm_config>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: +   </configuration>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </cib>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/68, version=0.78.1): ok (rc=0)
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/69, version=0.78.1): ok (rc=0)
> Jul 10 13:42:55 stratus18 lrmd: [1278]: info: stonith_api_device_metadata: looking up external/ipmi/heartbeat metadata
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/70, version=0.78.2): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_ack: join-1: Updating node state to member for stratus18
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='stratus18']/lrm (origin=local/crmd/71, version=0.78.3): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: erase_xpath_callback: Deletion of "//node_state[@uname='stratus18']/lrm": ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_update_quorum: Updating quorum status to false (call=75)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: abort_transition_graph: do_te_invoke:167 - Triggered transition abort (complete=1) : Peer Cancelled
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_pe_invoke: Query 76: Requesting the current CIB: S_POLICY_ENGINE
> Jul 10 13:42:55 stratus18 attrd: [1279]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Jul 10 13:42:55 stratus18 attrd: [1279]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/73, version=0.78.5): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: WARN: match_down_event: No match for shutdown action on stratus17
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: te_update_diff: Stonith/shutdown of stratus17 not matched
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=stratus17, magic=NA, cib=0.78.6) : Node failure
> Jul 10 13:42:55 stratus18 crmd: [1281]: WARN: match_down_event: No match for shutdown action on stratus20
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: te_update_diff: Stonith/shutdown of stratus20 not matched
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=stratus20, magic=NA, cib=0.78.6) : Node failure
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_pe_invoke: Query 77: Requesting the current CIB: S_POLICY_ENGINE
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_pe_invoke: Query 78: Requesting the current CIB: S_POLICY_ENGINE
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/75, version=0.78.7): ok (rc=0)
> Jul 10 13:42:56 stratus18 crmd: [1281]: info: do_pe_invoke_callback: Invoking the PE: query=78, ref=pe_calc-dc-1373460176-49, seq=2728, quorate=0
> Jul 10 13:42:56 stratus18 attrd: [1279]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_tomtest:0 (10000)
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: cluster_status: We do not have quorum - fencing and resource management disabled
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: pe_fence_node: Node stratus17 will be fenced because it is un-expectedly down
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: determine_online_status: Node stratus17 is unclean
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: pe_fence_node: Node stratus20 will be fenced because it is un-expectedly down
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: determine_online_status: Node stratus20 is unclean
> Jul 10 13:42:56 stratus18 pengine: [1280]: notice: unpack_rsc_op: Hard error - drbd_tomtest:0_last_failure_0 failed with rc=5: Preventing ms_drbd_tomtest from re-starting on stratus20
> Jul 10 13:42:56 stratus18 pengine: [1280]: notice: unpack_rsc_op: Hard error - tomtest_mysql_SERVICE_last_failure_0 failed with rc=5: Preventing tomtest_mysql_SERVICE from re-starting on stratus20
>  
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




