[Pacemaker] Issue with an isolated node overriding CIB after rejoining main cluster
Howley, Tom
tom.howley at hp.com
Mon Jul 15 16:05:42 UTC 2013
Hi Andrew,
Thanks for the reply. I have a couple more questions below. I seem to have two main problems: the isolated node updating the CIB, and corosync's behaviour in response to ifdown.
> Why isn't your normal fencing device working?
My normal fencing is working and was in place for nearly all of my testing. I just tried the "suicide" option to see if it would prevent the isolated node from carrying out any CIB updates.
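For reference, I set that with the crm shell, roughly as follows (the second command assumes the previous value was the default 'stop'):

    crm configure property no-quorum-policy=suicide
    # restore afterwards; 'stop' is the default
    crm configure property no-quorum-policy=stop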
> epoch is bumped after an election and a configuration change but NOT a status change.
> so it shouldn't be making it to 102
My log below shows that the cib-bootstrap-options property is being updated. Is this not a configuration change?
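For what it's worth, this is roughly how I've been checking the epoch on each node (admin_epoch/epoch/num_updates are attributes of the top-level <cib> element):

    # dump just the opening <cib .../> line to see the version triple
    cibadmin --query | head -1
    # show the crm_config section, where cib-bootstrap-options lives
    cibadmin --query -o crm_config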
>> 1. My initial feeling was that the isolated node, Alice, (which has no quorum) should not be updating a CIB that could potentially override the sane part of the cluster. Is that a fair comment?
> Not as currently designed. Although there may be some improvements we can make in that area.
Would you consider this a bug, or is there a case where this behaviour is desired?
In the meantime, I ran a script over the weekend that brings down the network on the current DRBD master, randomly using one of two options: ifdown ethX, or adding iptables rules to block all incoming and outgoing packets. All of the roughly 350 iptables (blocked packets) scenarios recovered successfully (i.e. no split-brain), whereas 130 out of 350 ifdown scenarios resulted in split-brain (the script automatically repaired split-brain between test iterations). (Note that, in order to aggravate the problem, these tests use stonith with an artificial delay before reset, with the crm-fence-peer timeout still set greater than this delay -- I also intend to redo the tests under normal conditions.)
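The test driver is essentially the loop below; get_current_drbd_master and check_and_repair_split_brain stand in for longer helper functions in the real script, and the wait time is illustrative:

    #!/bin/bash
    for i in $(seq 1 350); do
        master=$(get_current_drbd_master)
        if (( RANDOM % 2 )); then
            # scenario 1: take the interface down on the current master
            ssh "$master" "ifdown eth0"
        else
            # scenario 2: drop all packets instead
            ssh "$master" "iptables -I INPUT -j DROP; iptables -I OUTPUT -j DROP"
        fi
        sleep 300   # allow stonith, reboot and rejoin to complete
        check_and_repair_split_brain "$i"   # logs whether this iteration split-brained
    done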
Is this a known/expected issue, which effectively means I shouldn't test using "ifdown ethX"? If so, is there some configuration I can apply to change corosync's behaviour in response to ifdown? My major fear is that some network failure could trigger the code path that leads to the isolated node updating the CIB, etc.
Thanks again,
Tom
-----Original Message-----
From: Andrew Beekhof [mailto:andrew at beekhof.net]
Sent: 15 July 2013 01:52
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Issue with an isolated node overriding CIB after rejoining main cluster
On 12/07/2013, at 10:49 PM, "Howley, Tom" <tom.howley at hp.com> wrote:
> Hi,
>
> pacemaker:1.1.6-2ubuntu3,
ouch
> corosync:1.4.2-2, drbd8-utils 2:8.3.11-0ubuntu1
>
> I have a three-node setup, with two nodes running DRBD, resource-level fencing enabled ('resource-and-stonith') and, obviously, stonith configured for each node. In my current test case, I bring down the network interface on the DRBD primary/master node (using ifdown eth0, for example), which sometimes leads to split-brain when the isolated node rejoins the cluster - the serious problem is that upon rejoining, the isolated node is promoted to DRBD primary (despite the original fencing constraint), which opens us up to data loss for updates that occurred while that node was down.
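> For reference, the DRBD fencing setup is essentially the stock one (resource name taken from the logs; the handler paths are the packaged defaults and the timeout value is illustrative):
>
>   resource tomtest {
>     disk {
>       fencing resource-and-stonith;
>     }
>     handlers {
>       fence-peer          "/usr/lib/drbd/crm-fence-peer.sh --timeout 90";
>       after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>     }
>   }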
>
> The exact problem scenario is as follows:
> - Alice: DRBD Primary/Master, Bob: Secondary/Slave, Jim: Quorum node, Epoch=100
> - ifdown eth0 on Alice
> - Alice detects loss of its network interface, sets itself up as DC, and carries out some CIB updates (see log snippet below) that raise the epoch, say Epoch=102
epoch is bumped after an election and a configuration change but NOT a status change.
so it shouldn't be making it to 102
> - Alice is shot via stonith.
> - Bob adds fencing rule to CIB to prevent promotion of DRBD on any other node, Epoch=101
> - When Alice comes back and rejoins the cluster, the DC decides to sync to Alice's CIB, thereby removing the fencing rule prematurely (i.e. before the DRBD devices have resynced).
> - In some cases: Alice is promoted to Primary/Master and fences the resource to prevent promotion on any other node.
> - We now have split-brain and potential loss of data.
>
> So some questions on the above:
> 1. My initial feeling was that the isolated node, Alice, (which has no quorum) should not be updating a CIB that could potentially override the sane part of the cluster. Is that a fair comment?
Not as currently designed. Although there may be some improvements we can make in that area.
> 2. Is this issue just particular to my use of 'ifdown ethX' to disable the network? This is hinted at here: https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface Has this issue been addressed, or will it be in the future?
> 3. If 'ifdown ethX' is not valid, what is the best alternative that mimics what might happen in the real world? I have tried blocking connections using iptables rules, dropping all incoming and outgoing packets; initial testing appears to show different corosync behaviour that would hopefully not lead to my problem scenario, but I'm still in the process of confirming. I have also carried out some cable pulls and not run into issues yet, but this problem can be intermittent, so it really needs an automated way to test many times.
> 4. The log snippet below from the isolated node shows that it updates the CIB twice sometime after detecting loss of the network interface. Why does this happen? I believe that ultimately it is these CIB updates that increment the epoch, which leads to this CIB overriding the cluster's later.
>
> I have also tried a no-quorum-policy of 'suicide' in an attempt to prevent CIB updates by Alice, but it didn't make a difference.
Why isn't your normal fencing device working?
> Note that to facilitate log collection and analysis, I have added a delay to the stonith reset operation, but I have also set the timeout on the crm-fence-peer script to ensure that it is greater than this 'deadtime'.
>
> Any advice on this would be greatly appreciated.
>
> Thanks,
>
> Tom
>
> Log snippet showing isolated node updating the CIB, which results in epoch being incremented two times:
>
> Jul 10 13:42:54 stratus18 corosync[1268]: [TOTEM ] A processor failed, forming new configuration.
> Jul 10 13:42:54 stratus18 corosync[1268]: [TOTEM ] The network interface is down.
> Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20758]: TOMTEST-DEBUG: modified version
> Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20758]: invoked for tomtest
> Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20761]: TOMTEST-DEBUG: modified version
> Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20761]: invoked for tomtest
> Jul 10 13:42:55 stratus18 stonith-ng: [1276]: info: stonith_command: Processed st_execute from lrmd: rc=-1
> Jul 10 13:42:55 stratus18 external/ipmi[20806]: [20816]: ERROR: error executing ipmitool: Connect failed: Network is unreachable#015 Unable to get Chassis Power Status#015
> Jul 10 13:42:55 stratus18 crm-fence-peer.sh[20758]: Call cib_query failed (-41): Remote node did not respond
> Jul 10 13:42:55 stratus18 crm-fence-peer.sh[20761]: Call cib_query failed (-41): Remote node did not respond
> Jul 10 13:42:55 stratus18 ntpd[1062]: Deleting interface #7 eth0, 192.168.185.150#123, interface stats: received=0, sent=0, dropped=0, active_time=912 secs
> Jul 10 13:42:55 stratus18 ntpd[1062]: Deleting interface #4 eth0, fe80::7ae7:d1ff:fe22:5270#123, interface stats: received=0, sent=0, dropped=0, active_time=6080 secs
> Jul 10 13:42:55 stratus18 ntpd[1062]: Deleting interface #3 eth0, 192.168.185.118#123, interface stats: received=52, sent=53, dropped=0, active_time=6080 secs
> Jul 10 13:42:55 stratus18 ntpd[1062]: 192.168.8.97 interface 192.168.185.118 -> (none)
> Jul 10 13:42:55 stratus18 ntpd[1062]: peers refreshed
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 2728: memb=1, new=0, lost=2
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: pcmk_peer_update: memb: .unknown. 16777343
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: pcmk_peer_update: lost: stratus18 1991878848
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: pcmk_peer_update: lost: stratus20 2025433280
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 2728: memb=1, new=0, lost=0
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Creating entry for node 16777343 born on 2728
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Node 16777343/unknown is now: member
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: pcmk_peer_update: MEMB: .pending. 16777343
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] ERROR: pcmk_peer_update: Something strange happened: 1
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: ais_mark_unseen_peer_dead: Node stratus17 was not seen in the previous transition
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Node 1975101632/stratus17 is now: lost
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: ais_mark_unseen_peer_dead: Node stratus18 was not seen in the previous transition
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Node 1991878848/stratus18 is now: lost
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: ais_mark_unseen_peer_dead: Node stratus20 was not seen in the previous transition
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Node 2025433280/stratus20 is now: lost
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] WARN: pcmk_update_nodeid: Detected local node id change: 1991878848 -> 16777343
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: destroy_ais_node: Destroying entry for node 1991878848
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] notice: ais_remove_peer: Removed dead peer 1991878848 from the membership list
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: ais_remove_peer: Sending removal of 1991878848 to 2 children
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: 0x13d9520 Node 16777343 now known as stratus18 (was: (null))
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Node stratus18 now has 1 quorum votes (was 0)
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Node stratus18 now has process list: 00000000000000000000000000111312 (1118994)
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: send_member_notification: Sending membership update 2728 to 2 children
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: 0x13d9520 Node 16777343 ((null)) born on: 2708
> Jul 10 13:42:55 stratus18 corosync[1268]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_get_peer: Node stratus18 now has id: 16777343
> Jul 10 13:42:55 stratus18 cib: [1277]: info: ais_dispatch_message: Membership 2728: quorum retained
> Jul 10 13:42:55 stratus18 cib: [1277]: info: ais_dispatch_message: Removing peer 1991878848/1991878848
> Jul 10 13:42:55 stratus18 cib: [1277]: info: reap_crm_member: Peer 1991878848 is unknown
> Jul 10 13:42:55 stratus18 cib: [1277]: notice: ais_dispatch_message: Membership 2728: quorum lost
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_update_peer: Node stratus17: id=1975101632 state=lost (new) addr=r(0) ip(192.168.185.117) votes=1 born=2724 seen=2724 proc=00000000000000000000000000111312
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_update_peer: Node stratus20: id=2025433280 state=lost (new) addr=r(0) ip(192.168.185.120) votes=1 born=4 seen=2724 proc=00000000000000000000000000111312
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_get_peer: Node stratus18 now has id: 1991878848
> Jul 10 13:42:55 stratus18 corosync[1268]: [CPG ] chosen downlist: sender r(0) ip(127.0.0.1) ; members(old:3 left:3)
> Jul 10 13:42:55 stratus18 corosync[1268]: [MAIN ] Completed service synchronization, ready to provide service.
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_get_peer: Node stratus18 now has id: 16777343
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Membership 2728: quorum retained
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Removing peer 1991878848/1991878848
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: reap_crm_member: Peer 1991878848 is unknown
> Jul 10 13:42:55 stratus18 crmd: [1281]: notice: ais_dispatch_message: Membership 2728: quorum lost
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_status_callback: status: stratus17 is now lost (was member)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_update_peer: Node stratus17: id=1975101632 state=lost (new) addr=r(0) ip(192.168.185.117) votes=1 born=2724 seen=2724 proc=00000000000000000000000000111312
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_status_callback: status: stratus20 is now lost (was member)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_update_peer: Node stratus20: id=2025433280 state=lost (new) addr=r(0) ip(192.168.185.120) votes=1 born=4 seen=2724 proc=00000000000000000000000000111312
> Jul 10 13:42:55 stratus18 crmd: [1281]: WARN: check_dead_member: Our DC node (stratus20) left the cluster
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ]
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: update_dc: Unset DC stratus20
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_te_control: Registering TE UUID: 6e335eff-5e48-4fc1-9003-0537ae948dfd
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: set_graph_functions: Setting custom graph functions
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_takeover: Taking over DC status for this partition
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_readwrite: We are now in R/W mode
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/57, version=0.76.46): ok (rc=0)
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/58, version=0.76.47): ok (rc=0)
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_get_peer: Node stratus18 now has id: 16777343
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/60, version=0.76.48): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: join_make_offer: Making join offers based on membership 2728
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Membership 2728: quorum still lost
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/62, version=0.76.49): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crmd_ais_dispatch: Setting expected votes to 2
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: update_dc: Set DC to stratus18 (3.0.5)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: config_query_callback: Checking for expired actions every 900000ms
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: config_query_callback: Sending expected-votes=3 to corosync
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Membership 2728: quorum still lost
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_expected_votes: Expected quorum votes 2 -> 3
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <cib admin_epoch="0" epoch="76" num_updates="49" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <configuration >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <crm_config >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <cluster_property_set id="cib-bootstrap-options" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <nvpair value="3" id="cib-bootstrap-options-expected-quorum-votes" />
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </cluster_property_set>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </crm_config>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </configuration>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </cib>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <cib admin_epoch="0" cib-last-written="Wed Jul 10 13:25:58 2013" crm_feature_set="3.0.5" epoch="77" have-quorum="1" num_updates="1" update-client="crmd" update-origin="stratus17" validate-with="pacemaker-1.2" dc-uuid="stratus20" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <configuration >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <crm_config >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <cluster_property_set id="cib-bootstrap-options" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2" />
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </cluster_property_set>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </crm_config>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </configuration>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </cib>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/65, version=0.77.1): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crmd_ais_dispatch: Setting expected votes to 3
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_finalize: join-1: Syncing the CIB from stratus18 to the rest of the cluster
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <cib admin_epoch="0" epoch="77" num_updates="1" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <configuration >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <crm_config >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <cluster_property_set id="cib-bootstrap-options" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <nvpair value="2" id="cib-bootstrap-options-expected-quorum-votes" />
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </cluster_property_set>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </crm_config>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </configuration>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </cib>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <cib admin_epoch="0" cib-last-written="Wed Jul 10 13:42:55 2013" crm_feature_set="3.0.5" epoch="78" have-quorum="1" num_updates="1" update-client="crmd" update-origin="stratus18" validate-with="pacemaker-1.2" dc-uuid="stratus20" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <configuration >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <crm_config >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <cluster_property_set id="cib-bootstrap-options" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="3" />
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </cluster_property_set>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </crm_config>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </configuration>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </cib>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/68, version=0.78.1): ok (rc=0)
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/69, version=0.78.1): ok (rc=0)
> Jul 10 13:42:55 stratus18 lrmd: [1278]: info: stonith_api_device_metadata: looking up external/ipmi/heartbeat metadata
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/70, version=0.78.2): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_ack: join-1: Updating node state to member for stratus18
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='stratus18']/lrm (origin=local/crmd/71, version=0.78.3): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: erase_xpath_callback: Deletion of "//node_state[@uname='stratus18']/lrm": ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_update_quorum: Updating quorum status to false (call=75)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: abort_transition_graph: do_te_invoke:167 - Triggered transition abort (complete=1) : Peer Cancelled
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_pe_invoke: Query 76: Requesting the current CIB: S_POLICY_ENGINE
> Jul 10 13:42:55 stratus18 attrd: [1279]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Jul 10 13:42:55 stratus18 attrd: [1279]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/73, version=0.78.5): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: WARN: match_down_event: No match for shutdown action on stratus17
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: te_update_diff: Stonith/shutdown of stratus17 not matched
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=stratus17, magic=NA, cib=0.78.6) : Node failure
> Jul 10 13:42:55 stratus18 crmd: [1281]: WARN: match_down_event: No match for shutdown action on stratus20
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: te_update_diff: Stonith/shutdown of stratus20 not matched
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=stratus20, magic=NA, cib=0.78.6) : Node failure
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_pe_invoke: Query 77: Requesting the current CIB: S_POLICY_ENGINE
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_pe_invoke: Query 78: Requesting the current CIB: S_POLICY_ENGINE
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/75, version=0.78.7): ok (rc=0)
> Jul 10 13:42:56 stratus18 crmd: [1281]: info: do_pe_invoke_callback: Invoking the PE: query=78, ref=pe_calc-dc-1373460176-49, seq=2728, quorate=0
> Jul 10 13:42:56 stratus18 attrd: [1279]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_tomtest:0 (10000)
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: cluster_status: We do not have quorum - fencing and resource management disabled
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: pe_fence_node: Node stratus17 will be fenced because it is un-expectedly down
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: determine_online_status: Node stratus17 is unclean
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: pe_fence_node: Node stratus20 will be fenced because it is un-expectedly down
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: determine_online_status: Node stratus20 is unclean
> Jul 10 13:42:56 stratus18 pengine: [1280]: notice: unpack_rsc_op: Hard error - drbd_tomtest:0_last_failure_0 failed with rc=5: Preventing ms_drbd_tomtest from re-starting on stratus20
> Jul 10 13:42:56 stratus18 pengine: [1280]: notice: unpack_rsc_op: Hard error - tomtest_mysql_SERVICE_last_failure_0 failed with rc=5: Preventing tomtest_mysql_SERVICE from re-starting on stratus20
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org