[Pacemaker] Pacemaker handling dual-primary DRBD to host a Xen HVM (Windows 7) DomU: the DomU sometimes doesn't start, and when it does start it doesn't migrate

kamal kishi kamal.kishi at gmail.com
Thu Jun 5 06:00:49 EDT 2014


Hi Andrew and Emi,

Please find attached the new Pacemaker configuration and syslog. The
attached log covers the period when I power off the working node (server)
and the Xen DomU does not migrate.
After some time it does start, but it shows only a blank white screen in VNC Viewer.

Thanks in advance


On Wed, Jun 4, 2014 at 2:52 PM, emmanuel segura <emi2fast at gmail.com> wrote:

> Because you haven't configured fencing.
>
>
> 2014-06-04 9:20 GMT+02:00 kamal kishi <kamal.kishi at gmail.com>:
>
> Hi emi,
>>
>> Cluster logs?
>> Right now I'm getting all the logs in syslog itself.
>>
>> Another thing I found out is that OCFS2 has issues whenever either
>> server is offline or powered off. Can you suggest whether using OCFS2
>> here is a good option or not?
>>
>> Thank you
>>
>>
>> On Tue, Jun 3, 2014 at 6:31 PM, emmanuel segura <emi2fast at gmail.com>
>> wrote:
>>
>>> Maybe I'm wrong, but I think you forgot the cluster logs.
>>>
>>>
>>> 2014-06-03 14:34 GMT+02:00 kamal kishi <kamal.kishi at gmail.com>:
>>>
>>>> Hi all,
>>>>
>>>>         I'm sure many have come across the same question, and yes, I've gone
>>>> through most of the blogs and mailing list archives without much result.
>>>> I'm trying to configure a Xen HVM DomU on a DRBD-replicated partition with
>>>> an OCFS2 filesystem, managed by Pacemaker.
>>>>
>>>> My question is: what changes need to be made to the Xen files below for
>>>> this to work well with Pacemaker?
>>>> /etc/xen/xend-config.sxp
>>>> /etc/default/xendomains
>>>>
>>>> Let me know if any other file needs to be edited.
>>>>
>>>> Find my configuration files attached.
>>>> Many times the Xen resource doesn't start.
>>>> Even when it does start, migration doesn't take place.
>>>> I checked the logs; an "unknown error" is reported.
>>>>
>>>> It would be helpful if someone could guide me through the configuration.
>>>>
>>>> Thanks in advance guys
>>>>
>>>> --
>>>> Regards,
>>>> Kamal Kishore B V
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started:
>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>>>
>>>
>>>
>>> --
>>> this is my life and I live it for as long as God wills
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>>
>>
>>
>> --
>> Regards,
>> Kamal Kishore B V
>>
>
>
>
> --
> this is my life and I live it for as long as God wills
>



-- 
Regards,
Kamal Kishore B V
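
Regarding the question earlier in the thread about which Xen files need editing:
a minimal sketch of the settings commonly used so that xend accepts
live-migration requests and the xendomains init script does not compete with
Pacemaker for the same domains. The relocation port shown is the xend default,
and the empty hosts-allow pattern permits any host, so both are assumptions
that should be tightened to the two cluster nodes:

  /etc/xen/xend-config.sxp:
    (xend-relocation-server yes)
    (xend-relocation-port 8002)
    (xend-relocation-address '')
    (xend-relocation-hosts-allow '')

  /etc/default/xendomains:
    # leave domain start/stop/migration entirely to Pacemaker
    XENDOMAINS_SAVE=""
    XENDOMAINS_RESTORE=false
    XENDOMAINS_AUTO_ONLY=true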
-------------- next part --------------
node server1
node server2
primitive Clu-FS-DRBD ocf:linbit:drbd \
        params drbd_resource="r0" \
        operations $id="Clu-FS-DRBD-ops" \
        op start interval="0" timeout="49s" \
        op stop interval="0" timeout="50s" \
        op monitor interval="40s" role="Master" timeout="50s" \
        op monitor interval="41s" role="Slave" timeout="51s" \
        meta target-role="started"
primitive Clu-FS-Mount ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/r0" directory="/cluster" fstype="ocfs2" \
        op monitor interval="120s" \
        meta target-role="started"
primitive xenwin7 ocf:heartbeat:Xen \
        params xmfile="/home/cluster/xen/win7.cfg" \
        op monitor interval="40s" \
        meta target-role="started" is-managed="true" allow-migrate="true"
ms Clu-FS-DRBD-Master Clu-FS-DRBD \
        meta resource-stickiness="100" master-max="2" notify="true" interleave="true"
clone Clu-FS-Mount-Clone Clu-FS-Mount \
        meta interleave="true" ordered="true"
location drbd-fence-by-handler-Clu-FS-DRBD-Master Clu-FS-DRBD-Master \
        rule $id="drbd-fence-by-handler-rule-Clu-FS-DRBD-Master" $role="Master" -inf: #uname ne server1
colocation Clu-Clo-DRBD inf: Clu-FS-Mount-Clone Clu-FS-DRBD-Master:Master
colocation win7-Xen-Clu-Clo inf: xenwin7 Clu-FS-Mount-Clone
order Cluster-FS-After-DRBD inf: Clu-FS-DRBD-Master:promote Clu-FS-Mount-Clone:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        default-resource-stickiness="1000" \
        last-lrm-refresh="1401960233"
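
The property section above sets stonith-enabled="false", which is what Emmanuel
was pointing at earlier in the thread: no fencing is configured at the cluster
level. The drbd-fence-by-handler location constraint in this configuration was
placed by DRBD's crm-fence-peer.sh handler (visible in the attached log) and
pins the Master role to server1 until it is removed. A minimal sketch of
cluster-level fencing with IPMI-style devices follows; the external/ipmi agent
is a common choice, but the agent, addresses, and credentials are assumptions
that depend on the actual hardware:

  primitive stonith-server1 stonith:external/ipmi \
          params hostname="server1" ipaddr="192.168.1.101" userid="admin" passwd="secret" interface="lan" \
          op monitor interval="60s"
  primitive stonith-server2 stonith:external/ipmi \
          params hostname="server2" ipaddr="192.168.1.102" userid="admin" passwd="secret" interface="lan" \
          op monitor interval="60s"
  location l-stonith-server1 stonith-server1 -inf: server1
  location l-stonith-server2 stonith-server2 -inf: server2
  property stonith-enabled="true"

On the DRBD side, dual-primary setups are usually paired with resource-level
fencing in r0's definition, roughly as below (handler paths as shipped with
DRBD 8.x); the log suggests something equivalent is already in place, since
crm-fence-peer.sh is being invoked:

  resource r0 {
    disk {
      fencing resource-and-stonith;
    }
    handlers {
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }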
-------------- next part --------------
Jun  5 15:11:39 server1 NetworkManager[887]: <info> (eth0): carrier now OFF (device state 10)
Jun  5 15:11:39 server1 kernel: [ 2127.112852] bnx2 0000:01:00.0: eth0: NIC Copper Link is Down
Jun  5 15:11:39 server1 kernel: [ 2127.113876] xenbr0: port 1(eth0) entering forwarding state
Jun  5 15:11:41 server1 NetworkManager[887]: <info> (eth0): carrier now ON (device state 10)
Jun  5 15:11:41 server1 kernel: [ 2129.231687] bnx2 0000:01:00.0: eth0: NIC Copper Link is Up, 100 Mbps full duplex, receive & transmit flow control ON
Jun  5 15:11:41 server1 kernel: [ 2129.232672] xenbr0: port 1(eth0) entering forwarding state
Jun  5 15:11:41 server1 kernel: [ 2129.232696] xenbr0: port 1(eth0) entering forwarding state
Jun  5 15:11:42 server1 corosync[1556]:   [TOTEM ] A processor failed, forming new configuration.
Jun  5 15:11:43 server1 NetworkManager[887]: <info> (eth0): carrier now OFF (device state 10)
Jun  5 15:11:43 server1 kernel: [ 2130.624346] bnx2 0000:01:00.0: eth0: NIC Copper Link is Down
Jun  5 15:11:43 server1 kernel: [ 2130.625274] xenbr0: port 1(eth0) entering forwarding state
Jun  5 15:11:45 server1 corosync[1556]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 64: memb=1, new=0, lost=1
Jun  5 15:11:45 server1 corosync[1556]:   [pcmk  ] info: pcmk_peer_update: memb: server1 16777226
Jun  5 15:11:45 server1 corosync[1556]:   [pcmk  ] info: pcmk_peer_update: lost: server2 33554442
Jun  5 15:11:45 server1 corosync[1556]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 64: memb=1, new=0, lost=0
Jun  5 15:11:45 server1 corosync[1556]:   [pcmk  ] info: pcmk_peer_update: MEMB: server1 16777226
Jun  5 15:11:45 server1 corosync[1556]:   [pcmk  ] info: ais_mark_unseen_peer_dead: Node server2 was not seen in the previous transition
Jun  5 15:11:45 server1 corosync[1556]:   [pcmk  ] info: update_member: Node 33554442/server2 is now: lost
Jun  5 15:11:45 server1 corosync[1556]:   [pcmk  ] info: send_member_notification: Sending membership update 64 to 2 children
Jun  5 15:11:45 server1 corosync[1556]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun  5 15:11:45 server1 corosync[1556]:   [CPG   ] chosen downlist: sender r(0) ip(10.0.0.1) ; members(old:2 left:1)
Jun  5 15:11:45 server1 corosync[1556]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun  5 15:11:45 server1 cib: [1595]: notice: ais_dispatch_message: Membership 64: quorum lost
Jun  5 15:11:45 server1 cib: [1595]: info: crm_update_peer: Node server2: id=33554442 state=lost (new) addr=r(0) ip(10.0.0.2)  votes=1 born=60 seen=60 proc=00000000000000000000000000111312
Jun  5 15:11:45 server1 crmd: [1600]: notice: ais_dispatch_message: Membership 64: quorum lost
Jun  5 15:11:45 server1 crmd: [1600]: info: ais_status_callback: status: server2 is now lost (was member)
Jun  5 15:11:45 server1 crmd: [1600]: info: crm_update_peer: Node server2: id=33554442 state=lost (new) addr=r(0) ip(10.0.0.2)  votes=1 born=60 seen=60 proc=00000000000000000000000000111312
Jun  5 15:11:45 server1 crmd: [1600]: info: erase_node_from_join: Removed node server2 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
Jun  5 15:11:45 server1 cib: [1595]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/146, version=0.52.3): ok (rc=0)
Jun  5 15:11:45 server1 crmd: [1600]: info: crm_update_quorum: Updating quorum status to false (call=148)
Jun  5 15:11:45 server1 cib: [1595]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/148, version=0.52.5): ok (rc=0)
Jun  5 15:11:45 server1 crmd: [1600]: info: crmd_ais_dispatch: Setting expected votes to 2
Jun  5 15:11:45 server1 crmd: [1600]: WARN: match_down_event: No match for shutdown action on server2
Jun  5 15:11:45 server1 crmd: [1600]: info: te_update_diff: Stonith/shutdown of server2 not matched
Jun  5 15:11:45 server1 crmd: [1600]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=server2, magic=NA, cib=0.52.4) : Node failure
Jun  5 15:11:45 server1 crmd: [1600]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jun  5 15:11:45 server1 crmd: [1600]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jun  5 15:11:45 server1 crmd: [1600]: info: do_pe_invoke: Query 151: Requesting the current CIB: S_POLICY_ENGINE
Jun  5 15:11:45 server1 cib: [1595]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/150, version=0.52.6): ok (rc=0)
Jun  5 15:11:45 server1 crmd: [1600]: info: do_pe_invoke_callback: Invoking the PE: query=151, ref=pe_calc-dc-1401961305-161, seq=64, quorate=0
Jun  5 15:11:45 server1 pengine: [1599]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun  5 15:11:45 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed op xenwin7_last_failure_0 on server1: unknown error (1)
Jun  5 15:11:45 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun  5 15:11:45 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun  5 15:11:45 server1 pengine: [1599]: notice: RecurringOp:  Start recurring monitor (40s) for xenwin7 on server1
Jun  5 15:11:45 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-DRBD:0#011(Master server1)
Jun  5 15:11:45 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-DRBD:1#011(Stopped)
Jun  5 15:11:45 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-Mount:0#011(Started server1)
Jun  5 15:11:45 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-Mount:1#011(Stopped)
Jun  5 15:11:45 server1 pengine: [1599]: notice: LogActions: Start   xenwin7#011(server1)
Jun  5 15:11:45 server1 crmd: [1600]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jun  5 15:11:45 server1 crmd: [1600]: info: unpack_graph: Unpacked transition 38: 2 actions in 2 synapses
Jun  5 15:11:45 server1 crmd: [1600]: info: do_te_invoke: Processing graph 38 (ref=pe_calc-dc-1401961305-161) derived from /var/lib/pengine/pe-input-93.bz2
Jun  5 15:11:45 server1 crmd: [1600]: info: te_rsc_command: Initiating action 40: start xenwin7_start_0 on server1 (local)
Jun  5 15:11:45 server1 crmd: [1600]: info: do_lrm_rsc_op: Performing key=40:38:0:43add4e5-6270-43de-8ca9-8a4939271b5b op=xenwin7_start_0 )
Jun  5 15:11:45 server1 lrmd: [1596]: info: rsc:xenwin7 start[41] (pid 9270)
Jun  5 15:11:45 server1 pengine: [1599]: notice: process_pe_message: Transition 38: PEngine Input stored in: /var/lib/pengine/pe-input-93.bz2
Jun  5 15:11:58 server1 kernel: [ 2146.278476] block drbd0: PingAck did not arrive in time.
Jun  5 15:11:58 server1 kernel: [ 2146.278488] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 ) 
Jun  5 15:11:58 server1 kernel: [ 2146.278686] block drbd0: asender terminated
Jun  5 15:11:58 server1 kernel: [ 2146.278693] block drbd0: Terminating drbd0_asender
Jun  5 15:11:58 server1 kernel: [ 2146.278771] block drbd0: Connection closed
Jun  5 15:11:58 server1 kernel: [ 2146.278849] block drbd0: conn( NetworkFailure -> Unconnected ) 
Jun  5 15:11:58 server1 kernel: [ 2146.278860] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Jun  5 15:11:58 server1 kernel: [ 2146.278864] block drbd0: receiver terminated
Jun  5 15:11:58 server1 kernel: [ 2146.278868] block drbd0: Restarting drbd0_receiver
Jun  5 15:11:58 server1 kernel: [ 2146.278872] block drbd0: receiver (re)started
Jun  5 15:11:58 server1 kernel: [ 2146.278881] block drbd0: conn( Unconnected -> WFConnection ) 
Jun  5 15:11:58 server1 crm-fence-peer.sh[9353]: invoked for r0
Jun  5 15:11:59 server1 cib: [1595]: info: cib:diff: - <cib admin_epoch="0" epoch="52" num_updates="6" />
Jun  5 15:11:59 server1 cib: [1595]: info: cib:diff: + <cib epoch="53" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.5" update-origin="server1" update-client="crm_resource" cib-last-written="Thu Jun  5 15:10:26 2014" have-quorum="0" dc-uuid="server1" >
Jun  5 15:11:59 server1 cib: [1595]: info: cib:diff: +   <configuration >
Jun  5 15:11:59 server1 cib: [1595]: info: cib:diff: +     <constraints >
Jun  5 15:11:59 server1 cib: [1595]: info: cib:diff: +       <rsc_location rsc="Clu-FS-DRBD-Master" id="drbd-fence-by-handler-Clu-FS-DRBD-Master" __crm_diff_marker__="added:top" >
Jun  5 15:11:59 server1 cib: [1595]: info: cib:diff: +         <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-rule-Clu-FS-DRBD-Master" >
Jun  5 15:11:59 server1 cib: [1595]: info: cib:diff: +           <expression attribute="#uname" operation="ne" value="server1" id="drbd-fence-by-handler-expr-Clu-FS-DRBD-Master" />
Jun  5 15:11:59 server1 cib: [1595]: info: cib:diff: +         </rule>
Jun  5 15:11:59 server1 cib: [1595]: info: cib:diff: +       </rsc_location>
Jun  5 15:11:59 server1 cib: [1595]: info: cib:diff: +     </constraints>
Jun  5 15:11:59 server1 cib: [1595]: info: cib:diff: +   </configuration>
Jun  5 15:11:59 server1 cib: [1595]: info: cib:diff: + </cib>
Jun  5 15:11:59 server1 cib: [1595]: info: cib_process_request: Operation complete: op cib_create for section constraints (origin=local/cibadmin/2, version=0.53.1): ok (rc=0)
Jun  5 15:11:59 server1 crmd: [1600]: info: abort_transition_graph: te_update_diff:124 - Triggered transition abort (complete=0, tag=diff, id=(null), magic=NA, cib=0.53.1) : Non-status change
Jun  5 15:11:59 server1 crmd: [1600]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
Jun  5 15:11:59 server1 crmd: [1600]: info: update_abort_priority: Abort action done superceeded by restart
Jun  5 15:11:59 server1 crm-fence-peer.sh[9353]: INFO peer is reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-Clu-FS-DRBD-Master'
Jun  5 15:11:59 server1 kernel: [ 2147.428617] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 4 (0x400)
Jun  5 15:11:59 server1 kernel: [ 2147.428623] block drbd0: fence-peer helper returned 4 (peer was fenced)
Jun  5 15:11:59 server1 kernel: [ 2147.428632] block drbd0: pdsk( DUnknown -> Outdated ) 
Jun  5 15:11:59 server1 kernel: [ 2147.428680] block drbd0: new current UUID C7AE32BDEB8201AF:41DEB2849956CF9F:CE91A410F5C9F940:CE90A410F5C9F940
Jun  5 15:11:59 server1 kernel: [ 2147.428861] block drbd0: susp( 1 -> 0 ) 
Jun  5 15:12:05 server1 lrmd: [1596]: WARN: xenwin7:start process (PID 9270) timed out (try 1).  Killing with signal SIGTERM (15).
Jun  5 15:12:05 server1 lrmd: [1596]: WARN: operation start[41] on xenwin7 for client 1600: pid 9270 timed out
Jun  5 15:12:05 server1 crmd: [1600]: ERROR: process_lrm_event: LRM operation xenwin7_start_0 (41) Timed Out (timeout=20000ms)
Jun  5 15:12:05 server1 crmd: [1600]: WARN: status_from_rc: Action 40 (xenwin7_start_0) on server1 failed (target: 0 vs. rc: -2): Error
Jun  5 15:12:05 server1 crmd: [1600]: WARN: update_failcount: Updating failcount for xenwin7 on server1 after failed start: rc=-2 (update=INFINITY, time=1401961325)
Jun  5 15:12:05 server1 crmd: [1600]: info: abort_transition_graph: match_graph_event:277 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=xenwin7_last_failure_0, magic=2:-2;40:38:0:43add4e5-6270-43de-8ca9-8a4939271b5b, cib=0.53.2) : Event failed
Jun  5 15:12:05 server1 crmd: [1600]: info: match_graph_event: Action xenwin7_start_0 (40) confirmed on server1 (rc=4)
Jun  5 15:12:05 server1 crmd: [1600]: info: run_graph: ====================================================
Jun  5 15:12:05 server1 crmd: [1600]: notice: run_graph: Transition 38 (Complete=1, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-input-93.bz2): Stopped
Jun  5 15:12:05 server1 crmd: [1600]: info: te_graph_trigger: Transition 38 is now complete
Jun  5 15:12:05 server1 crmd: [1600]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun  5 15:12:05 server1 crmd: [1600]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jun  5 15:12:05 server1 crmd: [1600]: info: do_pe_invoke: Query 153: Requesting the current CIB: S_POLICY_ENGINE
Jun  5 15:12:05 server1 attrd: [1597]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-xenwin7 (INFINITY)
Jun  5 15:12:05 server1 crmd: [1600]: info: do_pe_invoke_callback: Invoking the PE: query=153, ref=pe_calc-dc-1401961325-163, seq=64, quorate=0
Jun  5 15:12:05 server1 pengine: [1599]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun  5 15:12:05 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed op xenwin7_last_failure_0 on server1: unknown exec error (-2)
Jun  5 15:12:05 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun  5 15:12:05 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun  5 15:12:05 server1 pengine: [1599]: notice: RecurringOp:  Start recurring monitor (40s) for xenwin7 on server1
Jun  5 15:12:05 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-DRBD:0#011(Master server1)
Jun  5 15:12:05 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-DRBD:1#011(Stopped)
Jun  5 15:12:05 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-Mount:0#011(Started server1)
Jun  5 15:12:05 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-Mount:1#011(Stopped)
Jun  5 15:12:05 server1 pengine: [1599]: notice: LogActions: Recover xenwin7#011(Started server1)
Jun  5 15:12:05 server1 crmd: [1600]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jun  5 15:12:05 server1 crmd: [1600]: info: unpack_graph: Unpacked transition 39: 4 actions in 4 synapses
Jun  5 15:12:05 server1 crmd: [1600]: info: do_te_invoke: Processing graph 39 (ref=pe_calc-dc-1401961325-163) derived from /var/lib/pengine/pe-input-94.bz2
Jun  5 15:12:05 server1 crmd: [1600]: info: te_rsc_command: Initiating action 3: stop xenwin7_stop_0 on server1 (local)
Jun  5 15:12:05 server1 attrd: [1597]: notice: attrd_perform_update: Sent update 124: fail-count-xenwin7=INFINITY
Jun  5 15:12:05 server1 crmd: [1600]: info: do_lrm_rsc_op: Performing key=3:39:0:43add4e5-6270-43de-8ca9-8a4939271b5b op=xenwin7_stop_0 )
Jun  5 15:12:05 server1 attrd: [1597]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-xenwin7 (1401961325)
Jun  5 15:12:05 server1 lrmd: [1596]: info: rsc:xenwin7 stop[42] (pid 9401)
Jun  5 15:12:05 server1 crmd: [1600]: info: abort_transition_graph: te_update_diff:164 - Triggered transition abort (complete=0, tag=nvpair, id=status-server1-fail-count-xenwin7, name=fail-count-xenwin7, value=INFINITY, magic=NA, cib=0.53.3) : Transient attribute: update
Jun  5 15:12:05 server1 crmd: [1600]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
Jun  5 15:12:05 server1 crmd: [1600]: info: update_abort_priority: Abort action done superceeded by restart
Jun  5 15:12:05 server1 attrd: [1597]: notice: attrd_perform_update: Sent update 126: last-failure-xenwin7=1401961325
Jun  5 15:12:05 server1 crmd: [1600]: info: abort_transition_graph: te_update_diff:164 - Triggered transition abort (complete=0, tag=nvpair, id=status-server1-last-failure-xenwin7, name=last-failure-xenwin7, value=1401961325, magic=NA, cib=0.53.4) : Transient attribute: update
Jun  5 15:12:05 server1 pengine: [1599]: notice: process_pe_message: Transition 39: PEngine Input stored in: /var/lib/pengine/pe-input-94.bz2
Jun  5 15:12:07 server1 kernel: [ 2155.458452] o2net: Connection to node server2 (num 1) at 10.0.0.2:7777 has been idle for 30.84 secs, shutting it down.
Jun  5 15:12:07 server1 kernel: [ 2155.458486] o2net: No longer connected to node server2 (num 1) at 10.0.0.2:7777
Jun  5 15:12:07 server1 kernel: [ 2155.458531] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -112 when sending message 504 (key 0x649b059e) to node 1
Jun  5 15:12:07 server1 kernel: [ 2155.458538] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:12:13 server1 kernel: [ 2160.562477] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun  5 15:12:13 server1 kernel: [ 2160.562484] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:12:18 server1 kernel: [ 2165.666468] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun  5 15:12:18 server1 kernel: [ 2165.666475] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:12:23 server1 kernel: [ 2170.770473] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun  5 15:12:23 server1 kernel: [ 2170.770481] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:12:25 server1 lrmd: [1596]: WARN: xenwin7:stop process (PID 9401) timed out (try 1).  Killing with signal SIGTERM (15).
Jun  5 15:12:25 server1 lrmd: [1596]: WARN: operation stop[42] on xenwin7 for client 1600: pid 9401 timed out
Jun  5 15:12:25 server1 crmd: [1600]: ERROR: process_lrm_event: LRM operation xenwin7_stop_0 (42) Timed Out (timeout=20000ms)
Jun  5 15:12:25 server1 crmd: [1600]: WARN: status_from_rc: Action 3 (xenwin7_stop_0) on server1 failed (target: 0 vs. rc: -2): Error
Jun  5 15:12:25 server1 crmd: [1600]: WARN: update_failcount: Updating failcount for xenwin7 on server1 after failed stop: rc=-2 (update=INFINITY, time=1401961345)
Jun  5 15:12:25 server1 crmd: [1600]: info: abort_transition_graph: match_graph_event:277 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=xenwin7_last_failure_0, magic=2:-2;3:39:0:43add4e5-6270-43de-8ca9-8a4939271b5b, cib=0.53.5) : Event failed
Jun  5 15:12:25 server1 crmd: [1600]: info: match_graph_event: Action xenwin7_stop_0 (3) confirmed on server1 (rc=4)
Jun  5 15:12:25 server1 crmd: [1600]: info: run_graph: ====================================================
Jun  5 15:12:25 server1 crmd: [1600]: notice: run_graph: Transition 39 (Complete=1, Pending=0, Fired=0, Skipped=3, Incomplete=0, Source=/var/lib/pengine/pe-input-94.bz2): Stopped
Jun  5 15:12:25 server1 crmd: [1600]: info: te_graph_trigger: Transition 39 is now complete
Jun  5 15:12:25 server1 crmd: [1600]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun  5 15:12:25 server1 crmd: [1600]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jun  5 15:12:25 server1 crmd: [1600]: info: do_pe_invoke: Query 155: Requesting the current CIB: S_POLICY_ENGINE
Jun  5 15:12:25 server1 attrd: [1597]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-xenwin7 (1401961345)
Jun  5 15:12:25 server1 crmd: [1600]: info: do_pe_invoke_callback: Invoking the PE: query=155, ref=pe_calc-dc-1401961345-165, seq=64, quorate=0
Jun  5 15:12:25 server1 attrd: [1597]: notice: attrd_perform_update: Sent update 128: last-failure-xenwin7=1401961345
Jun  5 15:12:25 server1 pengine: [1599]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun  5 15:12:25 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed op xenwin7_last_failure_0 on server1: unknown exec error (-2)
Jun  5 15:12:25 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun  5 15:12:25 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun  5 15:12:25 server1 pengine: [1599]: WARN: common_apply_stickiness: Forcing xenwin7 away from server1 after 1000000 failures (max=1000000)
Jun  5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-DRBD:0#011(Master server1)
Jun  5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-DRBD:1#011(Stopped)
Jun  5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-Mount:0#011(Started server1)
Jun  5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-Mount:1#011(Stopped)
Jun  5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave   xenwin7#011(Started unmanaged)
Jun  5 15:12:25 server1 crmd: [1600]: info: abort_transition_graph: te_update_diff:164 - Triggered transition abort (complete=1, tag=nvpair, id=status-server1-last-failure-xenwin7, name=last-failure-xenwin7, value=1401961345, magic=NA, cib=0.53.6) : Transient attribute: update
Jun  5 15:12:25 server1 crmd: [1600]: info: handle_response: pe_calc calculation pe_calc-dc-1401961345-165 is obsolete
Jun  5 15:12:25 server1 crmd: [1600]: info: do_pe_invoke: Query 156: Requesting the current CIB: S_POLICY_ENGINE
Jun  5 15:12:25 server1 crmd: [1600]: info: do_pe_invoke_callback: Invoking the PE: query=156, ref=pe_calc-dc-1401961345-166, seq=64, quorate=0
Jun  5 15:12:25 server1 pengine: [1599]: notice: process_pe_message: Transition 40: PEngine Input stored in: /var/lib/pengine/pe-input-95.bz2
Jun  5 15:12:25 server1 pengine: [1599]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun  5 15:12:25 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed op xenwin7_last_failure_0 on server1: unknown exec error (-2)
Jun  5 15:12:25 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun  5 15:12:25 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun  5 15:12:25 server1 pengine: [1599]: WARN: common_apply_stickiness: Forcing xenwin7 away from server1 after 1000000 failures (max=1000000)
Jun  5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-DRBD:0#011(Master server1)
Jun  5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-DRBD:1#011(Stopped)
Jun  5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-Mount:0#011(Started server1)
Jun  5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave   Clu-FS-Mount:1#011(Stopped)
Jun  5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave   xenwin7#011(Started unmanaged)
Jun  5 15:12:25 server1 crmd: [1600]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jun  5 15:12:25 server1 crmd: [1600]: info: unpack_graph: Unpacked transition 41: 0 actions in 0 synapses
Jun  5 15:12:25 server1 crmd: [1600]: info: do_te_invoke: Processing graph 41 (ref=pe_calc-dc-1401961345-166) derived from /var/lib/pengine/pe-input-96.bz2
Jun  5 15:12:25 server1 crmd: [1600]: info: run_graph: ====================================================
Jun  5 15:12:25 server1 crmd: [1600]: notice: run_graph: Transition 41 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-96.bz2): Complete
Jun  5 15:12:25 server1 crmd: [1600]: info: te_graph_trigger: Transition 41 is now complete
Jun  5 15:12:25 server1 crmd: [1600]: info: notify_crmd: Transition 41 status: done - <null>
Jun  5 15:12:25 server1 crmd: [1600]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun  5 15:12:25 server1 crmd: [1600]: info: do_state_transition: Starting PEngine Recheck Timer
Jun  5 15:12:25 server1 pengine: [1599]: notice: process_pe_message: Transition 41: PEngine Input stored in: /var/lib/pengine/pe-input-96.bz2
Jun  5 15:12:28 server1 kernel: [ 2175.874477] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun  5 15:12:28 server1 kernel: [ 2175.874485] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:12:33 server1 kernel: [ 2180.978498] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun  5 15:12:33 server1 kernel: [ 2180.978506] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:12:38 server1 kernel: [ 2185.538465] o2net: No connection established with node 1 after 30.0 seconds, giving up.
Jun  5 15:12:38 server1 kernel: [ 2186.082473] (xend,9339,0):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun  5 15:12:38 server1 kernel: [ 2186.082480] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:12:43 server1 kernel: [ 2191.186466] (xend,9339,0):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun  5 15:12:43 server1 kernel: [ 2191.186474] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:12:44 server1 kernel: [ 2191.603442] (pool,9480,3):dlm_do_master_request:1332 ERROR: link to 1 went down!
Jun  5 15:12:44 server1 kernel: [ 2191.603449] (pool,9480,3):dlm_get_lock_resource:917 ERROR: status = -107
Jun  5 15:12:48 server1 kernel: [ 2196.290472] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun  5 15:12:48 server1 kernel: [ 2196.290480] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:12:53 server1 kernel: [ 2201.394470] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun  5 15:12:53 server1 kernel: [ 2201.394477] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:12:58 server1 kernel: [ 2206.498469] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun  5 15:12:58 server1 kernel: [ 2206.498476] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:13:00 server1 kernel: [ 2207.550684] o2cb: o2dlm has evicted node 1 from domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:13:01 server1 kernel: [ 2208.562466] o2dlm: Waiting on the recovery of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:13:03 server1 kernel: [ 2211.434473] o2dlm: Begin recovery on domain F18CB82626444DD0913312B7AE741C5B for node 1
Jun  5 15:13:03 server1 kernel: [ 2211.434501] o2dlm: Node 0 (me) is the Recovery Master for the dead node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:13:03 server1 kernel: [ 2211.434597] o2dlm: End recovery on domain F18CB82626444DD0913312B7AE741C5B
Jun  5 15:13:04 server1 kernel: [ 2211.602493] (pool,9480,3):dlm_restart_lock_mastery:1221 ERROR: node down! 1
Jun  5 15:13:04 server1 kernel: [ 2211.602502] (pool,9480,3):dlm_wait_for_lock_mastery:1038 ERROR: status = -11
Jun  5 15:13:05 server1 kernel: [ 2212.606674] ocfs2: Begin replay journal (node 1, slot 1) on device (147,0)
Jun  5 15:13:06 server1 kernel: [ 2214.350572] ocfs2: End replay journal (node 1, slot 1) on device (147,0)
Jun  5 15:13:06 server1 kernel: [ 2214.360790] ocfs2: Beginning quota recovery on device (147,0) for slot 1
Jun  5 15:13:06 server1 kernel: [ 2214.386783] ocfs2: Finishing quota recovery on device (147,0) for slot 1
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/4/768
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/4/5632
Jun  5 15:13:07 server1 kernel: [ 2214.638622] device tap4.0 entered promiscuous mode
Jun  5 15:13:07 server1 kernel: [ 2214.638685] xenbr1: port 2(tap4.0) entering forwarding state
Jun  5 15:13:07 server1 kernel: [ 2214.638699] xenbr1: port 2(tap4.0) entering forwarding state
Jun  5 15:13:07 server1 NetworkManager[887]:    SCPlugin-Ifupdown: devices added (path: /sys/devices/vif-4-0/net/vif4.0, iface: vif4.0)
Jun  5 15:13:07 server1 NetworkManager[887]:    SCPlugin-Ifupdown: device added (path: /sys/devices/vif-4-0/net/vif4.0, iface: vif4.0): no ifupdown configuration found.
Jun  5 15:13:07 server1 NetworkManager[887]: <warn> failed to allocate link cache: (-10) Operation not supported
Jun  5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): carrier is OFF
Jun  5 15:13:07 server1 NetworkManager[887]: <error> [1401961387.118193] [nm-device-ethernet.c:456] real_update_permanent_hw_address(): (vif4.0): unable to read permanent MAC address (error 0)
Jun  5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): new Ethernet device (driver: 'vif' ifindex: 12)
Jun  5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): exported as /org/freedesktop/NetworkManager/Devices/6
Jun  5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): now managed
Jun  5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Jun  5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): bringing up device.
Jun  5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): preparing device.
Jun  5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): deactivating device (reason 'managed') [2]
Jun  5 15:13:07 server1 NetworkManager[887]: <info> Unmanaged Device found; state CONNECTED forced. (see http://bugs.launchpad.net/bugs/191889)
Jun  5 15:13:07 server1 NetworkManager[887]: <info> Unmanaged Device found; state CONNECTED forced. (see http://bugs.launchpad.net/bugs/191889)
Jun  5 15:13:07 server1 NetworkManager[887]: <info> Added default wired connection 'Wired connection 5' for /sys/devices/vif-4-0/net/vif4.0
Jun  5 15:13:07 server1 kernel: [ 2214.659589] ADDRCONF(NETDEV_UP): vif4.0: link is not ready
Jun  5 15:13:07 server1 kernel: [ 2214.660699] ADDRCONF(NETDEV_UP): vif4.0: link is not ready
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: online type_if=vif XENBUS_PATH=backend/vif/4/0
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: add type_if=tap XENBUS_PATH=
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/768/node /dev/loop0 to xenstore.
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/768/physical-device 7:0 to xenstore.
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/768/hotplug-status connected to xenstore.
Jun  5 15:13:07 server1 kernel: [ 2214.842610] xenbr1: port 2(tap4.0) entering forwarding state
Jun  5 15:13:07 server1 kernel: [ 2214.852647] device vif4.0 entered promiscuous mode
Jun  5 15:13:07 server1 kernel: [ 2214.858373] ADDRCONF(NETDEV_UP): vif4.0: link is not ready
Jun  5 15:13:07 server1 kernel: [ 2214.861475] xenbr1: port 2(tap4.0) entering forwarding state
Jun  5 15:13:07 server1 kernel: [ 2214.861487] xenbr1: port 2(tap4.0) entering forwarding state
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge add for tap4.0, bridge xenbr1.
Jun  5 15:13:07 server1 NetworkManager[887]:    SCPlugin-Ifupdown: devices added (path: /sys/devices/virtual/net/tap4.0, iface: tap4.0)
Jun  5 15:13:07 server1 NetworkManager[887]:    SCPlugin-Ifupdown: device added (path: /sys/devices/virtual/net/tap4.0, iface: tap4.0): no ifupdown configuration found.
Jun  5 15:13:07 server1 NetworkManager[887]: <warn> /sys/devices/virtual/net/tap4.0: couldn't determine device driver; ignoring...
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge online for vif4.0, bridge xenbr1.
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: Writing backend/vif/4/0/hotplug-status connected to xenstore.
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/5632/node /dev/loop1 to xenstore.
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/5632/physical-device 7:1 to xenstore.
Jun  5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/5632/hotplug-status connected to xenstore.
Jun  5 15:13:08 server1 avahi-daemon[898]: Joining mDNS multicast group on interface tap4.0.IPv6 with address fe80::fcff:ffff:feff:ffff.
Jun  5 15:13:08 server1 avahi-daemon[898]: New relevant interface tap4.0.IPv6 for mDNS.
Jun  5 15:13:08 server1 avahi-daemon[898]: Registering new address record for fe80::fcff:ffff:feff:ffff on tap4.0.*.
Jun  5 15:13:17 server1 kernel: [ 2225.202456] tap4.0: no IPv6 routers present
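
In the log above, xenwin7_start_0 and then xenwin7_stop_0 both time out at the
20-second default (timeout=20000ms), which drives the fail-count to INFINITY,
forces xenwin7 away from server1, and leaves it "Started unmanaged". A Windows
HVM guest typically needs far longer to start, stop, and migrate. A hedged
sketch of per-operation timeouts on the xenwin7 primitive (the specific values
are assumptions and should be tuned to the actual guest):

  primitive xenwin7 ocf:heartbeat:Xen \
          params xmfile="/home/cluster/xen/win7.cfg" \
          op start interval="0" timeout="300s" \
          op stop interval="0" timeout="300s" \
          op migrate_to interval="0" timeout="300s" \
          op migrate_from interval="0" timeout="240s" \
          op monitor interval="40s" timeout="60s" \
          meta target-role="Started" allow-migrate="true"

After adjusting the timeouts, the accumulated failures would need to be cleared
(for example with "crm resource cleanup xenwin7") before the resource is
allowed back onto server1.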

