[Pacemaker] crm resource doesn't move after hardware crash
Beo Banks
beo.banks at googlemail.com
Tue Apr 1 09:50:52 UTC 2014
hi,
the kvm guests are on different kvm hosts.
2014-03-24 0:30 GMT+01:00 Andrew Beekhof <andrew at beekhof.net>:
>
> On 21 Mar 2014, at 11:11 pm, Beo Banks <beo.banks at googlemail.com> wrote:
>
> > yep, and that's my issue.
> >
> > stonith is very powerful, but how can the cluster handle hardware failure?
>
> by connecting to the switch that supplies power to said hardware.
> this is exactly the reason devices like fence_virsh and external/ssh are
> not considered reliable - they depend on infrastructure that may have
> died along with the node.
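>
> for comparison, a power-based device would look something like the below
> (assuming the physical hosts have IPMI BMCs - the address and credentials
> are placeholders, adjust the agent and params to your actual hardware):
>
>   primitive stonith-linux01 stonith:fence_ipmilan \
>     params pcmk_host_list="linux01" ipaddr="<bmc-ip>" login="admin" passwd="secret" lanplus="1" action="reboot" \
>     op monitor interval="300s"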
>
> are both these VMs running on the same physical hardware?
>
> >
> > primitive stonith-linux01 stonith:fence_virsh \
> >     params pcmk_host_list="linux01" pcmk_host_check="dynamic-list" pcmk_host_map="linux01:linux01" action="reboot" ipaddr="XXXXXX" secure="true" login="root" identity_file="/root/.ssh/id_rsa" debug="/var/log/stonith.log" verbose="false" \
>
> you don't need the host map if the name and value (name:value) are the
> same (see the trimmed version after the quoted config below)
>
> > op monitor interval="300s" \
> > op start interval="0" timeout="60s" \
> > meta failure-timeout="180s"
> > primitive stonith-linux02 stonith:fence_virsh \
> >     params pcmk_host_list="linux02" pcmk_host_check="dynamic-list" pcmk_host_map="linux02:linux02" action="reboot" ipaddr="XXXXX" secure="true" login="root" identity_file="/root/.ssh/id_rsa" delay="5" debug="/var/log/stonith.log" verbose="false" \
> > op monitor interval="60s" \
> > op start interval="0" timeout="60s" \
> > meta failure-timeout="180s"
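>
> untested, but trimmed down your first primitive would then read something
> like:
>
>   primitive stonith-linux01 stonith:fence_virsh \
>     params pcmk_host_list="linux01" pcmk_host_check="dynamic-list" action="reboot" ipaddr="XXXXXX" secure="true" login="root" identity_file="/root/.ssh/id_rsa" debug="/var/log/stonith.log" verbose="false" \
>     op monitor interval="300s" \
>     op start interval="0" timeout="60s" \
>     meta failure-timeout="180s"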
> >
> >
> >
> >
> > 2014-03-18 13:54 GMT+01:00 emmanuel segura <emi2fast at gmail.com>:
> > do you have stonith configured?
> >
> >
> > 2014-03-18 13:07 GMT+01:00 Alex Samad - Yieldbroker <Alex.Samad at yieldbroker.com>:
> > I'm no expert, but
> >
> >
> >
> > Current DC: linux02 - partition WITHOUT quorum
> > Version: 1.1.10-14.el6_5.2-368c726
> > 2 Nodes configured, 2 expected votes
> >
> >
> >
> >
> > I think your 2nd node can't make quorum; there is a special config for a
> > 2-node cluster that allows a node to keep quorum with 1 vote.
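> >
> > iirc on your stack that would be the no-quorum-policy cluster property,
> > e.g. (please double-check against the docs for your version):
> >
> >   crm configure property no-quorum-policy=ignore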
> >
> >
> >
> > A
> >
> >
> >
> > From: Beo Banks [mailto:beo.banks at googlemail.com]
> > Sent: Tuesday, 18 March 2014 10:06 PM
> > To: pacemaker at oss.clusterlabs.org
> > Subject: [Pacemaker] crm resource doesn't move after hardware crash
> >
> >
> >
> > hi,
> >
> > I have a hardware crash in a two-node drbd cluster.
> >
> > the active node has a hardware failure and is currently down.
> >
> > I am wondering why my 2nd node doesn't migrate/move the resources.
> >
> > the 2nd node wants to fence the failed node, but that's not possible (it's down).
> >
> >
> > how can I enable the services on the last "good" node?
> >
> > and how can I optimize my config to handle that kind of error?
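> >
> > (would manually confirming the fence, e.g. "stonith_admin --confirm
> > linux01", be the right way to tell the cluster the node is really down,
> > or is that unsafe here?)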
> >
> > crm status
> >
> > Last updated: Tue Mar 18 12:01:07 2014
> > Last change: Tue Mar 18 11:28:22 2014 via crmd on linux02
> > Stack: classic openais (with plugin)
> > Current DC: linux02 - partition WITHOUT quorum
> > Version: 1.1.10-14.el6_5.2-368c726
> > 2 Nodes configured, 2 expected votes
> > 21 Resources configured
> >
> >
> > Node linux01: UNCLEAN (offline)
> > Online: [ linux02 ]
> >
> > Resource Group: mysql
> > mysql_fs (ocf::heartbeat:Filesystem): Started linux01
> > mysql_ip (ocf::heartbeat:IPaddr2): Started linux01
> >
> > .... and so on
> >
> >
> >
> > cluster.log
> >
> >
> > Mar 18 11:54:43 [2234] linux02 crmd: notice: tengine_stonith_callback: Stonith operation 17 for linux01 failed (Timer expired): aborting transition.
> > Mar 18 11:54:43 [2234] linux02 crmd: info: abort_transition_graph: tengine_stonith_callback:463 - Triggered transition abort (complete=0) : Stonith failed
> > Mar 18 11:54:43 [2234] linux02 crmd: notice: run_graph: Transition 15 (Complete=9, Pending=0, Fired=0, Skipped=36, Incomplete=19, Source=/var/lib/pacemaker/pengine/pe-warn-63.bz2): Stopped
> > Mar 18 11:54:43 [2234] linux02 crmd: notice: too_many_st_failures: Too many failures to fence linux01 (16), giving up
> > Mar 18 11:54:43 [2234] linux02 crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
> > Mar 18 11:54:43 [2234] linux02 crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Mar 18 11:54:43 [2230] linux02 stonith-ng: info: stonith_command: Processed st_notify reply from linux02: OK (0)
> > Mar 18 11:54:43 [2234] linux02 crmd: notice: tengine_stonith_notify: Peer linux01 was not terminated (reboot) by linux02 for linux02: Timer expired (ref=7939b264-699c-4d00-a89c-07e7e0193a80) by client crmd.2234
> > Mar 18 11:54:44 [2229] linux02 cib: info: crm_client_new: Connecting 0x155ac00 for uid=0 gid=0 pid=23360 id=b88b2690-0c3f-48ac-b8b4-3a47b7f9114a
> > Mar 18 11:54:44 [2229] linux02 cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crm_mon/2, version=0.125.2)
> > Mar 18 11:54:44 [2229] linux02 cib: info: crm_client_destroy: Destroying 0 events
> > Mar 18 11:55:03 [2229] linux02 cib: info: crm_client_new: Connecting 0x155ac00 for uid=0 gid=0 pid=23415 id=62e7a9d8-588e-427f-8178-85febce00151
> > Mar 18 11:55:03 [2229] linux02 cib: info: crm_client_new: Connecting 0x1585de0 for uid=0 gid=0 pid=23416 id=79795042-699b-4347-abcb-4c7c96ed2291
> > Mar 18 11:55:03 [2229] linux02 cib: info: cib_process_request: Completed cib_query operation for section nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.125.2)
> > Mar 18 11:55:03 [2229] linux02 cib: info: cib_process_request: Completed cib_query operation for section nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.125.2)
> > Mar 18 11:55:03 [2229] linux02 cib: info: crm_client_destroy: Destroying 0 events
> > Mar 18 11:55:03 [2229] linux02 cib: info: crm_client_destroy: Destroying 0 events
> > Mar 18 11:55:43 [2230] linux02 stonith-ng: error: remote_op_done: Already sent notifications for 'reboot of linux01 by linux02' (for=crmd.2234@linux02.7939b264, state=4): Timer expired
> > Mar 18 11:55:59 [2229] linux02 cib: info: crm_client_new: Connecting 0x155ac00 for uid=0 gid=0 pid=23468 id=8dea3cab-9103-42fc-9747-76018c4a0500
> > Mar 18 11:55:59 [2229] linux02 cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crm_mon/2, version=0.125.2)
> > Mar 18 11:55:59 [2229] linux02 cib: info: crm_client_destroy: Destroying 0 events
> > Mar 18 11:56:03 [2229] linux02 cib: info: crm_client_new: Connecting 0x155ac00 for uid=0 gid=0 pid=23523 id=b681390a-51a3-4d68-abf1-514ee8ab9351
> > Mar 18 11:56:03 [2229] linux02 cib: info: crm_client_new: Connecting 0x1585de0 for uid=0 gid=0 pid=23524 id=005421e4-b079-4a16-b4cc-0fc2c8c73246
> > Mar 18 11:56:03 [2229] linux02 cib: info: cib_process_request: Completed cib_query operation for section nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.125.2)
> > Mar 18 11:56:03 [2229] linux02 cib: info: cib_process_request: Completed cib_query operation for section nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.125.2)
> > Mar 18 11:56:03 [2229] linux02 cib: info: crm_client_destroy: Destroying 0 events
> > Mar 18 11:56:03 [2229] linux02 cib: info: crm_client_destroy: Destroying 0 events
> >
> > thanks
> >
> > beo
> >
> > --
> > this is my life and I live it as long as God wills
> >