[Pacemaker] Question about recovery policy after "Too many failures to fence"

Kazunori INOUE inouekazu at intellilink.co.jp
Wed Mar 27 04:45:34 EDT 2013


Hi,

I'm using Pacemaker 1.1 (c7910371a5, the latest devel).

After fencing has failed 10 times, the crmd on the DC stays in the S_TRANSITION_ENGINE state.
(related: https://github.com/ClusterLabs/pacemaker/commit/e29d2f9)

How should I recover from this? What procedure should I follow to get the crmd back to S_IDLE?


Mar 27 15:34:34 dev2 crmd[17937]:   notice: tengine_stonith_callback: Stonith operation 12/22:14:0:0927a8a0-8e09-494e-acf8-7fb273ca8c9e: Generic Pacemaker error (-1001)
Mar 27 15:34:34 dev2 crmd[17937]:   notice: tengine_stonith_callback: Stonith operation 12 for dev2 failed (Generic Pacemaker error): aborting transition.
Mar 27 15:34:34 dev2 crmd[17937]:     info: abort_transition_graph: tengine_stonith_callback:426 - Triggered transition abort (complete=0) : Stonith failed
Mar 27 15:34:34 dev2 crmd[17937]:   notice: tengine_stonith_notify: Peer dev2 was not terminated (st_notify_fence) by dev1 for dev2: Generic Pacemaker error (ref=05f75ab8-34ae-4aae-bbc6-aa20dbfdc845) by client crmd.17937
Mar 27 15:34:34 dev2 crmd[17937]:   notice: run_graph: Transition 14 (Complete=1, Pending=0, Fired=0, Skipped=8, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-2.bz2): Stopped
Mar 27 15:34:34 dev2 crmd[17937]:   notice: too_many_st_failures: Too many failures to fence dev2 (11), giving up
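For reference, the give-up behaviour in the last log line can be modeled roughly like this. It is only a hypothetical sketch, not Pacemaker's code: the class and constant names are made up, and it assumes a per-target counter that gives up once failures exceed 10 (consistent with the "(11)" above) and that a successful fence clears the counter.

```python
# Hypothetical model of the "Too many failures to fence" decision.
# FenceFailureTracker and ST_FAIL_THRESHOLD are illustrative names only,
# not Pacemaker internals; the threshold of 10 comes from this report.
from collections import defaultdict
from typing import Dict

ST_FAIL_THRESHOLD = 10  # assumed: give up once failures exceed this value


class FenceFailureTracker:
    """Counts failed fencing attempts per target node."""

    def __init__(self, threshold: int = ST_FAIL_THRESHOLD) -> None:
        self.threshold = threshold
        self.failures: Dict[str, int] = defaultdict(int)

    def record_failure(self, target: str) -> None:
        # One more failed stonith operation against this node.
        self.failures[target] += 1

    def record_success(self, target: str) -> None:
        # Assumption: a successful fence resets the counter for the node.
        self.failures.pop(target, None)

    def too_many_failures(self, target: str) -> bool:
        # True once the count is above the threshold (e.g. 11 > 10).
        return self.failures.get(target, 0) > self.threshold


tracker = FenceFailureTracker()
for _ in range(11):
    tracker.record_failure("dev2")
print(tracker.too_many_failures("dev2"))
```

In this model nothing retries after the give-up point until something outside the counter (here, `record_success`) resets it, which is the situation the question is about.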

$ crmadmin -S dev2
Status of crmd at dev2: S_TRANSITION_ENGINE (ok)

$ crm_mon
Last updated: Wed Mar 27 15:35:12 2013
Last change: Wed Mar 27 15:33:16 2013 via cibadmin on dev1
Stack: corosync
Current DC: dev2 (3232261523) - partition with quorum
Version: 1.1.10-1.el6-c791037
2 Nodes configured, unknown expected votes
3 Resources configured.


Node dev2 (3232261523): UNCLEAN (online)
Online: [ dev1 ]

 prmDummy       (ocf::pacemaker:Dummy): Started dev2 FAILED
 Resource Group: grpStonith1
     prmStonith1        (stonith:external/stonith-helper):      Started dev2
 Resource Group: grpStonith2
     prmStonith2        (stonith:external/stonith-helper):      Started dev1

Failed actions:
    prmDummy_monitor_10000 (node=dev2, call=23, rc=7, status=complete): not running

----
Best Regards,
Kazunori INOUE