[Pacemaker] PE ignores monitor failure of stonith:external/rackpdu

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Nov 2 12:18:30 UTC 2010


Hi,

On Tue, Nov 02, 2010 at 01:09:02PM +0100, Pavlos Parissis wrote:
> On 2 November 2010 13:02, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> [...snip...]
> 
> >
> > > > Definitely not. If you do the monitor action from the command
> > > > line does that also return the unexpected exit code:
> > > >
> > >
> > > from the code I pasted you can see it returned 1.
> >
> > There is a difference. stonith-ng (stonithd) is a daemon that
> > runs a perl script (fencing_legacy) which invokes stonith which
> > then invokes the plugin. A problem can occur in any of these
> > components. It's important to find out where.
> >
> > > > # stonith -t external/rackpdu community="empisteftiko"
> > > > names_oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.4" ... -lS
> > > >
> > > > Which pacemaker release do you run? I couldn't reproduce this
> > > > with a recent Pacemaker.
> > > >
> > >
> > > That was on 1.1.3; now I run 1.0.9.
> > > Do you want me to run the test on 1.0.9?
> >
> > Yes, please. 1.0.9 is still running the old, and well tested,
> > stonithd, so the result could be different.
> >
> >
> I have the pdu off because it stopped working! As a result the
> resource is stopped.
> But I did the test, and I see that even though rackpdu returns 1 on
> status, stonithd reports 256

Ah, I understand what's going on now. It's a bug in the interface
to external plugins which was exposed by stonith-ng. It was
fixed in August. The fix is here (in hg.linux-ha.org/glue):

changeset:   2427:b7df127fc09e
user:        Dejan Muhamedagic <dejan at hello-penguin.com>
date:        Thu Aug 12 14:01:10 2010 +0200
summary:     High: stonith: external: interpret properly exit codes from external stonith plugins (bnc#630357)

There hasn't been a glue release since then, but there should be
one fairly soon. Note that this affects only Pacemaker 1.1.
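
To illustrate why an exit code of 1 can show up as 256 (a sketch of
the symptom only, not the actual glue change): on Linux the exit code
of a child process sits in the high byte of the raw wait()-style
status, so a plugin exiting with 1 reads as 256 when the status is
used undecoded. The helper name decode_exit_code is made up for this
example:

```python
import os


def decode_exit_code(wait_status):
    """Return the exit code encoded in a raw wait()-style status.

    Returns None if the child did not exit normally
    (for example, if it was killed by a signal).
    """
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return None


if __name__ == "__main__":
    # On Unix, os.system() returns the raw wait status, not the exit code.
    raw = os.system("exit 1")
    print("raw status:", raw)
    print("decoded exit code:", decode_exit_code(raw))
```

On a typical Linux box the raw status here is 256 and the decoded exit
code is 1, which matches the "returned 256" log line versus the 1 seen
when the plugin is run directly from the shell.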

Thanks,

Dejan


> Here is the output of running stonith; remember that the pdu is off.
> 
> 
> [root@node-01 ~]# stonith -d -t external/rackpdu
> hostlist="node-01,node-02,node-03" pduip="192.168.100.100"
> community="empisteftiko" names_oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.4"  -l
> ** (process:8115): DEBUG: NewPILPluginUniv(0x8f690c8)
> ** (process:8115): DEBUG: PILS: Plugin path =
> /usr/lib/stonith/plugins:/usr/lib/heartbeat/plugins
> ** (process:8115): DEBUG: NewPILInterfaceUniv(0x8f69768)
> ** (process:8115): DEBUG: NewPILPlugintype(0x8f69a28)
> ** (process:8115): DEBUG: NewPILPlugin(0x8f69a40)
> ** (process:8115): DEBUG: NewPILInterface(0x8f69b50)
> ** (process:8115): DEBUG:
> NewPILInterface(0x8f69b50:InterfaceMgr/InterfaceMgr)*** user_data: 0x0
> *******
> ** (process:8115): DEBUG:
> InterfaceManager_plugin_init(0x8f69b50/InterfaceMgr)
> ** (process:8115): DEBUG: Registering Implementation manager for Interface
> type 'InterfaceMgr'
> ** (process:8115): DEBUG: PILS: Looking for InterfaceMgr/generic =>
> [/usr/lib/stonith/plugins/InterfaceMgr/generic.so]
> ** (process:8115): DEBUG: Plugin file
> /usr/lib/stonith/plugins/InterfaceMgr/generic.so does not exist
> ** (process:8115): DEBUG: PILS: Looking for InterfaceMgr/generic =>
> [/usr/lib/heartbeat/plugins/InterfaceMgr/generic.so]
> ** (process:8115): DEBUG: Plugin path for InterfaceMgr/generic =>
> [/usr/lib/heartbeat/plugins/InterfaceMgr/generic.so]
> ** (process:8115): DEBUG: PluginType InterfaceMgr already present
> ** (process:8115): DEBUG: Plugin InterfaceMgr/generic  init function:
> InterfaceMgr_LTX_generic_pil_plugin_init
> ** (process:8115): DEBUG: NewPILPlugin(0x8f6a1d8)
> ** (process:8115): DEBUG: Plugin InterfaceMgr/generic loaded and
> constructed.
> ** (process:8115): DEBUG: Calling init function in plugin
> InterfaceMgr/generic.
> ** (process:8115): DEBUG: NewPILInterface(0x8f69cd8)
> ** (process:8115): DEBUG:
> NewPILInterface(0x8f69cd8:InterfaceMgr/stonith2)*** user_data: 0x8f69b18
> *******
> ** (process:8115): DEBUG: Registering Implementation manager for Interface
> type 'stonith2'
> ** (process:8115): DEBUG: IfIncrRefCount(1 + 1 )
> ** (process:8115): DEBUG: PluginIncrRefCount(0 + 1 )
> ** (process:8115): DEBUG: IfIncrRefCount(1 + 100 )
> ** (process:8115): DEBUG: PILS: Looking for stonith2/external =>
> [/usr/lib/stonith/plugins/stonith2/external.so]
> ** (process:8115): DEBUG: Plugin path for stonith2/external =>
> [/usr/lib/stonith/plugins/stonith2/external.so]
> ** (process:8115): DEBUG: Creating PluginType for stonith2
> ** (process:8115): DEBUG: NewPILPlugintype(0x8f6a398)
> ** (process:8115): DEBUG: Plugin stonith2/external  init function:
> stonith2_LTX_external_pil_plugin_init
> ** (process:8115): DEBUG: NewPILPlugin(0x8f69d68)
> ** (process:8115): DEBUG: Plugin stonith2/external loaded and constructed.
> ** (process:8115): DEBUG: Calling init function in plugin stonith2/external.
> ** (process:8115): DEBUG: NewPILInterface(0x8f6a3b0)
> ** (process:8115): DEBUG: NewPILInterface(0x8f6a3b0:stonith2/external)***
> user_data: 0x9e9fbc *******
> ** (process:8115): DEBUG: IfIncrRefCount(101 + 1 )
> ** (process:8115): DEBUG: PluginIncrRefCount(0 + 1 )
> ** (process:8115): DEBUG: external_set_config: called.
> ** (process:8115): DEBUG: external_get_confignames: called.
> ** (process:8115): DEBUG: external_run_cmd: Calling
> '/usr/lib/stonith/plugins/external/rackpdu getconfignames'
> ** (process:8115): DEBUG: external_run_cmd:
> '/usr/lib/stonith/plugins/external/rackpdu getconfignames' output: hostlist
> pduip community
> 
> ** (process:8115): DEBUG: external_get_confignames: 'rackpdu getconfignames'
> returned 0
> ** (process:8115): DEBUG: plugin output: hostlist pduip community
> 
> ** (process:8115): DEBUG: external_get_confignames: rackpdu configname
> hostlist
> ** (process:8115): DEBUG: external_get_confignames: rackpdu configname pduip
> ** (process:8115): DEBUG: external_get_confignames: rackpdu configname
> community
> ** (process:8115): DEBUG: external_status: called.
> ** (process:8115): DEBUG: external_run_cmd: Calling
> '/usr/lib/stonith/plugins/external/rackpdu status'
> ** INFO: external_run_cmd: Calling
> '/usr/lib/stonith/plugins/external/rackpdu status' returned 256
> 
> ** (process:8115): CRITICAL **: external_status: 'rackpdu status' failed
> with rc 256
> ** (process:8115): DEBUG: external_getinfo: called.
> ** (process:8115): DEBUG: external_run_cmd: Calling
> '/usr/lib/stonith/plugins/external/rackpdu getinfo-devid'
> ** (process:8115): DEBUG: external_run_cmd:
> '/usr/lib/stonith/plugins/external/rackpdu getinfo-devid' output: rackpdu
> STONITH device
> 
> ** (process:8115): DEBUG: external_getinfo: 'rackpdu getinfo-devid' returned
> 0
> ** (process:8115): DEBUG: external_hostlist: called.
> ** (process:8115): DEBUG: external_run_cmd: Calling
> '/usr/lib/stonith/plugins/external/rackpdu gethosts'
> ** (process:8115): DEBUG: external_run_cmd:
> '/usr/lib/stonith/plugins/external/rackpdu gethosts' output: node-01
> node-02
> node-03
> 
> ** (process:8115): DEBUG: external_hostlist: running 'rackpdu gethosts'
> returned 0
> ** (process:8115): DEBUG: external_hostlist: rackpdu host node-01
> ** (process:8115): DEBUG: external_hostlist: rackpdu host node-02
> ** (process:8115): DEBUG: external_hostlist: rackpdu host node-03
> node-01
> node-02
> node-03
> ** (process:8115): DEBUG: external_destroy: called.
> ** (process:8115): DEBUG: IfIncrRefCount(1 + -1 )
> ** (process:8115): DEBUG: RemoveAPILInterface(0x8f6a3b0/external)
> ** (process:8115): DEBUG: RmAPILInterface(0x8f6a3b0/external)
> ** (process:8115): DEBUG: PILunregister_interface(stonith2/external)
> ** (process:8115): DEBUG: Calling InterfaceClose on stonith2/external
> ** (process:8115): DEBUG: IfIncrRefCount(102 + -1 )
> ** (process:8115): DEBUG: PluginIncrRefCount(1 + -1 )
> ** (process:8115): DEBUG: RemoveAPILPlugin(stonith2/external)
> ** (process:8115): DEBUG: RmAPILPlugin(stonith2/external)
> ** (process:8115): DEBUG: Closing dlhandle for (stonith2/external)
> ** (process:8115): DEBUG: RmAPILPluginType(stonith2)
> ** (process:8115): DEBUG: DelPILPluginType(stonith2)
> ** (process:8115): DEBUG: DelPILInterface(0x8f6a3b0/external)
> [root@node-01 ~]# stonith -t external/rackpdu
> hostlist="node-01,node-02,node-03" pduip="192.168.100.100"
> community="empisteftiko" names_oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.4"  -l
> ** INFO: external_run_cmd: Calling
> '/usr/lib/stonith/plugins/external/rackpdu status' returned 256
> 
> ** (process:8814): CRITICAL **: external_status: 'rackpdu status' failed
> with rc 256
> node-01
> node-02
> node-03
> 
> and invoking the rackpdu plugin directly:
> [root@node-01 ~]# /usr/lib/stonith/plugins/external/rackpdu status
> [root@node-01 ~]# echo $?
> 1
> 
> 
> The following is the log from when I try to start the resource:
> 
> Nov 02 12:55:58 node-01 crmd: [19385]: info: do_lrm_rsc_op: Performing
> key=108:59:0:569e2e9c-9272-4bd3-a262-b971cd349522 op=pdu_start_0 )
> Nov 02 12:55:58 node-01 lrmd: [19382]: info: rsc:pdu:27: start
> Nov 02 12:55:58 node-01 lrmd: [9248]: info: Try to start STONITH resource
> <rsc_id=pdu> : Device=external/rackpdu
> Nov 02 12:56:00 node-01 stonithd: [9254]: info: external_run_cmd: Calling
> '/usr/lib/stonith/plugins/external/rackpdu status' returned 256
> Nov 02 12:56:00 node-01 stonithd: [9254]: CRIT: external_status: 'rackpdu
> status' failed with rc 256
> Nov 02 12:56:00 node-01 stonithd: [19383]: WARN: start pdu failed, because
> its hostlist is empty
> Nov 02 12:56:00 node-01 crmd: [19385]: info: process_lrm_event: LRM
> operation pdu_start_0 (call=27, rc=1, cib-update=49, confirmed=true) unknown
> error
> Nov 02 12:56:03 node-01 attrd: [19384]: info: attrd_trigger_update: Sending
> flush op to all hosts for: fail-count-pdu (INFINITY)
> Nov 02 12:56:03 node-01 crmd: [19385]: info: do_lrm_rsc_op: Performing
> key=7:60:0:569e2e9c-9272-4bd3-a262-b971cd349522 op=pdu_stop_0 )
> Nov 02 12:56:03 node-01 lrmd: [19382]: info: rsc:pdu:28: stop
> Nov 02 12:56:03 node-01 lrmd: [9309]: info: Try to stop STONITH resource
> <rsc_id=pdu> : Device=external/rackpdu
> Nov 02 12:56:03 node-01 stonithd: [19383]: notice: try to stop a resource
> pdu who is not in started resource queue.
> Nov 02 12:56:03 node-01 crmd: [19385]: info: process_lrm_event: LRM
> operation pdu_stop_0 (call=28, rc=0, cib-update=50, confirmed=true) ok
> Nov 02 12:56:03 node-01 attrd: [19384]: info: attrd_perform_update: Sent
> update 300: fail-count-pdu=INFINITY
> Nov 02 12:56:03 node-01 attrd: [19384]: info: attrd_trigger_update: Sending
> flush op to all hosts for: last-failure-pdu (1288698962)
> Nov 02 12:56:03 node-01 attrd: [19384]: info: attrd_perform_update: Sent
> update 302: last-failure-pdu=1288698962
> Nov 02 12:56:04 node-01 lrmd: [19382]: info: rsc:pdu:29: start
> Nov 02 12:56:04 node-01 crmd: [19385]: info: do_lrm_rsc_op: Performing
> key=109:60:0:569e2e9c-9272-4bd3-a262-b971cd349522 op=pdu_start_0 )
> Nov 02 12:56:04 node-01 lrmd: [9311]: info: Try to start STONITH resource
> <rsc_id=pdu> : Device=external/rackpdu
> Nov 02 12:56:06 node-01 stonithd: [9316]: info: external_run_cmd: Calling
> '/usr/lib/stonith/plugins/external/rackpdu status' returned 256
> Nov 02 12:56:06 node-01 stonithd: [9316]: CRIT: external_status: 'rackpdu
> status' failed with rc 256
> Nov 02 12:56:06 node-01 stonithd: [19383]: WARN: start pdu failed, because
> its hostlist is empty
> Nov 02 12:56:06 node-01 crmd: [19385]: info: process_lrm_event: LRM
> operation pdu_start_0 (call=29, rc=1, cib-update=51, confirmed=true) unknown
> error
> Nov 02 12:56:08 node-01 attrd: [19384]: info: attrd_trigger_update: Sending
> flush op to all hosts for: last-failure-pdu (1288698969)
> Nov 02 12:56:08 node-01 crmd: [19385]: info: do_lrm_rsc_op: Performing
> key=7:61:0:569e2e9c-9272-4bd3-a262-b971cd349522 op=pdu_stop_0 )
> Nov 02 12:56:08 node-01 lrmd: [19382]: info: rsc:pdu:30: stop
> Nov 02 12:56:08 node-01 lrmd: [9358]: info: Try to stop STONITH resource
> <rsc_id=pdu> : Device=external/rackpdu
> Nov 02 12:56:08 node-01 stonithd: [19383]: notice: try to stop a resource
> pdu who is not in started resource queue.
> Nov 02 12:56:08 node-01 crmd: [19385]: info: process_lrm_event: LRM
> operation pdu_stop_0 (call=30, rc=0, cib-update=52, confirmed=true) ok
> Nov 02 12:56:08 node-01 attrd: [19384]: info: attrd_perform_update: Sent
> update 304: last-failure-pdu=1288698969
> Nov 02 12:56:34 node-01 crmd: [19385]: info: do_lrm_invoke: Removing
> resource pdu from the LRM
> Nov 02 12:56:34 node-01 crmd: [19385]: info: do_lrm_invoke: Resource 'pdu'
> deleted for 9638_crm_resource on node-01
> Nov 02 12:56:34 node-01 crmd: [19385]: info: notify_deleted: Notifying
> 9638_crm_resource on node-01 that pdu was deleted
> Nov 02 12:56:34 node-01 crmd: [19385]: info: send_direct_ack: ACK'ing
> resource op pdu_delete_60000 from 0:0:crm-resource-9638:
> lrm_invoke-lrmd-1288698994-27
> 
> 
> The relevant bit of the configuration:
> primitive pdu stonith:external/rackpdu \
>         params community="empisteftiko"
> names_oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.4"
> oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.3" hostlist="node-01,node-02,node-03"
> pduip="192.168.100.100" stonith-timeout="30" \
>         op monitor interval="1m" timeout="60s" \
>         meta target-role="Stopped"

> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



