[Pacemaker] abrupt power failure problem

Schaefer, Diane E diane.schaefer at unisys.com
Tue Jun 15 16:58:41 UTC 2010


Hi,
  We are having trouble with our two-node cluster after one node suffers an abrupt power failure. The resources do not start on the remaining node (i.e., the DRBD resources are not promoted to master). In the log on the remaining node (qpr4) we see repeated STONITH failures:

Jan  8 02:12:27 qpr4 stonithd: [6622]: info: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi reset qpr3' returned 256
Jan  8 02:12:27 qpr4 stonithd: [6622]: CRIT: external_reset_req: 'ipmi reset' for host qpr3 failed with rc 256
Jan  8 02:12:27 qpr4 stonithd: [5854]: info: failed to STONITH node qpr3 with local device stonith0 (exitcode 5), gonna try the next local device
Jan  8 02:12:27 qpr4 stonithd: [5854]: info: we can't manage qpr3, broadcast request to other nodes
Jan  8 02:13:27 qpr4 stonithd: [5854]: ERROR: Failed to STONITH the node qpr3: optype=RESET, op_result=TIMEOUT

Jan  8 02:13:27 qpr4 stonithd: [6763]: info: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi reset qpr3' returned 256
Jan  8 02:13:27 qpr4 stonithd: [6763]: CRIT: external_reset_req: 'ipmi reset' for host qpr3 failed with rc 256
Jan  8 02:13:27 qpr4 stonithd: [5854]: info: failed to STONITH node qpr3 with local device stonith0 (exitcode 5), gonna try the next local device
Jan  8 02:13:27 qpr4 stonithd: [5854]: info: we can't manage qpr3, broadcast request to other nodes
Jan  8 02:14:27 qpr4 stonithd: [5854]: ERROR: Failed to STONITH the node qpr3: optype=RESET, op_result=TIMEOUT
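
A note on the "rc 256" in the lines above: if that value is the raw wait status of the plugin process (an assumption, e.g. what pclose() would return, rather than the plugin's own exit code), it decodes to exit code 1, i.e. a generic failure from the external/ipmi script. A minimal Python sketch of that decoding, with hypothetical helper names that are not part of stonithd or Pacemaker:

```python
import re

# Assumption: the "rc" logged by stonithd for an external plugin is a raw
# POSIX wait status, not the plugin's exit code itself.
def decode_wait_status(status):
    """Return (exit_code, signal) for a POSIX-style wait status."""
    signal = status & 0x7F             # low 7 bits: terminating signal, 0 if none
    exit_code = (status >> 8) & 0xFF   # next 8 bits: exit code of the process
    return exit_code, signal

# Matches the "failed with rc N" wording seen in the CRIT lines above.
RC_PATTERN = re.compile(r"failed with rc (\d+)")

def plugin_exit_codes(log_text):
    """Collect decoded exit codes from 'failed with rc N' log lines."""
    codes = []
    for line in log_text.splitlines():
        match = RC_PATTERN.search(line)
        if match:
            exit_code, _signal = decode_wait_status(int(match.group(1)))
            codes.append(exit_code)
    return codes

sample = ("Jan  8 02:12:27 qpr4 stonithd: [6622]: CRIT: external_reset_req: "
          "'ipmi reset' for host qpr3 failed with rc 256")
print(plugin_exit_codes(sample))
```

Under that assumption the reset command itself exited non-zero. That would fit a power-fail case where the IPMI controller of qpr3 also lost power and cannot be reached, but the log lines alone do not confirm the cause.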

Our bootstrap options are:
cluster-infrastructure="Heartbeat" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="true" \
        cluster-delay="60s" \
        start-failure-is-fatal="false" \
        cluster-recheck-interval="15m" \

Some info from the hbreport is attached.

Thanks for any insight.
Diane Schaefer
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stonith_error_1.tar.bz2
Type: application/octet-stream
Size: 86385 bytes
Desc: stonith_error_1.tar.bz2
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100615/d74ac98c/attachment-0003.obj>