[Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

Digimer lists at alteeve.ca
Thu Jun 27 16:53:01 UTC 2013


On 06/26/2013 03:52 PM, Digimer wrote:
> This question appears to be the same issue asked here:
> 
> http://oss.clusterlabs.org/pipermail/pacemaker/2013-June/018650.html
> 
> In my case, I have two fence methods per node; IPMI first with
> action="reboot" and, if that fails, two PDUs (one backing each side of
> the node's redundant PSUs).
> 
> Initially I set up the PDUs with action="reboot", figuring that the
> fencing_topology tied them together, so pacemaker would call "pdu1:port1;
> off -> pdu2:port1; off; (verify both are off) -> pdu1:port1; on ->
> pdu2:port1; on".
> 
> This didn't happen though. It called "pdu1:port1; reboot" then
> "pdu2:port1; reboot", so the first PSU in the node had its power back
> before the second PSU lost power, meaning the node never powered off.
> 
> So next I tried:
> 
> pdu1:port1; off -> pdu2:port1; off -> pdu1:port1; on -> pdu2:port1; on
> 
> However, this seems to have actually done:
> 
> pdu1:port1; reboot -> pdu2:port1; reboot -> pdu1:port1; reboot ->
> pdu2:port1; reboot
> 
> So again, the node never lost power to both PSUs at the same time, so
> the node didn't power off.
> 
> This makes PDU fencing unreliable. I know beekhof said:
> 
>   "My point would be that action=off is not the correct way to configure
> what you're trying to do."
> 
> in the other thread, but there was no elaboration on what *is* the right
> way. So if neither approach works, what is the proper way to configure
> PDU fencing when you have two different PDUs, one backing each PSU?
> 
>   I don't want to disable "reboot" globally because I still want the
> IPMI based fencing to do action="reboot". If I just do "off", then the
> node will not power back on after a successful fence. This is better
> than nothing, but still quite sub-optimal.

So with the help of several people on IRC yesterday, I seem to have got
this working. The trick was to replace "action" with "pcmk_reboot_action"
(a per-device "action" is ignored and the global action is used instead).
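
For anyone following along, here is a toy Python model of why the order of
calls matters with redundant PSUs. It is only a sketch and not Pacemaker
code: the function name and the "node stays up while any PSU has power"
rule are my own simplifying assumptions, and a PDU "reboot" is modelled as
an off followed immediately by an on for that one outlet.

```python
# Toy model (not Pacemaker code): the node stays up while at least one
# PSU still has power.

def run_sequence(steps):
    """Apply (pdu, action) steps in order.

    Returns True if both PSUs were ever without power at the same time.
    """
    powered = {"pdu1": True, "pdu2": True}
    node_went_down = False
    for pdu, action in steps:
        if action == "reboot":
            # Assumption: a "reboot" cuts and restores this one outlet
            # before the next step runs.
            sub_actions = ["off", "on"]
        else:
            sub_actions = [action]
        for sub in sub_actions:
            powered[pdu] = (sub == "on")
            if not any(powered.values()):
                node_went_down = True
    return node_went_down


reboot_each = [("pdu1", "reboot"), ("pdu2", "reboot")]
off_then_on = [("pdu1", "off"), ("pdu2", "off"), ("pdu1", "on"), ("pdu2", "on")]

print(run_sequence(reboot_each))  # False: one PSU always had power
print(run_sequence(off_then_on))  # True: both PSUs were dark together
```

The second sequence is the one the configuration in this post is meant to
produce.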

So the working configuration is (in crm syntax):

====
node $id="1" an-c03n01.alteeve.ca
node $id="2" an-c03n02.alteeve.ca
primitive fence_n01_ipmi stonith:fence_ipmilan \
        params ipaddr="an-c03n01.ipmi" pcmk_reboot_action="reboot" \
        login="admin" passwd="secret" pcmk_host_list="an-c03n01.alteeve.ca"
primitive fence_n01_psu1_off stonith:fence_apc_snmp \
        params ipaddr="an-p01" pcmk_reboot_action="off" port="1" \
        pcmk_host_list="an-c03n01.alteeve.ca"
primitive fence_n01_psu1_on stonith:fence_apc_snmp \
        params ipaddr="an-p01" pcmk_reboot_action="on" port="1" \
        pcmk_host_list="an-c03n01.alteeve.ca"
primitive fence_n01_psu2_off stonith:fence_apc_snmp \
        params ipaddr="an-p02" pcmk_reboot_action="off" port="1" \
        pcmk_host_list="an-c03n01.alteeve.ca"
primitive fence_n01_psu2_on stonith:fence_apc_snmp \
        params ipaddr="an-p02" pcmk_reboot_action="on" port="1" \
        pcmk_host_list="an-c03n01.alteeve.ca"
primitive fence_n02_ipmi stonith:fence_ipmilan \
        params ipaddr="an-c03n02.ipmi" pcmk_reboot_action="reboot" \
        login="admin" passwd="secret" pcmk_host_list="an-c03n02.alteeve.ca" \
        meta target-role="Started"
primitive fence_n02_psu1_off stonith:fence_apc_snmp \
        params ipaddr="an-p01" pcmk_reboot_action="off" port="2" \
        pcmk_host_list="an-c03n02.alteeve.ca"
primitive fence_n02_psu1_on stonith:fence_apc_snmp \
        params ipaddr="an-p01" pcmk_reboot_action="on" port="2" \
        pcmk_host_list="an-c03n02.alteeve.ca"
primitive fence_n02_psu2_off stonith:fence_apc_snmp \
        params ipaddr="an-p02" pcmk_reboot_action="off" port="2" \
        pcmk_host_list="an-c03n02.alteeve.ca"
primitive fence_n02_psu2_on stonith:fence_apc_snmp \
        params ipaddr="an-p02" pcmk_reboot_action="on" port="2" \
        pcmk_host_list="an-c03n02.alteeve.ca"
location loc_fence_n01_ipmi fence_n01_ipmi -inf: an-c03n01.alteeve.ca
location loc_fence_n01_psu1_off fence_n01_psu1_off -inf: an-c03n01.alteeve.ca
location loc_fence_n01_psu1_on fence_n01_psu1_on -inf: an-c03n01.alteeve.ca
location loc_fence_n01_psu2_off fence_n01_psu2_off -inf: an-c03n01.alteeve.ca
location loc_fence_n01_psu2_on fence_n01_psu2_on -inf: an-c03n01.alteeve.ca
location loc_fence_n02_ipmi fence_n02_ipmi -inf: an-c03n02.alteeve.ca
location loc_fence_n02_psu1_off fence_n02_psu1_off -inf: an-c03n02.alteeve.ca
location loc_fence_n02_psu1_on fence_n02_psu1_on -inf: an-c03n02.alteeve.ca
location loc_fence_n02_psu2_off fence_n02_psu2_off -inf: an-c03n02.alteeve.ca
location loc_fence_n02_psu2_on fence_n02_psu2_on -inf: an-c03n02.alteeve.ca
fencing_topology \
        an-c03n01.alteeve.ca: fence_n01_ipmi \
        fence_n01_psu1_off,fence_n01_psu2_off,fence_n01_psu1_on,fence_n01_psu2_on \
        an-c03n02.alteeve.ca: fence_n02_ipmi \
        fence_n02_psu1_off,fence_n02_psu2_off,fence_n02_psu1_on,fence_n02_psu2_on
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-3.1733.a903e62.git.el7-a903e62" \
        cluster-infrastructure="corosync" \
        no-quorum-policy="ignore" \
        stonith-enabled="true"
====

Note that this is after just one test. I will want to test it several more
times before I consider it reliable. Ideally, I would love to hear
Andrew or others confirm that this looks sane and correct.

The crm commands used to configure this were (edited, may contain typos):

====
crm configure primitive fence_n01_ipmi stonith:fence_ipmilan params \
        ipaddr="an-c03n01.ipmi" pcmk_reboot_action="reboot" login="admin" \
        passwd="secret" pcmk_host_list="an-c03n01.alteeve.ca"
crm configure primitive fence_n02_ipmi stonith:fence_ipmilan params \
        ipaddr="an-c03n02.ipmi" pcmk_reboot_action="reboot" login="admin" \
        passwd="secret" pcmk_host_list="an-c03n02.alteeve.ca"
crm configure primitive fence_n01_psu1_off stonith:fence_apc_snmp params \
        ipaddr="an-p01" pcmk_reboot_action="off" port="1" \
        pcmk_host_list="an-c03n01.alteeve.ca"
crm configure primitive fence_n01_psu2_off stonith:fence_apc_snmp params \
        ipaddr="an-p02" pcmk_reboot_action="off" port="1" \
        pcmk_host_list="an-c03n01.alteeve.ca"
crm configure primitive fence_n01_psu1_on stonith:fence_apc_snmp params \
        ipaddr="an-p01" pcmk_reboot_action="on" port="1" \
        pcmk_host_list="an-c03n01.alteeve.ca"
crm configure primitive fence_n01_psu2_on stonith:fence_apc_snmp params \
        ipaddr="an-p02" pcmk_reboot_action="on" port="1" \
        pcmk_host_list="an-c03n01.alteeve.ca"
crm configure primitive fence_n02_psu1_off stonith:fence_apc_snmp params \
        ipaddr="an-p01" pcmk_reboot_action="off" port="2" \
        pcmk_host_list="an-c03n02.alteeve.ca"
crm configure primitive fence_n02_psu2_off stonith:fence_apc_snmp params \
        ipaddr="an-p02" pcmk_reboot_action="off" port="2" \
        pcmk_host_list="an-c03n02.alteeve.ca"
crm configure primitive fence_n02_psu1_on stonith:fence_apc_snmp params \
        ipaddr="an-p01" pcmk_reboot_action="on" port="2" \
        pcmk_host_list="an-c03n02.alteeve.ca"
crm configure primitive fence_n02_psu2_on stonith:fence_apc_snmp params \
        ipaddr="an-p02" pcmk_reboot_action="on" port="2" \
        pcmk_host_list="an-c03n02.alteeve.ca"
crm configure location loc_fence_n01_ipmi fence_n01_ipmi -inf: \
        an-c03n01.alteeve.ca
crm configure location loc_fence_n02_ipmi fence_n02_ipmi -inf: \
        an-c03n02.alteeve.ca
crm configure location loc_fence_n01_psu1_off fence_n01_psu1_off -inf: \
        an-c03n01.alteeve.ca
crm configure location loc_fence_n01_psu2_off fence_n01_psu2_off -inf: \
        an-c03n01.alteeve.ca
crm configure location loc_fence_n01_psu1_on fence_n01_psu1_on -inf: \
        an-c03n01.alteeve.ca
crm configure location loc_fence_n01_psu2_on fence_n01_psu2_on -inf: \
        an-c03n01.alteeve.ca
crm configure location loc_fence_n02_psu1_off fence_n02_psu1_off -inf: \
        an-c03n02.alteeve.ca
crm configure location loc_fence_n02_psu2_off fence_n02_psu2_off -inf: \
        an-c03n02.alteeve.ca
crm configure location loc_fence_n02_psu1_on fence_n02_psu1_on -inf: \
        an-c03n02.alteeve.ca
crm configure location loc_fence_n02_psu2_on fence_n02_psu2_on -inf: \
        an-c03n02.alteeve.ca
crm configure fencing_topology \
        an-c03n01.alteeve.ca: fence_n01_ipmi \
        fence_n01_psu1_off,fence_n01_psu2_off,fence_n01_psu1_on,fence_n01_psu2_on \
        an-c03n02.alteeve.ca: fence_n02_ipmi \
        fence_n02_psu1_off,fence_n02_psu2_off,fence_n02_psu1_on,fence_n02_psu2_on
crm configure property stonith-enabled="true"
====
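
Since the commands are very repetitive, they could also be generated
rather than typed by hand. The following is a rough Python sketch of that
idea, assuming the naming scheme used in this post; the function name
fence_commands and its "tag" argument are hypothetical helpers of mine,
and the login/passwd values are just the placeholders shown here. It only
builds the command strings; it does not run anything against the cluster.

```python
# Sketch only: builds crm shell command strings; it does not run them.
# The login/passwd values are the placeholders used in this post.

def fence_commands(node, tag, ipmi_addr, port, pdus=("an-p01", "an-p02")):
    """Return (commands, topology_entry) for one node."""
    ipmi = f"fence_{tag}_ipmi"
    cmds = [
        f"crm configure primitive {ipmi} stonith:fence_ipmilan params "
        f'ipaddr="{ipmi_addr}" pcmk_reboot_action="reboot" login="admin" '
        f'passwd="secret" pcmk_host_list="{node}"'
    ]
    pdu_devices = []
    # All "off" devices are listed before any "on" device, so both PSUs
    # lose power before either outlet is switched back on.
    for action in ("off", "on"):
        for index, pdu in enumerate(pdus, start=1):
            name = f"fence_{tag}_psu{index}_{action}"
            pdu_devices.append(name)
            cmds.append(
                f"crm configure primitive {name} stonith:fence_apc_snmp params "
                f'ipaddr="{pdu}" pcmk_reboot_action="{action}" port="{port}" '
                f'pcmk_host_list="{node}"'
            )
    # One -inf location constraint per device, against the node it fences.
    for name in [ipmi] + pdu_devices:
        cmds.append(f"crm configure location loc_{name} {name} -inf: {node}")
    topology_entry = f"{node}: {ipmi} {','.join(pdu_devices)}"
    return cmds, topology_entry


all_cmds, entries = [], []
for node, tag, ipmi_addr, port in [
    ("an-c03n01.alteeve.ca", "n01", "an-c03n01.ipmi", 1),
    ("an-c03n02.alteeve.ca", "n02", "an-c03n02.ipmi", 2),
]:
    cmds, entry = fence_commands(node, tag, ipmi_addr, port)
    all_cmds.extend(cmds)
    entries.append(entry)
all_cmds.append("crm configure fencing_topology " + " ".join(entries))
all_cmds.append('crm configure property stonith-enabled="true"')
print("\n".join(all_cmds))
```

The output should be reviewed before being pasted into a shell, as the
order of the generated commands differs slightly from the list above.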

And the raw cib.xml is:

====
<cib epoch="38" num_updates="0" admin_epoch="0"
validate-with="pacemaker-1.2" cib-last-written="Thu Jun 27 12:20:19
2013" update-origin="an-c03n01.alteeve.ca" update-client="cibadmin"
crm_feature_set="3.0.7" have-quorum="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.10-3.1733.a903e62.git.el7-a903e62"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy"
name="no-quorum-policy" value="ignore"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="true"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="an-c03n01.alteeve.ca"/>
      <node id="2" uname="an-c03n02.alteeve.ca"/>
    </nodes>
    <resources>
      <primitive class="stonith" id="fence_n01_ipmi" type="fence_ipmilan">
        <instance_attributes id="fence_n01_ipmi-instance_attributes">
          <nvpair id="fence_n01_ipmi-instance_attributes-ipaddr"
name="ipaddr" value="an-c03n01.ipmi"/>
          <nvpair
id="fence_n01_ipmi-instance_attributes-pcmk_reboot_action"
name="pcmk_reboot_action" value="reboot"/>
          <nvpair id="fence_n01_ipmi-instance_attributes-login"
name="login" value="admin"/>
          <nvpair id="fence_n01_ipmi-instance_attributes-passwd"
name="passwd" value="secret"/>
          <nvpair id="fence_n01_ipmi-instance_attributes-pcmk_host_list"
name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n02_ipmi" type="fence_ipmilan">
        <instance_attributes id="fence_n02_ipmi-instance_attributes">
          <nvpair id="fence_n02_ipmi-instance_attributes-ipaddr"
name="ipaddr" value="an-c03n02.ipmi"/>
          <nvpair
id="fence_n02_ipmi-instance_attributes-pcmk_reboot_action"
name="pcmk_reboot_action" value="reboot"/>
          <nvpair id="fence_n02_ipmi-instance_attributes-login"
name="login" value="admin"/>
          <nvpair id="fence_n02_ipmi-instance_attributes-passwd"
name="passwd" value="secret"/>
          <nvpair id="fence_n02_ipmi-instance_attributes-pcmk_host_list"
name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
        </instance_attributes>
        <meta_attributes id="fence_n02_ipmi-meta_attributes">
          <nvpair id="fence_n02_ipmi-meta_attributes-target-role"
name="target-role" value="Started"/>
        </meta_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n01_psu1_off"
type="fence_apc_snmp">
        <instance_attributes id="fence_n01_psu1_off-instance_attributes">
          <nvpair id="fence_n01_psu1_off-instance_attributes-ipaddr"
name="ipaddr" value="an-p01"/>
          <nvpair
id="fence_n01_psu1_off-instance_attributes-pcmk_reboot_action"
name="pcmk_reboot_action" value="off"/>
          <nvpair id="fence_n01_psu1_off-instance_attributes-port"
name="port" value="1"/>
          <nvpair
id="fence_n01_psu1_off-instance_attributes-pcmk_host_list"
name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n01_psu1_on"
type="fence_apc_snmp">
        <instance_attributes id="fence_n01_psu1_on-instance_attributes">
          <nvpair id="fence_n01_psu1_on-instance_attributes-ipaddr"
name="ipaddr" value="an-p01"/>
          <nvpair
id="fence_n01_psu1_on-instance_attributes-pcmk_reboot_action"
name="pcmk_reboot_action" value="on"/>
          <nvpair id="fence_n01_psu1_on-instance_attributes-port"
name="port" value="1"/>
          <nvpair
id="fence_n01_psu1_on-instance_attributes-pcmk_host_list"
name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n01_psu2_off"
type="fence_apc_snmp">
        <instance_attributes id="fence_n01_psu2_off-instance_attributes">
          <nvpair id="fence_n01_psu2_off-instance_attributes-ipaddr"
name="ipaddr" value="an-p02"/>
          <nvpair
id="fence_n01_psu2_off-instance_attributes-pcmk_reboot_action"
name="pcmk_reboot_action" value="off"/>
          <nvpair id="fence_n01_psu2_off-instance_attributes-port"
name="port" value="1"/>
          <nvpair
id="fence_n01_psu2_off-instance_attributes-pcmk_host_list"
name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n01_psu2_on"
type="fence_apc_snmp">
        <instance_attributes id="fence_n01_psu2_on-instance_attributes">
          <nvpair id="fence_n01_psu2_on-instance_attributes-ipaddr"
name="ipaddr" value="an-p02"/>
          <nvpair
id="fence_n01_psu2_on-instance_attributes-pcmk_reboot_action"
name="pcmk_reboot_action" value="on"/>
          <nvpair id="fence_n01_psu2_on-instance_attributes-port"
name="port" value="1"/>
          <nvpair
id="fence_n01_psu2_on-instance_attributes-pcmk_host_list"
name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n02_psu1_off"
type="fence_apc_snmp">
        <instance_attributes id="fence_n02_psu1_off-instance_attributes">
          <nvpair id="fence_n02_psu1_off-instance_attributes-ipaddr"
name="ipaddr" value="an-p01"/>
          <nvpair
id="fence_n02_psu1_off-instance_attributes-pcmk_reboot_action"
name="pcmk_reboot_action" value="off"/>
          <nvpair id="fence_n02_psu1_off-instance_attributes-port"
name="port" value="2"/>
          <nvpair
id="fence_n02_psu1_off-instance_attributes-pcmk_host_list"
name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n02_psu1_on"
type="fence_apc_snmp">
        <instance_attributes id="fence_n02_psu1_on-instance_attributes">
          <nvpair id="fence_n02_psu1_on-instance_attributes-ipaddr"
name="ipaddr" value="an-p01"/>
          <nvpair
id="fence_n02_psu1_on-instance_attributes-pcmk_reboot_action"
name="pcmk_reboot_action" value="on"/>
          <nvpair id="fence_n02_psu1_on-instance_attributes-port"
name="port" value="2"/>
          <nvpair
id="fence_n02_psu1_on-instance_attributes-pcmk_host_list"
name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n02_psu2_off"
type="fence_apc_snmp">
        <instance_attributes id="fence_n02_psu2_off-instance_attributes">
          <nvpair id="fence_n02_psu2_off-instance_attributes-ipaddr"
name="ipaddr" value="an-p02"/>
          <nvpair
id="fence_n02_psu2_off-instance_attributes-pcmk_reboot_action"
name="pcmk_reboot_action" value="off"/>
          <nvpair id="fence_n02_psu2_off-instance_attributes-port"
name="port" value="2"/>
          <nvpair
id="fence_n02_psu2_off-instance_attributes-pcmk_host_list"
name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n02_psu2_on"
type="fence_apc_snmp">
        <instance_attributes id="fence_n02_psu2_on-instance_attributes">
          <nvpair id="fence_n02_psu2_on-instance_attributes-ipaddr"
name="ipaddr" value="an-p02"/>
          <nvpair
id="fence_n02_psu2_on-instance_attributes-pcmk_reboot_action"
name="pcmk_reboot_action" value="on"/>
          <nvpair id="fence_n02_psu2_on-instance_attributes-port"
name="port" value="2"/>
          <nvpair
id="fence_n02_psu2_on-instance_attributes-pcmk_host_list"
name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
        </instance_attributes>
      </primitive>
    </resources>
    <constraints>
      <rsc_location id="loc_fence_n01_ipmi" node="an-c03n01.alteeve.ca"
rsc="fence_n01_ipmi" score="-INFINITY"/>
      <rsc_location id="loc_fence_n02_ipmi" node="an-c03n02.alteeve.ca"
rsc="fence_n02_ipmi" score="-INFINITY"/>
      <rsc_location id="loc_fence_n01_psu1_off"
node="an-c03n01.alteeve.ca" rsc="fence_n01_psu1_off" score="-INFINITY"/>
      <rsc_location id="loc_fence_n01_psu1_on"
node="an-c03n01.alteeve.ca" rsc="fence_n01_psu1_on" score="-INFINITY"/>
      <rsc_location id="loc_fence_n01_psu2_off"
node="an-c03n01.alteeve.ca" rsc="fence_n01_psu2_off" score="-INFINITY"/>
      <rsc_location id="loc_fence_n01_psu2_on"
node="an-c03n01.alteeve.ca" rsc="fence_n01_psu2_on" score="-INFINITY"/>
      <rsc_location id="loc_fence_n02_psu1_off"
node="an-c03n02.alteeve.ca" rsc="fence_n02_psu1_off" score="-INFINITY"/>
      <rsc_location id="loc_fence_n02_psu1_on"
node="an-c03n02.alteeve.ca" rsc="fence_n02_psu1_on" score="-INFINITY"/>
      <rsc_location id="loc_fence_n02_psu2_off"
node="an-c03n02.alteeve.ca" rsc="fence_n02_psu2_off" score="-INFINITY"/>
      <rsc_location id="loc_fence_n02_psu2_on"
node="an-c03n02.alteeve.ca" rsc="fence_n02_psu2_on" score="-INFINITY"/>
    </constraints>
    <fencing-topology>
      <fencing-level devices="fence_n01_ipmi" id="fencing" index="1"
target="an-c03n01.alteeve.ca"/>
      <fencing-level
devices="fence_n01_psu1_off,fence_n01_psu2_off,fence_n01_psu1_on,fence_n01_psu2_on"
id="fencing-3" index="2" target="an-c03n01.alteeve.ca"/>
      <fencing-level devices="fence_n02_ipmi" id="fencing-1" index="1"
target="an-c03n02.alteeve.ca"/>
      <fencing-level
devices="fence_n02_psu1_off,fence_n02_psu2_off,fence_n02_psu1_on,fence_n02_psu2_on"
id="fencing-4" index="2" target="an-c03n02.alteeve.ca"/>
    </fencing-topology>
  </configuration>
</cib>
====

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



