[Pacemaker] Pacemaker fails to switch on or off PDU sockets with fence_wti

Thibaut Pouzet thibaut.pouzet at lyra-network.com
Fri Jun 21 03:38:13 EDT 2013


On 20/06/2013 12:23, Andrew Beekhof wrote:
> On 20/06/2013, at 6:51 PM, Thibaut Pouzet <thibaut.pouzet at lyra-network.com> wrote:
>
>> On 19/06/2013 23:57, Andrew Beekhof wrote:
>>> On 20/06/2013, at 1:57 AM, Thibaut Pouzet <thibaut.pouzet at lyra-network.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to configure fencing on a test platform with two nodes under corosync+cman+pacemaker on CentOS 6.4. Both nodes have a double power supply from a WTI NPS-8HD16-3. IPMI fencing works like a charm, however I cannot get the WTI fencing to work.
>>>>
>>>> The problem is that the parameter  action="" seems to be ignored by pacemaker.
>>>> * This is the primitive :
>>>> primitive wti_fence02_port2_off stonith:fence_wti \
>>>>         params ipaddr="" action="off" pcmk_host_check="none" port="A2" pcmk_host_check="static-list" pcmk_host_list="fence02.lyra-network.com" login="" passwd="" shell_timeout="20" login_timeout="20"
>>>>
>>>> * These are the corresponding log lines :
>>>> Jun 19 16:56:45 fence01 stonith-ng[19266]:   notice: log_operation: Operation 'reboot' [19953] (call 0 from crmd.19268) for host 'fence02.lyra-network.com' with device 'wti_fence02_port2_off' returned: 0 (OK)
>>>> Jun 19 16:56:45 fence01 stonith-ng[19266]:   notice: process_remote_stonith_exec: Call to wti_fence02_port2_off for fence02.lyra-network.com on behalf of crmd.19268 at fence01.lyra-network.com: passed (0)
>>>>
>>>> * These are the version used :
>>>> pacemaker-1.1.8-7.el6.x86_64
>>>> corosync-1.4.1-15.el6.x86_64
>>>> cman-3.0.12.1-49.el6.x86_64
>>>> fence-agents-3.1.5-25.el6_4.2.x86_64
>>>>
>>>> The same thing happens with "on" actions.
>>>>
>>>> When I run fence_wti from command line, it works perfectly fine with ON or OFF actions ! I feel there is a workaround with something like pcmk_reboot_action="/ON", but I don't understand how to use this...
>>>>
>>>> (FYI, I'm using fencing topology like this :
>>>> fencing_topology \
>>>>         fence01.lyra-network.com: wti_fence01_port1_off,wti_fence01_port5_off,wti_fence01_port5_on,wti_fence01_port1_on ipmi_fence01 \
>>>>         fence02.lyra-network.com: wti_fence02_port2_off,wti_fence02_port6_off,wti_fence02_port6_on,wti_fence02_port2_on ipmi_fence02 )
>>>>
>>>> What is wrong here ?
>>> I believe you're trying to use the per-agent pcmk_reboot_action option (man stonithd)
>>> But you might be better off with the global stonith-action option (man pengine)
>>>
>> Hum, I think I've not been clear enough on the initial e-mail. The usage of "pcmk_reboot_action" or "stonith-action" is not the root of my problem. The initial problem is that when I configure action="off"
> My point would be that action=off is not the correct way to configure what you're trying to do.
>
>> with a stonith primitive,  when this primitive is called, the actual action that is launched through fence_wti is "reboot".
>>
>> Therefore, when a node needs to be fenced, instead of having on the PDU :
>> Port 2 OFF -> Port 6 OFF -> Port 6 ON -> Port 2 ON
>> I have :
>> Port 2 Reboot -> Port 6 Reboot -> Port 6 Reboot -> Port 2 Reboot
>>
>> All actions are successful, pacemaker changes the fenced node's status from "UNCLEAN" to "OFFLINE", while the node has not been rebooted at all.
>>
>> -- 
>> Thibaut Pouzet
>>
>>
Okay, I looked at these options and replaced action="" in my
primitives with the global property stonith-action="off". I also removed
the now-useless primitives and changed the topology:

fencing_topology \
         fence01.lyra-network.com: wti_fence01_port1_off,wti_fence01_port5_off ipmi_fence01 \
         fence02.lyra-network.com: wti_fence02_port2_off,wti_fence02_port6_off ipmi_fence02

My faulty node is off now: it was shut down through the WTI. The next
step is rebooting the nodes. I'm not sure that can be achieved with
this method, though...

I looked at the code of fence_wti and at how pacemaker calls it, and I
believe a minor patch to the fencing agent would make everything
easier:
* On WTI switches, you can configure named port groups and reboot a
port group (i.e. several PSUs) the same way you reboot a single port.
* Port groups are monitored with the '/SG' command, whereas single
ports are monitored with '/S'. The output differs slightly between the
two.
* When you call fence_wti with a named port group, the script tries to
get the status of the port group before taking any action. Since port
group statuses are not available from the '/S' command, it fails.
However, if fence_wti simply tried '/SG' when '/S' fails, it would get
the group's status and could then do '/OFF port_group_name' (or /ON,
/BOOT ...) the same way it does '/OFF single_port'.
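
To make the idea concrete, here is a rough sketch of the fallback, not
the actual fence_wti code. The helper names (get_status, fake_send,
fake_parse) and the sample device output are invented for illustration;
the real '/S' and '/SG' output formats are different and would need
their own parsing.

```python
from typing import Callable, Optional


def get_status(send_command: Callable[[str], str],
               parse_status: Callable[[str, str], Optional[str]],
               name: str) -> str:
    """Return the status of a plug or plug group.

    Tries the single-port command '/S' first and falls back to the
    port-group command '/SG' when the name is not found there.
    """
    for command in ("/S", "/SG"):
        status = parse_status(send_command(command), name)
        if status is not None:
            return status
    raise LookupError(f"{name!r} not found in '/S' or '/SG' output")


# Made-up device output, for illustration only (NOT the real WTI format).
FAKE_OUTPUT = {
    "/S": "A2 | fence02_psu1 | ON\nA6 | fence02_psu2 | ON",
    "/SG": "fence02_group | OFF",
}


def fake_send(command: str) -> str:
    """Stand-in for the telnet/ssh session to the PDU."""
    return FAKE_OUTPUT[command]


def fake_parse(output: str, name: str) -> Optional[str]:
    """Find `name` in a '|'-separated line and return its last field."""
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if name in fields[:-1]:
            return fields[-1].lower()
    return None


print(get_status(fake_send, fake_parse, "A2"))             # single port, found via '/S'
print(get_status(fake_send, fake_parse, "fence02_group"))  # group, found via '/SG'
```

Once the status lookup succeeds for a group name, the rest of the agent
could presumably issue '/OFF', '/ON' or '/BOOT' with that name
unchanged.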

-- 
Thibaut Pouzet



