[Pacemaker] Fencing configuration with pcmk_host_map argument

Fri Feb 1 11:08:07 UTC 2013

Hi,

I am running some tests in order to implement fencing with two methods 
(IPMI and a WTI power switch). The IPMI configuration was 
straightforward, but I am stuck on the WTI one.

I have an installation with two nodes on CentOS 6.3 running pacemaker 
1.1.7 + corosync 1.4.1. Both servers support IPMI and are plugged into 
a WTI power switch. This is the physical configuration:
{node1,psu1} => WTI_Port1
{node1,psu2} => WTI_Port5
{node2,psu1} => WTI_Port2
{node2,psu2} => WTI_Port6
{WTI,port1..4} => Electrical circuit A
{WTI,port5..8} => Electrical circuit B
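
The wiring above can be turned into one port list per node, which is what 
the pcmk_host_map values in the configuration further down encode. A 
minimal sketch, assuming node1 is fence01.domain and node2 is 
fence02.domain; the "wiring" variable and the host_map_for function are 
made-up names for illustration only:

```shell
#!/bin/sh
# Wiring table: one "node port" pair per line (assumed node naming).
wiring='fence01.domain 1
fence01.domain 5
fence02.domain 2
fence02.domain 6'

# Hypothetical helper: print "node:port,port" for the node given as $1.
host_map_for() {
    printf '%s\n' "$wiring" | awk -v node="$1" '
        $1 == node { ports = ports (ports == "" ? "" : ",") $2 }
        END { if (ports != "") print node ":" ports }'
}

host_map_for fence01.domain
host_map_for fence02.domain
```

This only builds the strings; it says nothing about whether the fence 
agent accepts a comma-separated port list.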

I cleared the IPMI configuration and kept only the two WTI fencing 
primitives in my configuration to make it as simple as possible:

primitive wti_fence01 stonith:fence_wti \
         params ipaddr="192.168.0.7" action="reboot" verbose="true" \
         pcmk_host_check="static-list" pcmk_host_list="fence01.domain" \
         pcmk_host_map="fence01.domain:1,5" \
         login_timeout="20" shell_timeout="20" \
         op monitor interval="30s"
primitive wti_fence02 stonith:fence_wti \
         params ipaddr="192.168.0.7" action="reboot" verbose="true" \
         pcmk_host_check="static-list" pcmk_host_list="fence02.domain" \
         pcmk_host_map="fence02.domain:2,6" \
         login_timeout="20" shell_timeout="20" \
         op monitor interval="30s"

location wti_fence01-on-fence02 wti_fence01 \
         rule $id="wti_fence01-on-fence02-rule" -inf: #uname eq fence01.domain
location wti_fence02-on-fence01 wti_fence02 \
         rule $id="wti_fence02-on-fence01-rule" -inf: #uname eq fence02.domain
location bind-on-fence02 bind 100: fence01.domain

With this configuration, /var/log/cluster/corosync.log records the whole 
telnet session with the PDU, and in the middle of it I can read this error:
Feb 01 12:49:36 [4492] fence02.domain stonith-ng:    error: 
log_operation:    wti_fence01: IPS>Failed: Unable to obtain correct plug 
status or plug is not available

I believe my problem comes from listing two ports in pcmk_host_map, as 
in pcmk_host_map="fence02.domain:2,6". If I change the values to 
pcmk_host_map="fence01.domain:1" and pcmk_host_map="fence02.domain:2", 
the errors disappear from the logs. Furthermore, with only one port 
configured per node, when I trigger fencing I can see in the logs that 
it works:

Feb 01 12:47:12 [4492] fence02.domain stonith-ng:     info: 
initiate_remote_stonith_op:       Initiating remote operation reboot for 
fence01.lyra-network.com: 13eb69d2-6e94-4563-a6f8-60d849ab5926
[...]
Feb 01 12:47:14 [4492] fence02.domain stonith-ng:     info: 
log_operation:    wti_fence01: Plug | Name             | Password | 
Status | Boot/Seq. Delay | Default |
Feb 01 12:47:14 [4492] fence02.domain stonith-ng:     info: 
log_operation:    wti_fence01:  1   | f01-wti          | (undefined) |   
ON   |     1  Sec      |   ON    |
[...]
Feb 01 12:47:14 [4492] fence02.domain stonith-ng:     info: 
log_operation:    wti_fence01:  1   | f01-wti          | (undefined) |   
OFF  |     1  Sec      |   ON    |
[...]
Feb 01 12:47:14 [4492] fence02.domain stonith-ng:     info: 
log_operation:    wti_fence01:  1   | f01-wti          | (undefined) |   
ON   |     1  Sec      |   ON    |

The PDU itself works fine, as I can reboot manually without trouble:
- In CLI mode, with "/boot 1+5" or "/boot 1 5" in order to reboot the 
first node.
- In "remote" mode, with the fence agent fence_wti and this command (no 
password configured and no confirmation required):
for port in 1 5; do  fence_wti -o reboot -a 192.168.0.7 -n $port -v; done
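
A caveat about the loop above, stated as an assumption rather than 
something verified here: because each node has two PSUs, rebooting plugs 
1 and 5 one after the other may never leave the node completely without 
power. A minimal sketch of a sequence that switches both plugs off before 
switching either back on follows. FENCE_CMD, PDU and power_cycle_ports 
are hypothetical names, and by default the script only prints the 
fence_wti commands instead of running them:

```shell
#!/bin/sh
# Dry run by default: commands are echoed, not executed.
# Set FENCE_CMD=fence_wti to really drive the PDU.
FENCE_CMD=${FENCE_CMD:-echo fence_wti}
PDU=192.168.0.7

# Hypothetical helper: $@ = plug numbers feeding one node, e.g. "1 5".
power_cycle_ports() {
    # Cut every feed first, so the node really loses power...
    for port in "$@"; do
        $FENCE_CMD -o off -a "$PDU" -n "$port" || return 1
    done
    # ...then restore them.
    for port in "$@"; do
        $FENCE_CMD -o on -a "$PDU" -n "$port" || return 1
    done
}

power_cycle_ports 1 5
```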

I've reached a dead end here, and I have lost a lot of time trying to 
figure it out. Am I missing something obvious, am I a newbie who can't 
make proper use of stonithd, or is this somehow a bug or an incompatibility?

Any help on this will be greatly appreciated!

Thibaut.