[Pacemaker] RHEL 6.3 + fence_vmware_soap + esx 5.1
Mistina Michal
Michal.Mistina at virte.sk
Thu Jul 25 11:38:35 UTC 2013
Hi Andrew.
You are right. I renamed the VMware machines so that their names are in
lowercase, and it worked. I also tested a dash and a bracket [ . With the
unusual characters I mentioned, stonith failed.
However, my VMware machine now gets rebooted over and over, indefinitely. I
found somewhere on the web that this was a bug in Pacemaker 1.1.7 which was
fixed in version 1.1.8. I will try to compile Pacemaker from source to get
the newest version.
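For reference, this is roughly the build procedure I intend to follow. It is
only a sketch assuming the usual autotools workflow for Pacemaker 1.1.x; the
exact repository tag and configure options may differ on my system:
[root at pcmk1 ~]# git clone https://github.com/ClusterLabs/pacemaker.git
[root at pcmk1 ~]# cd pacemaker
[root at pcmk1 pacemaker]# git checkout Pacemaker-1.1.8
[root at pcmk1 pacemaker]# ./autogen.sh
[root at pcmk1 pacemaker]# ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
[root at pcmk1 pacemaker]# make && make install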
Thank you.
Best regards,
Michal Mistina
On 18/07/2013, at 10:46 PM, Mistina Michal <Michal.Mistina at virte.sk> wrote:
> Hi Andrew.
> Thank you for the insight. I tried to set higher timeout limits
> within the fence_vmware_soap properties in the CIB database. After I had
> altered these numbers I didn't experience SIGTERM or SIGKILL any more.
> However, automatic fencing was still not successful.
> I don't understand why "manual fencing" using the fence_vmware_soap
> command works while automatic fencing with the same parameters doesn't.
Because it's not using the same parameters.
Until 1.1.10-rc6, Pacemaker used a calculated value for port and action -
regardless of what you specified.
Look in "man stonithd" or the online docs for details on pcmk_host_map.
You'd probably want "pcmk1:PCMK1;pcmk2:PCMK2;"
Or just name the hosts in lowercase in VMware.
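For example, something like this for the vm-fence-pcmk1 primitive from your
configuration (only a sketch; keep the rest of your existing params as they
are):

primitive vm-fence-pcmk1 stonith:fence_vmware_soap \
        params ipaddr="x.x.x.x" login="administrator" passwd="password" \
               ssl="1" pcmk_host_map="pcmk1:PCMK1" \
        op start interval="0" timeout="120"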
>
> The corosync.log excerpt attached further in the text shows some parsing
> errors. I think this relates to the unusual characters used in the names of
> the virtual machines which run on the ESX host. That would make sense if an
> unusual character were used in the name of the fenced VMware machine, but
> it isn't. The corosync.log shows the names of other virtual machines on the
> ESX host.
>
> Is it safe to say the issue occurred within the fence_vmware_soap resource
> agent because it cannot handle something, perhaps the names of the virtual
> machines? If so, I will try to update that agent. I am using
> fence-agents-3.1.5-17.el6.x86_64.
> Is there a chance that changing the timeout limits will help the
> situation? I have a feeling the timeouts don't solve anything; it times
> out because of something else.
>
> This is how the crm configuration looks now:
> [root at pcmk1 ~]# crm configure show
> node pcmk1
> node pcmk2
> primitive drbd_pg ocf:linbit:drbd \
> params drbd_resource="postgres" \
> op monitor interval="15" role="Master" \
> op monitor interval="16" role="Slave" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="120"
> primitive pg_fs ocf:heartbeat:Filesystem \
> params device="/dev/vg_local-lv_pgsql/lv_pgsql"
> directory="/var/lib/pgsql/9.2/data" options="noatime,nodiratime"
> fstype="xfs" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="120"
> primitive pg_lsb lsb:postgresql-9.2 \
> op monitor interval="30" timeout="60" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="60"
> primitive pg_lvm ocf:heartbeat:LVM \
> params volgrpname="vg_local-lv_pgsql" \
> op start interval="0" timeout="30" \
> op stop interval="0" timeout="30"
> primitive pg_vip ocf:heartbeat:IPaddr2 \
> params ip="x.x.x.x" iflabel="tstcapsvip" \
> op monitor interval="5"
> primitive vm-fence-pcmk1 stonith:fence_vmware_soap \
> params ipaddr="x.x.x.x" login="administrator" passwd="password"
> port="PCMK1" ssl="1" retry_on="10" shell_timeout="120" login_timeout="120"
> action="reboot" \
> op start interval="0" timeout="120"
> primitive vm-fence-pcmk2 stonith:fence_vmware_soap \
> params ipaddr="x.x.x.x" login="administrator" passwd="password"
> port="PCMK2" ssl="1" retry_on="10" shell_timeout="120" login_timeout="120"
> action="reboot" \
> op start interval="0" timeout="120"
> group PGServer pg_lvm pg_fs pg_lsb pg_vip \
> meta target-role="Started"
> ms ms_drbd_pg drbd_pg \
> meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true"
> location l-st-pcmk1 vm-fence-pcmk1 -inf: pcmk1
> location l-st-pcmk2 vm-fence-pcmk2 -inf: pcmk2
> location master-prefer-node1 pg_vip 50: pcmk1
> colocation col_pg_drbd inf: PGServer ms_drbd_pg:Master
> order ord_pg inf: ms_drbd_pg:promote PGServer:start
> property $id="cib-bootstrap-options" \
> dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="4" \
> stonith-enabled="true" \
> no-quorum-policy="ignore" \
> maintenance-mode="false"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
>
> Command crm_verify -LV shows nothing.
> [root at pcmk1 ~]# crm_verify -LV
>
>
> [root at pcmk1 ~]# crm_mon -1
> ============
> Last updated: Thu Jul 18 14:23:15 2013 Last change: Thu Jul 18
> 14:20:54 2013 via crm_resource on pcmk1
> Stack: openais
> Current DC: pcmk2 - partition WITHOUT quorum
> Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
> 2 Nodes configured, 4 expected votes
> 8 Resources configured.
> ============
>
> Online: [ pcmk1 pcmk2 ]
>
> Resource Group: PGServer
> pg_lvm (ocf::heartbeat:LVM): Started pcmk1
> pg_fs (ocf::heartbeat:Filesystem): Started pcmk1
> pg_lsb (lsb:postgresql-9.2): Started pcmk1
> pg_vip (ocf::heartbeat:IPaddr2): Started pcmk1
> Master/Slave Set: ms_drbd_pg [drbd_pg]
> Masters: [ pcmk1 ]
> Slaves: [ pcmk2 ]
> vm-fence-pcmk1 (stonith:fence_vmware_soap): Started pcmk2
> vm-fence-pcmk2 (stonith:fence_vmware_soap): Started pcmk1
>
> If I simulate a split-brain by unplugging the cable from the secondary
> server pcmk2, /var/log/cluster/corosync.log on the primary server
> pcmk1 shows this...
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: info:
> can_fence_host_with_device: Refreshing port list for vm-fence-pcmk2
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 21): [106.15],4222ac70-92c3-bddf-b524-24d848080cb2
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 21): [107.25],42224003-b614-5eb2-f141-5437fc8319d8
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 21): [107.29],4222719f-7bdc-84b2-4494-848a29c2bd5f
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (0 1): [ MEDI - WinXP with SP3 - MSDN
> ],4222238c-c927-3af1-f2e7-e0dd374d373b
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (31 32): ],4222238c-c927-3af1-f2e7-e0dd374d373b
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (0 1): [ MEDI WIN7 32-bit -
> MSDN],42223e4a-9541-2326-2a21-3b3532756b47
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 22):
> [105.233],42220acd-6e21-4380-9b81-89d86f14317d
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (9 17): [106.21],42223377-1443-a44c-1dc0-815c2542898e
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (12 20): [106.29],4222394a-70f1-4612-6fcd-4525e13b0cc4
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (0 1): [ MEDI W2K8 R2 SP1 STD - MSDN
> ],4222dc65-6752-b1b4-c0f7-38c94cd5609a
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (30 31): ],4222dc65-6752-b1b4-c0f7-38c94cd5609a
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (12 20): [106.52],4222aa80-0fe6-66c4-8d11-fea5f547b566
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 21): [106.14],422249fc-a902-ba5c-deb0-e6db6198b984
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (18 25): [106.2],4222851c-1a9d-021a-4e16-9f8adc5bcc42
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (12 20): [106.28],422235ab-83c4-c0b7-812b-bc5b7019aff7
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 21): [106.26],4222bbff-48eb-d60c-0347-430b8d72baa2
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 21): [107.27],4222da62-3c55-37f8-f6b8-239657892914
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (0 1): [ MEDI WIN7 64-bit - MSDN
> ],4222289e-0bd2-4280-c0f4-548fd42e7eab
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (26 27): ],4222289e-0bd2-4280-c0f4-548fd42e7eab
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (17 26):
> [105.242],42228b51-4ef6-f9b8-b64a-882d68023074
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (20 29):
> [105.230],42223dcd-22c1-a0f7-c629-5c4489e2c55d
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (0 1): [ W2K3 R2 ENT 32-bit ENG
> ],4233c1c8-e0f9-26f3-b854-6376ec6b1d1c
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (25 26): ],4233c1c8-e0f9-26f3-b854-6376ec6b1d1c
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (9 17): [106.20],422285ba-6a31-0832-1b38-a910031cd057
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 21): [106.27],4222d166-5647-79a3-d9d8-f90650b6188b
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (21 30):
> [105.231],4222308c-41c7-02e9-3b20-c6df71838db9
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (25 28): !!!
> [105.235],422283ac-c5d9-4bf1-96eb-a57d8d18c118
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (29 38):
> [105.235],422283ac-c5d9-4bf1-96eb-a57d8d18c118
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (12 20): [106.13],42222137-0d67-ac9b-e3b6-11fb6d2c33e0
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (17 26):
> [105.241],4222a40f-d91a-0e4f-2292-ef92c4836bb5
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (17 26):
> [105.243],42222a9a-7440-6d19-b654-42c08a2abd69
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (0 1): [ MEDI W2K8 R2 SP1 ENT - MSDN
> ],42227507-c4fd-c5aa-b7d7-4ececd284f84
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (30 31): ],42227507-c4fd-c5aa-b7d7-4ececd284f84
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (0 1): [ MEDI_gw_chckpnt
> ],4222f42e-58c6-dc59-2a00-10041ad5ac08
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (18 19): ],4222f42e-58c6-dc59-2a00-10041ad5ac08
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 22):
> [105.234],422295e3-644e-8b51-a373-e7f166b2fd5d
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 22):
> [105.232],42228f9d-615f-1c3b-2158-d3ad08d40357
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (17 26):
> [105.240],4222b273-68e7-379d-b874-6a47211e9449
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 21): [107.28],4222cbc8-565d-eee1-4430-555b059663d0
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 22):
> [105.236],4222115e-789a-66dd-95e9-786ec0d84ec0
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (13 21): [107.26],4222fb16-fadc-9031-8e3d-110225505a0f
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (12 20): [106.12],42226bf9-8e78-9356-773c-ecde31cf2fa2
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: warning: parse_host_line:
> Could not parse (12 20): [106.51],4222ae99-f1d9-9811-d72b-10e875c58f56
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: info:
> can_fence_host_with_device: vm-fence-pcmk2 can not fence pcmk2:
> dynamic-list
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: info: stonith_command:
> Processed st_query from pcmk1: rc=0
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: error: remote_op_done:
> Operation reboot of pcmk2 by <no-one> for
> pcmk1[7496e5e6-4ab4-4028-b44d-c34c52a3fd04]: Operation timed out
> Jul 18 14:31:00 [1498] pcmk1 crmd: info:
tengine_stonith_callback:
> StonithOp <remote-op state="0" st_target="pcmk2" st_op="reboot" />
> Jul 18 14:31:00 [1498] pcmk1 crmd: notice:
tengine_stonith_callback:
> Stonith operation 4 for pcmk2 failed (Operation timed out): aborting
> transition.
> Jul 18 14:31:00 [1498] pcmk1 crmd: info: abort_transition_graph:
> tengine_stonith_callback:454 - Triggered transition abort (complete=0) :
> Stonith failed
> Jul 18 14:31:00 [1498] pcmk1 crmd: notice: tengine_stonith_notify:
> Peer pcmk2 was not terminated (reboot) by <anyone> for pcmk1:
> Operation timed out (ref=ca100580-8e00-49d4-b895-c538139a28dd)
> Jul 18 14:31:00 [1498] pcmk1 crmd: notice: run_graph: ====
> Transition 2 (Complete=7, Pending=0, Fired=0, Skipped=4, Incomplete=5,
> Source=/var/lib/pengine/pe-warn-34.bz2): Stopped
> Jul 18 14:31:00 [1498] pcmk1 crmd: notice: do_state_transition:
> State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [
> input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jul 18 14:31:00 [1497] pcmk1 pengine: notice: unpack_config: On
loss
> of CCM Quorum: Ignore
> Jul 18 14:31:00 [1497] pcmk1 pengine: warning: pe_fence_node: Node
> pcmk2 will be fenced because it is un-expectedly down
> Jul 18 14:31:00 [1497] pcmk1 pengine: warning:
determine_online_status:
> Node pcmk2 is unclean
> Jul 18 14:31:00 [1497] pcmk1 pengine: warning: custom_action: Action
> drbd_pg:1_stop_0 on pcmk2 is unrunnable (offline)
> Jul 18 14:31:00 [1497] pcmk1 pengine: warning: custom_action:
Marking
> node pcmk2 unclean
> Jul 18 14:31:00 [1497] pcmk1 pengine: warning: custom_action: Action
> drbd_pg:1_stop_0 on pcmk2 is unrunnable (offline)
> Jul 18 14:31:00 [1497] pcmk1 pengine: warning: custom_action:
Marking
> node pcmk2 unclean
> Jul 18 14:31:00 [1497] pcmk1 pengine: warning: custom_action: Action
> vm-fence-pcmk1_stop_0 on pcmk2 is unrunnable (offline)
> Jul 18 14:31:00 [1497] pcmk1 pengine: warning: custom_action:
Marking
> node pcmk2 unclean
> Jul 18 14:31:00 [1497] pcmk1 pengine: warning: stage6: Scheduling
Node
> pcmk2 for STONITH
> Jul 18 14:31:00 [1497] pcmk1 pengine: notice: LogActions: Stop
> drbd_pg:1 (pcmk2)
> Jul 18 14:31:00 [1497] pcmk1 pengine: notice: LogActions: Stop
> vm-fence-pcmk1 (pcmk2)
> Jul 18 14:31:00 [1498] pcmk1 crmd: notice: do_state_transition:
> State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Jul 18 14:31:00 [1498] pcmk1 crmd: info: do_te_invoke:
> Processing graph 3 (ref=pe_calc-dc-1374150660-46) derived from
> /var/lib/pengine/pe-warn-35.bz2
> Jul 18 14:31:00 [1498] pcmk1 crmd: info: te_rsc_command:
> Initiating action 63: notify drbd_pg:0_pre_notify_stop_0 on pcmk1
> (local) Jul 18 14:31:00 pcmk1 lrmd: [1495]: info: rsc:drbd_pg:0:28: notify
> Jul 18 14:31:00 [1498] pcmk1 crmd: notice: te_fence_node:
> Executing reboot fencing operation (53) on pcmk2 (timeout=60000)
> Jul 18 14:31:00 [1494] pcmk1 stonith-ng: info:
> initiate_remote_stonith_op: Initiating remote operation reboot for
> pcmk2: d69db4e3-7d3b-4bee-9bd5-aa7afb05c358
> Jul 18 14:31:00 [1497] pcmk1 pengine: warning: process_pe_message:
> Transition 3: WARNINGs found during PE processing. PEngine Input stored
in:
> /var/lib/pengine/pe-warn-35.bz2
> Jul 18 14:31:00 [1497] pcmk1 pengine: notice: process_pe_message:
> Configuration WARNINGs found during PE processing. Please run
> "crm_verify -L" to identify issues.
> Jul 18 14:31:01 [1498] pcmk1 crmd: info: process_lrm_event:
> LRM operation drbd_pg:0_notify_0 (call=28, rc=0, cib-update=0,
> confirmed=true) ok
>
>
> Regards,
> Michal Mistina
> -----Original Message-----
> From: Andrew Beekhof [mailto:andrew at beekhof.net]
> Sent: Tuesday, July 16, 2013 5:23 AM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] RHEL 6.3 + fence_vmware_soap + esx 5.1
>
>
> On 15/07/2013, at 8:56 PM, Mistina Michal <Michal.Mistina at virte.sk> wrote:
>
>> Hi Andrew.
>>
>> Here is the previously omitted /var/log/messages output with the
>> stonith-ng sections.
>>
>> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: notice: stonith_device_action:
>> Device vm-fence-pcmk2 not found
>> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: info: stonith_command:
> Processed
>> st_execute from lrmd: rc=-12
>> Jul 15 09:53:38 PCMK1 crmd[1542]: info: process_lrm_event: LRM
> operation
>> vm-fence-pcmk2_monitor_0 (call=11, rc=7, cib-update=21,
>> confirmed=true) not running Jul 15 09:53:38 PCMK1 lrmd: [1539]: info:
>> rsc:vm-fence-pcmk2:12: start
>> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: info:
stonith_device_register:
>> Added 'vm-fence-pcmk2' to the device list (1 active devices)
>> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: info: stonith_command:
> Processed
>> st_device_register from lrmd: rc=0
>> Jul 15 09:53:38 PCMK1 stonith-ng[1538]: info: stonith_command:
> Processed
>> st_execute from lrmd: rc=-1
>> Jul 15 09:54:13 PCMK1 lrmd: [1539]: WARN: vm-fence-pcmk2:start
>> process (PID
>> 3332) timed out (try 1). Killing with signal SIGTERM (15).
>
> you took too long, go away
>
>> Jul 15 09:54:18 PCMK1 lrmd: [1539]: WARN: vm-fence-pcmk2:start
>> process (PID
>> 3332) timed out (try 2). Killing with signal SIGKILL (9).
>
> seriously go away
>
>> Jul 15 09:54:18 PCMK1 lrmd: [1539]: WARN: operation start[12] on
>> stonith::fence_vmware_soap::vm-fence-pcmk2 for client 1542, its
> parameters:
>> passwd=[password] shell_timeout=[20] ssl=[1] login=[administrator]
>> action=[reboot] crm_feature_set=[3.0.6] retry_on=[10]
>> ipaddr=[x.x.x.x] port=[T1-PCMK2] login_timeout=[15]
>> CRM_meta_timeout=[20000] : pid [3332] timed out
>
> whatever that agent is doing, it's taking too long or you've not given
> it long enough
>
>> Jul 15 09:54:18 PCMK1 crmd[1542]: error: process_lrm_event: LRM
> operation
>> vm-fence-pcmk2_start_0 (12) Timed Out (timeout=20000ms)
>> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_ais_dispatch: Update
>> relayed from pcmk2
>> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_trigger_update:
Sending
>> flush op to all hosts for: fail-count-vm-fence-pcmk2 (INFINITY)
>> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_perform_update: Sent
>> update 24: fail-count-vm-fence-pcmk2=INFINITY
>> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_ais_dispatch: Update
>> relayed from pcmk2
>> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_trigger_update:
Sending
>> flush op to all hosts for: last-failure-vm-fence-pcmk2 (1373874858)
>> Jul 15 09:54:18 PCMK1 attrd[1540]: notice: attrd_perform_update: Sent
>> update 27: last-failure-vm-fence-pcmk2=1373874858
>> Jul 15 09:54:21 PCMK1 lrmd: [1539]: info: rsc:vm-fence-pcmk2:13: stop
>> Jul 15 09:54:21 PCMK1 stonith-ng[1538]: info: stonith_device_remove:
>> Removed 'vm-fence-pcmk2' from the device list (0 active devices)
>> Jul 15 09:54:21 PCMK1 stonith-ng[1538]: info: stonith_command:
> Processed
>> st_device_remove from lrmd: rc=0
>> Jul 15 09:54:21 PCMK1 crmd[1542]: info: process_lrm_event: LRM
> operation
>> vm-fence-pcmk2_stop_0 (call=13, rc=0, cib-update=23, confirmed=true)
>> ok
>>
>> What does this output mean?
>>
>> Best regards,
>> Michal Mistina
>>
>> -----Original Message-----
>> From: Andrew Beekhof [mailto:andrew at beekhof.net]
>> Sent: Monday, July 15, 2013 3:06 AM
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] RHEL 6.3 + fence_vmware_soap + esx 5.1
>>
>>
>> On 13/07/2013, at 10:05 PM, Mistina Michal <Michal.Mistina at virte.sk>
> wrote:
>>
>>> Hi,
>>> Does somebody know how to set up fence_vmware_soap correctly so that
>>> it will fence a VMware machine on ESX 5.1?
>>>
>>> My problem is that the fence_vmware_soap resource agent for stonith
>>> timed out. Don't know why.
>>
>> Nothing in the stonith-ng logs?
>>
>>>
>>> [root at pcmk1 ~]# crm_verify -L -V
>>> warning: unpack_rsc_op: Processing failed op
>> vm-fence-pcmk2_last_failure_0 on pcmk1: unknown exec error (-2)
>>> warning: unpack_rsc_op: Processing failed op
>> vm-fence-pcmk1_last_failure_0 on pcmk2: unknown exec error (-2)
>>> warning: common_apply_stickiness: Forcing vm-fence-pcmk2 away from
>> pcmk1 after 1000000 failures (max=1000000)
>>> warning: common_apply_stickiness: Forcing vm-fence-pcmk1 away from
>> pcmk2 after 1000000 failures (max=1000000)
>>>
>>> I have a 2-node cluster. When I manually rebooted the VMware machine by
>>> calling fence_vmware_soap, it worked.
>>> [root at pcmk1 ~]# fence_vmware_soap -a x.x.x.x -l administrator -p
>>> password -n "pcmk2" -o reboot -z
>>>
>>> My settings are.
>>> [root at pcmk1 ~]# stonith_admin -M -a fence_vmware_soap
>>> <resource-agent name="fence_vmware_soap" shortdesc="Fence agent for
>>> VMWare over SOAP API"> <longdesc>fence_vmware_soap is an I/O
>>> Fencing agent which can be used
>> with the virtual machines managed by VMWare products that have SOAP
>> API v4.1+.
>>> .P
>>> Name of virtual machine (-n / port) has to be used in inventory path
>> format (e.g. /datacenter/vm/Discovered virtual machine/myMachine). In
>> the cases when name of yours VM is unique you can use it instead.
>> Alternatively you can always use UUID (-U / uuid) to access virtual
>> machine.</longdesc>
>>> <vendor-url>http://www.vmware.com</vendor-url>
>>> <parameters>
>>> <parameter name="action" unique="0" required="1">
>>> <getopt mixed="-o, --action=<action>"/>
>>> <content type="string" default="reboot"/>
>>> <shortdesc lang="en">Fencing Action</shortdesc>
>>> </parameter>
>>> <parameter name="ipaddr" unique="0" required="1">
>>> <getopt mixed="-a, --ip=<ip>"/>
>>> <content type="string"/>
>>> <shortdesc lang="en">IP Address or Hostname</shortdesc>
>>> </parameter>
>>> <parameter name="login" unique="0" required="1">
>>> <getopt mixed="-l, --username=<name>"/>
>>> <content type="string"/>
>>> <shortdesc lang="en">Login Name</shortdesc>
>>> </parameter>
>>> <parameter name="passwd" unique="0" required="0">
>>> <getopt mixed="-p, --password=<password>"/>
>>> <content type="string"/>
>>> <shortdesc lang="en">Login password or passphrase</shortdesc>
>>> </parameter>
>>> <parameter name="passwd_script" unique="0" required="0">
>>> <getopt mixed="-S, --password-script=<script>"/>
>>> <content type="string"/>
>>> <shortdesc lang="en">Script to retrieve password</shortdesc>
>>> </parameter>
>>> <parameter name="ssl" unique="0" required="0">
>>> <getopt mixed="-z, --ssl"/>
>>> <content type="boolean"/>
>>> <shortdesc lang="en">SSL connection</shortdesc>
>>> </parameter>
>>> <parameter name="port" unique="0" required="0">
>>> <getopt mixed="-n, --plug=<id>"/>
>>> <content type="string"/>
>>> <shortdesc lang="en">Physical plug number or name of virtual
>> machine</shortdesc>
>>> </parameter>
>>> <parameter name="uuid" unique="0" required="0">
>>> <getopt mixed="-U, --uuid"/>
>>> <content type="string"/>
>>> <shortdesc lang="en">The UUID of the virtual machine to
>> fence.</shortdesc>
>>> </parameter>
>>> <parameter name="ipport" unique="0" required="0">
>>> <getopt mixed="-u, --ipport=<port>"/>
>>> <content type="string"/>
>>> <shortdesc lang="en">TCP port to use for connection with
>> device</shortdesc>
>>> </parameter>
>>> <parameter name="verbose" unique="0" required="0">
>>> <getopt mixed="-v, --verbose"/>
>>> <content type="boolean"/>
>>> <shortdesc lang="en">Verbose mode</shortdesc>
>>> </parameter>
>>> <parameter name="debug" unique="0" required="0">
>>> <getopt mixed="-D, --debug-file=<debugfile>"/>
>>> <content type="string"/>
>>> <shortdesc lang="en">Write debug information to given
>> file</shortdesc>
>>> </parameter>
>>> <parameter name="version" unique="0" required="0">
>>> <getopt mixed="-V, --version"/>
>>> <content type="boolean"/>
>>> <shortdesc lang="en">Display version information and
>> exit</shortdesc>
>>> </parameter>
>>> <parameter name="help" unique="0" required="0">
>>> <getopt mixed="-h, --help"/>
>>> <content type="boolean"/>
>>> <shortdesc lang="en">Display help and exit</shortdesc>
>>> </parameter>
>>> <parameter name="separator" unique="0" required="0">
>>> <getopt mixed="-C, --separator=<char>"/>
>>> <content type="string" default=","/>
>>> <shortdesc lang="en">Separator for CSV created by operation
>> list</shortdesc>
>>> </parameter>
>>> <parameter name="power_timeout" unique="0" required="0">
>>> <getopt mixed="--power-timeout"/>
>>> <content type="string" default="20"/>
>>> <shortdesc lang="en">Test X seconds for status change after
>> ON/OFF</shortdesc>
>>> </parameter>
>>> <parameter name="shell_timeout" unique="0" required="0">
>>> <getopt mixed="--shell-timeout"/>
>>> <content type="string" default="3"/>
>>> <shortdesc lang="en">Wait X seconds for cmd prompt after issuing
>> command</shortdesc>
>>> </parameter>
>>> <parameter name="login_timeout" unique="0" required="0">
>>> <getopt mixed="--login-timeout"/>
>>> <content type="string" default="5"/>
>>> <shortdesc lang="en">Wait X seconds for cmd prompt after
>> login</shortdesc>
>>> </parameter>
>>> <parameter name="power_wait" unique="0" required="0">
>>> <getopt mixed="--power-wait"/>
>>> <content type="string" default="0"/>
>>> <shortdesc lang="en">Wait X seconds after issuing ON/OFF</shortdesc>
>>> </parameter>
>>> <parameter name="delay" unique="0" required="0">
>>> <getopt mixed="--delay"/>
>>> <content type="string" default="0"/>
>>> <shortdesc lang="en">Wait X seconds before fencing is
>> started</shortdesc>
>>> </parameter>
>>> <parameter name="retry_on" unique="0" required="0">
>>> <getopt mixed="--retry-on"/>
>>> <content type="string" default="1"/>
>>> <shortdesc lang="en">Count of attempts to retry power on</shortdesc>
>>> </parameter>
>>> </parameters>
>>> <actions>
>>> <action name="on"/>
>>> <action name="off"/>
>>> <action name="reboot"/>
>>> <action name="status"/>
>>> <action name="list"/>
>>> <action name="monitor"/>
>>> <action name="metadata"/>
>>> <action name="stop" timeout="20s"/>
>>> <action name="start" timeout="20s"/> </actions> </resource-agent>
>>>
>>> [root at pcmk1 ~]# crm configure show
>>> node pcmk1
>>> node pcmk2
>>> primitive drbd_pg ocf:linbit:drbd \
>>> params drbd_resource="postgres" \
>>> op monitor interval="15" role="Master" \
>>> op monitor interval="16" role="Slave" \
>>> op start interval="0" timeout="240" \
>>> op stop interval="0" timeout="120"
>>> primitive pg_fs ocf:heartbeat:Filesystem \
>>> params device="/dev/vg_local-lv_pgsql/lv_pgsql"
>> directory="/var/lib/pgsql/9.2/data" options="noatime,nodiratime"
>> fstype="xfs" \
>>> op start interval="0" timeout="60" \
>>> op stop interval="0" timeout="120"
>>> primitive pg_lsb lsb:postgresql-9.2 \
>>> op monitor interval="30" timeout="60" \
>>> op start interval="0" timeout="60" \
>>> op stop interval="0" timeout="60"
>>> primitive pg_lvm ocf:heartbeat:LVM \
>>> params volgrpname="vg_local-lv_pgsql" \
>>> op start interval="0" timeout="30" \
>>> op stop interval="0" timeout="30"
>>> primitive pg_vip ocf:heartbeat:IPaddr2 \
>>> params ip="x.x.x.x" iflabel="pcmkvip" \
>>> op monitor interval="5"
>>> primitive vm-fence-pcmk1 stonith:fence_vmware_soap \
>>> params ipaddr="x.x.x.x" login="administrator" passwd="password"
>> port="pcmk1" ssl="1" retry_on="10" shell_timeout="20" login_timeout="15"
>> action="reboot"
>>> primitive vm-fence-pcmk2 stonith:fence_vmware_soap \
>>> params ipaddr="x.x.x.x" login="administrator" passwd="password"
>> port="pcmk2" ssl="1" retry_on="10" shell_timeout="20" login_timeout="15"
>> action="reboot"
>>> group PGServer pg_lvm pg_fs pg_lsb pg_vip
>>> ms ms_drbd_pg drbd_pg \
>>> meta master-max="1" master-node-max="1" clone-max="2"
>>> clone-node-max="1" notify="true"
>>> location l-st-pcmk1 vm-fence-pcmk1 -inf: pcmk1
>>> location l-st-pcmk2 vm-fence-pcmk2 -inf: pcmk2
>>> location master-prefer-node1 pg_vip 50: pcmk1
>>> colocation col_pg_drbd inf: PGServer ms_drbd_pg:Master
>>> order ord_pg inf: ms_drbd_pg:promote PGServer:start
>>> property $id="cib-bootstrap-options" \
>>> dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14"
>> \
>>> cluster-infrastructure="openais" \
>>> expected-quorum-votes="4" \
>>> stonith-enabled="true" \
>>> no-quorum-policy="ignore" \
>>> maintenance-mode="false"
>>> rsc_defaults $id="rsc-options" \
>>> resource-stickiness="100"
>>>
>>> Am I doing something wrong?
>>>
>>> Best regards,
>>> Michal Mistina
>>>
>>
>>
>
>
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org Getting started:
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org