[Pacemaker] RHEL 6.3 + fence_vmware_soap + esx 5.1

Andrew Beekhof andrew at beekhof.net
Tue Jul 16 05:23:10 CEST 2013


On 15/07/2013, at 8:56 PM, Mistina Michal <Michal.Mistina at virte.sk> wrote:

> Hi Andrew.
> 
> Here are the omitted stonith-ng sections of /var/log/messages.
> 
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]:   notice: stonith_device_action:
> Device vm-fence-pcmk2 not found
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]:     info: stonith_command: Processed
> st_execute from lrmd: rc=-12
> Jul 15 09:53:38 PCMK1 crmd[1542]:     info: process_lrm_event: LRM operation
> vm-fence-pcmk2_monitor_0 (call=11, rc=7, cib-update=21, confirmed=true) not
> running
> Jul 15 09:53:38 PCMK1 lrmd: [1539]: info: rsc:vm-fence-pcmk2:12: start
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]:     info: stonith_device_register:
> Added 'vm-fence-pcmk2' to the device list (1 active devices)
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]:     info: stonith_command: Processed
> st_device_register from lrmd: rc=0
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]:     info: stonith_command: Processed
> st_execute from lrmd: rc=-1
> Jul 15 09:54:13 PCMK1 lrmd: [1539]: WARN: vm-fence-pcmk2:start process (PID
> 3332) timed out (try 1).  Killing with signal SIGTERM (15).

you took too long, go away

> Jul 15 09:54:18 PCMK1 lrmd: [1539]: WARN: vm-fence-pcmk2:start process (PID
> 3332) timed out (try 2).  Killing with signal SIGKILL (9).

seriously go away
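You can check whether the agent fits inside the same 20-second budget outside the cluster. One way is to run it under GNU coreutils `timeout`, which, like lrmd in the log above, sends SIGTERM when the budget expires and SIGKILL a few seconds later. The log shows a 5-second gap between the two. This is a minimal sketch that assumes GNU `timeout` with `-k` is installed. `run_like_lrmd` is a hypothetical helper name and is not part of Pacemaker or fence-agents.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker or fence-agents): run a command
# under a time budget the way lrmd did in the log above, i.e. SIGTERM when
# the budget expires and SIGKILL 5 seconds later if it is still running.
# Assumes GNU coreutils `timeout` with the -k (--kill-after) option.
run_like_lrmd() {
    budget=$1
    shift
    start=$(date +%s)
    timeout -k 5 "$budget" "$@"
    rc=$?
    elapsed=$(( $(date +%s) - start ))
    # GNU timeout exits 124 when the budget expired, and 137 (128+9) when it
    # also had to send SIGKILL. A command that itself exits 124 or 137 would
    # be misreported here.
    case $rc in
        124) echo "timed out after ${elapsed}s (SIGTERM)" ;;
        137) echo "timed out after ${elapsed}s (SIGKILL)" ;;
        *)   echo "finished in ${elapsed}s with rc=$rc" ;;
    esac
    return "$rc"
}
```

For example, reusing the parameters from the manual test further down, with the same placeholder address and password, and the `monitor` action that the agent metadata advertises: `run_like_lrmd 20 fence_vmware_soap -a x.x.x.x -l administrator -p password -z -n "pcmk2" -o monitor`. Suppose this reports a timeout while `-o reboot` still succeeds when given enough time. That would suggest the configured timeout is too short, rather than the parameters being wrong.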

> Jul 15 09:54:18 PCMK1 lrmd: [1539]: WARN: operation start[12] on
> stonith::fence_vmware_soap::vm-fence-pcmk2 for client 1542, its parameters:
> passwd=[password] shell_timeout=[20] ssl=[1] login=[administrator]
> action=[reboot] crm_feature_set=[3.0.6] retry_on=[10] ipaddr=[x.x.x.x]
> port=[T1-PCMK2] login_timeout=[15] CRM_meta_timeout=[20000] : pid [3332]
> timed out

whatever that agent is doing, it's taking too long,
or you've not given it long enough
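If the agent is merely slow, the usual remedy is to give the start operation more time than the 20000 ms budget seen in the log. Below is a sketch in the same crm syntax as the configuration quoted further down. The 120s value is an assumption, not a measured figure. Size it from how long the agent actually takes.

```
# Assumption: 120s is a guess, not a measured value; size it from how long
# the agent really takes against this ESX host.
primitive vm-fence-pcmk2 stonith:fence_vmware_soap \
        params ipaddr="x.x.x.x" login="administrator" passwd="password" \
        port="pcmk2" ssl="1" retry_on="10" shell_timeout="20" \
        login_timeout="15" action="reboot" \
        op start interval="0" timeout="120s"
```

You can apply the change with, for example, `crm configure edit vm-fence-pcmk2`. The INFINITY fail count recorded in the log will still keep the resource away from pcmk1 until it is cleared with `crm resource cleanup vm-fence-pcmk2`. vm-fence-pcmk1 would need the same treatment.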

> Jul 15 09:54:18 PCMK1 crmd[1542]:    error: process_lrm_event: LRM operation
> vm-fence-pcmk2_start_0 (12) Timed Out (timeout=20000ms)
> Jul 15 09:54:18 PCMK1 attrd[1540]:   notice: attrd_ais_dispatch: Update
> relayed from pcmk2
> Jul 15 09:54:18 PCMK1 attrd[1540]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: fail-count-vm-fence-pcmk2 (INFINITY)
> Jul 15 09:54:18 PCMK1 attrd[1540]:   notice: attrd_perform_update: Sent
> update 24: fail-count-vm-fence-pcmk2=INFINITY
> Jul 15 09:54:18 PCMK1 attrd[1540]:   notice: attrd_ais_dispatch: Update
> relayed from pcmk2
> Jul 15 09:54:18 PCMK1 attrd[1540]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: last-failure-vm-fence-pcmk2 (1373874858)
> Jul 15 09:54:18 PCMK1 attrd[1540]:   notice: attrd_perform_update: Sent
> update 27: last-failure-vm-fence-pcmk2=1373874858
> Jul 15 09:54:21 PCMK1 lrmd: [1539]: info: rsc:vm-fence-pcmk2:13: stop
> Jul 15 09:54:21 PCMK1 stonith-ng[1538]:     info: stonith_device_remove:
> Removed 'vm-fence-pcmk2' from the device list (0 active devices)
> Jul 15 09:54:21 PCMK1 stonith-ng[1538]:     info: stonith_command: Processed
> st_device_remove from lrmd: rc=0
> Jul 15 09:54:21 PCMK1 crmd[1542]:     info: process_lrm_event: LRM operation
> vm-fence-pcmk2_stop_0 (call=13, rc=0, cib-update=23, confirmed=true) ok
> 
> What does this output mean?
> 
> Best regards,
> Michal Mistina
> 
> -----Original Message-----
> From: Andrew Beekhof [mailto:andrew at beekhof.net] 
> Sent: Monday, July 15, 2013 3:06 AM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] RHEL 6.3 + fence_vmware_soap + esx 5.1
> 
> 
> On 13/07/2013, at 10:05 PM, Mistina Michal <Michal.Mistina at virte.sk> wrote:
> 
>> Hi,
>> Does anybody know how to set up fence_vmware_soap correctly so that it
>> can fence a VMware virtual machine on ESX 5.1?
>> 
>> My problem is that the fence_vmware_soap stonith resource agent times out,
>> and I don't know why.
> 
> Nothing in the stonith-ng logs?
> 
>> 
>> [root at pcmk1 ~]# crm_verify -L -V
>> warning: unpack_rsc_op:        Processing failed op vm-fence-pcmk2_last_failure_0 on pcmk1: unknown exec error (-2)
>> warning: unpack_rsc_op:        Processing failed op vm-fence-pcmk1_last_failure_0 on pcmk2: unknown exec error (-2)
>> warning: common_apply_stickiness:      Forcing vm-fence-pcmk2 away from pcmk1 after 1000000 failures (max=1000000)
>> warning: common_apply_stickiness:      Forcing vm-fence-pcmk1 away from pcmk2 after 1000000 failures (max=1000000)
>> 
>> I have a 2-node cluster. When I manually reboot the VMware machine by
>> calling fence_vmware_soap, it works:
>> [root at pcmk1 ~]# fence_vmware_soap -a x.x.x.x -l administrator -p password -n "pcmk2" -o reboot -z
>> 
>> My settings are:
>> [root at pcmk1 ~]# stonith_admin -M -a fence_vmware_soap
>> <resource-agent name="fence_vmware_soap" shortdesc="Fence agent for VMWare over SOAP API">
>>  <longdesc>fence_vmware_soap is an I/O Fencing agent which can be used with the virtual machines managed by VMWare products that have SOAP API v4.1+.
>> .P
>> Name of virtual machine (-n / port) has to be used in inventory path format (e.g. /datacenter/vm/Discovered virtual machine/myMachine). In the cases when name of yours VM is unique you can use it instead. Alternatively you can always use UUID (-U / uuid) to access virtual machine.</longdesc>
>>  <vendor-url>http://www.vmware.com</vendor-url>
>>  <parameters>
>>    <parameter name="action" unique="0" required="1">
>>      <getopt mixed="-o, --action=&lt;action&gt;"/>
>>      <content type="string" default="reboot"/>
>>      <shortdesc lang="en">Fencing Action</shortdesc>
>>    </parameter>
>>    <parameter name="ipaddr" unique="0" required="1">
>>      <getopt mixed="-a, --ip=&lt;ip&gt;"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">IP Address or Hostname</shortdesc>
>>    </parameter>
>>    <parameter name="login" unique="0" required="1">
>>      <getopt mixed="-l, --username=&lt;name&gt;"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">Login Name</shortdesc>
>>    </parameter>
>>    <parameter name="passwd" unique="0" required="0">
>>      <getopt mixed="-p, --password=&lt;password&gt;"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">Login password or passphrase</shortdesc>
>>    </parameter>
>>    <parameter name="passwd_script" unique="0" required="0">
>>      <getopt mixed="-S, --password-script=&lt;script&gt;"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">Script to retrieve password</shortdesc>
>>    </parameter>
>>    <parameter name="ssl" unique="0" required="0">
>>      <getopt mixed="-z, --ssl"/>
>>      <content type="boolean"/>
>>      <shortdesc lang="en">SSL connection</shortdesc>
>>    </parameter>
>>    <parameter name="port" unique="0" required="0">
>>      <getopt mixed="-n, --plug=&lt;id&gt;"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">Physical plug number or name of virtual machine</shortdesc>
>>    </parameter>
>>    <parameter name="uuid" unique="0" required="0">
>>      <getopt mixed="-U, --uuid"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">The UUID of the virtual machine to fence.</shortdesc>
>>    </parameter>
>>    <parameter name="ipport" unique="0" required="0">
>>      <getopt mixed="-u, --ipport=&lt;port&gt;"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">TCP port to use for connection with device</shortdesc>
>>    </parameter>
>>    <parameter name="verbose" unique="0" required="0">
>>      <getopt mixed="-v, --verbose"/>
>>      <content type="boolean"/>
>>      <shortdesc lang="en">Verbose mode</shortdesc>
>>    </parameter>
>>    <parameter name="debug" unique="0" required="0">
>>      <getopt mixed="-D, --debug-file=&lt;debugfile&gt;"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">Write debug information to given file</shortdesc>
>>    </parameter>
>>    <parameter name="version" unique="0" required="0">
>>      <getopt mixed="-V, --version"/>
>>      <content type="boolean"/>
>>      <shortdesc lang="en">Display version information and exit</shortdesc>
>>    </parameter>
>>    <parameter name="help" unique="0" required="0">
>>      <getopt mixed="-h, --help"/>
>>      <content type="boolean"/>
>>      <shortdesc lang="en">Display help and exit</shortdesc>
>>    </parameter>
>>    <parameter name="separator" unique="0" required="0">
>>      <getopt mixed="-C, --separator=&lt;char&gt;"/>
>>      <content type="string" default=","/>
>>      <shortdesc lang="en">Separator for CSV created by operation list</shortdesc>
>>    </parameter>
>>    <parameter name="power_timeout" unique="0" required="0">
>>      <getopt mixed="--power-timeout"/>
>>      <content type="string" default="20"/>
>>      <shortdesc lang="en">Test X seconds for status change after ON/OFF</shortdesc>
>>    </parameter>
>>    <parameter name="shell_timeout" unique="0" required="0">
>>      <getopt mixed="--shell-timeout"/>
>>      <content type="string" default="3"/>
>>      <shortdesc lang="en">Wait X seconds for cmd prompt after issuing command</shortdesc>
>>    </parameter>
>>    <parameter name="login_timeout" unique="0" required="0">
>>      <getopt mixed="--login-timeout"/>
>>      <content type="string" default="5"/>
>>      <shortdesc lang="en">Wait X seconds for cmd prompt after login</shortdesc>
>>    </parameter>
>>    <parameter name="power_wait" unique="0" required="0">
>>      <getopt mixed="--power-wait"/>
>>      <content type="string" default="0"/>
>>      <shortdesc lang="en">Wait X seconds after issuing ON/OFF</shortdesc>
>>    </parameter>
>>    <parameter name="delay" unique="0" required="0">
>>      <getopt mixed="--delay"/>
>>      <content type="string" default="0"/>
>>      <shortdesc lang="en">Wait X seconds before fencing is started</shortdesc>
>>    </parameter>
>>    <parameter name="retry_on" unique="0" required="0">
>>      <getopt mixed="--retry-on"/>
>>      <content type="string" default="1"/>
>>      <shortdesc lang="en">Count of attempts to retry power on</shortdesc>
>>    </parameter>
>>  </parameters>
>>  <actions>
>>    <action name="on"/>
>>    <action name="off"/>
>>    <action name="reboot"/>
>>    <action name="status"/>
>>    <action name="list"/>
>>    <action name="monitor"/>
>>    <action name="metadata"/>
>>    <action name="stop" timeout="20s"/>
>>    <action name="start" timeout="20s"/>
>>  </actions>
>> </resource-agent>
>> 
>> [root at pcmk1 ~]# crm configure show
>> node pcmk1
>> node pcmk2
>> primitive drbd_pg ocf:linbit:drbd \
>>        params drbd_resource="postgres" \
>>        op monitor interval="15" role="Master" \
>>        op monitor interval="16" role="Slave" \
>>        op start interval="0" timeout="240" \
>>        op stop interval="0" timeout="120"
>> primitive pg_fs ocf:heartbeat:Filesystem \
>>        params device="/dev/vg_local-lv_pgsql/lv_pgsql" directory="/var/lib/pgsql/9.2/data" options="noatime,nodiratime" fstype="xfs" \
>>        op start interval="0" timeout="60" \
>>        op stop interval="0" timeout="120"
>> primitive pg_lsb lsb:postgresql-9.2 \
>>        op monitor interval="30" timeout="60" \
>>        op start interval="0" timeout="60" \
>>        op stop interval="0" timeout="60"
>> primitive pg_lvm ocf:heartbeat:LVM \
>>        params volgrpname="vg_local-lv_pgsql" \
>>        op start interval="0" timeout="30" \
>>        op stop interval="0" timeout="30"
>> primitive pg_vip ocf:heartbeat:IPaddr2 \
>>        params ip="x.x.x.x" iflabel="pcmkvip" \
>>        op monitor interval="5"
>> primitive vm-fence-pcmk1 stonith:fence_vmware_soap \
>>        params ipaddr="x.x.x.x" login="administrator" passwd="password" port="pcmk1" ssl="1" retry_on="10" shell_timeout="20" login_timeout="15" action="reboot"
>> primitive vm-fence-pcmk2 stonith:fence_vmware_soap \
>>        params ipaddr="x.x.x.x" login="administrator" passwd="password" port="pcmk2" ssl="1" retry_on="10" shell_timeout="20" login_timeout="15" action="reboot"
>> group PGServer pg_lvm pg_fs pg_lsb pg_vip
>> ms ms_drbd_pg drbd_pg \
>>        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>> location l-st-pcmk1 vm-fence-pcmk1 -inf: pcmk1
>> location l-st-pcmk2 vm-fence-pcmk2 -inf: pcmk2
>> location master-prefer-node1 pg_vip 50: pcmk1
>> colocation col_pg_drbd inf: PGServer ms_drbd_pg:Master
>> order ord_pg inf: ms_drbd_pg:promote PGServer:start
>> property $id="cib-bootstrap-options" \
>>        dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
>>        cluster-infrastructure="openais" \
>>        expected-quorum-votes="4" \
>>        stonith-enabled="true" \
>>        no-quorum-policy="ignore" \
>>        maintenance-mode="false"
>> rsc_defaults $id="rsc-options" \
>>        resource-stickiness="100"
>> 
>> Am I doing something wrong?
>> 
>> Best regards,
>> Michal Mistina
>> 
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org 
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org Getting started: 
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 



