[Pacemaker] Resource Agent ethmonitor

kook kookliu at gmail.com
Fri Jun 29 01:27:54 EDT 2012


I resolved the problem. It turned out to be a bug in the ethmonitor agent.

In ethmonitor:

# get the link status on $NIC
# asks ip about running (up) interfaces, returns the number of matching
# interface names that are up
get_link_status () {
       $IP2UTIL -o link show up dev "$NIC" | grep -c "$NIC"
}

   The command "ip -o link show up dev eth0" only detects that the interface
has been brought down administratively; it cannot detect that the link is down.
   So I guess the developer only tested with a command such as
"ifdown eth0" (or bond0) and did not consider the case where the cable is
unplugged.
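
To illustrate the difference: in the flag list that ip prints between < and >, UP is the administrative state, while LOWER_UP is only present when there is a carrier. Below is a minimal sketch that separates the two cases. The helper names link_flags and has_flag are mine (not part of ethmonitor), and the two sample lines are illustrative, not captured from my cluster.

```shell
#!/bin/sh
# Sketch: tell "administratively up" apart from "carrier present" by reading
# the flag list that "ip -o link show dev <nic>" prints between < and >.

# link_flags LINE: print the comma-separated flag list from one line of ip output
link_flags() {
    printf '%s\n' "$1" | sed -n 's/^[^<]*<\([^>]*\)>.*$/\1/p'
}

# has_flag LINE FLAG: succeed if FLAG is one of the flags in LINE
# (the surrounding commas keep UP from matching inside LOWER_UP)
has_flag() {
    case ",$(link_flags "$1")," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Sample lines (illustrative only)
plugged='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000'
unplugged='2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000'

for line in "$plugged" "$unplugged"; do
    if has_flag "$line" LOWER_UP; then
        echo "link up"
    elif has_flag "$line" UP; then
        echo "admin up, but no carrier"
    else
        echo "interface down"
    fi
done
```

The unplugged sample still carries the UP flag, which is why a check that only filters on "up" keeps reporting the interface as healthy.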

Finally, I decided to add the check to IPaddr2 and stop using the
ethmonitor agent.

I changed the monitor function of the ocf:heartbeat:IPaddr2 agent:

ip_monitor() {
        # TODO: Implement more elaborate monitoring like checking for
        # interface health maybe via a daemon like FailSafe etc...

        # added: fail the monitor unless ip reports "state UP" for $NIC
        t=$(ip link show "$NIC" | grep -c "state UP")
        test "$t" -ne 1 && return $OCF_ERR_PERM

        # ... (excerpt; the remainder of the function is not shown)

    With this change, if the NIC link goes down or the interface is brought
down, the resource is switched to the other node.
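
As an alternative to grepping the ip output, the same information can be read from sysfs. This is only a sketch and not what I actually put into IPaddr2: link_is_up is a made-up helper name, and SYSFS_NET is a hypothetical override that exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: check link health from sysfs instead of parsing "ip link" output.
# SYSFS_NET is a hypothetical override for testing; on a real system the
# per-interface files live under /sys/class/net.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# link_is_up NIC: succeed only if the kernel reports operstate "up" for NIC
link_is_up() {
    state_file="$SYSFS_NET/$1/operstate"
    [ -r "$state_file" ] || return 1
    state=$(cat "$state_file" 2>/dev/null)
    [ "$state" = "up" ]
}

# Example use; eth0 is just the interface name used elsewhere in this mail
if link_is_up eth0; then
    echo "eth0: link up"
else
    echo "eth0: link down or interface missing"
fi
```

Note that some interfaces (loopback, for example) report "unknown" rather than "up", so this strict comparison would treat them as down. The same is true of grepping for "state UP".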

But you need to add a meta attribute to the ocf:heartbeat:IPaddr2 primitive, something like this:

node sles11264-node1
node sles11264-node2
primitive p_apache lsb:apache2 \
        op monitor interval="15" timeout="30"
primitive p_vip ocf:heartbeat:IPaddr2 \
        params ip="192.168.203.250" nic="eth0" iflabel="0" \
        op monitor interval="10" timeout="20" \
        meta failure-timeout="5"
group g_apache p_vip p_apache \
        meta target-role="Started"
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="no" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1340872994"

About meta failure-timeout="5": be careful when setting this value.
If you set it too small, the other node will not have enough time to take
over. So work out how long a takeover needs and set it larger than that.

My English is not good; I hope you can understand.

If you read Chinese, you can see my blog:
http://linux.52zhe.info/read.php/275.htm





On Fri, Jun 29, 2012 at 1:01 PM, kook <kookliu at gmail.com> wrote:

> This is a test. I don't know how to reply to this thread.
>
>
> On Mon, Jun 25, 2012 at 4:00 PM, kook <kookliu at gmail.com> wrote:
>
>> Dear Fiorenza:
>>
>>      I have the same problem as you. I checked the newest ethmonitor RA (ClusterLabs-resource-agents-v3.9.2-0-ge261943.tar). It is the same as the one on my SLES 11 SP2.
>>
>> Failed actions:
>>
>>     p_ethmonitor:1_monitor_15000 (node=sles11264-node1, call=1591, rc=-2, status=Timed Out): unknown exec error
>>
>>        So, can you tell me how you solved this problem? Thanks.
>>
>> liujia
>>
>>
>>
>> On 21/03/2012 09:06, Florian Haas wrote:
>> > On Tue, Mar 20, 2012 at 4:18 PM, Fiorenza Meini <fmeini at esseweb.eu> wrote:
>> >> Hi there,
>> >> has anybody configured successfully the RA specified in the object of the
>> >> message?
>> >>
>> >> I got this error: if_eth0_monitor_0 (node=fw1, call=2297, rc=-2,
>> >> status=Timed Out): unknown exec error
>> >
>> > Your ethmonitor RA missed its 50-second timeout on the probe (that is,
>> > the initial monitor operation). You should be seeing "Monitoring of
>> > if_eth0 failed, X retries left" warnings in your logs. Grepping your
>> > syslog for "ethmonitor" will probably turn up some useful results.
>> >
>> > Cheers,
>> > Florian
>> >
>> Thank you, I solved the problem.
>>
>> Regards
>>
>> --
>>
>> Fiorenza Meini
>> Spazio Web S.r.l.
>>
>> V. Dante Alighieri, 10 - 13900 Biella
>> Tel.: 015.2431982 - 015.9526066
>> Fax: 015.2522600
>> Reg. Imprese, CF e P.I.: 02414430021
>> Iscr. REA: BI - 188936
>> Iscr. CCIAA: Biella - 188936
>> Cap. Soc.: 30.000,00 Euro i.v.
>>
>>
>> ----------------------------
>> Side A or B
>>
>
>
>
> --
> ----------------------------
> I have a dream. Hehe....
>



-- 
----------------------------
I have a dream. Hehe....