[Pacemaker] Ignoring expired failure
Andrew Beekhof
andrew at beekhof.net
Wed Oct 12 00:40:50 UTC 2011
On Sat, Oct 1, 2011 at 8:14 AM, Proskurin Kirill
<k.proskurin at corp.mail.ru> wrote:
> Hello all.
>
> corosync-1.4.1
> pacemaker-1.1.5
> pacemaker runs with "ver: 1"
>
> I have run into a monitoring failure again and still don't know why it
> happens.
> Details are here:
> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg09986.html
>
> Some info:
> Twice now I have run into a situation where Pacemaker thinks that a
> resource is started when it is not. We use a slightly modified version of
> the "anything" agent for our scripts, but it is aware of OCF return codes
> and other such details.
>
> I ran the monitor action of our agent from the console:
>
> # env -i ; OCF_ROOT=/usr/lib/ocf
> OCF_RESKEY_binfile=/usr/local/mpop/bin/my/tranprocessor.pl
> /usr/lib/ocf/resource.d/mail.ru/generic monitor
> # generic[14992]: DEBUG: default monitor : 7
>
>
> But this time I see in logs:
> Oct 01 02:00:12 mysender34.mail.ru pengine: [26301]: notice: unpack_rsc_op:
> Ignoring expired failure tranprocessor_stop_0 (rc=-2,
> magic=2:-2;121:690:0:4c16dc39-1fd3-41f2-b582-0236f6b6eccc) on
> mysender34.mail.ru
>
> So Pacemaker knows that the resource may be down but is ignoring it. Why?
It's not ignoring it; you're preventing Pacemaker from doing anything
about it by having a broken RA (the stop action doesn't work) and not
allowing/configuring fencing.
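
For illustration only, here is a minimal sketch of what a well-behaved stop
action might look like, assuming a pidfile-based agent. The names PIDFILE,
STOP_TIMEOUT, agent_monitor and agent_stop are hypothetical and are not
taken from the agent discussed above. The key point is that stop should
report success when the resource is already stopped, and should only report
failure when the process really cannot be killed.

```shell
#!/bin/sh
# Hypothetical sketch of an idempotent stop action for an OCF-style agent.
# PIDFILE and STOP_TIMEOUT are illustrative assumptions, not real parameters
# of the agent from this thread.

OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

PIDFILE="${PIDFILE:-/tmp/example-agent.pid}"
STOP_TIMEOUT="${STOP_TIMEOUT:-10}"

agent_monitor() {
    # Not running if there is no pidfile or the recorded process is gone.
    [ -f "$PIDFILE" ] || return $OCF_NOT_RUNNING
    pid=$(cat "$PIDFILE")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

agent_stop() {
    # Stopping something that is already stopped counts as success.
    if ! agent_monitor; then
        rm -f "$PIDFILE"
        return $OCF_SUCCESS
    fi
    pid=$(cat "$PIDFILE")
    kill -TERM "$pid" 2>/dev/null || true
    i=0
    while [ "$i" -lt "$STOP_TIMEOUT" ]; do
        if ! kill -0 "$pid" 2>/dev/null; then
            rm -f "$PIDFILE"
            return $OCF_SUCCESS
        fi
        sleep 1
        i=$((i + 1))
    done
    # Graceful stop timed out: escalate to SIGKILL and check once more.
    kill -KILL "$pid" 2>/dev/null || true
    sleep 1
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_ERR_GENERIC
    fi
    rm -f "$PIDFILE"
    return $OCF_SUCCESS
}
```

A real agent would call these functions from its usual "case $1 in"
dispatcher. If stop can still fail or time out, the cluster needs fencing
configured in order to recover the resource safely.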