[Pacemaker] fence_legacy, stonith and apcmastersnmp

Kadlecsik József kadlecsik.jozsef at wigner.mta.hu
Fri Mar 2 08:47:38 CET 2012


On Fri, 2 Mar 2012, Andrew Beekhof wrote:

> 2012/3/2 Kadlecsik József <kadlecsik.jozsef at wigner.mta.hu>:
> >
> > After upgrading to pacemaker 1.1.6, cluster-glue 1.0.8 on Debian, our
> > working apcmastersnmp resources stopped working:
> >
> > Feb 29 14:22:03 atlas0 stonith: [35438]: ERROR: apcmastersnmp device not
> > accessible.
> > Feb 29 14:22:03 atlas0 stonith-ng: [32972]: notice: log_operation:
> > Operation 'monitor' [35404] for device 'stonith-atlas6' returned: -2
> > Feb 29 14:22:03 atlas0 stonith-ng: [32972]: ERROR: log_operation:
> > stonith-atlas6: Performing: stonith -t apcmastersnmp -S 161
> > Feb 29 14:22:03 atlas0 stonith-ng: [32972]: ERROR: log_operation:
> > stonith-atlas6: Invalid config info for apcmastersnmp device
> >
> > Please note the strange "161" argument of stonith.
> >
> > After checking the source code and stracing stonithd, as far as I see, the
> > following happens:
> >
> > - stonithd calls fence_legacy, which steals the "port=161" parameter from
> >  apcmastersnmp. This produces the error message
> >  "Invalid config info for apcmastersnmp device"
> 
> You keep saying steals, what do you mean by that?  Where is it stolen from?

fence_legacy passes the parameters to the stonith drivers via environment 
variables, with the exception of "port". However, "port" is mandatory for 
apcmastersnmp. I should have worded it better.
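
To make the behaviour concrete, here is a small C model of the parameter
handling I described. It is only a sketch: fence_legacy itself is a Perl
script, and the names handle_param, legacy_opts and export_port are
hypothetical, made up for this illustration. With export_port set to 0 it
mimics what I see in 1.1.6 ("port" is kept as the node name only); with
export_port set to 1 it mimics the effect of the one-line patch.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the wrapper state; nodename plays the role
 * of $opt_n in fence_legacy. */
struct legacy_opts {
    char nodename[64];
};

/* Handle one "name=value" line.
 * Every parameter is exported to the environment for the stonith
 * driver, except "port", which is stored as the node name and is
 * exported only when export_port is non-zero.
 * Returns 0 on success, -1 on malformed input or setenv failure. */
int handle_param(struct legacy_opts *o, const char *line, int export_port)
{
    char name[64];
    const char *eq;
    const char *val;
    size_t n;

    assert(o != NULL && line != NULL);

    eq = strchr(line, '=');
    if (eq == NULL || eq == line)
        return -1;                      /* no '=' or empty name */

    n = (size_t)(eq - line);
    if (n >= sizeof(name))
        return -1;                      /* name too long for this sketch */
    memcpy(name, line, n);
    name[n] = '\0';
    val = eq + 1;

    if (strcmp(name, "port") == 0) {
        snprintf(o->nodename, sizeof(o->nodename), "%s", val);
        if (!export_port)
            return 0;                   /* driver never sees "port" */
    }
    return setenv(name, val, 1) == 0 ? 0 : -1;
}
```

Feeding it "ipaddr=192.168.40.252" and then "port=161" with export_port
set to 0 leaves ipaddr in the environment, leaves port unset, and stores
"161" as the node name, which is the situation apcmastersnmp complains
about.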

> What does your config look like?

Before the upgrade, the working apcmastersnmp resource was:

primitive stonith-atlas5 stonith:apcmastersnmp \
        params ipaddr="192.168.40.252" community="private" port="161" \
	...

"ipaddr" and "community" are passed via environment variables by 
fence_legacy, but "port" is not.

We converted the resource to external/rackpdu, but that cannot handle 
nodes attached to multiple outlets, so we need to get apcmastersnmp 
working again.

> > - At stealing "port=161", fence_legacy sets the port value to the node
> >  name and passes to stonith, even in status mode. Therefore we
> >  get "stonith -t apcmastersnmp -S 161"
> > - However stonith cannot catch the invalid node parameter:
> >
> >        if (!(argcount == 1 || (argcount < 1
> >        &&      (status||listhosts||listtypes||listparanames||metadata))))
> > {
> >                ++errors;
> >        }
>  
> where is this fragment from?

The C code fragments are from cluster-glue-1.0.8/lib/stonith/main.c.
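
As a standalone illustration of why the check lets "-S 161" through, the
condition can be isolated into a predicate. The function names args_ok
and args_ok_strict are hypothetical and not part of cluster-glue.
args_ok restates the quoted condition; args_ok_strict is just one
possible stricter rule (query modes take no node argument), shown for
comparison and not a proposed patch.

```c
#include <stdbool.h>

/* Restates the quoted condition: the arguments are accepted when there
 * is exactly one non-option argument, or when there is none and one of
 * the query modes is selected. */
bool args_ok(int argcount, bool status, bool listhosts, bool listtypes,
             bool listparanames, bool metadata)
{
    bool query = status || listhosts || listtypes
              || listparanames || metadata;

    return argcount == 1 || (argcount < 1 && query);
}

/* Hypothetical stricter variant: a query mode must come without a node
 * argument, any other mode needs exactly one. */
bool args_ok_strict(int argcount, bool status, bool listhosts,
                    bool listtypes, bool listparanames, bool metadata)
{
    bool query = status || listhosts || listtypes
              || listparanames || metadata;

    return query ? argcount == 0 : argcount == 1;
}
```

For "stonith -t apcmastersnmp -S 161" we have argcount == 1 with status
set, so args_ok returns true and no error is counted, while
args_ok_strict would return false.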
 
> >   and even in status mode wants to run the reset request too:
> >
> >                if (status) {
> >                        < no exit >
> >                }
> >                if (listhosts) {
> >                        < no exit >
> >                }
> >                if (optind < argc) {
> >                        ...
> >                        rc = stonith_req_reset(s, reset_type, nodename);
> >                }
> >
> > Fortunately the port value does not match nodename, so it won't kill any
> > node, but the agent fails.
> >
> > Am I on the right track? Would the following patch fix the issue? I'm
> > asking because I don't know why "port=" is handled separately or
> > what the implications of deleting $opt_n below are.
> >
> > --- fence_legacy.orig   2012-02-29 23:03:36.594945717 +0100
> > +++ fence_legacy        2012-03-01 14:41:46.454859212 +0100
> > @@ -105,6 +105,7 @@
> >        elsif ($name eq "port" )
> >        {
> >             $opt_n = $val;
> > +            $ENV{$name} = $val;
> 
> what is this for?

Passing "port" similarly to the other parameters to the stonith drivers.
 
> >         }
> >        elsif ($name eq "stonith" )
> >        {
> > @@ -176,8 +177,8 @@
> >    }
> >    elsif ( $opt_o eq "monitor" || $opt_o eq "stat" || $opt_o eq "status" )
> >    {
> > -       print "Performing: $opt_s -t $opt_t -S $opt_n\n" unless defined $opt_q;
> > -       exec "$opt_s -t $opt_t $extra_args -S $opt_n" or die "failed to exec \"$opt_s\"\n";
> > +       print "Performing: $opt_s -t $opt_t -S\n" unless defined $opt_q;
> > +       exec "$opt_s -t $opt_t $extra_args -S" or die "failed to exec \"$opt_s\"\n";
> 
> I was under the impression that -S needed a node name, I see however
> that this isn't the case.
> Some devices can query the state of an individual port, it seems that
> the stonith binary doesn't expose this.
> 
> Does everything work when you have this patch?

We'll give it a try today. It's the usual issue: we have to experiment 
on a production cluster.

Best regards,
Jozsef
--
E-mail : kadlecsik.jozsef at wigner.mta.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
         H-1525 Budapest 114, POB. 49, Hungary

