[Pacemaker] fence_legacy, stonith and apcmastersnmp
Andrew Beekhof
andrew at beekhof.net
Thu Mar 1 23:13:07 UTC 2012
2012/3/2 Kadlecsik József <kadlecsik.jozsef at wigner.mta.hu>:
> Hello,
>
> After upgrading to pacemaker 1.1.6, cluster-glue 1.0.8 on Debian, our
> working apcmastersnmp resources stopped working:
>
> Feb 29 14:22:03 atlas0 stonith: [35438]: ERROR: apcmastersnmp device not
> accessible.
> Feb 29 14:22:03 atlas0 stonith-ng: [32972]: notice: log_operation:
> Operation 'monitor' [35404] for device 'stonith-atlas6' returned: -2
> Feb 29 14:22:03 atlas0 stonith-ng: [32972]: ERROR: log_operation:
> stonith-atlas6: Performing: stonith -t apcmastersnmp -S 161
> Feb 29 14:22:03 atlas0 stonith-ng: [32972]: ERROR: log_operation:
> stonith-atlas6: Invalid config info for apcmastersnmp device
>
> Please note the strange "161" argument of stonith.
>
> After checking the source code and stracing stonithd, as far as I see, the
> following happens:
>
> - stonithd calls fence_legacy, which steals the "port=161" parameter from
> apcmastersnmp. This produces the error message
> "Invalid config info for apcmastersnmp device"
You keep saying "steals"; what do you mean by that? Where is it stolen from?
What does your config look like?
> - When it steals "port=161", fence_legacy uses the port value as the node
> name and passes it to stonith, even in status mode. Therefore we
> get "stonith -t apcmastersnmp -S 161"
> - However, stonith does not catch the invalid node parameter:
>
> if (!(argcount == 1 || (argcount < 1
> && (status||listhosts||listtypes||listparanames||metadata))))
> {
> ++errors;
> }
Where is this fragment from?
> and even in status mode it goes on to run the reset request too:
>
> if (status) {
> < no exit >
> }
> if (listhosts) {
> < no exit >
> }
> if (optind < argc) {
> ...
> rc = stonith_req_reset(s, reset_type, nodename);
> }
>
> Fortunately the port value does not match nodename, so it won't kill any
> node, but the agent fails.
>
> Am I on the right track? Would the following patch fix the issue? I'm
> asking because I don't know why "port=" is handled separately and
> what the implications of deleting $opt_n below are.
>
> --- fence_legacy.orig 2012-02-29 23:03:36.594945717 +0100
> +++ fence_legacy 2012-03-01 14:41:46.454859212 +0100
> @@ -105,6 +105,7 @@
> elsif ($name eq "port" )
> {
> $opt_n = $val;
> + $ENV{$name} = $val;
What is this for?
> }
> elsif ($name eq "stonith" )
> {
> @@ -176,8 +177,8 @@
> }
> elsif ( $opt_o eq "monitor" || $opt_o eq "stat" || $opt_o eq "status" )
> {
> - print "Performing: $opt_s -t $opt_t -S $opt_n\n" unless defined $opt_q;
> - exec "$opt_s -t $opt_t $extra_args -S $opt_n" or die "failed to exec \"$opt_s\"\n";
> + print "Performing: $opt_s -t $opt_t -S\n" unless defined $opt_q;
> + exec "$opt_s -t $opt_t $extra_args -S" or die "failed to exec \"$opt_s\"\n";
I was under the impression that -S needed a node name; I see, however,
that this isn't the case.
Some devices can query the state of an individual port, but it seems that
the stonith binary doesn't expose this.
Does everything work when you have this patch?
> }
> else
> {
>
> Best regards,
> Jozsef
> --
> E-mail : kadlecsik.jozsef at wigner.mta.hu
> PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
> H-1525 Budapest 114, POB. 49, Hungary
>