[Pacemaker] fence_legacy, stonith and apcmastersnmp

Kadlecsik József kadlecsik.jozsef at wigner.mta.hu
Thu Mar 1 13:51:35 UTC 2012


Hello,

After upgrading to pacemaker 1.1.6 and cluster-glue 1.0.8 on Debian, our
previously working apcmastersnmp resources stopped working:

Feb 29 14:22:03 atlas0 stonith: [35438]: ERROR: apcmastersnmp device not accessible.
Feb 29 14:22:03 atlas0 stonith-ng: [32972]: notice: log_operation: Operation 'monitor' [35404] for device 'stonith-atlas6' returned: -2
Feb 29 14:22:03 atlas0 stonith-ng: [32972]: ERROR: log_operation: stonith-atlas6: Performing: stonith -t apcmastersnmp -S 161
Feb 29 14:22:03 atlas0 stonith-ng: [32972]: ERROR: log_operation: stonith-atlas6: Invalid config info for apcmastersnmp device

Note the strange "161" argument passed to stonith.

After checking the source code and running strace on stonithd, as far as I
can see, the following happens:

- stonithd calls fence_legacy, which takes the "port=161" parameter away
  from apcmastersnmp. Without it, the agent reports
  "Invalid config info for apcmastersnmp device".
- When it takes "port=161", fence_legacy uses the port value as the node
  name and passes it to stonith, even in status mode. This is why we
  get "stonith -t apcmastersnmp -S 161".
- However, stonith does not catch the invalid node parameter:

        if (!(argcount == 1 || (argcount < 1
        &&      (status||listhosts||listtypes||listparanames||metadata)))) {
                ++errors;
        }

   and, even in status mode, goes on to issue the reset request:

        if (status) {
                < no exit >
        }
        if (listhosts) {
                < no exit >
        }
        if (optind < argc) {
                ...
                rc = stonith_req_reset(s, reset_type, nodename);
        }

Fortunately, the port value does not match any node name, so no node gets
killed, but the agent fails.

Am I on the right track? Would the following patch fix the issue? I'm
asking because I don't know why "port=" is handled separately, or what
the implications of removing $opt_n below would be.

--- fence_legacy.orig	2012-02-29 23:03:36.594945717 +0100
+++ fence_legacy	2012-03-01 14:41:46.454859212 +0100
@@ -105,6 +105,7 @@
 	elsif ($name eq "port" ) 
 	{
             $opt_n = $val;
+            $ENV{$name} = $val;
         } 
 	elsif ($name eq "stonith" ) 
 	{
@@ -176,8 +177,8 @@
    }
    elsif ( $opt_o eq "monitor" || $opt_o eq "stat" || $opt_o eq "status" ) 
    {
-       print "Performing: $opt_s -t $opt_t -S $opt_n\n" unless defined $opt_q;
-       exec "$opt_s -t $opt_t $extra_args -S $opt_n" or die "failed to exec \"$opt_s\"\n";
+       print "Performing: $opt_s -t $opt_t -S\n" unless defined $opt_q;
+       exec "$opt_s -t $opt_t $extra_args -S" or die "failed to exec \"$opt_s\"\n";
    }
    else
    {

Best regards,
Jozsef
--
E-mail : kadlecsik.jozsef at wigner.mta.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
         H-1525 Budapest 114, POB. 49, Hungary



