[Pacemaker] stonith-ng message in /var/log/messages
Andrew Daugherity
adaugherity at tamu.edu
Wed Sep 29 21:57:13 UTC 2010
Ron Kerry <rkerry at ...> writes:
> I am seeing the following sequence of messages with every monitor interval for
> my stonith resource.
>
> Sep 28 10:44:01 genesis stonith-ng: [9493]: ERROR: run_stonith_agent: No timeout set for stonith operation monitor with device fence_legacy
> Sep 28 10:44:01 genesis stonith: l2network device OK.
>
> It is unclear to me what this ERROR means as the resource itself says
> everything is fine. There is a monitor timeout set in the resource definition.
>
> Distribution is SLES11SP1 (SLE11SP1-HAE).
> cluster-glue 1.0.6-0.3.7
I'm seeing the same problem ever since the latest update rollup from Novell (the
"sleshasp1-ha-update-201009" patch). Example:
Sep 29 16:28:35 imsxen3 stonith-ng: [5182]: ERROR: run_stonith_agent: No timeout set for stonith operation monitor with device fence_legacy
Sep 29 16:28:36 imsxen3 stonith: external/ipmi device OK.
I downgraded the cluster-glue package (and a couple others, so RPM dependencies
were still satisfied) on one machine and the messages went away on that machine,
while they're still there on the others.
To clarify -- the "no timeout set" error is logged on the machine the stonith
resource is currently running on, each time the monitor operation fires. On the
machine I downgraded cluster-glue on, there are no such errors for any stonith
resource running on that server.
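
To compare nodes, something like the following could tally the errors per host. It is only a rough sketch: it assumes the syslog layout seen in the examples above (timestamp, hostname, then "stonith-ng:"), with each message on a single line, and the name count_timeout_errors is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   Sep 29 16:28:35 imsxen3 stonith-ng: [5182]: ERROR: run_stonith_agent: No timeout set ...
# The layout is an assumption based on the log excerpts quoted in this thread.
PATTERN = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(?P<host>\S+)\s+stonith-ng:.*"
    r"ERROR: run_stonith_agent: No timeout set"
)

def count_timeout_errors(lines):
    """Count 'No timeout set' errors per hostname in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if m:
            counts[m.group("host")] += 1
    return counts
```

Passing it an open file handle for /var/log/messages on each node (or for a combined log) should show the downgraded node with no entries and the others with one per monitor interval.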
My stonith definitions (in "crm configure" format) are like this:
primitive stonith-imsxen1 stonith:external/ipmi \
        meta target-role="Started" \
        operations $id="stonith-imsxen2-operations" \
        op monitor interval="300" timeout="15" start-delay="15" \
        params hostname="imsxen1" ipaddr="10.95.12.51" userid="stonith" passwd="XXXX" interface="lanplus"
and similarly for stonith-imsxen2 and stonith-imsxen3. (Node names are
imsxen[123].)
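
To double-check that every monitor op really carries a timeout in the configuration text, a small script along these lines might help. It is a sketch only: it parses the key="value" pairs following "op monitor" in "crm configure" style text, the helper name monitor_op_settings is hypothetical, and it says nothing about how stonith-ng itself looks up the timeout.

```python
import re

def monitor_op_settings(crm_text):
    """Return the key="value" pairs of each 'op monitor' entry as a list of dicts."""
    # Join backslash-continued lines so each primitive becomes one logical line.
    joined = re.sub(r"\\\s*\n\s*", " ", crm_text)
    ops = []
    # Capture the run of key="value" pairs directly after "op monitor".
    for m in re.finditer(r'\bop\s+monitor\b((?:\s+[\w-]+="[^"]*")*)', joined):
        ops.append(dict(re.findall(r'([\w-]+)="([^"]*)"', m.group(1))))
    return ops

def ops_missing_timeout(crm_text):
    """Return the monitor ops that have no 'timeout' key at all."""
    return [op for op in monitor_op_settings(crm_text) if "timeout" not in op]
```

For the definition above, ops_missing_timeout returns an empty list, which fits the observation that the timeout is set in the configuration even though the error is logged.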
STONITH works properly, aside from the annoying messages with the latest version.
Here is the RPM version comparison:
v | SLE11-HAE-SP1-Updates | cluster-glue   | 1.0.5-0.5.1  | 1.0.6-0.3.7  | x86_64
v | SLE11-HAE-SP1-Updates | libglue2       | 1.0.5-0.5.1  | 1.0.6-0.3.7  | x86_64
v | SLE11-HAE-SP1-Updates | libpacemaker3  | 1.1.2-0.2.1  | 1.1.2-0.6.1  | x86_64
v | SLE11-HAE-SP1-Updates | pacemaker      | 1.1.2-0.2.1  | 1.1.2-0.6.1  | x86_64
v | SLE11-HAE-SP1-Updates | pacemaker-mgmt | 2.0.0-0.2.19 | 2.0.0-0.3.10 | x86_64
The only package I intentionally rolled back was cluster-glue; the others were
rolled back to satisfy dependencies. According to the RPM changelog, the "good"
version of cluster-glue (1.0.5-0.5.1) is from upstream version cs: 6cf2e36df9f4,
while the newer one (1.0.6-0.3.7) is from cs: a146a145a3e.
While it's possible this is a problem with Novell's builds, I don't think that
is likely, since there are no local patches in the RPM spec file.