[Pacemaker] Revenge of the cluster-glue clplumbing ABI change (a public service announcement)
Tim Serong
tserong at novell.com
Wed Jul 21 07:41:09 UTC 2010
Hi All,
A while ago (April, from memory), there was an ABI change in
clplumbing in cluster-glue. Presumably this went mostly unnoticed
in general usage; however, I have twice seen systems where the cluster
could not run because of a missing (or incorrect) libglue2 package.
One was my development system, with a dodgy build; the other was
mentioned on #linux-ha yesterday, and was the result of ignoring a
conflict error when installing the pacemaker RPM on openSUSE. So,
to be clear, this is not something anyone should need to worry
about. But I thought I'd mention it here, because the error
messages you get are, IMO, not very obvious.
Symptoms of a mismatched pacemaker/libglue build are errors like:
lrmd: [3004]: ERROR: main: can not create wait connection for command.
lrmd: [3004]: ERROR: Startup aborted (can't create comm channel). Shutting down.
...
pengine: [4011]: ERROR: init_client_ipc_comms_nodispatch: Could not access channel on: /var/run/crm/pengine
corosync[4000]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process pengine exited (pid=4011, rc=1)
corosync[4000]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine
If your cluster won't start and you see these errors in /var/log/messages,
make sure libglue2 is installed and up to date. And now that I've mentioned this
here and it's made it to the mailing list archive, Google will know,
and nobody else will ever have this problem again.
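For what it's worth, checking a log for these messages is easy to script.
The following is only a rough sketch in Python, not anything shipped with
pacemaker or cluster-glue: the names SYMPTOMS, find_symptoms and scan_log
are made up for illustration, and the match strings are just substrings of
the errors quoted above, so differently worded variants will be missed.

```python
# Substrings copied from the error messages quoted earlier in this mail.
# This list is an assumption about what is worth matching, not an
# exhaustive catalogue of what a mismatched libglue2 can produce.
SYMPTOMS = (
    "can not create wait connection for command",
    "Startup aborted (can't create comm channel)",
    "init_client_ipc_comms_nodispatch: Could not access channel on",
)


def find_symptoms(lines):
    """Return the log lines that contain any of the known symptom strings."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(symptom in line for symptom in SYMPTOMS)
    ]


def scan_log(path="/var/log/messages"):
    """Scan a syslog-style file (hypothetical helper; path is the default
    mentioned above) and return the matching lines."""
    # errors="replace" keeps stray undecodable bytes from aborting the scan.
    with open(path, errors="replace") as fh:
        return find_symptoms(fh)
```

Calling scan_log() (as a user who can read the log) returns a list of
matching lines; an empty list means none of the three strings were found,
which does not by itself prove that libglue2 is fine.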
This has been a public service announcement. Thank you for reading.
Tim
--
Tim Serong <tserong at novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.