[Pacemaker] CIB write-to-disk bug?

Thu Apr 8 09:54:56 EDT 2010

On Fri, Apr 2, 2010 at 4:16 PM, Alan Robertson <alanr at unix.sh> wrote:
>> Do it again, with higher log level.  Sorry, no time right now to rebuild
>> your exact thing with your exact gcc and stuff to look at your core file.
>
> You can just download the RPM and extract the objects.  That's what I used.

Spend half a day mirroring the RHEL54 tree and farting around with gdb
to try to get a sensible trace? Not likely.
And please tell me these aren't production machines; you really should
know better than to be using external/ssh outside of CTS.

Back to the logs, it looks like the initial digest is incorrect.

Mar 31 19:02:52 vhost0384 cib: [13294]: info: write_cib_contents: Wrote version 0.50.0 of the CIB to disk (digest: 316049fa7ee8d2e107573ce7cded07cf)
Mar 31 19:02:52 vhost0384 cib: [13294]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.uHFtAW (digest: /var/lib/heartbeat/crm/cib.GUdD9T)
Mar 31 19:02:52 vhost0384 cib: [13294]: ERROR: validate_cib_digest: Digest comparision failed: expected 316049fa7ee8d2e107573ce7cded07cf (/var/lib/heartbeat/crm/cib.GUdD9T), calculated 0bac3440f5c42f0f37d22ea7dfe433e8

Based on cib.uHFtAW, the correct digest would appear to be the
calculated one, not the one written to cib.GUdD9T.
I have absolutely no idea how that could happen. Is it repeatable?
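To make the failure mode in those log lines concrete, here is a minimal Python sketch of the write-then-verify pattern they describe: a digest is recorded in a separate file, and on reading it is recomputed from the configuration file and compared. This is a simplification under stated assumptions: `write_with_digest` and `validate_digest` are hypothetical names, and the sketch hashes raw file bytes with MD5, whereas Pacemaker (as far as I know) computes its digest over the XML it serializes. The only point is that a mismatch means the recorded digest was computed over different content than what ended up on disk.

```python
import hashlib
import os
import tempfile
from typing import Tuple


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of data (32 hex chars, like the log above)."""
    return hashlib.md5(data).hexdigest()


def write_with_digest(path: str, sig_path: str, contents: bytes) -> str:
    """Hypothetical writer: store contents, then store their digest separately."""
    digest = md5_hex(contents)
    with open(path, "wb") as f:
        f.write(contents)
    with open(sig_path, "w") as f:
        f.write(digest)
    return digest


def validate_digest(path: str, sig_path: str) -> Tuple[bool, str, str]:
    """Hypothetical reader: recompute the digest and compare with the stored one."""
    with open(sig_path) as f:
        expected = f.read().strip()
    with open(path, "rb") as f:
        calculated = md5_hex(f.read())
    return expected == calculated, expected, calculated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cib = os.path.join(d, "cib.xml")
        sig = os.path.join(d, "cib.xml.sig")
        write_with_digest(cib, sig, b"<cib>\n  <configuration/>\n</cib>\n")
        print(validate_digest(cib, sig)[0])  # True
        # Simulated fault: the file body changes after its digest was recorded.
        with open(cib, "wb") as f:
            f.write(b"<cib><configuration/></cib>\n")
        print(validate_digest(cib, sig)[0])  # False
```

In the simulated fault, "expected" comes from the signature file and "calculated" from the file body, mirroring the two values in the ERROR line.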

I do notice, though, that the location constraint is recorded in the
CIB unformatted (indicating something is amiss):

    <rsc_location id="cli-standby-nginx-group" rsc="nginx-group"><rule id="cli-standby-rule-nginx-group" score="-INFINITY" boolean-op="and"><expression id="cli-standby-expr-nginx-group" attribute="#uname" operation="eq" value="vhost0330" type="string"/></rule></rsc_location></constraints>

and the addition of that constraint was also the change that triggered
the behavior.
It also looks related to the link lge posted. Can you please verify
whether your systems are affected by that bug?
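For readers unsure what the constraint quoted above actually does: its single rule applies a score of -INFINITY when the node's `#uname` equals `vhost0330`, which bans `nginx-group` from that node. The Python sketch below evaluates that rule for a given node name. It is illustrative only: `rule_score_for_node` is a hypothetical helper that handles just this shape of rule (`#uname eq` expressions combined with `and`/`or`), not Pacemaker's rule engine, and the trailing `</constraints>` from the excerpt is omitted because its opening tag is not part of the quote.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The rsc_location element quoted above, without the trailing </constraints>.
SNIPPET = (
    '<rsc_location id="cli-standby-nginx-group" rsc="nginx-group">'
    '<rule id="cli-standby-rule-nginx-group" score="-INFINITY" boolean-op="and">'
    '<expression id="cli-standby-expr-nginx-group" attribute="#uname" '
    'operation="eq" value="vhost0330" type="string"/>'
    "</rule></rsc_location>"
)


def rule_score_for_node(xml_text: str, node: str) -> Optional[str]:
    """Return the rule's score if it matches `node`, else None (sketch only)."""
    loc = ET.fromstring(xml_text)
    rule = loc.find("rule")
    matches = []
    for expr in rule.findall("expression"):
        # Only '#uname eq <value>' expressions are understood by this sketch.
        if expr.get("attribute") != "#uname" or expr.get("operation") != "eq":
            raise ValueError("sketch only handles '#uname eq' expressions")
        matches.append(expr.get("value") == node)
    if rule.get("boolean-op", "and") == "and":
        matched = all(matches)
    else:
        matched = any(matches)
    return rule.get("score") if matched else None


if __name__ == "__main__":
    print(rule_score_for_node(SNIPPET, "vhost0330"))  # -INFINITY
    print(rule_score_for_node(SNIPPET, "vhost0384"))  # None
```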

How did you load it, btw? There's no record of it in the logs.
This is why we prefer hb_reports containing the info from both machines.