[Pacemaker] Node in pending state, resources duplicated and data corruption
Gabriel Gomiz
ggomiz at cooperativaobrera.coop
Wed Mar 19 10:56:38 UTC 2014
On 03/18/2014 09:04 PM, Andrew Beekhof wrote:
> Riiiight, so this is the story:
>
> Mar 08 08:43:22 [9934] lorien crmd: info: do_dc_takeover: Taking over DC status for this partition
> Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Peer gandalf was terminated (st_notify_fence) by mordor for gandalf: OK (ref=10d27664-33ed-43e0-a5bd-7d0ef850eb05) by client crmd.31561
> Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Notified CMAN that 'gandalf' is now fenced
> Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Target may have been our leader gandalf (recorded: <unset>)
> Mar 08 09:13:52 [9934] lorien crmd: info: do_dc_takeover: Taking over DC status for this partition
> Mar 08 09:13:52 [9934] lorien crmd: notice: do_dc_takeover: Marking gandalf, target of a previous stonith action, as clean
>
> In tengine_stonith_notify() we potentially add things to stonith_cleanup_list and then in do_dc_takeover() we check the stonith_cleanup_list and mark any nodes in it as clean.
>
> As you can see above, the stonith notification comes just after the call to do_dc_takeover().
> In the version you have there is some dodgy code in tengine_stonith_notify() which incorrectly adds gandalf to stonith_cleanup_list, causing Pacemaker to (incorrectly) erase gandalf's status section at 09:13:52 when another election occurs.
>
> This was fixed during the RC-phase of Pacemaker-1.1.10:
>
> https://github.com/beekhof/pacemaker/commit/f30e1e43
>
> I don't believe I quite understood the severity of that fix at the time (otherwise I'd have made more noise about it).
>
> Since you're on CentOS 6.4, there should already be updated packages that include this fix.
Andrew: thanks again for taking the time to check this case. We will be updating to 1.1.10 as soon as possible. Hugs!
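
For readers of the archive, the ordering problem Andrew describes can be replayed with a toy model. This is only a sketch under stated assumptions: the method names are borrowed from the log lines, but the class, the `queue_target` flag and the `replay` helper are hypothetical and do not reflect Pacemaker's actual C code or the contents of the fix. It also assumes gandalf had rejoined and had a live status section by the time of the second election.

```python
class ToyCrmd:
    """Toy stand-in for the crmd on lorien; not Pacemaker's real data structures."""

    def __init__(self):
        self.stonith_cleanup_list = []                  # fenced nodes awaiting cleanup
        self.status = {"gandalf": "resources running"}  # per-node status section

    def do_dc_takeover(self):
        # Mark every queued node as clean by erasing its status section.
        erased = [n for n in self.stonith_cleanup_list if n in self.status]
        for node in erased:
            del self.status[node]
        self.stonith_cleanup_list.clear()
        return erased

    def tengine_stonith_notify(self, target, queue_target):
        # queue_target=True models the pre-1.1.10 behaviour described in the thread
        # (hypothetical flag, used here only to compare the two outcomes).
        if queue_target:
            self.stonith_cleanup_list.append(target)


def replay(queue_target):
    crmd = ToyCrmd()
    first = crmd.do_dc_takeover()                           # 08:43:22 takeover, list still empty
    crmd.tengine_stonith_notify("gandalf", queue_target)    # notification arrives just after
    crmd.status["gandalf"] = "rejoined, resources running"  # assumed: gandalf is back and active
    second = crmd.do_dc_takeover()                          # 09:13:52 next election
    return first, second, crmd.status


if __name__ == "__main__":
    print(replay(queue_target=True))
    print(replay(queue_target=False))
```

With `queue_target=True` the first takeover erases nothing, and the second one wipes the status of a node that is alive again, which matches the 30-minute gap between the two do_dc_takeover log lines above.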