[Pacemaker] [Partially SOLVED] pacemaker/dlm problems

Vladislav Bogdanov bubble at hoster-ok.com
Mon Nov 14 15:36:19 EST 2011


Hi Andrew,

I just found another problem with dlm_controld.pcmk (with your latest
patch from github applied and also my fixes to actually build it - they
are included in a message referenced by this one).
A node that has just requested fencing of another one gets stuck printing,
every second, the message where you print ctime() in fence_node_time()
(pacemaker.c near line 293). No other messages appear, although
fence_node_time() is called only from check_fencing_done() (cpg.c near
444). So both (last_fenced_time >= node->fail_time) and
(!node->fence_queries || node->fence_time != last_fenced_time) must be
false; otherwise one of the messages for those cases would be shown. Then,
fence_node_time() seems to return 0 from
if (wait_count)
	return 0;
(wait_count is incremented when (last_fenced_time >= node->fail_time) is
false), so it never reaches the check_fencing_done() call and never returns
the expected 1.
The offending node was actually fenced, but dlm_controld never handled
that.
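
The flow described above can be sketched roughly as follows. This is a
minimal, simplified reconstruction based only on the two conditions quoted
in this message, not a copy of the dlm sources: the struct layout, the log
strings, the stubbed fence_node_time(), and the placement of the
wait_count check inside check_fencing_done() are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the per-node state. */
struct node {
	int      nodeid;
	uint64_t fail_time;     /* when the node was seen to fail */
	uint64_t fence_time;    /* last_fenced_time seen on the previous query */
	int      fence_queries; /* number of "wait" messages already logged */
	int      check_fencing; /* non-zero while fencing is still awaited */
};

/* Stub answer; the real function would ask the fencing layer instead. */
uint64_t stub_last_fenced_time;

int fence_node_time(int nodeid, uint64_t *last_fenced_time)
{
	(void)nodeid;
	*last_fenced_time = stub_last_fenced_time;
	return 0;
}

/* Returns 1 once every awaited node is known to be fenced, else 0. */
int check_fencing_done(struct node *nodes, int count)
{
	int i, wait_count = 0;

	assert(nodes != NULL || count == 0);

	for (i = 0; i < count; i++) {
		struct node *node = &nodes[i];
		uint64_t last_fenced_time = 0;

		if (!node->check_fencing)
			continue;

		fence_node_time(node->nodeid, &last_fenced_time);

		if (last_fenced_time >= node->fail_time) {
			/* Fencing happened after the failure: done. */
			printf("check_fencing %d done\n", node->nodeid);
			node->check_fencing = 0;
			node->fence_queries = 0;
		} else {
			/* Log only on the first query or when the answer changes. */
			if (!node->fence_queries ||
			    node->fence_time != last_fenced_time) {
				printf("check_fencing %d wait\n", node->nodeid);
				node->fence_queries++;
				node->fence_time = last_fenced_time;
			}
			wait_count++;
		}
	}

	if (wait_count)
		return 0;

	return 1;
}
```

In this sketch, with fail_time = 100 and a stub answer of 50, the first
call logs one "wait" message and returns 0. Every later call with the same
answer returns 0 without logging anything, which matches the symptom of no
further messages. It returns 1 only once the reported fence time reaches
fail_time.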

May I ask you to help me a bit with all that logic (since you have already
dived into the dlm_controld sources again)? I seem to be so close to
success... :|

BTW, I can't find which source your dlm repo was forked from; maybe you
remember?

Best,
Vladislav

28.09.2011 17:41, Vladislav Bogdanov wrote:
> Hi Andrew,
> 
>>> All the more reason to start using the stonith api directly.
>>> I was playing around last night with the dlm_controld.pcmk code:
>>>    https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787
>>
>> Doesn't seem to apply to 3.0.17, so I rebased that commit against it for
>> my build. Then it doesn't compile without the attached patch.
>> It may need to be rebased a bit against your tree.
>>
>> Now I have the package built and am building node images. Will try shortly.
> 
> Fencing from within dlm_controld.pcmk still did not work with your first
> patch against that _no_mainloop function (expected).
> 
> So I did my best to build packages from the current git tree.
> 
> Voila! I got the failed node correctly fenced!
> I'll do some more extensive testing over the next few days, but I believe
> everything should be much better now.
> 
> I knew you're a genius he-he ;)
> 
> So, here are the steps to get DLM to handle CPG NODEDOWN events correctly
> with pacemaker using the openais stack:
> 
> 1. Build pacemaker (as of 2011-09-28) from git.
> 2. Apply attached patches to cluster-3.0.17 source tree.
> 3. Build dlm_controld.pcmk
> 
> One note - gfs2_controld probably needs to be fixed too (FIXME).
> 
> Best regards,
> Vladislav
> 
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




