[Pacemaker] booth is the state of "started" on pacemaker before booth write ticket info in cib.
Yuichi SEINO
seino.cluster2 at gmail.com
Wed Jan 23 04:43:58 UTC 2013
Hi Jiaju,
I understand the complete solution.
However, because this issue causes a critical problem (multiple
resources starting), could you apply this pull request, or simply
revert the commit, as a temporary fix until the issue is properly
resolved in the summer? Unlike the booth deadlock, I think this issue
is difficult to avoid operationally. The deadlock can be avoided as
long as the booth daemons are not started at the same time.
This issue causes the following problems.
* Multiple resources start.
* When booth deadlocks, the resource timeout does not happen.
Previously, we could see the timeout in crm_mon. Currently, the
timeout happens after booth has become a daemon.
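
For reference, the idea quoted below (letting lockfile() record the
daemon's startup status as well as its pid) could look roughly like the
following. This is only a minimal sketch under my own assumptions, not
booth's actual code: write_status(), read_status(), the enum and the
"<pid> <state>" file format are hypothetical names chosen for
illustration, and the file locking that a real lockfile() needs is
left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical startup states; not part of booth. */
enum daemon_state { DAEMON_STARTING, DAEMON_STARTED };

static const char *state_name(enum daemon_state s)
{
    return s == DAEMON_STARTED ? "started" : "starting";
}

/* Write "<pid> <state>\n" to path, replacing any previous contents.
 * Returns 0 on success, -1 on failure. */
int write_status(const char *path, pid_t pid, enum daemon_state s)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%ld %s\n", (long)pid, state_name(s)) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the state back, as a resource agent helper might do.
 * Returns 1 for "started", 0 for "starting",
 * and -1 if the file is missing or malformed. */
int read_status(const char *path)
{
    FILE *f;
    long pid;
    char state[16];
    int n;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    n = fscanf(f, "%ld %15s", &pid, state);
    fclose(f);
    if (n != 2)
        return -1;
    if (strcmp(state, "started") == 0)
        return 1;
    if (strcmp(state, "starting") == 0)
        return 0;
    return -1;
}
```

With something like this, the daemon would write "starting" right after
daemonizing and switch to "started" only once the ticket information
has been written to the CIB, so the RA would not report booth as
started too early.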
Sincerely,
Yuichi
2013/1/21 Jiaju Zhang <jjzhang at suse.de>:
> Hi Yuichi,
>
> On Fri, 2013-01-18 at 17:02 +0900, Yuichi SEINO wrote:
>> Hi Jiaju,
>>
>> I tried fixing this issue by reverting a commit. What do you think about it?
>> https://github.com/jjzhang/booth/pull/48
>
> Moving the whole setup stage before daemonizing does not seem to be a
> sane solution. setup_ticket() needs to get the latest ticket
> information by communicating with other nodes. Currently it is there
> and uses TCP, but the long-term, sane solution would be to move it into
> the main poll(), waiting asynchronously for the catch-up result. Until
> the catch-up is complete, booth can still respond; it can participate
> in Paxos as a non-voting member.
>
> To fix this issue, what do you think about removing the stale ticket
> information from the CIB when booth starts? We already have APIs in
> pacemaker.c that can clear the ticket information in the CIB. This
> step is reasonable because the tickets at that moment are really stale
> data.
>
> About the implementation, I have not thought it through in detail, but
> one idea that came to mind is that we could extend lockfile() (or
> some wrapper around lockfile()) to do more: record not only the daemon
> pid but also the daemon's startup status, such as "starting" or
> "started", so that the controld RA can read that status and return a
> more precise result.
>
> I'll have Xia look into this problem in more detail.
>
> Thanks,
> Jiaju
>
>
--
Yuichi SEINO
METROSYSTEMS CORPORATION
E-mail:seino.cluster2 at gmail.com