[Pacemaker] crm_resource -L not trustable right after restart
Andrew Beekhof
andrew at beekhof.net
Thu Jan 16 03:49:26 UTC 2014
On 16 Jan 2014, at 1:13 pm, Brian J. Murrell (brian) <brian at interlinx.bc.ca> wrote:
> On Thu, 2014-01-16 at 08:35 +1100, Andrew Beekhof wrote:
>>
>> I know, I was giving you another example of when the cib is not completely up-to-date with reality.
>
> Yeah, I understood that. I was just countering with why that example is
> actually more acceptable.
>
>> It may very well be partially started.
>
> Sure.
>
>> It's almost certainly not stopped, which is what is being reported.
>
> Right. But until it is completely started (and ready to do whatever
> it's supposed to do), it might as well be considered stopped. If you
> have to make a binary state out of stopped, starting, started, I think
> most people will agree that the states are stopped and starting and
> stopped is anything < starting since most things are not useful until
> they are fully started.
>
>> You're not using the output to decide whether to perform some logic?
>
> Nope. Just reporting the state. But that's difficult when you have two
> participants making positive assertions about state when one is not
> really in a position to do so.
>
>> Because crm_mon is the more usual command to run right after startup
>
> The problem with crm_mon is that it doesn't tell you where a resource is
> running.
What crm_mon are you looking at?
I see stuff like:
 virt-fencing (stonith:fence_xvm): Started rhos4-node3
 Resource Group: mysql-group
     mysql-vip (ocf::heartbeat:IPaddr2): Started rhos4-node3
     mysql-fs (ocf::heartbeat:Filesystem): Started rhos4-node3
     mysql-db (ocf::heartbeat:mysql): Started rhos4-node3
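If the goal is just to report where each resource is running, output in that shape can be reduced to resource/node pairs. A minimal sketch, assuming the line layout matches the sample above (other Pacemaker versions may print it differently); `resource_locations` is a hypothetical helper name, not part of Pacemaker:

```shell
# Print "<resource> <node>" for each started resource found in
# crm_mon-style text on stdin. Assumes lines shaped like
#   name (class:provider:type): Started node
# as in the sample above; anything else is ignored.
resource_locations() {
    awk 'NF >= 4 && $(NF-1) == "Started" && $2 ~ /^\(.*\):$/ { print $1, $NF }'
}
```

Something like `crm_mon -1 | resource_locations` would feed it a one-shot snapshot.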
>
>> (which would give you enough context to know things are still syncing).
>
> That's interesting. Would polling crm_mon be more efficient than
> polling the remote CIB with cibadmin -Q?
crm_mon in interactive mode subscribes to updates from the cib, which is more efficient than repeatedly calling cibadmin or crm_mon.
>
>> DC election happens at the crmd.
>
> So would it be fair to say then that I should not trust the local CIB
> until DC election has finished or could there be latency between that
> completing and the CIB being refreshed?
Once the join completes (which happens after the election, or when a new node is found), it is safe.
You can tell this by running crmadmin -S -H `uname -n` and looking for S_IDLE, S_POLICY_ENGINE or S_TRANSITION_ENGINE, iirc.
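A rough sketch of how a script could wait for one of those states before trusting the local cib. The state names come from the list above; the helper names are hypothetical. Since the command line above is quoted from memory, the status command is passed in as arguments rather than hard-coded:

```shell
# Succeeds if the status text on stdin mentions one of the crmd states
# listed above as safe (S_IDLE, S_POLICY_ENGINE, S_TRANSITION_ENGINE).
crmd_state_is_settled() {
    grep -Eq 'S_(IDLE|POLICY_ENGINE|TRANSITION_ENGINE)'
}

# Run the given status command up to <tries> times, one second apart,
# and return 0 as soon as its output names a settled state.
# Usage: wait_until_settled <tries> <command> [args...]
wait_until_settled() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@" 2>&1 | crmd_state_is_settled; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For example, `wait_until_settled 30 crmadmin -S "$(uname -n)"` (the exact crmadmin flags are an assumption here, check `crmadmin --help` on your version).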
>
> If DC election completion is accurate, what's the best way to determine
> that has completed?
Ideally it doesn't happen when a node joins an existing cluster.