[Pacemaker] crm_resource -L not trustable right after restart

Thu Jan 16 03:49:26 UTC 2014

On 16 Jan 2014, at 1:13 pm, Brian J. Murrell (brian) <brian at interlinx.bc.ca> wrote:

> On Thu, 2014-01-16 at 08:35 +1100, Andrew Beekhof wrote:
>> 
>> I know, I was giving you another example of when the cib is not completely up-to-date with reality.
> 
> Yeah, I understood that.  I was just countering with why that example is
> actually more acceptable.
> 
>> It may very well be partially started.
> 
> Sure.
> 
>> It's almost certainly not stopped which is what is being reported.
> 
> Right.  But until it is completely started (and ready to do whatever
> it's supposed to do), it might as well be considered stopped.  If you
> have to make a binary state out of stopped, starting, started, I think
> most people will agree that the states are stopped and started, and
> stopped is anything < started since most things are not useful until
> they are fully started.
> 
>> You're not using the output to decide whether to perform some logic?
> 
> Nope.  Just reporting the state.  But that's difficult when you have two
> participants making positive assertions about state when one is not
> really in a position to do so.
> 
>> Because crm_mon is the more usual command to run right after startup
> 
> The problem with crm_mon is that it doesn't tell you where a resource is
> running.

What crm_mon are you looking at?
I see stuff like:

 virt-fencing	(stonith:fence_xvm):	Started rhos4-node3 
 Resource Group: mysql-group
     mysql-vip	(ocf::heartbeat:IPaddr2):	Started rhos4-node3 
     mysql-fs	(ocf::heartbeat:Filesystem):	Started rhos4-node3 
     mysql-db	(ocf::heartbeat:mysql):	Started rhos4-node3 
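
A minimal sketch, in Python, of pulling node placement out of output like the above. It assumes the one-line-per-resource layout shown here (name, agent in parentheses, state, then node); other crm_mon versions may print it differently, and resource_locations is a hypothetical helper name, not part of Pacemaker.

```python
import re

# Sample copied from the crm_mon output quoted in this message.
SAMPLE = (
    " virt-fencing\t(stonith:fence_xvm):\tStarted rhos4-node3 \n"
    " Resource Group: mysql-group\n"
    "     mysql-vip\t(ocf::heartbeat:IPaddr2):\tStarted rhos4-node3 \n"
    "     mysql-fs\t(ocf::heartbeat:Filesystem):\tStarted rhos4-node3 \n"
    "     mysql-db\t(ocf::heartbeat:mysql):\tStarted rhos4-node3 \n"
)

# Assumed line shape: <name> (<agent>): <state> [<node>]
RESOURCE_LINE = re.compile(r"^\s*(\S+)\s+\(([^)]+)\):\s+(\S+)(?:\s+(\S+))?\s*$")


def resource_locations(text):
    """Map resource name -> (state, node); node is None when none is listed."""
    locations = {}
    for line in text.splitlines():
        match = RESOURCE_LINE.match(line)
        if match:  # group headers and other lines do not match and are skipped
            name, _agent, state, node = match.groups()
            locations[name] = (state, node)
    return locations


if __name__ == "__main__":
    for name, (state, node) in resource_locations(SAMPLE).items():
        print(name, state, node)
```

The same function could be fed the text of a one-shot crm_mon -1 run, with the caveat that it only reflects what the local cib believed at that moment.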

> 
>> (which would give you enough context to know things are still syncing).
> 
> That's interesting.  Would polling crm_mon be more efficient than
> polling the remote CIB with cibadmin -Q?

crm_mon in interactive mode subscribes to updates from the cib,
which would be more efficient than repeatedly calling cibadmin or crm_mon.

> 
>> DC election happens at the crmd.
> 
> So would it be fair to say then that I should not trust the local CIB
> until DC election has finished or could there be latency between that
> completing and the CIB being refreshed?

Once the join completes (which happens after the election or when a new node is found), it is safe to trust the local CIB.
You can tell by running crmadmin -S -H `uname -n` and looking for S_IDLE, S_POLICY_ENGINE or S_TRANSITION_ENGINE, IIRC.
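
A rough sketch of that check in Python, for anyone who wants to wait on it from a script. The three state names are the ones listed above; everything else is an assumption: the helper names are hypothetical, the output is assumed to contain the state as a bare S_* token, and the crmadmin argument list is left to the caller so that whatever invocation works on your version can be passed in.

```python
import re
import subprocess
import time

# States named in this thread as meaning the join has completed.
SETTLED_STATES = {"S_IDLE", "S_POLICY_ENGINE", "S_TRANSITION_ENGINE"}


def crmd_state(output):
    """Return the first S_* token in the command output, or None."""
    match = re.search(r"\bS_[A-Z_]+\b", output)
    return match.group(0) if match else None


def join_complete(output):
    """True if the reported state is one of the settled states."""
    return crmd_state(output) in SETTLED_STATES


def wait_for_join(argv, timeout=60.0, interval=2.0, run=subprocess.run):
    """Re-run argv until a settled state is reported or timeout expires.

    argv is the caller-supplied crmadmin command line (an assumption here,
    not verified against any particular Pacemaker version).
    """
    deadline = time.monotonic() + timeout
    while True:
        result = run(argv, capture_output=True, text=True)
        # Check both streams, since which one carries the status is not assumed.
        if join_complete(result.stdout + result.stderr):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Only once wait_for_join returns True would it make sense to go on and read resource state from the local CIB; a False return just means no settled state was seen before the timeout.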

> 
> If DC election completion is accurate, what's the best way to determine
> that has completed?

Ideally an election doesn't happen when a node joins an existing cluster.
