[Pacemaker] master-slave set staggered restarts

Jay Janssen jay.janssen at percona.com
Thu Mar 6 15:39:53 UTC 2014


primitive p_service ... \
        op monitor interval="2s" role="Master" \
        op monitor interval="5s" role="Slave" \
        op start timeout="10000s" interval="0"
ms ms_service p_service \
        meta master-max="3" clone-max="3" target-role="Started" is-managed="true" ordered="false" interleave="true" notify="false"

In my case I have all three nodes happily in the ‘master’ state, and as a test I simultaneously cause the underlying service to fail on all of them. In all cases, the next monitor operation returns OCF_FAILED_MASTER. Subsequent monitor checks then return OCF_NOT_RUNNING, since the node falls out of the Master state.
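
As a rough sketch of the monitor behavior described above (this is not the actual agent; the helper functions and the SERVICE_STATE / NODE_ROLE variables are hypothetical stand-ins for whatever checks the real agent performs), the return-code logic might look something like this. The numeric values are the standard OCF return codes, where the not-running code is named OCF_NOT_RUNNING:

```shell
#!/bin/sh
# Standard OCF return codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8
OCF_FAILED_MASTER=9

# Hypothetical helpers: a real agent would probe the service itself.
service_is_running() { [ "${SERVICE_STATE:-up}" = "up" ]; }
node_is_master()     { [ "${NODE_ROLE:-Master}" = "Master" ]; }

# Monitor action: report the state of the local instance.
p_service_monitor() {
    if service_is_running; then
        if node_is_master; then
            return $OCF_RUNNING_MASTER
        fi
        return $OCF_SUCCESS
    fi
    # Service is down: a failed master is reported differently
    # from an instance that is simply not running.
    if node_is_master; then
        return $OCF_FAILED_MASTER
    fi
    return $OCF_NOT_RUNNING
}
```

With the service down, the first call on a master yields OCF_FAILED_MASTER (9); once the instance is no longer considered a master, later calls yield OCF_NOT_RUNNING (7).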

I want all the resource clones to issue the start operation more or less at the same time (hence ordered="false") so I can use the start operation to coordinate amongst the nodes as they start (true start order is important and depends on node state, so I’m overloading the start action for this coordination in the all-down state). For the most part, this is working fine.

However, I seem to be hitting a race condition where a node (seemingly the one that happens to detect OCF_FAILED_MASTER last) ends up NOT starting until AFTER the others start. Two nodes get ‘start’ issued at the same time like I want, but the 3rd gets stuck still monitoring (and returning errors) until after the others start. Once the others are up, the 3rd finally unblocks and starts.

Is this expected behavior in Pacemaker? Is there any way I can suppress this behavior and get nodes to start irrespective of long-running start operations on other nodes?


FWIW, if someone can explain to me how I can enforce start action order in a group like this in a more Pacemaker-friendly way, I’m all ears.



Jay Janssen
http://about.me/jay.janssen

