[Pacemaker] stopped resource was judged to be active

Andrew Beekhof andrew at beekhof.net
Mon Feb 17 20:43:03 EST 2014


On 10 Feb 2014, at 5:28 pm, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:

> Hi,
> 
> Pacemaker stopped cleanly, but the surviving node judged that a
> resource was still active. I have put the crm_report here:
> https://drive.google.com/file/d/0B9eNn1AWfKD4S29JWk1ldUJJNGs/edit?usp=sharing
> 
> [Steps to reproduce]
> 1) start up the cluster
> 
> Stack: corosync
> Current DC: bl460g1n7 (3232261593) - partition with quorum
> Version: 1.1.10-21de3a0
> 2 Nodes configured
> 34 Resources configured
> 
> 
> Online: [ bl460g1n6 bl460g1n7 ]
> 
> Full list of resources:
> ...snip...
> 
> 
> * election-attrd is on bl460g1n7.
> Feb  4 14:06:38 bl460g1n7 attrd[28811]:     info: election_complete:
> Election election-attrd complete
> 
> 
> 2) move election-attrd off the DC node
> I suppose the precondition is that the DC and election-attrd are on
> different nodes.
> 
> [bl460g1n7]$ pkill -9 attrd
> Feb  4 14:07:15 bl460g1n6 attrd[16927]:     info: election_complete:
> Election election-attrd complete
> 
> 
> 3) stop DC ( after making a resource fail )
> [bl460g1n7]$ stop pacemaker.combined
> Feb  4 14:09:39 bl460g1n7 crmd[28813]:   notice: process_lrm_event:
> LRM operation prmClone9_stop_0 (call=150, rc=0, cib-update=98,
> confirmed=true) ok

There are cases where 1.1.11 and earlier could lose resource updates like this.
The subsequent behaviour by pacemaker (fencing the node) is correct, but clearly suboptimal.

Happily, the same code that improves the CIB's performance also makes this impossible.
So you should find this problem gone if you try the current git master.

> :
> Feb  4 14:09:39 bl460g1n7 pacemakerd[28803]:     info: main: Exiting pacemakerd
> Feb  4 14:09:39 bl460g1n7 pacemakerd[28803]:     info:
> crm_xml_cleanup: Cleaning up memory from libxml2
> 
> * Pacemaker on bl460g1n7 stopped normally, but bl460g1n6 judged that a
>  resource was still active there.
> Feb  4 14:09:41 bl460g1n6 pengine[16928]:  warning: pe_fence_node:
> Node bl460g1n7 will be fenced because prmClone9:0 is thought to be
> active there
> 
> 
> Best regards,
> Kazunori INOUE
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
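
The symptom in the quoted logs (a confirmed prmClone9_stop_0 with rc=0 on
bl460g1n7, followed by bl460g1n6 deciding to fence that node because the
resource "is thought to be active there") can be looked for mechanically.
The following is only a minimal sketch: it assumes log lines shaped like the
excerpts quoted above, with each wrapped entry rejoined onto a single line.
The function find_lost_stops and both regular expressions are hypothetical
helpers written for illustration; they are not part of Pacemaker.

```python
import re

# Patterns modelled on the log excerpts quoted in this thread.
# The exact log format is an assumption and may differ between versions.
STOP_OK = re.compile(
    r"(?P<node>\S+) crmd\[\d+\]:\s+notice: process_lrm_event:\s+"
    r"LRM operation (?P<rsc>\S+)_stop_0 \(.*?rc=0,.*?confirmed=true\) ok"
)
FENCE = re.compile(
    r"pengine\[\d+\]:\s+warning: pe_fence_node:\s+"
    r"Node (?P<node>\S+) will be fenced because "
    r"(?P<rsc>[^\s:]+)(?::\d+)? is thought to be\s+active there"
)


def find_lost_stops(lines):
    """Return (node, resource) pairs that were fenced as 'active'
    after a successful stop of that resource was logged on that node."""
    stopped = set()      # (node, resource) pairs with a confirmed stop
    suspicious = []      # fencing decisions that contradict a logged stop
    for line in lines:
        m = STOP_OK.search(line)
        if m:
            stopped.add((m.group("node"), m.group("rsc")))
            continue
        m = FENCE.search(line)
        if m and (m.group("node"), m.group("rsc")) in stopped:
            suspicious.append((m.group("node"), m.group("rsc")))
    return suspicious


if __name__ == "__main__":
    # The two entries from the report, rejoined onto one line each.
    sample = [
        "Feb  4 14:09:39 bl460g1n7 crmd[28813]:   notice: process_lrm_event: "
        "LRM operation prmClone9_stop_0 (call=150, rc=0, cib-update=98, "
        "confirmed=true) ok",
        "Feb  4 14:09:41 bl460g1n6 pengine[16928]:  warning: pe_fence_node: "
        "Node bl460g1n7 will be fenced because prmClone9:0 is thought to be "
        "active there",
    ]
    print(find_lost_stops(sample))
```

The clone instance suffix (":0") is dropped before comparing, since the stop
is logged under the bare resource name. A match only says that the two log
entries disagree; it does not by itself show where the update was lost.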
