[Pacemaker] Cluster node getting stopped from other node(resending mail)

Tue Aug 4 01:28:04 UTC 2015

We need a crm_report archive to be able to comment on this sort of thing.
A handful of logs from one of the nodes isn’t anywhere near enough.
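
For reference, here is a minimal sketch of how such an archive could be requested for the incident quoted below. The one-hour window around 13:55 on 2015-06-01 and the destination path are assumptions chosen for illustration, not values from the original report. The `-f`/`-t` options are crm_report's from/to time bounds. The helper only builds and prints the command, so it can be reviewed before being run on a cluster node.

```python
import shlex


def crm_report_command(start, end, dest):
    """Return the argv for a crm_report run covering the window [start, end].

    start/end use the "YYYY-MM-DD HH:MM:SS" form; dest is the report's
    destination basename (hypothetical here).
    """
    # -f / -t bound the time window; the final argument is the destination.
    return ["crm_report", "-f", start, "-t", end, dest]


# Assumed window: roughly half an hour either side of the 13:55:46 log
# entries, in the node's local time. Adjust to the real incident.
cmd = crm_report_command(
    "2015-06-01 13:30:00",
    "2015-06-01 14:30:00",
    "/tmp/ronaldo-shutdown",  # hypothetical destination
)

# Print a copy-and-paste-safe shell command line.
print(shlex.join(cmd))
```

Run the printed command as root on one of the cluster nodes and attach the resulting archive to the reply.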

> On 29 Jun 2015, at 4:42 pm, Arjun Pandey <apandepublic at gmail.com> wrote:
> 
> 
> Hi 
> 
> I am running a 2-node cluster with this config on CentOS 6.5/6.6:
> 
> Master/Slave Set: foo-master [foo]
> Masters: [ messi ]
> Stopped: [ ronaldo ]
>  eth1-CP        (ocf::pw:IPaddr):       Started messi
>  eth2-UP        (ocf::pw:IPaddr):       Started messi
>  eth3-UPCP      (ocf::pw:IPaddr):       Started messi
> 
> Here foo is a multi-state resource running in master/slave mode, and the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
> colocation constraint so that the IP addresses are colocated with the master.
> 
> Sometimes when I set up the cluster, I find that one of the nodes (the second node to join) gets stopped, and I see this log:
> 
> 2015-06-01T13:55:46.153941+05:30 ronaldo pacemaker: Starting Pacemaker Cluster Manager
> 2015-06-01T13:55:46.233639+05:30 ronaldo attrd[25988]:   notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
> 2015-06-01T13:55:46.234162+05:30 ronaldo crmd[25990]:   notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
> 2015-06-01T13:55:46.234701+05:30 ronaldo attrd[25988]:   notice: attrd_local_callback: Sending full refresh (origin=crmd)
> 2015-06-01T13:55:46.234708+05:30 ronaldo attrd[25988]:   notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
> 2015-06-01T13:55:46.254310+05:30 ronaldo crmd[25990]:    error: handle_request: We didn't ask to be shut down, yet our DC is telling us too.
> 2015-06-01T13:55:46.254577+05:30 ronaldo crmd[25990]:   notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
> 2015-06-01T13:55:46.255134+05:30 ronaldo crmd[25990]:   notice: lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown... waiting (2 ops remaining)
> 
> Based on the logs, Pacemaker on the active node was stopping the secondary node every time it joined the cluster. This issue seems similar to
> http://pacemaker.oss.clusterlabs.narkive.com/rVvN8May/node-sends-shutdown-request-to-other-node-error
> 
> Packages used:
> pacemaker-1.1.12-4.el6.x86_64
> pacemaker-libs-1.1.12-4.el6.x86_64
> pacemaker-cli-1.1.12-4.el6.x86_64
> pacemaker-cluster-libs-1.1.12-4.el6.x86_64
> pacemaker-debuginfo-1.1.10-14.el6.x86_64
> pcsc-lite-libs-1.5.2-13.el6_4.x86_64
> pcs-0.9.90-2.el6.centos.2.noarch
> pcsc-lite-1.5.2-13.el6_4.x86_64
> pcsc-lite-openct-0.6.19-4.el6.x86_64
> corosync-1.4.1-17.el6.x86_64
> corosynclib-1.4.1-17.el6.x86_64
> 
> 
> 
> Thanks in advance for your help
> 
> Regards
> Arjun
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org