[ClusterLabs] Floating IP failing over but not failing back with active/active LDAP (dirsrv)
Ken Gaillot
kgaillot at redhat.com
Thu Mar 10 15:00:58 UTC 2016
On 03/10/2016 08:48 AM, Bernie Jones wrote:
> A bit more info..
>
>
>
> If, after I restart the failed dirsrv instance, I then perform a "pcs
> resource cleanup dirsrv-daemon" to clear the FAIL messages, then the
> failover will work OK.
>
> So it's as if the cleanup is changing the status in some way..
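That would fit with migration-threshold=1: a single monitor failure
reaches the threshold and bans the resource from that node until the
failure is cleaned up (or a failure-timeout expires), so a stale failure
can block failback. The recorded failures can be checked with, for
example:

    pcs resource failcount show dirsrv-daemon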
>
>
>
> From: Bernie Jones [mailto:bernie at securityconsulting.ltd.uk]
> Sent: 10 March 2016 08:47
> To: 'Cluster Labs - All topics related to open-source clustering welcomed'
> Subject: [ClusterLabs] Floating IP failing over but not failing back with
> active/active LDAP (dirsrv)
>
>
>
> Hi all, could you advise please?
>
>
>
> I'm trying to configure a floating IP with an active/active deployment of
> 389 directory server. I don't want pacemaker to manage LDAP but just to
> monitor and switch the IP as required to provide resilience. I've seen some
> other similar threads and based my solution on those.
>
>
>
> I've amended the slapd OCF resource agent to work with 389 DS, and this
> tests out OK (dirsrv).
>
>
>
> I've then created my resources as below:
>
>
>
> pcs resource create dirsrv-ip ocf:heartbeat:IPaddr2 ip="192.168.26.100"
> cidr_netmask="32" op monitor timeout="20s" interval="5s" op start
> interval="0" timeout="20" op stop interval="0" timeout="20"
>
> pcs resource create dirsrv-daemon ocf:heartbeat:dirsrv op monitor
> interval="10" timeout="5" op start interval="0" timeout="5" op stop
> interval="0" timeout="5" meta "is-managed=false"
is-managed=false means the cluster will not try to start or stop the
service. It should never be used in regular production, only when doing
maintenance on the service.
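If it was set only for maintenance, the service can be handed back to
the cluster afterward with, for example:

    pcs resource manage dirsrv-daemon

(and pcs resource unmanage dirsrv-daemon to take it out of cluster
management again).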
> pcs resource clone dirsrv-daemon meta globally-unique="false"
> interleave="true" target-role="Started" "master-max=2"
>
> pcs constraint colocation add dirsrv-daemon-clone with dirsrv-ip
> score=INFINITY
This constraint means that dirsrv is only allowed to run where dirsrv-ip
is. I suspect you want the reverse, dirsrv-ip with dirsrv-daemon-clone,
which means keep the IP with a working dirsrv instance.
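Something like the following should do it (the id of the existing
constraint may differ; check pcs constraint list --full):

    pcs constraint remove colocation-dirsrv-daemon-clone-dirsrv-ip-INFINITY
    pcs constraint colocation add dirsrv-ip with dirsrv-daemon-clone score=INFINITY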
> pcs property set no-quorum-policy=ignore
If you're using corosync 2, you generally don't need or want this.
Instead, ensure corosync.conf has two_node: 1 (which will be done
automatically if you used pcs cluster setup).
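With corosync 2, the quorum section of corosync.conf would look
something like:

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }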
> pcs resource defaults migration-threshold=1
>
> pcs property set stonith-enabled=false
>
>
>
> On startup all looks well:
>
> ________________________________________________________________________________________
>
>
>
> Last updated: Thu Mar 10 08:28:03 2016
>
> Last change: Thu Mar 10 08:26:14 2016
>
> Stack: cman
>
> Current DC: ga2.idam.com - partition with quorum
>
> Version: 1.1.11-97629de
>
> 2 Nodes configured
>
> 3 Resources configured
>
>
>
>
>
> Online: [ ga1.idam.com ga2.idam.com ]
>
>
>
> dirsrv-ip (ocf::heartbeat:IPaddr2): Started ga1.idam.com
>
> Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>
> dirsrv-daemon (ocf::heartbeat:dirsrv): Started ga2.idam.com
> (unmanaged)
>
> dirsrv-daemon (ocf::heartbeat:dirsrv): Started ga1.idam.com
> (unmanaged)
>
>
>
>
>
> ________________________________________________________________________________________
>
>
>
> Stop dirsrv on ga1:
>
>
>
> Last updated: Thu Mar 10 08:28:43 2016
>
> Last change: Thu Mar 10 08:26:14 2016
>
> Stack: cman
>
> Current DC: ga2.idam.com - partition with quorum
>
> Version: 1.1.11-97629de
>
> 2 Nodes configured
>
> 3 Resources configured
>
>
>
>
>
> Online: [ ga1.idam.com ga2.idam.com ]
>
>
>
> dirsrv-ip (ocf::heartbeat:IPaddr2): Started ga2.idam.com
>
> Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>
> dirsrv-daemon (ocf::heartbeat:dirsrv): Started ga2.idam.com
> (unmanaged)
>
> dirsrv-daemon (ocf::heartbeat:dirsrv): FAILED ga1.idam.com
> (unmanaged)
>
>
>
> Failed actions:
>
> dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12,
> status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms,
> exec=0ms
>
>
>
> IP fails over to ga2 OK:
>
>
>
> ________________________________________________________________________________________
>
>
>
> Restart dirsrv on ga1
>
>
>
> Last updated: Thu Mar 10 08:30:01 2016
>
> Last change: Thu Mar 10 08:26:14 2016
>
> Stack: cman
>
> Current DC: ga2.idam.com - partition with quorum
>
> Version: 1.1.11-97629de
>
> 2 Nodes configured
>
> 3 Resources configured
>
>
>
>
>
> Online: [ ga1.idam.com ga2.idam.com ]
>
>
>
> dirsrv-ip (ocf::heartbeat:IPaddr2): Started ga2.idam.com
>
> Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>
> dirsrv-daemon (ocf::heartbeat:dirsrv): Started ga2.idam.com
> (unmanaged)
>
> dirsrv-daemon (ocf::heartbeat:dirsrv): Started ga1.idam.com
> (unmanaged)
>
>
>
> Failed actions:
>
> dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12,
> status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms,
> exec=0ms
>
>
>
> ________________________________________________________________________________________
>
>
>
> Stop dirsrv on ga2:
>
>
>
> Last updated: Thu Mar 10 08:31:14 2016
>
> Last change: Thu Mar 10 08:26:14 2016
>
> Stack: cman
>
> Current DC: ga2.idam.com - partition with quorum
>
> Version: 1.1.11-97629de
>
> 2 Nodes configured
>
> 3 Resources configured
>
>
>
>
>
> Online: [ ga1.idam.com ga2.idam.com ]
>
>
>
> dirsrv-ip (ocf::heartbeat:IPaddr2): Started ga2.idam.com
>
> Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>
> dirsrv-daemon (ocf::heartbeat:dirsrv): FAILED ga2.idam.com
> (unmanaged)
>
> dirsrv-daemon (ocf::heartbeat:dirsrv): Started ga1.idam.com
> (unmanaged)
>
>
>
> Failed actions:
>
> dirsrv-daemon_monitor_10000 on ga2.idam.com 'not running' (7): call=11,
> status=complete, last-rc-change='Thu Mar 10 08:31:12 2016', queued=0ms,
> exec=0ms
>
> dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12,
> status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms,
> exec=0ms
>
>
>
> But the IP stays on the failed node.
>
> Looking in the logs, it seems the cluster is not aware that ga1 is
> available, even though the status output shows it is.
>
>
>
> If I repeat the tests but with ga2 started up first, the behaviour is
> similar, i.e. it fails over to ga1 but not back to ga2.
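When the IP stays on the failed node like this, it can help to look at
the allocation scores the policy engine is using, for example:

    crm_simulate -sL

which shows the per-node scores and any failure-based bans still in
effect.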
>
>
>
> Many thanks,
>
> Bernie