[ClusterLabs] Floating IP failing over but not failing back with active/active LDAP (dirsrv)

Bernie Jones bernie at securityconsulting.ltd.uk
Thu Mar 10 09:48:42 EST 2016


A bit more info...

If, after I restart the failed dirsrv instance, I then perform a "pcs
resource cleanup dirsrv-daemon" to clear the FAIL messages, then the
failover works OK.

So it's as if the cleanup is changing the status in some way.
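
To check whether it's the fail count that the cleanup is clearing, it can
be queried directly before and after the restart. A quick sketch (resource
and node names as in my setup below):

# per-node fail counts for the clone
pcs resource failcount show dirsrv-daemon

# or query a single node's count with the lower-level tool
crm_failcount -r dirsrv-daemon -N ga1.idam.com -G
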
From: Bernie Jones [mailto:bernie at securityconsulting.ltd.uk] 
Sent: 10 March 2016 08:47
To: 'Cluster Labs - All topics related to open-source clustering welcomed'
Subject: [ClusterLabs] Floating IP failing over but not failing back with
active/active LDAP (dirsrv)

Hi all, could you advise please?

I'm trying to configure a floating IP with an active/active deployment of
389 Directory Server. I don't want Pacemaker to manage LDAP, only to
monitor it and switch the IP as required to provide resilience. I've seen
some similar threads and based my solution on those.

I've amended the OCF resource agent for slapd to work with 389 DS, and it
tests out OK (dirsrv).

I then created my resources as below:

pcs resource create dirsrv-ip ocf:heartbeat:IPaddr2 ip="192.168.26.100" \
    cidr_netmask="32" op monitor timeout="20s" interval="5s" \
    op start interval="0" timeout="20" op stop interval="0" timeout="20"

pcs resource create dirsrv-daemon ocf:heartbeat:dirsrv \
    op monitor interval="10" timeout="5" op start interval="0" timeout="5" \
    op stop interval="0" timeout="5" meta is-managed="false"

pcs resource clone dirsrv-daemon meta globally-unique="false" \
    interleave="true" target-role="Started" master-max="2"

pcs constraint colocation add dirsrv-daemon-clone with dirsrv-ip \
    score=INFINITY

pcs property set no-quorum-policy=ignore

pcs resource defaults migration-threshold=1

pcs property set stonith-enabled=false
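
One thing I'm unsure about: with migration-threshold=1 and no
failure-timeout set, I believe a single monitor failure bans a resource
from that node until its fail count is cleared or expires. If that's the
mechanism here, something like this (untested) would let the ban expire on
its own:

# let recorded failures expire after 60s so a restarted node becomes
# eligible again without a manual "pcs resource cleanup"
pcs resource defaults failure-timeout=60s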
On startup all looks well:

____________________________________________________________________________

Last updated: Thu Mar 10 08:28:03 2016
Last change: Thu Mar 10 08:26:14 2016
Stack: cman
Current DC: ga2.idam.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
3 Resources configured

Online: [ ga1.idam.com ga2.idam.com ]

dirsrv-ip   (ocf::heartbeat:IPaddr2):     Started ga1.idam.com
 Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
     dirsrv-daemon      (ocf::heartbeat:dirsrv):        Started ga2.idam.com (unmanaged)
     dirsrv-daemon      (ocf::heartbeat:dirsrv):        Started ga1.idam.com (unmanaged)

____________________________________________________________________________

Stop dirsrv on ga1:
Last updated: Thu Mar 10 08:28:43 2016
Last change: Thu Mar 10 08:26:14 2016
Stack: cman
Current DC: ga2.idam.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
3 Resources configured

Online: [ ga1.idam.com ga2.idam.com ]

dirsrv-ip   (ocf::heartbeat:IPaddr2):     Started ga2.idam.com
 Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
     dirsrv-daemon      (ocf::heartbeat:dirsrv):        Started ga2.idam.com (unmanaged)
     dirsrv-daemon      (ocf::heartbeat:dirsrv):        FAILED ga1.idam.com (unmanaged)

Failed actions:
    dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12,
    status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms, exec=0ms

IP fails over to ga2 OK:

____________________________________________________________________________

Restart dirsrv on ga1:
Last updated: Thu Mar 10 08:30:01 2016
Last change: Thu Mar 10 08:26:14 2016
Stack: cman
Current DC: ga2.idam.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
3 Resources configured

Online: [ ga1.idam.com ga2.idam.com ]

dirsrv-ip   (ocf::heartbeat:IPaddr2):     Started ga2.idam.com
 Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
     dirsrv-daemon      (ocf::heartbeat:dirsrv):        Started ga2.idam.com (unmanaged)
     dirsrv-daemon      (ocf::heartbeat:dirsrv):        Started ga1.idam.com (unmanaged)

Failed actions:
    dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12,
    status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms, exec=0ms

____________________________________________________________________________

Stop dirsrv on ga2:
Last updated: Thu Mar 10 08:31:14 2016
Last change: Thu Mar 10 08:26:14 2016
Stack: cman
Current DC: ga2.idam.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
3 Resources configured

Online: [ ga1.idam.com ga2.idam.com ]

dirsrv-ip   (ocf::heartbeat:IPaddr2):     Started ga2.idam.com
 Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
     dirsrv-daemon      (ocf::heartbeat:dirsrv):        FAILED ga2.idam.com (unmanaged)
     dirsrv-daemon      (ocf::heartbeat:dirsrv):        Started ga1.idam.com (unmanaged)

Failed actions:
    dirsrv-daemon_monitor_10000 on ga2.idam.com 'not running' (7): call=11,
    status=complete, last-rc-change='Thu Mar 10 08:31:12 2016', queued=0ms, exec=0ms
    dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12,
    status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms, exec=0ms

But the IP stays on the failed node.

Looking in the logs, it seems the cluster is not aware that ga1 is
available, even though the status output shows it is.
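
If it helps with the diagnosis, I believe crm_simulate can dump the
allocation scores the policy engine computed against the live CIB:

# -L: use the live cluster state, -s: show allocation scores
crm_simulate -L -s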

If I repeat the tests but with ga2 started up first, the behaviour is
similar, i.e. it fails over to ga1 but not back to ga2.
Many thanks,

Bernie