[ClusterLabs] How can I prevent multiple starts of IPaddr2 in an environment using fence_mpath?
飯田 雄介
iidayuus at intellilink.co.jp
Fri Apr 6 00:30:46 EDT 2018
Hi all,
I am testing an environment that uses fence_mpath, with the settings shown below (a sketch of how the stonith resources might have been created follows the status output).
=======
Stack: corosync
Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition with quorum
Last updated: Fri Apr 6 13:16:20 2018
Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
2 nodes configured
13 resources configured
Online: [ x3650e x3650f ]
Full list of resources:
fenceMpath-x3650e (stonith:fence_mpath): Started x3650e
fenceMpath-x3650f (stonith:fence_mpath): Started x3650f
Resource Group: grpPostgreSQLDB
    prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem): Started x3650e
    prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem): Started x3650e
    prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem): Started x3650e
    prmApPostgreSQLDB (ocf::heartbeat:pgsql): Started x3650e
Resource Group: grpPostgreSQLIP
    prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Started x3650e
Clone Set: clnDiskd1 [prmDiskd1]
    Started: [ x3650e x3650f ]
Clone Set: clnDiskd2 [prmDiskd2]
    Started: [ x3650e x3650f ]
Clone Set: clnPing [prmPing]
    Started: [ x3650e x3650f ]
=======
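For reference, a minimal sketch of how fence_mpath stonith resources like these are typically created with pcs; the device path and key values here are assumptions, not our actual configuration:

  # Sketch only: /dev/mapper/mpatha and the key values are placeholders.
  # Each node gets its own stonith resource with its own reservation key.
  pcs stonith create fenceMpath-x3650e fence_mpath \
      key=1 devices=/dev/mapper/mpatha \
      pcmk_host_list=x3650e meta provides=unfencing
  pcs stonith create fenceMpath-x3650f fence_mpath \
      key=2 devices=/dev/mapper/mpatha \
      pcmk_host_list=x3650f meta provides=unfencing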
When a split-brain occurs in this environment, x3650f executes fencing, and the resources are started on x3650f.
=== view from x3650e ===
Stack: corosync
Current DC: x3650e (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
Last updated: Fri Apr 6 13:16:36 2018
Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
2 nodes configured
13 resources configured
Node x3650f: UNCLEAN (offline)
Online: [ x3650e ]
Full list of resources:
fenceMpath-x3650e (stonith:fence_mpath): Started x3650e
fenceMpath-x3650f (stonith:fence_mpath): Started [ x3650e x3650f ]
Resource Group: grpPostgreSQLDB
    prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem): Started x3650e
    prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem): Started x3650e
    prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem): Started x3650e
    prmApPostgreSQLDB (ocf::heartbeat:pgsql): Started x3650e
Resource Group: grpPostgreSQLIP
    prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Started x3650e
Clone Set: clnDiskd1 [prmDiskd1]
    prmDiskd1 (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
    Started: [ x3650e ]
Clone Set: clnDiskd2 [prmDiskd2]
    prmDiskd2 (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
    Started: [ x3650e ]
Clone Set: clnPing [prmPing]
    prmPing (ocf::pacemaker:ping): Started x3650f (UNCLEAN)
    Started: [ x3650e ]
=== view from x3650f ===
Stack: corosync
Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
Last updated: Fri Apr 6 13:16:36 2018
Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
2 nodes configured
13 resources configured
Online: [ x3650f ]
OFFLINE: [ x3650e ]
Full list of resources:
fenceMpath-x3650e (stonith:fence_mpath): Started x3650f
fenceMpath-x3650f (stonith:fence_mpath): Started x3650f
Resource Group: grpPostgreSQLDB
    prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem): Started x3650f
    prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem): Started x3650f
    prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem): Started x3650f
    prmApPostgreSQLDB (ocf::heartbeat:pgsql): Started x3650f
Resource Group: grpPostgreSQLIP
    prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Started x3650f
Clone Set: clnDiskd1 [prmDiskd1]
    Started: [ x3650f ]
    Stopped: [ x3650e ]
Clone Set: clnDiskd2 [prmDiskd2]
    Started: [ x3650f ]
    Stopped: [ x3650e ]
Clone Set: clnPing [prmPing]
    Started: [ x3650f ]
    Stopped: [ x3650e ]
=======
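Our understanding of the mechanism: fence_mpath fences a node by removing that node's SCSI-3 persistent reservation key from the shared multipath device, so the fenced node loses access to the disk but otherwise keeps running, and resources that do not touch the disk (such as IPaddr2) are not stopped by the fencing itself. The registered keys can be checked with mpathpersist; a quick check, assuming the shared device is /dev/mapper/mpatha (a placeholder path):

  # List the reservation keys registered on the shared device.
  # /dev/mapper/mpatha is a placeholder, not our actual device.
  mpathpersist --in --read-keys /dev/mapper/mpatha

After x3650f fences x3650e, the key of x3650e should no longer appear in this list.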
However, the IPaddr2 resource on x3650e does not stop until a pgsql monitor error occurs.
During that window, IPaddr2 is running on both nodes at the same time.
=== view from x3650e after the pgsql monitor error ===
Stack: corosync
Current DC: x3650e (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
Last updated: Fri Apr 6 13:16:56 2018
Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
2 nodes configured
13 resources configured
Node x3650f: UNCLEAN (offline)
Online: [ x3650e ]
Full list of resources:
fenceMpath-x3650e (stonith:fence_mpath): Started x3650e
fenceMpath-x3650f (stonith:fence_mpath): Started [ x3650e x3650f ]
Resource Group: grpPostgreSQLDB
    prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem): Started x3650e
    prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem): Started x3650e
    prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem): Started x3650e
    prmApPostgreSQLDB (ocf::heartbeat:pgsql): Stopped
Resource Group: grpPostgreSQLIP
    prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Stopped
Clone Set: clnDiskd1 [prmDiskd1]
    prmDiskd1 (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
    Started: [ x3650e ]
Clone Set: clnDiskd2 [prmDiskd2]
    prmDiskd2 (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
    Started: [ x3650e ]
Clone Set: clnPing [prmPing]
    prmPing (ocf::pacemaker:ping): Started x3650f (UNCLEAN)
    Started: [ x3650e ]
Node Attributes:
* Node x3650e:
    + default_ping_set : 100
    + diskcheck_status : normal
    + diskcheck_status_internal : normal

Migration Summary:
* Node x3650e:
    prmApPostgreSQLDB: migration-threshold=1 fail-count=1 last-failure='Fri Apr 6 13:16:39 2018'

Failed Actions:
* prmApPostgreSQLDB_monitor_10000 on x3650e 'not running' (7): call=60, status=complete,
    exitreason='Configuration file /dbfp/pgdata/data/postgresql.conf doesn't exist',
    last-rc-change='Fri Apr 6 13:16:39 2018', queued=0ms, exec=0ms
=======
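Incidentally, the dual-active window can be observed directly: while it is open, the virtual IP is configured on both nodes. A quick check on each node (192.168.0.100 stands in for our actual VIP):

  # Run on both nodes during the window; 192.168.0.100 is a placeholder VIP.
  ip -o addr show | grep 192.168.0.100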
We regard this behavior as a problem.
Is there a way to prevent IPaddr2 from being active on both nodes during this window?
Regards,
Yusuke