[Pacemaker] [Problem]The trouble of the slave node influences a master.

renayama19661014 at ybb.ne.jp renayama19661014 at ybb.ne.jp
Tue Mar 29 22:32:03 EDT 2011


Hi,

We tested a master/slave configuration of DRBD.
We built iSCSI nodes on top of DRBD to hold the PostgreSQL data.

We then tested what happens when the DRBD stop operation fails on an iSCSI node.

Step1) We start the iSCSI nodes (Node C and Node D).
 * We use a stonith module (stonith-helper) that takes some time to run.

============
Last updated: Wed Mar 30 10:47:08 2011
Stack: Heartbeat
Current DC: bl460g1d (2289caf8-1062-4f58-ab95-075cdcdb4de2) - partition with quorum
Version: 1.0.10-b0266dd5ffa9c51377c68b1f29d6bc84367f51dd
2 Nodes configured, unknown expected votes
7 Resources configured.
============

Online: [ bl460g1c bl460g1d ]

 Master/Slave Set: msGroup01
     Masters: [ bl460g1c ]
     Slaves: [ bl460g1d ]
 Resource Group: iSCSIgroup01
     prmiSCSITarget     (ocf::heartbeat:iSCSITarget):   Started bl460g1c
     prmiSCSILogicalUnit        (ocf::heartbeat:iSCSILogicalUnit):      Started bl460g1c
     prmIpiSCSI (ocf::heartbeat:IPaddr2):       Started bl460g1c
 Clone Set: clnPingd
     Started: [ bl460g1c bl460g1d ]
 Clone Set: clnDiskd1
     Started: [ bl460g1c bl460g1d ]
 Clone Set: clnDiskd2
     Started: [ bl460g1c bl460g1d ]
 Resource Group: grpStonith1
     prmStonithN1-1     (stonith:external/stonith-helper):      Started bl460g1d
     prmStonithN1-2     (stonith:external/riloe):       Started bl460g1d
     prmStonithN1-3     (stonith:meatware):     Started bl460g1d
 Resource Group: grpStonith2
     prmStonithN2-1     (stonith:external/stonith-helper):      Started bl460g1c
     prmStonithN2-2     (stonith:external/riloe):       Started bl460g1c
     prmStonithN2-3     (stonith:meatware):     Started bl460g1c

Step2) We start the pgsql nodes (Node A and Node B).
 * These nodes use the data stored on the iSCSI nodes.

============
Last updated: Wed Mar 30 11:10:54 2011
Stack: Heartbeat
Current DC: bl460g1b (ac007adb-78c8-4209-9e8c-2cae225e775f) - partition with quorum
Version: 1.0.10-b0266dd5ffa9c51377c68b1f29d6bc84367f51dd
2 Nodes configured, unknown expected votes
6 Resources configured.
============

Online: [ bl460g1a bl460g1b ]

 Resource Group: grpPostgreSQLDB
     prmExPostgreSQLDB  (ocf::heartbeat:sfex):  Started bl460g1a
     prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem):    Started bl460g1a
     prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem):    Started bl460g1a
     prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem):    Started bl460g1a
     prmIpPostgreSQLDB  (ocf::heartbeat:IPaddr2):       Started bl460g1a
     prmApPostgreSQLDB  (ocf::heartbeat:pgsql): Started bl460g1a
 Clone Set: clnPingd
     Started: [ bl460g1a bl460g1b ]
 Clone Set: clnDiskd1
     Started: [ bl460g1a bl460g1b ]
 Clone Set: clnDiskd2
     Started: [ bl460g1a bl460g1b ]
 Resource Group: grpStonith1
     prmStonithN1-1     (stonith:external/stonith-helper):      Started bl460g1b
     prmStonithN1-2     (stonith:external/riloe):       Started bl460g1b
     prmStonithN1-3     (stonith:meatware):     Started bl460g1b
 Resource Group: grpStonith2
     prmStonithN2-1     (stonith:external/stonith-helper):      Started bl460g1a
     prmStonithN2-2     (stonith:external/riloe):       Started bl460g1a
     prmStonithN2-3     (stonith:meatware):     Started bl460g1a

Migration summary:
* Node bl460g1b: 
* Node bl460g1a: 


Step3) We ran psql -l repeatedly against pgsql from another node (Node F).

(snip)
Wed Mar 30 11:11:24 JST 2011 :
                              List of databases
   Name    |  Owner   | Encoding | Collation | Ctype |   Access privileges
-----------+----------+----------+-----------+-------+-----------------------
 postgres  | postgres | UTF8     | C         | C     |
 template0 | postgres | UTF8     | C         | C     | =c/postgres
                                                     : postgres=CTc/postgres
 template1 | postgres | UTF8     | C         | C     | =c/postgres
                                                     : postgres=CTc/postgres
 testdb    | postgres | UTF8     | C         | C     |
(4 rows)

Wed Mar 30 11:11:25 JST 2011 :
                              List of databases
   Name    |  Owner   | Encoding | Collation | Ctype |   Access privileges
-----------+----------+----------+-----------+-------+-----------------------
 postgres  | postgres | UTF8     | C         | C     |
 template0 | postgres | UTF8     | C         | C     | =c/postgres
                                                     : postgres=CTc/postgres
 template1 | postgres | UTF8     | C         | C     | =c/postgres
                                                     : postgres=CTc/postgres
 testdb    | postgres | UTF8     | C         | C     |
(4 rows)
(snip)
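
For anyone who wants to reproduce the measurement, a probe of this kind could be sketched roughly as follows. This is only an illustration and not the script we actually used: the function name probe_once and the address 192.0.2.10 are assumptions. Each result is prefixed with a timestamp so that a stall shows up as a gap between consecutive entries.

```shell
#!/bin/sh
# Hypothetical probe helper (not from the original test setup).
# Usage: probe_once CMD [ARGS...]
# Runs CMD, prints "<timestamp> : <output>", and returns CMD's exit status.
probe_once() {
    out=$("$@" 2>&1)
    rc=$?
    printf '%s : %s\n' "$(date)" "$out"
    return $rc
}

# Example loop (commented out; the host address is a placeholder):
# while true; do
#     probe_once psql -l -h 192.0.2.10 -U postgres
#     sleep 1
# done
```

Because psql itself blocks while the storage is unavailable, the next timestamp is only printed after the stalled call returns.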

Step4) We artificially cause a DRBD stop failure on the iSCSI slave node by editing the drbd resource agent so that monitor and stop always fail.

(snip)
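drbd_monitor() {
        local status
        # added for the test: monitor always reports a failure
        return $OCF_ERR_GENERIC
(snip)
drbd_stop() {
        local rc=$OCF_ERR_GENERIC
        local first_try=true
        # added for the test: stop always fails, which leads to fencing
        return $rc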
(snip)
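
As a side note, the same kind of failure could be switched on and off without re-editing the agent each time. The following is a minimal sketch under my own assumptions: the flag file path and the function names are hypothetical and are not part of the drbd resource agent. The only facts relied on are the standard OCF return codes (0 for success, 1 for a generic error).

```shell
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical flag file: while it exists, the wrapped action fails.
FAULT_FLAG="${FAULT_FLAG:-/tmp/drbd_fault_inject}"

fault_injected() {
    [ -e "$FAULT_FLAG" ]
}

# Usage: with_fault_injection REAL_ACTION [ARGS...]
# Returns OCF_ERR_GENERIC while the flag file exists,
# otherwise runs the real action and returns its status.
with_fault_injection() {
    if fault_injected; then
        return $OCF_ERR_GENERIC
    fi
    "$@"
}
```

Creating the flag file on the slave node would then start the failure, and removing it would restore normal behaviour.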


Step5) The iSCSI node detects the DRBD failure.

Step6) The slave node is fenced by STONITH, but psql access is blocked for a while.
 * psql access to the master node seems to be blocked by the failure on the slave node.
  * Blocked from 11:12:50, unblocked at 11:13:42 (about 52 seconds).

(snip)
Wed Mar 30 11:12:49 JST 2011 :
                              List of databases
   Name    |  Owner   | Encoding | Collation | Ctype |   Access privileges
-----------+----------+----------+-----------+-------+-----------------------
 postgres  | postgres | UTF8     | C         | C     |
 template0 | postgres | UTF8     | C         | C     | =c/postgres
                                                     : postgres=CTc/postgres
 template1 | postgres | UTF8     | C         | C     | =c/postgres
                                                     : postgres=CTc/postgres
 testdb    | postgres | UTF8     | C         | C     |
(4 rows)

Wed Mar 30 11:12:50 JST 2011 :
                              List of databases
   Name    |  Owner   | Encoding | Collation | Ctype |   Access privileges
-----------+----------+----------+-----------+-------+-----------------------
 postgres  | postgres | UTF8     | C         | C     |
 template0 | postgres | UTF8     | C         | C     | =c/postgres
                                                     : postgres=CTc/postgres
 template1 | postgres | UTF8     | C         | C     | =c/postgres
                                                     : postgres=CTc/postgres
 testdb    | postgres | UTF8     | C         | C     |
(4 rows)

Wed Mar 30 11:13:42 JST 2011 :
                              List of databases
   Name    |  Owner   | Encoding | Collation | Ctype |   Access privileges
-----------+----------+----------+-----------+-------+-----------------------
 postgres  | postgres | UTF8     | C         | C     |
 template0 | postgres | UTF8     | C         | C     | =c/postgres
                                                     : postgres=CTc/postgres
 template1 | postgres | UTF8     | C         | C     | =c/postgres
                                                     : postgres=CTc/postgres
 testdb    | postgres | UTF8     | C         | C     |
(4 rows)
(snip)
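
The length of the block can be read off the timestamps. As a rough sketch (the function name find_gaps is my own, and it assumes every entry starts with a date in the form "Wed Mar 30 11:12:50 JST 2011" and that the log does not cross midnight), gaps longer than a given number of seconds could be listed like this:

```shell
#!/bin/sh
# Hypothetical helper: list gaps between consecutive timestamped entries.
# Usage: find_gaps THRESHOLD_SECONDS < logfile
find_gaps() {
    awk -v limit="$1" '
        # Field 4 is HH:MM:SS on timestamp lines; other lines are skipped.
        $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($4, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && now - prev > limit)
                printf "%s -> %s (%d s)\n", prev_str, $4, now - prev
            prev = now
            prev_str = $4
            seen = 1
        }'
}
```

With a threshold of 5 seconds and the three timestamps in the excerpt, this prints "11:12:50 -> 11:13:42 (52 s)".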


The block seems to last from the time STONITH is started until it completes.

I could not determine whether this is a problem in DRBD or in Pacemaker.

We want to prevent a failure on the slave node from affecting the master node.
Is there a setting that avoids this problem?

 * I filed the logs in Bugzilla (hb_report attached).
 * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2573

Best Regards,
Hideo Yamauchi.




