[Pacemaker] [Problem] A failure of the slave node affects the master.

renayama19661014 at ybb.ne.jp
Tue Apr 5 15:37:05 CET 2011


Hi All,

After investigating this in various ways, the problem appears to lie in the version of drbd that we were using.

The problem went away when we switched to drbd 8.3.9.

The details of the cause are unclear.
However, since the problem is resolved, please disregard the report in this email.
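
To confirm which drbd version a node is actually running after such a change, the version line of /proc/drbd can be read. The following is only a sketch: the helper name drbd_version is hypothetical, and it assumes the first line has the usual form "version: 8.3.9 (api:88/proto:86-95)".

```shell
# Print the drbd version recorded in a /proc/drbd style file.
# drbd_version is a hypothetical helper name; the path defaults to /proc/drbd.
drbd_version() {
    local file="${1:-/proc/drbd}"
    # Assumed format of the version line: "version: 8.3.9 (api:88/proto:86-95)"
    sed -n 's/^version: \([0-9][0-9.]*\).*/\1/p' "$file" | head -n 1
}
```

Running drbd_version with no argument on each iSCSI node prints the version string only, which makes it easy to compare the nodes.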

Thanks,
Hideo Yamauchi.


--- On Wed, 2011/3/30, renayama19661014 at ybb.ne.jp <renayama19661014 at ybb.ne.jp> wrote:

> Hi,
> 
> We tested a master/slave configuration of drbd.
> We built the iSCSI nodes on drbd to hold the PostgreSQL data.
> 
> We then tested a drbd stop failure on an iSCSI node.
> 
> Step1) We start the iSCSI nodes (Node C and Node D).
>  * We use a stonith module (stonith-helper) that takes some time to complete.
> 
> ============
> Last updated: Wed Mar 30 10:47:08 2011
> Stack: Heartbeat
> Current DC: bl460g1d (2289caf8-1062-4f58-ab95-075cdcdb4de2) - partition with quorum
> Version: 1.0.10-b0266dd5ffa9c51377c68b1f29d6bc84367f51dd
> 2 Nodes configured, unknown expected votes
> 7 Resources configured.
> ============
> 
> Online: [ bl460g1c bl460g1d ]
> 
>  Master/Slave Set: msGroup01
>      Masters: [ bl460g1c ]
>      Slaves: [ bl460g1d ]
>  Resource Group: iSCSIgroup01
>      prmiSCSITarget     (ocf::heartbeat:iSCSITarget):   Started bl460g1c
>      prmiSCSILogicalUnit        (ocf::heartbeat:iSCSILogicalUnit):      Started bl460g1c
>      prmIpiSCSI (ocf::heartbeat:IPaddr2):       Started bl460g1c
>  Clone Set: clnPingd
>      Started: [ bl460g1c bl460g1d ]
>  Clone Set: clnDiskd1
>      Started: [ bl460g1c bl460g1d ]
>  Clone Set: clnDiskd2
>      Started: [ bl460g1c bl460g1d ]
>  Resource Group: grpStonith1
>      prmStonithN1-1     (stonith:external/stonith-helper):      Started bl460g1d
>      prmStonithN1-2     (stonith:external/riloe):       Started bl460g1d
>      prmStonithN1-3     (stonith:meatware):     Started bl460g1d
>  Resource Group: grpStonith2
>      prmStonithN2-1     (stonith:external/stonith-helper):      Started bl460g1c
>      prmStonithN2-2     (stonith:external/riloe):       Started bl460g1c
>      prmStonithN2-3     (stonith:meatware):     Started bl460g1c
> 
> Step2) We start the pgsql nodes (Node A and Node B).
>  * These nodes access the data on the iSCSI nodes.
> 
> ============
> Last updated: Wed Mar 30 11:10:54 2011
> Stack: Heartbeat
> Current DC: bl460g1b (ac007adb-78c8-4209-9e8c-2cae225e775f) - partition with quorum
> Version: 1.0.10-b0266dd5ffa9c51377c68b1f29d6bc84367f51dd
> 2 Nodes configured, unknown expected votes
> 6 Resources configured.
> ============
> 
> Online: [ bl460g1a bl460g1b ]
> 
>  Resource Group: grpPostgreSQLDB
>      prmExPostgreSQLDB  (ocf::heartbeat:sfex):  Started bl460g1a
>      prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem):    Started bl460g1a
>      prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem):    Started bl460g1a
>      prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem):    Started bl460g1a
>      prmIpPostgreSQLDB  (ocf::heartbeat:IPaddr2):       Started bl460g1a
>      prmApPostgreSQLDB  (ocf::heartbeat:pgsql): Started bl460g1a
>  Clone Set: clnPingd
>      Started: [ bl460g1a bl460g1b ]
>  Clone Set: clnDiskd1
>      Started: [ bl460g1a bl460g1b ]
>  Clone Set: clnDiskd2
>      Started: [ bl460g1a bl460g1b ]
>  Resource Group: grpStonith1
>      prmStonithN1-1     (stonith:external/stonith-helper):      Started bl460g1b
>      prmStonithN1-2     (stonith:external/riloe):       Started bl460g1b
>      prmStonithN1-3     (stonith:meatware):     Started bl460g1b
>  Resource Group: grpStonith2
>      prmStonithN2-1     (stonith:external/stonith-helper):      Started bl460g1a
>      prmStonithN2-2     (stonith:external/riloe):       Started bl460g1a
>      prmStonithN2-3     (stonith:meatware):     Started bl460g1a
> 
> Migration summary:
> * Node bl460g1b: 
> * Node bl460g1a: 
> 
> 
> Step3) We ran psql -l against pgsql from another node (Node F).
> 
> (snip)
> Wed Mar 30 11:11:24 JST 2011 :
>                               List of databases
>    Name    |  Owner   | Encoding | Collation | Ctype |   Access privileges
> -----------+----------+----------+-----------+-------+-----------------------
>  postgres  | postgres | UTF8     | C         | C     |
>  template0 | postgres | UTF8     | C         | C     | =c/postgres
>                                                      : postgres=CTc/postgres
>  template1 | postgres | UTF8     | C         | C     | =c/postgres
>                                                      : postgres=CTc/postgres
>  testdb    | postgres | UTF8     | C         | C     |
> (4 rows)
> Wed Mar 30 11:11:25 JST 2011 :
>                               List of databases
>    Name    |  Owner   | Encoding | Collation | Ctype |   Access privileges
> -----------+----------+----------+-----------+-------+-----------------------
>  postgres  | postgres | UTF8     | C         | C     |
>  template0 | postgres | UTF8     | C         | C     | =c/postgres
>                                                      : postgres=CTc/postgres
>  template1 | postgres | UTF8     | C         | C     | =c/postgres
>                                                      : postgres=CTc/postgres
>  testdb    | postgres | UTF8     | C         | C     |
> (4 rows)
> (snip)
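
A loop like the following is one way to produce a timestamped log of this kind. It is only a sketch under assumptions, not the script that was actually used: the helper name poll_cmd is hypothetical, and the command to probe is passed in as arguments.

```shell
# Sketch only: run a probe command repeatedly and print a timestamp with each
# result, so that a gap between timestamps shows how long the probe was blocked.
# poll_cmd is a hypothetical helper name, not part of Pacemaker or PostgreSQL.
poll_cmd() {
    local count="$1" interval="$2"
    shift 2
    local i=0 out
    while [ "$i" -lt "$count" ]; do
        # The probe blocks here for as long as the database does not answer.
        out=$("$@" 2>&1)
        echo "$(date) : $out"
        sleep "$interval"
        i=$((i + 1))
    done
}
```

For example, "poll_cmd 600 1 psql -l -h 192.0.2.10 -U postgres" would probe once per second for about ten minutes; the address and user here are placeholders, not values from this report.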
> 
> Step4) We artificially cause a drbd stop failure on the iSCSI slave node.
> 
> (snip)
> drbd_monitor() {
>         local status
>         return $OCF_ERR_GENERIC   # injected: monitor always reports a failure
> (snip)
> drbd_stop() {
>         local rc=$OCF_ERR_GENERIC
>         local first_try=true
>         return $rc                # injected: stop always fails
> (snip)
> 
> 
> Step5) The iSCSI node detects the drbd failure.
> 
> Step6) The slave node is fenced by stonith, but psql access is blocked for a while.
>  * psql access via the master node appears to be blocked by the failure of the slave node.
>   * Blocked at 11:12:50, unblocked at 11:13:42.
> 
> (snip)
> Wed Mar 30 11:12:49 JST 2011 :
>                               List of databases
>    Name    |  Owner   | Encoding | Collation | Ctype |   Access privileges
> -----------+----------+----------+-----------+-------+-----------------------
>  postgres  | postgres | UTF8     | C         | C     |
>  template0 | postgres | UTF8     | C         | C     | =c/postgres
>                                                      : postgres=CTc/postgres
>  template1 | postgres | UTF8     | C         | C     | =c/postgres
>                                                      : postgres=CTc/postgres
>  testdb    | postgres | UTF8     | C         | C     |
> (4 rows)
> Wed Mar 30 11:12:50 JST 2011 :
>                               List of databases
>    Name    |  Owner   | Encoding | Collation | Ctype |   Access privileges
> -----------+----------+----------+-----------+-------+-----------------------
>  postgres  | postgres | UTF8     | C         | C     |
>  template0 | postgres | UTF8     | C         | C     | =c/postgres
>                                                      : postgres=CTc/postgres
>  template1 | postgres | UTF8     | C         | C     | =c/postgres
>                                                      : postgres=CTc/postgres
>  testdb    | postgres | UTF8     | C         | C     |
> (4 rows)
> Wed Mar 30 11:13:42 JST 2011 :
>                               List of databases
>    Name    |  Owner   | Encoding | Collation | Ctype |   Access privileges
> -----------+----------+----------+-----------+-------+-----------------------
>  postgres  | postgres | UTF8     | C         | C     |
>  template0 | postgres | UTF8     | C         | C     | =c/postgres
>                                                      : postgres=CTc/postgres
>  template1 | postgres | UTF8     | C         | C     | =c/postgres
>                                                      : postgres=CTc/postgres
>  testdb    | postgres | UTF8     | C         | C     |
> (4 rows)
> (snip)
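
The length of the blocked interval follows from the two timestamps reported in Step6. The arithmetic can be sketched in shell as below; the helper name to_seconds is hypothetical, and the sketch assumes both times fall on the same day.

```shell
# Convert an HH:MM:SS time of day to seconds since midnight.
# to_seconds is a hypothetical helper name used only for this illustration.
to_seconds() {
    local t="$1" h m s
    h=${t%%:*}; t=${t#*:}
    m=${t%%:*}; s=${t#*:}
    # Strip one leading zero so that values such as "08" are not read as octal.
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

blocked_from=11:12:50   # last response before the gap
unblocked_at=11:13:42   # first response after the gap
gap=$(( $(to_seconds "$unblocked_at") - $(to_seconds "$blocked_from") ))
echo "psql was blocked for about $gap seconds"
```

With the times from this report the gap is 52 seconds.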
> 
> 
> The blocking seems to last from the time stonith is executed until it completes.
> 
> I could not determine whether this is a problem in drbd or in Pacemaker.
> 
> We want to prevent a failure of the slave node from affecting the master node.
> Is there a setting that avoids this problem?
> 
>  * I filed the logs in Bugzilla (hb_report attached).
>  * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2573
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 


