[Pacemaker] [Question and Problem] In a vSphere 5.1 environment, pengine blocks on I/O for a long time when the shared disk fails.
Andrew Beekhof
andrew at beekhof.net
Wed May 15 05:03:50 UTC 2013
On 13/05/2013, at 4:14 PM, renayama19661014 at ybb.ne.jp wrote:
> Hi All,
>
> We built a simple cluster in a vSphere 5.1 environment.
>
> It consists of two ESXi servers and a shared disk.
>
> The guest VMs are located on the shared disk.
What is on the shared disk? The whole OS or app-specific data (i.e. nothing pacemaker needs directly)?
>
>
> Step 1) Build the cluster. (The DC node is the active node.)
>
> ============
> Last updated: Mon May 13 14:16:09 2013
> Stack: Heartbeat
> Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with quorum
> Version: 1.0.13-30bb726
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> ============
>
> Online: [ pgsr01 pgsr02 ]
>
> Resource Group: test-group
> Dummy1 (ocf::pacemaker:Dummy): Started pgsr01
> Dummy2 (ocf::pacemaker:Dummy): Started pgsr01
> Clone Set: clnPingd
> Started: [ pgsr01 pgsr02 ]
>
> Node Attributes:
> * Node pgsr01:
> + default_ping_set : 100
> * Node pgsr02:
> + default_ping_set : 100
>
> Migration summary:
> * Node pgsr01:
> * Node pgsr02:
>
>
> Step 2) Attach strace to the pengine process on the DC node.
>
> [root at pgsr01 ~]# ps -ef |grep heartbeat
> root 2072 1 0 13:56 ? 00:00:00 heartbeat: master control process
> root 2075 2072 0 13:56 ? 00:00:00 heartbeat: FIFO reader
> root 2076 2072 0 13:56 ? 00:00:00 heartbeat: write: bcast eth1
> root 2077 2072 0 13:56 ? 00:00:00 heartbeat: read: bcast eth1
> root 2078 2072 0 13:56 ? 00:00:00 heartbeat: write: bcast eth2
> root 2079 2072 0 13:56 ? 00:00:00 heartbeat: read: bcast eth2
> 496 2082 2072 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/ccm
> 496 2083 2072 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/cib
> root 2084 2072 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/lrmd -r
> root 2085 2072 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/stonithd
> 496 2086 2072 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/attrd
> 496 2087 2072 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/crmd
> 496 2089 2087 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/pengine
> root 2182 1 0 14:15 ? 00:00:00 /usr/lib64/heartbeat/pingd -D -p /var/run//pingd-default_ping_set -a default_ping_set -d 5s -m 100 -i 1 -h 192.168.101.254
> root 2287 1973 0 14:16 pts/0 00:00:00 grep heartbea
>
> [root at pgsr01 ~]# strace -p 2089
> Process 2089 attached - interrupt to quit
> restart_syscall(<... resuming interrupted call ...>) = 0
> times({tms_utime=5, tms_stime=6, tms_cutime=0, tms_cstime=0}) = 429527557
> recvfrom(5, 0xa93ff7, 953, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
> recvfrom(5, 0xa93ff7, 953, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
> (snip)
>
>
> Step 3) Disconnect the shared disk on which the active node's guest is placed.
>
> Step 4) Cut off the pingd network of the standby node.
> The pingd score is reflected correctly, but the resulting pengine run blocks (see the strace below).
>
> ~ # esxcfg-vswitch -N vmnic1 -p "ap-db" vSwitch1
> ~ # esxcfg-vswitch -N vmnic2 -p "ap-db" vSwitch1
>
>
> (snip)
> brk(0xd05000) = 0xd05000
> brk(0xeed000) = 0xeed000
> brk(0xf2d000) = 0xf2d000
> fstat(6, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f86a255a000
> write(6, "BZh51AY&SY\327\373\370\203\0\t(_\200UPX\3\377\377%cT \277\377\377"..., 2243) = 2243
> brk(0xb1d000) = 0xb1d000
> fsync(6 ------------------------------> BLOCKED
> (snip)
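>
> The write just before the block looks like pengine saving its bzip2-compressed input to a file on the guest's disk, and the following fsync() never returns once the datastore has failed. We think the same hang can be reproduced outside of Pacemaker with a small test program such as the untested sketch below (/mnt/shared/testfile is only a placeholder for a file on the affected datastore):
>
> /* Untested sketch: reproduce the blocking write()/fsync() outside Pacemaker.
>  * The path below is a placeholder for a file on the affected datastore. */
> #include <fcntl.h>
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
>
> int main(void)
> {
>     const char *path = "/mnt/shared/testfile";   /* placeholder path */
>     char buf[2048];
>     int fd;
>
>     memset(buf, 'x', sizeof(buf));
>
>     fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
>     if (fd < 0) {
>         perror("open");
>         return 1;
>     }
>
>     if (write(fd, buf, sizeof(buf)) < 0)          /* may still succeed via the page cache */
>         perror("write");
>
>     fprintf(stderr, "calling fsync()...\n");
>     if (fsync(fd) < 0)                            /* expected to hang here once the disk is gone */
>         perror("fsync");
>     fprintf(stderr, "fsync() returned\n");
>
>     close(fd);
>     return 0;
> }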
>
>
> ============
> Last updated: Mon May 13 14:19:15 2013
> Stack: Heartbeat
> Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with quorum
> Version: 1.0.13-30bb726
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> ============
>
> Online: [ pgsr01 pgsr02 ]
>
> Resource Group: test-group
> Dummy1 (ocf::pacemaker:Dummy): Started pgsr01
> Dummy2 (ocf::pacemaker:Dummy): Started pgsr01
> Clone Set: clnPingd
> Started: [ pgsr01 pgsr02 ]
>
> Node Attributes:
> * Node pgsr01:
> + default_ping_set : 100
> * Node pgsr02:
> + default_ping_set : 0 : Connectivity is lost
>
> Migration summary:
> * Node pgsr01:
> * Node pgsr02:
>
>
> Step 5) Reconnect the pingd network of the standby node.
> The pingd score is restored correctly, but pengine remains blocked.
>
>
> ~ # esxcfg-vswitch -M vmnic1 -p "ap-db" vSwitch1
> ~ # esxcfg-vswitch -M vmnic2 -p "ap-db" vSwitch1
>
> ============
> Last updated: Mon May 13 14:19:40 2013
> Stack: Heartbeat
> Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with quorum
> Version: 1.0.13-30bb726
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> ============
>
> Online: [ pgsr01 pgsr02 ]
>
> Resource Group: test-group
> Dummy1 (ocf::pacemaker:Dummy): Started pgsr01
> Dummy2 (ocf::pacemaker:Dummy): Started pgsr01
> Clone Set: clnPingd
> Started: [ pgsr01 pgsr02 ]
>
> Node Attributes:
> * Node pgsr01:
> + default_ping_set : 100
> * Node pgsr02:
> + default_ping_set : 100
>
> Migration summary:
> * Node pgsr01:
> * Node pgsr02:
>
>
> --------- The pengine process remains blocked -----
>
> Step 6) Cut off the pingd network of the active node.
> The pingd score is reflected correctly, but pengine remains blocked.
>
>
> ~ # esxcfg-vswitch -N vmnic1 -p "ap-db" vSwitch1
> ~ # esxcfg-vswitch -N vmnic2 -p "ap-db" vSwitch1
>
>
> ============
> Last updated: Mon May 13 14:20:32 2013
> Stack: Heartbeat
> Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with quorum
> Version: 1.0.13-30bb726
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> ============
>
> Online: [ pgsr01 pgsr02 ]
>
> Resource Group: test-group
> Dummy1 (ocf::pacemaker:Dummy): Started pgsr01
> Dummy2 (ocf::pacemaker:Dummy): Started pgsr01
> Clone Set: clnPingd
> Started: [ pgsr01 pgsr02 ]
>
> Node Attributes:
> * Node pgsr01:
> + default_ping_set : 0 : Connectivity is lost
> * Node pgsr02:
> + default_ping_set : 100
>
> Migration summary:
> * Node pgsr01:
> * Node pgsr02:
>
> --------- The pengine process remains blocked -----
>
>
> After that, the resources are not moved to the standby node, because no transition can be computed while pengine remains blocked.
> In the vSphere environment the block is released only after a considerable time, and a transition is finally generated.
> * The I/O blocking of pengine seems to occur repeatedly.
> * Other processes may be blocked as well.
> * It took more than one hour from the disk failure to failover completion.
>
> This problem shows that resources may not fail over after a disk failure in a vSphere environment.
>
> Because our users want to run Pacemaker in vSphere environments, a solution to this problem is necessary.
>
> Do you know of any example where a similar problem was solved on vSphere?
>
> If there is no known solution, we think it is necessary to avoid the blocking of pengine itself.
>
> For example...
> 1. crmd could watch its request to pengine with a timer (a rough sketch follows below)...
> 2. pengine could perform its writes under a timer and watch the processing itself....
> ..etc...
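>
> A rough, untested sketch of idea 1, using the GLib main loop that crmd already runs. All function and variable names below are illustrative only, not the real crmd symbols:
>
> /* Untested sketch of idea 1: arm a one-shot timer when crmd sends a query to
>  * pengine, and cancel it when the reply arrives.  Names are illustrative. */
> #include <glib.h>
>
> #define PE_REPLY_TIMEOUT_MS (60 * 1000)    /* how long crmd is willing to wait */
>
> static guint pe_reply_timer = 0;
>
> /* Fired by the main loop if pengine has not answered within the timeout. */
> static gboolean
> pe_reply_timeout_cb(gpointer user_data)
> {
>     g_warning("pengine did not reply within %d ms - it may be blocked on I/O",
>               PE_REPLY_TIMEOUT_MS);
>     /* Here crmd could escalate: log an error, restart the pengine child,
>      * or trigger recovery of the DC. */
>     pe_reply_timer = 0;
>     return FALSE;    /* one-shot: do not re-arm */
> }
>
> /* Call right after the query has been sent to pengine. */
> static void
> pe_query_sent(void)
> {
>     if (pe_reply_timer == 0)
>         pe_reply_timer = g_timeout_add(PE_REPLY_TIMEOUT_MS,
>                                        pe_reply_timeout_cb, NULL);
> }
>
> /* Call when the reply from pengine arrives in time. */
> static void
> pe_reply_received(void)
> {
>     if (pe_reply_timer != 0) {
>         g_source_remove(pe_reply_timer);
>         pe_reply_timer = 0;
>     }
> }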
>
> * This problem does not seem to occur on KVM.
> * The difference may be due to the hypervisor.
> * In addition, the problem did not occur on a physical Linux machine.
>
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org