[Pacemaker] [Question and Problem] In a vSphere 5.1 environment, pengine IO blocks for a long time when the shared disk fails.
renayama19661014 at ybb.ne.jp
Mon May 13 06:14:24 UTC 2013
Hi All,
We built a simple cluster in a vSphere 5.1 environment.
It consists of two ESXi servers and a shared disk.
The guest VMs are placed on the shared disk.
Step 1) Build the cluster. (The DC node is the active node.)
============
Last updated: Mon May 13 14:16:09 2013
Stack: Heartbeat
Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with quorum
Version: 1.0.13-30bb726
2 Nodes configured, unknown expected votes
2 Resources configured.
============
Online: [ pgsr01 pgsr02 ]
Resource Group: test-group
    Dummy1      (ocf::pacemaker:Dummy): Started pgsr01
    Dummy2      (ocf::pacemaker:Dummy): Started pgsr01
Clone Set: clnPingd
    Started: [ pgsr01 pgsr02 ]
Node Attributes:
* Node pgsr01:
+ default_ping_set : 100
* Node pgsr02:
+ default_ping_set : 100
Migration summary:
* Node pgsr01:
* Node pgsr02:
Step 2) Attach strace to the pengine process on the DC node.
[root@pgsr01 ~]# ps -ef |grep heartbeat
root 2072 1 0 13:56 ? 00:00:00 heartbeat: master control process
root 2075 2072 0 13:56 ? 00:00:00 heartbeat: FIFO reader
root 2076 2072 0 13:56 ? 00:00:00 heartbeat: write: bcast eth1
root 2077 2072 0 13:56 ? 00:00:00 heartbeat: read: bcast eth1
root 2078 2072 0 13:56 ? 00:00:00 heartbeat: write: bcast eth2
root 2079 2072 0 13:56 ? 00:00:00 heartbeat: read: bcast eth2
496 2082 2072 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/ccm
496 2083 2072 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/cib
root 2084 2072 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/lrmd -r
root 2085 2072 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/stonithd
496 2086 2072 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/attrd
496 2087 2072 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/crmd
496 2089 2087 0 13:57 ? 00:00:00 /usr/lib64/heartbeat/pengine
root 2182 1 0 14:15 ? 00:00:00 /usr/lib64/heartbeat/pingd -D -p /var/run//pingd-default_ping_set -a default_ping_set -d 5s -m 100 -i 1 -h 192.168.101.254
root 2287 1973 0 14:16 pts/0 00:00:00 grep heartbea
[root@pgsr01 ~]# strace -p 2089
Process 2089 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
times({tms_utime=5, tms_stime=6, tms_cutime=0, tms_cstime=0}) = 429527557
recvfrom(5, 0xa93ff7, 953, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
recvfrom(5, 0xa93ff7, 953, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
(snip)
Step 3) Disconnect the shared disk on which the active node's guest is placed.
Step 4) Cut off the pingd network of the standby node.
The pingd score is reflected correctly, but pengine's processing is blocked.
~ # esxcfg-vswitch -N vmnic1 -p "ap-db" vSwitch1
~ # esxcfg-vswitch -N vmnic2 -p "ap-db" vSwitch1
(snip)
brk(0xd05000) = 0xd05000
brk(0xeed000) = 0xeed000
brk(0xf2d000) = 0xf2d000
fstat(6, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f86a255a000
write(6, "BZh51AY&SY\327\373\370\203\0\t(_\200UPX\3\377\377%cT \277\377\377"..., 2243) = 2243
brk(0xb1d000) = 0xb1d000
fsync(6 ------------------------------> BLOCKED
(snip)
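The "BZh" magic in the write() above suggests that fd 6 is the bzip2-compressed pe-input file which pengine saves for each transition, and that the fsync() on it hangs because the guest's filesystem sits on the disconnected shared datastore. If that assumption is right, the hang should be reproducible outside Pacemaker with a tiny program that writes and fsyncs a file on the same filesystem. A minimal sketch (not Pacemaker code; the path is only illustrative):

/* Write and fsync a small file on the affected filesystem.  If the
 * underlying shared disk is blocked, fsync() is expected to hang in
 * the same way pengine's fsync(6) does above. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Illustrative path; point it at a filesystem backed by the shared disk. */
    const char *path = "/var/lib/pengine/fsync-test.tmp";
    const char buf[] = "fsync test\n";

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write(fd, buf, strlen(buf)) < 0) {
        perror("write");
        return 1;
    }
    fprintf(stderr, "calling fsync()...\n");
    if (fsync(fd) < 0) {        /* expected to block here while the disk is gone */
        perror("fsync");
        return 1;
    }
    fprintf(stderr, "fsync() returned\n");
    close(fd);
    return 0;
}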
============
Last updated: Mon May 13 14:19:15 2013
Stack: Heartbeat
Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with quorum
Version: 1.0.13-30bb726
2 Nodes configured, unknown expected votes
2 Resources configured.
============
Online: [ pgsr01 pgsr02 ]
Resource Group: test-group
    Dummy1      (ocf::pacemaker:Dummy): Started pgsr01
    Dummy2      (ocf::pacemaker:Dummy): Started pgsr01
Clone Set: clnPingd
    Started: [ pgsr01 pgsr02 ]
Node Attributes:
* Node pgsr01:
+ default_ping_set : 100
* Node pgsr02:
+ default_ping_set : 0 : Connectivity is lost
Migration summary:
* Node pgsr01:
* Node pgsr02:
Step 5) Reconnect the pingd network of the standby node.
The pingd score is reflected correctly, but pengine remains blocked.
~ # esxcfg-vswitch -M vmnic1 -p "ap-db" vSwitch1
~ # esxcfg-vswitch -M vmnic2 -p "ap-db" vSwitch1
============
Last updated: Mon May 13 14:19:40 2013
Stack: Heartbeat
Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with quorum
Version: 1.0.13-30bb726
2 Nodes configured, unknown expected votes
2 Resources configured.
============
Online: [ pgsr01 pgsr02 ]
Resource Group: test-group
    Dummy1      (ocf::pacemaker:Dummy): Started pgsr01
    Dummy2      (ocf::pacemaker:Dummy): Started pgsr01
Clone Set: clnPingd
    Started: [ pgsr01 pgsr02 ]
Node Attributes:
* Node pgsr01:
+ default_ping_set : 100
* Node pgsr02:
+ default_ping_set : 100
Migration summary:
* Node pgsr01:
* Node pgsr02:
--------- The blocked state of pengine continues -----
Step 6) Cut off the pingd network of the active node.
The pingd score is reflected correctly, but pengine remains blocked.
~ # esxcfg-vswitch -N vmnic1 -p "ap-db" vSwitch1
~ # esxcfg-vswitch -N vmnic2 -p "ap-db" vSwitch1
============
Last updated: Mon May 13 14:20:32 2013
Stack: Heartbeat
Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with quorum
Version: 1.0.13-30bb726
2 Nodes configured, unknown expected votes
2 Resources configured.
============
Online: [ pgsr01 pgsr02 ]
Resource Group: test-group
    Dummy1      (ocf::pacemaker:Dummy): Started pgsr01
    Dummy2      (ocf::pacemaker:Dummy): Started pgsr01
Clone Set: clnPingd
    Started: [ pgsr01 pgsr02 ]
Node Attributes:
* Node pgsr01:
+ default_ping_set : 0 : Connectivity is lost
* Node pgsr02:
+ default_ping_set : 100
Migration summary:
* Node pgsr01:
* Node pgsr02:
--------- The blocked state of pengine continues -----
After that, the resources do not move to the standby node, because pengine remains blocked and no transition is produced.
In the vSphere environment the block is only released after a considerable time has passed, and only then is a transition generated.
* The IO blocking of pengine seems to occur repeatedly.
* Other processes may be blocked as well.
* It took more than one hour from the failure until failover completed.
This problem shows that resources may fail to move after disk trouble in a vSphere environment.
Because our users want to run Pacemaker in vSphere environments, a solution to this problem is necessary.
Do you know of any case where a similar problem was solved on vSphere?
If there is no known solution, we think the blocking of pengine needs to be avoided somehow.
For example...
1. crmd could watch its requests to pengine with a timer...
2. pengine could watch its own writes and processing with a timer...
..etc... (a rough sketch of idea 2 is shown below)
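As a very rough illustration of idea 2 (not actual Pacemaker code; the timeout, path, and function name are assumptions), the potentially blocking write and fsync could be done in a child process that the parent watches with a timer, so the daemon can at least log the stall and react instead of hanging silently:

/* Minimal sketch: perform the write+fsync in a child and watch it with
 * a timer in the parent. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define WRITE_TIMEOUT_SEC 30    /* assumed timeout, for illustration only */

static int write_with_watchdog(const char *path, const char *buf, size_t len)
{
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: the write + fsync that may block on a hung datastore. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0 || write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
            _exit(1);
        }
        close(fd);
        _exit(0);
    }

    /* Parent: poll the child with a timer instead of blocking forever. */
    for (int waited = 0; waited < WRITE_TIMEOUT_SEC; waited++) {
        int status = 0;
        if (waitpid(pid, &status, WNOHANG) == pid) {
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
        }
        sleep(1);
    }
    fprintf(stderr, "write of %s did not finish within %d seconds; "
            "the disk may be blocked\n", path, WRITE_TIMEOUT_SEC);
    return -1;  /* the caller can decide to retry, skip the write, or escalate */
}

int main(void)
{
    const char *data = "example transition data\n";
    /* Illustrative path; pengine normally writes pe-input files under /var/lib/pengine. */
    return write_with_watchdog("/tmp/pe-input-example.bz2", data, strlen(data)) ? 1 : 0;
}

Of course a real fix would have to live inside crmd/pengine; this only shows the timer idea itself.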
* This problem does not seem to occur on KVM.
* The difference may come from the hypervisor.
* In addition, the problem did not occur on a physical Linux machine.
Best Regards,
Hideo Yamauchi.