[Pacemaker] S_POLICY_ENGINE state continues being maintained
Kazunori INOUE
inouekazu at intellilink.co.jp
Thu May 23 06:44:27 UTC 2013
Hi,
I'm using pacemaker-1.1 (c3486a4a8d, the latest devel).
After fencing triggered by a split-brain fails 11 times, crmd stays in the S_POLICY_ENGINE state even after the split-brain is resolved.
1. Disconnect the network connection.
[dev1 ~]$ crm_mon
Last updated: Thu May 23 13:16:41 2013
Last change: Thu May 23 13:15:30 2013 via cibadmin on dev1
Stack: corosync
Current DC: dev1 (3232261525) - partition WITHOUT quorum
Version: 1.1.10-0.122.c3486a4.git.el6-c3486a4
2 Nodes configured, unknown expected votes
2 Resources configured.
Node dev2 (3232261523): UNCLEAN (offline)
Online: [ dev1 ]
f1 (stonith:external/libvirt.NG): Started dev2
f2 (stonith:external/libvirt.NG): Started dev1
[dev2 ~]$ crm_mon
Last updated: Thu May 23 13:16:41 2013
Last change: Thu May 23 13:15:30 2013 via cibadmin on dev1
Stack: corosync
Current DC: dev2 (3232261523) - partition WITHOUT quorum
Version: 1.1.10-0.122.c3486a4.git.el6-c3486a4
2 Nodes configured, unknown expected votes
2 Resources configured.
Node dev1 (3232261525): UNCLEAN (offline)
Online: [ dev2 ]
f1 (stonith:external/libvirt.NG): Started dev2
f2 (stonith:external/libvirt.NG): Started dev1
2. Wait until fencing has failed 11 times.
[dev1 ~]$ egrep "CRIT|too_many_st_failures" /var/log/ha-log
May 23 13:16:46 dev1 stonith: [24981]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev2 failed with rc 1
(snip)
May 23 13:17:24 dev1 stonith: [25105]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev2 failed with rc 1
May 23 13:17:28 dev1 stonith: [25118]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev2 failed with rc 1
May 23 13:17:28 dev1 crmd[24868]: notice: too_many_st_failures: Too many failures to fence dev2 (11), giving up
[dev2 ~]$ egrep "CRIT|too_many_st_failures" /var/log/ha-log
May 23 13:16:46 dev2 stonith: [7177]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev1 failed with rc 1
(snip)
May 23 13:17:23 dev2 stonith: [7295]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev1 failed with rc 1
May 23 13:17:28 dev2 stonith: [7309]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev1 failed with rc 1
May 23 13:17:28 dev2 crmd[7107]: notice: too_many_st_failures: Too many failures to fence dev1 (11), giving up
3. Restore the network connection.
[dev1 ~]$ crm_mon
Last updated: Thu May 23 13:24:23 2013
Last change: Thu May 23 13:15:30 2013 via cibadmin on dev1
Stack: corosync
Current DC: dev2 (3232261523) - partition with quorum
Version: 1.1.10-0.122.c3486a4.git.el6-c3486a4
2 Nodes configured, unknown expected votes
2 Resources configured.
Online: [ dev1 dev2 ]
f1 (stonith:external/libvirt.NG): Started dev2
f2 (stonith:external/libvirt.NG): Started dev1
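The failure count reported by too_many_st_failures in step 2 can be cross-checked against the CRIT lines in the log. The following is only a rough sketch: count_fence_failures is a hypothetical helper name (not a Pacemaker tool), and it assumes every failed attempt is logged in the external_reset_req format shown above.

```shell
#!/bin/sh
# Count failed fencing attempts per target host in an ha-log style file.
# Assumed line format (taken from the excerpts in this post):
#   ... CRIT: external_reset_req: '<plugin> reset' for host <name> failed with rc <n>
# count_fence_failures is a hypothetical helper, not part of Pacemaker.
count_fence_failures() {
    # $1: path to the log file
    awk '/CRIT: external_reset_req:/ && /failed with rc/ {
             # The target host is the field right after the word "host".
             for (i = 1; i < NF; i++)
                 if ($i == "host") { n[$(i + 1)]++; break }
         }
         END { for (h in n) print h, n[h] }' "$1" | sort
}
```

For example, `count_fence_failures /var/log/ha-log` on dev1 should print one line per fencing target with the number of failed attempts, which can then be compared with the number in the "Too many failures to fence" message.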
The crmd on dev2 remains in the S_POLICY_ENGINE state, although the node join appears to have succeeded.
[13:47:54 root@dev1 ~]$ crmadmin -S dev2
Status of crmd at dev2: S_POLICY_ENGINE (ok)
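To watch whether the DC ever leaves S_POLICY_ENGINE, the crmadmin check above could be repeated in a loop. This is a minimal sketch under assumptions: crmd_state and wait_for_idle are hypothetical helper names, the output is assumed to keep the "Status of crmd at ..." format shown above, and S_IDLE is assumed to be the state a healthy DC settles into.

```shell
#!/bin/sh
# Extract the state name from a line such as
#   Status of crmd at dev2: S_POLICY_ENGINE (ok)
# Reads from stdin. crmd_state is a hypothetical helper name.
crmd_state() {
    sed -n 's/^Status of crmd at [^:]*: \([A-Z_]*\).*$/\1/p'
}

# Poll a node until its crmd reports S_IDLE or the attempts run out.
# $1: node name, $2: number of attempts (default 30), 2 seconds apart.
# wait_for_idle is a hypothetical helper name.
wait_for_idle() {
    node=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # Merge stderr into stdout in case the status line is printed there.
        state=$(crmadmin -S "$node" 2>&1 | crmd_state)
        [ "$state" = "S_IDLE" ] && return 0
        echo "crmd at $node is still in ${state:-unknown}" >&2
        tries=$((tries - 1))
        sleep 2
    done
    return 1
}
```

Running `wait_for_idle dev2 60` after step 3 would return non-zero in the situation described here, because the state never changes from S_POLICY_ENGINE.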
Best Regards,
Kazunori INOUE
-------------- next part --------------
A non-text attachment was scrubbed...
Name: keeping-S_POLICY_ENGINE.tar.bz2
Type: application/octet-stream
Size: 169059 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20130523/c354253b/attachment-0003.obj>