[Pacemaker] [Problem] A rebooted node does not recognize the state of a node it has been disconnected from.
renayama19661014 at ybb.ne.jp
Tue Jun 4 05:17:51 UTC 2013
Hi All,
I have filed this problem in Bugzilla:
* http://bugs.clusterlabs.org/show_bug.cgi?id=5160
Best Regards,
Hideo Yamauchi.
--- On Tue, 2013/6/4, renayama19661014 at ybb.ne.jp <renayama19661014 at ybb.ne.jp> wrote:
> Hi All,
>
> We checked how the cluster recognizes node states with the following procedure.
> We confirmed it with the following combination (RHEL 6.4 guests):
> * corosync-2.3.0
> * pacemaker-Pacemaker-1.1.10-rc3
>
> -------------------------
>
> Step 1) Start all nodes and form the cluster (a start sketch follows the output below).
>
> [root at rh64-coro1 ~]# crm_mon -1 -Af
> Last updated: Tue Jun 4 22:30:25 2013
> Last change: Tue Jun 4 22:22:54 2013 via crmd on rh64-coro1
> Stack: corosync
> Current DC: rh64-coro3 (4231178432) - partition with quorum
> Version: 1.1.9-db294e1
> 3 Nodes configured, unknown expected votes
> 0 Resources configured.
>
>
> Online: [ rh64-coro1 rh64-coro2 rh64-coro3 ]
>
>
> Node Attributes:
> * Node rh64-coro1:
> * Node rh64-coro2:
> * Node rh64-coro3:
>
> Migration summary:
> * Node rh64-coro1:
> * Node rh64-coro3:
> * Node rh64-coro2:
>
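> For reference, a start sequence like the following is enough to reproduce Step 1. This is only a sketch, assuming the stock init scripts shipped with corosync 2.x and Pacemaker on RHEL 6; the exact method is environment-specific:
>
> # run on every node (rh64-coro1, rh64-coro2, rh64-coro3)
> service corosync start
> service pacemaker start
> # the same two commands bring rh64-coro1 back up again in Step 3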
>
> Step 2) Stop the first node, rh64-coro1 (a stop sketch follows the output below).
>
> [root at rh64-coro2 ~]# crm_mon -1 -Af
> Last updated: Tue Jun 4 22:30:55 2013
> Last change: Tue Jun 4 22:22:54 2013 via crmd on rh64-coro1
> Stack: corosync
> Current DC: rh64-coro3 (4231178432) - partition with quorum
> Version: 1.1.9-db294e1
> 3 Nodes configured, unknown expected votes
> 0 Resources configured.
>
>
> Online: [ rh64-coro2 rh64-coro3 ]
> OFFLINE: [ rh64-coro1 ]
>
>
> Node Attributes:
> * Node rh64-coro2:
> * Node rh64-coro3:
>
> Migration summary:
> * Node rh64-coro3:
> * Node rh64-coro2:
>
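> A stop sketch for Step 2, under the same init-script assumption (Pacemaker is stopped before corosync):
>
> # run on rh64-coro1 only
> service pacemaker stop
> service corosync stop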
>
> Step 3) Restart the first node, rh64-coro1.
>
> [root at rh64-coro1 ~]# crm_mon -1 -Af
> Last updated: Tue Jun 4 22:31:29 2013
> Last change: Tue Jun 4 22:22:54 2013 via crmd on rh64-coro1
> Stack: corosync
> Current DC: rh64-coro3 (4231178432) - partition with quorum
> Version: 1.1.9-db294e1
> 3 Nodes configured, unknown expected votes
> 0 Resources configured.
>
>
> Online: [ rh64-coro1 rh64-coro2 rh64-coro3 ]
>
>
> Node Attributes:
> * Node rh64-coro1:
> * Node rh64-coro2:
> * Node rh64-coro3:
>
> Migration summary:
> * Node rh64-coro1:
> * Node rh64-coro3:
> * Node rh64-coro2:
>
>
> Step 4) Interrupt the interconnects between all nodes (from the KVM host).
>
> [root at kvm-host ~]# brctl delif virbr2 vnet1;brctl delif virbr2 vnet4;brctl delif virbr2 vnet7;brctl delif virbr3 vnet2;brctl delif virbr3 vnet5;brctl delif virbr3 vnet8
>
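> The delif commands detach the guests' vnet interfaces from the host bridges, so every cluster link drops at once. To restore the links afterwards, the matching addif commands can be used (the vnetN-to-guest mapping is specific to this host):
>
> # run on kvm-host; reverse of the delif commands above
> brctl addif virbr2 vnet1; brctl addif virbr2 vnet4; brctl addif virbr2 vnet7
> brctl addif virbr3 vnet2; brctl addif virbr3 vnet5; brctl addif virbr3 vnet8
>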
> -------------------------
>
>
> The two nodes that were not rebooted correctly recognize the state of the other nodes:
>
> [root at rh64-coro2 ~]# crm_mon -1 -Af
> Last updated: Tue Jun 4 22:32:06 2013
> Last change: Tue Jun 4 22:22:54 2013 via crmd on rh64-coro1
> Stack: corosync
> Current DC: rh64-coro2 (4214401216) - partition WITHOUT quorum
> Version: 1.1.9-db294e1
> 3 Nodes configured, unknown expected votes
> 0 Resources configured.
>
>
> Node rh64-coro1 (4197624000): UNCLEAN (offline)
> Node rh64-coro3 (4231178432): UNCLEAN (offline)
> Online: [ rh64-coro2 ]
>
>
> Node Attributes:
> * Node rh64-coro2:
>
> Migration summary:
> * Node rh64-coro2:
>
> [root at rh64-coro3 ~]# crm_mon -1 -Af
> Last updated: Tue Jun 4 22:33:17 2013
> Last change: Tue Jun 4 22:22:54 2013 via crmd on rh64-coro1
> Stack: corosync
> Current DC: rh64-coro3 (4231178432) - partition WITHOUT quorum
> Version: 1.1.9-db294e1
> 3 Nodes configured, unknown expected votes
> 0 Resources configured.
>
>
> Node rh64-coro1 (4197624000): UNCLEAN (offline)
> Node rh64-coro2 (4214401216): UNCLEAN (offline)
> Online: [ rh64-coro3 ]
>
>
> Node Attributes:
> * Node rh64-coro3:
>
> Migration summary:
> * Node rh64-coro3:
>
>
> However, the rebooted node does not correctly recognize the state of one of the other nodes:
>
> [root at rh64-coro1 ~]# crm_mon -1 -Af
> Last updated: Tue Jun 4 22:33:31 2013
> Last change: Tue Jun 4 22:22:54 2013 via crmd on rh64-coro1
> Stack: corosync
> Current DC: rh64-coro1 (4197624000) - partition WITHOUT quorum
> Version: 1.1.9-db294e1
> 3 Nodes configured, unknown expected votes
> 0 Resources configured.
>
>
> Node rh64-coro3 (4231178432): UNCLEAN (offline)----------------> OK (correct).
> Online: [ rh64-coro1 rh64-coro2 ] ------------------------------> NG: rh64-coro2 should be UNCLEAN (offline).
>
>
> Node Attributes:
> * Node rh64-coro1:
> * Node rh64-coro2:
>
> Migration summary:
> * Node rh64-coro1:
> * Node rh64-coro2:
>
>
> The correct behaviour would be for the rebooted node to also see the other nodes as UNCLEAN, but it appears to recognize rh64-coro2 incorrectly.
>
> This looks like a problem in Pacemaker itself.
> * There seems to be a problem with the crm_peer_cache hash table.
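>
> A rough way to check this on the rebooted node is to compare corosync's membership with Pacemaker's node list, for example (assuming crm_node in this build supports -l/-p; this is only a diagnostic sketch):
>
> # on rh64-coro1 after Step 4
> corosync-cmapctl | grep members   # membership as corosync sees it
> crm_node -l                       # nodes as Pacemaker's peer cache sees them
> crm_node -p                       # nodes Pacemaker believes are in the current partition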
>
> Best Regards,
> Hideo Yamauchi.
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>