[Pacemaker] Corosync service taking 100% cpu and is unable to stop gracefully
Dan Frincu
df.cluster at gmail.com
Thu Apr 19 13:03:58 UTC 2012
Hi,
On Thu, Apr 19, 2012 at 2:11 PM, Parshvi <parshvi.17 at gmail.com> wrote:
> Major issues:
> 1) Corosync reaching over 100% cpu usage.
> 2) Corosync unable to stop gracefully.
> 3) Virtual IP of a resource being assigned as the primary IP on an interface,
> after a cable disconnect/reconnect on that interface. The static IP on the
> interface is shown as a global secondary IP.
>
> Use case:
> 1) Two nodes in a cluster.
> 2) Two communication paths exist between the two nodes, with “rrp_mode” set to
> active in corosync.conf
Are both links of the same speed?
> a. One path is a back-to-back connection between the nodes.
> b. Second is via the LAN network switch.
> 3) The network cable was unplugged on one of the nodes for a while (on both the
> interfaces). It was reconnected after a short while.
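For reference, a two-ring setup like the one described would look roughly like
the fragment below. This is only a sketch in Corosync 1.x syntax: the network
addresses, multicast addresses and ports are placeholders I made up, not values
from your corosync.conf, so please compare it against what you actually have.

```
totem {
        version: 2
        # both rings carry traffic at the same time
        rrp_mode: active

        # ring 0: back-to-back link (placeholder network)
        interface {
                ringnumber: 0
                bindnetaddr: 10.0.0.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }

        # ring 1: link via the LAN switch (placeholder network)
        interface {
                ringnumber: 1
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.2
                mcastport: 5407
        }
}
```

With rrp_mode set to active, both rings are used simultaneously, which is why
the relative speed of the two links matters.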
>
> Observations:
> 1) Corosync service was taking 100% cpu on the node whose link was down:
What version of Corosync? What OS?
> a. In the above scenario Corosync service could not be stopped gracefully. A
> SIGKILL had to be issued to stop the service.
> b. On this node, one of the two interfaces configured in corosync.conf was
> also the preferred interface (eth) for the Virtual IP.
> i. It was observed that when the link was up after a disconnection, the
> primary global IP on that interface was the Virtual IP configured for a
> resource.
> ii. The static IP assigned to the interface was listed as “scope global
> secondary” in the output of `ip addr show`.
> iii. Also, the Virtual IPs of the resources configured in Pacemaker were
> active on both nodes.
Can you post the output of `crm configure show` on pastebin.com?
> iv. `service network restart` also did not work.
> c. Corosync service was stopped (killed, since it could not be stopped
> gracefully), the network service was restarted and then Corosync was
> restarted. All good after this.
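To catch the address-ordering problem from 1.b before it bites again, something
along these lines could be run after a link flap. It is a minimal sketch, not a
tested tool: it assumes the one-line-per-address format of `ip -o -4 addr show`,
and the function names and the static address passed in are hypothetical, so
substitute your own.

```python
import subprocess


def parse_ip_addr(output):
    """Parse `ip -o -4 addr show` text into (ifname, cidr, is_secondary) tuples."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        if "inet" not in fields:
            continue  # skip blank lines and anything that is not an IPv4 address line
        ifname = fields[1]
        cidr = fields[fields.index("inet") + 1]
        entries.append((ifname, cidr, "secondary" in fields))
    return entries


def misordered(entries, static_ips):
    """Return (ifname, ip) pairs where an address expected to be static is secondary."""
    bad = []
    for ifname, cidr, is_secondary in entries:
        ip = cidr.split("/")[0]
        if is_secondary and ip in static_ips:
            bad.append((ifname, ip))
    return bad


def check_live(static_ips):
    """Run `ip` on this host and report static addresses that ended up secondary."""
    out = subprocess.check_output(["ip", "-o", "-4", "addr", "show"]).decode()
    return misordered(parse_ip_addr(out), static_ips)
```

Calling `check_live({"192.168.1.10"})` (placeholder address) on the affected node
returns an empty list when the static IP is still the primary address, and the
interface/address pair when it has been demoted to "scope global secondary".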
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
--
Dan Frincu
CCNA, RHCE