[Pacemaker] Corosync service taking 100% cpu and is unable to stop gracefully
Parshvi
parshvi.17 at gmail.com
Thu Apr 19 13:14:23 UTC 2012
Dan Frincu <df.cluster at ...> writes:
>
> Hi,
>
> On Thu, Apr 19, 2012 at 2:11 PM, Parshvi <parshvi.17 <at> gmail.com> wrote:
> > Major issues:
> > 1) Corosync reaching over 100% cpu usage.
> > 2) Corosync unable to stop gracefully.
> > 3) Virtual IP of a resource being assigned as the primary IP on an
> > interface after a cable disconnect/reconnect on that interface. The
> > static IP on the interface is shown as a global secondary IP.
> >
> > Use case:
> > 1) Two nodes in a cluster.
> > 2) Two communication paths exist between the two nodes, with “rrp_mode”
> > set to active in corosync.conf
>
> Are both links of the same speed?
Yes. Each link runs at 1000 Mb/s.
>
> > a. One path is a back-to-back connection between the nodes.
> > b. Second is via the LAN network switch.
> > 3) The network cable was unplugged on one of the nodes (on both
> > interfaces). It was reconnected after a short while.
> >
> > Observations:
> > 1) Corosync service was taking 100% cpu on the node whose link was down:
>
> What version of Corosync? What OS?
Corosync Cluster Engine, version '1.2.7' SVN revision '3008'
OEL (Oracle Enterprise Linux release 5.6)
>
> Can you pastebin.com your crm configure show?
I will do that in a follow-up mail.
Thanks for the quick response, Dan.
Here is a snapshot of top:
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
 4726 root      RT   0  201m 5576 2004 R 100.4  0.1  36:35.31 corosync
Logs and core file have been saved and can be posted if required.
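For anyone who wants to watch for the same symptom automatically, here is a minimal Python sketch that parses one row of top output like the snapshot above and flags a process that is pegging a CPU. The column list is copied from the header in that snapshot; the helper names (parse_top_row, is_cpu_bound) and the 95% threshold are my own assumptions for illustration, not part of Corosync or Pacemaker.

```python
# Column names copied from the top header in the snapshot above.
TOP_COLUMNS = ["PID", "USER", "PR", "NI", "VIRT", "RES",
               "SHR", "S", "%CPU", "%MEM", "TIME+", "COMMAND"]


def parse_top_row(line):
    """Split one process row of top output into a dict keyed by column name."""
    # COMMAND may contain spaces, so split at most len(TOP_COLUMNS) - 1 times.
    fields = line.split(None, len(TOP_COLUMNS) - 1)
    if len(fields) != len(TOP_COLUMNS):
        raise ValueError("unexpected top row: %r" % line)
    return dict(zip(TOP_COLUMNS, fields))


def is_cpu_bound(row, threshold=95.0):
    """Return True if %CPU is at or above the threshold (an assumed value)."""
    return float(row["%CPU"]) >= threshold


row = parse_top_row(
    "4726 root RT 0 201m 5576 2004 R 100.4 0.1 36:35.31 corosync")
print(row["COMMAND"], row["%CPU"], is_cpu_bound(row))
```

Run against the row from the snapshot, this reports corosync as CPU-bound; the threshold would need tuning for a real monitoring check.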
My response inline.