[Pacemaker] Corosync service taking 100% cpu and is unable to stop gracefully

Parshvi parshvi.17 at gmail.com
Thu Apr 19 13:14:23 UTC 2012


Dan Frincu <df.cluster at ...> writes:

> 
> Hi,
> 
> On Thu, Apr 19, 2012 at 2:11 PM, Parshvi <parshvi.17 <at> gmail.com> wrote:
> > Major issues:
> > 1) Corosync reaching over 100% cpu usage.
> > 2) Corosync unable to stop gracefully.
> > 3) Virtual IP of a resource being assigned as the primary IP on an interface,
> > after a cable disconnect/reconnect on that interface. The static IP on the
> > interface is shown as a global secondary IP.
> >
> > Use case:
> > 1) Two nodes in a cluster.
> > 2) Two communication paths exist between the two nodes, with “rrp_mode” set to
> > active in corosync.conf
> 
> Are both links of the same speed?
Yes. Each link runs at 1000 Mb/s.
> 
> >  a. One path is a back-to-back connection between the nodes.
> >  b. Second is  via the LAN network  switch.
> > 3) The network cable was unplugged on one of the nodes for a while (on both the
> > interfaces). It was reconnected after a short while.
> >
> > Observations:
> > 1) Corosync service was taking 100% cpu on the node whose link was down:
> 
> What version of Corosync? What OS?
Corosync Cluster Engine, version '1.2.7' SVN revision '3008'
OEL (Oracle Enterprise Linux release 5.6)
> 

> Can you pastebin.com your crm configure show?
I will do that in a follow-up mail.

Thanks for the quick response, Dan.

Here is a snapshot of top:

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4726 root      RT   0  201m 5576 2004 R 100.4  0.1  36:35.31 corosync

Logs and core file have been saved and can be posted if required.
My responses are inline.
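
For reference, here is a minimal sketch of what a two-ring totem section with rrp_mode set to active typically looks like in corosync.conf on Corosync 1.x. It only illustrates the redundant ring setup described under "Use case". The network addresses, multicast addresses and ports below are placeholders chosen for illustration, not the values from this cluster:

```
totem {
    version: 2
    # Use both rings simultaneously (as opposed to "passive" or "none")
    rrp_mode: active

    # Ring 0: assumed here to be the back-to-back link (placeholder subnet)
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.100.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }

    # Ring 1: assumed here to be the path via the LAN switch (placeholder subnet)
    interface {
        ringnumber: 1
        bindnetaddr: 10.0.0.0
        mcastaddr: 226.94.1.2
        mcastport: 5407
    }
}
```

Each ring gets its own multicast address and port so that traffic on the two paths stays separate.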
