[Pacemaker] [Openais] very slow pacemaker/corosync shutdown
Lists
lists at benjamindsmith.com
Thu Sep 19 22:19:57 UTC 2013
On 09/18/2013 06:49 PM, Andrew Beekhof wrote:
> On 19/09/2013, at 8:25 AM, David Lang <david at lang.hm> wrote:
>
>> What's the best way to see what it's getting stuck doing?
> Log files.
>
>> Is there a good way to tell if this is a pacemaker or corosync problem (so I can drop one of the lists from the thread)?
> Not without further information
>
We've had the same problem here while trying to get an HA DNS/named
service working. It works great for a day or so, then seizes up: simple
commands like `crm_standby -v true` time out after 120 seconds, etc.
We're testing for release and keep running into issues like this. At
first we suspected firewall issues, but even after confirming operation
and several hand-offs of HA services back and forth, it still dies
within a day or so.
We're on CentOS 6/64 with yum packages augmented from
http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/RedHat_RHEL-6/
with exclude=pacemaker* corosync*
To make the log files available, I've snipped out a time period during
which the cluster becomes unresponsive; it is visible at
http://hal.schoolpathways.com/details/
I don't know the exact moment it happens, since this is a test cluster
and is not being monitored by a netmon. Are there any other details I
could provide that would be useful/helpful?
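
One way to pin down the moment would be a small probe loop that runs a
cluster command on a schedule and records a timestamp whenever it stops
answering. The following is only a rough sketch, not something we have
run on this cluster. It assumes Python 3 is available on the node (it is
not the CentOS 6 default) and that a one-shot `crm_mon -1` is a fair
stand-in for "the cluster is responsive". The names `probe`, `log_line`
and `watch`, and the interval and timeout values, are made up for
illustration.

```python
import subprocess
import time
from datetime import datetime, timezone


def probe(cmd, timeout_s):
    """Run cmd once; return (status, elapsed_seconds).

    status is "ok" (exit code 0), "failed" (non-zero exit or command
    not found), or "timeout" (no exit within timeout_s seconds).
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        status = "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "timeout"
    except FileNotFoundError:
        status = "failed"
    return status, time.monotonic() - start


def log_line(status, elapsed):
    """Format one UTC-timestamped line so it can be matched against the cluster logs."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{stamp} status={status} elapsed={elapsed:.1f}s"


def watch(cmd, interval_s=60, timeout_s=30, max_probes=None):
    """Probe repeatedly, printing one line per probe.

    Runs forever when max_probes is None; otherwise stops after that
    many probes. Returns the lines that were printed.
    """
    lines = []
    count = 0
    while max_probes is None or count < max_probes:
        status, elapsed = probe(cmd, timeout_s)
        line = log_line(status, elapsed)
        print(line, flush=True)
        lines.append(line)
        count += 1
        time.sleep(interval_s)
    return lines
```

Calling `watch(["crm_mon", "-1"])` on one node and redirecting the output
to a file should leave a record in which the first `status=timeout` line
marks roughly when things seized up, to within one probe interval.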