[Pacemaker] Remote node did not respond

Andrew Beekhof andrew at beekhof.net
Tue Jun 22 04:43:03 EDT 2010


Without logs you're not going to get (m)any useful replies.
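
As a rough illustration of what is worth attaching, the sketch below pulls
only the cluster daemons' lines out of a syslog file. It is an assumption
about your setup, not a description of any Pacemaker tool: the daemon list,
the syslog line format and the function name cluster_log_lines are all
illustrative, and your cluster may log to a different file or in a
different format.

```python
import re

# Assumed daemon names for a Pacemaker 1.0 / OpenAIS stack; adjust as needed.
CLUSTER_DAEMONS = ("crmd", "lrmd", "pengine", "cib", "stonithd", "attrd")


def cluster_log_lines(lines, daemons=CLUSTER_DAEMONS):
    """Return only the lines whose syslog tag names one of the given daemons.

    Assumes the usual syslog layout, where the tag appears as "name:" or
    "name[pid]:" somewhere after the timestamp and host name.
    """
    names = "|".join(re.escape(name) for name in daemons)
    tag = re.compile(r"\b(?:%s)(?:\[\d+\])?:" % names)
    return [line for line in lines if tag.search(line)]
```

To use it, open the log file from each node (often /var/log/messages, but
that path is an assumption too), pass the file object to
cluster_log_lines(), and post the result covering the time of the failure.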

2010/6/21 lepace <lepace at 163.com>:
> Hi all,
> I have set up a two-node cluster. At first it works fine, but when I write
> a lot of data to the filesystem (a Pacemaker resource) and the filesystem is
> almost full, the cluster collapses. Running crm_mon shows:
> ============
> Last updated: Mon Jun 21 16:57:09 2010
> Stack: openais
> Current DC: mds2 - partition with quorum
> Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
> 2 Nodes configured, 2 expected votes
> 6 Resources configured.
> ============
> Online: [ mds1 mds2 ]
> ipmi_mds1 (stonith:external/ipmi) Started [ mds1    mds2 ]
> ipmi_mds2 (stonith:external/ipmi):        Started mds1 FAILED
> Resource Group: web_server
>     virtual_ip  (ocf::heartbeat:IPaddr) Started [ mds1    mds2 ]
>     apache (ocf::heartbeat:apache) Started [ mds1    mds2 ]
> Clone Set: pingd_manage_net
>     manage_pingd:0 (ocf::pacemaker:pingd): Started mds1 FAILED
>         Started: [ mds2 ]
> Clone Set: pingd_data_net
>     data_pingd:0        (ocf::pacemaker:pingd): Started mds1 FAILED
>         Started: [ mds2 ]
> metavol_mpath0  (ocf::heartbeat:Filesystem) Started [   mds1    mds2 ]
> Failed actions:
>     ipmi_mds1_monitor_0 (node=mds1, call=-1, rc=1, status=Timed Out):
> unknown error
>     ipmi_mds2_monitor_0 (node=mds1, call=-1, rc=1, status=Timed Out):
> unknown error
>     virtual_ip_monitor_0 (node=mds1, call=-1, rc=1, status=Timed Out):
> unknown error
>     apache_monitor_0 (node=mds1, call=-1, rc=1, status=Timed Out): unknown
> error
>     manage_pingd:0_monitor_0 (node=mds1, call=-1, rc=1, status=Timed Out):
> unknown error
>     data_pingd:0_monitor_0 (node=mds1, call=-1, rc=1, status=Timed Out):
> unknown error
>     metavol_mpath0_monitor_0 (node=mds1, call=-1, rc=1, status=Timed Out):
> unknown error
> Then I ran "crm resource start ipmi_mds2" and waited about 5 minutes, after
> which it reported: Error performing operation: Remote node did not respond
> What is the reason, and how can I avoid this situation?
>
>
>
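
The "Failed actions" section quoted above can be summarized per node with a
few lines of Python. This is only a sketch: the regular expression is
written against the entry layout shown in this particular crm_mon output,
and the names failed_actions and failures_by_node are made up for the
example.

```python
import re
from collections import Counter

# Matches entries such as:
#   ipmi_mds1_monitor_0 (node=mds1, call=-1, rc=1, status=Timed Out)
FAILED_ACTION = re.compile(
    r"(?P<op>\S+) \(node=(?P<node>[^,]+), call=(?P<call>-?\d+), "
    r"rc=(?P<rc>\d+), status=(?P<status>[^)]+)\)"
)


def failed_actions(text):
    """Extract (operation, node, status) tuples from crm_mon text."""
    return [
        (m.group("op"), m.group("node"), m.group("status"))
        for m in FAILED_ACTION.finditer(text)
    ]


def failures_by_node(text):
    """Count failed actions per (node, status) pair."""
    return Counter((node, status) for _, node, status in failed_actions(text))
```

Applied to the output quoted above, all seven entries are monitor_0
operations on mds1 with status "Timed Out", so mds1's logs are the ones to
look at first.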
