[Pacemaker] Corosync fails to start when NIC is absent

Jan Friesse jfriesse at redhat.com
Wed Jan 14 11:59:53 CET 2015


Kostiantyn,

> Honza,
> 
> Thank you for helping me.
> So, there is no defined behavior in case one of the interfaces is not in
> the system?

You are right. There is no defined behavior.
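
Since the behavior is undefined, the practical option is to avoid the
situation: before starting corosync, check that every ring has an address
that is actually assigned on the local host. Below is a rough sketch of
such a check, not something corosync ships. The function names are
hypothetical, it assumes the rings are given as ringN_addr entries in a
nodelist, and it assumes that "can bind a UDP socket to the address" is a
good enough test for "the address is present" (this can be fooled if
non-local binds are enabled on the host, and it says nothing else about
the state of the interface).

```python
import re
import socket

# Matches lines such as "ring0_addr: 169.254.1.3" (commented-out lines are skipped).
RING_ADDR = re.compile(r"^\s*(ring\d+_addr)\s*:\s*(\S+)\s*$", re.MULTILINE)


def ring_addrs(conf_text):
    """Return (key, address) pairs for every ringN_addr line in the config text."""
    return RING_ADDR.findall(conf_text)


def is_local(addr):
    """Assumption: an address is 'present' if a UDP socket can be bound to it."""
    try:
        infos = socket.getaddrinfo(addr, 0, type=socket.SOCK_DGRAM)
    except socket.gaierror:
        return False
    for family, socktype, proto, _, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.bind(sockaddr)
                return True
        except OSError:
            continue
    return False


def rings_without_local_addr(conf_text, local_check=is_local):
    """Return the ring keys for which none of the listed addresses is local.

    The nodelist holds the addresses of all nodes, so a ring counts as fine
    when at least one of its addresses belongs to this host.
    """
    by_ring = {}
    for key, addr in ring_addrs(conf_text):
        by_ring.setdefault(key, []).append(addr)
    return sorted(
        key for key, addrs in by_ring.items()
        if not any(local_check(a) for a in addrs)
    )
```

An init script wrapper could read the config (usually
/etc/corosync/corosync.conf), call rings_without_local_addr() on its
contents, and refuse to start corosync if the returned list is not empty.
The local_check parameter exists only so the logic can be exercised
without touching real interfaces.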

Regards,
  Honza


> 
> 
> Thank you,
> Kostya
> 
> On Tue, Jan 13, 2015 at 12:01 PM, Jan Friesse <jfriesse at redhat.com> wrote:
> 
>> Kostiantyn,
>>
>>
>>> According to https://access.redhat.com/solutions/638843 , the
>>> interface that is defined in corosync.conf must be present in the
>>> system (see the "ROOT CAUSE" section at the bottom of the article).
>>> To confirm that, I ran a couple of tests.
>>>
>>> Here is the relevant part of the corosync.conf file, paraphrased (the
>>> original config file is also attached):
>>> ===============================
>>> rrp_mode: passive
>>> ring0_addr is defined in corosync.conf
>>> ring1_addr is defined in corosync.conf
>>> ===============================
>>>
>>> -------------------------------
>>>
>>> Two-node cluster
>>>
>>> -------------------------------
>>>
>>> Test #1:
>>> --------------------------------------------------
>>> IP for ring0 is not defined in the system:
>>> --------------------------------------------------
>>> Start Corosync simultaneously on both nodes.
>>> Corosync fails to start.
>>> From the logs:
>>> Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] parse error in
>>> config: No interfaces defined
>>> Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] Corosync Cluster
>>> Engine exiting with status 8 at main.c:1343.
>>> Result: Corosync and Pacemaker are not running.
>>>
>>> Test #2:
>>> --------------------------------------------------
>>> IP for ring1 is not defined in the system:
>>> --------------------------------------------------
>>> Start Corosync simultaneously on both nodes.
>>> Corosync starts.
>>> Start Pacemaker simultaneously on both nodes.
>>> Pacemaker fails to start.
>>> From the logs, the last entries written by corosync:
>>> Jan 8 16:31:29 daemon.err<27> corosync[3728]: [TOTEM ] Marking ringid 0
>>> interface 169.254.1.3 FAULTY
>>> Jan 8 16:31:30 daemon.notice<29> corosync[3728]: [TOTEM ] Automatically
>>> recovered ring 0
>>> Result: Corosync and Pacemaker are not running.
>>>
>>>
>>> Test #3:
>>>
>>> "rrp_mode: active" leads to the same result, except Corosync and Pacemaker
>>> init scripts return status "running".
>>> But still "vim /var/log/cluster/corosync.log" shows a lot of errors like:
>>> Jan 08 16:30:47 [4067] A6-402-1 cib: error: pcmk_cpg_dispatch: Connection
>>> to the CPG API failed: Library error (2)
>>>
>>> Result: Corosync and Pacemaker report their status as "running", but
>>> "crm_mon" cannot connect to the cluster database, and half of
>>> Pacemaker's services are not running (including the Cluster Information
>>> Base (CIB)).
>>>
>>>
>>> -------------------------------
>>>
>>> Single-node mode
>>>
>>> -------------------------------
>>>
>>> IP for ring0 is not defined in the system:
>>>
>>> Corosync fails to start.
>>>
>>> IP for ring1 is not defined in the system:
>>>
>>> Corosync and Pacemaker are started.
>>>
>>> It is possible that configuration will be applied successfully (50%),
>>>
>>> and it is possible that the cluster is not running any resources,
>>>
>>> and it is possible that the node cannot be put in a standby mode (shows:
>>> communication error),
>>>
>>> and it is possible that the cluster is running all resources, but applied
>>> configuration is not guaranteed to be fully loaded (some rules can be
>>> missed).
>>>
>>>
>>> -------------------------------
>>>
>>> Conclusions:
>>>
>>> -------------------------------
>>>
>>> It is possible that in some rare cases (see comments to the bug) the
>>> cluster will work, but in that case its working state is unstable and the
>>> cluster can stop working at any moment.
>>>
>>>
>>> So, is this correct? Do my assumptions make any sense? I couldn't find
>>> any other explanation on the net.
>>
>> Corosync needs all interfaces during start and runtime. This doesn't
>> mean they must be connected (this would make corosync unusable for
>> physical NIC/switch or cable failure), but they must be up and have
>> the correct IP.
>>
>> When this is not the case, corosync rebinds to localhost and weird
>> things happen. Removing this rebinding is a long-standing TODO, but there
>> are still more important bugs (especially because the rebind can be avoided).
>>
>> Regards,
>>   Honza
>>
>>>
>>>
>>>
>>> Thank you,
>>> Kostya
>>>
>>> On Fri, Jan 9, 2015 at 11:10 AM, Kostiantyn Ponomarenko <
>>> konstantin.ponomarenko at gmail.com> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> Corosync fails to start if a network interface that it is configured
>>>> to use is not configured in the system.
>>>> Even with "rrp_mode: passive" the problem is the same when at least one
>>>> network interface is not configured in the system.
>>>>
>>>> Is this the expected behavior?
>>>> I thought that when you use redundant rings, it is enough to have at least
>>>> one NIC configured in the system. Am I wrong?
>>>>
>>>> Thank you,
>>>> Kostya
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
