[Pacemaker] Corosync fails to start when NIC is absent

Kostiantyn Ponomarenko konstantin.ponomarenko at gmail.com
Mon Jan 12 15:04:27 UTC 2015


According to https://access.redhat.com/solutions/638843, every interface
that is defined in corosync.conf must be present in the system (see the
"ROOT CAUSE" section at the bottom of the article).
To confirm that, I ran a couple of tests.

Here is the relevant part of the corosync.conf file, in free form (the
original config file is also attached):
===============================
rrp_mode: passive
ring0_addr is defined in corosync.conf
ring1_addr is defined in corosync.conf
===============================

-------------------------------

Two-node cluster

-------------------------------

Test #1:
--------------------------------------------------
IP for ring0 is not defined in the system:
--------------------------------------------------
Start Corosync simultaneously on both nodes.
Corosync fails to start.
From the logs:
Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] parse error in
config: No interfaces defined
Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] Corosync Cluster
Engine exiting with status 8 at main.c:1343.
Result: Corosync and Pacemaker are not running.

Test #2:
--------------------------------------------------
IP for ring1 is not defined in the system:
--------------------------------------------------
Start Corosync simultaneously on both nodes.
Corosync starts.
Start Pacemaker simultaneously on both nodes.
Pacemaker fails to start.
From the logs, the last entries written by "corosync":
Jan 8 16:31:29 daemon.err<27> corosync[3728]: [TOTEM ] Marking ringid 0
interface 169.254.1.3 FAULTY
Jan 8 16:31:30 daemon.notice<29> corosync[3728]: [TOTEM ] Automatically
recovered ring 0
Result: Corosync and Pacemaker are not running.


Test #3:

"rrp_mode: active" leads to the same result, except that the Corosync and
Pacemaker init scripts return status "running".
However, "vim /var/log/cluster/corosync.log" still shows a lot of errors like:
Jan 08 16:30:47 [4067] A6-402-1 cib: error: pcmk_cpg_dispatch: Connection
to the CPG API failed: Library error (2)

Result: Corosync and Pacemaker report their status as "running", but
"crm_mon" cannot connect to the cluster database, and half of Pacemaker's
services are not running (including the Cluster Information Base (CIB)).
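
Since the init scripts can report "running" while the log tells a different story, a small log check may be more trustworthy than the status output. The following is only a minimal sketch: the three patterns are copied from the log excerpts above, while the names FAILURE_PATTERNS, scan_log and check_log_file are made up for illustration. Any other failure signature would have to be added by hand.

```python
import re

# Signatures taken from the log excerpts in this message.
# The dictionary keys are hypothetical labels chosen for this sketch.
FAILURE_PATTERNS = {
    "no_interfaces": re.compile(r"parse error in config: No interfaces defined"),
    "ring_faulty": re.compile(r"Marking ringid \d+ interface \S+ FAULTY"),
    "cpg_failed": re.compile(r"Connection to the CPG API failed"),
}


def scan_log(text):
    """Count how often each known failure signature appears in log text."""
    # Collapse all whitespace so that entries wrapped across lines still match.
    flat = " ".join(text.split())
    return {name: len(rx.findall(flat)) for name, rx in FAILURE_PATTERNS.items()}


def check_log_file(path="/var/log/cluster/corosync.log"):
    """Return True if none of the known failure signatures is in the file."""
    with open(path, errors="replace") as fh:
        counts = scan_log(fh.read())
    return not any(counts.values())
```

Calling check_log_file() after starting the services gives a yes/no answer that does not depend on what the init scripts claim. Note that it reads the whole file, so old entries from earlier runs are counted too.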


-------------------------------

Single-node mode

-------------------------------

IP for ring0 is not defined in the system:

Corosync fails to start.

IP for ring1 is not defined in the system:

Corosync and Pacemaker start, but the result is unpredictable:

it is possible that the configuration is applied successfully (50%),

and it is possible that the cluster does not run any resources,

and it is possible that the node cannot be put into standby mode (shows:
communication error),

and it is possible that the cluster runs all resources, but the applied
configuration is not guaranteed to be fully loaded (some rules can be
missing).


-------------------------------

Conclusions:

-------------------------------

It is possible that in some rare cases (see the comments on the bug) the
cluster will work, but even then its working state is unstable and the
cluster can stop working at any moment.
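
Given these results, it may be worth checking before Corosync is started that every ring has an address that actually exists on the node. Below is a minimal sketch of such a pre-flight check. It assumes that corosync.conf lists ringN_addr entries as in the summary above and that at least one address per ring belongs to the local node. The function names are made up for illustration, and the bind test is my own approach, not something Corosync itself provides.

```python
import re
import socket
from collections import defaultdict

# Matches lines such as "ring0_addr: 10.0.0.1"; commented-out lines are skipped
# because "#" would come before "ring".
RING_ADDR = re.compile(r"^\s*ring(\d+)_addr\s*:\s*(\S+)", re.MULTILINE)


def ring_addresses(conf_text):
    """Map ring number -> list of ringN_addr values found in the config text."""
    rings = defaultdict(list)
    for number, addr in RING_ADDR.findall(conf_text):
        rings[int(number)].append(addr)
    return dict(rings)


def is_local_address(addr):
    """Return True if a UDP socket can be bound to addr on this host.

    Caveat: with the Linux sysctl net.ipv4.ip_nonlocal_bind enabled, binding
    succeeds even for addresses that are not configured on any interface.
    """
    try:
        infos = socket.getaddrinfo(addr, 0, type=socket.SOCK_DGRAM)
    except socket.gaierror:
        return False
    for family, socktype, proto, _, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.bind(sockaddr)
            return True
        except OSError:
            continue
    return False


def rings_without_local_address(conf_text, is_local=is_local_address):
    """Return the ring numbers for which no configured address is local."""
    return sorted(
        ring
        for ring, addrs in ring_addresses(conf_text).items()
        if not any(is_local(a) for a in addrs)
    )
```

Running rings_without_local_address() on the contents of corosync.conf and refusing to start Corosync when the returned list is not empty would cover both the ring0 and the ring1 cases from the tests above.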


So, is this correct? Do my assumptions make any sense? I didn't find any
other explanation on the Internet.



Thank you,
Kostya

On Fri, Jan 9, 2015 at 11:10 AM, Kostiantyn Ponomarenko <
konstantin.ponomarenko at gmail.com> wrote:

> Hi guys,
>
> Corosync fails to start if there is no such network interface configured
> in the system.
> Even with "rrp_mode: passive" the problem is the same when at least one
> network interface is not configured in the system.
>
> Is this the expected behavior?
> I thought that when you use redundant rings, it is enough to have at least
> one NIC configured in the system. Am I wrong?
>
> Thank you,
> Kostya
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: corosync.conf
Type: application/octet-stream
Size: 1328 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20150112/2553fba2/attachment-0004.obj>

