[ClusterLabs] Corosync ring shown faulty between healthy nodes & networks (rrp_mode: passive)
Martin Schlegel
martin at nuboreto.org
Tue Oct 4 22:09:21 UTC 2016
Hello all,
I am trying to understand the results of the following 2 Corosync heartbeat
ring failure scenarios I have been testing, and I hope somebody can explain
why they make sense.
Consider the following cluster:
* 3x Nodes: A, B and C
* 2x NICs for each Node
* Corosync 2.3.5 configured with "rrp_mode: passive" and udpu transport,
  with ring ids 0 and 1 on each node (see the configuration sketch below).
* On each node "corosync-cfgtool -s" shows:
[...] ring 0 active with no faults
[...] ring 1 active with no faults
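For reference, the relevant parts of corosync.conf look roughly like the
following minimal sketch (the networks, addresses and node ids are
placeholders, not our real values):

    totem {
        version: 2
        transport: udpu
        rrp_mode: passive
        interface {
            # ring 0 network (placeholder)
            ringnumber: 0
            bindnetaddr: 10.0.0.0
            mcastport: 5405
        }
        interface {
            # ring 1 network (placeholder)
            ringnumber: 1
            bindnetaddr: 10.0.1.0
            mcastport: 5405
        }
    }

    nodelist {
        node {
            nodeid: 1
            # node A addresses on both rings (placeholders)
            ring0_addr: 10.0.0.1
            ring1_addr: 10.0.1.1
        }
        # nodes B and C are defined the same way
    }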
Consider the following scenarios:
1. On node A only, block all communication on the NIC configured with
   ring id 0
2. On node A only, block all communication on both NICs, i.e. ring ids
   0 and 1
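"Block" here means dropping all traffic on the NIC with firewall rules,
roughly as follows (eth0/eth1 stand in for the real interface names):

    # scenario 1: isolate only the ring 0 NIC on node A
    iptables -A INPUT  -i eth0 -j DROP
    iptables -A OUTPUT -o eth0 -j DROP

    # scenario 2: additionally isolate the ring 1 NIC
    iptables -A INPUT  -i eth1 -j DROP
    iptables -A OUTPUT -o eth1 -j DROP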
The result of the above scenarios is as follows:
1. Nodes A, B and C (!) display the following ring status:
[...] Marking ringid 0 interface <IP-Address> FAULTY
[...] ring 1 active with no faults
2. Node A is shown as OFFLINE, while B and C display the following ring status:
[...] ring 0 active with no faults
[...] ring 1 active with no faults
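For completeness, the status lines above can be collected from all three
nodes in one go, e.g. like this (nodeA/nodeB/nodeC are placeholder
hostnames, and passwordless SSH between the nodes is assumed):

    # compare the ring state as seen from each node
    for n in nodeA nodeB nodeC; do
        echo "== $n =="
        ssh "$n" corosync-cfgtool -s
    done

    # membership view - node A drops out of this list in scenario 2
    corosync-quorumtool -l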
Questions:
1. Is this the expected outcome?
2. In experiment 1, B and C can still communicate with each other over both
   NICs, so why do B and C not display a "no faults" status for ring ids 0
   and 1, just like in experiment 2 when node A is completely unreachable?
Regards,
Martin Schlegel