[ClusterLabs] Antw: Re: Trouble with drbd/pacemaker: switch to secondary/secondary

Wed Oct 19 08:53:32 CEST 2016

>>> Ken Gaillot <kgaillot at redhat.com> wrote on 18.10.2016 at 17:07 in message
<9d3b547c-6035-e41d-18ef-9950db01e9dc at redhat.com>:
> On 10/14/2016 03:22 PM, Anne Nicolas wrote:

[...]
>> cluster logs are flooded by :
>> Oct 14 17:42:28 [3445] bzvairsvr      attrd:   notice:
>> attrd_trigger_update:    Sending flush op to all hosts for:
>> master-drbdserv (10000)
>> Oct 14 17:42:28 [3445] bzvairsvr      attrd:   notice:
>> attrd_perform_update:    Sent update master-drbdserv=10000 failed:
>> Transport endpoint is not connected
> 
> This is strange, and the cause of the problem. A master/slave resource
> agent will try to set node attributes indicating which node should
> become the master. Here, we see that this is failing -- it appears attrd
> (Pacemaker's node attribute daemon) is unable to talk to any other daemons.
> 
> I'm not sure why this would happen, especially if the rest of the
> daemons do not have a problem talking to each other. But that's where
> you need to investigate.

From my limited experience, it's a bad idea to route I/O traffic and cluster communication over the same link: we had cases where cluster communication (especially when using SCTP) showed errors when traffic was high. Maybe that applies here.
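
To check whether the failed updates coincide with periods of high traffic, one could tally them per minute from the cluster log. The following is only a rough sketch: it assumes each attrd message sits on a single line in the real log file (the mail wrapped the ones quoted above), and the function name and regular expression are made up for illustration, not part of Pacemaker.

```python
import re
from collections import Counter

# Assumed single-line form of the attrd messages quoted above
# (they are wrapped in the mail; the real log is assumed to keep each on one line).
FAILED_UPDATE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+.*attrd:.*"
    r"attrd_perform_update:\s+Sent update (?P<attr>\S+)=\S+ failed: (?P<reason>.*)$"
)


def count_failed_updates(lines):
    """Count failed attrd updates per (minute, attribute, reason)."""
    counts = Counter()
    for line in lines:
        match = FAILED_UPDATE.match(line.strip())
        if match:
            counts[(match["minute"], match["attr"], match["reason"])] += 1
    return counts


# Sample input: the two log messages from the original report, unwrapped.
sample = [
    "Oct 14 17:42:28 [3445] bzvairsvr      attrd:   notice: "
    "attrd_trigger_update:    Sending flush op to all hosts for: "
    "master-drbdserv (10000)",
    "Oct 14 17:42:28 [3445] bzvairsvr      attrd:   notice: "
    "attrd_perform_update:    Sent update master-drbdserv=10000 failed: "
    "Transport endpoint is not connected",
]

for key, n in count_failed_updates(sample).items():
    print(n, *key)
```

Feeding it the real log file (for example via open(...) on whatever path your logging is configured to use) and comparing the busy minutes with the DRBD resync or I/O load might show whether the two are related.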

> 
> One thing I would say is that 1.1.8 is really old at this point, which
> means you're using the "legacy" attrd, which I'm not very familiar with.

I agree: even SLES11 SP4 ships old software, but it is at least at "pacemaker-1.1.12-13.1". Things _really_ got better with later releases.

[...]

Regards,
Ulrich