[ClusterLabs] dlm reason for leaving the cluster changes when stopping gfs2-utils service

Momcilo Medic fedorauser at fedoraproject.org
Thu Mar 24 08:38:49 UTC 2016


On Wed, Mar 23, 2016 at 6:33 PM, Ferenc Wágner <wferi at niif.hu> wrote:
> (Please post only to the list, or at least keep it amongst the Cc-s.)
>
> Momcilo Medic <fedorauser at fedoraproject.org> writes:
>
>> On Wed, Mar 23, 2016 at 1:56 PM, Ferenc Wágner <wferi at niif.hu> wrote:
>>> Momcilo Medic <fedorauser at fedoraproject.org> writes:
>>>
>>>> I have three hosts set up in my test environment.
>>>> They each have two connections to the SAN which has GFS2 on it.
>>>>
>>>> Everything works like a charm, except when I reboot a host.
>>>> Once it tries to stop gfs2-utils service it will just hang.
>>>
>>> Are you sure the OS reboot sequence does not stop the network or
>>> corosync before GFS and DLM?
>>
>> I specifically configured services to start in this order:
>> Corosync - DLM - GFS2-utils
>> and to shut down in this order:
>> GFS2-utils - DLM - Corosync.
>>
>> I've accomplished this with:
>>  update-rc.d -f corosync remove
>>  update-rc.d -f corosync-notifyd remove
>>  update-rc.d -f dlm remove
>>  update-rc.d -f gfs2-utils remove
>>  update-rc.d -f xendomains remove
>>  update-rc.d corosync start 25 2 3 4 5 . stop 35 0 1 6 .
>>  update-rc.d corosync-notifyd start 25 2 3 4 5 . stop 35 0 1 6 .
>>  update-rc.d dlm start 30 2 3 4 5 . stop 30 0 1 6 .
>>  update-rc.d gfs2-utils start 35 2 3 4 5 . stop 25 0 1 6 .
>>  update-rc.d xendomains start 40 2 3 4 5 . stop 20 0 1 6 .
>
> I don't know your OS, the above may or may not work.
>
>> Also, the moment I was capturing logs, corosync and dlm were not
>> running as services, but in foreground debugging mode.
>> SSH connection did not break until I powered down the host so network
>> is not stopped either.
>
> At least you've got interactive debugging ability then.  So try to find
> out why the Corosync membership broke down.  The output of
> corosync-quorumtool and corosync-cpgtool might help.  Also try pinging
> the Corosync ring0 addresses between the nodes.

Dear Feri,

Sorry for leaving the list out of my reply; it was a hasty mistake :)
For the record, I am using Ubuntu 14.04 on all hosts.
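
Since the update-rc.d numbering quoted above may or may not take
effect, one way to check it is to list the stop links that actually
exist. This is a minimal sketch, assuming the SysV-style /etc/rcN.d
link layout that update-rc.d maintains; the function name
list_stop_order is made up for illustration.

```shell
#!/bin/sh
# Print the stop (K*) links of the cluster-related services found in a
# runlevel directory. The shell expands the glob in sorted order, which
# matches the order in which the rc script walks the links.
# Assumption: /etc/rc6.d (reboot) is the directory of interest by default.
list_stop_order() {
    rcdir="${1:-/etc/rc6.d}"
    for link in "$rcdir"/K*; do
        # Skip the unexpanded pattern when the directory has no K* links.
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=$(basename "$link")
        case "$name" in
            K[0-9][0-9]corosync*|K[0-9][0-9]dlm|K[0-9][0-9]gfs2-utils|K[0-9][0-9]xendomains)
                echo "$name" ;;
        esac
    done
}
```

Running "list_stop_order /etc/rc6.d" and "list_stop_order /etc/rc0.d"
should show xendomains, gfs2-utils, dlm and corosync in that order if
the priorities were applied as intended.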

I attached the debugging logs to my first post, but I cannot figure
out which information in them is the important part.
Today I'll try the tools you mentioned and compare their output
before and during the issue.
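
To make the before/during comparison repeatable, the tool output can
be saved into one directory per snapshot. This is only a rough sketch:
the helper names save_output and capture_cluster_state are invented
here, "dlm_tool ls" is an addition not mentioned in the thread, and
the ring0 addresses have to be passed in by hand.

```shell
#!/bin/sh
# Run a command if it is installed and save its stdout and stderr to a file.
# A missing tool or a non-zero exit status is recorded, not treated as fatal.
save_output() {
    outfile="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$outfile" 2>&1 || true
    else
        echo "$1: not installed" >"$outfile"
    fi
    return 0
}

# Usage: capture_cluster_state OUTDIR [RING0_ADDRESS...]
capture_cluster_state() {
    outdir="$1"; shift
    mkdir -p "$outdir" || return 1
    save_output "$outdir/quorum.txt" corosync-quorumtool -s   # quorum status
    save_output "$outdir/cpg.txt" corosync-cpgtool            # CPG groups and members
    save_output "$outdir/dlm.txt" dlm_tool ls                 # DLM lockspaces
    # Ping each ring0 address given on the command line.
    for addr in "$@"; do
        save_output "$outdir/ping-$addr.txt" ping -c 3 "$addr"
    done
}
```

For example, "capture_cluster_state /tmp/before ADDR1 ADDR2" before
stopping gfs2-utils and "capture_cluster_state /tmp/during ADDR1 ADDR2"
while it hangs (ADDR1 and ADDR2 standing in for the real ring0
addresses), followed by "diff -ru /tmp/before /tmp/during".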

Kind regards,
Momcilo "Momo" Medic.
(fedorauser)



