[ClusterLabs] Corosync+Pacemaker error during failover
priyanka
priyanka at cse.iitb.ac.in
Fri Jan 15 11:32:37 UTC 2016
On 2015-10-08 21:20, Ken Gaillot wrote:
> On 10/08/2015 10:16 AM, priyanka wrote:
>> Hi,
>>
>> We are trying to build an HA setup for our servers using the
>> DRBD + Corosync + Pacemaker stack.
>>
>> Attached is the configuration file for corosync/pacemaker and drbd.
>
> A few things I noticed:
>
> * Don't set become-primary-on in the DRBD configuration in a
> Pacemaker
> cluster; Pacemaker should handle all promotions to primary.
>
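We have since removed become-primary-on and left promotion entirely to
Pacemaker. For reference, the master/slave resource that handles the
promotion looks roughly like this; the resource names and timings here
are illustrative, matching common crm shell examples, and may differ
slightly from our attached config:

  # DRBD resource agent; Pacemaker decides which node becomes Primary
  primitive res_drbd_export ocf:linbit:drbd \
      params drbd_resource="export" \
      op monitor interval="31s" role="Master" \
      op monitor interval="29s" role="Slave"
  # master/slave wrapper: at most one Master across the two nodes
  ms ms_drbd_export res_drbd_export \
      meta master-max="1" master-node-max="1" \
      clone-max="2" clone-node-max="1" notify="true"
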
> * I'm no NFS expert, but why is res_exportfs_root cloned? Can both
> servers export it at the same time? I would expect it to be in the
> group
> before res_exportfs_export1.
We followed this configuration guide for our setup:
https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html
which suggests creating a clone of this resource. The clone does not
export the actual data; the data is exported by the res_exportfs_export1
resource in our setup. I did try that failover scenario without cloning
this resource, but the same error appeared.
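For reference, the clone in question is configured roughly as in that
guide; the directory and client spec below are placeholders, not our
actual values:

  # NFSv4 virtual root export (fsid=0); carries no real data itself
  primitive res_exportfs_root ocf:heartbeat:exportfs \
      params fsid="0" directory="/srv/nfs" \
      options="rw,crossmnt" clientspec="10.9.9.0/255.255.255.0" \
      op monitor interval="30s"
  # cloned so both nodes can export the virtual root at the same time
  clone cl_exportfs_root res_exportfs_root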
>
> * Your constraints need some adjustment. Partly it depends on the
> answer
> to the previous question, but currently res_fs (via the group) is
> ordered after res_exportfs_root, and I don't see how that could work.
>
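If it helps the discussion, the adjusted constraints we would move to
based on this advice are something like the following. This is an
untested sketch; ms_drbd_export and cl_exportfs_root stand in for our
actual master/slave and clone resource names:

  # run the export group only where DRBD is Master...
  colocation c_export_on_drbd inf: rg_export ms_drbd_export:Master
  # ...and only start it after DRBD has been promoted there
  order o_drbd_before_export inf: ms_drbd_export:promote rg_export:start
  # bring up the cloned virtual root before the export group
  order o_root_before_export inf: cl_exportfs_root rg_export
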
>> We are getting errors while testing this setup.
>> 1. When we stop Corosync on the master machine, say server1 (lock),
>> it is STONITHed. In this case the slave, server2 (sher), is promoted
>> to master. But when server1 (lock) reboots, res_exportfs_export1 is
>> started on both servers; that resource goes into a failed state,
>> followed by the servers going into an unclean state.
>> Then server1 (lock) reboots again and server2 (sher) is master but
>> in an unclean state. After server1 (lock) comes up, server2 (sher)
>> is STONITHed and server1 (lock) is a slave (the only online node).
>> When server2 (sher) comes back up, both servers are slaves and the
>> resource group (rg_export) is stopped. Then server2 (sher) becomes
>> master, server1 (lock) remains a slave, and the resource group is
>> started. At this point the configuration becomes stable.
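(While debugging, we clear the failed state manually so the cluster
re-evaluates placement, e.g.:

  # reset the resource's failure history on all nodes
  crm resource cleanup res_exportfs_export1

This only clears the symptom; the underlying cause is what we are
trying to find.)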
>>
>>
>> Please find attached the logs (syslog) of server2 (sher) from when
>> it is promoted to master until it is first rebooted, when the
>> exportfs resource goes into a failed state.
>>
>> Please let us know if the configuration is appropriate. From the
>> logs we could not figure out the exact reason for the resource
>> failure. Your comments on this scenario will be very helpful.
>>
>> Thanks,
>> Priyanka
>>
>>
>>
--
Regards,
Priyanka
MTech3 Sysad
IIT Powai