[Pacemaker] 2 Node Clustering, when primary server goes down(shutdown) the secondary server restarts

Wed Oct 29 08:53:44 UTC 2014

Googling for "fencing agent IPMI" helps :)

This link might be useful.
https://fedorahosted.org/cluster/wiki/IPMI_FencingConfig
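
As a rough starting point, a pacemaker stonith setup for two nodes with
IPMI could look like the crm configuration fragment below. It is only a
sketch: it assumes the external/ipmi stonith plugin (from cluster-glue) is
installed and that IPMI over LAN is already enabled on each server's
management controller. The node names, BMC addresses and credentials are
placeholders, not values from this thread.

```
# Hypothetical names and addresses: replace server1/server2 with your real
# node names (as shown by crm_mon) and use each node's BMC IP and login.
primitive st-server1 stonith:external/ipmi \
    params hostname="server1" ipaddr="192.0.2.11" \
           userid="fenceuser" passwd="secret" interface="lanplus" \
    op monitor interval="60s"
primitive st-server2 stonith:external/ipmi \
    params hostname="server2" ipaddr="192.0.2.12" \
           userid="fenceuser" passwd="secret" interface="lanplus" \
    op monitor interval="60s"
# A node should not run the device that is meant to fence itself.
location l-st-server1 st-server1 -inf: server1
location l-st-server2 st-server2 -inf: server2
property stonith-enabled="true"
```

Each device fences the node named in "hostname", and the location rules
keep it running on the other node. If "lanplus" does not work with your
BMC, "lan" is the other common choice. Test fencing of each node by hand
before relying on it.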

Regards
Arjun

On Wed, Oct 29, 2014 at 2:11 PM, kamal kishi <kamal.kishi at gmail.com> wrote:
> Thanks for the info. I was trying to configure IPMI on the servers.
> Can you please suggest a procedure for enabling and configuring
> IPMI (the one you might have been referring to)?
> The sites I came across are hard to follow.
> The servers I'm using are Dell PowerEdge R320s.
>
> On Tue, Oct 28, 2014 at 7:55 PM, Digimer <lists at alteeve.ca> wrote:
>>
>> On 28/10/14 02:24 AM, kamal kishi wrote:
>>>
>>> Hi,
>>>
>>> I know that having no fencing configured causes issues,
>>> but is the current problem actually due to fencing?
>>
>>
>> Maybe, maybe not. I can say that *not* having it will make solving the
>> problem much more difficult. Please get it working, it's pretty easy and it
>> will make your life a lot easier.
>>
>>> The syslog isn't revealing much about this.
>>> I would love to configure fencing, but right now I need some way to
>>> get past the current problem. If you say fencing is the only solution,
>>> then I might have to set it up remotely.
>>
>>
>> It is critical, yes. Please add it, test it and then hook DRBD into it.
>>
>>> OS -> UBUNTU 12.04 (64 bits)
>>> DRBD -> 8.3.11
>>
>>
>> That is quite old. Can you update to 8.3.16? Also, what versions of
>> pacemaker and corosync are you running?
>>
>>> Thanks for the quick reply
>>>
>>> On Tue, Oct 28, 2014 at 11:19 AM, Digimer <lists at alteeve.ca> wrote:
>>>
>>>     On 28/10/14 01:39 AM, kamal kishi wrote:
>>>
>>>         Hi all,
>>>
>>>         I'm facing a strange issue that I'm not able to resolve. I'm
>>>         not sure what is going wrong, and the logs are not giving away
>>>         much as far as I can tell.
>>>
>>>         Issue -
>>>         I have configured a 2-node cluster and have attached the
>>>         configuration file (New CRM conf of BIC.txt).
>>>
>>>         If Server2, which is primary, is shut down (forcefully, by
>>>         turning off the switch), Server1 restarts within a few seconds
>>>         and starts the resources. Even though Server1 restarts and
>>>         starts the resources, the recovery takes too long to convince
>>>         the clients, and I feel the current behaviour is wrong.
>>>
>>>         I have attached the syslog to this mail (syslog).
>>>
>>>         Please go through it and let me know how to resolve this, as
>>>         the setup is at the client's site.
>>>
>>>         --
>>>         Regards,
>>>         Kamal Kishore B V
>>>
>>>
>>>     You really need fencing, first and foremost. This will cause the
>>>     survivor to put the lost node into a known state and then safely
>>>     begin taking over lost services. Do your nodes have IPMI (or iRMC,
>>>     iLO, DRAC, etc)? If so, setting up stonith is easy.
>>>
>>>     Once it is set up, configure DRBD to use the fence-handler
>>>     'crm-fence-peer.sh' and change the fencing policy to
>>>     'resource-and-stonith'. Without this, you will get split-brains and
>>>     fail-over will be unpredictable.
>>>
>>>     Once stonith is configured and tested in pacemaker and you've hooked
>>>     DRBD's fencing into pacemaker, see if your problem remains. If it
>>>     does, on both nodes, run: 'tail -f -n 0 /var/log/messages', kill a
>>>     node and wait for things to settle down. Share the log output here.
>>>
>>>     Please also tell us your OS, pacemaker, drbd and corosync versions.
>>>
>>>     --
>>>     Digimer
>>>     Papers and Projects: https://alteeve.ca/w/
>>>     What if the cure for cancer is trapped in the mind of a person
>>>     without access to education?
>>>
>>>     _______________________________________________
>>>     Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>     http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>>     Project Home: http://www.clusterlabs.org
>>>     Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>     Bugs: http://bugs.clusterlabs.org
>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Kamal Kishore B V
>>>
>>>
>>>
>>
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?
>>
>
>
>
>
> --
> Regards,
> Kamal Kishore B V
>
>