[Pacemaker] 2 Node Clustering, when primary server goes down(shutdown) the secondary server restarts

Tue Oct 28 01:49:20 EDT 2014

On 28/10/14 01:39 AM, kamal kishi wrote:
> Hi all,
>
> I'm facing a strange issue that I'm not able to resolve. I can't tell
> what is going wrong, and as far as I can see the logs don't reveal
> much.
>
> Issue -
> I have configured a 2-node cluster and have attached the configuration
> file (New CRM conf of BIC.txt).
>
> If Server2, which is the primary, is shut down (forcefully, by turning
> off the switch), Server1 restarts within a few seconds and then starts
> the resources. Even though Server1 does come back and start the
> resources, recovery takes too long to convince the clients, and I feel
> the current behaviour is wrong.
>
> I have attached the syslog to this mail (syslog).
>
> Please go through these and let me know how to resolve this, as the
> setup is at the client's site.
>
> --
> Regards,
> Kamal Kishore B V

You really need fencing, first and foremost. This will cause the 
survivor to put the lost node into a known state and then safely begin 
taking over lost services. Do your nodes have IPMI (or iRMC, iLO, DRAC, 
etc)? If so, setting up stonith is easy.
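As a rough sketch only, assuming the nodes have IPMI and that the
'fence_ipmilan' agent is installed, the stonith part of the crm
configuration could look something like the fragment below. The node
names come from your mail; the resource names, IP addresses and
credentials are placeholders I made up, so substitute your own.

```
# Hypothetical example: one fence device per node, pointing at that
# node's IPMI interface. Addresses and credentials are placeholders.
primitive fence_server1 stonith:fence_ipmilan \
        params pcmk_host_list="Server1" ipaddr="192.0.2.11" \
               login="admin" passwd="secret" \
        op monitor interval="60s"
primitive fence_server2 stonith:fence_ipmilan \
        params pcmk_host_list="Server2" ipaddr="192.0.2.12" \
               login="admin" passwd="secret" \
        op monitor interval="60s"
# Keep each fence device off the node it is meant to kill.
location l_fence_server1 fence_server1 -inf: Server1
location l_fence_server2 fence_server2 -inf: Server2
property stonith-enabled="true"
```

Parameter names differ between fence agents and versions, so check the
agent's own help output before relying on any of these.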

Once it is set up, configure DRBD to use the fence-handler
'crm-fence-peer.sh' and change the fencing policy to
'resource-and-stonith'. Without this, you will get split-brains and
fail-over will be unpredictable.
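For illustration, on DRBD 8.x that usually amounts to something like
the fragment below in the resource definition. The resource name 'r0'
and the '/usr/lib/drbd/' path are assumptions on my part; use your own
resource name and wherever your package installed the scripts. The
'after-resync-target' unfence handler is the usual companion to the
fence handler and is my addition here.

```
resource r0 {
  disk {
    # Freeze I/O and call the fence-peer handler when the peer is lost.
    fencing resource-and-stonith;
  }
  handlers {
    # Sets a constraint in pacemaker so the outdated peer cannot be promoted.
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    # Removes that constraint once the peer has resynced.
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # ... the rest of your existing resource definition stays as it is ...
}
```

In DRBD 9 the 'fencing' option lives in the 'net' section instead, so
check which version you are running first.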

Once stonith is configured and tested in pacemaker and you've hooked
DRBD's fencing into pacemaker, see if your problem remains. If it does,
run 'tail -f -n 0 /var/log/messages' on both nodes, kill a node and
wait for things to settle down. Share the log output here.

Please also tell us your OS, pacemaker, drbd and corosync versions.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?