[Pacemaker] Pacemaker very often STONITHs other node
Michał Margula
alchemyx at uznam.net.pl
Mon Nov 25 11:40:54 UTC 2013
Hello!
I wanted to ask for your help because we are having a lot of trouble
with a cluster based on Pacemaker.
We have two identical nodes: PowerEdge R510 with 2x Xeon X5650, 64 GB
of RAM and a MegaRAID SAS 2108 RAID controller (PERC H700). The system
disk is RAID 1 on SSDs (SSDSC2CW060A3), and there are two data volumes:
one RAID 1 with WD3000FYYZ and one RAID 1 with WD1002FBYS, both Western
Digital disks. The nodes are linked with two direct gigabit fiber links
(no switch in between).
We have two DRBD volumes - /dev/drbd1 (1 TB on WD1002FBYS disks) and
/dev/drbd2 (3 TB on WD3000FYYZ disks). On top of DRBD (used as PVs) we
have LVM with LVs for the virtual machines, which run under Xen.
Here is our CRM configuration - http://pastebin.com/raqsvRTA
We previously used fast USB drives instead of SSDs for the root
filesystem, and that caused some trouble: I/O was lagging, so one node
"thought" that the other one was having trouble and performed STONITH
on it. After replacing the USB drives with SSDs we had no more trouble
with that issue.
But now, from time to time, one of the nodes gets STONITHed, and the
reason is unclear to us.
For example, last time we found this in the logs:
Nov 23 15:14:24 rivendell-B crmd: [9529]: info: process_lrm_event: LRM
operation primitive-LVM:1_monitor_120000 (call=54, rc=7, cib-update=124,
confirmed=false) not running
After that, node rivendell-B got STONITHed. Previously we had trouble
with DRBD: a node stopped DRBD for no apparent reason and, again,
STONITH. Unfortunately we did not check the logs that time.
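
Log lines like the crmd entry quoted above can be collected mechanically, which may help correlate failed operations with later fencing. The following is only a minimal sketch in Python. It assumes the exact line layout shown in the excerpt, with wrapped lines already joined into one. The names LRM_EVENT, LrmEvent, parse_lrm_event and failed_events are hypothetical helpers, not part of Pacemaker.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Pattern modelled only on the single log line quoted in this message;
# other Pacemaker versions may format the line differently (assumption).
LRM_EVENT = re.compile(
    r"^(?P<stamp>\w{3} \d+ [\d:]{8}) (?P<node>\S+) crmd: \[\d+\]: \w+: "
    r"process_lrm_event: LRM operation (?P<key>\S+) "
    r"\(call=(?P<call>\d+), rc=(?P<rc>\d+), cib-update=\d+, "
    r"confirmed=\w+\) (?P<status>.*)$"
)


class LrmEvent(NamedTuple):
    stamp: str
    node: str
    resource: str
    operation: str
    interval_ms: int
    rc: int
    status: str


def parse_lrm_event(line: str) -> Optional[LrmEvent]:
    """Parse one process_lrm_event line; return None if it does not match."""
    # Collapse runs of whitespace so a mail-wrapped line still matches.
    match = LRM_EVENT.match(" ".join(line.split()))
    if match is None:
        return None
    # The key has the form <resource>_<operation>_<interval in ms>.
    resource, operation, interval = match["key"].rsplit("_", 2)
    return LrmEvent(
        stamp=match["stamp"],
        node=match["node"],
        resource=resource,
        operation=operation,
        interval_ms=int(interval),
        rc=int(match["rc"]),
        status=match["status"],
    )


def failed_events(lines: Iterable[str]) -> Iterator[LrmEvent]:
    """Yield events whose return code is non-zero (a crude failure filter)."""
    for line in lines:
        event = parse_lrm_event(line)
        if event is not None and event.rc != 0:
            yield event


sample = (
    "Nov 23 15:14:24 rivendell-B crmd: [9529]: info: process_lrm_event: LRM "
    "operation primitive-LVM:1_monitor_120000 (call=54, rc=7, cib-update=124, "
    "confirmed=false) not running"
)
for event in failed_events([sample]):
    print(event.stamp, event.node, event.resource, event.operation, event.rc)
```

For the line above this reports a recurring monitor (interval 120000 ms) of primitive-LVM:1 on rivendell-B returning rc=7, which the log itself labels "not running". The non-zero filter is deliberately crude: a non-zero return code is not always an error, so the output is a starting point for reading the surrounding log, not a diagnosis.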
Also, doing some tasks on one of the nodes (for example "crm resource
migrate" of a few Xen virtual machines) can cause a STONITH as well.
Could you give us some hints? Maybe our configuration is wrong? To be
honest, we had no previous experience with HA clusters, so we created it
based on configuration.
It has been working for over a year now, but it is giving us headaches
and we are wondering if we should drop Pacemaker and use something else
(even manually stopping and starting virtual machines comes to mind).
Thank you in advance!
--
Michał Margula, alchemyx at uznam.net.pl, http://alchemyx.uznam.net.pl/
"In life, only moments are beautiful" [Ryszard Riedel]