[ClusterLabs] Automatic Recover for stonith:external/libvirt
    mr at inwx.de 
    Fri Jan  8 14:56:01 UTC 2016
    
    
  
Hello List,
I have a test environment for evaluating Pacemaker. Our KVM hosts
running libvirt sometimes have trouble responding to the stonith/libvirt
resource, so I would like to configure the resource so that it is only
treated as failed after three failed monitor attempts. I searched for a
suitable configuration option here:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
but after hours of trying I have not found one.
That's the configuration line for stonith/libvirt:
  crm configure primitive p_fence_ha3 stonith:external/libvirt \
    params hostlist="ha3" hypervisor_uri="qemu+tls://debian1/system" \
    op monitor interval="60"
Every 60 seconds Pacemaker runs something like this:
  stonith -t external/libvirt hostlist="ha3" \
    hypervisor_uri="qemu+tls://debian1/system" -S
  ok
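
To illustrate the behaviour asked for (treat the device as failed only
after three consecutive failed checks), here is a minimal shell sketch
that wraps an arbitrary status command. It is not a Pacemaker feature
and not how Pacemaker runs its monitor; the function name
check_with_retries and its arguments are made up for this illustration.

```shell
#!/bin/sh
# check_with_retries MAX DELAY CMD [ARGS...]   (hypothetical helper)
# Runs CMD up to MAX times, sleeping DELAY seconds between attempts.
# Returns 0 as soon as one attempt succeeds, 1 if all MAX attempts fail.
check_with_retries() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
        # Do not sleep after the final failed attempt
        if [ "$attempt" -le "$max" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example call with the device from this post (not executed here):
# check_with_retries 3 60 stonith -t external/libvirt \
#     hostlist="ha3" hypervisor_uri="qemu+tls://debian1/system" -S
```

With the example call, a single failed status check would not count as a
failure; only three failures in a row would make the wrapper return
non-zero.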
To simulate the KVM host being unavailable, I remove the certificate
from /etc/libvirt/libvirtd.conf and restart libvirtd. Within 60 seconds
I can see the error in "crm status". On the KVM host I then add the
certificate back to /etc/libvirt/libvirtd.conf and restart libvirtd
again. Although libvirt is available again, the stonith resource does
not start again.
I tried extending the configuration line for stonith/libvirt with the
following additions:
  op monitor interval="60" pcmk_status_retries="3"
  op monitor interval="60" pcmk_monitor_retries="3"
  op monitor interval="60" start-delay=180
  meta migration-threshold="200" failure-timeout="120"
But in every case, once the first monitor check has failed (after 60
seconds or less), Pacemaker does not resume the stonith/libvirt resource
after libvirt becomes available again.
Here is the "crm status" output on Debian 8 (Jessie):
  root@ha4:~# crm status
  Last updated: Tue Jan  5 10:04:18 2016
  Last change: Mon Jan  4 18:18:12 2016
  Stack: corosync
  Current DC: ha3 (167772400) - partition with quorum
  Version: 1.1.12-561c4cf
  2 Nodes configured
  2 Resources configured
  Online: [ ha3 ha4 ]
  Service-IP     (ocf::heartbeat:IPaddr2):       Started ha3
  haproxy        (lsb:haproxy):  Started ha3
  p_fence_ha3    (stonith:external/libvirt):     Started ha4
Kind regards
Michael R.
    
    