[Pacemaker] /var/lib/pacemaker/cores cleanup

Thu Nov 7 18:27:39 EST 2013

On 7 Oct 2013, at 5:52 pm, Mailing List SVR <lists at svrinformatica.it> wrote:

> On 07/10/2013 04:16, Andrew Beekhof wrote:
>> On 05/10/2013, at 7:11 AM, Mailing List SVR <lists at svrinformatica.it>
>>  wrote:
>> 
>> 
>>> Hi,
>>> 
>>> I have had a Pacemaker cluster running fine for 2 months. I noticed that the folder /var/lib/pacemaker/cores/root holds about 1.5 GB of core.xxxx files. Who is responsible for cleaning up these files?
>>> 
>> Ideally they would have been reported upstream so the underlying problem that caused them could be fixed.
> 
> If you are interested, here are some core dumps:
> 
> http://195.250.34.59/temp/cores.tar.bz2
> 

Dammit, we're not correctly collecting metadata for the 'service' class.
These core files are produced when we try to parse the result as XML.

At least, for 'service' we get the init script's usage text instead of metadata:

[root@pcmk-5 ~]# crm_resource --show-metadata service:nfs
Usage: nfs {start|stop|status|restart|reload|force-reload|condrestart|try-restart|condstop}

vs.

[root@pcmk-5 ~]# crm_resource --show-metadata lsb:nfs
<?xml version='1.0'?>
<!DOCTYPE resource-agent SYSTEM 'ra-api-1.dtd'>
<resource-agent name='nfs' version='0.1'>
  <version>1.0</version>
  <longdesc lang='en'>
     NFS is a popular protocol for file sharing across networks.
               This service provides NFS server functionality, which is \
               configured via the /etc/exports file.

  </longdesc>
  <shortdesc lang='en'>nfs</shortdesc>
  <parameters>
  </parameters>
  <actions>
    <action name='meta-data'    timeout='5' />
    <action name='start'        timeout='15' />
    <action name='stop'         timeout='15' />
    <action name='status'       timeout='15' />
    <action name='restart'      timeout='15' />
    <action name='force-reload' timeout='15' />
    <action name='monitor'      timeout='15' interval='15' />
  </actions>
  <special tag='LSB'>
    <Provides></Provides>
    <Required-Start></Required-Start>
    <Required-Stop></Required-Stop>
    <Should-Start></Should-Start>
    <Should-Stop></Should-Stop>
    <Default-Start></Default-Start>
    <Default-Stop></Default-Stop>
  </special>
</resource-agent>

Fixed in https://github.com/beekhof/pacemaker/commit/644752e
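
To check other agents for the same symptom, a rough heuristic is to test whether the metadata output starts with an XML tag at all. The sketch below is only that: looks_like_xml is a made-up helper name, it does not validate the document, and it is not what the fix in the commit does.

```shell
# Hypothetical helper: succeed if the first non-blank line on stdin
# begins with '<' (after leading whitespace), fail otherwise.
# This is a cheap sanity check, not an XML parser.
looks_like_xml() {
    # Print the first line containing a non-space character, stripped
    # of leading whitespace, then stop reading.
    first=$(sed -n '/[^[:space:]]/{s/^[[:space:]]*//;p;q;}')
    case $first in
        '<'*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

For example, `crm_resource --show-metadata service:nfs | looks_like_xml || echo "not XML"` would flag the usage text shown above, while the lsb:nfs output would pass.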

> This is a Pacemaker/CMAN cluster on CentOS 6.4
> 
> pacemaker-libs-1.1.8-7.el6.x86_64
> pacemaker-cluster-libs-1.1.8-7.el6.x86_64
> pacemaker-1.1.8-7.el6.x86_64
> pacemaker-cli-1.1.8-7.el6.x86_64
> cman-3.0.12.1-49.el6_4.2.x86_64
> 
> pcs config
> Corosync Nodes:
>  
> Pacemaker Nodes:
>  server3.<domain.com> server4.<domain.com> 
> 
> Resources: 
>  Master: DatiClone
>   Resource: Dati (provider=linbit type=drbd class=ocf)
>    Attributes: drbd_resource=dati 
>    Operations: monitor interval=120s
>  Resource: DatiFs (provider=heartbeat type=Filesystem class=ocf)
>   Attributes: device=/dev/drbd/by-res/dati directory=/srv/dati fstype=ext4 options=noatime,nodiratime,nodev run_fsck=force 
>  Resource: ClusterIp (provider=heartbeat type=IPaddr2 class=ocf)
>   Attributes: ip=172.16.20.9 cidr_netmask=32 
>   Operations: monitor interval=60s
>  Resource: Smb (type=smb class=service)
>   Operations: monitor interval=1min
>  Resource: Nmb (type=nmb class=service)
>   Operations: monitor interval=1min
>  Resource: PgSQL (type=postgresql class=service)
>   Operations: monitor interval=1min
>  Resource: SmbManager (type=smbmanager class=service)
>   Operations: monitor interval=5min
>  Resource: ipmi-fencing3 (type=fence_ipmilan class=stonith)
>   Attributes: pcmk_host_list=server3.<domain.com>.com ipaddr=172.16.20.6 login=root passwd=pwd123 lanplus=1 
>   Operations: monitor interval=60s
>  Resource: ipmi-fencing4 (type=fence_ipmilan class=stonith)
>   Attributes: pcmk_host_list=server4.<domain.com> ipaddr=172.16.20.7 login=root passwd=pwd123 lanplus=1 
>   Operations: monitor interval=60s
> 
> Location Constraints:
>   Resource: ipmi-fencing4
>     Disabled on: server4.<domain.com>
>   Resource: ipmi-fencing3
>     Disabled on: server3.<domain.com>
> Ordering Constraints:
>   start ClusterIp then start Smb
>   start Nmb then start Smb
>   promote DatiClone then start DatiFs
>   start DatiFs then start Nmb
>   start DatiFs then start PgSQL
>   start PgSQL then start SmbManager
> Colocation Constraints:
>   ClusterIp with Smb
>   Smb with Nmb
>   Smb with DatiFs
>   DatiFs with DatiClone (with-rsc-role:Master)
>   PgSQL with DatiFs
>   SmbManager with DatiFs
> 
> Cluster Properties:
>  dc-version: 1.1.8-7.el6-394e906
>  cluster-infrastructure: cman
>  no-quorum-policy: ignore
>  stonith-enabled: true
> 
> 
>> 
>>> Is it safe to remove the files older than a month with a cron script?
>>> 
>> Yes
> 
> ok thanks,
> Nicola
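
A cron-driven cleanup along those lines could be sketched as follows. The function name cleanup_cores, the 30-day default, and the core.* file pattern are assumptions taken from the description in this thread, not anything Pacemaker ships; adjust them to what is actually in the directory.

```shell
# Hypothetical helper: delete core files older than N days.
#   $1  directory to scan (default: /var/lib/pacemaker/cores)
#   $2  age threshold in days (default: 30)
cleanup_cores() {
    core_dir=${1:-/var/lib/pacemaker/cores}
    max_age_days=${2:-30}
    # Match only regular files named core.*; "-mtime +N" selects files
    # last modified more than N days ago. Other files are left alone.
    find "$core_dir" -type f -name 'core.*' -mtime "+$max_age_days" \
        -exec rm -f {} +
}
```

A script that defines this function and then calls `cleanup_cores /var/lib/pacemaker/cores 30` can be run daily from cron. Keeping recent cores around is deliberate, so that anything new can still be reported.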
> 
>> 
>>> thanks
>>> Nicola
>>> 