[Pacemaker] /var/lib/pacemaker/cores cleanup

Thu Nov 7 23:36:58 UTC 2013

On 8 Nov 2013, at 10:27 am, Andrew Beekhof <andrew at beekhof.net> wrote:

> 
> On 7 Oct 2013, at 5:52 pm, Mailing List SVR <lists at svrinformatica.it> wrote:
> 
>> On 07/10/2013 04:16, Andrew Beekhof wrote:
>>> On 05/10/2013, at 7:11 AM, Mailing List SVR <lists at svrinformatica.it>
>>> wrote:
>>> 
>>> 
>>>> Hi,
>>>> 
>>>> I have had a Pacemaker cluster running fine for 2 months. I noticed that the folder /var/lib/pacemaker/cores/root contains about 1.5 GB of core.xxxx files. Who is responsible for cleaning up these files?
>>>> 
>>> Ideally they would have been reported upstream so the underlying problem that caused them could be fixed.
>> 
>> if you are interested here are some core dumps:
>> 
>> http://195.250.34.59/temp/cores.tar.bz2
>> 
> 
> Dammit, we're not correctly collecting metadata for the 'service' class.
> These core files are produced when we try to parse the result as XML.

The other cores come from choking on control characters in the XML string:

lrmd_rsc_output=\"Stopping postgresql service: \033[60G[\033[0;32m  OK  \033[0; ..."

which was fixed by https://github.com/beekhof/pacemaker/commit/c351934 and is included in https://rhn.redhat.com/errata/RHEA-2013-1493.html
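
To illustrate why that output breaks an XML parser, here is a minimal Python sketch. It is not the change made in the commit above; it only shows that the ESC character (\033) is illegal in XML 1.0 and that stripping ANSI escape sequences and other illegal control characters makes the string parseable. The element name "op" and the helper names are hypothetical, and the sample string is just the visible part of the truncated output quoted above.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# ANSI CSI sequences such as ESC[60G or ESC[0;32m (cursor moves, colours).
_ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
# Characters XML 1.0 forbids: C0 controls except tab, LF and CR, plus U+FFFE/U+FFFF.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def sanitize_for_xml(text):
    """Drop ANSI escape sequences, then any remaining XML-illegal characters."""
    text = _ANSI_ESCAPE.sub("", text)
    return _XML_ILLEGAL.sub("", text)


def parses(value):
    """Return True if value can be embedded as an attribute and parsed back."""
    # "op" is a hypothetical element name used only for this illustration.
    try:
        ET.fromstring("<op lrmd_rsc_output=%s/>" % quoteattr(value))
        return True
    except ET.ParseError:
        return False


# Visible portion of the init-script output quoted above (the rest is elided).
raw = "Stopping postgresql service: \033[60G[\033[0;32m  OK  "
clean = sanitize_for_xml(raw)

print(parses(raw))    # False
print(parses(clean))  # True
print(repr(clean))    # 'Stopping postgresql service: [  OK  '
```

Stripping is only one option; escaping or encoding the output would preserve it, at the cost of a more complex consumer.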

> 
> at least 
> 
> [root at pcmk-5 ~]# crm_resource --show-metadata service:nfs 
> Usage: nfs {start|stop|status|restart|reload|force-reload|condrestart|try-restart|condstop}
> 
> vs.
> 
> [root at pcmk-5 ~]# crm_resource --show-metadata lsb:nfs 
> <?xml version='1.0'?>
> <!DOCTYPE resource-agent SYSTEM 'ra-api-1.dtd'>
> <resource-agent name='nfs' version='0.1'>
>  <version>1.0</version>
>  <longdesc lang='en'>
>     NFS is a popular protocol for file sharing across networks.
>               This service provides NFS server functionality, which is \
>               configured via the /etc/exports file.
> 
>  </longdesc>
>  <shortdesc lang='en'>nfs</shortdesc>
>  <parameters>
>  </parameters>
>  <actions>
>    <action name='meta-data'    timeout='5' />
>    <action name='start'        timeout='15' />
>    <action name='stop'         timeout='15' />
>    <action name='status'       timeout='15' />
>    <action name='restart'      timeout='15' />
>    <action name='force-reload' timeout='15' />
>    <action name='monitor'      timeout='15' interval='15' />
>  </actions>
>  <special tag='LSB'>
>    <Provides></Provides>
>    <Required-Start></Required-Start>
>    <Required-Stop></Required-Stop>
>    <Should-Start></Should-Start>
>    <Should-Stop></Should-Stop>
>    <Default-Start></Default-Start>
>    <Default-Stop></Default-Stop>
>  </special>
> </resource-agent>
> 
> 
> Fixed in https://github.com/beekhof/pacemaker/commit/644752e
> 
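
The difference between the two outputs above can be checked mechanically. The following Python sketch (not Pacemaker code; the function name is hypothetical) treats a string as usable metadata only if it parses as XML and its root element is resource-agent, which is what the lsb:nfs output satisfies and the service:nfs usage text does not. The sample metadata is an abbreviated copy of the lsb:nfs output.

```python
import xml.etree.ElementTree as ET


def is_resource_agent_metadata(text):
    """Return True if text is well-formed XML rooted at <resource-agent>."""
    try:
        root = ET.fromstring(text.strip())
    except ET.ParseError:
        # Plain text such as an init script's usage line ends up here.
        return False
    return root.tag == "resource-agent"


# What "service:nfs" returned: the init script's usage line, not XML.
usage_text = (
    "Usage: nfs {start|stop|status|restart|reload|force-reload"
    "|condrestart|try-restart|condstop}"
)

# Abbreviated form of what "lsb:nfs" returned.
metadata_text = (
    "<?xml version='1.0'?>\n"
    "<!DOCTYPE resource-agent SYSTEM 'ra-api-1.dtd'>\n"
    "<resource-agent name='nfs' version='0.1'>\n"
    " <version>1.0</version>\n"
    " <shortdesc lang='en'>nfs</shortdesc>\n"
    "</resource-agent>\n"
)

print(is_resource_agent_metadata(usage_text))     # False
print(is_resource_agent_metadata(metadata_text))  # True
```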
>> this is a pacemaker/cman cluster on centos 6.4
>> 
>> pacemaker-libs-1.1.8-7.el6.x86_64
>> pacemaker-cluster-libs-1.1.8-7.el6.x86_64
>> pacemaker-1.1.8-7.el6.x86_64
>> pacemaker-cli-1.1.8-7.el6.x86_64
>> cman-3.0.12.1-49.el6_4.2.x86_64
>> 
>> pcs config
>> Corosync Nodes:
>> 
>> Pacemaker Nodes:
>> server3.<domain.com> server4.<domain.com> 
>> 
>> Resources: 
>> Master: DatiClone
>>  Resource: Dati (provider=linbit type=drbd class=ocf)
>>   Attributes: drbd_resource=dati 
>>   Operations: monitor interval=120s
>> Resource: DatiFs (provider=heartbeat type=Filesystem class=ocf)
>>  Attributes: device=/dev/drbd/by-res/dati directory=/srv/dati fstype=ext4 options=noatime,nodiratime,nodev run_fsck=force 
>> Resource: ClusterIp (provider=heartbeat type=IPaddr2 class=ocf)
>>  Attributes: ip=172.16.20.9 cidr_netmask=32 
>>  Operations: monitor interval=60s
>> Resource: Smb (type=smb class=service)
>>  Operations: monitor interval=1min
>> Resource: Nmb (type=nmb class=service)
>>  Operations: monitor interval=1min
>> Resource: PgSQL (type=postgresql class=service)
>>  Operations: monitor interval=1min
>> Resource: SmbManager (type=smbmanager class=service)
>>  Operations: monitor interval=5min
>> Resource: ipmi-fencing3 (type=fence_ipmilan class=stonith)
>>  Attributes: pcmk_host_list=server3.<domain.com>.com ipaddr=172.16.20.6 login=root passwd=pwd123 lanplus=1 
>>  Operations: monitor interval=60s
>> Resource: ipmi-fencing4 (type=fence_ipmilan class=stonith)
>>  Attributes: pcmk_host_list=server4.<domain.com> ipaddr=172.16.20.7 login=root passwd=pwd123 lanplus=1 
>>  Operations: monitor interval=60s
>> 
>> Location Constraints:
>>  Resource: ipmi-fencing4
>>    Disabled on: server4.<domain.com>
>>  Resource: ipmi-fencing3
>>    Disabled on: server3.<domain.com>
>> Ordering Constraints:
>>  start ClusterIp then start Smb
>>  start Nmb then start Smb
>>  promote DatiClone then start DatiFs
>>  start DatiFs then start Nmb
>>  start DatiFs then start PgSQL
>>  start PgSQL then start SmbManager
>> Colocation Constraints:
>>  ClusterIp with Smb
>>  Smb with Nmb
>>  Smb with DatiFs
>>  DatiFs with DatiClone (with-rsc-role:Master)
>>  PgSQL with DatiFs
>>  SmbManager with DatiFs
>> 
>> Cluster Properties:
>> dc-version: 1.1.8-7.el6-394e906
>> cluster-infrastructure: cman
>> no-quorum-policy: ignore
>> stonith-enabled: true
>> 
>> 
>>> 
>>>> Is it safe to remove files older than a month with a cron script?
>>>> 
>>> Yes
>> 
>> ok thanks,
>> Nicola
>> 
>>> 
>>>> thanks
>>>> Nicola
>>>> 
>> 
>
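
On the cron question answered earlier in the thread: a cleanup job could look roughly like the Python sketch below. It is only an illustration under assumptions that do not come from the thread: the function name, the 30-day default, and matching on the "core." filename prefix (taken from the core.xxxx names mentioned in the original question). Reporting the cores upstream before deleting them is still preferable.

```python
import os
import time


def remove_old_cores(directory, max_age_days=30, now=None, dry_run=False):
    """Delete regular files named core.* older than max_age_days.

    Returns the sorted list of paths that were (or, with dry_run, would be)
    removed. Age is judged by modification time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for entry in os.scandir(directory):
        # Skip directories, symlinks and anything that is not a core file.
        if not entry.is_file(follow_symlinks=False):
            continue
        if not entry.name.startswith("core."):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            if not dry_run:
                os.remove(entry.path)
            removed.append(entry.path)
    return sorted(removed)


# Example call (path taken from the original question):
# remove_old_cores("/var/lib/pacemaker/cores/root", max_age_days=30)
```

Running it first with dry_run=True shows what would be deleted without touching anything.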