[Pacemaker] /var/lib/pacemaker/cores cleanup
Andrew Beekhof
andrew at beekhof.net
Thu Nov 7 23:36:58 UTC 2013
On 8 Nov 2013, at 10:27 am, Andrew Beekhof <andrew at beekhof.net> wrote:
>
> On 7 Oct 2013, at 5:52 pm, Mailing List SVR <lists at svrinformatica.it> wrote:
>
>> On 07/10/2013 04:16, Andrew Beekhof wrote:
>>> On 05/10/2013, at 7:11 AM, Mailing List SVR <lists at svrinformatica.it>
>>> wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I have a Pacemaker cluster that has been running fine for 2 months. I noticed that the folder /var/lib/pacemaker/cores/root contains about 1.5 GB of core.xxxx files. Who is responsible for cleaning up these files?
>>>>
>>> Ideally they would have been reported upstream so the underlying problem that caused them could be fixed.
>>
>> if you are interested here are some core dumps:
>>
>> http://195.250.34.59/temp/cores.tar.bz2
>>
>
> dammit, we're not correctly collecting metadata for the 'service' class.
> these core files are produced when we try to parse the result as xml.
The others are borking on control characters in the XML string, for example:

    lrmd_rsc_output=\"Stopping postgresql service: \033[60G[\033[0;32m OK \033[0; ..."

That was fixed by https://github.com/beekhof/pacemaker/commit/c351934 and is included in https://rhn.redhat.com/errata/RHEA-2013-1493.html
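
To illustrate why terminal colour codes break XML parsing, here is a rough Python sketch. It is not what the commit above does; the function name sanitize_for_xml and the sample string are made up for illustration (the sample only imitates the style of init-script output). It assumes the output should have ANSI escape sequences and any other XML 1.0-invalid control characters removed before being stored in an attribute.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# ANSI CSI escape sequences such as "\033[60G" or "\033[0;32m".
_ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
# C0 control characters that XML 1.0 forbids (everything except tab, LF, CR).
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def sanitize_for_xml(text: str) -> str:
    """Hypothetical helper: drop ANSI escapes, then any leftover invalid control chars."""
    text = _ANSI_CSI.sub("", text)
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    # Made-up sample in the style of a colourised init-script message.
    sample = "Stopping example service: \033[60G[\033[0;32m OK \033[0m]"
    clean = sanitize_for_xml(sample)
    # The cleaned string can be embedded in an attribute and parsed back.
    elem = ET.fromstring("<op output=%s/>" % quoteattr(clean))
    print(elem.get("output"))  # prints: Stopping example service: [ OK ]
```

Feeding the unsanitised sample to the same ET.fromstring call raises ParseError, because the ESC byte (0x1b) is not a legal XML 1.0 character.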
>
> at least
>
> [root at pcmk-5 ~]# crm_resource --show-metadata service:nfs
> Usage: nfs {start|stop|status|restart|reload|force-reload|condrestart|try-restart|condstop}
>
> vs.
>
> [root at pcmk-5 ~]# crm_resource --show-metadata lsb:nfs
> <?xml version='1.0'?>
> <!DOCTYPE resource-agent SYSTEM 'ra-api-1.dtd'>
> <resource-agent name='nfs' version='0.1'>
> <version>1.0</version>
> <longdesc lang='en'>
> NFS is a popular protocol for file sharing across networks.
> This service provides NFS server functionality, which is \
> configured via the /etc/exports file.
>
> </longdesc>
> <shortdesc lang='en'>nfs</shortdesc>
> <parameters>
> </parameters>
> <actions>
> <action name='meta-data' timeout='5' />
> <action name='start' timeout='15' />
> <action name='stop' timeout='15' />
> <action name='status' timeout='15' />
> <action name='restart' timeout='15' />
> <action name='force-reload' timeout='15' />
> <action name='monitor' timeout='15' interval='15' />
> </actions>
> <special tag='LSB'>
> <Provides></Provides>
> <Required-Start></Required-Start>
> <Required-Stop></Required-Stop>
> <Should-Start></Should-Start>
> <Should-Stop></Should-Stop>
> <Default-Start></Default-Start>
> <Default-Stop></Default-Stop>
> </special>
> </resource-agent>
>
>
> Fixed in https://github.com/beekhof/pacemaker/commit/644752e
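
The difference between the two outputs above can be sketched in Python as a defensive check: only treat the text as metadata if it parses as XML and has a resource-agent root. This is an illustration under my own assumptions, not the code in the commit; parse_agent_metadata is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_agent_metadata(output: str) -> Optional[ET.Element]:
    """Hypothetical helper: return the <resource-agent> root, or None.

    None means the text is not usable metadata, e.g. an init script's
    "Usage: ..." line instead of an XML document.
    """
    try:
        # Strip surrounding whitespace so a leading newline before the
        # XML declaration does not cause a parse error.
        root = ET.fromstring(output.strip())
    except ET.ParseError:
        return None
    return root if root.tag == "resource-agent" else None


if __name__ == "__main__":
    usage = "Usage: nfs {start|stop|status|restart|reload}"
    xml_doc = (
        "<?xml version='1.0'?>\n"
        "<resource-agent name='nfs' version='0.1'>"
        "<actions><action name='start' timeout='15' /></actions>"
        "</resource-agent>"
    )
    print(parse_agent_metadata(usage))                # prints: None
    print(parse_agent_metadata(xml_doc).get("name"))  # prints: nfs
```

Returning None instead of letting the parse failure propagate is one possible design; a caller could just as well log the bad output and fall back to generated LSB-style metadata.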
>
>> this is a pacemaker/cman cluster on centos 6.4
>>
>> pacemaker-libs-1.1.8-7.el6.x86_64
>> pacemaker-cluster-libs-1.1.8-7.el6.x86_64
>> pacemaker-1.1.8-7.el6.x86_64
>> pacemaker-cli-1.1.8-7.el6.x86_64
>> cman-3.0.12.1-49.el6_4.2.x86_64
>>
>> pcs config
>> Corosync Nodes:
>>
>> Pacemaker Nodes:
>>   server3.<domain.com> server4.<domain.com>
>>
>> Resources:
>>   Master: DatiClone
>>     Resource: Dati (provider=linbit type=drbd class=ocf)
>>       Attributes: drbd_resource=dati
>>       Operations: monitor interval=120s
>>   Resource: DatiFs (provider=heartbeat type=Filesystem class=ocf)
>>     Attributes: device=/dev/drbd/by-res/dati directory=/srv/dati fstype=ext4 options=noatime,nodiratime,nodev run_fsck=force
>>   Resource: ClusterIp (provider=heartbeat type=IPaddr2 class=ocf)
>>     Attributes: ip=172.16.20.9 cidr_netmask=32
>>     Operations: monitor interval=60s
>>   Resource: Smb (type=smb class=service)
>>     Operations: monitor interval=1min
>>   Resource: Nmb (type=nmb class=service)
>>     Operations: monitor interval=1min
>>   Resource: PgSQL (type=postgresql class=service)
>>     Operations: monitor interval=1min
>>   Resource: SmbManager (type=smbmanager class=service)
>>     Operations: monitor interval=5min
>>   Resource: ipmi-fencing3 (type=fence_ipmilan class=stonith)
>>     Attributes: pcmk_host_list=server3.<domain.com>.com ipaddr=172.16.20.6 login=root passwd=pwd123 lanplus=1
>>     Operations: monitor interval=60s
>>   Resource: ipmi-fencing4 (type=fence_ipmilan class=stonith)
>>     Attributes: pcmk_host_list=server4.<domain.com> ipaddr=172.16.20.7 login=root passwd=pwd123 lanplus=1
>>     Operations: monitor interval=60s
>>
>> Location Constraints:
>>   Resource: ipmi-fencing4
>>     Disabled on: server4.<domain.com>
>>   Resource: ipmi-fencing3
>>     Disabled on: server3.<domain.com>
>> Ordering Constraints:
>>   start ClusterIp then start Smb
>>   start Nmb then start Smb
>>   promote DatiClone then start DatiFs
>>   start DatiFs then start Nmb
>>   start DatiFs then start PgSQL
>>   start PgSQL then start SmbManager
>> Colocation Constraints:
>>   ClusterIp with Smb
>>   Smb with Nmb
>>   Smb with DatiFs
>>   DatiFs with DatiClone (with-rsc-role:Master)
>>   PgSQL with DatiFs
>>   SmbManager with DatiFs
>>
>> Cluster Properties:
>>   dc-version: 1.1.8-7.el6-394e906
>>   cluster-infrastructure: cman
>>   no-quorum-policy: ignore
>>   stonith-enabled: true
>>
>>
>>>
>>>> is it safe to remove the files older than a month with a cron script?
>>>>
>>> Yes
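
For what it's worth, a cleanup job along those lines could look like the Python sketch below. Everything here is an assumption on my part rather than something Pacemaker ships: the function name remove_old_cores, the 30-day cutoff (standing in for "a month"), the core.* file pattern (taken from the core.xxxx names mentioned above), and the dry_run switch, which defaults to only listing what would be deleted.

```python
import time
from pathlib import Path
from typing import List


def remove_old_cores(core_dir: str, max_age_days: float = 30,
                     dry_run: bool = True) -> List[Path]:
    """Hypothetical helper: find core.* files older than max_age_days.

    With dry_run=True (the default) nothing is deleted; the matching
    paths are only returned. With dry_run=False they are unlinked.
    """
    base = Path(core_dir)
    if not base.is_dir():
        return []
    cutoff = time.time() - max_age_days * 86400
    matched = []
    # rglob descends into per-user subdirectories such as cores/root.
    for path in base.rglob("core.*"):
        # Skip symlinks and anything that is not a regular file.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            matched.append(path)
    return sorted(matched)


if __name__ == "__main__":
    # Dry run: print what would be removed without touching anything.
    for old_core in remove_old_cores("/var/lib/pacemaker/cores"):
        print(old_core)
```

Run it once by hand in dry-run mode to check the list, then call it with dry_run=False from a daily or weekly cron entry. Keeping a copy of any recent core for a bug report before deleting it is still worthwhile.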
>>
>> ok thanks,
>> Nicola
>>
>>>
>>>> thanks
>>>> Nicola
>>>>
>>
>