[Pacemaker] Filesystem resource killing innocent processes on stop
Dejan Muhamedagic
dejanmm at fastmail.fm
Tue May 19 08:44:44 UTC 2015
On Mon, May 18, 2015 at 05:14:14PM +0200, Nikola Ciprich wrote:
> Hi Dejan,
>
> >
> > The list below seems too extensive. Which version of
> > resource-agents do you run?
> >
> > $ grep 'Build version:' /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
>
> yes, it's definitely wrong..
>
> here's the info you've requested:
>
> # Build version: 5434e9646462d2c3c8f7aad2609d0ef1875839c7
>
> rpm version: resource-agents-3.9.5-12.el6_6.5.x86_64
>
> I can already see the problem: this version simply uses
> fuser -m $MOUNTPOINT, which seems to return badly wrong results:
>
> [root@denovav1b ~]# fuser -m /home/cluster/virt/
> /home/cluster/virt/: 1m 3295m 3314m 4817m 4846m 4847m 4890m 4891m 4916m 4944m 4952m 4999m 5007m 5037m 5069m 5137m 5162m 5164m 5166m 5168m 5170m 5172m 5575m 8055m 9604m 9605m 10984m 11186m 11370m 11813m 11871m 11887m 11946m 12020m 12026m 12027m 12028m 12029m 12030m 12031m 14218m 15294m 15374m 15396m 15399m 17479m 17693m 17694m 20705m 20718m 20948m 20982m 23902m 24572m 24580m 26300m 29790m 29792m 30785m
>
> (notice even process # 1!) while lsof returns:
>
> lsof | grep "cluster.*virt"
> qemu-syst 8055 root 21r REG 0,0 232783872 1099511634304 /home/cluster/virt/images/debian-7.8.0-amd64-netinst.iso
>
> which seems much saner to me..
Indeed. Is fuser broken, or is there some kernel-side confusion?
Did you also try:

    lsof /home/cluster/virt/

Anyway, it would be good to bring this up with the CentOS people.
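
To see exactly where the two tools disagree, something like the
following might help. It is only a sketch: pid_diff is a hypothetical
helper name, not part of resource-agents, and it assumes both PID lists
are passed in as whitespace-separated strings. It strips fuser-style
access suffixes (such as the "m" above) and prints the PIDs that appear
only in the first list.

```shell
#!/bin/sh
# pid_diff LIST_A LIST_B  (hypothetical helper, for illustration only)
# Print every PID from LIST_A that does not appear in LIST_B, one per line.
# Entries may carry fuser-style access suffixes ("1m", "3295c"), which
# are stripped before comparing.
pid_diff() {
    # Normalise LIST_B to " pid pid pid " so whole-word matching works,
    # whether the input was separated by spaces or by newlines.
    seen=" $(printf '%s ' $2)"
    for p in $1; do
        p=${p%%[!0-9]*}               # drop anything after the digits
        [ -n "$p" ] || continue       # skip entries that had no digits
        case "$seen" in
            *" $p "*) ;;              # present in both lists
            *) printf '%s\n' "$p" ;;  # present in LIST_A only
        esac
    done
}
```

It could be called as
pid_diff "$(fuser -m /home/cluster/virt 2>/dev/null)" "$(lsof -t /home/cluster/virt 2>/dev/null)"
to list the processes that fuser reports but lsof does not.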
Thanks,
Dejan
> BR
>
> nik
>
>
> >
> > > here's example of the log:
> > >
> > > Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 3606 1 0 Feb12 ? S<s 0:01 /sbin/udevd -d
> > > Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4249 1 0 Feb12 ttyS2 Ss+ 0:00 agetty ttyS2 115200 vt100
> > > Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4271 4395 0 21:58 ? Ss 0:00 sshd: root@pts/12
> > > Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4273 1 0 21:58 ? Rs 0:00 [bash]
> > > Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4395 1 0 Feb24 ? Ss 0:03 /usr/sbin/sshd
> > > Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4677 1 0 Feb12 ? Ss 0:00 /sbin/portreserve
> > > Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4690 1 0 Feb12 ? S 0:00 supervising syslog-ng
> > > Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4691 1 0 Feb12 ? Ss 0:46 syslog-ng -p /var/run/syslog-ng.pid
> > > Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: rpc 4746 1 0 Feb12 ? Ss 0:05 rpcbind
> > > Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: rpcuser 4764 1 0 Feb12 ? Ss 0:00 rpc.statd
> > > Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4797 1 0 Feb12 ? Ss 0:00 rpc.idmapd
> > > Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4803 12028 0 21:59 ? S 0:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/Filesystem stop
> > >
> > > while unmounting the /home/cluster/virt directory. Curiously, the last process killed seems to be
> > > the Filesystem resource itself.
> >
> > Hmm, that's quite strange. That implies that the RA script itself
> > had /home/cluster/virt as its WD.
> >
> > > Before I dig deeper into this, has anyone else noticed this problem? Is this a known
> > > (and possibly already fixed) issue?
> >
> > Never heard of this.
> >
> > Thanks,
> >
> > Dejan
> >
> > > thanks a lot in advance
> > >
> > > nik
> > >
> > >
> > > --
> > > -------------------------------------
> > > Ing. Nikola CIPRICH
> > > LinuxBox.cz, s.r.o.
> > > 28.rijna 168, 709 00 Ostrava
> > >
> > > tel.: +420 591 166 214
> > > fax: +420 596 621 273
> > > mobil: +420 777 093 799
> > > www.linuxbox.cz
> > >
> > > mobil servis: +420 737 238 656
> > > email servis: servis at linuxbox.cz
> > > -------------------------------------
> >
> >
> >
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org