[Pacemaker] "stonith_admin -F node" results in a pair of reboots
Bob Haxo
bhaxo at sgi.com
Sat Jan 4 02:15:23 UTC 2014
Digimer,
Yes, for the setup that includes DRBD, 'crm-fence-peer.sh' and
'resource-and-stonith' are both included in the configuration.
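For reference, a minimal sketch of what that DRBD stanza looks like here
(the resource name "r0" and the handler paths are the stock defaults, not
copied verbatim from this cluster):

  resource r0 {
    disk {
      # suspend I/O and wait for fencing before resuming when the peer is lost
      fencing resource-and-stonith;
    }
    handlers {
      # add a Pacemaker constraint against the failed peer, and lift it
      # again once resync back to the target node has completed
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }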
Thanks,
Bob Haxo
On Wed, 2014-01-01 at 01:04 -0500, Digimer wrote:
> Did you hook DRBD into pacemaker's fencing using 'crm-fence-peer.sh' and
> set the fencing policy to 'resource-and-stonith;'? If not, do so! It
> will protect against split-brains.
>
> digimer
>
> On 01/01/14 01:03 AM, Bob Haxo wrote:
> > Digimer,
> >
> > Ok, sounds reasonable and I will investigate this further on Jan 2. WRT
> > DRBD ... geeee, I don't recall multiple fencings. I'll check that also
> > on Jan 2.
> >
> > Emmanuel,
> >
> > I have not seen pending fencing operations with "dlm_tool ls" ... but I
> > have seen the word "pending" elsewhere (crm_mon?) without considering
> > that it might be fencing that is pending. Interesting.
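> > (Side note, for anyone following along: besides "dlm_tool ls", I believe
> > the dlm_controld debug buffer is also worth a look; exact output varies
> > by version, so treat this as a sketch:
> >
> >   dlm_tool ls                     # lockspaces and their change/fencing state
> >   dlm_tool dump | grep -i fence   # dlm_controld debug buffer, fencing activity
> > )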
> >
> > Thanks & my best wishes for a healthy new year.
> > Bob Haxo
> >
> >
> > On Wed, 2014-01-01 at 00:19 -0500, Digimer wrote:
> >> This is probably because cman (which is its own cluster stack, used
> >> to provide DLM and quorum to pacemaker on EL6) detected the node failure
> >> after the initial fence and called its own fence. You see a similar
> >> behaviour when using DRBD. It will also call a fence when the peer dies
> >> (even when it died because of a controlled fence call). In theory,
> >> pacemaker using cman's dlm with DRBD would trigger three fences per
> >> failure. :)
> >>
> >> digimer
> >>
> >> On 01/01/14 12:04 AM, emmanuel segura wrote:
> >> > Maybe you are missing logs from when you fenced the node? I think
> >> > clvmd hung because your node is in an unclean state; use dlm_tool ls to
> >> > see if you have any pending fencing operation.
> >> >
> >> >
> >> > 2014/1/1 Bob Haxo <bhaxo at sgi.com>
> >> >
> >> > Greetings ... Happy New Year!
> >> >
> >> > I am testing a configuration created from the example in
> >> > "Chapter 6. Configuring a GFS2 File System in a Cluster" of the "Red
> >> > Hat Enterprise Linux 7.0 Beta Global File System 2" document. The only
> >> > addition is stonith:fence_ipmilan. After encountering this issue
> >> > when I configured with "crm", I re-configured using "pcs". I've
> >> > included the configuration below.
> >> >
> >> > I'm thinking that, in a 2-node cluster, if I run "stonith_admin -F
> >> > <peer-node>", then <peer-node> should reboot and cleanly rejoin the
> >> > cluster. This is not happening.
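> >> > (Concretely, the command I am running is, for example:
> >> >
> >> >   stonith_admin --fence mici-admin2   # long form of: stonith_admin -F mici-admin2
> >> >
> >> > I believe "pcs stonith fence mici-admin2" should amount to the same
> >> > stonithd request, though I have been using stonith_admin directly.)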
> >> >
> >> > What ultimately happens is that after the initially fenced node
> >> > reboots, the system from which the stonith_admin -F command was run
> >> > is fenced and reboots. The fencing stops there, leaving the cluster
> >> > in an appropriate state.
> >> >
> >> > The issue seems to reside with clvmd/lvm. With the reboot of the
> >> > initially fenced node, the clvmd resource fails on the surviving
> >> > node (the monitor errors out). I hypothesize there is an issue
> >> > with locks, but I have insufficient knowledge of clvmd/lvm locking to
> >> > prove or disprove this hypothesis.
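> >> > (For completeness, the standard lvm-side knobs I know to double-check,
> >> > nothing exotic; field names are from stock lvm2 on RHEL 6:
> >> >
> >> >   grep locking_type /etc/lvm/lvm.conf   # needs to be 3 for clvmd
> >> >   vgs -o vg_name,vg_attr                # a 'c' in the attrs marks a clustered VG
> >> >   service clvmd status
> >> > )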
> >> >
> >> > Have I missed something ...
> >> >
> >> > 1) Is this expected behavior, i.e., does the fencing node always end up
> >> > rebooted as well?
> >> >
> >> > 2) Or, maybe I didn't correctly duplicate the Chapter 6 example?
> >> >
> >> > 3) Or, perhaps something is wrong with or omitted from the Chapter 6
> >> > example?
> >> >
> >> > Suggestions will be much appreciated.
> >> >
> >> > Thanks,
> >> > Bob Haxo
> >> >
> >> > RHEL6.5
> >> > pacemaker-cli-1.1.10-14.el6_5.1.x86_64
> >> > crmsh-1.2.5-55.1sgi709r3.rhel6.x86_64
> >> > pacemaker-libs-1.1.10-14.el6_5.1.x86_64
> >> > cman-3.0.12.1-59.el6_5.1.x86_64
> >> > pacemaker-1.1.10-14.el6_5.1.x86_64
> >> > corosynclib-1.4.1-17.el6.x86_64
> >> > corosync-1.4.1-17.el6.x86_64
> >> > pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
> >> >
> >> > Cluster Name: mici
> >> > Corosync Nodes:
> >> >
> >> > Pacemaker Nodes:
> >> > mici-admin mici-admin2
> >> >
> >> > Resources:
> >> > Clone: clusterfs-clone
> >> > Meta Attrs: interleave=true target-role=Started
> >> > Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem)
> >> > Attributes: device=/dev/vgha2/lv_clust2 directory=/images
> >> > fstype=gfs2 options=defaults,noatime,nodiratime
> >> > Operations: monitor on-fail=fence interval=30s
> >> > (clusterfs-monitor-interval-30s)
> >> > Clone: clvmd-clone
> >> > Meta Attrs: interleave=true ordered=true target-role=Started
> >> > Resource: clvmd (class=lsb type=clvmd)
> >> > Operations: monitor on-fail=fence interval=30s
> >> > (clvmd-monitor-interval-30s)
> >> > Clone: dlm-clone
> >> > Meta Attrs: interleave=true ordered=true
> >> > Resource: dlm (class=ocf provider=pacemaker type=controld)
> >> > Operations: monitor on-fail=fence interval=30s
> >> > (dlm-monitor-interval-30s)
> >> >
> >> > Stonith Devices:
> >> > Resource: p_ipmi_fencing_1 (class=stonith type=fence_ipmilan)
> >> > Attributes: ipaddr=128.##.##.78 login=XXXXX passwd=XXXXX
> >> > lanplus=1 action=reboot pcmk_host_check=static-list
> >> > pcmk_host_list=mici-admin
> >> > Meta Attrs: target-role=Started
> >> > Operations: monitor start-delay=30 interval=60s timeout=30
> >> > (p_ipmi_fencing_1-monitor-60s)
> >> > Resource: p_ipmi_fencing_2 (class=stonith type=fence_ipmilan)
> >> > Attributes: ipaddr=128.##.##.220 login=XXXXX passwd=XXXXX
> >> > lanplus=1 action=reboot pcmk_host_check=static-list
> >> > pcmk_host_list=mici-admin2
> >> > Meta Attrs: target-role=Started
> >> > Operations: monitor start-delay=30 interval=60s timeout=30
> >> > (p_ipmi_fencing_2-monitor-60s)
> >> > Fencing Levels:
> >> >
> >> > Location Constraints:
> >> > Resource: p_ipmi_fencing_1
> >> > Disabled on: mici-admin (score:-INFINITY)
> >> > (id:location-p_ipmi_fencing_1-mici-admin--INFINITY)
> >> > Resource: p_ipmi_fencing_2
> >> > Disabled on: mici-admin2 (score:-INFINITY)
> >> > (id:location-p_ipmi_fencing_2-mici-admin2--INFINITY)
> >> > Ordering Constraints:
> >> > start dlm-clone then start clvmd-clone (Mandatory)
> >> > (id:order-dlm-clone-clvmd-clone-mandatory)
> >> > start clvmd-clone then start clusterfs-clone (Mandatory)
> >> > (id:order-clvmd-clone-clusterfs-clone-mandatory)
> >> > Colocation Constraints:
> >> > clusterfs-clone with clvmd-clone (INFINITY)
> >> > (id:colocation-clusterfs-clone-clvmd-clone-INFINITY)
> >> > clvmd-clone with dlm-clone (INFINITY)
> >> > (id:colocation-clvmd-clone-dlm-clone-INFINITY)
> >> >
> >> > Cluster Properties:
> >> > cluster-infrastructure: cman
> >> > dc-version: 1.1.10-14.el6_5.1-368c726
> >> > last-lrm-refresh: 1388530552
> >> > no-quorum-policy: ignore
> >> > stonith-enabled: true
> >> > Node Attributes:
> >> > mici-admin: standby=off
> >> > mici-admin2: standby=off
> >> >
> >> >
> >> > Last updated: Tue Dec 31 17:15:55 2013
> >> > Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
> >> > Stack: cman
> >> > Current DC: mici-admin2 - partition with quorum
> >> > Version: 1.1.10-14.el6_5.1-368c726
> >> > 2 Nodes configured
> >> > 8 Resources configured
> >> >
> >> > Online: [ mici-admin mici-admin2 ]
> >> >
> >> > Full list of resources:
> >> >
> >> > p_ipmi_fencing_1 (stonith:fence_ipmilan): Started mici-admin2
> >> > p_ipmi_fencing_2 (stonith:fence_ipmilan): Started mici-admin
> >> > Clone Set: clusterfs-clone [clusterfs]
> >> > Started: [ mici-admin mici-admin2 ]
> >> > Clone Set: clvmd-clone [clvmd]
> >> > Started: [ mici-admin mici-admin2 ]
> >> > Clone Set: dlm-clone [dlm]
> >> > Started: [ mici-admin mici-admin2 ]
> >> >
> >> > Migration summary:
> >> > * Node mici-admin:
> >> > * Node mici-admin2:
> >> >
> >> > =====================================================
> >> > crm_mon after the fenced node reboots, showing the clvmd failure that
> >> > then occurs, which in turn triggers a fencing of that node.
> >> >
> >> > Last updated: Tue Dec 31 17:06:55 2013
> >> > Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
> >> > Stack: cman
> >> > Current DC: mici-admin - partition with quorum
> >> > Version: 1.1.10-14.el6_5.1-368c726
> >> > 2 Nodes configured
> >> > 8 Resources configured
> >> >
> >> > Node mici-admin: UNCLEAN (online)
> >> > Online: [ mici-admin2 ]
> >> >
> >> > Full list of resources:
> >> >
> >> > p_ipmi_fencing_1 (stonith:fence_ipmilan): Stopped
> >> > p_ipmi_fencing_2 (stonith:fence_ipmilan): Started mici-admin
> >> > Clone Set: clusterfs-clone [clusterfs]
> >> > Started: [ mici-admin ]
> >> > Stopped: [ mici-admin2 ]
> >> > Clone Set: clvmd-clone [clvmd]
> >> > clvmd (lsb:clvmd): FAILED mici-admin
> >> > Stopped: [ mici-admin2 ]
> >> > Clone Set: dlm-clone [dlm]
> >> > Started: [ mici-admin mici-admin2 ]
> >> >
> >> > Migration summary:
> >> > * Node mici-admin:
> >> > clvmd: migration-threshold=1000000 fail-count=1
> >> > last-failure='Tue Dec 31 17:04:29 2013'
> >> > * Node mici-admin2:
> >> >
> >> > Failed actions:
> >> > clvmd_monitor_30000 on mici-admin 'unknown error' (1): call=60,
> >> > status=Timed Out, last-rc-change='Tue Dec 31 17:04:29 2013',
> >> > queued=0ms, exec=0ms
> >> >
> >> >
> >> > --
> >> > this is my life and I live it as long as God wills
> >> >
> >> >
> >> >
> >>
> >>
> >
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
>
>