[Pacemaker] Nodes will not promote DRBD resources to master on failover
emmanuel segura
emi2fast at gmail.com
Fri Mar 30 14:43:45 UTC 2012
Can you show me the output of:
crm configure show
Il giorno 30 marzo 2012 16:10, Andrew Martin <amartin at xes-inc.com> ha
scritto:
> Hi Andreas,
>
> Here is a copy of my complete CIB:
> http://pastebin.com/v5wHVFuy
>
> I'll work on generating a report using crm_report as well.
>
> Thanks,
>
> Andrew
>
> ------------------------------
> *From: *"Andreas Kurz" <andreas at hastexo.com>
> *To: *pacemaker at oss.clusterlabs.org
> *Sent: *Friday, March 30, 2012 4:41:16 AM
> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
> master on failover
>
> On 03/28/2012 04:56 PM, Andrew Martin wrote:
> > Hi Andreas,
> >
> > I disabled the DRBD init script and then restarted the slave node
> > (node2). After it came back up, DRBD did not start:
> > Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): pending
> > Online: [ node2 node1 ]
> >
> > Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
> > Masters: [ node1 ]
> > Stopped: [ p_drbd_vmstore:1 ]
> > Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
> > Masters: [ node1 ]
> > Stopped: [ p_drbd_mount1:1 ]
> > Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
> > Masters: [ node1 ]
> > Stopped: [ p_drbd_mount2:1 ]
> > ...
> >
> > root at node2:~# service drbd status
> > drbd not loaded
>
> Yes, that's expected unless Pacemaker starts DRBD itself
>
> >
> > Is there something else I need to change in the CIB to ensure that DRBD
> > is started? All of my DRBD devices are configured like this:
> > primitive p_drbd_mount2 ocf:linbit:drbd \
> > params drbd_resource="mount2" \
> > op monitor interval="15" role="Master" \
> > op monitor interval="30" role="Slave"
> > ms ms_drbd_mount2 p_drbd_mount2 \
> > meta master-max="1" master-node-max="1" clone-max="2"
> > clone-node-max="1" notify="true"
>
> That should be enough ... I can't say more without seeing the complete
> configuration ... too many fragments of information ;-)
>
> Please provide (e.g. via pastebin) your complete CIB (cibadmin -Q) while the
> cluster is in that state ... or even better, create a crm_report archive
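>
> For example (just a sketch; crm_report option names vary a bit between
> versions, and the time window below is only a placeholder to adjust):
>
>     # dump the complete live CIB, including the status section
>     cibadmin -Q > /tmp/cib-broken-state.xml
>
>     # or gather a full report covering the failover window
>     crm_report -f "2012-03-28 09:00" -t "2012-03-28 10:00" /tmp/failover-report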
>
> >
> > Here is the output from the syslog (grep -i drbd /var/log/syslog):
> > Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
> > key=12:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
> > op=p_drbd_vmstore:1_monitor_0 )
> > Mar 28 09:24:47 node2 lrmd: [3210]: info: rsc:p_drbd_vmstore:1 probe[2]
> > (pid 3455)
> > Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
> > key=13:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
> > op=p_drbd_mount1:1_monitor_0 )
> > Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount1:1 probe[3]
> > (pid 3456)
> > Mar 28 09:24:48 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
> > key=14:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
> > op=p_drbd_mount2:1_monitor_0 )
> > Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount2:1 probe[4]
> > (pid 3457)
> > Mar 28 09:24:48 node2 Filesystem[3458]: [3517]: WARNING: Couldn't find
> > device [/dev/drbd0]. Expected /dev/??? to exist
> > Mar 28 09:24:48 node2 crm_attribute: [3563]: info: Invoked:
> > crm_attribute -N node2 -n master-p_drbd_mount2:1 -l reboot -D
> > Mar 28 09:24:48 node2 crm_attribute: [3557]: info: Invoked:
> > crm_attribute -N node2 -n master-p_drbd_vmstore:1 -l reboot -D
> > Mar 28 09:24:48 node2 crm_attribute: [3562]: info: Invoked:
> > crm_attribute -N node2 -n master-p_drbd_mount1:1 -l reboot -D
> > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[4] on
> > p_drbd_mount2:1 for client 3213: pid 3457 exited with return code 7
> > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[2] on
> > p_drbd_vmstore:1 for client 3213: pid 3455 exited with return code 7
> > Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
> > operation p_drbd_mount2:1_monitor_0 (call=4, rc=7, cib-update=10,
> > confirmed=true) not running
> > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[3] on
> > p_drbd_mount1:1 for client 3213: pid 3456 exited with return code 7
> > Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
> > operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7, cib-update=11,
> > confirmed=true) not running
> > Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
> > operation p_drbd_mount1:1_monitor_0 (call=3, rc=7, cib-update=12,
> > confirmed=true) not running
>
> No errors, just probing ... so for some reason Pacemaker does not want to
> start it ... use crm_simulate to find out why ... or provide the information
> requested above.
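>
> For example (run on the DC; -L reads the live CIB and -s prints the
> allocation scores that explain placement and promotion decisions):
>
>     crm_simulate -sL
>
>     # or replay a CIB saved while the cluster was in the broken state
>     crm_simulate -sx /tmp/cib-broken-state.xml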
>
> Regards,
> Andreas
>
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
>
> >
> > Thanks,
> >
> > Andrew
> >
> > ------------------------------------------------------------------------
> > *From: *"Andreas Kurz" <andreas at hastexo.com>
> > *To: *pacemaker at oss.clusterlabs.org
> > *Sent: *Wednesday, March 28, 2012 9:03:06 AM
> > *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
> > master on failover
> >
> > On 03/28/2012 03:47 PM, Andrew Martin wrote:
> >> Hi Andreas,
> >>
> >>> hmm ... what is that fence-peer script doing? If you want to use
> >>> resource-level fencing with the help of dopd, activate the
> >>> drbd-peer-outdater script in the line above ... and double check if the
> >>> path is correct
> >> fence-peer is just a wrapper for drbd-peer-outdater that does some
> >> additional logging. In my testing dopd has been working well.
> >
> > I see
> >
> >>
> >>>> I am thinking of making the following changes to the CIB (as per the
> >>>> official DRBD
> >>>> guide
> >>>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html) in
> >>>> order to add the DRBD lsb service and require that it start before the
> >>>> ocf:linbit:drbd resources. Does this look correct?
> >>>
> >>> Where did you read that? No, deactivate the startup of DRBD on system
> >>> boot and let Pacemaker manage it completely.
> >>>
> >>>> primitive p_drbd-init lsb:drbd op monitor interval="30"
> >>>> colocation c_drbd_together inf:
> >>>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master
> >>>> ms_drbd_mount2:Master
> >>>> order drbd_init_first inf: ms_drbd_vmstore:promote
> >>>> ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start
> >>>>
> >>>> This doesn't seem to require that drbd be also running on the node
> where
> >>>> the ocf:linbit:drbd resources are slave (which it would need to do to
> be
> >>>> a DRBD SyncTarget) - how can I ensure that drbd is running everywhere?
> >>>> (clone cl_drbd p_drbd-init ?)
> >>>
> >>> This is really not needed.
> >> I was following the official DRBD Users Guide:
> >>
> >> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html
> >>
> >> If I am understanding your previous message correctly, I do not need to
> >> add a lsb primitive for the drbd daemon? It will be
> >> started/stopped/managed automatically by my ocf:linbit:drbd resources
> >> (and I can remove the /etc/rc* symlinks)?
> >
> > Yes, you don't need that LSB script when using Pacemaker and should not
> > let init start it.
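> >
> > On Ubuntu that would be something like (assuming the stock init script
> > name "drbd"):
> >
> >     /etc/init.d/drbd stop
> >     update-rc.d -f drbd remove    # removes the /etc/rc*.d symlinks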
> >
> > Regards,
> > Andreas
> >
> > --
> > Need help with Pacemaker?
> > http://www.hastexo.com/now
> >
> >>
> >> Thanks,
> >>
> >> Andrew
> >>
> >> ------------------------------------------------------------------------
> >> *From: *"Andreas Kurz" <andreas at hastexo.com>
> >> *To: *pacemaker at oss.clusterlabs.org
> >> *Sent: *Wednesday, March 28, 2012 7:27:34 AM
> >> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
> >> master on failover
> >>
> >> On 03/28/2012 12:13 AM, Andrew Martin wrote:
> >>> Hi Andreas,
> >>>
> >>> Thanks, I've updated the colocation rule to be in the correct order. I
> >>> also enabled the STONITH resource (this was temporarily disabled before
> >>> for some additional testing). DRBD has its own network connection over
> >>> the br1 interface (192.168.5.0/24 network), a direct crossover cable
> >>> between node1 and node2:
> >>> global { usage-count no; }
> >>> common {
> >>> syncer { rate 110M; }
> >>> }
> >>> resource vmstore {
> >>> protocol C;
> >>> startup {
> >>> wfc-timeout 15;
> >>> degr-wfc-timeout 60;
> >>> }
> >>> handlers {
> >>> #fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t
> 5";
> >>> fence-peer "/usr/local/bin/fence-peer";
> >>
> >> hmm ... what is that fence-peer script doing? If you want to use
> >> resource-level fencing with the help of dopd, activate the
> >> drbd-peer-outdater script in the line above ... and double check if the
> >> path is correct
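> >>
> >> i.e. for dopd, the handlers section would use the shipped script
> >> directly, together with resource-only fencing (path as in the
> >> commented-out line above ... double check it on your install):
> >>
> >>     handlers {
> >>         fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
> >>     }
> >>     disk {
> >>         fencing resource-only;
> >>     }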
> >>
> >>> split-brain "/usr/lib/drbd/notify-split-brain.sh
> >>> me at example.com <mailto:me at example.com>";
> >>> }
> >>> net {
> >>> after-sb-0pri discard-zero-changes;
> >>> after-sb-1pri discard-secondary;
> >>> after-sb-2pri disconnect;
> >>> cram-hmac-alg md5;
> >>> shared-secret "xxxxx";
> >>> }
> >>> disk {
> >>> fencing resource-only;
> >>> }
> >>> on node1 {
> >>> device /dev/drbd0;
> >>> disk /dev/sdb1;
> >>> address 192.168.5.10:7787;
> >>> meta-disk internal;
> >>> }
> >>> on node2 {
> >>> device /dev/drbd0;
> >>> disk /dev/sdf1;
> >>> address 192.168.5.11:7787;
> >>> meta-disk internal;
> >>> }
> >>> }
> >>> # and similar for mount1 and mount2
> >>>
> >>> Also, here is my ha.cf. It uses both the direct link between the nodes
> >>> (br1) and the shared LAN network on br0 for communicating:
> >>> autojoin none
> >>> mcast br0 239.0.0.43 694 1 0
> >>> bcast br1
> >>> warntime 5
> >>> deadtime 15
> >>> initdead 60
> >>> keepalive 2
> >>> node node1
> >>> node node2
> >>> node quorumnode
> >>> crm respawn
> >>> respawn hacluster /usr/lib/heartbeat/dopd
> >>> apiauth dopd gid=haclient uid=hacluster
> >>>
> >>> I am thinking of making the following changes to the CIB (as per the
> >>> official DRBD
> >>> guide
> >>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html) in
> >>> order to add the DRBD lsb service and require that it start before the
> >>> ocf:linbit:drbd resources. Does this look correct?
> >>
> >> Where did you read that? No, deactivate the startup of DRBD on system
> >> boot and let Pacemaker manage it completely.
> >>
> >>> primitive p_drbd-init lsb:drbd op monitor interval="30"
> >>> colocation c_drbd_together inf:
> >>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master
> >>> ms_drbd_mount2:Master
> >>> order drbd_init_first inf: ms_drbd_vmstore:promote
> >>> ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start
> >>>
> >>> This doesn't seem to require that drbd be also running on the node
> where
> >>> the ocf:linbit:drbd resources are slave (which it would need to do to
> be
> >>> a DRBD SyncTarget) - how can I ensure that drbd is running everywhere?
> >>> (clone cl_drbd p_drbd-init ?)
> >>
> >> This is really not needed.
> >>
> >> Regards,
> >> Andreas
> >>
> >> --
> >> Need help with Pacemaker?
> >> http://www.hastexo.com/now
> >>
> >>>
> >>> Thanks,
> >>>
> >>> Andrew
> >>>
> ------------------------------------------------------------------------
> >>> *From: *"Andreas Kurz" <andreas at hastexo.com>
> >>> *To: *pacemaker at oss.clusterlabs.org
> >>> *Sent: *Monday, March 26, 2012 5:56:22 PM
> >>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
> >>> master on failover
> >>>
> >>> On 03/24/2012 08:15 PM, Andrew Martin wrote:
> >>>> Hi Andreas,
> >>>>
> >>>> My complete cluster configuration is as follows:
> >>>> ============
> >>>> Last updated: Sat Mar 24 13:51:55 2012
> >>>> Last change: Sat Mar 24 13:41:55 2012
> >>>> Stack: Heartbeat
> >>>> Current DC: node2 (9100538b-7a1f-41fd-9c1a-c6b4b1c32b18) - partition
> >>>> with quorum
> >>>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> >>>> 3 Nodes configured, unknown expected votes
> >>>> 19 Resources configured.
> >>>> ============
> >>>>
> >>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE
> > (standby)
> >>>> Online: [ node2 node1 ]
> >>>>
> >>>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
> >>>> Masters: [ node2 ]
> >>>> Slaves: [ node1 ]
> >>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
> >>>> Masters: [ node2 ]
> >>>> Slaves: [ node1 ]
> >>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
> >>>> Masters: [ node2 ]
> >>>> Slaves: [ node1 ]
> >>>> Resource Group: g_vm
> >>>> p_fs_vmstore(ocf::heartbeat:Filesystem):Started node2
> >>>> p_vm(ocf::heartbeat:VirtualDomain):Started node2
> >>>> Clone Set: cl_daemons [g_daemons]
> >>>> Started: [ node2 node1 ]
> >>>> Stopped: [ g_daemons:2 ]
> >>>> Clone Set: cl_sysadmin_notify [p_sysadmin_notify]
> >>>> Started: [ node2 node1 ]
> >>>> Stopped: [ p_sysadmin_notify:2 ]
> >>>> stonith-node1(stonith:external/tripplitepdu):Started node2
> >>>> stonith-node2(stonith:external/tripplitepdu):Started node1
> >>>> Clone Set: cl_ping [p_ping]
> >>>> Started: [ node2 node1 ]
> >>>> Stopped: [ p_ping:2 ]
> >>>>
> >>>> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \
> >>>> attributes standby="off"
> >>>> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \
> >>>> attributes standby="off"
> >>>> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \
> >>>> attributes standby="on"
> >>>> primitive p_drbd_mount2 ocf:linbit:drbd \
> >>>> params drbd_resource="mount2" \
> >>>> op monitor interval="15" role="Master" \
> >>>> op monitor interval="30" role="Slave"
> >>>> primitive p_drbd_mount1 ocf:linbit:drbd \
> >>>> params drbd_resource="mount1" \
> >>>> op monitor interval="15" role="Master" \
> >>>> op monitor interval="30" role="Slave"
> >>>> primitive p_drbd_vmstore ocf:linbit:drbd \
> >>>> params drbd_resource="vmstore" \
> >>>> op monitor interval="15" role="Master" \
> >>>> op monitor interval="30" role="Slave"
> >>>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \
> >>>> params device="/dev/drbd0" directory="/vmstore" fstype="ext4"
> \
> >>>> op start interval="0" timeout="60s" \
> >>>> op stop interval="0" timeout="60s" \
> >>>> op monitor interval="20s" timeout="40s"
> >>>> primitive p_libvirt-bin upstart:libvirt-bin \
> >>>> op monitor interval="30"
> >>>> primitive p_ping ocf:pacemaker:ping \
> >>>> params name="p_ping" host_list="192.168.1.10 192.168.1.11"
> >>>> multiplier="1000" \
> >>>> op monitor interval="20s"
> >>>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \
> >>>> params email="me at example.com <mailto:me at example.com>" \
> >>>> params subject="Pacemaker Change" \
> >>>> op start interval="0" timeout="10" \
> >>>> op stop interval="0" timeout="10" \
> >>>> op monitor interval="10" timeout="10"
> >>>> primitive p_vm ocf:heartbeat:VirtualDomain \
> >>>> params config="/vmstore/config/vm.xml" \
> >>>> meta allow-migrate="false" \
> >>>> op start interval="0" timeout="120s" \
> >>>> op stop interval="0" timeout="120s" \
> >>>> op monitor interval="10" timeout="30"
> >>>> primitive stonith-node1 stonith:external/tripplitepdu \
> >>>> params pdu_ipaddr="192.168.1.12" pdu_port="1"
> pdu_username="xxx"
> >>>> pdu_password="xxx" hostname_to_stonith="node1"
> >>>> primitive stonith-node2 stonith:external/tripplitepdu \
> >>>> params pdu_ipaddr="192.168.1.12" pdu_port="2"
> pdu_username="xxx"
> >>>> pdu_password="xxx" hostname_to_stonith="node2"
> >>>> group g_daemons p_libvirt-bin
> >>>> group g_vm p_fs_vmstore p_vm
> >>>> ms ms_drbd_mount2 p_drbd_mount2 \
> >>>> meta master-max="1" master-node-max="1" clone-max="2"
> >>>> clone-node-max="1" notify="true"
> >>>> ms ms_drbd_mount1 p_drbd_mount1 \
> >>>> meta master-max="1" master-node-max="1" clone-max="2"
> >>>> clone-node-max="1" notify="true"
> >>>> ms ms_drbd_vmstore p_drbd_vmstore \
> >>>> meta master-max="1" master-node-max="1" clone-max="2"
> >>>> clone-node-max="1" notify="true"
> >>>> clone cl_daemons g_daemons
> >>>> clone cl_ping p_ping \
> >>>> meta interleave="true"
> >>>> clone cl_sysadmin_notify p_sysadmin_notify
> >>>> location l-st-node1 stonith-node1 -inf: node1
> >>>> location l-st-node2 stonith-node2 -inf: node2
> >>>> location l_run_on_most_connected p_vm \
> >>>> rule $id="l_run_on_most_connected-rule" p_ping: defined p_ping
> >>>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master
> >>>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm
> >>>
> >>> As Emmanuel already said, g_vm has to come first in this colocation
> >>> constraint ... g_vm must be colocated with the DRBD masters.
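> >>>
> >>> i.e. the constraint should read (g_vm first, so g_vm is placed where
> >>> all three masters are):
> >>>
> >>>     colocation c_drbd_libvirt_vm inf: g_vm ms_drbd_vmstore:Master \
> >>>         ms_drbd_mount1:Master ms_drbd_mount2:Master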
> >>>
> >>>> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote
> >>>> ms_drbd_mount2:promote cl_daemons:start g_vm:start
> >>>> property $id="cib-bootstrap-options" \
> >>>> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> >>>> cluster-infrastructure="Heartbeat" \
> >>>> stonith-enabled="false" \
> >>>> no-quorum-policy="stop" \
> >>>> last-lrm-refresh="1332539900" \
> >>>> cluster-recheck-interval="5m" \
> >>>> crmd-integration-timeout="3m" \
> >>>> shutdown-escalation="5m"
> >>>>
> >>>> The STONITH plugin is a custom plugin I wrote for the Tripp-Lite
> >>>> PDUMH20ATNET that I'm using as the STONITH device:
> >>>> http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf
> >>>
> >>> And why aren't you using it? ... stonith-enabled="false"
> >>>
> >>>>
> >>>> As you can see, I left the DRBD service to be started by the operating
> >>>> system (as an lsb script at boot time) however Pacemaker controls
> >>>> actually bringing up/taking down the individual DRBD devices.
> >>>
> >>> Don't start DRBD at system boot; give Pacemaker full control.
> >>>
> >>> The
> >>>> behavior I observe is as follows: I issue "crm resource migrate p_vm"
> on
> >>>> node1 and failover successfully to node2. During this time, node2
> fences
> >>>> node1's DRBD devices (using dopd) and marks them as Outdated.
> Meanwhile
> >>>> node2's DRBD devices are UpToDate. I then shutdown both nodes and then
> >>>> bring them back up. They reconnect to the cluster (with quorum), and
> >>>> node1's DRBD devices are still Outdated as expected and node2's DRBD
> >>>> devices are still UpToDate, as expected. At this point, DRBD starts on
> >>>> both nodes, however node2 will not set DRBD as master:
> >>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE
> > (standby)
> >>>> Online: [ node2 node1 ]
> >>>>
> >>>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
> >>>> Slaves: [ node1 node2 ]
> >>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
> >>>> Slaves: [ node1 node2 ]
> >>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
> >>>> Slaves: [ node1 node2 ]
> >>>
> >>> There should really be no interruption of the DRBD replication on VM
> >>> migration that activates dopd ... does DRBD have its own direct network
> >>> connection?
> >>>
> >>> Please share your ha.cf file and your drbd configuration. Watch out
> for
> >>> drbd messages in your kernel log file, that should give you additional
> >>> information when/why the drbd connection was lost.
> >>>
> >>> Regards,
> >>> Andreas
> >>>
> >>> --
> >>> Need help with Pacemaker?
> >>> http://www.hastexo.com/now
> >>>
> >>>>
> >>>> I am having trouble sorting through the logging information because
> >>>> there is so much of it in /var/log/daemon.log, but I can't find an
> >>>> error message printed about why it will not promote node2. At this
> point
> >>>> the DRBD devices are as follows:
> >>>> node2: cstate = WFConnection dstate=UpToDate
> >>>> node1: cstate = StandAlone dstate=Outdated
> >>>>
> >>>> I don't see any reason why node2 can't become DRBD master, or am I
> >>>> missing something? If I do "drbdadm connect all" on node1, then the
> >>>> cstate on both nodes changes to "Connected" and node2 immediately
> >>>> promotes the DRBD resources to master. Any ideas on why I'm observing
> >>>> this incorrect behavior?
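> >>>>
> >>>> For reference, the manual recovery that works is:
> >>>>
> >>>>     drbdadm cstate all    # node1: StandAlone, node2: WFConnection
> >>>>     drbdadm connect all   # run on node1; both sides become Connected
> >>>>                           # and node2 then promotes immediately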
> >>>>
> >>>> Any tips on how I can better filter through the pacemaker/heartbeat
> logs
> >>>> or how to get additional useful debug information?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Andrew
> >>>>
> >>>>
> ------------------------------------------------------------------------
> >>>> *From: *"Andreas Kurz" <andreas at hastexo.com>
> >>>> *To: *pacemaker at oss.clusterlabs.org
> >>>> *Sent: *Wednesday, 1 February, 2012 4:19:25 PM
> >>>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
> >>>> master on failover
> >>>>
> >>>> On 01/25/2012 08:58 PM, Andrew Martin wrote:
> >>>>> Hello,
> >>>>>
> >>>>> Recently I finished configuring a two-node cluster with pacemaker
> 1.1.6
> >>>>> and heartbeat 3.0.5 on nodes running Ubuntu 10.04. This cluster
> > includes
> >>>>> the following resources:
> >>>>> - primitives for DRBD storage devices
> >>>>> - primitives for mounting the filesystem on the DRBD storage
> >>>>> - primitives for some mount binds
> >>>>> - primitive for starting apache
> >>>>> - primitives for starting samba and nfs servers (following
> instructions
> >>>>> here <http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf>)
> >>>>> - primitives for exporting nfs shares (ocf:heartbeat:exportfs)
> >>>>
> >>>> not enough information ... please share at least your complete cluster
> >>>> configuration
> >>>>
> >>>> Regards,
> >>>> Andreas
> >>>>
> >>>> --
> >>>> Need help with Pacemaker?
> >>>> http://www.hastexo.com/now
> >>>>
> >>>>>
> >>>>> Perhaps this is best described through the output of crm_mon:
> >>>>> Online: [ node1 node2 ]
> >>>>>
> >>>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] (unmanaged)
> >>>>> p_drbd_mount1:0 (ocf::linbit:drbd): Started node2
> >>> (unmanaged)
> >>>>> p_drbd_mount1:1 (ocf::linbit:drbd): Started node1
> >>>>> (unmanaged) FAILED
> >>>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
> >>>>> p_drbd_mount2:0 (ocf::linbit:drbd): Master node1
> >>>>> (unmanaged) FAILED
> >>>>> Slaves: [ node2 ]
> >>>>> Resource Group: g_core
> >>>>> p_fs_mount1 (ocf::heartbeat:Filesystem): Started node1
> >>>>> p_fs_mount2 (ocf::heartbeat:Filesystem): Started node1
> >>>>> p_ip_nfs (ocf::heartbeat:IPaddr2): Started node1
> >>>>> Resource Group: g_apache
> >>>>> p_fs_mountbind1 (ocf::heartbeat:Filesystem): Started node1
> >>>>> p_fs_mountbind2 (ocf::heartbeat:Filesystem): Started node1
> >>>>> p_fs_mountbind3 (ocf::heartbeat:Filesystem): Started node1
> >>>>> p_fs_varwww (ocf::heartbeat:Filesystem): Started node1
> >>>>> p_apache (ocf::heartbeat:apache): Started node1
> >>>>> Resource Group: g_fileservers
> >>>>> p_lsb_smb (lsb:smbd): Started node1
> >>>>> p_lsb_nmb (lsb:nmbd): Started node1
> >>>>> p_lsb_nfsserver (lsb:nfs-kernel-server): Started node1
> >>>>> p_exportfs_mount1 (ocf::heartbeat:exportfs): Started
> node1
> >>>>> p_exportfs_mount2 (ocf::heartbeat:exportfs): Started
> > node1
> >>>>>
> >>>>> I have read through the Pacemaker Explained
> >>>>> <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained>
> >>>>> documentation, however could not find a way to further debug these
> >>>>> problems. First, I put node1 into standby mode to attempt failover to
> >>>>> the other node (node2). Node2 appeared to start the transition to
> >>>>> master, however it failed to promote the DRBD resources to master
> (the
> >>>>> first step). I have attached a copy of this session in commands.log
> and
> >>>>> additional excerpts from /var/log/syslog during important steps. I
> have
> >>>>> attempted everything I can think of to try and start the DRBD
> resource
> >>>>> (e.g. start/stop/promote/manage/cleanup under crm resource,
> restarting
> >>>>> heartbeat) but cannot bring it out of the slave state. However, if
> > I set
> >>>>> it to unmanaged and then run drbdadm primary all in the terminal,
> >>>>> pacemaker is satisfied and continues starting the rest of the
> > resources.
> >>>>> It then failed when attempting to mount the filesystem for mount2,
> the
> >>>>> p_fs_mount2 resource. I attempted to mount the filesystem myself
> > and was
> >>>>> successful. I then unmounted it and ran cleanup on p_fs_mount2 and
> then
> >>>>> it mounted. The rest of the resources started as expected until the
> >>>>> p_exportfs_mount2 resource, which failed as follows:
> >>>>> p_exportfs_mount2 (ocf::heartbeat:exportfs): started node2
> >>>>> (unmanaged) FAILED
> >>>>>
> >>>>> I ran cleanup on this and it started, however when running this test
> >>>>> earlier today no command could successfully start this exportfs
> >> resource.
> >>>>>
> >>>>> How can I configure pacemaker to better resolve these problems and be
> >>>>> able to bring the node up successfully on its own? What can I check
> to
> >>>>> determine why these failures are occurring? /var/log/syslog did not
> seem
> >>>>> to contain very much useful information regarding why the failures
> >>>> occurred.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Andrew
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>>
> >>>> Project Home: http://www.clusterlabs.org
> >>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>> Bugs: http://bugs.clusterlabs.org
> >>>>
> >>>>
> >>>>
--
this is my life and I live it as long as God wills