[Pacemaker] Nodes will not promote DRBD resources to master on failover
Andreas Kurz
andreas at hastexo.com
Tue Apr 10 12:28:15 CEST 2012
On 04/10/2012 06:17 AM, Andrew Martin wrote:
> Hi Andreas,
>
> Yes, I attempted to generalize hostnames and usernames/passwords in the
> archive. Sorry for making it more confusing :(
>
> I completely purged pacemaker from all 3 nodes and reinstalled
> everything. I then completely rebuild the CIB by manually adding in each
> primitive/constraint one at a time and testing along the way. After
> doing this DRBD appears to be working at least somewhat better - the
> ocf:linbit:drbd devices are started and managed by pacemaker. However,
> if for example a node is STONITHed when it comes back up it will not
> restart the ocf:linbit:drbd resources until I manually load the DRBD
> kernel module, bring the DRBD devices up (drbdadm up all), and cleanup
> the resources (e.g. crm resource cleanup ms_drbd_vmstore). Is it
> possible that the DRBD kernel module needs to be loaded at boot time,
> independent of pacemaker?
No, this is done by the drbd OCF script on start.
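For what it's worth, a rough way to confirm this on one of your nodes (just an illustration, assuming the drbd init script is disabled and Pacemaker manages the ms_drbd_* resources):

# lsmod | grep drbd
(prints nothing while heartbeat/pacemaker is stopped)
# service heartbeat start
... wait until the ms_drbd_* resources are started ...
# lsmod | grep drbd
(the module is now loaded by the ocf:linbit:drbd start action)
# cat /proc/drbd
(devices brought up by the agent, no manual "drbdadm up all" needed)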
>
> Here's the new CIB (mostly the same as before):
> http://pastebin.com/MxrqBXMp
>
> Typically quorumnode stays in the OFFLINE (standby) state, though
> occasionally it changes to pending. I have just tried
> cleaning /var/lib/heartbeat/crm on quorumnode again so we will see if
> that helps keep it in the OFFLINE (standby) state. I have it explicitly
> set to standby in the CIB configuration and also created a rule to
> prevent some of the resources from running on it?
> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \
> attributes standby="on"
> ...
The node should be in "ONLINE (standby)" state if you start heartbeat
and pacemaker is enabled with "crm yes" or "crm respawn" in ha.cf.
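A quick sanity check on quorumnode (illustrative, adjust paths to your installation):

# grep '^crm' /etc/ha.d/ha.cf
(expect "crm respawn" or "crm yes")
# crm_mon -1 | grep -i quorumnode
(expect "ONLINE (standby)" rather than "pending" or plain "OFFLINE")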
> location loc_not_on_quorumnode g_vm -inf: quorumnode
>
> Would it be wise to create additional constraints to prevent all
> resources (including each ms_drbd resource) from running on it, even
> though this should be implied by standby?
There is no need for that. A node in standby will never run resources,
and if DRBD is not installed on that node your resources won't start
there anyway.
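If you ever want to toggle that state by hand instead of editing the CIB, the crm shell can do it (a sketch, using your node name):

# crm node standby quorumnode
# crm node online quorumnode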
>
> Below is a portion of the log from when I started a node yet DRBD failed
> to start. As you can see it thinks the DRBD device is operating
> correctly as it proceeds to starting subsequent resources, e.g.
> Apr 9 20:22:55 node1 Filesystem[2939]: [2956]: WARNING: Couldn't find
> device [/dev/drbd0]. Expected /dev/??? to exist
> http://pastebin.com/zTCHPtWy
The only thing I can read from those log fragments is that probes are
running ... not enough information. What would really be interesting are
the logs from the DC.
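Something along these lines should pull out the interesting bits (log path assumed from your earlier mails, adjust if you log elsewhere):

# crm_mon -1 | grep "Current DC"
# grep -E 'crmd|pengine' /var/log/daemon.log > /tmp/dc-log.txt
(run the grep on the node that is currently DC)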
>
> After seeing these messages in the log I run
> # service drbd start
> # drbdadm up all
> # crm resource cleanup ms_drbd_vmstore
> # crm resource cleanup ms_drbd_mount1
> # crm resource cleanup ms_drbd_mount2
None of that should be needed ... what is the output of "crm_mon -1frA"
before you do all those cleanups?
> After this sequence of commands the DRBD resources appear to be
> functioning normally and the subsequent resources start. Any ideas on
> why DRBD is not being started as expected, or why the cluster is
> continuing with starting resources that according to the o_drbd-fs-vm
> constraint should not start until DRBD is master?
No idea, maybe creating a crm_report archive and sending it to the list
can shed some light on that problem.
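Roughly like this, with an illustrative time window around the failed start:

# crm_report -f "2012-04-09 20:00" -t "2012-04-09 21:00" /tmp/drbd-promote-issue

and then attach the resulting tarball.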
Regards,
Andreas
--
Need help with Pacemaker?
http://www.hastexo.com/now
>
> Thanks,
>
> Andrew
> ------------------------------------------------------------------------
> *From: *"Andreas Kurz" <andreas at hastexo.com>
> *To: *pacemaker at oss.clusterlabs.org
> *Sent: *Monday, April 2, 2012 6:33:44 PM
> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
> master on failover
>
> On 04/02/2012 05:47 PM, Andrew Martin wrote:
>> Hi Andreas,
>>
>> Here is the crm_report:
>> http://dl.dropbox.com/u/2177298/pcmk-Mon-02-Apr-2012.bz2
>
> You tried to do some obfuscation on parts of that archive? ... doesn't
> really make it easier to debug ....
>
> Does the third node ever change its state?
>
> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): pending
>
> Looking at the logs and the transition graph, it aborts due to
> un-runnable operations on that node, which seems to be related to its
> pending state.
>
> Try to get that node up (or down) completely ... maybe a fresh
> start-over with a clean /var/lib/heartbeat/crm directory is sufficient.
>
> Regards,
> Andreas
>
>>
>> Hi Emmanuel,
>>
>> Here is the configuration:
>> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \
>> attributes standby="off"
>> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \
>> attributes standby="off"
>> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \
>> attributes standby="on"
>> primitive p_drbd_mount2 ocf:linbit:drbd \
>> params drbd_resource="mount2" \
>> op start interval="0" timeout="240" \
>> op stop interval="0" timeout="100" \
>> op monitor interval="10" role="Master" timeout="20" start-delay="1m" \
>> op monitor interval="20" role="Slave" timeout="20" start-delay="1m"
>> primitive p_drbd_mount1 ocf:linbit:drbd \
>> params drbd_resource="mount1" \
>> op start interval="0" timeout="240" \
>> op stop interval="0" timeout="100" \
>> op monitor interval="10" role="Master" timeout="20" start-delay="1m" \
>> op monitor interval="20" role="Slave" timeout="20" start-delay="1m"
>> primitive p_drbd_vmstore ocf:linbit:drbd \
>> params drbd_resource="vmstore" \
>> op start interval="0" timeout="240" \
>> op stop interval="0" timeout="100" \
>> op monitor interval="10" role="Master" timeout="20" start-delay="1m" \
>> op monitor interval="20" role="Slave" timeout="20" start-delay="1m"
>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \
>> params device="/dev/drbd0" directory="/mnt/storage/vmstore"
> fstype="ext4" \
>> op start interval="0" timeout="60s" \
>> op stop interval="0" timeout="60s" \
>> op monitor interval="20s" timeout="40s"
>> primitive p_libvirt-bin upstart:libvirt-bin \
>> op monitor interval="30"
>> primitive p_ping ocf:pacemaker:ping \
>> params name="p_ping" host_list="192.168.3.1 192.168.3.2"
> multiplier="1000" \
>> op monitor interval="20s"
>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \
>> params email="me at example.com" \
>> params subject="Pacemaker Change" \
>> op start interval="0" timeout="10" \
>> op stop interval="0" timeout="10" \
>> op monitor interval="10" timeout="10"
>> primitive p_vm ocf:heartbeat:VirtualDomain \
>> params config="/mnt/storage/vmstore/config/vm.xml" \
>> meta allow-migrate="false" \
>> op start interval="0" timeout="180" \
>> op stop interval="0" timeout="180" \
>> op monitor interval="10" timeout="30"
>> primitive stonith-node1 stonith:external/tripplitepdu \
>> params pdu_ipaddr="192.168.3.100" pdu_port="1" pdu_username="xxx"
>> pdu_password="xxx" hostname_to_stonith="node1"
>> primitive stonith-node2 stonith:external/tripplitepdu \
>> params pdu_ipaddr="192.168.3.100" pdu_port="2" pdu_username="xxx"
>> pdu_password="xxx" hostname_to_stonith="node2"
>> group g_daemons p_libvirt-bin
>> group g_vm p_fs_vmstore p_vm
>> ms ms_drbd_mount2 p_drbd_mount2 \
>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
>> notify="true"
>> ms ms_drbd_mount1 p_drbd_mount1 \
>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
>> notify="true"
>> ms ms_drbd_vmstore p_drbd_vmstore \
>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
>> notify="true"
>> clone cl_daemons g_daemons
>> clone cl_ping p_ping \
>> meta interleave="true"
>> clone cl_sysadmin_notify p_sysadmin_notify \
>> meta target-role="Started"
>> location l-st-node1 stonith-node1 -inf: node1
>> location l-st-node2 stonith-node2 -inf: node2
>> location l_run_on_most_connected p_vm \
>> rule $id="l_run_on_most_connected-rule" p_ping: defined p_ping
>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master
>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm
>> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote
>> ms_drbd_mount2:promote cl_daemons:start g_vm:start
>> property $id="cib-bootstrap-options" \
>> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>> cluster-infrastructure="Heartbeat" \
>> stonith-enabled="true" \
>> no-quorum-policy="freeze" \
>> last-lrm-refresh="1333041002" \
>> cluster-recheck-interval="5m" \
>> crmd-integration-timeout="3m" \
>> shutdown-escalation="5m"
>>
>> Thanks,
>>
>> Andrew
>>
>>
>> ------------------------------------------------------------------------
>> *From: *"emmanuel segura" <emi2fast at gmail.com>
>> *To: *"The Pacemaker cluster resource manager"
>> <pacemaker at oss.clusterlabs.org>
>> *Sent: *Monday, April 2, 2012 9:43:20 AM
>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>> master on failover
>>
>> Sorry Andrew
>>
>> Can you post me your crm configure show again?
>>
>> Thanks
>>
>> On 30 March 2012 18:53, Andrew Martin <amartin at xes-inc.com
>> <mailto:amartin at xes-inc.com>> wrote:
>>
>> Hi Emmanuel,
>>
>> Thanks, that is a good idea. I updated the colocation contraint as
>> you described. After, the cluster remains in this state (with the
>> filesystem not mounted and the VM not started):
>> Online: [ node2 node1 ]
>>
>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>> Masters: [ node1 ]
>> Slaves: [ node2 ]
>> Master/Slave Set: ms_drbd_tools [p_drbd_mount1]
>> Masters: [ node1 ]
>> Slaves: [ node2 ]
>> Master/Slave Set: ms_drbd_crm [p_drbd_mount2]
>> Masters: [ node1 ]
>> Slaves: [ node2 ]
>> Clone Set: cl_daemons [g_daemons]
>> Started: [ node2 node1 ]
>> Stopped: [ g_daemons:2 ]
>> stonith-node1 (stonith:external/tripplitepdu): Started node2
>> stonith-node2 (stonith:external/tripplitepdu): Started node1
>>
>> I noticed that Pacemaker had not issued "drbdadm connect" for any of
>> the DRBD resources on node2
>> # service drbd status
>> drbd driver loaded OK; device status:
>> version: 8.3.7 (api:88/proto:86-91)
>> GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
>> root at node2, 2012-02-02 12:29:26
>> m:res cs ro ds p
>> mounted fstype
>> 0:vmstore StandAlone Secondary/Unknown Outdated/DUnknown r----
>> 1:mount1 StandAlone Secondary/Unknown Outdated/DUnknown r----
>> 2:mount2 StandAlone Secondary/Unknown Outdated/DUnknown r----
>> # drbdadm cstate all
>> StandAlone
>> StandAlone
>> StandAlone
>>
>> After manually issuing "drbdadm connect all" on node2 the rest of
>> the resources eventually started (several minutes later) on node1:
>> Online: [ node2 node1 ]
>>
>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>> Masters: [ node1 ]
>> Slaves: [ node2 ]
>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>> Masters: [ node1 ]
>> Slaves: [ node2 ]
>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>> Masters: [ node1 ]
>> Slaves: [ node2 ]
>> Resource Group: g_vm
>> p_fs_vmstore (ocf::heartbeat:Filesystem): Started node1
>> p_vm (ocf::heartbeat:VirtualDomain): Started node1
>> Clone Set: cl_daemons [g_daemons]
>> Started: [ node2 node1 ]
>> Stopped: [ g_daemons:2 ]
>> Clone Set: cl_sysadmin_notify [p_sysadmin_notify]
>> Started: [ node2 node1 ]
>> Stopped: [ p_sysadmin_notify:2 ]
>> stonith-node1 (stonith:external/tripplitepdu): Started node2
>> stonith-node2 (stonith:external/tripplitepdu): Started node1
>> Clone Set: cl_ping [p_ping]
>> Started: [ node2 node1 ]
>> Stopped: [ p_ping:2 ]
>>
>> The DRBD devices on node1 were all UpToDate, so it doesn't seem
>> right that it would need to wait for node2 to be connected before it
>> could continue promoting additional resources. I then restarted
>> heartbeat on node2 to see if it would automatically connect the DRBD
>> devices this time. After restarting it, the DRBD devices are not
>> even configured:
>> # service drbd status
>> drbd driver loaded OK; device status:
>> version: 8.3.7 (api:88/proto:86-91)
>> GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
>> root at webapps2host, 2012-02-02 12:29:26
>> m:res cs ro ds p mounted fstype
>> 0:vmstore Unconfigured
>> 1:mount1 Unconfigured
>> 2:mount2 Unconfigured
>>
>> Looking at the log I found this part about the drbd primitives:
>> Mar 30 11:10:32 node2 lrmd: [10702]: info: operation monitor[2] on
>> p_drbd_vmstore:1 for client 10705: pid 11065 exited with return code 7
>> Mar 30 11:10:32 node2 crmd: [10705]: info: process_lrm_event: LRM
>> operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7, cib-update=11,
>> confirmed=true) not running
>> Mar 30 11:10:32 node2 lrmd: [10702]: info: operation monitor[4] on
>> p_drbd_mount2:1 for client 10705: pid 11069 exited with return code 7
>> Mar 30 11:10:32 node2 crmd: [10705]: info: process_lrm_event: LRM
>> operation p_drbd_mount2:1_monitor_0 (call=4, rc=7, cib-update=12,
>> confirmed=true) not running
>> Mar 30 11:10:32 node2 lrmd: [10702]: info: operation monitor[3] on
>> p_drbd_mount1:1 for client 10705: pid 11066 exited with return code 7
>> Mar 30 11:10:32 node2 crmd: [10705]: info: process_lrm_event: LRM
>> operation p_drbd_mount1:1_monitor_0 (call=3, rc=7, cib-update=13,
>> confirmed=true) not running
>>
>> I am not sure what exit code 7 is - is it possible to manually run
>> the monitor code or somehow obtain more debug about this? Here is
>> the complete log after restarting heartbeat on node2:
>> http://pastebin.com/KsHKi3GW
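(For reference: rc=7 is OCF_NOT_RUNNING, i.e. the probe merely reports that the resource is stopped on that node. If you want to exercise the agent by hand, something like the following should work, assuming the standard resource-agents layout; the resource name is just taken from the config above:

# export OCF_ROOT=/usr/lib/ocf
# export OCF_RESKEY_drbd_resource=vmstore
# /usr/lib/ocf/resource.d/linbit/drbd monitor ; echo "rc=$?"

ocf-tester from the resource-agents package can also drive the agent, but it runs start/stop as well, so don't point it at a node where the cluster is active.)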
>>
>> Thanks,
>>
>> Andrew
>>
>>
> ------------------------------------------------------------------------
>> *From: *"emmanuel segura" <emi2fast at gmail.com
>> <mailto:emi2fast at gmail.com>>
>> *To: *"The Pacemaker cluster resource manager"
>> <pacemaker at oss.clusterlabs.org <mailto:pacemaker at oss.clusterlabs.org>>
>> *Sent: *Friday, March 30, 2012 10:26:48 AM
>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>> master on failover
>>
>> I think this constraint is wrong
>> ==================================================
>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master
>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm
>> ===================================================
>>
>> change to
>> ======================================================
>> colocation c_drbd_libvirt_vm inf: g_vm ms_drbd_vmstore:Master
>> ms_drbd_mount1:Master ms_drbd_mount2:Master
>> =======================================================
>>
>> On 30 March 2012 17:16, Andrew Martin <amartin at xes-inc.com
>> <mailto:amartin at xes-inc.com>> wrote:
>>
>> Hi Emmanuel,
>>
>> Here is the output of crm configure show:
>> http://pastebin.com/NA1fZ8dL
>>
>> Thanks,
>>
>> Andrew
>>
>>
> ------------------------------------------------------------------------
>> *From: *"emmanuel segura" <emi2fast at gmail.com
>> <mailto:emi2fast at gmail.com>>
>> *To: *"The Pacemaker cluster resource manager"
>> <pacemaker at oss.clusterlabs.org
>> <mailto:pacemaker at oss.clusterlabs.org>>
>> *Sent: *Friday, March 30, 2012 9:43:45 AM
>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources
>> to master on failover
>>
>> can you show me?
>>
>> crm configure show
>>
>> On 30 March 2012 16:10, Andrew Martin
>> <amartin at xes-inc.com <mailto:amartin at xes-inc.com>> wrote:
>>
>> Hi Andreas,
>>
>> Here is a copy of my complete CIB:
>> http://pastebin.com/v5wHVFuy
>>
>> I'll work on generating a report using crm_report as well.
>>
>> Thanks,
>>
>> Andrew
>>
>>
> ------------------------------------------------------------------------
>> *From: *"Andreas Kurz" <andreas at hastexo.com
>> <mailto:andreas at hastexo.com>>
>> *To: *pacemaker at oss.clusterlabs.org
>> <mailto:pacemaker at oss.clusterlabs.org>
>> *Sent: *Friday, March 30, 2012 4:41:16 AM
>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD
>> resources to master on failover
>>
>> On 03/28/2012 04:56 PM, Andrew Martin wrote:
>> > Hi Andreas,
>> >
>> > I disabled the DRBD init script and then restarted the
>> slave node
>> > (node2). After it came back up, DRBD did not start:
>> > Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4):
>> pending
>> > Online: [ node2 node1 ]
>> >
>> > Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>> > Masters: [ node1 ]
>> > Stopped: [ p_drbd_vmstore:1 ]
>> > Master/Slave Set: ms_drbd_mount1 [p_drbd_tools]
>> > Masters: [ node1 ]
>> > Stopped: [ p_drbd_mount1:1 ]
>> > Master/Slave Set: ms_drbd_mount2 [p_drbdmount2]
>> > Masters: [ node1 ]
>> > Stopped: [ p_drbd_mount2:1 ]
>> > ...
>> >
>> > root at node2:~# service drbd status
>> > drbd not loaded
>>
>> Yes, expected unless Pacemaker starts DRBD
>>
>> >
>> > Is there something else I need to change in the CIB to
>> ensure that DRBD
>> > is started? All of my DRBD devices are configured like this:
>> > primitive p_drbd_mount2 ocf:linbit:drbd \
>> > params drbd_resource="mount2" \
>> > op monitor interval="15" role="Master" \
>> > op monitor interval="30" role="Slave"
>> > ms ms_drbd_mount2 p_drbd_mount2 \
>> > meta master-max="1" master-node-max="1"
> clone-max="2"
>> > clone-node-max="1" notify="true"
>>
>> That should be enough ... unable to say more without seeing
>> the complete
>> configuration ... too many fragments of information ;-)
>>
>> Please provide (e.g. pastebin) your complete cib (cibadmin
>> -Q) when
>> cluster is in that state ... or even better create a
>> crm_report archive
>>
>> >
>> > Here is the output from the syslog (grep -i drbd
>> /var/log/syslog):
>> > Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op:
>> Performing
>> > key=12:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
>> > op=p_drbd_vmstore:1_monitor_0 )
>> > Mar 28 09:24:47 node2 lrmd: [3210]: info:
>> rsc:p_drbd_vmstore:1 probe[2]
>> > (pid 3455)
>> > Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op:
>> Performing
>> > key=13:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
>> > op=p_drbd_mount1:1_monitor_0 )
>> > Mar 28 09:24:48 node2 lrmd: [3210]: info:
>> rsc:p_drbd_mount1:1 probe[3]
>> > (pid 3456)
>> > Mar 28 09:24:48 node2 crmd: [3213]: info: do_lrm_rsc_op:
>> Performing
>> > key=14:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
>> > op=p_drbd_mount2:1_monitor_0 )
>> > Mar 28 09:24:48 node2 lrmd: [3210]: info:
>> rsc:p_drbd_mount2:1 probe[4]
>> > (pid 3457)
>> > Mar 28 09:24:48 node2 Filesystem[3458]: [3517]: WARNING:
>> Couldn't find
>> > device [/dev/drbd0]. Expected /dev/??? to exist
>> > Mar 28 09:24:48 node2 crm_attribute: [3563]: info: Invoked:
>> > crm_attribute -N node2 -n master-p_drbd_mount2:1 -l
> reboot -D
>> > Mar 28 09:24:48 node2 crm_attribute: [3557]: info: Invoked:
>> > crm_attribute -N node2 -n master-p_drbd_vmstore:1 -l
> reboot -D
>> > Mar 28 09:24:48 node2 crm_attribute: [3562]: info: Invoked:
>> > crm_attribute -N node2 -n master-p_drbd_mount1:1 -l
> reboot -D
>> > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation
>> monitor[4] on
>> > p_drbd_mount2:1 for client 3213: pid 3457 exited with
>> return code 7
>> > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation
>> monitor[2] on
>> > p_drbd_vmstore:1 for client 3213: pid 3455 exited with
>> return code 7
>> > Mar 28 09:24:48 node2 crmd: [3213]: info:
>> process_lrm_event: LRM
>> > operation p_drbd_mount2:1_monitor_0 (call=4, rc=7,
>> cib-update=10,
>> > confirmed=true) not running
>> > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation
>> monitor[3] on
>> > p_drbd_mount1:1 for client 3213: pid 3456 exited with
>> return code 7
>> > Mar 28 09:24:48 node2 crmd: [3213]: info:
>> process_lrm_event: LRM
>> > operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7,
>> cib-update=11,
>> > confirmed=true) not running
>> > Mar 28 09:24:48 node2 crmd: [3213]: info:
>> process_lrm_event: LRM
>> > operation p_drbd_mount1:1_monitor_0 (call=3, rc=7,
>> cib-update=12,
>> > confirmed=true) not running
>>
>> No errors, just probing ... so for some reason Pacemaker does
>> not want to
>> start it ... use crm_simulate to find out why ... or provide
>> information
>> as requested above.
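(A minimal crm_simulate run for this, assuming the pacemaker 1.1 tools:

# cibadmin -Q > /tmp/cib.xml
# crm_simulate -S -s -x /tmp/cib.xml | less

or directly against the live CIB on the DC:

# crm_simulate -L -S -s

The allocation scores and the list of blocked actions usually show which constraint or state keeps a resource from being started or promoted.)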
>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>> >
>> > Thanks,
>> >
>> > Andrew
>> >
>> >
>>
> ------------------------------------------------------------------------
>> > *From: *"Andreas Kurz" <andreas at hastexo.com
>> <mailto:andreas at hastexo.com>>
>> > *To: *pacemaker at oss.clusterlabs.org
>> <mailto:pacemaker at oss.clusterlabs.org>
>> > *Sent: *Wednesday, March 28, 2012 9:03:06 AM
>> > *Subject: *Re: [Pacemaker] Nodes will not promote DRBD
>> resources to
>> > master on failover
>> >
>> > On 03/28/2012 03:47 PM, Andrew Martin wrote:
>> >> Hi Andreas,
>> >>
>> >>> hmm ... what is that fence-peer script doing? If you
>> want to use
>> >>> resource-level fencing with the help of dopd, activate the
>> >>> drbd-peer-outdater script in the line above ... and
>> double check if the
>> >>> path is correct
>> >> fence-peer is just a wrapper for drbd-peer-outdater that
>> does some
>> >> additional logging. In my testing dopd has been working
> well.
>> >
>> > I see
>> >
>> >>
>> >>>> I am thinking of making the following changes to the
>> CIB (as per the
>> >>>> official DRBD
>> >>>> guide
>> >>
>> >
>>
> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html)
>> in
>> >>>> order to add the DRBD lsb service and require that it
>> start before the
>> >>>> ocf:linbit:drbd resources. Does this look correct?
>> >>>
>> >>> Where did you read that? No, deactivate the startup of
>> DRBD on system
>> >>> boot and let Pacemaker manage it completely.
>> >>>
>> >>>> primitive p_drbd-init lsb:drbd op monitor interval="30"
>> >>>> colocation c_drbd_together inf:
>> >>>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master
>> >>>> ms_drbd_mount2:Master
>> >>>> order drbd_init_first inf: ms_drbd_vmstore:promote
>> >>>> ms_drbd_mount1:promote ms_drbd_mount2:promote
>> p_drbd-init:start
>> >>>>
>> >>>> This doesn't seem to require that drbd be also running
>> on the node where
>> >>>> the ocf:linbit:drbd resources are slave (which it would
>> need to do to be
>> >>>> a DRBD SyncTarget) - how can I ensure that drbd is
>> running everywhere?
>> >>>> (clone cl_drbd p_drbd-init ?)
>> >>>
>> >>> This is really not needed.
>> >> I was following the official DRBD Users Guide:
>> >>
>>
> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html
>> >>
>> >> If I am understanding your previous message correctly, I
>> do not need to
>> >> add a lsb primitive for the drbd daemon? It will be
>> >> started/stopped/managed automatically by my
>> ocf:linbit:drbd resources
>> >> (and I can remove the /etc/rc* symlinks)?
>> >
>> > Yes, you don't need that LSB script when using Pacemaker
>> and should not
>> > let init start it.
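(On Ubuntu 10.04 that boils down to something like this -- a sketch, assuming the stock sysv links installed by the drbd8-utils package:

# update-rc.d -f drbd remove

The init script itself can stay on disk for manual inspection, it just must not be run at boot.)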
>> >
>> > Regards,
>> > Andreas
>> >
>> > --
>> > Need help with Pacemaker?
>> > http://www.hastexo.com/now
>> >
>> >>
>> >> Thanks,
>> >>
>> >> Andrew
>> >>
>> >>
>>
> ------------------------------------------------------------------------
>> >> *From: *"Andreas Kurz" <andreas at hastexo.com
>> <mailto:andreas at hastexo.com> <mailto:andreas at hastexo.com
>> <mailto:andreas at hastexo.com>>>
>> >> *To: *pacemaker at oss.clusterlabs.org
>> <mailto:pacemaker at oss.clusterlabs.org>
>> <mailto:pacemaker at oss.clusterlabs.org
>> <mailto:pacemaker at oss.clusterlabs.org>>
>> >> *Sent: *Wednesday, March 28, 2012 7:27:34 AM
>> >> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD
>> resources to
>> >> master on failover
>> >>
>> >> On 03/28/2012 12:13 AM, Andrew Martin wrote:
>> >>> Hi Andreas,
>> >>>
>> >>> Thanks, I've updated the colocation rule to be in the
>> correct order. I
>> >>> also enabled the STONITH resource (this was temporarily
>> disabled before
>> >>> for some additional testing). DRBD has its own network
>> connection over
>> >>> the br1 interface (192.168.5.0/24
>> <http://192.168.5.0/24> network), a direct crossover cable
>> >>> between node1 and node2:
>> >>> global { usage-count no; }
>> >>> common {
>> >>> syncer { rate 110M; }
>> >>> }
>> >>> resource vmstore {
>> >>> protocol C;
>> >>> startup {
>> >>> wfc-timeout 15;
>> >>> degr-wfc-timeout 60;
>> >>> }
>> >>> handlers {
>> >>> #fence-peer
>> "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
>> >>> fence-peer "/usr/local/bin/fence-peer";
>> >>
>> >> hmm ... what is that fence-peer script doing? If you want
>> to use
>> >> resource-level fencing with the help of dopd, activate the
>> >> drbd-peer-outdater script in the line above ... and
>> double check if the
>> >> path is correct
>> >>
>> >>> split-brain
>> "/usr/lib/drbd/notify-split-brain.sh
>> >>> me at example.com <mailto:me at example.com>
>> <mailto:me at example.com <mailto:me at example.com>>";
>> >>> }
>> >>> net {
>> >>> after-sb-0pri discard-zero-changes;
>> >>> after-sb-1pri discard-secondary;
>> >>> after-sb-2pri disconnect;
>> >>> cram-hmac-alg md5;
>> >>> shared-secret "xxxxx";
>> >>> }
>> >>> disk {
>> >>> fencing resource-only;
>> >>> }
>> >>> on node1 {
>> >>> device /dev/drbd0;
>> >>> disk /dev/sdb1;
>> >>> address 192.168.5.10:7787
>> <http://192.168.5.10:7787>;
>> >>> meta-disk internal;
>> >>> }
>> >>> on node2 {
>> >>> device /dev/drbd0;
>> >>> disk /dev/sdf1;
>> >>> address 192.168.5.11:7787
>> <http://192.168.5.11:7787>;
>> >>> meta-disk internal;
>> >>> }
>> >>> }
>> >>> # and similar for mount1 and mount2
>> >>>
>> >>> Also, here is my ha.cf <http://ha.cf>. It uses both the
>> direct link between the nodes
>> >>> (br1) and the shared LAN network on br0 for communicating:
>> >>> autojoin none
>> >>> mcast br0 239.0.0.43 694 1 0
>> >>> bcast br1
>> >>> warntime 5
>> >>> deadtime 15
>> >>> initdead 60
>> >>> keepalive 2
>> >>> node node1
>> >>> node node2
>> >>> node quorumnode
>> >>> crm respawn
>> >>> respawn hacluster /usr/lib/heartbeat/dopd
>> >>> apiauth dopd gid=haclient uid=hacluster
>> >>>
>> >>> I am thinking of making the following changes to the CIB
>> (as per the
>> >>> official DRBD
>> >>> guide
>> >>
>> >
>>
> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html)
>> in
>> >>> order to add the DRBD lsb service and require that it
>> start before the
>> >>> ocf:linbit:drbd resources. Does this look correct?
>> >>
>> >> Where did you read that? No, deactivate the startup of
>> DRBD on system
>> >> boot and let Pacemaker manage it completely.
>> >>
>> >>> primitive p_drbd-init lsb:drbd op monitor interval="30"
>> >>> colocation c_drbd_together inf:
>> >>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master
>> >>> ms_drbd_mount2:Master
>> >>> order drbd_init_first inf: ms_drbd_vmstore:promote
>> >>> ms_drbd_mount1:promote ms_drbd_mount2:promote
>> p_drbd-init:start
>> >>>
>> >>> This doesn't seem to require that drbd be also running
>> on the node where
>> >>> the ocf:linbit:drbd resources are slave (which it would
>> need to do to be
>> >>> a DRBD SyncTarget) - how can I ensure that drbd is
>> running everywhere?
>> >>> (clone cl_drbd p_drbd-init ?)
>> >>
>> >> This is really not needed.
>> >>
>> >> Regards,
>> >> Andreas
>> >>
>> >> --
>> >> Need help with Pacemaker?
>> >> http://www.hastexo.com/now
>> >>
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Andrew
>> >>>
>>
> ------------------------------------------------------------------------
>> >>> *From: *"Andreas Kurz" <andreas at hastexo.com
>> <mailto:andreas at hastexo.com> <mailto:andreas at hastexo.com
>> <mailto:andreas at hastexo.com>>>
>> >>> *To: *pacemaker at oss.clusterlabs.org
>> <mailto:pacemaker at oss.clusterlabs.org>
>> > <mailto:*pacemaker at oss.clusterlabs.org
>> <mailto:pacemaker at oss.clusterlabs.org>>
>> >>> *Sent: *Monday, March 26, 2012 5:56:22 PM
>> >>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD
>> resources to
>> >>> master on failover
>> >>>
>> >>> On 03/24/2012 08:15 PM, Andrew Martin wrote:
>> >>>> Hi Andreas,
>> >>>>
>> >>>> My complete cluster configuration is as follows:
>> >>>> ============
>> >>>> Last updated: Sat Mar 24 13:51:55 2012
>> >>>> Last change: Sat Mar 24 13:41:55 2012
>> >>>> Stack: Heartbeat
>> >>>> Current DC: node2
>> (9100538b-7a1f-41fd-9c1a-c6b4b1c32b18) - partition
>> >>>> with quorum
>> >>>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
>> >>>> 3 Nodes configured, unknown expected votes
>> >>>> 19 Resources configured.
>> >>>> ============
>> >>>>
>> >>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4):
>> OFFLINE
>> > (standby)
>> >>>> Online: [ node2 node1 ]
>> >>>>
>> >>>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>> >>>> Masters: [ node2 ]
>> >>>> Slaves: [ node1 ]
>> >>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>> >>>> Masters: [ node2 ]
>> >>>> Slaves: [ node1 ]
>> >>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>> >>>> Masters: [ node2 ]
>> >>>> Slaves: [ node1 ]
>> >>>> Resource Group: g_vm
>> >>>> p_fs_vmstore(ocf::heartbeat:Filesystem):Started
> node2
>> >>>> p_vm(ocf::heartbeat:VirtualDomain):Started node2
>> >>>> Clone Set: cl_daemons [g_daemons]
>> >>>> Started: [ node2 node1 ]
>> >>>> Stopped: [ g_daemons:2 ]
>> >>>> Clone Set: cl_sysadmin_notify [p_sysadmin_notify]
>> >>>> Started: [ node2 node1 ]
>> >>>> Stopped: [ p_sysadmin_notify:2 ]
>> >>>> stonith-node1(stonith:external/tripplitepdu):Started
> node2
>> >>>> stonith-node2(stonith:external/tripplitepdu):Started
> node1
>> >>>> Clone Set: cl_ping [p_ping]
>> >>>> Started: [ node2 node1 ]
>> >>>> Stopped: [ p_ping:2 ]
>> >>>>
>> >>>> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \
>> >>>> attributes standby="off"
>> >>>> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \
>> >>>> attributes standby="off"
>> >>>> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4"
>> quorumnode \
>> >>>> attributes standby="on"
>> >>>> primitive p_drbd_mount2 ocf:linbit:drbd \
>> >>>> params drbd_resource="mount2" \
>> >>>> op monitor interval="15" role="Master" \
>> >>>> op monitor interval="30" role="Slave"
>> >>>> primitive p_drbd_mount1 ocf:linbit:drbd \
>> >>>> params drbd_resource="mount1" \
>> >>>> op monitor interval="15" role="Master" \
>> >>>> op monitor interval="30" role="Slave"
>> >>>> primitive p_drbd_vmstore ocf:linbit:drbd \
>> >>>> params drbd_resource="vmstore" \
>> >>>> op monitor interval="15" role="Master" \
>> >>>> op monitor interval="30" role="Slave"
>> >>>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \
>> >>>> params device="/dev/drbd0" directory="/vmstore"
>> fstype="ext4" \
>> >>>> op start interval="0" timeout="60s" \
>> >>>> op stop interval="0" timeout="60s" \
>> >>>> op monitor interval="20s" timeout="40s"
>> >>>> primitive p_libvirt-bin upstart:libvirt-bin \
>> >>>> op monitor interval="30"
>> >>>> primitive p_ping ocf:pacemaker:ping \
>> >>>> params name="p_ping" host_list="192.168.1.10
>> 192.168.1.11"
>> >>>> multiplier="1000" \
>> >>>> op monitor interval="20s"
>> >>>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \
>> >>>> params email="me at example.com
>> <mailto:me at example.com> <mailto:me at example.com
>> <mailto:me at example.com>>" \
>> >>>> params subject="Pacemaker Change" \
>> >>>> op start interval="0" timeout="10" \
>> >>>> op stop interval="0" timeout="10" \
>> >>>> op monitor interval="10" timeout="10"
>> >>>> primitive p_vm ocf:heartbeat:VirtualDomain \
>> >>>> params config="/vmstore/config/vm.xml" \
>> >>>> meta allow-migrate="false" \
>> >>>> op start interval="0" timeout="120s" \
>> >>>> op stop interval="0" timeout="120s" \
>> >>>> op monitor interval="10" timeout="30"
>> >>>> primitive stonith-node1 stonith:external/tripplitepdu \
>> >>>> params pdu_ipaddr="192.168.1.12" pdu_port="1"
>> pdu_username="xxx"
>> >>>> pdu_password="xxx" hostname_to_stonith="node1"
>> >>>> primitive stonith-node2 stonith:external/tripplitepdu \
>> >>>> params pdu_ipaddr="192.168.1.12" pdu_port="2"
>> pdu_username="xxx"
>> >>>> pdu_password="xxx" hostname_to_stonith="node2"
>> >>>> group g_daemons p_libvirt-bin
>> >>>> group g_vm p_fs_vmstore p_vm
>> >>>> ms ms_drbd_mount2 p_drbd_mount2 \
>> >>>> meta master-max="1" master-node-max="1"
>> clone-max="2"
>> >>>> clone-node-max="1" notify="true"
>> >>>> ms ms_drbd_mount1 p_drbd_mount1 \
>> >>>> meta master-max="1" master-node-max="1"
>> clone-max="2"
>> >>>> clone-node-max="1" notify="true"
>> >>>> ms ms_drbd_vmstore p_drbd_vmstore \
>> >>>> meta master-max="1" master-node-max="1"
>> clone-max="2"
>> >>>> clone-node-max="1" notify="true"
>> >>>> clone cl_daemons g_daemons
>> >>>> clone cl_ping p_ping \
>> >>>> meta interleave="true"
>> >>>> clone cl_sysadmin_notify p_sysadmin_notify
>> >>>> location l-st-node1 stonith-node1 -inf: node1
>> >>>> location l-st-node2 stonith-node2 -inf: node2
>> >>>> location l_run_on_most_connected p_vm \
>> >>>> rule $id="l_run_on_most_connected-rule" p_ping:
>> defined p_ping
>> >>>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master
>> >>>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm
>> >>>
>> >>> As Emmanuel already said, g_vm has to be in the first
>> place in this
>> >>> collocation constraint .... g_vm must be colocated with
>> the drbd masters.
>> >>>
>> >>>> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote
>> ms_drbd_mount1:promote
>> >>>> ms_drbd_mount2:promote cl_daemons:start g_vm:start
>> >>>> property $id="cib-bootstrap-options" \
>> >>>>
>> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>> >>>> cluster-infrastructure="Heartbeat" \
>> >>>> stonith-enabled="false" \
>> >>>> no-quorum-policy="stop" \
>> >>>> last-lrm-refresh="1332539900" \
>> >>>> cluster-recheck-interval="5m" \
>> >>>> crmd-integration-timeout="3m" \
>> >>>> shutdown-escalation="5m"
>> >>>>
>> >>>> The STONITH plugin is a custom plugin I wrote for the
>> Tripp-Lite
>> >>>> PDUMH20ATNET that I'm using as the STONITH device:
>> >>>>
>>
> http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf
>> >>>
>> >>> And why aren't you using it? .... stonith-enabled="false"
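(Once the fencing resources are verified to work, enabling it is a one-liner in the crm shell -- illustrative:

# crm configure property stonith-enabled=true
)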
>> >>>
>> >>>>
>> >>>> As you can see, I left the DRBD service to be started
>> by the operating
>> >>>> system (as an lsb script at boot time) however
>> Pacemaker controls
>> >>>> actually bringing up/taking down the individual DRBD
>> devices.
>> >>>
>> >>> Don't start drbd on system boot, give Pacemaker the full
>> control.
>> >>>
>> >>> The
>> >>>> behavior I observe is as follows: I issue "crm resource
>> migrate p_vm" on
>> >>>> node1 and failover successfully to node2. During this
>> time, node2 fences
>> >>>> node1's DRBD devices (using dopd) and marks them as
>> Outdated. Meanwhile
>> >>>> node2's DRBD devices are UpToDate. I then shutdown both
>> nodes and then
>> >>>> bring them back up. They reconnect to the cluster (with
>> quorum), and
>> >>>> node1's DRBD devices are still Outdated as expected and
>> node2's DRBD
>> >>>> devices are still UpToDate, as expected. At this point,
>> DRBD starts on
>> >>>> both nodes, however node2 will not set DRBD as master:
>> >>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4):
>> OFFLINE
>> > (standby)
>> >>>> Online: [ node2 node1 ]
>> >>>>
>> >>>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>> >>>> Slaves: [ node1 node2 ]
>> >>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>> >>>> Slaves: [ node1 node 2 ]
>> >>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>> >>>> Slaves: [ node1 node2 ]
>> >>>
>> >>> There should really be no interruption of the drbd
>> replication on vm
>> >>> migration that activates the dopd ... drbd has its own
>> direct network
>> >>> connection?
>> >>>
>> >>> Please share your ha.cf <http://ha.cf> file and your
>> drbd configuration. Watch out for
>> >>> drbd messages in your kernel log file, that should give
>> you additional
>> >>> information when/why the drbd connection was lost.
>> >>>
>> >>> Regards,
>> >>> Andreas
>> >>>
>> >>> --
>> >>> Need help with Pacemaker?
>> >>> http://www.hastexo.com/now
>> >>>
>> >>>>
>> >>>> I am having trouble sorting through the logging
>> information because
>> >>>> there is so much of it in /var/log/daemon.log, but I
>> can't find an
>> >>>> error message printed about why it will not promote
>> node2. At this point
>> >>>> the DRBD devices are as follows:
>> >>>> node2: cstate = WFConnection dstate=UpToDate
>> >>>> node1: cstate = StandAlone dstate=Outdated
>> >>>>
>> >>>> I don't see any reason why node2 can't become DRBD
>> master, or am I
>> >>>> missing something? If I do "drbdadm connect all" on
>> node1, then the
>> >>>> cstate on both nodes changes to "Connected" and node2
>> immediately
>> >>>> promotes the DRBD resources to master. Any ideas on why
>> I'm observing
>> >>>> this incorrect behavior?
>> >>>>
>> >>>> Any tips on how I can better filter through the
>> pacemaker/heartbeat logs
>> >>>> or how to get additional useful debug information?
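(One way to thin the logs out, assuming the default heartbeat logging to /var/log/daemon.log:

# grep -E 'pengine|crmd|stonith' /var/log/daemon.log | grep -viE 'monitor|probe'
# crm_mon -1frA

The first filters the decision-making daemons and drops the recurring monitor noise; the second shows fail counts and failed actions at a glance.)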
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> Andrew
>> >>>>
>> >>>>
>>
> ------------------------------------------------------------------------
>> >>>> *From: *"Andreas Kurz" <andreas at hastexo.com
>> <mailto:andreas at hastexo.com>
>> > <mailto:andreas at hastexo.com <mailto:andreas at hastexo.com>>>
>> >>>> *To: *pacemaker at oss.clusterlabs.org
>> <mailto:pacemaker at oss.clusterlabs.org>
>> >> <mailto:*pacemaker at oss.clusterlabs.org
>> <mailto:pacemaker at oss.clusterlabs.org>>
>> >>>> *Sent: *Wednesday, 1 February, 2012 4:19:25 PM
>> >>>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD
>> resources to
>> >>>> master on failover
>> >>>>
>> >>>> On 01/25/2012 08:58 PM, Andrew Martin wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> Recently I finished configuring a two-node cluster
>> with pacemaker 1.1.6
>> >>>>> and heartbeat 3.0.5 on nodes running Ubuntu 10.04.
>> This cluster
>> > includes
>> >>>>> the following resources:
>> >>>>> - primitives for DRBD storage devices
>> >>>>> - primitives for mounting the filesystem on the DRBD
>> storage
>> >>>>> - primitives for some mount binds
>> >>>>> - primitive for starting apache
>> >>>>> - primitives for starting samba and nfs servers
>> (following instructions
>> >>>>> here
>> <http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf>)
>> >>>>> - primitives for exporting nfs shares
>> (ocf:heartbeat:exportfs)
>> >>>>
>> >>>> not enough information ... please share at least your
>> complete cluster
>> >>>> configuration
>> >>>>
>> >>>> Regards,
>> >>>> Andreas
>> >>>>
>> >>>> --
>> >>>> Need help with Pacemaker?
>> >>>> http://www.hastexo.com/now
>> >>>>
>> >>>>>
>> >>>>> Perhaps this is best described through the output of
>> crm_mon:
>> >>>>> Online: [ node1 node2 ]
>> >>>>>
>> >>>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>> (unmanaged)
>> >>>>> p_drbd_mount1:0 (ocf::linbit:drbd):
>> Started node2
>> >>> (unmanaged)
>> >>>>> p_drbd_mount1:1 (ocf::linbit:drbd):
>> Started node1
>> >>>>> (unmanaged) FAILED
>> >>>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>> >>>>> p_drbd_mount2:0 (ocf::linbit:drbd):
>> Master node1
>> >>>>> (unmanaged) FAILED
>> >>>>> Slaves: [ node2 ]
>> >>>>> Resource Group: g_core
>> >>>>> p_fs_mount1 (ocf::heartbeat:Filesystem):
>> Started node1
>> >>>>> p_fs_mount2 (ocf::heartbeat:Filesystem):
>> Started node1
>> >>>>> p_ip_nfs (ocf::heartbeat:IPaddr2):
>> Started node1
>> >>>>> Resource Group: g_apache
>> >>>>> p_fs_mountbind1 (ocf::heartbeat:Filesystem):
>> Started node1
>> >>>>> p_fs_mountbind2 (ocf::heartbeat:Filesystem):
>> Started node1
>> >>>>> p_fs_mountbind3 (ocf::heartbeat:Filesystem):
>> Started node1
>> >>>>> p_fs_varwww (ocf::heartbeat:Filesystem):
>> Started node1
>> >>>>> p_apache (ocf::heartbeat:apache):
>> Started node1
>> >>>>> Resource Group: g_fileservers
>> >>>>> p_lsb_smb (lsb:smbd): Started node1
>> >>>>> p_lsb_nmb (lsb:nmbd): Started node1
>> >>>>> p_lsb_nfsserver (lsb:nfs-kernel-server):
>> Started node1
>> >>>>> p_exportfs_mount1 (ocf::heartbeat:exportfs):
>> Started node1
>> >>>>> p_exportfs_mount2 (ocf::heartbeat:exportfs):
>> Started
>> > node1
>> >>>>>
>> >>>>> I have read through the Pacemaker Explained
>> >>>>>
>> >>>>
>> >>>
>> >
>>
> <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained>
>> >>>>> documentation, however could not find a way to further
>> debug these
>> >>>>> problems. First, I put node1 into standby mode to
>> attempt failover to
>> >>>>> the other node (node2). Node2 appeared to start the
>> transition to
>> >>>>> master, however it failed to promote the DRBD
>> resources to master (the
>> >>>>> first step). I have attached a copy of this session in
>> commands.log and
>> >>>>> additional excerpts from /var/log/syslog during
>> important steps. I have
>> >>>>> attempted everything I can think of to try and start
>> the DRBD resource
>> >>>>> (e.g. start/stop/promote/manage/cleanup under crm
>> resource, restarting
>> >>>>> heartbeat) but cannot bring it out of the slave state.
>> However, if
>> > I set
>> >>>>> it to unmanaged and then run drbdadm primary all in
>> the terminal,
>> >>>>> pacemaker is satisfied and continues starting the rest
>> of the
>> > resources.
>> >>>>> It then failed when attempting to mount the filesystem
>> for mount2, the
>> >>>>> p_fs_mount2 resource. I attempted to mount the
>> filesystem myself
>> > and was
>> >>>>> successful. I then unmounted it and ran cleanup on
>> p_fs_mount2 and then
>> >>>>> it mounted. The rest of the resources started as
>> expected until the
>> >>>>> p_exportfs_mount2 resource, which failed as follows:
>> >>>>> p_exportfs_mount2 (ocf::heartbeat:exportfs):
>> started node2
>> >>>>> (unmanaged) FAILED
>> >>>>>
>> >>>>> I ran cleanup on this and it started, however when
>> running this test
>> >>>>> earlier today no command could successfully start this
>> exportfs
>> >> resource.
>> >>>>>
>> >>>>> How can I configure pacemaker to better resolve these
>> problems and be
>> >>>>> able to bring the node up successfully on its own?
>> What can I check to
>> >>>>> determine why these failures are occuring?
>> /var/log/syslog did not seem
>> >>>>> to contain very much useful information regarding why
>> the failures
>> >>>> occurred.
>> >>>>>
>> >>>>> Thanks,
>> >>>>>
>> >>>>> Andrew
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> This body part will be downloaded on demand.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>>
>>
>>
>> --
>> this is my life and I live it as long as God wills
>>
>>
>>
>>
>>
>> --
>> this is my life and I live it as long as God wills
>>
>>
>>
>>
>>
>> --
>> this is my life and I live it as long as God wills
>>
>
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
>
>
>