[Pacemaker] Nodes will not promote DRBD resources to master on failover

Andreas Kurz andreas at hastexo.com
Tue Apr 10 12:28:15 CEST 2012


On 04/10/2012 06:17 AM, Andrew Martin wrote:
> Hi Andreas,
> 
> Yes, I attempted to generalize hostnames and usernames/passwords in the
> archive. Sorry for making it more confusing :( 
> 
> I completely purged pacemaker from all 3 nodes and reinstalled
> everything. I then completely rebuilt the CIB by manually adding in each
> primitive/constraint one at a time and testing along the way. After
> doing this DRBD appears to be working at least somewhat better - the
> ocf:linbit:drbd devices are started and managed by pacemaker. However,
> if, for example, a node is STONITHed, when it comes back up it will not
> restart the ocf:linbit:drbd resources until I manually load the DRBD
> kernel module, bring the DRBD devices up (drbdadm up all), and clean up
> the resources (e.g. crm resource cleanup ms_drbd_vmstore). Is it
> possible that the DRBD kernel module needs to be loaded at boot time,
> independent of pacemaker?

No, this is done by the drbd OCF script on start.
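A quick way to verify, once Pacemaker has started the ocf:linbit:drbd
resources (assuming a Debian/Ubuntu node): the agent itself loads the
module, so

# lsmod | grep drbd
# cat /proc/drbd

should show the module loaded and the devices configured. And since
Pacemaker handles start/stop, the LSB init script should not run at boot:

# update-rc.d -f drbd remove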

> 
> Here's the new CIB (mostly the same as before):
> http://pastebin.com/MxrqBXMp
> 
> Typically quorumnode stays in the OFFLINE (standby) state, though
> occasionally it changes to pending. I have just tried
> cleaning /var/lib/heartbeat/crm on quorumnode again so we will see if
> that helps keep it in the OFFLINE (standby) state. I have it explicitly
> set to standby in the CIB configuration and also created a rule to
> prevent some of the resources from running on it:
> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \
>         attributes standby="on"
> ...

The node should be in "ONLINE (standby)" state if you start heartbeat
and pacemaker is enabled with "crm yes" or "crm respawn" in ha.cf.
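For reference, a minimal ha.cf fragment that gives you that (taken from the
configuration you posted further down in this thread; the quorum node needs
the same setting):

  autojoin none
  node node1
  node node2
  node quorumnode
  crm respawn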

> location loc_not_on_quorumnode g_vm -inf: quorumnode
> 
> Would it be wise to create additional constraints to prevent all
> resources (including each ms_drbd resource) from running on it, even
> though this should be implied by standby?

There is no need for that. A node in standby will never run resources,
and if DRBD is not even installed on that node your resources cannot
start there anyway.
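If you ever wanted explicit constraints anyway, they would simply follow the
pattern of the one you already have, e.g. (illustrative only, not required):

location loc_not_vmstore_on_quorumnode ms_drbd_vmstore -inf: quorumnode
location loc_not_mount1_on_quorumnode ms_drbd_mount1 -inf: quorumnode
location loc_not_mount2_on_quorumnode ms_drbd_mount2 -inf: quorumnode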

> 
> Below is a portion of the log from when I started a node yet DRBD failed
> to start. As you can see, the cluster proceeds to start the subsequent
> resources as if the DRBD devices were operating correctly, e.g.
> Apr  9 20:22:55 node1 Filesystem[2939]: [2956]: WARNING: Couldn't find
> device [/dev/drbd0]. Expected /dev/??? to exist
> http://pastebin.com/zTCHPtWy

The only thing I can read from those log fragments is that probes are
running ... not enough information. What would really be interesting are
the logs from the DC.
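The DC is shown in the first lines of the cluster status, e.g.:

# crm_mon -1 | grep "Current DC"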

> 
> After seeing these messages in the log I run
> # service drbd start
> # drbdadm up all
> # crm resource cleanup ms_drbd_vmstore
> # crm resource cleanup ms_drbd_mount1
> # crm resource cleanup ms_drbd_mount2

None of that should be needed ... what is the output of "crm_mon -1frA"
before you do all those cleanups?
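For reference, the flags on that call:

crm_mon -1frA
    -1 ... show the cluster status once and exit
    -f ... include resource fail counts
    -r ... also list inactive (stopped) resources
    -A ... show node attributes (master scores, p_ping values, ...)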

> After this sequence of commands the DRBD resources appear to be
> functioning normally and the subsequent resources start. Any ideas on
> why DRBD is not being started as expected, or why the cluster continues
> to start resources that, according to the o_drbd-fs-vm constraint, should
> not start until DRBD is master?

No idea, maybe creating a crm_report archive and sending it to the list
can shed some light on that problem.
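Something along these lines, with the start time set to just before the
failed failover:

# crm_report -f "2012-04-09 20:00" /tmp/drbd-failover-report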

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> Thanks,
> 
> Andrew
> ------------------------------------------------------------------------
> *From: *"Andreas Kurz" <andreas at hastexo.com>
> *To: *pacemaker at oss.clusterlabs.org
> *Sent: *Monday, April 2, 2012 6:33:44 PM
> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
> master on failover
> 
> On 04/02/2012 05:47 PM, Andrew Martin wrote:
>> Hi Andreas,
>>
>> Here is the crm_report:
>> http://dl.dropbox.com/u/2177298/pcmk-Mon-02-Apr-2012.bz2
> 
> You tried to do some obfuscation on parts of that archive? ... doesn't
> really make it easier to debug ....
> 
> Does the third node ever change its state?
> 
> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): pending
> 
> Looking at the logs, the transition graph says it aborts due to
> un-runnable operations on that node, which seems to be related to its
> pending state.
> 
> Try to get that node up (or down) completely ... maybe a fresh
> start-over with a clean /var/lib/heartbeat/crm directory is sufficient.
> 
> Regards,
> Andreas
> 
>>
>> Hi Emmanuel,
>>
>> Here is the configuration:
>> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \
>> attributes standby="off"
>> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \
>> attributes standby="off"
>> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \
>> attributes standby="on"
>> primitive p_drbd_mount2 ocf:linbit:drbd \
>> params drbd_resource="mount2" \
>> op start interval="0" timeout="240" \
>> op stop interval="0" timeout="100" \
>> op monitor interval="10" role="Master" timeout="20" start-delay="1m" \
>> op monitor interval="20" role="Slave" timeout="20" start-delay="1m"
>> primitive p_drbd_mount1 ocf:linbit:drbd \
>> params drbd_resource="mount1" \
>> op start interval="0" timeout="240" \
>> op stop interval="0" timeout="100" \
>> op monitor interval="10" role="Master" timeout="20" start-delay="1m" \
>> op monitor interval="20" role="Slave" timeout="20" start-delay="1m"
>> primitive p_drbd_vmstore ocf:linbit:drbd \
>> params drbd_resource="vmstore" \
>> op start interval="0" timeout="240" \
>> op stop interval="0" timeout="100" \
>> op monitor interval="10" role="Master" timeout="20" start-delay="1m" \
>> op monitor interval="20" role="Slave" timeout="20" start-delay="1m"
>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \
>> params device="/dev/drbd0" directory="/mnt/storage/vmstore"
> fstype="ext4" \
>> op start interval="0" timeout="60s" \
>> op stop interval="0" timeout="60s" \
>> op monitor interval="20s" timeout="40s"
>> primitive p_libvirt-bin upstart:libvirt-bin \
>> op monitor interval="30"
>> primitive p_ping ocf:pacemaker:ping \
>> params name="p_ping" host_list="192.168.3.1 192.168.3.2"
> multiplier="1000" \
>> op monitor interval="20s"
>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \
>> params email="me at example.com" \
>> params subject="Pacemaker Change" \
>> op start interval="0" timeout="10" \
>> op stop interval="0" timeout="10" \
>> op monitor interval="10" timeout="10"
>> primitive p_vm ocf:heartbeat:VirtualDomain \
>> params config="/mnt/storage/vmstore/config/vm.xml" \
>> meta allow-migrate="false" \
>> op start interval="0" timeout="180" \
>> op stop interval="0" timeout="180" \
>> op monitor interval="10" timeout="30"
>> primitive stonith-node1 stonith:external/tripplitepdu \
>> params pdu_ipaddr="192.168.3.100" pdu_port="1" pdu_username="xxx"
>> pdu_password="xxx" hostname_to_stonith="node1"
>> primitive stonith-node2 stonith:external/tripplitepdu \
>> params pdu_ipaddr="192.168.3.100" pdu_port="2" pdu_username="xxx"
>> pdu_password="xxx" hostname_to_stonith="node2"
>> group g_daemons p_libvirt-bin
>> group g_vm p_fs_vmstore p_vm
>> ms ms_drbd_mount2 p_drbd_mount2 \
>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
>> notify="true"
>> ms ms_drbd_mount1 p_drbd_mount1 \
>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
>> notify="true"
>> ms ms_drbd_vmstore p_drbd_vmstore \
>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
>> notify="true"
>> clone cl_daemons g_daemons
>> clone cl_ping p_ping \
>> meta interleave="true"
>> clone cl_sysadmin_notify p_sysadmin_notify \
>> meta target-role="Started"
>> location l-st-node1 stonith-node1 -inf: node1
>> location l-st-node2 stonith-node2 -inf: node2
>> location l_run_on_most_connected p_vm \
>> rule $id="l_run_on_most_connected-rule" p_ping: defined p_ping
>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master
>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm
>> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote
>> ms_drbd_mount2:promote cl_daemons:start g_vm:start
>> property $id="cib-bootstrap-options" \
>> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>> cluster-infrastructure="Heartbeat" \
>> stonith-enabled="true" \
>> no-quorum-policy="freeze" \
>> last-lrm-refresh="1333041002" \
>> cluster-recheck-interval="5m" \
>> crmd-integration-timeout="3m" \
>> shutdown-escalation="5m"
>>
>> Thanks,
>>
>> Andrew
>>
>>
>> ------------------------------------------------------------------------
>> *From: *"emmanuel segura" <emi2fast at gmail.com>
>> *To: *"The Pacemaker cluster resource manager"
>> <pacemaker at oss.clusterlabs.org>
>> *Sent: *Monday, April 2, 2012 9:43:20 AM
>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>> master on        failover
>>
>> Sorry Andrew
>>
>> Can you post me your crm configure show again?
>>
>> Thanks
>>
>> Il giorno 30 marzo 2012 18:53, Andrew Martin <amartin at xes-inc.com
>> <mailto:amartin at xes-inc.com>> ha scritto:
>>
>>     Hi Emmanuel,
>>
>>     Thanks, that is a good idea. I updated the colocation contraint as
>>     you described. After, the cluster remains in this state (with the
>>     filesystem not mounted and the VM not started):
>>     Online: [ node2 node1 ]
>>
>>      Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>>          Masters: [ node1 ]
>>          Slaves: [ node2 ]
>>      Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>>          Masters: [ node1 ]
>>          Slaves: [ node2 ]
>>      Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>          Masters: [ node1 ]
>>          Slaves: [ node2 ]
>>      Clone Set: cl_daemons [g_daemons]
>>          Started: [ node2 node1 ]
>>          Stopped: [ g_daemons:2 ]
>>     stonith-node1    (stonith:external/tripplitepdu):        Started node2
>>     stonith-node2    (stonith:external/tripplitepdu):        Started node1
>>
>>     I noticed that Pacemaker had not issued "drbdadm connect" for any of
>>     the DRBD resources on node2
>>     # service drbd status
>>     drbd driver loaded OK; device status:
>>     version: 8.3.7 (api:88/proto:86-91)
>>     GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
>>     root at node2, 2012-02-02 12:29:26
>>     m:res      cs          ro                 ds                 p    
>>      mounted  fstype
>>     0:vmstore  StandAlone  Secondary/Unknown  Outdated/DUnknown  r----
>>     1:mount1   StandAlone  Secondary/Unknown  Outdated/DUnknown  r----
>>     2:mount2   StandAlone  Secondary/Unknown  Outdated/DUnknown  r----
>>     # drbdadm cstate all
>>     StandAlone
>>     StandAlone
>>     StandAlone
>>
>>     After manually issuing "drbdadm connect all" on node2 the rest of
>>     the resources eventually started (several minutes later) on node1:
>>     Online: [ node2 node1 ]
>>
>>      Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>>          Masters: [ node1 ]
>>          Slaves: [ node2 ]
>>      Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>>          Masters: [ node1 ]
>>          Slaves: [ node2 ]
>>      Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>          Masters: [ node1 ]
>>          Slaves: [ node2 ]
>>      Resource Group: g_vm
>>          p_fs_vmstore       (ocf::heartbeat:Filesystem):    Started node1
>>          p_vm               (ocf::heartbeat:VirtualDomain): Started node1
>>      Clone Set: cl_daemons [g_daemons]
>>          Started: [ node2 node1 ]
>>          Stopped: [ g_daemons:2 ]
>>      Clone Set: cl_sysadmin_notify [p_sysadmin_notify]
>>          Started: [ node2 node1 ]
>>          Stopped: [ p_sysadmin_notify:2 ]
>>     stonith-node1    (stonith:external/tripplitepdu):        Started node2
>>     stonith-node2    (stonith:external/tripplitepdu):        Started node1
>>      Clone Set: cl_ping [p_ping]
>>          Started: [ node2 node1 ]
>>          Stopped: [ p_ping:2 ]
>>
>>     The DRBD devices on node1 were all UpToDate, so it doesn't seem
>>     right that it would need to wait for node2 to be connected before it
>>     could continue promoting additional resources. I then restarted
>>     heartbeat on node2 to see if it would automatically connect the DRBD
>>     devices this time. After restarting it, the DRBD devices are not
>>     even configured:
>>     # service drbd status
>>     drbd driver loaded OK; device status:
>>     version: 8.3.7 (api:88/proto:86-91)
>>     GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
>>     root at node2, 2012-02-02 12:29:26
>>     m:res      cs            ro  ds  p  mounted  fstype
>>     0:vmstore  Unconfigured
>>     1:mount1   Unconfigured
>>     2:mount2   Unconfigured
>>
>>     Looking at the log I found this part about the drbd primitives:
>>     Mar 30 11:10:32 node2 lrmd: [10702]: info: operation monitor[2] on
>>     p_drbd_vmstore:1 for client 10705: pid 11065 exited with return code 7
>>     Mar 30 11:10:32 node2 crmd: [10705]: info: process_lrm_event: LRM
>>     operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7, cib-update=11,
>>     confirmed=true) not running
>>     Mar 30 11:10:32 node2 lrmd: [10702]: info: operation monitor[4] on
>>     p_drbd_mount2:1 for client 10705: pid 11069 exited with return code 7
>>     Mar 30 11:10:32 node2 crmd: [10705]: info: process_lrm_event: LRM
>>     operation p_drbd_mount2:1_monitor_0 (call=4, rc=7, cib-update=12,
>>     confirmed=true) not running
>>     Mar 30 11:10:32 node2 lrmd: [10702]: info: operation monitor[3] on
>>     p_drbd_mount1:1 for client 10705: pid 11066 exited with return code 7
>>     Mar 30 11:10:32 node2 crmd: [10705]: info: process_lrm_event: LRM
>>     operation p_drbd_mount1:1_monitor_0 (call=3, rc=7, cib-update=13,
>>     confirmed=true) not running
>>
>>     I am not sure what exit code 7 is - is it possible to manually run
>>     the monitor code or somehow obtain more debug about this? Here is
>>     the complete log after restarting heartbeat on node2:
>>     http://pastebin.com/KsHKi3GW
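For reference: exit code 7 is OCF_NOT_RUNNING, i.e. the probe found the
resource stopped, which is normal for a probe. To run the agent's monitor
action by hand, something like this works (assuming the stock agent path):

# export OCF_ROOT=/usr/lib/ocf
# export OCF_RESKEY_drbd_resource=vmstore
# /usr/lib/ocf/resource.d/linbit/drbd monitor; echo $?

where 7 = not running, 0 = running as Secondary and 8 = running as Primary.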
>>
>>     Thanks,
>>
>>     Andrew
>>
>>    
> ------------------------------------------------------------------------
>>     *From: *"emmanuel segura" <emi2fast at gmail.com
>>     <mailto:emi2fast at gmail.com>>
>>     *To: *"The Pacemaker cluster resource manager"
>>     <pacemaker at oss.clusterlabs.org <mailto:pacemaker at oss.clusterlabs.org>>
>>     *Sent: *Friday, March 30, 2012 10:26:48 AM
>>     *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>>     master on        failover
>>
>>     I think this constraint is wrong
>>     ==================================================
>>     colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master
>>     ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm
>>     ===================================================
>>
>>     change to
>>     ======================================================
>>     colocation c_drbd_libvirt_vm inf: g_vm ms_drbd_vmstore:Master
>>     ms_drbd_mount1:Master ms_drbd_mount2:Master
>>     =======================================================
>>
>>     Il giorno 30 marzo 2012 17:16, Andrew Martin <amartin at xes-inc.com
>>     <mailto:amartin at xes-inc.com>> ha scritto:
>>
>>         Hi Emmanuel,
>>
>>         Here is the output of crm configure show:
>>         http://pastebin.com/NA1fZ8dL
>>
>>         Thanks,
>>
>>         Andrew
>>
>>        
> ------------------------------------------------------------------------
>>         *From: *"emmanuel segura" <emi2fast at gmail.com
>>         <mailto:emi2fast at gmail.com>>
>>         *To: *"The Pacemaker cluster resource manager"
>>         <pacemaker at oss.clusterlabs.org
>>         <mailto:pacemaker at oss.clusterlabs.org>>
>>         *Sent: *Friday, March 30, 2012 9:43:45 AM
>>         *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources
>>         to master on        failover
>>
>>         can you show me?
>>
>>         crm configure show
>>
>>         Il giorno 30 marzo 2012 16:10, Andrew Martin
>>         <amartin at xes-inc.com <mailto:amartin at xes-inc.com>> ha scritto:
>>
>>             Hi Andreas,
>>
>>             Here is a copy of my complete CIB:
>>             http://pastebin.com/v5wHVFuy
>>
>>             I'll work on generating a report using crm_report as well.
>>
>>             Thanks,
>>
>>             Andrew
>>
>>            
> ------------------------------------------------------------------------
>>             *From: *"Andreas Kurz" <andreas at hastexo.com
>>             <mailto:andreas at hastexo.com>>
>>             *To: *pacemaker at oss.clusterlabs.org
>>             <mailto:pacemaker at oss.clusterlabs.org>
>>             *Sent: *Friday, March 30, 2012 4:41:16 AM
>>             *Subject: *Re: [Pacemaker] Nodes will not promote DRBD
>>             resources to master on failover
>>
>>             On 03/28/2012 04:56 PM, Andrew Martin wrote:
>>             > Hi Andreas,
>>             >
>>             > I disabled the DRBD init script and then restarted the
>>             slave node
>>             > (node2). After it came back up, DRBD did not start:
>>             > Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4):
>>             pending
>>             > Online: [ node2 node1 ]
>>             >
>>             >  Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>>             >      Masters: [ node1 ]
>>             >      Stopped: [ p_drbd_vmstore:1 ]
>>             >  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>>             >      Masters: [ node1 ]
>>             >      Stopped: [ p_drbd_mount1:1 ]
>>             >  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>             >      Masters: [ node1 ]
>>             >      Stopped: [ p_drbd_mount2:1 ]
>>             > ...
>>             >
>>             > root at node2:~# service drbd status
>>             > drbd not loaded
>>
>>             Yes, that is expected as long as Pacemaker has not started DRBD
>>
>>             >
>>             > Is there something else I need to change in the CIB to
>>             ensure that DRBD
>>             > is started? All of my DRBD devices are configured like this:
>>             > primitive p_drbd_mount2 ocf:linbit:drbd \
>>             >         params drbd_resource="mount2" \
>>             >         op monitor interval="15" role="Master" \
>>             >         op monitor interval="30" role="Slave"
>>             > ms ms_drbd_mount2 p_drbd_mount2 \
>>             >         meta master-max="1" master-node-max="1"
> clone-max="2"
>>             > clone-node-max="1" notify="true"
>>
>>             That should be enough ... unable to say more without seeing
>>             the complete
>>             configuration ... too many fragments of information ;-)
>>
>>             Please provide (e.g. pastebin) your complete cib (cibadmin
>>             -Q) when
>>             the cluster is in that state ... or, even better, create a
>>             crm_report archive
>>
>>             >
>>             > Here is the output from the syslog (grep -i drbd
>>             /var/log/syslog):
>>             > Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op:
>>             Performing
>>             > key=12:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
>>             > op=p_drbd_vmstore:1_monitor_0 )
>>             > Mar 28 09:24:47 node2 lrmd: [3210]: info:
>>             rsc:p_drbd_vmstore:1 probe[2]
>>             > (pid 3455)
>>             > Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op:
>>             Performing
>>             > key=13:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
>>             > op=p_drbd_mount1:1_monitor_0 )
>>             > Mar 28 09:24:48 node2 lrmd: [3210]: info:
>>             rsc:p_drbd_mount1:1 probe[3]
>>             > (pid 3456)
>>             > Mar 28 09:24:48 node2 crmd: [3213]: info: do_lrm_rsc_op:
>>             Performing
>>             > key=14:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
>>             > op=p_drbd_mount2:1_monitor_0 )
>>             > Mar 28 09:24:48 node2 lrmd: [3210]: info:
>>             rsc:p_drbd_mount2:1 probe[4]
>>             > (pid 3457)
>>             > Mar 28 09:24:48 node2 Filesystem[3458]: [3517]: WARNING:
>>             Couldn't find
>>             > device [/dev/drbd0]. Expected /dev/??? to exist
>>             > Mar 28 09:24:48 node2 crm_attribute: [3563]: info: Invoked:
>>             > crm_attribute -N node2 -n master-p_drbd_mount2:1 -l
> reboot -D
>>             > Mar 28 09:24:48 node2 crm_attribute: [3557]: info: Invoked:
>>             > crm_attribute -N node2 -n master-p_drbd_vmstore:1 -l
> reboot -D
>>             > Mar 28 09:24:48 node2 crm_attribute: [3562]: info: Invoked:
>>             > crm_attribute -N node2 -n master-p_drbd_mount1:1 -l
> reboot -D
>>             > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation
>>             monitor[4] on
>>             > p_drbd_mount2:1 for client 3213: pid 3457 exited with
>>             return code 7
>>             > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation
>>             monitor[2] on
>>             > p_drbd_vmstore:1 for client 3213: pid 3455 exited with
>>             return code 7
>>             > Mar 28 09:24:48 node2 crmd: [3213]: info:
>>             process_lrm_event: LRM
>>             > operation p_drbd_mount2:1_monitor_0 (call=4, rc=7,
>>             cib-update=10,
>>             > confirmed=true) not running
>>             > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation
>>             monitor[3] on
>>             > p_drbd_mount1:1 for client 3213: pid 3456 exited with
>>             return code 7
>>             > Mar 28 09:24:48 node2 crmd: [3213]: info:
>>             process_lrm_event: LRM
>>             > operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7,
>>             cib-update=11,
>>             > confirmed=true) not running
>>             > Mar 28 09:24:48 node2 crmd: [3213]: info:
>>             process_lrm_event: LRM
>>             > operation p_drbd_mount1:1_monitor_0 (call=3, rc=7,
>>             cib-update=12,
>>             > confirmed=true) not running
>>
>>             No errors, just probing ... so for some reason Pacemaker does
>>             not want to
>>             start it ... use crm_simulate to find out why ... or provide
>>             information
>>             as requested above.
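For example, run against the live CIB and with the allocation scores shown:

# crm_simulate -sL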
>>
>>             Regards,
>>             Andreas
>>
>>             --
>>             Need help with Pacemaker?
>>             http://www.hastexo.com/now
>>
>>             >
>>             > Thanks,
>>             >
>>             > Andrew
>>             >
>>             >
>>            
> ------------------------------------------------------------------------
>>             > *From: *"Andreas Kurz" <andreas at hastexo.com
>>             <mailto:andreas at hastexo.com>>
>>             > *To: *pacemaker at oss.clusterlabs.org
>>             <mailto:pacemaker at oss.clusterlabs.org>
>>             > *Sent: *Wednesday, March 28, 2012 9:03:06 AM
>>             > *Subject: *Re: [Pacemaker] Nodes will not promote DRBD
>>             resources to
>>             > master on failover
>>             >
>>             > On 03/28/2012 03:47 PM, Andrew Martin wrote:
>>             >> Hi Andreas,
>>             >>
>>             >>> hmm ... what is that fence-peer script doing? If you
>>             want to use
>>             >>> resource-level fencing with the help of dopd, activate the
>>             >>> drbd-peer-outdater script in the line above ... and
>>             double check if the
>>             >>> path is correct
>>             >> fence-peer is just a wrapper for drbd-peer-outdater that
>>             does some
>>             >> additional logging. In my testing dopd has been working
> well.
>>             >
>>             > I see
>>             >
>>             >>
>>             >>>> I am thinking of making the following changes to the
>>             CIB (as per the
>>             >>>> official DRBD
>>             >>>> guide
>>             >>
>>             >
>>            
> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html)
>>             in
>>             >>>> order to add the DRBD lsb service and require that it
>>             start before the
>>             >>>> ocf:linbit:drbd resources. Does this look correct?
>>             >>>
>>             >>> Where did you read that? No, deactivate the startup of
>>             DRBD on system
>>             >>> boot and let Pacemaker manage it completely.
>>             >>>
>>             >>>> primitive p_drbd-init lsb:drbd op monitor interval="30"
>>             >>>> colocation c_drbd_together inf:
>>             >>>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master
>>             >>>> ms_drbd_mount2:Master
>>             >>>> order drbd_init_first inf: ms_drbd_vmstore:promote
>>             >>>> ms_drbd_mount1:promote ms_drbd_mount2:promote
>>             p_drbd-init:start
>>             >>>>
>>             >>>> This doesn't seem to require that drbd be also running
>>             on the node where
>>             >>>> the ocf:linbit:drbd resources are slave (which it would
>>             need to do to be
>>             >>>> a DRBD SyncTarget) - how can I ensure that drbd is
>>             running everywhere?
>>             >>>> (clone cl_drbd p_drbd-init ?)
>>             >>>
>>             >>> This is really not needed.
>>             >> I was following the official DRBD Users Guide:
>>             >>
>>            
> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html
>>             >>
>>             >> If I am understanding your previous message correctly, I
>>             do not need to
>>             >> add a lsb primitive for the drbd daemon? It will be
>>             >> started/stopped/managed automatically by my
>>             ocf:linbit:drbd resources
>>             >> (and I can remove the /etc/rc* symlinks)?
>>             >
>>             > Yes, you don't need that LSB script when using Pacemaker
>>             and should not
>>             > let init start it.
>>             >
>>             > Regards,
>>             > Andreas
>>             >
>>             > --
>>             > Need help with Pacemaker?
>>             > http://www.hastexo.com/now
>>             >
>>             >>
>>             >> Thanks,
>>             >>
>>             >> Andrew
>>             >>
>>             >>
>>            
> ------------------------------------------------------------------------
>>             >> *From: *"Andreas Kurz" <andreas at hastexo.com
>>             <mailto:andreas at hastexo.com> <mailto:andreas at hastexo.com
>>             <mailto:andreas at hastexo.com>>>
>>             >> *To: *pacemaker at oss.clusterlabs.org
>>             <mailto:pacemaker at oss.clusterlabs.org>
>>             <mailto:pacemaker at oss.clusterlabs.org
>>             <mailto:pacemaker at oss.clusterlabs.org>>
>>             >> *Sent: *Wednesday, March 28, 2012 7:27:34 AM
>>             >> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD
>>             resources to
>>             >> master on failover
>>             >>
>>             >> On 03/28/2012 12:13 AM, Andrew Martin wrote:
>>             >>> Hi Andreas,
>>             >>>
>>             >>> Thanks, I've updated the colocation rule to be in the
>>             correct order. I
>>             >>> also enabled the STONITH resource (this was temporarily
>>             disabled before
>>             >>> for some additional testing). DRBD has its own network
>>             connection over
>>             >>> the br1 interface (192.168.5.0/24
>>             <http://192.168.5.0/24> network), a direct crossover cable
>>             >>> between node1 and node2:
>>             >>> global { usage-count no; }
>>             >>> common {
>>             >>>         syncer { rate 110M; }
>>             >>> }
>>             >>> resource vmstore {
>>             >>>         protocol C;
>>             >>>         startup {
>>             >>>                 wfc-timeout  15;
>>             >>>                 degr-wfc-timeout 60;
>>             >>>         }
>>             >>>         handlers {
>>             >>>                 #fence-peer
>>             "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
>>             >>>                 fence-peer "/usr/local/bin/fence-peer";
>>             >>
>>             >> hmm ... what is that fence-peer script doing? If you want
>>             to use
>>             >> resource-level fencing with the help of dopd, activate the
>>             >> drbd-peer-outdater script in the line above ... and
>>             double check if the
>>             >> path is correct
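For reference, the pieces that belong together for dopd-based resource-level
fencing, pulled from the config fragments elsewhere in this thread
(illustrative):

  # drbd.conf, per resource:
  handlers {
          fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
  disk {
          fencing resource-only;
  }

  # ha.cf:
  respawn hacluster /usr/lib/heartbeat/dopd
  apiauth dopd gid=haclient uid=hacluster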
>>             >>
>>             >>>                 split-brain
>>             "/usr/lib/drbd/notify-split-brain.sh
>>             >>> me at example.com <mailto:me at example.com>
>>             <mailto:me at example.com <mailto:me at example.com>>";
>>             >>>         }
>>             >>>         net {
>>             >>>                 after-sb-0pri discard-zero-changes;
>>             >>>                 after-sb-1pri discard-secondary;
>>             >>>                 after-sb-2pri disconnect;
>>             >>>                 cram-hmac-alg md5;
>>             >>>                 shared-secret "xxxxx";
>>             >>>         }
>>             >>>         disk {
>>             >>>                 fencing resource-only;
>>             >>>         }
>>             >>>         on node1 {
>>             >>>                 device /dev/drbd0;
>>             >>>                 disk /dev/sdb1;
>>             >>>                 address 192.168.5.10:7787
>>             <http://192.168.5.10:7787>;
>>             >>>                 meta-disk internal;
>>             >>>         }
>>             >>>         on node2 {
>>             >>>                 device /dev/drbd0;
>>             >>>                 disk /dev/sdf1;
>>             >>>                 address 192.168.5.11:7787
>>             <http://192.168.5.11:7787>;
>>             >>>                 meta-disk internal;
>>             >>>         }
>>             >>> }
>>             >>> # and similar for mount1 and mount2
>>             >>>
>>             >>> Also, here is my ha.cf <http://ha.cf>. It uses both the
>>             direct link between the nodes
>>             >>> (br1) and the shared LAN network on br0 for communicating:
>>             >>> autojoin none
>>             >>> mcast br0 239.0.0.43 694 1 0
>>             >>> bcast br1
>>             >>> warntime 5
>>             >>> deadtime 15
>>             >>> initdead 60
>>             >>> keepalive 2
>>             >>> node node1
>>             >>> node node2
>>             >>> node quorumnode
>>             >>> crm respawn
>>             >>> respawn hacluster /usr/lib/heartbeat/dopd
>>             >>> apiauth dopd gid=haclient uid=hacluster
>>             >>>
>>             >>> I am thinking of making the following changes to the CIB
>>             (as per the
>>             >>> official DRBD
>>             >>> guide
>>             >>
>>             >
>>            
> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html)
>>             in
>>             >>> order to add the DRBD lsb service and require that it
>>             start before the
>>             >>> ocf:linbit:drbd resources. Does this look correct?
>>             >>
>>             >> Where did you read that? No, deactivate the startup of
>>             DRBD on system
>>             >> boot and let Pacemaker manage it completely.
>>             >>
>>             >>> primitive p_drbd-init lsb:drbd op monitor interval="30"
>>             >>> colocation c_drbd_together inf:
>>             >>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master
>>             >>> ms_drbd_mount2:Master
>>             >>> order drbd_init_first inf: ms_drbd_vmstore:promote
>>             >>> ms_drbd_mount1:promote ms_drbd_mount2:promote
>>             p_drbd-init:start
>>             >>>
>>             >>> This doesn't seem to require that drbd be also running
>>             on the node where
>>             >>> the ocf:linbit:drbd resources are slave (which it would
>>             need to do to be
>>             >>> a DRBD SyncTarget) - how can I ensure that drbd is
>>             running everywhere?
>>             >>> (clone cl_drbd p_drbd-init ?)
>>             >>
>>             >> This is really not needed.
>>             >>
>>             >> Regards,
>>             >> Andreas
>>             >>
>>             >> --
>>             >> Need help with Pacemaker?
>>             >> http://www.hastexo.com/now
>>             >>
>>             >>>
>>             >>> Thanks,
>>             >>>
>>             >>> Andrew
>>             >>>
>>            
> ------------------------------------------------------------------------
>>             >>> *From: *"Andreas Kurz" <andreas at hastexo.com
>>             <mailto:andreas at hastexo.com> <mailto:andreas at hastexo.com
>>             <mailto:andreas at hastexo.com>>>
>>             >>> *To: *pacemaker at oss.clusterlabs.org
>>             <mailto:pacemaker at oss.clusterlabs.org>
>>             > <mailto:*pacemaker at oss.clusterlabs.org
>>             <mailto:pacemaker at oss.clusterlabs.org>>
>>             >>> *Sent: *Monday, March 26, 2012 5:56:22 PM
>>             >>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD
>>             resources to
>>             >>> master on failover
>>             >>>
>>             >>> On 03/24/2012 08:15 PM, Andrew Martin wrote:
>>             >>>> Hi Andreas,
>>             >>>>
>>             >>>> My complete cluster configuration is as follows:
>>             >>>> ============
>>             >>>> Last updated: Sat Mar 24 13:51:55 2012
>>             >>>> Last change: Sat Mar 24 13:41:55 2012
>>             >>>> Stack: Heartbeat
>>             >>>> Current DC: node2
>>             (9100538b-7a1f-41fd-9c1a-c6b4b1c32b18) - partition
>>             >>>> with quorum
>>             >>>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
>>             >>>> 3 Nodes configured, unknown expected votes
>>             >>>> 19 Resources configured.
>>             >>>> ============
>>             >>>>
>>             >>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4):
>>             OFFLINE
>>             > (standby)
>>             >>>> Online: [ node2 node1 ]
>>             >>>>
>>             >>>>  Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>>             >>>>      Masters: [ node2 ]
>>             >>>>      Slaves: [ node1 ]
>>             >>>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>>             >>>>      Masters: [ node2 ]
>>             >>>>      Slaves: [ node1 ]
>>             >>>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>             >>>>      Masters: [ node2 ]
>>             >>>>      Slaves: [ node1 ]
>>             >>>>  Resource Group: g_vm
>>             >>>>      p_fs_vmstore(ocf::heartbeat:Filesystem):Started
> node2
>>             >>>>      p_vm(ocf::heartbeat:VirtualDomain):Started node2
>>             >>>>  Clone Set: cl_daemons [g_daemons]
>>             >>>>      Started: [ node2 node1 ]
>>             >>>>      Stopped: [ g_daemons:2 ]
>>             >>>>  Clone Set: cl_sysadmin_notify [p_sysadmin_notify]
>>             >>>>      Started: [ node2 node1 ]
>>             >>>>      Stopped: [ p_sysadmin_notify:2 ]
>>             >>>>  stonith-node1(stonith:external/tripplitepdu):Started
> node2
>>             >>>>  stonith-node2(stonith:external/tripplitepdu):Started
> node1
>>             >>>>  Clone Set: cl_ping [p_ping]
>>             >>>>      Started: [ node2 node1 ]
>>             >>>>      Stopped: [ p_ping:2 ]
>>             >>>>
>>             >>>> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \
>>             >>>>         attributes standby="off"
>>             >>>> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \
>>             >>>>         attributes standby="off"
>>             >>>> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4"
>>             quorumnode \
>>             >>>>         attributes standby="on"
>>             >>>> primitive p_drbd_mount2 ocf:linbit:drbd \
>>             >>>>         params drbd_resource="mount2" \
>>             >>>>         op monitor interval="15" role="Master" \
>>             >>>>         op monitor interval="30" role="Slave"
>>             >>>> primitive p_drbd_mount1 ocf:linbit:drbd \
>>             >>>>         params drbd_resource="mount1" \
>>             >>>>         op monitor interval="15" role="Master" \
>>             >>>>         op monitor interval="30" role="Slave"
>>             >>>> primitive p_drbd_vmstore ocf:linbit:drbd \
>>             >>>>         params drbd_resource="vmstore" \
>>             >>>>         op monitor interval="15" role="Master" \
>>             >>>>         op monitor interval="30" role="Slave"
>>             >>>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \
>>             >>>>         params device="/dev/drbd0" directory="/vmstore"
>>             fstype="ext4" \
>>             >>>>         op start interval="0" timeout="60s" \
>>             >>>>         op stop interval="0" timeout="60s" \
>>             >>>>         op monitor interval="20s" timeout="40s"
>>             >>>> primitive p_libvirt-bin upstart:libvirt-bin \
>>             >>>>         op monitor interval="30"
>>             >>>> primitive p_ping ocf:pacemaker:ping \
>>             >>>>         params name="p_ping" host_list="192.168.1.10
>>             192.168.1.11"
>>             >>>> multiplier="1000" \
>>             >>>>         op monitor interval="20s"
>>             >>>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \
>>             >>>>         params email="me at example.com
>>             <mailto:me at example.com> <mailto:me at example.com
>>             <mailto:me at example.com>>" \
>>             >>>>         params subject="Pacemaker Change" \
>>             >>>>         op start interval="0" timeout="10" \
>>             >>>>         op stop interval="0" timeout="10" \
>>             >>>>         op monitor interval="10" timeout="10"
>>             >>>> primitive p_vm ocf:heartbeat:VirtualDomain \
>>             >>>>         params config="/vmstore/config/vm.xml" \
>>             >>>>         meta allow-migrate="false" \
>>             >>>>         op start interval="0" timeout="120s" \
>>             >>>>         op stop interval="0" timeout="120s" \
>>             >>>>         op monitor interval="10" timeout="30"
>>             >>>> primitive stonith-node1 stonith:external/tripplitepdu \
>>             >>>>         params pdu_ipaddr="192.168.1.12" pdu_port="1"
>>             pdu_username="xxx"
>>             >>>> pdu_password="xxx" hostname_to_stonith="node1"
>>             >>>> primitive stonith-node2 stonith:external/tripplitepdu \
>>             >>>>         params pdu_ipaddr="192.168.1.12" pdu_port="2"
>>             pdu_username="xxx"
>>             >>>> pdu_password="xxx" hostname_to_stonith="node2"
>>             >>>> group g_daemons p_libvirt-bin
>>             >>>> group g_vm p_fs_vmstore p_vm
>>             >>>> ms ms_drbd_mount2 p_drbd_mount2 \
>>             >>>>         meta master-max="1" master-node-max="1"
>>             clone-max="2"
>>             >>>> clone-node-max="1" notify="true"
>>             >>>> ms ms_drbd_mount1 p_drbd_mount1 \
>>             >>>>         meta master-max="1" master-node-max="1"
>>             clone-max="2"
>>             >>>> clone-node-max="1" notify="true"
>>             >>>> ms ms_drbd_vmstore p_drbd_vmstore \
>>             >>>>         meta master-max="1" master-node-max="1"
>>             clone-max="2"
>>             >>>> clone-node-max="1" notify="true"
>>             >>>> clone cl_daemons g_daemons
>>             >>>> clone cl_ping p_ping \
>>             >>>>         meta interleave="true"
>>             >>>> clone cl_sysadmin_notify p_sysadmin_notify
>>             >>>> location l-st-node1 stonith-node1 -inf: node1
>>             >>>> location l-st-node2 stonith-node2 -inf: node2
>>             >>>> location l_run_on_most_connected p_vm \
>>             >>>>         rule $id="l_run_on_most_connected-rule" p_ping:
>>             defined p_ping
>>             >>>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master
>>             >>>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm
>>             >>>
>>             >>> As Emmanuel already said, g_vm has to come first in this
>>             >>> colocation constraint .... g_vm must be colocated with
>>             the drbd masters.
>>             >>>
>>             >>>> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote
>>             ms_drbd_mount1:promote
>>             >>>> ms_drbd_mount2:promote cl_daemons:start g_vm:start
>>             >>>> property $id="cib-bootstrap-options" \
>>             >>>>        
>>             dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>>             >>>>         cluster-infrastructure="Heartbeat" \
>>             >>>>         stonith-enabled="false" \
>>             >>>>         no-quorum-policy="stop" \
>>             >>>>         last-lrm-refresh="1332539900" \
>>             >>>>         cluster-recheck-interval="5m" \
>>             >>>>         crmd-integration-timeout="3m" \
>>             >>>>         shutdown-escalation="5m"
>>             >>>>
>>             >>>> The STONITH plugin is a custom plugin I wrote for the
>>             Tripp-Lite
>>             >>>> PDUMH20ATNET that I'm using as the STONITH device:
>>             >>>>
>>            
> http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf
>>             >>>
>>             >>> And why are you not using it? .... stonith-enabled="false"
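If the PDU fencing works in your tests, enabling it is just:

# crm configure property stonith-enabled=true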
>>             >>>
>>             >>>>
>>             >>>> As you can see, I left the DRBD service to be started
>>             by the operating
>>             >>>> system (as an lsb script at boot time) however
>>             Pacemaker controls
>>             >>>> actually bringing up/taking down the individual DRBD
>>             devices.
>>             >>>
>>             >>> Don't start drbd on system boot, give Pacemaker the full
>>             control.
>>             >>>
>>             >>> The
>>             >>>> behavior I observe is as follows: I issue "crm resource
>>             migrate p_vm" on
>>             >>>> node1 and failover successfully to node2. During this
>>             time, node2 fences
>>             >>>> node1's DRBD devices (using dopd) and marks them as
>>             Outdated. Meanwhile
>>             >>>> node2's DRBD devices are UpToDate. I then shutdown both
>>             nodes and then
>>             >>>> bring them back up. They reconnect to the cluster (with
>>             quorum), and
>>             >>>> node1's DRBD devices are still Outdated as expected and
>>             node2's DRBD
>>             >>>> devices are still UpToDate, as expected. At this point,
>>             DRBD starts on
>>             >>>> both nodes, however node2 will not set DRBD as master:
>>             >>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4):
>>             OFFLINE
>>             > (standby)
>>             >>>> Online: [ node2 node1 ]
>>             >>>>
>>             >>>>  Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>>             >>>>      Slaves: [ node1 node2 ]
>>             >>>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>>             >>>>      Slaves: [ node1 node2 ]
>>             >>>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>             >>>>      Slaves: [ node1 node2 ]
>>             >>>
>>             >>> There should really be no interruption of the drbd
>>             replication on vm
>>             >>> migration that activates the dopd ... drbd has its own
>>             direct network
>>             >>> connection?
>>             >>>
>>             >>> Please share your ha.cf <http://ha.cf> file and your
>>             drbd configuration. Watch out for
>>             >>> drbd messages in your kernel log file, that should give
>>             you additional
>>             >>> information when/why the drbd connection was lost.
>>             >>>
>>             >>> Regards,
>>             >>> Andreas
>>             >>>
>>             >>> --
>>             >>> Need help with Pacemaker?
>>             >>> http://www.hastexo.com/now
>>             >>>
>>             >>>>
>>             >>>> I am having trouble sorting through the logging
>>             information because
>>             >>>> there is so much of it in /var/log/daemon.log, but I
>>             can't  find an
>>             >>>> error message printed about why it will not promote
>>             node2. At this point
>>             >>>> the DRBD devices are as follows:
>>             >>>> node2: cstate = WFConnection dstate=UpToDate
>>             >>>> node1: cstate = StandAlone dstate=Outdated
>>             >>>>
>>             >>>> I don't see any reason why node2 can't become DRBD
>>             master, or am I
>>             >>>> missing something? If I do "drbdadm connect all" on
>>             node1, then the
>>             >>>> cstate on both nodes changes to "Connected" and node2
>>             immediately
>>             >>>> promotes the DRBD resources to master. Any ideas on why
>>             I'm observing
>>             >>>> this incorrect behavior?
>>             >>>>
>>             >>>> Any tips on how I can better filter through the
>>             pacemaker/heartbeat logs
>>             >>>> or how to get additional useful debug information?
>>             >>>>
>>             >>>> Thanks,
>>             >>>>
>>             >>>> Andrew
>>             >>>>
>>             >>>>
>>            
> ------------------------------------------------------------------------
>>             >>>> *From: *"Andreas Kurz" <andreas at hastexo.com
>>             <mailto:andreas at hastexo.com>
>>             > <mailto:andreas at hastexo.com <mailto:andreas at hastexo.com>>>
>>             >>>> *To: *pacemaker at oss.clusterlabs.org
>>             <mailto:pacemaker at oss.clusterlabs.org>
>>             >> <mailto:*pacemaker at oss.clusterlabs.org
>>             <mailto:pacemaker at oss.clusterlabs.org>>
>>             >>>> *Sent: *Wednesday, 1 February, 2012 4:19:25 PM
>>             >>>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD
>>             resources to
>>             >>>> master on failover
>>             >>>>
>>             >>>> On 01/25/2012 08:58 PM, Andrew Martin wrote:
>>             >>>>> Hello,
>>             >>>>>
>>             >>>>> Recently I finished configuring a two-node cluster
>>             with pacemaker 1.1.6
>>             >>>>> and heartbeat 3.0.5 on nodes running Ubuntu 10.04.
>>             This cluster
>>             > includes
>>             >>>>> the following resources:
>>             >>>>> - primitives for DRBD storage devices
>>             >>>>> - primitives for mounting the filesystem on the DRBD
>>             storage
>>             >>>>> - primitives for some mount binds
>>             >>>>> - primitive for starting apache
>>             >>>>> - primitives for starting samba and nfs servers
>>             (following instructions
>>             >>>>> here
>>             <http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf>)
>>             >>>>> - primitives for exporting nfs shares
>>             (ocf:heartbeat:exportfs)
>>             >>>>
>>             >>>> not enough information ... please share at least your
>>             complete cluster
>>             >>>> configuration
>>             >>>>
>>             >>>> Regards,
>>             >>>> Andreas
>>             >>>>
>>             >>>> --
>>             >>>> Need help with Pacemaker?
>>             >>>> http://www.hastexo.com/now
>>             >>>>
>>             >>>>>
>>             >>>>> Perhaps this is best described through the output of
>>             crm_mon:
>>             >>>>> Online: [ node1 node2 ]
>>             >>>>>
>>             >>>>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>>             (unmanaged)
>>             >>>>>      p_drbd_mount1:0     (ocf::linbit:drbd):    
>>             Started node2
>>             >>> (unmanaged)
>>             >>>>>      p_drbd_mount1:1     (ocf::linbit:drbd):    
>>             Started node1
>>             >>>>> (unmanaged) FAILED
>>             >>>>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>             >>>>>      p_drbd_mount2:0       (ocf::linbit:drbd):    
>>             Master node1
>>             >>>>> (unmanaged) FAILED
>>             >>>>>      Slaves: [ node2 ]
>>             >>>>>  Resource Group: g_core
>>             >>>>>      p_fs_mount1 (ocf::heartbeat:Filesystem):  
>>              Started node1
>>             >>>>>      p_fs_mount2   (ocf::heartbeat:Filesystem):  
>>              Started node1
>>             >>>>>      p_ip_nfs   (ocf::heartbeat:IPaddr2):      
>>             Started node1
>>             >>>>>  Resource Group: g_apache
>>             >>>>>      p_fs_mountbind1    (ocf::heartbeat:Filesystem):  
>>              Started node1
>>             >>>>>      p_fs_mountbind2    (ocf::heartbeat:Filesystem):  
>>              Started node1
>>             >>>>>      p_fs_mountbind3    (ocf::heartbeat:Filesystem):  
>>              Started node1
>>             >>>>>      p_fs_varwww        (ocf::heartbeat:Filesystem):  
>>              Started node1
>>             >>>>>      p_apache   (ocf::heartbeat:apache):      
>>              Started node1
>>             >>>>>  Resource Group: g_fileservers
>>             >>>>>      p_lsb_smb  (lsb:smbd):     Started node1
>>             >>>>>      p_lsb_nmb  (lsb:nmbd):     Started node1
>>             >>>>>      p_lsb_nfsserver    (lsb:nfs-kernel-server):      
>>              Started node1
>>             >>>>>      p_exportfs_mount1   (ocf::heartbeat:exportfs):  
>>                Started node1
>>             >>>>>      p_exportfs_mount2     (ocf::heartbeat:exportfs):
>>                  Started
>>             > node1
>>             >>>>>
>>             >>>>> I have read through the Pacemaker Explained
>>             >>>>>
>>             >>>>
>>             >>>
>>             >
>>            
> <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained>
>>             >>>>> documentation, however could not find a way to further
>>             debug these
>>             >>>>> problems. First, I put node1 into standby mode to
>>             attempt failover to
>>             >>>>> the other node (node2). Node2 appeared to start the
>>             transition to
>>             >>>>> master, however it failed to promote the DRBD
>>             resources to master (the
>>             >>>>> first step). I have attached a copy of this session in
>>             commands.log and
>>             >>>>> additional excerpts from /var/log/syslog during
>>             important steps. I have
>>             >>>>> attempted everything I can think of to try and start
>>             the DRBD resource
>>             >>>>> (e.g. start/stop/promote/manage/cleanup under crm
>>             resource, restarting
>>             >>>>> heartbeat) but cannot bring it out of the slave state.
>>             However, if
>>             > I set
>>             >>>>> it to unmanaged and then run drbdadm primary all in
>>             the terminal,
>>             >>>>> pacemaker is satisfied and continues starting the rest
>>             of the
>>             > resources.
>>             >>>>> It then failed when attempting to mount the filesystem
>>             for mount2, the
>>             >>>>> p_fs_mount2 resource. I attempted to mount the
>>             filesystem myself
>>             > and was
>>             >>>>> successful. I then unmounted it and ran cleanup on
>>             p_fs_mount2 and then
>>             >>>>> it mounted. The rest of the resources started as
>>             expected until the
>>             >>>>> p_exportfs_mount2 resource, which failed as follows:
>>             >>>>> p_exportfs_mount2     (ocf::heartbeat:exportfs):    
>>              started node2
>>             >>>>> (unmanaged) FAILED
>>             >>>>>
>>             >>>>> I ran cleanup on this and it started, however when
>>             running this test
>>             >>>>> earlier today no command could successfully start this
>>             exportfs
>>             >> resource.
>>             >>>>>
>>             >>>>> How can I configure pacemaker to better resolve these
>>             problems and be
>>             >>>>> able to bring the node up successfully on its own?
>>             What can I check to
>>             >>>>> determine why these failures are occurring?
>>             /var/log/syslog did not seem
>>             >>>>> to contain very much useful information regarding why
>>             the failures
>>             >>>> occurred.
>>             >>>>>
>>             >>>>> Thanks,
>>             >>>>>
>>             >>>>> Andrew
>>             >>>>>
>>             >>>>>
>>             >>>>>
>>             >>>>>
>>             >>>>
>>             >>>>
>>             >>>>
>>             >>>>
>>             >>>>
>>             >>>> _______________________________________________
>>             >>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>
>>             >> <mailto:Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>>
>>             >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>             >>>>
>>             >>>> Project Home: http://www.clusterlabs.org
>>             >>>> Getting started:
>>             http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>             >>>> Bugs: http://bugs.clusterlabs.org
>>             >>>>
>>             >>>>
>>             >>>>
>>             >>>> _______________________________________________
>>             >>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>
>>             >> <mailto:Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>>
>>             >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>             >>>>
>>             >>>> Project Home: http://www.clusterlabs.org
>>             >>>> Getting started:
>>             http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>             >>>> Bugs: http://bugs.clusterlabs.org
>>             >>>
>>             >>>
>>             >>>
>>             >>> _______________________________________________
>>             >>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>
>>             >> <mailto:Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>>
>>             >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>             >>>
>>             >>> Project Home: http://www.clusterlabs.org
>>             >>> Getting started:
>>             http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>             >>> Bugs: http://bugs.clusterlabs.org
>>             >>>
>>             >>>
>>             >>>
>>             >>> _______________________________________________
>>             >>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>
>>             >> <mailto:Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>>
>>             >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>             >>>
>>             >>> Project Home: http://www.clusterlabs.org
>>             >>> Getting started:
>>             http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>             >>> Bugs: http://bugs.clusterlabs.org
>>             >>
>>             >>
>>             >>
>>             >> _______________________________________________
>>             >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>
>>             >> <mailto:Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>>
>>             >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>             >>
>>             >> Project Home: http://www.clusterlabs.org
>>             >> Getting started:
>>             http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>             >> Bugs: http://bugs.clusterlabs.org
>>             >>
>>             >>
>>             >>
>>             >> _______________________________________________
>>             >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>
>>             >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>             >>
>>             >> Project Home: http://www.clusterlabs.org
>>             >> Getting started:
>>             http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>             >> Bugs: http://bugs.clusterlabs.org
>>             >
>>             >
>>             >
>>             >
>>             > _______________________________________________
>>             > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>
>>             > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>             >
>>             > Project Home: http://www.clusterlabs.org
>>             > Getting started:
>>             http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>             > Bugs: http://bugs.clusterlabs.org
>>             >
>>             >
>>             >
>>             > _______________________________________________
>>             > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>
>>             > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>             >
>>             > Project Home: http://www.clusterlabs.org
>>             > Getting started:
>>             http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>             > Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>>             _______________________________________________
>>             Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>
>>             http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>>             Project Home: http://www.clusterlabs.org
>>             Getting started:
>>             http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>             Bugs: http://bugs.clusterlabs.org
>>
>>
>>             _______________________________________________
>>             Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>             <mailto:Pacemaker at oss.clusterlabs.org>
>>             http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>>             Project Home: http://www.clusterlabs.org
>>             Getting started:
>>             http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>             Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>>
>>         --
>>         this is my life and I live it as long as God wills
>>
>>         _______________________________________________
>>         Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>         <mailto:Pacemaker at oss.clusterlabs.org>
>>         http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>>         Project Home: http://www.clusterlabs.org
>>         Getting started:
>>         http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>         Bugs: http://bugs.clusterlabs.org
>>
>>
>>         _______________________________________________
>>         Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>         <mailto:Pacemaker at oss.clusterlabs.org>
>>         http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>>         Project Home: http://www.clusterlabs.org
>>         Getting started:
>>         http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>         Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>>
>>     --
>>     this is my life and I live it as long as God wills
>>
>>     _______________________________________________
>>     Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>     <mailto:Pacemaker at oss.clusterlabs.org>
>>     http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>>     Project Home: http://www.clusterlabs.org
>>     Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>     Bugs: http://bugs.clusterlabs.org
>>
>>
>>     _______________________________________________
>>     Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>     <mailto:Pacemaker at oss.clusterlabs.org>
>>     http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>>     Project Home: http://www.clusterlabs.org
>>     Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>     Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>>
>> --
>> this is my life and I live it as long as God wills
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
> -- 
> Need help with Pacemaker?
> http://www.hastexo.com/now
> 
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

