[Pacemaker] Nodes will not promote DRBD resources to master on failover
Andrew Martin
amartin at xes-inc.com
Fri Mar 30 15:16:45 UTC 2012
Hi Emmanuel,
Here is the output of crm configure show:
http://pastebin.com/NA1fZ8dL
Thanks,
Andrew
----- Original Message -----
From: "emmanuel segura" <emi2fast at gmail.com>
To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
Sent: Friday, March 30, 2012 9:43:45 AM
Subject: Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover
Can you show me the output of:
crm configure show
On 30 March 2012 at 16:10, Andrew Martin < amartin at xes-inc.com > wrote:
Hi Andreas,
Here is a copy of my complete CIB:
http://pastebin.com/v5wHVFuy
I'll work on generating a report using crm_report as well.
Thanks,
Andrew
From: "Andreas Kurz" < andreas at hastexo.com >
To: pacemaker at oss.clusterlabs.org
Sent: Friday, March 30, 2012 4:41:16 AM
Subject: Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover
On 03/28/2012 04:56 PM, Andrew Martin wrote:
> Hi Andreas,
>
> I disabled the DRBD init script and then restarted the slave node
> (node2). After it came back up, DRBD did not start:
> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): pending
> Online: [ node2 node1 ]
>
> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
> Masters: [ node1 ]
> Stopped: [ p_drbd_vmstore:1 ]
> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
> Masters: [ node1 ]
> Stopped: [ p_drbd_mount1:1 ]
> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
> Masters: [ node1 ]
> Stopped: [ p_drbd_mount2:1 ]
> ...
>
> root at node2:~# service drbd status
> drbd not loaded
Yes, that is expected as long as Pacemaker has not started DRBD.
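For reference, taking DRBD out of the init startup usually amounts to something
like this on Ubuntu 10.04 (a sketch; adjust if your init layout differs):

/etc/init.d/drbd stop          # make sure any init-started DRBD is down
update-rc.d -f drbd remove     # drop the /etc/rc*.d symlinks so only Pacemaker starts it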
>
> Is there something else I need to change in the CIB to ensure that DRBD
> is started? All of my DRBD devices are configured like this:
> primitive p_drbd_mount2 ocf:linbit:drbd \
> params drbd_resource="mount2" \
> op monitor interval="15" role="Master" \
> op monitor interval="30" role="Slave"
> ms ms_drbd_mount2 p_drbd_mount2 \
> meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true"
That should be enough ... unable to say more without seeing the complete
configuration ... too many fragments of information ;-)
Please provide (e.g. pastebin) your complete CIB (cibadmin -Q) when the
cluster is in that state ... or even better create a crm_report archive
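For example, roughly:

cibadmin -Q > /tmp/cib.xml                          # dump the live CIB while the cluster is in the bad state
crm_report -f "2012-03-28 09:00" /tmp/pcmk-report   # bundle logs, CIB and PE inputs from the failure window

(the timestamp is only an example; use a time shortly before the failed failover)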
>
> Here is the output from the syslog (grep -i drbd /var/log/syslog):
> Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
> key=12:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
> op=p_drbd_vmstore:1_monitor_0 )
> Mar 28 09:24:47 node2 lrmd: [3210]: info: rsc:p_drbd_vmstore:1 probe[2]
> (pid 3455)
> Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
> key=13:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
> op=p_drbd_mount1:1_monitor_0 )
> Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount1:1 probe[3]
> (pid 3456)
> Mar 28 09:24:48 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
> key=14:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
> op=p_drbd_mount2:1_monitor_0 )
> Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount2:1 probe[4]
> (pid 3457)
> Mar 28 09:24:48 node2 Filesystem[3458]: [3517]: WARNING: Couldn't find
> device [/dev/drbd0]. Expected /dev/??? to exist
> Mar 28 09:24:48 node2 crm_attribute: [3563]: info: Invoked:
> crm_attribute -N node2 -n master-p_drbd_mount2:1 -l reboot -D
> Mar 28 09:24:48 node2 crm_attribute: [3557]: info: Invoked:
> crm_attribute -N node2 -n master-p_drbd_vmstore:1 -l reboot -D
> Mar 28 09:24:48 node2 crm_attribute: [3562]: info: Invoked:
> crm_attribute -N node2 -n master-p_drbd_mount1:1 -l reboot -D
> Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[4] on
> p_drbd_mount2:1 for client 3213: pid 3457 exited with return code 7
> Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[2] on
> p_drbd_vmstore:1 for client 3213: pid 3455 exited with return code 7
> Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
> operation p_drbd_mount2:1_monitor_0 (call=4, rc=7, cib-update=10,
> confirmed=true) not running
> Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[3] on
> p_drbd_mount1:1 for client 3213: pid 3456 exited with return code 7
> Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
> operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7, cib-update=11,
> confirmed=true) not running
> Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
> operation p_drbd_mount1:1_monitor_0 (call=3, rc=7, cib-update=12,
> confirmed=true) not running
No errors, just probing ... so for some reason Pacemaker does not want to
start it ... use crm_simulate to find out why ... or provide the information
requested above.
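For example:

crm_simulate -s -L    # replay the policy engine against the live CIB and print allocation scores

Look at the scores for the ms_drbd_* resources on node2; a very low or
-INFINITY score there would explain why they are not started/promoted.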
Regards,
Andreas
--
Need help with Pacemaker?
http://www.hastexo.com/now
>
> Thanks,
>
> Andrew
>
> ------------------------------------------------------------------------
> *From: *"Andreas Kurz" < andreas at hastexo.com >
> *To: * pacemaker at oss.clusterlabs.org
> *Sent: *Wednesday, March 28, 2012 9:03:06 AM
> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
> master on failover
>
> On 03/28/2012 03:47 PM, Andrew Martin wrote:
>> Hi Andreas,
>>
>>> hmm ... what is that fence-peer script doing? If you want to use
>>> resource-level fencing with the help of dopd, activate the
>>> drbd-peer-outdater script in the line above ... and double check if the
>>> path is correct
>> fence-peer is just a wrapper for drbd-peer-outdater that does some
>> additional logging. In my testing dopd has been working well.
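>> In essence it does something like the following (a simplified sketch, not the
>> verbatim script -- log the call, then delegate to dopd's outdater and pass its
>> exit code back to DRBD):
>>
>> #!/bin/sh
>> # simplified sketch: DRBD exports DRBD_RESOURCE to fence-peer handlers
>> logger -t fence-peer "outdating peer for resource ${DRBD_RESOURCE:-unknown}"
>> /usr/lib/heartbeat/drbd-peer-outdater -t 5
>> rc=$?
>> logger -t fence-peer "drbd-peer-outdater exited with code $rc"
>> exit $rc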
>
> I see
>
>>
>>>> I am thinking of making the following changes to the CIB (as per the
>>>> official DRBD
>>>> guide
>>
> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html ) in
>>>> order to add the DRBD lsb service and require that it start before the
>>>> ocf:linbit:drbd resources. Does this look correct?
>>>
>>> Where did you read that? No, deactivate the startup of DRBD on system
>>> boot and let Pacemaker manage it completely.
>>>
>>>> primitive p_drbd-init lsb:drbd op monitor interval="30"
>>>> colocation c_drbd_together inf:
>>>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master
>>>> ms_drbd_mount2:Master
>>>> order drbd_init_first inf: ms_drbd_vmstore:promote
>>>> ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start
>>>>
>>>> This doesn't seem to require that drbd be also running on the node where
>>>> the ocf:linbit:drbd resources are slave (which it would need to do to be
>>>> a DRBD SyncTarget) - how can I ensure that drbd is running everywhere?
>>>> (clone cl_drbd p_drbd-init ?)
>>>
>>> This is really not needed.
>> I was following the official DRBD Users Guide:
>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html
>>
>> If I am understanding your previous message correctly, I do not need to
>> add a lsb primitive for the drbd daemon? It will be
>> started/stopped/managed automatically by my ocf:linbit:drbd resources
>> (and I can remove the /etc/rc* symlinks)?
>
> Yes, you don't need that LSB script when using Pacemaker and should not
> let init start it.
>
> Regards,
> Andreas
>
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
>
>>
>> Thanks,
>>
>> Andrew
>>
>> ------------------------------------------------------------------------
>> *From: *"Andreas Kurz" < andreas at hastexo.com <mailto: andreas at hastexo.com >>
>> *To: * pacemaker at oss.clusterlabs.org <mailto: pacemaker at oss.clusterlabs.org >
>> *Sent: *Wednesday, March 28, 2012 7:27:34 AM
>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>> master on failover
>>
>> On 03/28/2012 12:13 AM, Andrew Martin wrote:
>>> Hi Andreas,
>>>
>>> Thanks, I've updated the colocation rule to be in the correct order. I
>>> also enabled the STONITH resource (this was temporarily disabled before
>>> for some additional testing). DRBD has its own network connection over
>>> the br1 interface ( 192.168.5.0/24 network), a direct crossover cable
>>> between node1 and node2:
>>> global { usage-count no; }
>>> common {
>>> syncer { rate 110M; }
>>> }
>>> resource vmstore {
>>> protocol C;
>>> startup {
>>> wfc-timeout 15;
>>> degr-wfc-timeout 60;
>>> }
>>> handlers {
>>> #fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
>>> fence-peer "/usr/local/bin/fence-peer";
>>
>> hmm ... what is that fence-peer script doing? If you want to use
>> resource-level fencing with the help of dopd, activate the
>> drbd-peer-outdater script in the line above ... and double check if the
>> path is correct
>>
>>> split-brain "/usr/lib/drbd/notify-split-brain.sh
>>> me at example.com <mailto: me at example.com >";
>>> }
>>> net {
>>> after-sb-0pri discard-zero-changes;
>>> after-sb-1pri discard-secondary;
>>> after-sb-2pri disconnect;
>>> cram-hmac-alg md5;
>>> shared-secret "xxxxx";
>>> }
>>> disk {
>>> fencing resource-only;
>>> }
>>> on node1 {
>>> device /dev/drbd0;
>>> disk /dev/sdb1;
>>> address 192.168.5.10:7787 ;
>>> meta-disk internal;
>>> }
>>> on node2 {
>>> device /dev/drbd0;
>>> disk /dev/sdf1;
>>> address 192.168.5.11:7787 ;
>>> meta-disk internal;
>>> }
>>> }
>>> # and similar for mount1 and mount2
>>>
>>> Also, here is my ha.cf . It uses both the direct link between the nodes
>>> (br1) and the shared LAN network on br0 for communicating:
>>> autojoin none
>>> mcast br0 239.0.0.43 694 1 0
>>> bcast br1
>>> warntime 5
>>> deadtime 15
>>> initdead 60
>>> keepalive 2
>>> node node1
>>> node node2
>>> node quorumnode
>>> crm respawn
>>> respawn hacluster /usr/lib/heartbeat/dopd
>>> apiauth dopd gid=haclient uid=hacluster
>>>
>>> I am thinking of making the following changes to the CIB (as per the
>>> official DRBD
>>> guide
>>
> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html ) in
>>> order to add the DRBD lsb service and require that it start before the
>>> ocf:linbit:drbd resources. Does this look correct?
>>
>> Where did you read that? No, deactivate the startup of DRBD on system
>> boot and let Pacemaker manage it completely.
>>
>>> primitive p_drbd-init lsb:drbd op monitor interval="30"
>>> colocation c_drbd_together inf:
>>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master
>>> ms_drbd_mount2:Master
>>> order drbd_init_first inf: ms_drbd_vmstore:promote
>>> ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start
>>>
>>> This doesn't seem to require that drbd be also running on the node where
>>> the ocf:linbit:drbd resources are slave (which it would need to do to be
>>> a DRBD SyncTarget) - how can I ensure that drbd is running everywhere?
>>> (clone cl_drbd p_drbd-init ?)
>>
>> This is really not needed.
>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>>
>>> Thanks,
>>>
>>> Andrew
>>> ------------------------------------------------------------------------
>>> *From: *"Andreas Kurz" < andreas at hastexo.com <mailto: andreas at hastexo.com >>
>>> *To: * pacemaker at oss.clusterlabs.org
> <mailto:* pacemaker at oss.clusterlabs.org >
>>> *Sent: *Monday, March 26, 2012 5:56:22 PM
>>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>>> master on failover
>>>
>>> On 03/24/2012 08:15 PM, Andrew Martin wrote:
>>>> Hi Andreas,
>>>>
>>>> My complete cluster configuration is as follows:
>>>> ============
>>>> Last updated: Sat Mar 24 13:51:55 2012
>>>> Last change: Sat Mar 24 13:41:55 2012
>>>> Stack: Heartbeat
>>>> Current DC: node2 (9100538b-7a1f-41fd-9c1a-c6b4b1c32b18) - partition
>>>> with quorum
>>>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
>>>> 3 Nodes configured, unknown expected votes
>>>> 19 Resources configured.
>>>> ============
>>>>
>>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE
> (standby)
>>>> Online: [ node2 node1 ]
>>>>
>>>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>>>> Masters: [ node2 ]
>>>> Slaves: [ node1 ]
>>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>>>> Masters: [ node2 ]
>>>> Slaves: [ node1 ]
>>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>>> Masters: [ node2 ]
>>>> Slaves: [ node1 ]
>>>> Resource Group: g_vm
>>>> p_fs_vmstore(ocf::heartbeat:Filesystem):Started node2
>>>> p_vm(ocf::heartbeat:VirtualDomain):Started node2
>>>> Clone Set: cl_daemons [g_daemons]
>>>> Started: [ node2 node1 ]
>>>> Stopped: [ g_daemons:2 ]
>>>> Clone Set: cl_sysadmin_notify [p_sysadmin_notify]
>>>> Started: [ node2 node1 ]
>>>> Stopped: [ p_sysadmin_notify:2 ]
>>>> stonith-node1(stonith:external/tripplitepdu):Started node2
>>>> stonith-node2(stonith:external/tripplitepdu):Started node1
>>>> Clone Set: cl_ping [p_ping]
>>>> Started: [ node2 node1 ]
>>>> Stopped: [ p_ping:2 ]
>>>>
>>>> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \
>>>> attributes standby="off"
>>>> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \
>>>> attributes standby="off"
>>>> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \
>>>> attributes standby="on"
>>>> primitive p_drbd_mount2 ocf:linbit:drbd \
>>>> params drbd_resource="mount2" \
>>>> op monitor interval="15" role="Master" \
>>>> op monitor interval="30" role="Slave"
>>>> primitive p_drbd_mount1 ocf:linbit:drbd \
>>>> params drbd_resource="mount1" \
>>>> op monitor interval="15" role="Master" \
>>>> op monitor interval="30" role="Slave"
>>>> primitive p_drbd_vmstore ocf:linbit:drbd \
>>>> params drbd_resource="vmstore" \
>>>> op monitor interval="15" role="Master" \
>>>> op monitor interval="30" role="Slave"
>>>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \
>>>> params device="/dev/drbd0" directory="/vmstore" fstype="ext4" \
>>>> op start interval="0" timeout="60s" \
>>>> op stop interval="0" timeout="60s" \
>>>> op monitor interval="20s" timeout="40s"
>>>> primitive p_libvirt-bin upstart:libvirt-bin \
>>>> op monitor interval="30"
>>>> primitive p_ping ocf:pacemaker:ping \
>>>> params name="p_ping" host_list="192.168.1.10 192.168.1.11"
>>>> multiplier="1000" \
>>>> op monitor interval="20s"
>>>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \
>>>> params email=" me at example.com <mailto: me at example.com >" \
>>>> params subject="Pacemaker Change" \
>>>> op start interval="0" timeout="10" \
>>>> op stop interval="0" timeout="10" \
>>>> op monitor interval="10" timeout="10"
>>>> primitive p_vm ocf:heartbeat:VirtualDomain \
>>>> params config="/vmstore/config/vm.xml" \
>>>> meta allow-migrate="false" \
>>>> op start interval="0" timeout="120s" \
>>>> op stop interval="0" timeout="120s" \
>>>> op monitor interval="10" timeout="30"
>>>> primitive stonith-node1 stonith:external/tripplitepdu \
>>>> params pdu_ipaddr="192.168.1.12" pdu_port="1" pdu_username="xxx"
>>>> pdu_password="xxx" hostname_to_stonith="node1"
>>>> primitive stonith-node2 stonith:external/tripplitepdu \
>>>> params pdu_ipaddr="192.168.1.12" pdu_port="2" pdu_username="xxx"
>>>> pdu_password="xxx" hostname_to_stonith="node2"
>>>> group g_daemons p_libvirt-bin
>>>> group g_vm p_fs_vmstore p_vm
>>>> ms ms_drbd_mount2 p_drbd_mount2 \
>>>> meta master-max="1" master-node-max="1" clone-max="2"
>>>> clone-node-max="1" notify="true"
>>>> ms ms_drbd_mount1 p_drbd_mount1 \
>>>> meta master-max="1" master-node-max="1" clone-max="2"
>>>> clone-node-max="1" notify="true"
>>>> ms ms_drbd_vmstore p_drbd_vmstore \
>>>> meta master-max="1" master-node-max="1" clone-max="2"
>>>> clone-node-max="1" notify="true"
>>>> clone cl_daemons g_daemons
>>>> clone cl_ping p_ping \
>>>> meta interleave="true"
>>>> clone cl_sysadmin_notify p_sysadmin_notify
>>>> location l-st-node1 stonith-node1 -inf: node1
>>>> location l-st-node2 stonith-node2 -inf: node2
>>>> location l_run_on_most_connected p_vm \
>>>> rule $id="l_run_on_most_connected-rule" p_ping: defined p_ping
>>>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master
>>>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm
>>>
>>> As Emmanuel already said, g_vm has to come first in this
>>> colocation constraint ... g_vm must be colocated with the DRBD masters.
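>>> In crm syntax that is the same constraint with g_vm moved to the front,
>>> i.e. something like:
>>>
>>> colocation c_drbd_libvirt_vm inf: g_vm ms_drbd_vmstore:Master \
>>>         ms_drbd_mount1:Master ms_drbd_mount2:Master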
>>>
>>>> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote
>>>> ms_drbd_mount2:promote cl_daemons:start g_vm:start
>>>> property $id="cib-bootstrap-options" \
>>>> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>>>> cluster-infrastructure="Heartbeat" \
>>>> stonith-enabled="false" \
>>>> no-quorum-policy="stop" \
>>>> last-lrm-refresh="1332539900" \
>>>> cluster-recheck-interval="5m" \
>>>> crmd-integration-timeout="3m" \
>>>> shutdown-escalation="5m"
>>>>
>>>> The STONITH plugin is a custom plugin I wrote for the Tripp-Lite
>>>> PDUMH20ATNET that I'm using as the STONITH device:
>>>> http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf
>>>
>>> And why aren't you using it? ... you have stonith-enabled="false"
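>>> Once the PDU STONITH resources are verified to work, enabling it is just, e.g.:
>>>
>>> crm configure property stonith-enabled="true"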
>>>
>>>>
>>>> As you can see, I left the DRBD service to be started by the operating
>>>> system (as an lsb script at boot time) however Pacemaker controls
>>>> actually bringing up/taking down the individual DRBD devices.
>>>
>>> Don't start drbd on system boot, give Pacemaker the full control.
>>>
>>> The
>>>> behavior I observe is as follows: I issue "crm resource migrate p_vm" on
>>>> node1 and failover successfully to node2. During this time, node2 fences
>>>> node1's DRBD devices (using dopd) and marks them as Outdated. Meanwhile
>>>> node2's DRBD devices are UpToDate. I then shutdown both nodes and then
>>>> bring them back up. They reconnect to the cluster (with quorum), and
>>>> node1's DRBD devices are still Outdated as expected and node2's DRBD
>>>> devices are still UpToDate, as expected. At this point, DRBD starts on
>>>> both nodes, however node2 will not set DRBD as master:
>>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE
> (standby)
>>>> Online: [ node2 node1 ]
>>>>
>>>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>>>> Slaves: [ node1 node2 ]
>>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>>>> Slaves: [ node1 node2 ]
>>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>>> Slaves: [ node1 node2 ]
>>>
>>> There should really be no interruption of the DRBD replication during VM
>>> migration that would activate dopd ... does DRBD have its own direct network
>>> connection?
>>>
>>> Please share your ha.cf file and your drbd configuration. Watch out for
>>> drbd messages in your kernel log file, that should give you additional
>>> information when/why the drbd connection was lost.
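>>> For example, something like:
>>>
>>> grep -i drbd /var/log/kern.log     # or: dmesg | grep -i drbd
>>>
>>> around the time of the migration should show why the peers disconnected.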
>>>
>>> Regards,
>>> Andreas
>>>
>>> --
>>> Need help with Pacemaker?
>>> http://www.hastexo.com/now
>>>
>>>>
>>>> I am having trouble sorting through the logging information because
>>>> there is so much of it in /var/log/daemon.log, but I can't find an
>>>> error message printed about why it will not promote node2. At this point
>>>> the DRBD devices are as follows:
>>>> node2: cstate = WFConnection dstate=UpToDate
>>>> node1: cstate = StandAlone dstate=Outdated
>>>>
>>>> I don't see any reason why node2 can't become DRBD master, or am I
>>>> missing something? If I do "drbdadm connect all" on node1, then the
>>>> cstate on both nodes changes to "Connected" and node2 immediately
>>>> promotes the DRBD resources to master. Any ideas on why I'm observing
>>>> this incorrect behavior?
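>>>> (Those states can be checked with, for example:
>>>>
>>>> cat /proc/drbd        # overall view of cstate/dstate per device
>>>> drbdadm cstate all
>>>> drbdadm dstate all
>>>> )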
>>>>
>>>> Any tips on how I can better filter through the pacemaker/heartbeat logs
>>>> or how to get additional useful debug information?
>>>>
>>>> Thanks,
>>>>
>>>> Andrew
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From: *"Andreas Kurz" < andreas at hastexo.com
> <mailto: andreas at hastexo.com >>
>>>> *To: * pacemaker at oss.clusterlabs.org
>> <mailto:* pacemaker at oss.clusterlabs.org >
>>>> *Sent: *Wednesday, 1 February, 2012 4:19:25 PM
>>>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>>>> master on failover
>>>>
>>>> On 01/25/2012 08:58 PM, Andrew Martin wrote:
>>>>> Hello,
>>>>>
>>>>> Recently I finished configuring a two-node cluster with pacemaker 1.1.6
>>>>> and heartbeat 3.0.5 on nodes running Ubuntu 10.04. This cluster
> includes
>>>>> the following resources:
>>>>> - primitives for DRBD storage devices
>>>>> - primitives for mounting the filesystem on the DRBD storage
>>>>> - primitives for some mount binds
>>>>> - primitive for starting apache
>>>>> - primitives for starting samba and nfs servers (following instructions
>>>>> here < http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf >)
>>>>> - primitives for exporting nfs shares (ocf:heartbeat:exportfs)
>>>>
>>>> not enough information ... please share at least your complete cluster
>>>> configuration
>>>>
>>>> Regards,
>>>> Andreas
>>>>
>>>> --
>>>> Need help with Pacemaker?
>>>> http://www.hastexo.com/now
>>>>
>>>>>
>>>>> Perhaps this is best described through the output of crm_mon:
>>>>> Online: [ node1 node2 ]
>>>>>
>>>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] (unmanaged)
>>>>> p_drbd_mount1:0 (ocf::linbit:drbd): Started node2
>>> (unmanaged)
>>>>> p_drbd_mount1:1 (ocf::linbit:drbd): Started node1
>>>>> (unmanaged) FAILED
>>>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>>>> p_drbd_mount2:0 (ocf::linbit:drbd): Master node1
>>>>> (unmanaged) FAILED
>>>>> Slaves: [ node2 ]
>>>>> Resource Group: g_core
>>>>> p_fs_mount1 (ocf::heartbeat:Filesystem): Started node1
>>>>> p_fs_mount2 (ocf::heartbeat:Filesystem): Started node1
>>>>> p_ip_nfs (ocf::heartbeat:IPaddr2): Started node1
>>>>> Resource Group: g_apache
>>>>> p_fs_mountbind1 (ocf::heartbeat:Filesystem): Started node1
>>>>> p_fs_mountbind2 (ocf::heartbeat:Filesystem): Started node1
>>>>> p_fs_mountbind3 (ocf::heartbeat:Filesystem): Started node1
>>>>> p_fs_varwww (ocf::heartbeat:Filesystem): Started node1
>>>>> p_apache (ocf::heartbeat:apache): Started node1
>>>>> Resource Group: g_fileservers
>>>>> p_lsb_smb (lsb:smbd): Started node1
>>>>> p_lsb_nmb (lsb:nmbd): Started node1
>>>>> p_lsb_nfsserver (lsb:nfs-kernel-server): Started node1
>>>>> p_exportfs_mount1 (ocf::heartbeat:exportfs): Started node1
>>>>> p_exportfs_mount2 (ocf::heartbeat:exportfs): Started
> node1
>>>>>
>>>>> I have read through the Pacemaker Explained
>>>>>
>>>>
>>>
> < http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained >
>>>>> documentation, however could not find a way to further debug these
>>>>> problems. First, I put node1 into standby mode to attempt failover to
>>>>> the other node (node2). Node2 appeared to start the transition to
>>>>> master, however it failed to promote the DRBD resources to master (the
>>>>> first step). I have attached a copy of this session in commands.log and
>>>>> additional excerpts from /var/log/syslog during important steps. I have
>>>>> attempted everything I can think of to try and start the DRBD resource
>>>>> (e.g. start/stop/promote/manage/cleanup under crm resource, restarting
>>>>> heartbeat) but cannot bring it out of the slave state. However, if
> I set
>>>>> it to unmanaged and then run drbdadm primary all in the terminal,
>>>>> pacemaker is satisfied and continues starting the rest of the
> resources.
>>>>> It then failed when attempting to mount the filesystem for mount2, the
>>>>> p_fs_mount2 resource. I attempted to mount the filesystem myself
> and was
>>>>> successful. I then unmounted it and ran cleanup on p_fs_mount2 and then
>>>>> it mounted. The rest of the resources started as expected until the
>>>>> p_exportfs_mount2 resource, which failed as follows:
>>>>> p_exportfs_mount2 (ocf::heartbeat:exportfs): started node2
>>>>> (unmanaged) FAILED
>>>>>
>>>>> I ran cleanup on this and it started, however when running this test
>>>>> earlier today no command could successfully start this exportfs
>> resource.
>>>>>
>>>>> How can I configure pacemaker to better resolve these problems and be
>>>>> able to bring the node up successfully on its own? What can I check to
>>>>> determine why these failures are occurring? /var/log/syslog did not seem
>>>>> to contain very much useful information regarding why the failures
>>>> occurred.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Andrew
>>>>>
>>>>>
>>>>>
>>>>>
--
this is my life and I live it as long as God wills
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org