[Pacemaker] Drbd disk don't run

Rafael Emerick rafael.rezo at gmail.com
Fri May 15 12:58:36 EDT 2009


With Pacemaker, can't I set up a primary/primary state?
I'm trying to get the disk running first; then I want to put it into the
primary/primary state.

With drbdadm I can bring the disk up and it works very well. The drbd+ocfs2
stack is already working, but now I want Pacemaker to start the drbd and
ocfs2/o2cb daemons, set the drbd disks to primary/primary, mount the ocfs2
partition, and then start the virtual machine...

drbd, ocfs2 and the VM are fine; only the Pacemaker part is missing for me
to finish my graduation project... :( ...




On Fri, May 15, 2009 at 12:01 PM, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:

> Hi,
>
> On Fri, May 15, 2009 at 08:54:31AM -0300, Rafael Emerick wrote:
> > Hi, Dejan
> >
> > The first problem is solved, but now I have another.
> > When I try to start the ms-drbd11 resource I don't get any error, but in
> > crm_mon I see the following:
> >
> > ============
> > Last updated: Fri May 15 08:44:11 2009
> > Current DC: node1 (57e0232d-5b78-4a1a-976e-e5335ba8266d) - partition with
> > quorum
> > Version: 1.0.3-b133b3f19797c00f9189f4b66b513963f9d25db9
> > 2 Nodes configured, unknown expected votes
> > 2 Resources configured.
> > ============
> >
> > Online: [ node1 node2 ]
> >
> > Clone Set: drbdinit
> >         Started: [ node1 node2 ]
> >
> > Failed actions:
> >     drbd11:0_start_0 (node=node1, call=9, rc=1, status=complete): unknown
> > error
> >     drbd11_start_0 (node=node1, call=17, rc=1, status=complete): unknown
> > error
> >     drbd11:1_start_0 (node=node2, call=9, rc=1, status=complete): unknown
> > error
> >     drbd11_start_0 (node=node2, call=16, rc=1, status=complete): unknown
> > error
> >
> > So, in the messages log file, I get:
> >
> >
> > May 15 08:25:03 node1 pengine: [4749]: WARN: unpack_resources: No STONITH
> > resources have been defined
> > May 15 08:25:03 node1 pengine: [4749]: info: determine_online_status:
> Node
> > node1 is online
> > May 15 08:25:03 node1 pengine: [4749]: info: unpack_rsc_op:
> drbd11:0_start_0
> > on node1 returned 1 (unknown error) instead of the expected value: 0 (ok)
> > May 15 08:25:03 node1 pengine: [4749]: WARN: unpack_rsc_op: Processing
> > failed op drbd11:0_start_0 on node1: unknown error
> > May 15 08:25:03 node1 pengine: [4749]: WARN: process_orphan_resource:
> > Nothing known about resource drbd11 running on node1
> > May 15 08:25:03 node1 pengine: [4749]: info: log_data_element:
> > create_fake_resource: Orphan resource <primitive id="drbd11" type="drbd"
> > class="ocf" provider="heartbeat" />
> > May 15 08:25:03 node1 pengine: [4749]: info: process_orphan_resource:
> Making
> > sure orphan drbd11 is stopped
> > May 15 08:25:03 node1 pengine: [4749]: info: unpack_rsc_op:
> drbd11_start_0
> > on node1 returned 1 (unknown error) instead of the expected value: 0 (ok)
> > May 15 08:25:03 node1 pengine: [4749]: WARN: unpack_rsc_op: Processing
> > failed op drbd11_start_0 on node1: unknown error
> > May 15 08:25:03 node1 pengine: [4749]: info: determine_online_status:
> Node
> > node2 is online
> > May 15 08:25:03 node1 pengine: [4749]: info: find_clone: Internally
> renamed
> > drbdi:0 on node2 to drbdi:1
> > May 15 08:25:03 node1 pengine: [4749]: info: unpack_rsc_op:
> drbd11:1_start_0
> > on node2 returned 1 (unknown error) instead of the expected value: 0 (ok)
> > May 15 08:25:03 node1 pengine: [4749]: WARN: unpack_rsc_op: Processing
> > failed op drbd11:1_start_0 on node2: unknown error
> > May 15 08:25:03 node1 pengine: [4749]: info: unpack_rsc_op:
> drbd11_start_0
> > on node2 returned 1 (unknown error) instead of the expected value: 0 (ok)
> > May 15 08:25:03 node1 pengine: [4749]: WARN: unpack_rsc_op: Processing
> > failed op drbd11_start_0 on node2: unknown error
> > May 15 08:25:03 node1 pengine: [4749]: notice: clone_print: Clone Set:
> > drbdinit
> > May 15 08:25:03 node1 pengine: [4749]: notice: print_list:     Started: [
> > node1 node2 ]
> > May 15 08:25:03 node1 pengine: [4749]: notice: clone_print: Master/Slave
> > Set: ms-drbd11
> > May 15 08:25:03 node1 pengine: [4749]: notice: print_list:     Stopped: [
> > drbd11:0 drbd11:1 ]
> > May 15 08:25:03 node1 pengine: [4749]: info: get_failcount: ms-drbd11 has
> > failed 1000000 times on node1
> > May 15 08:25:03 node1 pengine: [4749]: WARN: common_apply_stickiness:
> > Forcing ms-drbd11 away from node1 after 1000000 failures (max=1000000)
> > May 15 08:25:03 node1 pengine: [4749]: info: get_failcount: drbd11 has
> > failed 1000000 times on node1
> > May 15 08:25:03 node1 pengine: [4749]: WARN: common_apply_stickiness:
> > Forcing drbd11 away from node1 after 1000000 failures (max=1000000)
> > May 15 08:25:03 node1 pengine: [4749]: info: get_failcount: ms-drbd11 has
> > failed 1000000 times on node2
> > May 15 08:25:03 node1 pengine: [4749]: WARN: common_apply_stickiness:
> > Forcing ms-drbd11 away from node2 after 1000000 failures (max=1000000)
> > May 15 08:25:03 node1 pengine: [4749]: info: get_failcount: drbd11 has
> > failed 1000000 times on node2
> > May 15 08:25:03 node1 pengine: [4749]: WARN: common_apply_stickiness:
> > Forcing drbd11 away from node2 after 1000000 failures (max=1000000)
> > May 15 08:25:03 node1 pengine: [4749]: WARN: native_color: Resource
> drbd11:0
> > cannot run anywhere
> > May 15 08:25:03 node1 pengine: [4749]: WARN: native_color: Resource
> drbd11:1
> > cannot run anywhere
> > May 15 08:25:03 node1 pengine: [4749]: info: master_color: ms-drbd11:
> > Promoted 0 instances of a possible 1 to master
> > May 15 08:25:03 node1 pengine: [4749]: notice: LogActions: Leave resource
> > drbdi:0      (Started node1)
> > May 15 08:25:03 node1 pengine: [4749]: notice: LogActions: Leave resource
> > drbdi:1      (Started node2)
> > May 15 08:25:03 node1 pengine: [4749]: notice: LogActions: Leave resource
> > drbd11:0     (Stopped)
> > May 15 08:25:03 node1 pengine: [4749]: notice: LogActions: Leave resource
> > drbd11:1     (Stopped)
> >
> >
> > I had this problem with Heartbeat v2, and now I'm using Pacemaker with
> > the same error.
> > My idea is that the crm manages the drbd, ocfs2 and vmxen
>
> Can ocfs2 run on top of drbd? In that case you need a master/master
> resource. What you have is master/slave.
>
> > resources to keep them running...
>
> It does, but this is a resource-level problem. Funny that the
> logs don't show much. You'll have to test by hand using drbdadm.
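A by-hand check with drbdadm, outside the cluster manager, might look
roughly like this (drbd11 is the resource name used in this thread; these
are standard drbdadm subcommands, sketched here without a live cluster to
verify against):

```
# Sketch: exercise the DRBD resource by hand before handing it back to crm.
drbdadm up drbd11        # attach the backing disk and connect to the peer
drbdadm primary drbd11   # promote this node (run on both for dual-primary)
cat /proc/drbd           # check connection state (cs:) and roles (ro:)
drbdadm down drbd11      # tear down again so the cluster can take over
```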
>
> > For the drbd resource to start, does STONITH have to be configured?
>
> You must have stonith, in particular since it's shared storage.
>
> Also, set
>
> crm configure property no-quorum-policy=ignore
>
> Thanks,
>
> Dejan
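Note that once the start failure itself is fixed, the stored failcount (the
"failed 1000000 times" in the logs above) also has to be cleared before the
cluster will place the resource again. With the crm shell that would
presumably be:

```
# Sketch: clear the recorded failures so Pacemaker retries the resource.
crm resource cleanup ms-drbd11
# or reset the failcount per node:
crm resource failcount ms-drbd11 delete node1
crm resource failcount ms-drbd11 delete node2
```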
>
> > Thank you!
> >
> > On Fri, May 15, 2009 at 7:02 AM, Dejan Muhamedagic <dejanmm at fastmail.fm
> >wrote:
> >
> > > Hi,
> > >
> > > On Fri, May 15, 2009 at 06:47:37AM -0300, Rafael Emerick wrote:
> > > > Hi, Dejan
> > > >
> > > > Thanks for the attention; my CIB XML config follows.
> > > > I'm a newbie with Pacemaker, so any hint is very welcome! :D
> > >
> > > The CIB as seen by crm:
> > >
> > > primitive drbd11 ocf:heartbeat:drbd \
> > >        params drbd_resource="drbd11" \
> > >        op monitor interval="59s" role="Master" timeout="30s" \
> > >        op monitor interval="60s" role="Slave" timeout="30s" \
> > >        meta target-role="started" is-managed="true"
> > > ms ms-drbd11 drbd11 \
> > >        meta clone-max="2" notify="true" globally-unique="false"
> > > target-role="stopped"
> > >
> > > The target-role attribute is defined for both the primitive and
> > > the container (ms). You should remove the former:
> > >
> > > crm configure edit drbd11
> > >
> > > and remove all meta attributes (the whole "meta" part). And don't
> > > forget to remove the backslash in the line above it.
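With those meta attributes removed, the primitive would presumably be
reduced to the following (the same definition quoted above, minus the "meta"
line and the trailing backslash; shown only as a sketch of the expected
result):

```
primitive drbd11 ocf:heartbeat:drbd \
        params drbd_resource="drbd11" \
        op monitor interval="59s" role="Master" timeout="30s" \
        op monitor interval="60s" role="Slave" timeout="30s"
```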
> > >
> > > Thanks,
> > >
> > > Dejan
> > >
> > > > thank you very much
> > > > for the help
> > > >
> > > >
> > > > On Fri, May 15, 2009 at 4:46 AM, Dejan Muhamedagic <
> dejanmm at fastmail.fm
> > > >wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Thu, May 14, 2009 at 05:13:50PM -0300, Rafael Emerick wrote:
> > > > > > Hi, Dejan
> > > > > >
> > > > > > There is no two set of meta-attributes.
> > > > > >
> > > > > > I removed ms-drbd11, added it again, and the error is the same:
> > > > > > Error performing operation: Required data for this CIB API call
> > > > > > not found
> > > > >
> > > > > Can you please post your CIB, as XML?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Dejan
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >
> > > > > > On Thu, May 14, 2009 at 3:43 PM, Dejan Muhamedagic <
> > > dejanmm at fastmail.fm
> > > > > >wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Thu, May 14, 2009 at 03:18:15PM -0300, Rafael Emerick wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'm trying to build a Xen HA cluster using drbd and ocfs2...
> > > > > > > >
> > > > > > > > I want crm to manage all the resources (Xen machines, drbd
> > > > > > > > disks and the ocfs2 filesystem).
> > > > > > > >
> > > > > > > > First, I created a clone LSB resource to init drbd via the
> > > > > > > > GUI interface.
> > > > > > > > Now I'm following the manual
> > > > > > > > http://clusterlabs.org/wiki/DRBD_HowTo_1.0 to
> > > > > > > > create the drbd disk management and afterwards make the ocfs2
> > > > > > > > filesystem.
> > > > > > > > So, when I run:
> > > > > > > > # crm resource start ms-drbd11
> > > > > > > > # Multiple attributes match name=target-role
> > > > > > > > # Value: stopped  (id=ms-drbd11-meta_attributes-target-role)
> > > > > > > > # Value: started  (id=drbd11-meta_attributes-target-role)
> > > > > > > > # Error performing operation: Required data for this CIB API
> > > > > > > > # call not found
> > > > > > >
> > > > > > > As it says, there are multiple matches for the attribute. Don't
> > > > > > > know how it came to be. Perhaps you can
> > > > > > >
> > > > > > > crm configure edit ms-drbd11
> > > > > > >
> > > > > > > and drop one of them. It could also be that there are two sets
> of
> > > > > > > meta-attributes.
> > > > > > >
> > > > > > > If crm can't edit the resource (in that case please report it)
> > > > > > > then you can try:
> > > > > > >
> > > > > > > crm configure edit xml ms-drbd11
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Dejan
> > > > > > >
> > > > > > > > My messages:
> > > > > > > > May 14 15:07:11 node1 pengine: [4749]: info: get_failcount:
> > > > > ms-drbd11
> > > > > > > has
> > > > > > > > failed 1000000 times on node2
> > > > > > > > May 14 15:07:11 node1 pengine: [4749]: WARN:
> > > common_apply_stickiness:
> > > > > > > > Forcing ms-drbd11 away from node2 after 1000000 failures
> > > > > (max=1000000)
> > > > > > > > May 14 15:07:11 node1 pengine: [4749]: WARN: native_color:
> > > Resource
> > > > > > > drbd11:0
> > > > > > > > cannot run anywhere
> > > > > > > > May 14 15:07:11 node1 pengine: [4749]: WARN: native_color:
> > > Resource
> > > > > > > drbd11:1
> > > > > > > > cannot run anywhere
> > > > > > > > May 14 15:07:11 node1 pengine: [4749]: info: master_color:
> > > ms-drbd11:
> > > > > > > > Promoted 0 instances of a possible 1 to master
> > > > > > > > May 14 15:07:11 node1 pengine: [4749]: notice: LogActions:
> Leave
> > > > > resource
> > > > > > > > drbdi:0      (Started node1)
> > > > > > > > May 14 15:07:11 node1 pengine: [4749]: notice: LogActions:
> Leave
> > > > > resource
> > > > > > > > drbdi:1      (Started node2)
> > > > > > > > May 14 15:07:11 node1 pengine: [4749]: notice: LogActions:
> Leave
> > > > > resource
> > > > > > > > drbd11:0     (Stopped)
> > > > > > > > May 14 15:07:11 node1 pengine: [4749]: notice: LogActions:
> Leave
> > > > > resource
> > > > > > > > drbd11:1     (Stopped)
> > > > > > > >
> > > > > > > >
> > > > > > > > Thank you for any help!
> > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Pacemaker mailing list
> > > > > > > > Pacemaker at oss.clusterlabs.org
> > > > > > > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker

