[Pacemaker] ERROR: Wrong stack o2cb

Wed Jun 26 11:53:17 EDT 2013

----- Original Message -----
> From: "Denis Witt" <denis.witt at concepts-and-training.de>
> To: pacemaker at oss.clusterlabs.org
> Cc: "jsmith" <jsmith at argotec.com>
> Sent: Wednesday, June 26, 2013 8:35:08 AM
> Subject: Re: [Pacemaker] ERROR: Wrong stack o2cb
> 
> On Wed, 26 Jun 2013 07:53:37 -0400 (EDT)
> jsmith <jsmith at argotec.com> wrote:
> 
> > You could start ocfs2 in the cluster just disable/remove the
> > filesystem resource for now. Once pacemaker has started ocfs2 I
> > believe you can do what you need?
> 
> Hi Jake,
> 
> Node test4: standby
> Online: [ test4-node1 test4-node2 ]
> 
>  Master/Slave Set: ms_drbd [drbd]
>      Masters: [ test4-node1 test4-node2 ]
>  Clone Set: clone_pingtest [pingtest]
>      Started: [ test4-node1 test4-node2 ]
>      Stopped: [ pingtest:2 ]
>  Resource Group: grp_all
>      sip	(ocf::heartbeat:IPaddr2):	Started test4-node1
>      apache	(ocf::heartbeat:apache):	Started test4-node1
>  Clone Set: cl_ocfs2mgmt [g_ocfs2mgmt]
>      Started: [ test4-node2 test4-node1 ]
>      Stopped: [ g_ocfs2mgmt:2 ]
>  Clone Set: cl_fs_ocfs2 [fs_drbd]
>      Started: [ test4-node2 test4-node1 ]
>      Stopped: [ fs_drbd:2 ]

If you wanted, you could set location constraints that keep the clones off test4 and/or set clone-max on the clones, so you don't get a stopped instance listed in each clone set for node test4.
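
A rough sketch of what that could look like with the crm shell, in case it helps. The clone names are taken from your status output; the constraint ids (loc_*_not_test4) are made up for illustration, and clone-max=2 assumes you only ever want these clones on the two storage nodes:

```shell
# Keep the OCFS2-related clones off the node that has no DRBD/OCFS2 installed.
# Constraint ids are hypothetical; resource names come from the status output.
crm configure location loc_ocfs2mgmt_not_test4 cl_ocfs2mgmt -inf: test4
crm configure location loc_fs_ocfs2_not_test4 cl_fs_ocfs2 -inf: test4

# And/or cap the total number of clone instances at two, so no third
# (permanently stopped) instance is created for test4.
crm resource meta cl_ocfs2mgmt set clone-max 2
crm resource meta cl_fs_ocfs2 set clone-max 2
```

The same idea would apply to ms_drbd and clone_pingtest if you want those listings tidy as well. Note this only affects where instances are placed; I would not expect it to silence the initial "not installed" probe results on test4.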

I prefer shorter configurations where possible, so you could also add the filesystem to the ocfs2 group before cloning and end up with a single clone set for controld, o2cb, and the OCFS2 filesystem.  That's just me ;-)
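
Something along these lines is what I mean, as it would appear in "crm configure show" after editing. This is only a sketch: it assumes g_ocfs2mgmt currently holds p_controld and p_o2cb (names taken from your failed actions), and interleave="true" is my assumption of a typical setting, not something from your config:

```
# Assumed layout: one group, cloned once, filesystem started last.
group g_ocfs2mgmt p_controld p_o2cb fs_drbd
clone cl_ocfs2mgmt g_ocfs2mgmt meta interleave="true"
```

Group members start in the order listed, so the filesystem only mounts after controld and o2cb are up. The separate cl_fs_ocfs2 clone would go away, and any order/colocation constraints that reference it would need to point at cl_ocfs2mgmt instead.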

> 
> Failed actions:
>     p_o2cb:0_monitor_0 (node=test4, call=164, rc=5, status=complete): not installed
>     p_controld:0_monitor_0 (node=test4, call=163, rc=5, status=complete): not installed
>     drbd:0_monitor_0 (node=test4, call=159, rc=5, status=complete): not installed
> 
> Thanks a lot!
> 

Glad it helped!  That's what we're all here for.

> For the record, using DRBD/OCFS2 with Pacemaker/corosync on Debian
> Wheezy:
> 
> apt-get install ocfs2-tools ocfs2-tools-pacemaker openais dlm-pcmk
> 
> Configure your DRBD-Drives, make sure they are running (you can format
> them as ext4 to test if they mount well, but don't run them as
> dual-primary, yet).
> 
> DON'T add /etc/ocfs2/cluster.conf
> update-rc.d ocfs2 disable
> update-rc.d o2cb disable
> Add ocfs2_stack_user to /etc/modules
> 
> Then add all groups/clone sets/primitives, except fs_drbd related ones.
> When the cluster is running format the drive, so that the correct stack
> will be written.
> Then add the fs_drbd related stuff.
> 
> Should work then. I'll check this procedure on a new machine and extend
> the list if necessary.
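
For anyone who wants to script the node preparation part of the steps above, here is a minimal sketch. The package names and update-rc.d calls are straight from the list; the two function names (ensure_module, prepare_node) are mine, purely for illustration, and nothing runs until you call prepare_node yourself as root:

```shell
#!/bin/sh
# ensure_module FILE MODULE (hypothetical helper): append MODULE to FILE
# unless an identical line is already there, so reruns don't duplicate it.
ensure_module() {
    file=$1
    module=$2
    grep -qxF -- "$module" "$file" 2>/dev/null || printf '%s\n' "$module" >> "$file"
}

# prepare_node (hypothetical wrapper): the per-node steps from the list.
# Deliberately does NOT create /etc/ocfs2/cluster.conf.
prepare_node() {
    apt-get install ocfs2-tools ocfs2-tools-pacemaker openais dlm-pcmk &&
    update-rc.d ocfs2 disable &&
    update-rc.d o2cb disable &&
    ensure_module /etc/modules ocfs2_stack_user
}
```

The format step is left out on purpose, since it has to wait until Pacemaker has started controld/o2cb and the device path isn't given here. Before formatting, I believe /sys/fs/ocfs2/cluster_stack should read pcmk rather than o2cb, which is a quick way to confirm the right stack is active.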

One more comment: I noticed an error about killproc in the log output last time.  When I tested a bit on Ubuntu with ocfs2, I noticed there was a patch for that error.  From your log it looks like on Debian the resource still stopped successfully, and you'll find out for sure when you test fencing/stonith/failures, but you might want to double-check:

Jun 26 10:32:29 test4 lrmd: [3134]: info: rsc:p_o2cb:0 stop[15] (pid 3651)
Jun 26 10:32:29 test4 ocfs2_controld: kill node 302186506 - ocfs2_controld PROCDOWN
Jun 26 10:32:29 test4 stonith-ng: [3133]: info: initiate_remote_stonith_op: Initiating remote operation off for 302186506: 1be401d4-547d-4f59-b380-a5e996c70a31
Jun 26 10:32:29 test4 stonith-ng: [3133]: info: stonith_command: Processed st_query from test4-node1: rc=0
Jun 26 10:32:29 test4 stonith-ng: [3133]: info: crm_new_peer: Node test4 now has id: 302252042
Jun 26 10:32:29 test4 stonith-ng: [3133]: info: crm_new_peer: Node 302252042 is now known as test4
Jun 26 10:32:29 test4 stonith-ng: [3133]: info: crm_new_peer: Node test4-node2 now has id: 302186506
Jun 26 10:32:29 test4 stonith-ng: [3133]: info: crm_new_peer: Node 302186506 is now known as test4-node2
Jun 26 10:32:29 test4 o2cb[3651]: INFO: Stopping p_o2cb:0
Jun 26 10:32:29 test4 o2cb[3651]: INFO: Stopping ocfs2_controld.pcmk
Jun 26 10:32:29 test4 lrmd: [3134]: info: RA output: (p_o2cb:0:stop:stderr) /usr/lib/ocf/resource.d//pacemaker/o2cb: line 171: killproc: command not found
^^^ This line
Jun 26 10:32:30 test4 lrmd: [3134]: info: operation stop[15] on p_o2cb:0 for client 3137: pid 3651 exited with return code 0
Jun 26 10:32:30 test4 crmd: [3137]: info: process_lrm_event: LRM operation p_o2cb:0_stop_0 (call=15, rc=0, cib-update=21, confirmed=true) ok
^^^ looks like the stop succeeded anyway but...

On Ubuntu - https://bugs.launchpad.net/ubuntu/lucid/+source/pacemaker/+bug/727422
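
If you want a quick way to check whether the agent on your nodes is affected, something like the following might do. It is only a rough sketch: the function names are made up, and it just checks whether the script mentions a helper that the current shell cannot resolve, so it can't tell whether the agent sources that helper from somewhere else at runtime:

```shell
#!/bin/sh
# agent_calls SCRIPT NAME: succeeds if SCRIPT mentions NAME as a whole word.
agent_calls() {
    grep -qw -- "$2" "$1"
}

# helper_missing NAME: succeeds if NAME cannot be resolved in this shell.
helper_missing() {
    ! command -v "$1" >/dev/null 2>&1
}

# check_agent SCRIPT NAME: warn (and return 1) when SCRIPT uses a helper
# that is not available; otherwise report OK.
check_agent() {
    if agent_calls "$1" "$2" && helper_missing "$2"; then
        echo "WARNING: $1 calls $2, which is not available here"
        return 1
    fi
    echo "OK: $1"
}
```

Usage would be along the lines of: check_agent /usr/lib/ocf/resource.d/pacemaker/o2cb killproc (path taken from the log line above).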

Jake