[Pacemaker] "ERROR: Wrong stack o2cb" when trying to start o2cb service in Pacemaker cluster
David Guyot
david.guyot at europecamions-interactive.com
Fri Jun 22 12:40:44 UTC 2012
On 22/06/2012 11:58, Andreas Kurz wrote:
> On 06/22/2012 11:14 AM, David Guyot wrote:
>> Hello.
>>
>> Concerning dlm-pcmk, it's not available from backports, so I installed
>> it from stable; only ocfs2-tools-pacemaker is available in backports,
>> and I installed it from there.
> that's ok
>
>> I checked whether /etc/init.d/ocfs2 and /etc/init.d/o2cb had been removed
>> from /etc/rcX.d/*, and they have, so the system cannot start them at boot
>> by itself.
> did you also explicitly stop them (on both nodes) or did you reboot the
> systems anyway?
Yes, I explicitly stopped them on both nodes and, to be sure, rebooted
the systems and then explicitly stopped them again, but to no effect; I
still get:
Failed actions:
p_o2cb:1_monitor_0 (node=Vindemiatrix, call=9, rc=5,
status=complete): not installed
p_o2cb:0_monitor_0 (node=Malastare, call=9, rc=5, status=complete):
not installed
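For anyone decoding these return codes: rc=5 in the failed actions is the OCF "not installed" error, and the rc=8 reported by the DRBD probes in the crm_mon output means "running as master". The helper here is only a sketch, assuming the standard OCF resource agent exit codes apply; ocf_rc_name is a hypothetical name and not part of Pacemaker.

```shell
#!/bin/sh
# Map an OCF resource agent exit code to its symbolic name.
# ocf_rc_name is a hypothetical helper, not shipped with Pacemaker.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "unknown" ;;
    esac
}
```

Calling `ocf_rc_name 5` prints the symbolic name for the code shown in the failed p_o2cb probes.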
>
>> I also reconfigured DRBD resources using notify=true in each DRBD
>> master, then I reconfigured OCFS2 resources using these crm commands
>>
>> primitive p_controld ocf:pacemaker:controld
>> primitive p_o2cb ocf:ocfs2:o2cb
> interesting ... should be ocf:pacemaker:o2cb
Indeed, that is an error in the guide; I had already noticed it and
corrected it to ocf:pacemaker:o2cb.
>
>> group g_ocfs2mgmt p_controld p_o2cb
>> clone cl_ocfs2mgmt g_ocfs2mgmt meta interleave=true
>>
> looks ok for testing o2cb, controld .. you will need colocation and
> order constraints later when starting the filesystem
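For reference, the colocation and order constraints mentioned above might look roughly like the following crm fragment. This is only a sketch: p_fs_www, cl_fs_www, c_fs_www, o_fs_www, the device path and the mount point are hypothetical and not taken from this cluster; only ms_drbd_www and cl_ocfs2mgmt come from the configuration discussed in this thread.

```
# Hypothetical OCFS2 filesystem resource on top of the "www" DRBD device;
# device path and mount point are assumptions.
primitive p_fs_www ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/www" directory="/srv/www" fstype="ocfs2"
clone cl_fs_www p_fs_www \
    meta interleave="true"
# Keep the filesystem where DLM/o2cb run and where DRBD is Master.
colocation c_fs_www inf: cl_fs_www cl_ocfs2mgmt ms_drbd_www:Master
# Promote DRBD first, then start DLM/o2cb, then mount the filesystem.
order o_fs_www inf: ms_drbd_www:promote cl_ocfs2mgmt:start cl_fs_www:start
```

The intent is that the filesystem is only mounted on a node where DRBD is already primary and the lock manager is already running.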
>
>> root@Malastare:/home/david# crm configure show
>> node Malastare
>> node Vindemiatrix
>> primitive p_controld ocf:pacemaker:controld
>> primitive p_drbd_backupvi ocf:linbit:drbd \
>> params drbd_resource="backupvi"
>> primitive p_drbd_pgsql ocf:linbit:drbd \
>> params drbd_resource="postgresql"
>> primitive p_drbd_svn ocf:linbit:drbd \
>> params drbd_resource="svn"
>> primitive p_drbd_www ocf:linbit:drbd \
>> params drbd_resource="www"
>> primitive p_o2cb ocf:pacemaker:o2cb
>> primitive soapi-fencing-malastare stonith:external/ovh \
>> params reversedns="ns208812.ovh.net"
>> primitive soapi-fencing-vindemiatrix stonith:external/ovh \
>> params reversedns="ns235795.ovh.net"
>> group g_ocfs2mgmt p_controld p_o2cb
>> ms ms_drbd_backupvi p_drbd_backupvi \
>> meta master-max="2" clone-max="2" notify="true"
>> ms ms_drbd_pgsql p_drbd_pgsql \
>> meta master-max="2" clone-max="2" notify="true"
>> ms ms_drbd_svn p_drbd_svn \
>> meta master-max="2" clone-max="2" notify="true"
>> ms ms_drbd_www p_drbd_www \
>> meta master-max="2" clone-max="2" notify="true"
>> clone cl_ocfs2mgmt g_ocfs2mgmt \
>> meta interleave="true"
>> location stonith-malastare soapi-fencing-malastare -inf: Malastare
>> location stonith-vindemiatrix soapi-fencing-vindemiatrix -inf: Vindemiatrix
>> property $id="cib-bootstrap-options" \
>> dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>> cluster-infrastructure="openais" \
>> expected-quorum-votes="2"
>>
>> Unfortunately, the problem is still there :
>>
>> root@Malastare:/home/david# crm_mon --one-shot -VroA
>> ============
>> Last updated: Fri Jun 22 10:54:31 2012
>> Last change: Fri Jun 22 10:54:27 2012 via crm_shadow on Malastare
>> Stack: openais
>> Current DC: Malastare - partition with quorum
>> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>> 2 Nodes configured, 2 expected votes
>> 14 Resources configured.
>> ============
>>
>> Online: [ Malastare Vindemiatrix ]
>>
>> Full list of resources:
>>
>> soapi-fencing-malastare (stonith:external/ovh): Started Vindemiatrix
>> soapi-fencing-vindemiatrix (stonith:external/ovh): Started Malastare
>> Master/Slave Set: ms_drbd_pgsql [p_drbd_pgsql]
>> Masters: [ Malastare Vindemiatrix ]
>> Master/Slave Set: ms_drbd_svn [p_drbd_svn]
>> Masters: [ Malastare Vindemiatrix ]
>> Master/Slave Set: ms_drbd_www [p_drbd_www]
>> Masters: [ Malastare Vindemiatrix ]
>> Master/Slave Set: ms_drbd_backupvi [p_drbd_backupvi]
>> Masters: [ Malastare Vindemiatrix ]
>> Clone Set: cl_ocfs2mgmt [g_ocfs2mgmt]
>> Stopped: [ g_ocfs2mgmt:0 g_ocfs2mgmt:1 ]
>>
>> Node Attributes:
>> * Node Malastare:
>> + master-p_drbd_backupvi:0 : 10000
>> + master-p_drbd_pgsql:0 : 10000
>> + master-p_drbd_svn:0 : 10000
>> + master-p_drbd_www:0 : 10000
>> * Node Vindemiatrix:
>> + master-p_drbd_backupvi:1 : 10000
>> + master-p_drbd_pgsql:1 : 10000
>> + master-p_drbd_svn:1 : 10000
>> + master-p_drbd_www:1 : 10000
>>
>> Operations:
>> * Node Vindemiatrix:
>> soapi-fencing-malastare: migration-threshold=1000000
>> + (4) start: rc=0 (ok)
>> p_drbd_pgsql:1: migration-threshold=1000000
>> + (5) probe: rc=8 (master)
>> p_drbd_svn:1: migration-threshold=1000000
>> + (6) probe: rc=8 (master)
>> p_drbd_www:1: migration-threshold=1000000
>> + (7) probe: rc=8 (master)
>> p_drbd_backupvi:1: migration-threshold=1000000
>> + (8) probe: rc=8 (master)
>> p_o2cb:1: migration-threshold=1000000
>> + (10) probe: rc=5 (not installed)
>> * Node Malastare:
>> soapi-fencing-vindemiatrix: migration-threshold=1000000
>> + (4) start: rc=0 (ok)
>> p_drbd_pgsql:0: migration-threshold=1000000
>> + (5) probe: rc=8 (master)
>> p_drbd_svn:0: migration-threshold=1000000
>> + (6) probe: rc=8 (master)
>> p_drbd_www:0: migration-threshold=1000000
>> + (7) probe: rc=8 (master)
>> p_drbd_backupvi:0: migration-threshold=1000000
>> + (8) probe: rc=8 (master)
>> p_o2cb:0: migration-threshold=1000000
>> + (10) probe: rc=5 (not installed)
>>
>> Failed actions:
>> p_o2cb:1_monitor_0 (node=Vindemiatrix, call=10, rc=5,
>> status=complete): not installed
>> p_o2cb:0_monitor_0 (node=Malastare, call=10, rc=5, status=complete):
>> not installed
>>
>> Nevertheless, I noticed a strange error message in the Corosync/Pacemaker logs:
>> Jun 22 10:54:25 Vindemiatrix lrmd: [24580]: info: RA output:
>> (p_controld:1:probe:stderr) dlm_controld.pcmk: no process found
> this looks like the initial probing, so no running controld is
> expected
>
>> This message was immediately followed by "Wrong stack" errors, and
> check the content of /sys/fs/ocfs2/loaded_cluster_plugins ... and if
> you have that file and it contains the value "user" this is a good
> sign you have started ocfs2/o2cb via init ;-)
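A quick way to inspect this is sketched here, assuming sysfs is mounted at /sys and that the kernel exposes a cluster_stack file next to loaded_cluster_plugins; ocfs2_stack_report is a hypothetical helper name. As far as I understand (this is an assumption, not confirmed in this thread), the Pacemaker o2cb agent expects the stack to be "pcmk", so a value of "o2cb" would match the "Wrong stack o2cb" error.

```shell
#!/bin/sh
# Report which OCFS2 cluster stack and plugins the kernel currently exposes.
# $1: sysfs mount point (assumed to be /sys when omitted).
# ocfs2_stack_report is a hypothetical helper name.
ocfs2_stack_report() {
    ocfs2_dir="${1:-/sys}/fs/ocfs2"
    if [ ! -d "$ocfs2_dir" ]; then
        echo "ocfs2 not loaded"
        return 0
    fi
    stack=$(cat "$ocfs2_dir/cluster_stack" 2>/dev/null)
    # Unquoted inner substitution joins the one-per-line plugin names
    # into a single space-separated string.
    plugins=$(echo $(cat "$ocfs2_dir/loaded_cluster_plugins" 2>/dev/null))
    echo "cluster_stack=${stack:-none} loaded_cluster_plugins=${plugins:-none}"
}
```

Running `ocfs2_stack_report` with no argument reads the live values under /sys; passing another directory is only useful for testing.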
Indeed, this file exists and contains "o2cb", but I stopped both ocfs2
and o2cb three times, before and after the reboot, and, as you can see here:
root@Malastare:/etc/rc2.d# ls
K02drbd S01fancontrol S01sudo S03bind9 S03hddtemp
S03irqbalance S03lwresd S03smartmontools S03sysstat S04corosync
S04openhpid S05rc.local S05stop-bootlogd
README S01rsyslog S03atd S03bootlogs S03iptables S03logd
S03mdadm S03ssh S03vpn S04cron S04rsync
S05rmnologin
... there are no remnants of either OCFS2 or o2cb in the boot init scripts;
I also grepped these scripts case-insensitively to check whether any of
them calls OCFS2 or o2cb, and none of them does.
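The check described above can be sketched as follows, assuming GNU grep; find_ocfs2_refs is a hypothetical helper name, and /etc/rc2.d is used as the default only because it is the directory listed in this message. Note the use of -R rather than -r, since the entries in /etc/rcX.d are normally symlinks, which GNU grep does not follow with -r.

```shell
#!/bin/sh
# List files under a directory that mention ocfs2 or o2cb, ignoring case.
# $1: directory to scan (defaults to /etc/rc2.d, as in this message).
# find_ocfs2_refs is a hypothetical helper name.
find_ocfs2_refs() {
    # -R follows symlinks, -l prints only file names, -i ignores case.
    grep -Rli -e 'ocfs2' -e 'o2cb' "${1:-/etc/rc2.d}" 2>/dev/null
    # Finding nothing is a normal outcome here, not an error.
    return 0
}
```

An empty output means no script in the scanned directory references either name.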
Nevertheless, I always get OK when I try to stop OCFS2 manually, and an
error message when I try to stop o2cb manually:
root@Malastare:/etc/rc2.d# /etc/init.d/ocfs2 stop
Stopping Oracle Cluster File System (OCFS2) OK
root@Malastare:/etc/rc2.d# /etc/init.d/o2cb stop
/etc/init.d/o2cb: line 494: /proc/modules: No such file or directory
/etc/init.d/o2cb: line 494: /proc/modules: No such file or directory
/etc/init.d/o2cb: line 1108: /proc/modules: No such file or directory
/etc/init.d/o2cb: line 1108: /proc/modules: No such file or directory
Indeed, I have no entry named modules in /proc, but my system does not
seem to care about it, so could this be a bug that makes o2cb look for a
procfs entry that is no longer used? If not, which package did I miss?
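To narrow this down, one could distinguish between procfs not being mounted and the kernel simply not exposing a module list. This is a sketch only; proc_modules_state is a hypothetical helper name, and the interpretation is an assumption on my part: a missing /proc/modules on a mounted procfs usually suggests a kernel built without loadable module support rather than a missing package.

```shell
#!/bin/sh
# Classify the state of the kernel module list under a procfs mount point.
# $1: procfs mount point (assumed to be /proc when omitted).
# proc_modules_state is a hypothetical helper name.
proc_modules_state() {
    proc_dir="${1:-/proc}"
    if [ ! -e "$proc_dir/version" ]; then
        # /proc/version is present whenever procfs is mounted.
        echo "procfs not mounted"
    elif [ -r "$proc_dir/modules" ]; then
        echo "modules supported"
    else
        echo "no module support"
    fi
}
```

If this reports "no module support", the o2cb init script cannot load or unload the OCFS2 modules, and its errors about /proc/modules would be expected.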
Thank you in advance.
Kind regards.
> Regards,
> Andreas
>
>> because dlm_controld.pcmk seems to be the Pacemaker DLM dæmon, I strongly
>> think these messages are related. Strangely, even though I have this
>> dæmon's executable in /usr/sbin, it's not started by Pacemaker:
>> root@Vindemiatrix:/home/david# ls /usr/sbin/dlm_controld.pcmk
>> /usr/sbin/dlm_controld.pcmk
>> root@Vindemiatrix:/home/david# ps fax | grep pcmk
>> 26360 pts/1 S+ 0:00 \_ grep pcmk
>>
>> But, if I understood correctly, this process should be launched by the
>> DLM resource, and since I get no error message about launching it even
>> though its executable is present, do you know where this problem could
>> come from?
>>
>> Thank you in advance.
>>
>> Kind regards.
>>
>> PS: I'll have the next week off, so I won't be able to answer you
>> between this evening and the 2nd of July.