[Pacemaker] "ERROR: Wrong stack o2cb" when trying to start o2cb service in Pacemaker cluster

David Guyot david.guyot at europecamions-interactive.com
Fri Jun 22 08:40:44 EDT 2012


On 22/06/2012 11:58, Andreas Kurz wrote:
> On 06/22/2012 11:14 AM, David Guyot wrote:
>> Hello.
>>
>> Concerning dlm-pcmk, it's not available from backports, so I installed
>> it from stable; only ocfs2-tools-pacemaker are available and installed
>> from it.
> that's ok
>
>> I checked if /etc/init.d/ocfs2 and /etc/init.d/o2cb are removed from
>> /etc/rcX.d/*, and they are, so the system cannot boot them up by itself.
> did you also explicitly stop them (on both nodes), or did you reboot the
> systems anyway?
Yes, I explicitly stopped them on both nodes and, to be sure, restarted
the system and then explicitly stopped them again, but to no effect. I
still get:

Failed actions:
    p_o2cb:1_monitor_0 (node=Vindemiatrix, call=9, rc=5, status=complete): not installed
    p_o2cb:0_monitor_0 (node=Malastare, call=9, rc=5, status=complete): not installed
>
>> I also reconfigured DRBD resources using notify=true in each DRBD
>> master, then I reconfigured OCFS2 resources using these crm commands:
>>
>> primitive p_controld ocf:pacemaker:controld
>> primitive p_o2cb ocf:ocfs2:o2cb
> interesting ... should be ocf:pacemaker:o2cb
Indeed, this is an error in the guide; I had already noticed it and
corrected it to ocf:pacemaker:o2cb.
>
>> group g_ocfs2mgmt p_controld p_o2cb
>> clone cl_ocfs2mgmt g_ocfs2mgmt meta interleave=true
>>
> looks ok for testing o2cb, controld .. you will need colocation and
> order constraints later when starting the filesystem
>
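The colocation and order constraints mentioned above could look roughly
like the following crm shell sketch for one of the DRBD resources. The
filesystem primitive p_fs_www, its clone cl_fs_www, the device path, the
mount point /srv/www and all constraint names are hypothetical and do not
appear anywhere in this thread; only ms_drbd_www and cl_ocfs2mgmt are
taken from the configuration shown here.

```
# Hypothetical OCFS2 filesystem on top of the "www" DRBD resource.
primitive p_fs_www ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/www" directory="/srv/www" fstype="ocfs2"
clone cl_fs_www p_fs_www \
    meta interleave="true"
# Mount only where DRBD is Master and where controld/o2cb are running.
colocation c_fs_www_with_drbd inf: cl_fs_www ms_drbd_www:Master
colocation c_fs_www_with_ocfs2mgmt inf: cl_fs_www cl_ocfs2mgmt
# Promote DRBD and start controld/o2cb before mounting.
order o_drbd_www_before_fs inf: ms_drbd_www:promote cl_fs_www:start
order o_ocfs2mgmt_before_fs inf: cl_ocfs2mgmt cl_fs_www
```

The same pattern would be repeated for each DRBD resource that carries an
OCFS2 filesystem.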
>> root@Malastare:/home/david# crm configure show
>> node Malastare
>> node Vindemiatrix
>> primitive p_controld ocf:pacemaker:controld
>> primitive p_drbd_backupvi ocf:linbit:drbd \
>>     params drbd_resource="backupvi"
>> primitive p_drbd_pgsql ocf:linbit:drbd \
>>     params drbd_resource="postgresql"
>> primitive p_drbd_svn ocf:linbit:drbd \
>>     params drbd_resource="svn"
>> primitive p_drbd_www ocf:linbit:drbd \
>>     params drbd_resource="www"
>> primitive p_o2cb ocf:pacemaker:o2cb
>> primitive soapi-fencing-malastare stonith:external/ovh \
>>     params reversedns="ns208812.ovh.net"
>> primitive soapi-fencing-vindemiatrix stonith:external/ovh \
>>     params reversedns="ns235795.ovh.net"
>> group g_ocfs2mgmt p_controld p_o2cb
>> ms ms_drbd_backupvi p_drbd_backupvi \
>>     meta master-max="2" clone-max="2" notify="true"
>> ms ms_drbd_pgsql p_drbd_pgsql \
>>     meta master-max="2" clone-max="2" notify="true"
>> ms ms_drbd_svn p_drbd_svn \
>>     meta master-max="2" clone-max="2" notify="true"
>> ms ms_drbd_www p_drbd_www \
>>     meta master-max="2" clone-max="2" notify="true"
>> clone cl_ocfs2mgmt g_ocfs2mgmt \
>>     meta interleave="true"
>> location stonith-malastare soapi-fencing-malastare -inf: Malastare
>> location stonith-vindemiatrix soapi-fencing-vindemiatrix -inf: Vindemiatrix
>> property $id="cib-bootstrap-options" \
>>     dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>>     cluster-infrastructure="openais" \
>>     expected-quorum-votes="2"
>>
>> Unfortunately, the problem is still there:
>>
>> root@Malastare:/home/david# crm_mon --one-shot -VroA
>> ============
>> Last updated: Fri Jun 22 10:54:31 2012
>> Last change: Fri Jun 22 10:54:27 2012 via crm_shadow on Malastare
>> Stack: openais
>> Current DC: Malastare - partition with quorum
>> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>> 2 Nodes configured, 2 expected votes
>> 14 Resources configured.
>> ============
>>
>> Online: [ Malastare Vindemiatrix ]
>>
>> Full list of resources:
>>
>>  soapi-fencing-malastare    (stonith:external/ovh):    Started Vindemiatrix
>>  soapi-fencing-vindemiatrix    (stonith:external/ovh):    Started Malastare
>>  Master/Slave Set: ms_drbd_pgsql [p_drbd_pgsql]
>>      Masters: [ Malastare Vindemiatrix ]
>>  Master/Slave Set: ms_drbd_svn [p_drbd_svn]
>>      Masters: [ Malastare Vindemiatrix ]
>>  Master/Slave Set: ms_drbd_www [p_drbd_www]
>>      Masters: [ Malastare Vindemiatrix ]
>>  Master/Slave Set: ms_drbd_backupvi [p_drbd_backupvi]
>>      Masters: [ Malastare Vindemiatrix ]
>>  Clone Set: cl_ocfs2mgmt [g_ocfs2mgmt]
>>      Stopped: [ g_ocfs2mgmt:0 g_ocfs2mgmt:1 ]
>>
>> Node Attributes:
>> * Node Malastare:
>>     + master-p_drbd_backupvi:0            : 10000    
>>     + master-p_drbd_pgsql:0               : 10000    
>>     + master-p_drbd_svn:0                 : 10000    
>>     + master-p_drbd_www:0                 : 10000    
>> * Node Vindemiatrix:
>>     + master-p_drbd_backupvi:1            : 10000    
>>     + master-p_drbd_pgsql:1               : 10000    
>>     + master-p_drbd_svn:1                 : 10000    
>>     + master-p_drbd_www:1                 : 10000    
>>
>> Operations:
>> * Node Vindemiatrix:
>>    soapi-fencing-malastare: migration-threshold=1000000
>>     + (4) start: rc=0 (ok)
>>    p_drbd_pgsql:1: migration-threshold=1000000
>>     + (5) probe: rc=8 (master)
>>    p_drbd_svn:1: migration-threshold=1000000
>>     + (6) probe: rc=8 (master)
>>    p_drbd_www:1: migration-threshold=1000000
>>     + (7) probe: rc=8 (master)
>>    p_drbd_backupvi:1: migration-threshold=1000000
>>     + (8) probe: rc=8 (master)
>>    p_o2cb:1: migration-threshold=1000000
>>     + (10) probe: rc=5 (not installed)
>> * Node Malastare:
>>    soapi-fencing-vindemiatrix: migration-threshold=1000000
>>     + (4) start: rc=0 (ok)
>>    p_drbd_pgsql:0: migration-threshold=1000000
>>     + (5) probe: rc=8 (master)
>>    p_drbd_svn:0: migration-threshold=1000000
>>     + (6) probe: rc=8 (master)
>>    p_drbd_www:0: migration-threshold=1000000
>>     + (7) probe: rc=8 (master)
>>    p_drbd_backupvi:0: migration-threshold=1000000
>>     + (8) probe: rc=8 (master)
>>    p_o2cb:0: migration-threshold=1000000
>>     + (10) probe: rc=5 (not installed)
>>
>> Failed actions:
>>     p_o2cb:1_monitor_0 (node=Vindemiatrix, call=10, rc=5, status=complete): not installed
>>     p_o2cb:0_monitor_0 (node=Malastare, call=10, rc=5, status=complete): not installed
>>
>> Nevertheless, I noticed a strange error message in the Corosync/Pacemaker logs:
>> Jun 22 10:54:25 Vindemiatrix lrmd: [24580]: info: RA output: (p_controld:1:probe:stderr) dlm_controld.pcmk: no process found
> this looks like the initial probing, so it is expected that no controld
> is running
>
>> This message was immediately followed by "Wrong stack" errors, and
> check the content of /sys/fs/ocfs2/loaded_cluster_plugins ... if that
> file exists and contains the value "user", this is a good sign you
> have started ocfs2/o2cb via init ;-)
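The check suggested above can be scripted. This is only a sketch: it
assumes sysfs is mounted at /sys and that the OCFS2 entries live under
/sys/fs/ocfs2. The function name check_ocfs2_stack and its optional
directory argument are made up for illustration. The cluster_stack file
is read as well because, as far as I understand (an assumption, not
something stated in this thread), the ocf:pacemaker:o2cb agent expects
the value "pcmk" there.

```shell
# Print the OCFS2 cluster stack information exposed through sysfs.
# $1 is an optional directory to read from (default: /sys/fs/ocfs2).
check_ocfs2_stack() {
    root="${1:-/sys/fs/ocfs2}"
    for f in loaded_cluster_plugins cluster_stack; do
        if [ -r "$root/$f" ]; then
            # xargs joins a multi-line file into one space-separated line
            printf '%s: %s\n' "$f" "$(xargs < "$root/$f")"
        else
            printf '%s: not present\n' "$f"
        fi
    done
}

check_ocfs2_stack
```

Run it on both nodes and compare the results.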
Indeed, this file exists and contains "o2cb", but I stopped both ocfs2
and o2cb three times, before and after the reboot, and, as you can see here:

root@Malastare:/etc/rc2.d# ls
K02drbd  S01fancontrol  S01sudo  S03bind9  S03hddtemp  S03irqbalance
S03lwresd  S03smartmontools  S03sysstat  S04corosync  S04openhpid
S05rc.local  S05stop-bootlogd
README  S01rsyslog  S03atd  S03bootlogs  S03iptables  S03logd
S03mdadm  S03ssh  S03vpn  S04cron  S04rsync  S05rmnologin

... there are no remnants of OCFS2 or o2cb in the boot-time init scripts.
I also grepped these scripts case-insensitively to check whether any of
them calls OCFS2 or o2cb, but none of them does.

Nevertheless, I always get OK when I try to stop OCFS2 manually, and an
error message when I try to stop o2cb manually:

root@Malastare:/etc/rc2.d# /etc/init.d/ocfs2 stop
Stopping Oracle Cluster File System (OCFS2) OK
root@Malastare:/etc/rc2.d# /etc/init.d/o2cb stop
/etc/init.d/o2cb: line 494: /proc/modules: No such file or directory
/etc/init.d/o2cb: line 494: /proc/modules: No such file or directory
/etc/init.d/o2cb: line 1108: /proc/modules: No such file or directory
/etc/init.d/o2cb: line 1108: /proc/modules: No such file or directory

Indeed, I have no entry named modules in /proc, but my system does not
seem to care about it, so could this be a bug causing o2cb to look for a
procfs entry that is no longer used? If not, which package did I miss?
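For reference, /proc/modules is normally a regular file, not a
directory, and it only exists when the kernel was built with loadable
module support. One possible explanation for the errors above (an
assumption, not confirmed in this thread) is therefore a kernel built
without module support, in which case anything OCFS2 needs has to be
compiled into the kernel. A minimal sketch to check this follows. The
function name has_module_support is made up for illustration, and the
/boot/config-<release> path is the usual Debian location, which may not
exist for a custom kernel.

```shell
# Succeed if the kernel exposes /proc/modules.
# $1 is an optional procfs root (default: /proc), only used for testing.
has_module_support() {
    proc_root="${1:-/proc}"
    [ -f "$proc_root/modules" ]
}

if has_module_support; then
    echo "loadable modules supported; related modules currently loaded:"
    grep -E '^(ocfs2|dlm|configfs)' /proc/modules || echo "(none)"
else
    echo "/proc/modules is missing: kernel probably built without module support"
    # Show the relevant build options if the kernel config file is available.
    grep -E '^CONFIG_(MODULES|OCFS2_FS|DLM|CONFIGFS_FS)[=_]' \
        "/boot/config-$(uname -r)" 2>/dev/null || echo "(no kernel config found)"
fi
```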

Thank you in advance.

Kind regards.
> Regards,
> Andreas
>
>> because dlm_controld.pcmk seems to be the Pacemaker DLM dæmon, I strongly
>> think these messages are related. Strangely, even though I have this dæmon
>> executable in /usr/sbin, it is not started by Pacemaker:
>> root@Vindemiatrix:/home/david# ls /usr/sbin/dlm_controld.pcmk
>> /usr/sbin/dlm_controld.pcmk
>> root@Vindemiatrix:/home/david# ps fax | grep pcmk
>> 26360 pts/1    S+     0:00                          \_ grep pcmk
>>
>> But, if I understood correctly, this process should be launched by the
>> DLM resource. Since its executable is present and I see no error
>> message about launching it, do you know where this problem could come
>> from?
>>
>> Thank you in advance.
>>
>> Kind regards.
>>
>> PS: I'll be off next week, so I won't be able to answer you
>> between this evening and the 2nd of July.
