[Pacemaker] [Openais] Linux HA on debian sparc
Steven Dake
sdake at redhat.com
Fri Jun 3 16:37:10 UTC 2011
On 06/02/2011 08:16 PM, william felipe_welter wrote:
> Well,
>
> With this patch, the pacemakerd process now starts and brings up its
> child processes (crmd, lrmd, pengine, ...), but after pacemakerd forks,
> the forked pacemakerd process dies with "signal 10, Bus error". In the
> log, the Pacemaker processes (crmd, lrmd, pengine, ...) can't connect to
> the OpenAIS plugin (possibly because of the "death" of the pacemakerd
> process).
> This time, when the forked pacemakerd dies, it generates a core dump.
>
> gdb -c "/usr/var/lib/heartbeat/cores/root/ pacemakerd 7986" -se
> /usr/sbin/pacemakerd :
> GNU gdb (GDB) 7.0.1-debian
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "sparc-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/sbin/pacemakerd...done.
> Reading symbols from /usr/lib64/libuuid.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/libuuid.so.1
> Reading symbols from /usr/lib/libcoroipcc.so.4...done.
> Loaded symbols for /usr/lib/libcoroipcc.so.4
> Reading symbols from /usr/lib/libcpg.so.4...done.
> Loaded symbols for /usr/lib/libcpg.so.4
> Reading symbols from /usr/lib/libquorum.so.4...done.
> Loaded symbols for /usr/lib/libquorum.so.4
> Reading symbols from /usr/lib64/libcrmcommon.so.2...done.
> Loaded symbols for /usr/lib64/libcrmcommon.so.2
> Reading symbols from /usr/lib/libcfg.so.4...done.
> Loaded symbols for /usr/lib/libcfg.so.4
> Reading symbols from /usr/lib/libconfdb.so.4...done.
> Loaded symbols for /usr/lib/libconfdb.so.4
> Reading symbols from /usr/lib64/libplumb.so.2...done.
> Loaded symbols for /usr/lib64/libplumb.so.2
> Reading symbols from /usr/lib64/libpils.so.2...done.
> Loaded symbols for /usr/lib64/libpils.so.2
> Reading symbols from /lib/libbz2.so.1.0...(no debugging symbols found)...done.
> Loaded symbols for /lib/libbz2.so.1.0
> Reading symbols from /usr/lib/libxslt.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libxslt.so.1
> Reading symbols from /usr/lib/libxml2.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libxml2.so.2
> Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/librt.so.1
> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/libdl.so.2
> Reading symbols from /lib/libglib-2.0.so.0...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libglib-2.0.so.0
> Reading symbols from /usr/lib/libltdl.so.7...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libltdl.so.7
> Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/ld-linux.so.2
> Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libz.so.1
> Reading symbols from /lib/libpcre.so.3...(no debugging symbols found)...done.
> Loaded symbols for /lib/libpcre.so.3
> Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libnss_compat.so.2
> Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libnsl.so.1
> Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/libnss_nis.so.2
> Reading symbols from /lib/libnss_files.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libnss_files.so.2
> Core was generated by `pacemakerd'.
> Program terminated with signal 10, Bus error.
> #0 cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
> 339 switch (dispatch_data->id) {
> (gdb) bt
> #0 cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
> #1 0xf6f100f0 in ?? ()
> #2 0xf6f100f4 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
>
>
> I took a look at cpg.c and saw that dispatch_data is acquired by the
> coroipcc_dispatch_get function (defined in lib/coroipcc.c):
>
> do {
> error = coroipcc_dispatch_get (
> cpg_inst->handle,
> (void **)&dispatch_data,
> timeout);
>
>
>
Try the recent patch sent to fix alignment.
Regards
-steve
>
> Summarized log:
> ...
> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering f to 10
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 10
> to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including f
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 10
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
> Forked child 7991 for process lrmd
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
> update_node_processes: Node xxxxxxxxxx now has process list:
> 00000000000000000000000000100112 (was
> 00000000000000000000000000100102)
> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 10 to 11
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 11
> to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 11
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
> Forked child 7992 for process attrd
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
> update_node_processes: Node xxxxxxxxxx now has process list:
> 00000000000000000000000000101112 (was
> 00000000000000000000000000100112)
> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 11 to 12
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 12
> to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 12
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
> Forked child 7993 for process pengine
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
> update_node_processes: Node xxxxxxxxxx now has process list:
> 00000000000000000000000000111112 (was
> 00000000000000000000000000101112)
> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 12 to 13
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 13
> to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 13
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
> Forked child 7994 for process crmd
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
> update_node_processes: Node xxxxxxxxxx now has process list:
> 00000000000000000000000000111312 (was
> 00000000000000000000000000111112)
> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: main: Starting mainloop
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 13 to 14
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 14
> to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 14
> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 14 to 15
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 15
> to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 15
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: Invoked:
> /usr/lib64/heartbeat/stonithd
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
> crm_log_init_worker: Changed active directory to
> /usr/var/lib/heartbeat/cores/root
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: get_cluster_type:
> Cluster type is: 'openais'.
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
> crm_cluster_connect: Connecting to cluster infrastructure: classic
> openais (with plugin)
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
> init_ais_connection_classic: Creating connection to our Corosync
> plugin
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_log_init_worker:
> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: retrieveCib: Reading
> cluster configuration from: /usr/var/lib/heartbeat/crm/cib.xml
> (digest: /usr/var/lib/heartbeat/crm/cib.xml.sig)
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: retrieveCib: Cluster
> configuration not found: /usr/var/lib/heartbeat/crm/cib.xml
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile: Primary
> configuration corrupt or unusable, trying backup...
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: get_last_sequence:
> Series file /usr/var/lib/heartbeat/crm/cib.last does not exist
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile: Backup
> file /usr/var/lib/heartbeat/crm/cib-99.raw not found
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile:
> Continuing with an empty configuration.
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
> <cib epoch="0" num_updates="0" admin_epoch="0"
> validate-with="pacemaker-1.2" >
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
> <configuration >
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
> <crm_config />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
> <nodes />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
> <resources />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
> <constraints />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
> </configuration>
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
> <status />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] </cib>
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: validate_with_relaxng:
> Creating RNG parser context
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
> Doesn't exist (12)
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: CRIT: main: Cannot sign
> in to the cluster... terminating
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: Invoked:
> /usr/lib64/heartbeat/crmd
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: Invoked:
> /usr/lib64/heartbeat/pengine
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crm_log_init_worker:
> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: crm_log_init_worker:
> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: main: CRM Hg Version:
> e872eeb39a5f6e1fdb57c3108551a5353648c4f4
>
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Checking for
> old instances of pengine
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug:
> init_client_ipc_comms_nodispatch: Attempting to talk on:
> /usr/var/run/crm/pengine
> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: enabling coredumps
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crmd_init: Starting crmd
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug:
> init_client_ipc_comms_nodispatch: Could not init comms on:
> /usr/var/run/crm/pengine
> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: debug: main: run the loop...
> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: Started.
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Init server comms
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: s_crmd_fsa: Processing
> I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
> actions:trace: // A_LOG
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
> actions:trace: // A_STARTUP
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: main: Starting pengine
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup:
> Registering Signal Handlers
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup: Creating
> CIB and LRM objects
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
> actions:trace: // A_CIB_START
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Attempting to talk on:
> /usr/var/run/crm/cib_rw
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Could not init comms on:
> /usr/var/run/crm/cib_rw
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
> Connection to command channel failed
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Attempting to talk on:
> /usr/var/run/crm/cib_callback
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Could not init comms on:
> /usr/var/run/crm/cib_callback
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
> Connection to callback channel failed
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
> Connection to CIB failed: connection failed
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signoff:
> Signing out of the CIB Service
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: activateCibXml:
> Triggering CIB write for start op
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: startCib: CIB
> Initialization completed successfully
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: get_cluster_type:
> Cluster type is: 'openais'.
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_cluster_connect:
> Connecting to cluster infrastructure: classic openais (with plugin)
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info:
> init_ais_connection_classic: Creating connection to our Corosync
> plugin
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info:
> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
> Doesn't exist (12)
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: CRIT: cib_init: Cannot sign in
> to the cluster... terminating
> Jun 02 23:12:21 corosync [CPG ] exit_fn for conn=0x62500
> Jun 02 23:12:21 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:21 corosync [TOTEM ] Delivering 15 to 16
> Jun 02 23:12:21 corosync [TOTEM ] Delivering MCAST message with seq 16
> to pending delivery queue
> Jun 02 23:12:21 corosync [CPG ] got procleave message from cluster
> node 1377289226
> Jun 02 23:12:21 corosync [TOTEM ] releasing messages up to and including 16
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: Invoked:
> /usr/lib64/heartbeat/attrd
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_log_init_worker:
> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Starting up
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: get_cluster_type:
> Cluster type is: 'openais'.
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_cluster_connect:
> Connecting to cluster infrastructure: classic openais (with plugin)
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info:
> init_ais_connection_classic: Creating connection to our Corosync
> plugin
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info:
> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
> Doesn't exist (12)
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: HA Signon failed
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Cluster connection active
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Accepting
> attribute updates
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: Aborting startup
> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Attempting to talk on:
> /usr/var/run/crm/cib_rw
> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Could not init comms on:
> /usr/var/run/crm/cib_rw
> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
> Connection to command channel failed
> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Attempting to talk on:
> /usr/var/run/crm/cib_callback
> ...
>
>
> 2011/6/2 Steven Dake <sdake at redhat.com>:
>> On 06/01/2011 11:05 PM, william felipe_welter wrote:
>>> I recompiled my kernel without hugetlb, and the result is the same.
>>>
>>> My test program still gives:
>>> PATH=/dev/shm/teste123XXXXXX
>>> page size=20000
>>> fd=3
>>> ADDR_ORIG:0xe000a000 ADDR:0xffffffff
>>> Erro
>>>
>>> And Pacemaker still fails because of the mmap error:
>>> Could not initialize Cluster Configuration Database API instance error 2
>>>
>>
>> Give the patch I posted recently a spin; corosync works for me with this
>> patch on sparc64 with hugetlb set. Please report back results.
>>
>> Regards
>> -steve
>>
>>> To confirm that I have disabled hugetlb, here is my /proc/meminfo:
>>> MemTotal: 33093488 kB
>>> MemFree: 32855616 kB
>>> Buffers: 5600 kB
>>> Cached: 53480 kB
>>> SwapCached: 0 kB
>>> Active: 45768 kB
>>> Inactive: 28104 kB
>>> Active(anon): 18024 kB
>>> Inactive(anon): 1560 kB
>>> Active(file): 27744 kB
>>> Inactive(file): 26544 kB
>>> Unevictable: 0 kB
>>> Mlocked: 0 kB
>>> SwapTotal: 6104680 kB
>>> SwapFree: 6104680 kB
>>> Dirty: 0 kB
>>> Writeback: 0 kB
>>> AnonPages: 14936 kB
>>> Mapped: 7736 kB
>>> Shmem: 4624 kB
>>> Slab: 39184 kB
>>> SReclaimable: 10088 kB
>>> SUnreclaim: 29096 kB
>>> KernelStack: 7088 kB
>>> PageTables: 1160 kB
>>> Quicklists: 17664 kB
>>> NFS_Unstable: 0 kB
>>> Bounce: 0 kB
>>> WritebackTmp: 0 kB
>>> CommitLimit: 22651424 kB
>>> Committed_AS: 519368 kB
>>> VmallocTotal: 1069547520 kB
>>> VmallocUsed: 11064 kB
>>> VmallocChunk: 1069529616 kB
>>>
>>>
>>> 2011/6/1 Steven Dake <sdake at redhat.com>:
>>>> On 06/01/2011 07:42 AM, william felipe_welter wrote:
>>>>> Steven,
>>>>>
>>>>> cat /proc/meminfo
>>>>> ...
>>>>> HugePages_Total: 0
>>>>> HugePages_Free: 0
>>>>> HugePages_Rsvd: 0
>>>>> HugePages_Surp: 0
>>>>> Hugepagesize: 4096 kB
>>>>> ...
>>>>>
>>>>
>>>> It definitely requires recompiling the kernel with the config option
>>>> turned off. I don't know the Debian way of doing this.
>>>>
>>>> You would only need this option with very large memory sizes, such as
>>>> 48GB or more.
>>>>
>>>> Regards
>>>> -steve
>>>>
>>>>> It's 4MB.
>>>>>
>>>>> How can I disable hugetlb? (By passing CONFIG_HUGETLBFS=n to the kernel
>>>>> at boot?)
>>>>>
>>>>> 2011/6/1 Steven Dake <sdake at redhat.com>
>>>>>
>>>>> On 06/01/2011 01:05 AM, Steven Dake wrote:
>>>>> > On 05/31/2011 09:44 PM, Angus Salkeld wrote:
>>>>> >> On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter wrote:
>>>>> >>> Angus,
>>>>> >>>
>>>>> >>> I made a test program (based on the code in coroipcc.c), and now I'm
>>>>> >>> sure that there are problems with the mmap system call on sparc.
>>>>> >>>
>>>>> >>> Source code of my test program:
>>>>> >>>
>>>>> >>> #include <stdint.h>
>>>>> >>> #include <stdlib.h>
>>>>> >>> #include <sys/mman.h>
>>>>> >>> #include <stdio.h>
>>>>> >>>
>>>>> >>> #define PATH_MAX 36
>>>>> >>>
>>>>> >>> int main()
>>>>> >>> {
>>>>> >>>
>>>>> >>> int32_t fd;
>>>>> >>> void *addr_orig;
>>>>> >>> void *addr;
>>>>> >>> char path[PATH_MAX];
>>>>> >>> const char *file = "teste123XXXXXX";
>>>>> >>> size_t bytes=10024;
>>>>> >>>
>>>>> >>> snprintf (path, PATH_MAX, "/dev/shm/%s", file);
>>>>> >>> printf("PATH=%s\n",path);
>>>>> >>>
>>>>> >>> fd = mkstemp (path);
>>>>> >>> printf("fd=%d \n",fd);
>>>>> >>>
>>>>> >>>
>>>>> >>> addr_orig = mmap (NULL, bytes, PROT_NONE,
>>>>> >>> MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>>>> >>>
>>>>> >>>
>>>>> >>> addr = mmap (addr_orig, bytes, PROT_READ | PROT_WRITE,
>>>>> >>> MAP_FIXED | MAP_SHARED, fd, 0);
>>>>> >>>
>>>>> >>> printf("ADDR_ORIG:%p ADDR:%p\n",addr_orig,addr);
>>>>> >>>
>>>>> >>>
>>>>> >>> if (addr != addr_orig) {
>>>>> >>> printf("Erro");
>>>>> >>> }
>>>>> >>> }
>>>>> >>>
>>>>> >>> Results on x86:
>>>>> >>> PATH=/dev/shm/teste123XXXXXX
>>>>> >>> fd=3
>>>>> >>> ADDR_ORIG:0x7f867d8e6000 ADDR:0x7f867d8e6000
>>>>> >>>
>>>>> >>> Results on sparc:
>>>>> >>> PATH=/dev/shm/teste123XXXXXX
>>>>> >>> fd=3
>>>>> >>> ADDR_ORIG:0xf7f72000 ADDR:0xffffffff
>>>>> >>
>>>>> >> Note: 0xffffffff == MAP_FAILED
>>>>> >>
>>>>> >> (from man mmap)
>>>>> >> RETURN VALUE
>>>>> >> On success, mmap() returns a pointer to the mapped area. On
>>>>> >> error, the value MAP_FAILED (that is, (void *) -1) is returned,
>>>>> >> and errno is set appropriately.
>>>>> >>
>>>>> >>>
>>>>> >>>
>>>>> >>> But I'm wondering whether it's really necessary to call mmap twice.
>>>>> >>> What is the reason for calling mmap a second time, using the address
>>>>> >>> returned by the first call?
>>>>> >>>
>>>>> >>>
>>>>> >> Well, there are 3 calls to mmap():
>>>>> >> 1) one to allocate 2 * what you need (in pages)
>>>>> >> 2) one to map the first half of the memory to a real file
>>>>> >> 3) one to map the second half of the memory to the same file
>>>>> >>
>>>>> >> The point is that when you write to an address past the end of the
>>>>> >> first half of memory, it is taken care of by the third mmap, which
>>>>> >> maps the address back to the start of the file for you. This means
>>>>> >> you don't have to worry about ring buffer wrap-around, which can be
>>>>> >> a headache.
>>>>> >>
>>>>> >> -Angus
>>>>> >>
>>>>> >
>>>>> > Interesting: this mmap operation doesn't work on sparc Linux.
>>>>> >
>>>>> > Not sure how I can help here. The next step would be a follow-up on the
>>>>> > sparc Linux mailing list. I'll do that and cc you on the message; let's
>>>>> > see if we get any response.
>>>>> >
>>>>> > http://vger.kernel.org/vger-lists.html
>>>>> >
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> 2011/5/31 Angus Salkeld <asalkeld at redhat.com>
>>>>> >>>
>>>>> >>>> On Tue, May 31, 2011 at 06:25:56PM -0300, william felipe_welter wrote:
>>>>> >>>>> Thanks Steven,
>>>>> >>>>>
>>>>> >>>>> Now I'm trying to run with the MCP:
>>>>> >>>>> - Uninstalled Pacemaker 1.0
>>>>> >>>>> - Compiled and installed 1.1
>>>>> >>>>>
>>>>> >>>>> But now I have problems initializing pacemakerd: "Could not
>>>>> >>>>> initialize Cluster Configuration Database API instance error 2".
>>>>> >>>>> Debugging with gdb, I see that the error is in confdb; more
>>>>> >>>>> specifically, the errors start in coroipcc.c at line:
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> 448 if (addr != addr_orig) {
>>>>> >>>>> 449 goto error_close_unlink; <- enter here
>>>>> >>>>> 450 }
>>>>> >>>>>
>>>>> >>>>> Any idea what could cause this?
>>>>> >>>>>
>>>>> >>>>
>>>>> >>>> I tried porting a ring buffer (www.libqb.org) to sparc and had the
>>>>> >>>> same failure.
>>>>> >>>> There are 3 mmap() calls and on sparc the third one keeps failing.
>>>>> >>>>
>>>>> >>>> This is a common way of creating a ring buffer, see:
>>>>> >>>>
>>>>> >>>> http://en.wikipedia.org/wiki/Circular_buffer#Exemplary_POSIX_Implementation
>>>>> >>>>
>>>>> >>>> I couldn't get it working in the short time I tried. It's probably
>>>>> >>>> worth looking at the clib implementation to see why it's failing
>>>>> >>>> (I didn't get to that).
>>>>> >>>>
>>>>> >>>> -Angus
>>>>> >>>>
>>>>>
>>>>> Note: we believe we have sorted this out. Your kernel has hugetlb
>>>>> enabled, probably with 4MB pages, which requires corosync to allocate
>>>>> 4MB pages.
>>>>>
>>>>> Can you verify your hugetlb settings?
>>>>>
>>>>> If you can turn this option off, you should at least have a working
>>>>> corosync.
>>>>>
>>>>> Regards
>>>>> -steve
>>>>> >>>>
>>>>> >>>> _______________________________________________
>>>>> >>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> <mailto:Pacemaker at oss.clusterlabs.org>
>>>>> >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>> >>>>
>>>>> >>>> Project Home: http://www.clusterlabs.org
>>>>> >>>> Getting started:
>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> >>>> Bugs:
>>>>> >>>>
>>>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>>> >>>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> --
>>>>> >>> William Felipe Welter
>>>>> >>> ------------------------------
>>>>> >>> Consultant in Free Technologies
>>>>> >>> william.welter at 4linux.com.br <mailto:william.welter at 4linux.com.br>
>>>>> >>> www.4linux.com.br <http://www.4linux.com.br>
>>>>> >>
>>>>> >>> _______________________________________________
>>>>> >>> Openais mailing list
>>>>> >>> Openais at lists.linux-foundation.org
>>>>> <mailto:Openais at lists.linux-foundation.org>
>>>>> >>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>>>> >>
>>>>> >>
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>