[Pacemaker] [Openais] Linux HA on debian sparc
Steven Dake
sdake at redhat.com
Tue Jun 7 13:08:43 UTC 2011
On 06/07/2011 04:44 AM, william felipe_welter wrote:
> Two more questions: will the patch for the mmap calls go into the main
> development branch for all archs?
> Any problem if I send these patches to the Debian project?
>
These patches will go into the maintenance branches.
You can send them to whoever you like ;)
Regards
-steve
> 2011/6/3 Steven Dake <sdake at redhat.com>:
>> On 06/02/2011 08:16 PM, william felipe_welter wrote:
>>> Well,
>>>
>>> Now, with this patch, the pacemakerd process starts and brings up its
>>> child processes (crmd, lrmd, pengine, ...), but after pacemakerd
>>> forks, the forked pacemakerd process dies due to "signal 10, Bus
>>> error". In the log, the Pacemaker child processes (crmd, lrmd,
>>> pengine, ...) cannot connect to the openais plugin (possibly because
>>> of the death of the pacemakerd process).
>>> But this time, when the forked pacemakerd dies, it generates a core dump.
>>>
>>> gdb -c "/usr/var/lib/heartbeat/cores/root/ pacemakerd 7986" -se
>>> /usr/sbin/pacemakerd :
>>> GNU gdb (GDB) 7.0.1-debian
>>> Copyright (C) 2009 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "sparc-linux-gnu".
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>...
>>> Reading symbols from /usr/sbin/pacemakerd...done.
>>> Reading symbols from /usr/lib64/libuuid.so.1...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /usr/lib64/libuuid.so.1
>>> Reading symbols from /usr/lib/libcoroipcc.so.4...done.
>>> Loaded symbols for /usr/lib/libcoroipcc.so.4
>>> Reading symbols from /usr/lib/libcpg.so.4...done.
>>> Loaded symbols for /usr/lib/libcpg.so.4
>>> Reading symbols from /usr/lib/libquorum.so.4...done.
>>> Loaded symbols for /usr/lib/libquorum.so.4
>>> Reading symbols from /usr/lib64/libcrmcommon.so.2...done.
>>> Loaded symbols for /usr/lib64/libcrmcommon.so.2
>>> Reading symbols from /usr/lib/libcfg.so.4...done.
>>> Loaded symbols for /usr/lib/libcfg.so.4
>>> Reading symbols from /usr/lib/libconfdb.so.4...done.
>>> Loaded symbols for /usr/lib/libconfdb.so.4
>>> Reading symbols from /usr/lib64/libplumb.so.2...done.
>>> Loaded symbols for /usr/lib64/libplumb.so.2
>>> Reading symbols from /usr/lib64/libpils.so.2...done.
>>> Loaded symbols for /usr/lib64/libpils.so.2
>>> Reading symbols from /lib/libbz2.so.1.0...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libbz2.so.1.0
>>> Reading symbols from /usr/lib/libxslt.so.1...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /usr/lib/libxslt.so.1
>>> Reading symbols from /usr/lib/libxml2.so.2...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /usr/lib/libxml2.so.2
>>> Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libc.so.6
>>> Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/librt.so.1
>>> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libdl.so.2
>>> Reading symbols from /lib/libglib-2.0.so.0...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /lib/libglib-2.0.so.0
>>> Reading symbols from /usr/lib/libltdl.so.7...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /usr/lib/libltdl.so.7
>>> Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/ld-linux.so.2
>>> Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libpthread.so.0
>>> Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libm.so.6
>>> Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
>>> Loaded symbols for /usr/lib/libz.so.1
>>> Reading symbols from /lib/libpcre.so.3...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libpcre.so.3
>>> Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /lib/libnss_compat.so.2
>>> Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libnsl.so.1
>>> Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libnss_nis.so.2
>>> Reading symbols from /lib/libnss_files.so.2...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /lib/libnss_files.so.2
>>> Core was generated by `pacemakerd'.
>>> Program terminated with signal 10, Bus error.
>>> #0 cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
>>> 339 switch (dispatch_data->id) {
>>> (gdb) bt
>>> #0 cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
>>> #1 0xf6f100f0 in ?? ()
>>> #2 0xf6f100f4 in ?? ()
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>>
>>>
>>>
>>> I took a look at cpg.c and saw that dispatch_data is acquired by the
>>> coroipcc_dispatch_get function (defined in lib/coroipcc.c):
>>>
>>> do {
>>> error = coroipcc_dispatch_get (
>>> cpg_inst->handle,
>>> (void **)&dispatch_data,
>>> timeout);
>>>
>>>
>>>
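The "signal 10, Bus error" above is characteristic of SPARC's strict
alignment rules: a load or store through a pointer that is not aligned to
the operand size raises SIGBUS, instead of being handled transparently as
on x86. That is the failure mode the alignment patch mentioned below
addresses. A minimal illustration (hypothetical demo code, not from
corosync):

#include <stdio.h>
#include <stdint.h>

struct msg {
        uint32_t id;    /* needs 4-byte alignment on SPARC */
};

int main(void)
{
        char buf[16];
        /* buf + 1 is almost certainly not 4-byte aligned */
        struct msg *m = (struct msg *)(buf + 1);

        m->id = 42;     /* SIGBUS ("Bus error") on SPARC; fine on x86 */
        printf("%u\n", m->id);
        return 0;
}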
>>
>> Try the recent patch sent to fix alignment.
>>
>> Regards
>> -steve
>>
>>>
>>> Summarized log:
>>> ...
>>> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
>>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering f to 10
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 10
>>> to pending delivery queue
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including f
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 10
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
>>> Forked child 7991 for process lrmd
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
>>> update_node_processes: Node xxxxxxxxxx now has process list:
>>> 00000000000000000000000000100112 (was
>>> 00000000000000000000000000100102)
>>> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
>>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 10 to 11
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 11
>>> to pending delivery queue
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 11
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
>>> Forked child 7992 for process attrd
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
>>> update_node_processes: Node xxxxxxxxxx now has process list:
>>> 00000000000000000000000000101112 (was
>>> 00000000000000000000000000100112)
>>> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
>>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 11 to 12
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 12
>>> to pending delivery queue
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 12
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
>>> Forked child 7993 for process pengine
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
>>> update_node_processes: Node xxxxxxxxxx now has process list:
>>> 00000000000000000000000000111112 (was
>>> 00000000000000000000000000101112)
>>> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
>>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 12 to 13
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 13
>>> to pending delivery queue
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 13
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
>>> Forked child 7994 for process crmd
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
>>> update_node_processes: Node xxxxxxxxxx now has process list:
>>> 00000000000000000000000000111312 (was
>>> 00000000000000000000000000111112)
>>> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: main: Starting mainloop
>>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 13 to 14
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 14
>>> to pending delivery queue
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 14
>>> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
>>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 14 to 15
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 15
>>> to pending delivery queue
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 15
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: Invoked:
>>> /usr/lib64/heartbeat/stonithd
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
>>> crm_log_init_worker: Changed active directory to
>>> /usr/var/lib/heartbeat/cores/root
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: get_cluster_type:
>>> Cluster type is: 'openais'.
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
>>> crm_cluster_connect: Connecting to cluster infrastructure: classic
>>> openais (with plugin)
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
>>> init_ais_connection_classic: Creating connection to our Corosync
>>> plugin
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_log_init_worker:
>>> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: retrieveCib: Reading
>>> cluster configuration from: /usr/var/lib/heartbeat/crm/cib.xml
>>> (digest: /usr/var/lib/heartbeat/crm/cib.xml.sig)
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: retrieveCib: Cluster
>>> configuration not found: /usr/var/lib/heartbeat/crm/cib.xml
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile: Primary
>>> configuration corrupt or unusable, trying backup...
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: get_last_sequence:
>>> Series file /usr/var/lib/heartbeat/crm/cib.last does not exist
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile: Backup
>>> file /usr/var/lib/heartbeat/crm/cib-99.raw not found
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile:
>>> Continuing with an empty configuration.
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <cib epoch="0" num_updates="0" admin_epoch="0"
>>> validate-with="pacemaker-1.2" >
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <configuration >
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <crm_config />
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <nodes />
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <resources />
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <constraints />
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> </configuration>
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <status />
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] </cib>
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: validate_with_relaxng:
>>> Creating RNG parser context
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
>>> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
>>> Doesn't exist (12)
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: CRIT: main: Cannot sign
>>> in to the cluster... terminating
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: Invoked:
>>> /usr/lib64/heartbeat/crmd
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: Invoked:
>>> /usr/lib64/heartbeat/pengine
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crm_log_init_worker:
>>> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: crm_log_init_worker:
>>> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: main: CRM Hg Version:
>>> e872eeb39a5f6e1fdb57c3108551a5353648c4f4
>>>
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Checking for
>>> old instances of pengine
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>>> /usr/var/run/crm/pengine
>>> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: enabling coredumps
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crmd_init: Starting crmd
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug:
>>> init_client_ipc_comms_nodispatch: Could not init comms on:
>>> /usr/var/run/crm/pengine
>>> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: debug: main: run the loop...
>>> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: Started.
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Init server comms
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: s_crmd_fsa: Processing
>>> I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
>>> actions:trace: // A_LOG
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
>>> actions:trace: // A_STARTUP
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: main: Starting pengine
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup:
>>> Registering Signal Handlers
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup: Creating
>>> CIB and LRM objects
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
>>> actions:trace: // A_CIB_START
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>>> /usr/var/run/crm/cib_rw
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Could not init comms on:
>>> /usr/var/run/crm/cib_rw
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
>>> Connection to command channel failed
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>>> /usr/var/run/crm/cib_callback
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Could not init comms on:
>>> /usr/var/run/crm/cib_callback
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
>>> Connection to callback channel failed
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
>>> Connection to CIB failed: connection failed
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signoff:
>>> Signing out of the CIB Service
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: activateCibXml:
>>> Triggering CIB write for start op
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: startCib: CIB
>>> Initialization completed successfully
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: get_cluster_type:
>>> Cluster type is: 'openais'.
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_cluster_connect:
>>> Connecting to cluster infrastructure: classic openais (with plugin)
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info:
>>> init_ais_connection_classic: Creating connection to our Corosync
>>> plugin
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info:
>>> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
>>> Doesn't exist (12)
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: CRIT: cib_init: Cannot sign in
>>> to the cluster... terminating
>>> Jun 02 23:12:21 corosync [CPG ] exit_fn for conn=0x62500
>>> Jun 02 23:12:21 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:21 corosync [TOTEM ] Delivering 15 to 16
>>> Jun 02 23:12:21 corosync [TOTEM ] Delivering MCAST message with seq 16
>>> to pending delivery queue
>>> Jun 02 23:12:21 corosync [CPG ] got procleave message from cluster
>>> node 1377289226
>>> Jun 02 23:12:21 corosync [TOTEM ] releasing messages up to and including 16
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: Invoked:
>>> /usr/lib64/heartbeat/attrd
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_log_init_worker:
>>> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Starting up
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: get_cluster_type:
>>> Cluster type is: 'openais'.
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_cluster_connect:
>>> Connecting to cluster infrastructure: classic openais (with plugin)
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info:
>>> init_ais_connection_classic: Creating connection to our Corosync
>>> plugin
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info:
>>> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
>>> Doesn't exist (12)
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: HA Signon failed
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Cluster connection active
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Accepting
>>> attribute updates
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: Aborting startup
>>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>>> /usr/var/run/crm/cib_rw
>>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Could not init comms on:
>>> /usr/var/run/crm/cib_rw
>>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
>>> Connection to command channel failed
>>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>>> /usr/var/run/crm/cib_callback
>>> ...
>>>
>>>
>>> 2011/6/2 Steven Dake <sdake at redhat.com>:
>>>> On 06/01/2011 11:05 PM, william felipe_welter wrote:
>>>>> I recompiled my kernel without hugetlb, and the result is the same.
>>>>>
>>>>> My test program still outputs:
>>>>> PATH=/dev/shm/teste123XXXXXX
>>>>> page size=20000
>>>>> fd=3
>>>>> ADDR_ORIG:0xe000a000 ADDR:0xffffffff
>>>>> Erro
>>>>>
>>>>> And Pacemaker still fails because of the mmap error:
>>>>> Could not initialize Cluster Configuration Database API instance error 2
>>>>>
>>>>
>>>> Give the patch I posted recently a spin - corosync WFM with this patch
>>>> on sparc64 with hugetlb set. Please report back results.
>>>>
>>>> Regards
>>>> -steve
>>>>
>>>>> To make sure that I have disabled hugetlb, here is my /proc/meminfo:
>>>>> MemTotal: 33093488 kB
>>>>> MemFree: 32855616 kB
>>>>> Buffers: 5600 kB
>>>>> Cached: 53480 kB
>>>>> SwapCached: 0 kB
>>>>> Active: 45768 kB
>>>>> Inactive: 28104 kB
>>>>> Active(anon): 18024 kB
>>>>> Inactive(anon): 1560 kB
>>>>> Active(file): 27744 kB
>>>>> Inactive(file): 26544 kB
>>>>> Unevictable: 0 kB
>>>>> Mlocked: 0 kB
>>>>> SwapTotal: 6104680 kB
>>>>> SwapFree: 6104680 kB
>>>>> Dirty: 0 kB
>>>>> Writeback: 0 kB
>>>>> AnonPages: 14936 kB
>>>>> Mapped: 7736 kB
>>>>> Shmem: 4624 kB
>>>>> Slab: 39184 kB
>>>>> SReclaimable: 10088 kB
>>>>> SUnreclaim: 29096 kB
>>>>> KernelStack: 7088 kB
>>>>> PageTables: 1160 kB
>>>>> Quicklists: 17664 kB
>>>>> NFS_Unstable: 0 kB
>>>>> Bounce: 0 kB
>>>>> WritebackTmp: 0 kB
>>>>> CommitLimit: 22651424 kB
>>>>> Committed_AS: 519368 kB
>>>>> VmallocTotal: 1069547520 kB
>>>>> VmallocUsed: 11064 kB
>>>>> VmallocChunk: 1069529616 kB
>>>>>
>>>>>
>>>>> 2011/6/1 Steven Dake <sdake at redhat.com>:
>>>>>> On 06/01/2011 07:42 AM, william felipe_welter wrote:
>>>>>>> Steven,
>>>>>>>
>>>>>>> cat /proc/meminfo
>>>>>>> ...
>>>>>>> HugePages_Total: 0
>>>>>>> HugePages_Free: 0
>>>>>>> HugePages_Rsvd: 0
>>>>>>> HugePages_Surp: 0
>>>>>>> Hugepagesize: 4096 kB
>>>>>>> ...
>>>>>>>
>>>>>>
>>>>>> It definitely requires a kernel compile and setting the config option to
>>>>>> off. I don't know the Debian way of doing this.
>>>>>>
>>>>>> The only reason you may need this option is if you have very large
>>>>>> memory sizes, such as 48GB or more.
>>>>>>
>>>>>> Regards
>>>>>> -steve
>>>>>>
>>>>>>> It's 4 MB.
>>>>>>>
>>>>>>> How can I disable hugetlb? (By passing CONFIG_HUGETLBFS=n to the
>>>>>>> kernel at boot?)
>>>>>>>
>>>>>>> 2011/6/1 Steven Dake <sdake at redhat.com>
>>>>>>>
>>>>>>> On 06/01/2011 01:05 AM, Steven Dake wrote:
>>>>>>> > On 05/31/2011 09:44 PM, Angus Salkeld wrote:
>>>>>>> >> On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter wrote:
>>>>>>> >>> Angus,
>>>>>>> >>>
>>>>>>> >>> I wrote a test program (based on the code in coroipcc.c), and I
>>>>>>> >>> am now sure that there are problems with the mmap system call on
>>>>>>> >>> sparc.
>>>>>>> >>>
>>>>>>> >>> Source code of my test program:
>>>>>>> >>>
>>>>>>> >>> #include <stdint.h>
>>>>>>> >>> #include <stdio.h>
>>>>>>> >>> #include <stdlib.h>
>>>>>>> >>> #include <sys/mman.h>
>>>>>>> >>>
>>>>>>> >>> #define PATH_MAX 36
>>>>>>> >>>
>>>>>>> >>> int main(void)
>>>>>>> >>> {
>>>>>>> >>>     int32_t fd;
>>>>>>> >>>     void *addr_orig;
>>>>>>> >>>     void *addr;
>>>>>>> >>>     char path[PATH_MAX];
>>>>>>> >>>     const char *file = "teste123XXXXXX";
>>>>>>> >>>     size_t bytes = 10024;
>>>>>>> >>>
>>>>>>> >>>     snprintf(path, PATH_MAX, "/dev/shm/%s", file);
>>>>>>> >>>     printf("PATH=%s\n", path);
>>>>>>> >>>
>>>>>>> >>>     fd = mkstemp(path);
>>>>>>> >>>     printf("fd=%d\n", fd);
>>>>>>> >>>
>>>>>>> >>>     /* reserve an address range with an anonymous mapping */
>>>>>>> >>>     addr_orig = mmap(NULL, bytes, PROT_NONE,
>>>>>>> >>>                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>>>>>> >>>
>>>>>>> >>>     /* map the file over the reserved range at the same address */
>>>>>>> >>>     addr = mmap(addr_orig, bytes, PROT_READ | PROT_WRITE,
>>>>>>> >>>                 MAP_FIXED | MAP_SHARED, fd, 0);
>>>>>>> >>>
>>>>>>> >>>     printf("ADDR_ORIG:%p ADDR:%p\n", addr_orig, addr);
>>>>>>> >>>
>>>>>>> >>>     /* on failure, mmap returns MAP_FAILED, i.e. (void *) -1 */
>>>>>>> >>>     if (addr != addr_orig) {
>>>>>>> >>>         printf("Erro\n");
>>>>>>> >>>     }
>>>>>>> >>>     return 0;
>>>>>>> >>> }
>>>>>>> >>>
>>>>>>> >>> Results on x86:
>>>>>>> >>> PATH=/dev/shm/teste123XXXXXX
>>>>>>> >>> fd=3
>>>>>>> >>> ADDR_ORIG:0x7f867d8e6000 ADDR:0x7f867d8e6000
>>>>>>> >>>
>>>>>>> >>> Results on sparc:
>>>>>>> >>> PATH=/dev/shm/teste123XXXXXX
>>>>>>> >>> fd=3
>>>>>>> >>> ADDR_ORIG:0xf7f72000 ADDR:0xffffffff
>>>>>>> >>
>>>>>>> >> Note: 0xffffffff == MAP_FAILED
>>>>>>> >>
>>>>>>> >> (from man mmap)
>>>>>>> >> RETURN VALUE
>>>>>>> >> On success, mmap() returns a pointer to the mapped area. On
>>>>>>> >> error, the value MAP_FAILED (that is, (void *) -1) is
>>>>>>> returned,
>>>>>>> >> and errno is set appropriately.
>>>>>>> >>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> But I'm wondering: is it really necessary to call mmap twice?
>>>>>>> >>> What is the reason for calling mmap twice, the second time using
>>>>>>> >>> the address returned by the first call?
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >> Well there are 3 calls to mmap()
>>>>>>> >> 1) one to allocate 2 * what you need (in pages)
>>>>>>> >> 2) maps the first half of the mem to a real file
>>>>>>> >> 3) maps the second half of the mem to the same file
>>>>>>> >>
>>>>>>> >> The point is that when you write to an address past the end of the
>>>>>>> >> first half of memory, it is taken care of by the third mmap, which
>>>>>>> >> maps the address back to the top of the file for you. This means
>>>>>>> >> you don't have to worry about ring buffer wrapping, which can be a
>>>>>>> >> headache.
>>>>>>> >>
>>>>>>> >> -Angus
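For reference, the three-mmap trick Angus describes looks roughly like
this. A minimal sketch, not the actual coroipcc.c code; it assumes the
buffer size is a multiple of the page size and that /dev/shm is available.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
        size_t bytes = (size_t)sysconf(_SC_PAGESIZE) * 4;
        char path[] = "/dev/shm/ringXXXXXX";
        int fd = mkstemp(path);

        if (fd < 0 || ftruncate(fd, bytes) < 0)
                return 1;

        /* 1) reserve 2 * bytes of contiguous address space */
        char *base = mmap(NULL, bytes * 2, PROT_NONE,
                          MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (base == MAP_FAILED)
                return 1;

        /* 2) map the file over the first half of the reservation */
        if (mmap(base, bytes, PROT_READ | PROT_WRITE,
                 MAP_FIXED | MAP_SHARED, fd, 0) == MAP_FAILED)
                return 1;

        /* 3) map the same file over the second half, so a write past
         * the end of the first half lands at the start of the file */
        if (mmap(base + bytes, bytes, PROT_READ | PROT_WRITE,
                 MAP_FIXED | MAP_SHARED, fd, 0) == MAP_FAILED)
                return 1;

        base[0] = 'A';
        printf("wrapped read: %c\n", base[bytes]);  /* prints 'A' */

        unlink(path);
        return 0;
}

Per the reports in this thread, it is the third of these calls that keeps
failing on sparc.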
>>>>>>> >>
>>>>>>> >
>>>>>>> > Interesting that this mmap operation doesn't work on sparc linux.
>>>>>>> >
>>>>>>> > Not sure how I can help here - the next step would be a follow-up
>>>>>>> > with the sparc linux mailing list. I'll do that and cc you on the
>>>>>>> > message - see if we get any response.
>>>>>>> >
>>>>>>> > http://vger.kernel.org/vger-lists.html
>>>>>>> >
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> 2011/5/31 Angus Salkeld <asalkeld at redhat.com>
>>>>>>> >>>
>>>>>>> >>>>> On Tue, May 31, 2011 at 06:25:56PM -0300, william felipe_welter wrote:
>>>>>>> >>>>> Thanks Steven,
>>>>>>> >>>>>
>>>>>>> >>>>> Now I am trying to run on the MCP:
>>>>>>> >>>>> - Uninstall pacemaker 1.0
>>>>>>> >>>>> - Compile and install 1.1
>>>>>>> >>>>>
>>>>>>> >>>>> But now I have a problem initializing pacemakerd: "Could not
>>>>>>> >>>>> initialize Cluster Configuration Database API instance error 2".
>>>>>>> >>>>> Debugging with gdb, I see that the error is in the confdb; more
>>>>>>> >>>>> specifically, the errors start in coroipcc.c at this line:
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> 448 if (addr != addr_orig) {
>>>>>>> >>>>> 449 goto error_close_unlink; <- enter here
>>>>>>> >>>>> 450 }
>>>>>>> >>>>>
>>>>>>> >>>>> Any idea what could cause this?
>>>>>>> >>>>>
>>>>>>> >>>>
>>>>>>> >>>> I tried porting a ringbuffer (www.libqb.org) to sparc and had
>>>>>>> >>>> the same failure.
>>>>>>> >>>> There are 3 mmap() calls and on sparc the third one keeps failing.
>>>>>>> >>>>
>>>>>>> >>>> This is a common way of creating a ring buffer, see:
>>>>>>> >>>>
>>>>>>> >>>> http://en.wikipedia.org/wiki/Circular_buffer#Exemplary_POSIX_Implementation
>>>>>>> >>>>
>>>>>>> >>>> I couldn't get it working in the short time I tried. It's probably
>>>>>>> >>>> worth looking at the clib implementation to see why it's failing
>>>>>>> >>>> (I didn't get to that).
>>>>>>> >>>>
>>>>>>> >>>> -Angus
>>>>>>> >>>>
>>>>>>>
>>>>>>> Note, we believe we have sorted this out. Your kernel has hugetlb
>>>>>>> enabled, probably with 4 MB pages. This requires corosync to allocate
>>>>>>> 4 MB pages.
>>>>>>>
>>>>>>> Can you verify your hugetlb settings?
>>>>>>>
>>>>>>> If you can turn this option off, you should at least have a working
>>>>>>> corosync.
>>>>>>>
>>>>>>> Regards
>>>>>>> -steve
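A quick way to verify the setting Steve asks about is to read the
Hugepagesize line from /proc/meminfo, e.g. with a small helper like this
sketch (illustrative only):

#include <stdio.h>
#include <string.h>

int main(void)
{
        /* print the kernel's huge page size as reported by /proc/meminfo */
        FILE *f = fopen("/proc/meminfo", "r");
        char line[128];

        if (f == NULL)
                return 1;
        while (fgets(line, sizeof(line), f) != NULL) {
                if (strncmp(line, "Hugepagesize:", 13) == 0)
                        fputs(line, stdout);
        }
        fclose(f);
        return 0;
}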
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> William Felipe Welter
>>>>>>> >>> ------------------------------
>>>>>>> >>> Consultor em Tecnologias Livres
>>>>>>> >>> william.welter at 4linux.com.br
>>>>>>> >>> www.4linux.com.br
>>>>>>> >>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> William Felipe Welter
>>>>>>> ------------------------------
>>>>>>> Consultor em Tecnologias Livres
>>>>>>> william.welter at 4linux.com.br
>>>>>>> www.4linux.com.br
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
More information about the Pacemaker
mailing list