[Pacemaker] [Openais] Linux HA on debian sparc

william felipe_welter wfelipew at gmail.com
Tue Jun 7 11:44:02 UTC 2011


Two more questions: will the patch for the mmap calls be merged into
mainline development for all archs?
And is there any problem if I send these patches to the Debian project?

2011/6/3 Steven Dake <sdake at redhat.com>:
> On 06/02/2011 08:16 PM, william felipe_welter wrote:
>> Well,
>>
>> Now with this patch, the pacemakerd process starts and brings up its
>> child processes (crmd, lrmd, pengine, ...), but after pacemakerd
>> forks, the forked pacemakerd process dies with "signal 10, Bus
>> error". In the log, the Pacemaker processes (crmd, lrmd,
>> pengine, ...) cannot connect to the OpenAIS plugin (possibly because
>> of the death of the pacemakerd process).
>> But this time, when the forked pacemakerd dies, it generates a core dump.
>>
>> gdb  -c "/usr/var/lib/heartbeat/cores/root/ pacemakerd 7986"  -se
>> /usr/sbin/pacemakerd :
>> GNU gdb (GDB) 7.0.1-debian
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "sparc-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /usr/sbin/pacemakerd...done.
>> Reading symbols from /usr/lib64/libuuid.so.1...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib64/libuuid.so.1
>> Reading symbols from /usr/lib/libcoroipcc.so.4...done.
>> Loaded symbols for /usr/lib/libcoroipcc.so.4
>> Reading symbols from /usr/lib/libcpg.so.4...done.
>> Loaded symbols for /usr/lib/libcpg.so.4
>> Reading symbols from /usr/lib/libquorum.so.4...done.
>> Loaded symbols for /usr/lib/libquorum.so.4
>> Reading symbols from /usr/lib64/libcrmcommon.so.2...done.
>> Loaded symbols for /usr/lib64/libcrmcommon.so.2
>> Reading symbols from /usr/lib/libcfg.so.4...done.
>> Loaded symbols for /usr/lib/libcfg.so.4
>> Reading symbols from /usr/lib/libconfdb.so.4...done.
>> Loaded symbols for /usr/lib/libconfdb.so.4
>> Reading symbols from /usr/lib64/libplumb.so.2...done.
>> Loaded symbols for /usr/lib64/libplumb.so.2
>> Reading symbols from /usr/lib64/libpils.so.2...done.
>> Loaded symbols for /usr/lib64/libpils.so.2
>> Reading symbols from /lib/libbz2.so.1.0...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libbz2.so.1.0
>> Reading symbols from /usr/lib/libxslt.so.1...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libxslt.so.1
>> Reading symbols from /usr/lib/libxml2.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libxml2.so.2
>> Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libc.so.6
>> Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/librt.so.1
>> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libdl.so.2
>> Reading symbols from /lib/libglib-2.0.so.0...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/libglib-2.0.so.0
>> Reading symbols from /usr/lib/libltdl.so.7...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libltdl.so.7
>> Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib/ld-linux.so.2
>> Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libpthread.so.0
>> Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libm.so.6
>> Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /usr/lib/libz.so.1
>> Reading symbols from /lib/libpcre.so.3...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libpcre.so.3
>> Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/libnss_compat.so.2
>> Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libnsl.so.1
>> Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libnss_nis.so.2
>> Reading symbols from /lib/libnss_files.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/libnss_files.so.2
>> Core was generated by `pacemakerd'.
>> Program terminated with signal 10, Bus error.
>> #0  cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
>> 339                   switch (dispatch_data->id) {
>> (gdb) bt
>> #0  cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
>> #1  0xf6f100f0 in ?? ()
>> #2  0xf6f100f4 in ?? ()
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>
>>
>>
>> I took a look at cpg.c and saw that dispatch_data is acquired by the
>> coroipcc_dispatch_get() function (defined in lib/coroipcc.c):
>>
>>        do {
>>                 error = coroipcc_dispatch_get (
>>                         cpg_inst->handle,
>>                         (void **)&dispatch_data,
>>                         timeout);
>>
>>
>>
>
> Try the recent patch sent to fix alignment.
>
> Regards
> -steve
>
>>
>> Abridged log:
>> ...
>> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering f to 10
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 10
>> to pending delivery queue
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including f
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 10
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
>> Forked child 7991 for process lrmd
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
>> update_node_processes: Node xxxxxxxxxx now has process list:
>> 00000000000000000000000000100112 (was
>> 00000000000000000000000000100102)
>> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 10 to 11
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 11
>> to pending delivery queue
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 11
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
>> Forked child 7992 for process attrd
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
>> update_node_processes: Node xxxxxxxxxx now has process list:
>> 00000000000000000000000000101112 (was
>> 00000000000000000000000000100112)
>> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 11 to 12
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 12
>> to pending delivery queue
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 12
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
>> Forked child 7993 for process pengine
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
>> update_node_processes: Node xxxxxxxxxx now has process list:
>> 00000000000000000000000000111112 (was
>> 00000000000000000000000000101112)
>> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 12 to 13
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 13
>> to pending delivery queue
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 13
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
>> Forked child 7994 for process crmd
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
>> update_node_processes: Node xxxxxxxxxx now has process list:
>> 00000000000000000000000000111312 (was
>> 00000000000000000000000000111112)
>> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: main: Starting mainloop
>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 13 to 14
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 14
>> to pending delivery queue
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 14
>> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 14 to 15
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 15
>> to pending delivery queue
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 15
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: Invoked:
>> /usr/lib64/heartbeat/stonithd
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
>> crm_log_init_worker: Changed active directory to
>> /usr/var/lib/heartbeat/cores/root
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: get_cluster_type:
>> Cluster type is: 'openais'.
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
>> crm_cluster_connect: Connecting to cluster infrastructure: classic
>> openais (with plugin)
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
>> init_ais_connection_classic: Creating connection to our Corosync
>> plugin
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_log_init_worker:
>> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: retrieveCib: Reading
>> cluster configuration from: /usr/var/lib/heartbeat/crm/cib.xml
>> (digest: /usr/var/lib/heartbeat/crm/cib.xml.sig)
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: retrieveCib: Cluster
>> configuration not found: /usr/var/lib/heartbeat/crm/cib.xml
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile: Primary
>> configuration corrupt or unusable, trying backup...
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: get_last_sequence:
>> Series file /usr/var/lib/heartbeat/crm/cib.last does not exist
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile: Backup
>> file /usr/var/lib/heartbeat/crm/cib-99.raw not found
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile:
>> Continuing with an empty configuration.
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>> <cib epoch="0" num_updates="0" admin_epoch="0"
>> validate-with="pacemaker-1.2" >
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>   <configuration >
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>     <crm_config />
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>     <nodes />
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>     <resources />
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>     <constraints />
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>   </configuration>
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>   <status />
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] </cib>
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: validate_with_relaxng:
>> Creating RNG parser context
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
>> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
>> Doesn't exist (12)
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: CRIT: main: Cannot sign
>> in to the cluster... terminating
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: Invoked:
>> /usr/lib64/heartbeat/crmd
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: Invoked:
>> /usr/lib64/heartbeat/pengine
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crm_log_init_worker:
>> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: crm_log_init_worker:
>> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: main: CRM Hg Version:
>> e872eeb39a5f6e1fdb57c3108551a5353648c4f4
>>
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Checking for
>> old instances of pengine
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug:
>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>> /usr/var/run/crm/pengine
>> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: enabling coredumps
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crmd_init: Starting crmd
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug:
>> init_client_ipc_comms_nodispatch: Could not init comms on:
>> /usr/var/run/crm/pengine
>> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: debug: main: run the loop...
>> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: Started.
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Init server comms
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: s_crmd_fsa: Processing
>> I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
>> actions:trace:        // A_LOG
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
>> actions:trace:        // A_STARTUP
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: main: Starting pengine
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup:
>> Registering Signal Handlers
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup: Creating
>> CIB and LRM objects
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
>> actions:trace:        // A_CIB_START
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>> /usr/var/run/crm/cib_rw
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
>> init_client_ipc_comms_nodispatch: Could not init comms on:
>> /usr/var/run/crm/cib_rw
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
>> Connection to command channel failed
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>> /usr/var/run/crm/cib_callback
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
>> init_client_ipc_comms_nodispatch: Could not init comms on:
>> /usr/var/run/crm/cib_callback
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
>> Connection to callback channel failed
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
>> Connection to CIB failed: connection failed
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signoff:
>> Signing out of the CIB Service
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: activateCibXml:
>> Triggering CIB write for start op
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: startCib: CIB
>> Initialization completed successfully
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: get_cluster_type:
>> Cluster type is: 'openais'.
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_cluster_connect:
>> Connecting to cluster infrastructure: classic openais (with plugin)
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info:
>> init_ais_connection_classic: Creating connection to our Corosync
>> plugin
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info:
>> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
>> Doesn't exist (12)
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: CRIT: cib_init: Cannot sign in
>> to the cluster... terminating
>> Jun 02 23:12:21 corosync [CPG   ] exit_fn for conn=0x62500
>> Jun 02 23:12:21 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:21 corosync [TOTEM ] Delivering 15 to 16
>> Jun 02 23:12:21 corosync [TOTEM ] Delivering MCAST message with seq 16
>> to pending delivery queue
>> Jun 02 23:12:21 corosync [CPG   ] got procleave message from cluster
>> node 1377289226
>> Jun 02 23:12:21 corosync [TOTEM ] releasing messages up to and including 16
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: Invoked:
>> /usr/lib64/heartbeat/attrd
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_log_init_worker:
>> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Starting up
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: get_cluster_type:
>> Cluster type is: 'openais'.
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_cluster_connect:
>> Connecting to cluster infrastructure: classic openais (with plugin)
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info:
>> init_ais_connection_classic: Creating connection to our Corosync
>> plugin
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info:
>> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
>> Doesn't exist (12)
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: HA Signon failed
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Cluster connection active
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Accepting
>> attribute updates
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: Aborting startup
>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>> /usr/var/run/crm/cib_rw
>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
>> init_client_ipc_comms_nodispatch: Could not init comms on:
>> /usr/var/run/crm/cib_rw
>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
>> Connection to command channel failed
>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>> /usr/var/run/crm/cib_callback
>> ...
>>
>>
>> 2011/6/2 Steven Dake <sdake at redhat.com>:
>>> On 06/01/2011 11:05 PM, william felipe_welter wrote:
>>>> I recompiled my kernel without hugetlb, and the result is the same.
>>>>
>>>> My test program still prints:
>>>> PATH=/dev/shm/teste123XXXXXX
>>>> page size=20000
>>>> fd=3
>>>> ADDR_ORIG:0xe000a000  ADDR:0xffffffff
>>>> Erro
>>>>
>>>> And Pacemaker still fails because of the mmap error:
>>>> Could not initialize Cluster Configuration Database API instance error 2
>>>>
>>>
>>> Give the patch I posted recently a spin - corosync WFM with this patch
>>> on sparc64 with hugetlb set.  Please report back results.
>>>
>>> Regards
>>> -steve
>>>
>>>> To confirm that I have disabled hugetlb, here is my /proc/meminfo:
>>>> MemTotal:       33093488 kB
>>>> MemFree:        32855616 kB
>>>> Buffers:            5600 kB
>>>> Cached:            53480 kB
>>>> SwapCached:            0 kB
>>>> Active:            45768 kB
>>>> Inactive:          28104 kB
>>>> Active(anon):      18024 kB
>>>> Inactive(anon):     1560 kB
>>>> Active(file):      27744 kB
>>>> Inactive(file):    26544 kB
>>>> Unevictable:           0 kB
>>>> Mlocked:               0 kB
>>>> SwapTotal:       6104680 kB
>>>> SwapFree:        6104680 kB
>>>> Dirty:                 0 kB
>>>> Writeback:             0 kB
>>>> AnonPages:         14936 kB
>>>> Mapped:             7736 kB
>>>> Shmem:              4624 kB
>>>> Slab:              39184 kB
>>>> SReclaimable:      10088 kB
>>>> SUnreclaim:        29096 kB
>>>> KernelStack:        7088 kB
>>>> PageTables:         1160 kB
>>>> Quicklists:        17664 kB
>>>> NFS_Unstable:          0 kB
>>>> Bounce:                0 kB
>>>> WritebackTmp:          0 kB
>>>> CommitLimit:    22651424 kB
>>>> Committed_AS:     519368 kB
>>>> VmallocTotal:   1069547520 kB
>>>> VmallocUsed:       11064 kB
>>>> VmallocChunk:   1069529616 kB
>>>>
>>>>
>>>> 2011/6/1 Steven Dake <sdake at redhat.com>:
>>>>> On 06/01/2011 07:42 AM, william felipe_welter wrote:
>>>>>> Steven,
>>>>>>
>>>>>> cat /proc/meminfo
>>>>>> ...
>>>>>> HugePages_Total:       0
>>>>>> HugePages_Free:        0
>>>>>> HugePages_Rsvd:        0
>>>>>> HugePages_Surp:        0
>>>>>> Hugepagesize:       4096 kB
>>>>>> ...
>>>>>>
>>>>>
>>>>> It definitely requires a kernel compile and setting the config option to
>>>>> off.  I don't know the Debian way of doing this.
>>>>>
>>>>> The only reason you may need this option is if you have very large
>>>>> memory sizes, such as 48GB or more.
>>>>>
>>>>> Regards
>>>>> -steve
>>>>>
>>>>>> It's 4MB.
>>>>>>
>>>>>> How can I disable hugetlb? (By passing CONFIG_HUGETLBFS=n to the
>>>>>> kernel at boot?)
>>>>>>
>>>>>> 2011/6/1 Steven Dake <sdake at redhat.com>
>>>>>>
>>>>>>     On 06/01/2011 01:05 AM, Steven Dake wrote:
>>>>>>     > On 05/31/2011 09:44 PM, Angus Salkeld wrote:
>>>>>>     >> On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter
>>>>>>     wrote:
>>>>>>     >>> Angus,
>>>>>>     >>>
>>>>>>     >>> I wrote a small test program (based on the code in
>>>>>>     >>> coroipcc.c), and I am now sure that there is a problem with
>>>>>>     >>> the mmap system call on sparc.
>>>>>>     >>>
>>>>>>     >>> Source code of my test program:
>>>>>>     >>>
>>>>>>     >>> #include <stdlib.h>
>>>>>>     >>> #include <sys/mman.h>
>>>>>>     >>> #include <stdio.h>
>>>>>>     >>> #include <stdint.h>   /* needed for int32_t */
>>>>>>     >>>
>>>>>>     >>> #define PATH_MAX  36
>>>>>>     >>>
>>>>>>     >>> int main()
>>>>>>     >>> {
>>>>>>     >>>
>>>>>>     >>> int32_t fd;
>>>>>>     >>> void *addr_orig;
>>>>>>     >>> void *addr;
>>>>>>     >>> char path[PATH_MAX];
>>>>>>     >>> const char *file = "teste123XXXXXX";
>>>>>>     >>> size_t bytes=10024;
>>>>>>     >>>
>>>>>>     >>> snprintf (path, PATH_MAX, "/dev/shm/%s", file);
>>>>>>     >>> printf("PATH=%s\n",path);
>>>>>>     >>>
>>>>>>     >>> fd = mkstemp (path);
>>>>>>     >>> printf("fd=%d \n",fd);
>>>>>>     >>>
>>>>>>     >>>
>>>>>>     >>> addr_orig = mmap (NULL, bytes, PROT_NONE,
>>>>>>     >>>               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>>>>>     >>>
>>>>>>     >>>
>>>>>>     >>> addr = mmap (addr_orig, bytes, PROT_READ | PROT_WRITE,
>>>>>>     >>>               MAP_FIXED | MAP_SHARED, fd, 0);
>>>>>>     >>>
>>>>>>     >>> printf("ADDR_ORIG:%p  ADDR:%p\n",addr_orig,addr);
>>>>>>     >>>
>>>>>>     >>>
>>>>>>     >>>   if (addr != addr_orig) {
>>>>>>     >>>                printf("Erro");
>>>>>>     >>>         }
>>>>>>     >>> }
>>>>>>     >>>
>>>>>>     >>> Results on x86:
>>>>>>     >>> PATH=/dev/shm/teste123XXXXXX
>>>>>>     >>> fd=3
>>>>>>     >>> ADDR_ORIG:0x7f867d8e6000  ADDR:0x7f867d8e6000
>>>>>>     >>>
>>>>>>     >>> Results on sparc:
>>>>>>     >>> PATH=/dev/shm/teste123XXXXXX
>>>>>>     >>> fd=3
>>>>>>     >>> ADDR_ORIG:0xf7f72000  ADDR:0xffffffff
>>>>>>     >>
>>>>>>     >> Note: 0xffffffff == MAP_FAILED
>>>>>>     >>
>>>>>>     >> (from man mmap)
>>>>>>     >> RETURN VALUE
>>>>>>     >>        On success, mmap() returns a pointer to the mapped area.  On
>>>>>>     >>        error, the value MAP_FAILED (that is, (void *) -1) is
>>>>>>     returned,
>>>>>>     >>        and errno is  set appropriately.
>>>>>>     >>
>>>>>>     >>>
>>>>>>     >>>
>>>>>>     >>> But I am wondering if it is really necessary to call mmap
>>>>>>     >>> twice. What is the reason for calling mmap twice, the second
>>>>>>     >>> time using the address from the first call?
>>>>>>     >>>
>>>>>>     >>>
>>>>>>     >> Well there are 3 calls to mmap()
>>>>>>     >> 1) one to allocate 2 * what you need (in pages)
>>>>>>     >> 2) maps the first half of the mem to a real file
>>>>>>     >> 3) maps the second half of the mem to the same file
>>>>>>     >>
>>>>>>     >> The point is that when you write to an address past the end
>>>>>>     >> of the first half of memory, it is taken care of by the third
>>>>>>     >> mmap, which maps the address back to the top of the file for
>>>>>>     >> you. This means you don't have to worry about ring-buffer
>>>>>>     >> wrapping, which can be a headache.
>>>>>>     >>
>>>>>>     >> -Angus
>>>>>>     >>
>>>>>>     >
>>>>>>     > Interesting that this mmap operation doesn't work on sparc
>>>>>>     > Linux.
>>>>>>     >
>>>>>>     > Not sure how I can help here - the next step would be to follow
>>>>>>     > up with the sparc linux mailing list.  I'll do that and cc you
>>>>>>     > on the message - see if we get any response.
>>>>>>     >
>>>>>>     > http://vger.kernel.org/vger-lists.html
>>>>>>     >
>>>>>>     >>>
>>>>>>     >>>
>>>>>>     >>>
>>>>>>     >>>
>>>>>>     >>> 2011/5/31 Angus Salkeld <asalkeld at redhat.com>
>>>>>>     >>>
>>>>>>     >>>> On Tue, May 31, 2011 at 06:25:56PM -0300, william felipe_welter
>>>>>>     wrote:
>>>>>>     >>>>> Thanks Steven,
>>>>>>     >>>>>
>>>>>>     >>>>> Now I am trying to run on the MCP:
>>>>>>     >>>>> - Uninstall pacemaker 1.0
>>>>>>     >>>>> - Compile and install 1.1
>>>>>>     >>>>>
>>>>>>     >>>>> But now I have a problem initializing pacemakerd: "Could not
>>>>>>     >>>>> initialize Cluster Configuration Database API instance error 2".
>>>>>>     >>>>> Debugging with gdb, I see that the error is in the confdb; more
>>>>>>     >>>>> specifically, the errors start in coroipcc.c at these lines:
>>>>>>     >>>>>
>>>>>>     >>>>>
>>>>>>     >>>>> 448        if (addr != addr_orig) {
>>>>>>     >>>>> 449                goto error_close_unlink;  <- enter here
>>>>>>     >>>>> 450       }
>>>>>>     >>>>>
>>>>>>     >>>>> Any idea what could cause this?
>>>>>>     >>>>>
>>>>>>     >>>>
>>>>>>     >>>> I tried porting a ring buffer (www.libqb.org) to sparc and
>>>>>>     >>>> had the same failure.
>>>>>>     >>>> There are 3 mmap() calls and on sparc the third one keeps failing.
>>>>>>     >>>>
>>>>>>     >>>> This is a common way of creating a ring buffer, see:
>>>>>>     >>>>
>>>>>>     >>>> http://en.wikipedia.org/wiki/Circular_buffer#Exemplary_POSIX_Implementation
>>>>>>     >>>>
>>>>>>     >>>> I couldn't get it working in the short time I tried. It's probably
>>>>>>     >>>> worth looking at the clib implementation to see why it's failing
>>>>>>     >>>> (I didn't get to that).
>>>>>>     >>>>
>>>>>>     >>>> -Angus
>>>>>>     >>>>
>>>>>>
>>>>>>     Note: we believe we have sorted this out.  Your kernel has
>>>>>>     hugetlb enabled, probably with 4MB pages.  This requires corosync
>>>>>>     to allocate 4MB pages.
>>>>>>
>>>>>>     Can you verify your hugetlb settings?
>>>>>>
>>>>>>     If you can turn this option off, you should at least have a
>>>>>>     working corosync.
>>>>>>
>>>>>>     Regards
>>>>>>     -steve
>>>>>>     >>>>
>>>>>>     >>>> _______________________________________________
>>>>>>     >>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>     >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>     >>>>
>>>>>>     >>>> Project Home: http://www.clusterlabs.org
>>>>>>     >>>> Getting started:
>>>>>>     http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>     >>>> Bugs:
>>>>>>     >>>>
>>>>>>     http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>>>>     >>>>
>>>>>>     >>>
>>>>>>     >>>
>>>>>>     >>>
>>>>>>     >>> --
>>>>>>     >>> William Felipe Welter
>>>>>>     >>> ------------------------------
>>>>>>     >>> Consultor em Tecnologias Livres
>>>>>>     >>> william.welter at 4linux.com.br
>>>>>>     >>> www.4linux.com.br
>>>>>>     >>
>>>>>>     >>> _______________________________________________
>>>>>>     >>> Openais mailing list
>>>>>>     >>> Openais at lists.linux-foundation.org
>>>>>>     >>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>>>>>     >>
>>>>>>     >>
>>>>>>     >
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>



-- 
William Felipe Welter
------------------------------
Consultor em Tecnologias Livres
william.welter at 4linux.com.br
www.4linux.com.br



