[Pacemaker] cibadmin -Q: Call cib_query failed (-62): Timer expired
Andrew Beekhof
andrew at beekhof.net
Wed Oct 2 05:13:58 UTC 2013
On 28/09/2013, at 5:37 AM, Radoslaw Garbacz <radoslaw.garbacz at xtremedatainc.com> wrote:
> The problem was actually of a different nature, nothing to do with
> cib_shm. The logs later showed that the connection to the CIB was
> established; the corosync configuration file simply lacked a proper
> "quorum" section, which caused the problems I was seeing.
>
> After fixing the "quorum" section in corosync.conf, everything works.
I would not have expected that one would result in the other.
Glad you got it sorted out though!
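
For anyone who finds this thread with the same symptom: a minimal Python sketch that checks whether a corosync.conf text has a top-level "quorum" section naming a "provider". It assumes the simple "name {" / "}" / "key: value" layout; the helper "quorum_provider" and the sample config are hypothetical, not the file from this thread. With corosync 2.x the usual provider is "corosync_votequorum".

```python
def quorum_provider(conf_text):
    """Return the 'provider' value of the top-level quorum section, or None.

    Hypothetical helper: a simplified reader for corosync.conf-style text
    that only understands 'name {', '}' and 'key: value' lines.
    """
    stack = []  # names of the sections currently open
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())  # entering a section, e.g. "quorum"
        elif line == "}":
            if stack:
                stack.pop()  # leaving the innermost section
        elif ":" in line and stack == ["quorum"]:
            key, value = (part.strip() for part in line.split(":", 1))
            if key == "provider":
                return value
    return None


# Hypothetical example config; the values are illustrative only.
SAMPLE = """
totem {
    version: 2
    transport: udpu
}
quorum {
    provider: corosync_votequorum
    expected_votes: 3
}
"""

if __name__ == "__main__":
    print(quorum_provider(SAMPLE) or "no quorum provider configured")
```

On a real node, assuming the default path, the check would be quorum_provider(open("/etc/corosync/corosync.conf").read()); a None result means no provider line was found inside "quorum { ... }".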
>
> many thanks,
>
>
> On Fri, Sep 27, 2013 at 2:16 PM, Radoslaw Garbacz
> <radoslaw.garbacz at xtremedatainc.com> wrote:
>> cibadmin -Ql works, the problem persists after the upgrade, and the
>> "crmd" logs revealed the problem:
>>
>> Sep 27 16:19:22 [5074] ip-10-82-197-219 crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
>> Sep 27 16:19:22 [5074] ip-10-82-197-219 crmd: debug: cib_native_signon_raw: Connection unsuccessful (0 (nil))
>> Sep 27 16:19:22 [5074] ip-10-82-197-219 crmd: debug: cib_native_signon_raw: Connection to CIB failed: Transport endpoint is not connected
>>
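
The numbers in parentheses are errno values: the crmd log shows a positive one (111), and the cibadmin error in the subject a negative one (-62), assuming Pacemaker reports failures as negated errno codes, which the "(-62): Timer expired" text suggests. A minimal Python sketch for decoding them with the standard errno module; the helper name "decode_errno" is hypothetical, and the numeric values are platform-specific (the ones in this thread are Linux values).

```python
import errno
import os
import re

# Matches an errno-style code in parentheses, e.g. "(111)" or "(-62)".
CODE_RE = re.compile(r"\((-?\d+)\)")


def decode_errno(message):
    """Return (symbolic name, strerror text) for the first code in message, or None.

    Assumes negative codes are negated errno values, so the sign is dropped.
    """
    match = CODE_RE.search(message)
    if match is None:
        return None
    code = abs(int(match.group(1)))
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    for line in (
        "crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)",
        "Call cib_query failed (-62): Timer expired",
    ):
        print(decode_errno(line), "<-", line)
```

On Linux the two sample lines decode to ECONNREFUSED and ETIME respectively.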
>> I will keep searching for the solution, but in the meantime, if you have a
>> moment, any hint would be welcome.
>>
>> many thanks,
>>
>>
>> On Thu, Sep 26, 2013 at 9:25 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>
>>> On 27/09/2013, at 8:45 AM, Radoslaw Garbacz <radoslaw.garbacz at xtremedatainc.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a problem starting up a cluster after upgrading corosync from
>>>> 1.4 to 2.3.2 and pacemaker from 1.8 to 1.9.
>>>>
>>>> All "crm_node" calls work, but every CIB operation fails, e.g.:
>>>> * crm_node -q: 1
>>>> * crm_node -l: OK
>>>> * crm_node -p: OK
>>>> * cibadmin -Q: Call cib_query failed (-62): Timer expired
>>>
>>> Does cibadmin -Ql work?
>>> If so, there might be a DC election going on (look in the logs for "crmd").
>>> Is the error transient or persistent?
>>>
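
To answer the -Q vs -Ql question without a terminal hanging for 30 s, both queries can be run with a timeout. A minimal Python sketch; the helper name "probe" and its 35 s default are hypothetical choices. Going by the cib log in this report ("Forwarding cib_query operation ... to master"), a plain "cibadmin -Q" is forwarded to the master, while "cibadmin -Q -l" (--local) is answered from the local copy, so "-Ql" succeeding while "-Q" times out fits the DC-election explanation above.

```python
import shutil
import subprocess


def probe(cmd, timeout=35.0):
    """Run cmd and return (status, output) without waiting forever.

    status is "ok", "failed", "timeout" or "missing". The default timeout is a
    hypothetical choice, slightly above the 30000 ms poll seen in the strace.
    """
    if shutil.which(cmd[0]) is None:
        return "missing", ""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout", ""  # subprocess.run kills the child before raising
    status = "ok" if done.returncode == 0 else "failed"
    return status, done.stdout + done.stderr


if __name__ == "__main__":
    # Cluster-wide query first, then the local-only query.
    for cmd in (["cibadmin", "-Q"], ["cibadmin", "-Q", "-l"]):
        status, _ = probe(cmd)
        print(" ".join(cmd), "->", status)
```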
>>>>
>>>> No iptables, no SELinux, 3-node cluster, corosync.conf:
>>>> ...
>>>> ringnumber: 0
>>>> bindnetaddr: ...
>>>> mcastport: 7800
>>>> }
>>>>
>>>> transport: udpu
>>>>
>>>>
>>>>
>>>> Any help greatly appreciated.
>>>>
>>>>
>>>> Below is some more information:
>>>>
>>>> * pacemaker logs:
>>>>
>>>> Sep 26 22:24:00 [2836] ip-10-114-210-162 cib: info: crm_client_new: Connecting 0x111b780 for uid=0 gid=0 pid=2883 id=977d6f23-963b-41a4-8fe0-a63024080d41
>>>> Sep 26 22:24:00 [2836] ip-10-114-210-162 cib: info: cib_process_request: Forwarding cib_query operation for section 'all' to master (origin=local/cibadmin/2)
>>>> Sep 26 22:24:30 [2836] ip-10-114-210-162 cib: info: crm_client_destroy: Destroying 0 events
>>>>
>>>>
>>>> * ps axf | grep -E 'pacemaker|corosync':
>>>>
>>>> 2806 ? Ssl 0:10 corosync
>>>> 2834 pts/1 S 0:00 pacemakerd
>>>> 2836 ? Ss 0:01 \_ /usr/libexec/pacemaker/cib
>>>> 2837 ? Ss 0:00 \_ /usr/libexec/pacemaker/stonithd
>>>> 2838 ? Ss 0:00 \_ /usr/libexec/pacemaker/lrmd
>>>> 2839 ? Ss 0:00 \_ /usr/libexec/pacemaker/attrd
>>>> 2840 ? Ss 0:00 \_ /usr/libexec/pacemaker/pengine
>>>> 2841 ? Ss 0:00 \_ /usr/libexec/pacemaker/crmd
>>>>
>>>>
>>>> * strace cibadmin -Q:
>>>>
>>>> open("/dev/shm/qb-cib_rw-event-2836-2897-12-data", O_RDWR) = 6
>>>> ftruncate(6, 20480000) = 0
>>>> mmap(NULL, 40960000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa221692000
>>>> mmap(0x7fa221692000, 20480000, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 6, 0) = 0x7fa221692000
>>>> mmap(0x7fa222a1a000, 20480000, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 6, 0) = 0x7fa222a1a000
>>>> close(6) = 0
>>>> close(5) = 0
>>>> close(6) = -1 EBADF (Bad file descriptor)
>>>> fstat(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
>>>> fcntl(4, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
>>>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>>> sendto(4, "~", 1, MSG_NOSIGNAL, NULL, 0) = 1
>>>> futex(0x7fa22df4cb60, FUTEX_WAKE_PRIVATE, 2147483647) = 0
>>>> gettimeofday({1380234692, 68879}, NULL) = 0
>>>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>>> gettimeofday({1380234692, 69522}, NULL) = 0
>>>> sendto(4, "\274", 1, MSG_NOSIGNAL, NULL, 0) = 1
>>>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>>> gettimeofday({1380234692, 70085}, NULL) = 0
>>>> gettimeofday({1380234692, 70197}, NULL) = 0
>>>> poll([{fd=4, events=POLLIN}], 1, 30000) = 0 (Timeout)
>>>> gettimeofday({1380234722, 91625}, NULL) = 0
>>>> write(2, "Call cib_query failed (-62): Tim"..., 43Call cib_query
>>>> failed (-62): Timer expired
>>>> ) = 43
>>>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>>>
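
In the trace, the client blocks in a single poll() with a 30000 ms timeout and prints the -62 error when it expires. The gettimeofday() calls around that poll can be used to measure the wait; a minimal Python sketch with a hypothetical helper "longest_gap_us", run on lines excerpted from the trace above. The cib log earlier in this message shows a similar 30 s gap (22:24:00 to 22:24:30) between the client connecting and being destroyed.

```python
import re

# Matches strace lines such as: gettimeofday({1380234692, 70197}, NULL) = 0
TOD_RE = re.compile(r"gettimeofday\(\{(\d+), (\d+)\}")


def longest_gap_us(trace_text):
    """Return the largest gap, in microseconds, between consecutive gettimeofday() results."""
    stamps = [int(sec) * 1_000_000 + int(usec) for sec, usec in TOD_RE.findall(trace_text)]
    return max((later - earlier for earlier, later in zip(stamps, stamps[1:])), default=0)


# Excerpt of the strace output from this report.
TRACE = """
gettimeofday({1380234692, 68879}, NULL) = 0
gettimeofday({1380234692, 69522}, NULL) = 0
gettimeofday({1380234692, 70085}, NULL) = 0
gettimeofday({1380234692, 70197}, NULL) = 0
poll([{fd=4, events=POLLIN}], 1, 30000) = 0 (Timeout)
gettimeofday({1380234722, 91625}, NULL) = 0
"""

if __name__ == "__main__":
    print(longest_gap_us(TRACE) / 1e6, "seconds")
```

Integer microseconds are used so the gap is exact rather than subject to float rounding.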
>>>>
>>>> * netstat -lxp:
>>>>
>>>> Active UNIX domain sockets (only servers)
>>>> Proto RefCnt Flags    Type    State      I-Node  PID/Program name  Path
>>>> unix  2      [ ACC ]  STREAM  LISTENING  20021   2836/cib          @cib_rw
>>>> unix  2      [ ACC ]  STREAM  LISTENING  19958   2838/lrmd         @lrmd
>>>> unix  2      [ ACC ]  STREAM  LISTENING  19789   2806/corosync     @quorum
>>>> unix  2      [ ACC ]  STREAM  LISTENING  19786   2806/corosync     @cmap
>>>> unix  2      [ ACC ]  STREAM  LISTENING  20020   2836/cib          @cib_ro
>>>> unix  2      [ ACC ]  STREAM  LISTENING  20057   2837/stonithd     @stonith-ng
>>>> unix  2      [ ACC ]  STREAM  LISTENING  19787   2806/corosync     @cfg
>>>> unix  2      [ ACC ]  STREAM  LISTENING  19906   2834/pacemakerd   @pacemakerd
>>>> unix  2      [ ACC ]  STREAM  LISTENING  19788   2806/corosync     @cpg
>>>> unix  2      [ ACC ]  STREAM  LISTENING  20022   2836/cib          @cib_shm
>>>> unix  2      [ ACC ]  STREAM  LISTENING  19985   2840/pengine      @pengine
>>>>
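
The netstat output shows cib listening on its three abstract sockets (@cib_rw, @cib_ro, @cib_shm). The same check can be scripted against /proc/net/unix, where Linux lists abstract socket names with a leading "@". A minimal Python sketch, assuming the standard eight-column layout (Num RefCount Protocol Flags Type St Inode Path); the helper names and the sample text are hypothetical, with inode numbers borrowed from the netstat output above.

```python
# Flag bit set on listening sockets in /proc/net/unix (__SO_ACCEPTCON in the kernel).
ACCEPTCON = 0x10000
CIB_ENDPOINTS = {"@cib_rw", "@cib_ro", "@cib_shm"}


def listening_abstract_sockets(proc_net_unix_text):
    """Return the set of listening abstract socket names from /proc/net/unix text."""
    names = set()
    for line in proc_net_unix_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue  # unnamed socket: no Path column
        flags, path = int(fields[3], 16), fields[7]
        if path.startswith("@") and flags & ACCEPTCON:
            names.add(path)
    return names


def missing_cib_endpoints(proc_net_unix_text):
    """Return the cib IPC endpoints that are not currently listening."""
    return CIB_ENDPOINTS - listening_abstract_sockets(proc_net_unix_text)


# Hypothetical excerpt: @cib_ro is present but not listening (flags 0).
SAMPLE = """Num       RefCount Protocol Flags    Type St Inode Path
0000000000000000: 00000002 00000000 00010000 0001 01 20022 @cib_shm
0000000000000000: 00000002 00000000 00010000 0001 01 20021 @cib_rw
0000000000000000: 00000003 00000000 00000000 0001 03 20020 @cib_ro
0000000000000000: 00000003 00000000 00000000 0001 03 20100
"""

if __name__ == "__main__":
    try:
        with open("/proc/net/unix") as f:
            text = f.read()
    except FileNotFoundError:
        text = ""  # not Linux: nothing to inspect
    print(sorted(missing_cib_endpoints(text)) or "all cib endpoints listening")
```

Only the first token of the Path column is read, so abstract names containing spaces would be truncated; that is fine for the cib sockets.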
>>>>
>>>>
>>>> Thanks in advance,
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Radoslaw Garbacz
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>>
>>
>> --
>> Best Regards,
>>
>> Radoslaw Garbacz
>> XtremeData Incorporation
>
>
>
> --
> Best Regards,
>
> Radoslaw Garbacz
> XtremeData Incorporation