[ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
Andrew Beekhof
andrew at beekhof.net
Tue Aug 4 04:16:29 UTC 2015
> On 12 May 2015, at 12:12 pm, renayama19661014 at ybb.ne.jp wrote:
>
> Hi All,
>
> The problem seems to be that the buffer somehow becomes NULL after running crm_resource -C, once the remote node has been rebooted.
>
> I added logging to the source code and confirmed it.
>
> ------------------------------------------------
> crm_remote_recv_once(crm_remote_t * remote)
> {
> (snip)
>     /* automatically grow the buffer when needed */
>     if(remote->buffer_size < read_len) {
>         remote->buffer_size = 2 * read_len;
>         crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
>
>         remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
>         CRM_ASSERT(remote->buffer != NULL);
>     }
>
> #ifdef HAVE_GNUTLS_GNUTLS_H
>     if (remote->tls_session) {
>         if (remote->buffer == NULL) {
>             crm_info("### YAMAUCHI buffer is NULL [buffer_zie[%d] readlen[%d]", remote->buffer_size, read_len);
>         }
>         rc = gnutls_record_recv(*(remote->tls_session),
>                                 remote->buffer + remote->buffer_offset,
>                                 remote->buffer_size - remote->buffer_offset);
> (snip)
> ------------------------------------------------
>
> May 12 10:54:01 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
> May 12 10:54:02 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
> May 12 10:54:04 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
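>
> Looking at the values in that log, buffer_size (1326) is already larger than read_len (40), so the grow branch above is skipped; if the buffer was freed somewhere else without buffer_size being reset, it stays NULL here. As a purely defensive sketch (my guess at a workaround, not an actual patch), the grow branch in crm_remote_recv_once() could also reallocate when the pointer itself is NULL:
>
> /* Defensive sketch only: treat a NULL buffer like an undersized one so
>  * the (re)allocation always happens before gnutls_record_recv() runs. */
> if (remote->buffer == NULL || remote->buffer_size < read_len) {
>     if (remote->buffer_size < read_len) {
>         remote->buffer_size = 2 * read_len;
>         crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
>     }
>     /* realloc_safe(NULL, n) behaves like malloc(n) */
>     remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
>     CRM_ASSERT(remote->buffer != NULL);
> }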
Do you know if this behaviour still exists?
A LOT of work went into the remote node logic in the last couple of months; it's possible this was fixed as a side-effect.
>
> ------------------------------------------------
>
> gnutls_record_recv is then called with the NULL buffer and returns an error.
>
> ------------------------------------------------
> (snip)
> ssize_t
> _gnutls_recv_int(gnutls_session_t session, content_type_t type,
>                  gnutls_handshake_description_t htype,
>                  gnutls_packet_t *packet,
>                  uint8_t * data, size_t data_size, void *seq,
>                  unsigned int ms)
> {
>     int ret;
>
>     if (packet == NULL && (type != GNUTLS_ALERT && type != GNUTLS_HEARTBEAT)
>         && (data_size == 0 || data == NULL))
>         return gnutls_assert_val(GNUTLS_E_INVALID_REQUEST);
>
> (snip)
>
> ssize_t
> gnutls_record_recv(gnutls_session_t session, void *data, size_t data_size)
> {
>     return _gnutls_recv_int(session, GNUTLS_APPLICATION_DATA, -1, NULL,
>                             data, data_size, NULL,
>                             session->internals.record_timeout_ms);
> }
> (snip)
> ------------------------------------------------
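>
> For reference, the code returned by that check, GNUTLS_E_INVALID_REQUEST (-50), renders as "The request is invalid.", the same text reported in the crmd debug log quoted further down in David's comment. A tiny standalone program (an illustration only, not pacemaker code) shows the mapping:
>
> /* Print the gnutls error code and string returned when data == NULL is
>  * passed down to _gnutls_recv_int(); illustration only. */
> #include <stdio.h>
> #include <gnutls/gnutls.h>
>
> int main(void)
> {
>     int rc = GNUTLS_E_INVALID_REQUEST;              /* -50 */
>     printf("%d -> %s\n", rc, gnutls_strerror(rc));  /* "The request is invalid." */
>     return 0;
> }
>
> Built against the gnutls-devel package listed further down, this should print "-50 -> The request is invalid.".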
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
> ----- Original Message -----
>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>> To: "users at clusterlabs.org" <users at clusterlabs.org>
>> Cc:
>> Date: 2015/5/11, Mon 16:45
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
>>
>> Hi Ulrich,
>>
>> Thank you for comments.
>>
>>> So your host and your resource are both named "snmp1"? I also don't
>>> have much experience with cleaning up resources for a node that is offline.
>>> What change should it make (while the node is offline)?
>>
>>
>> The name of the remote resource and the name of the remote node are both "snmp1".
>>
>>
>> (snip)
>> primitive snmp1 ocf:pacemaker:remote \
>>     params \
>>         server="snmp1" \
>>     op start interval="0s" timeout="60s" on-fail="ignore" \
>>     op monitor interval="3s" timeout="15s" \
>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>
>> primitive Host-rsc1 ocf:heartbeat:Dummy \
>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>
>> primitive Remote-rsc1 ocf:heartbeat:Dummy \
>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>
>> location loc1 Remote-rsc1 \
>>     rule 200: #uname eq snmp1
>> location loc3 Host-rsc1 \
>>     rule 200: #uname eq bl460g8n1
>> (snip)
>>
>> pacemaker_remoted on the snmp1 node is stopped with SIGTERM.
>> Afterwards, I restart pacemaker_remoted on the snmp1 node.
>> Then I execute the crm_resource command, but the snmp1 node remains offline.
>>
>> I think the correct behaviour after executing the crm_resource command would be
>> for the snmp1 node to come back online.
>>
>>
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>>
>>
>> ----- Original Message -----
>>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>>> Cc:
>>> Date: 2015/5/11, Mon 15:39
>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>
>>>>>> <renayama19661014 at ybb.ne.jp> wrote on 11.05.2015 at 06:22 in message
>>>>>> <361916.15877.qm at web200006.mail.kks.yahoo.co.jp>:
>>>> Hi All,
>>>>
>>>> I matched the OS version of the remote node to the host once again and
>>>> confirmed the behaviour with Pacemaker 1.1.13-rc2.
>>>>
>>>> It was the same even with the host on RHEL 7.1 (bl460g8n1).
>>>> I also made the remote host RHEL 7.1 (snmp1).
>>>>
>>>> The first crm_resource -C fails.
>>>> --------------------------------
>>>> [root at bl460g8n1 ~]# crm_resource -C -r snmp1
>>>> Cleaning up snmp1 on bl460g8n1
>>>> Waiting for 1 replies from the CRMd. OK
>>>>
>>>> [root at bl460g8n1 ~]# crm_mon -1 -Af
>>>> Last updated: Mon May 11 12:44:31 2015
>>>> Last change: Mon May 11 12:43:30 2015
>>>> Stack: corosync
>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>> Version: 1.1.12-7a2e3ae
>>>> 2 Nodes configured
>>>> 3 Resources configured
>>>>
>>>>
>>>> Online: [ bl460g8n1 ]
>>>> RemoteOFFLINE: [ snmp1 ]
>>>
>>> So your host and your resource are both named "snmp1"? I also don't
>>> have much experience with cleaning up resources for a node that is offline.
>>> What change should it make (while the node is offline)?
>>>
>>>>
>>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1 (failure ignored)
>>>>
>>>> Node Attributes:
>>>> * Node bl460g8n1:
>>>> + ringnumber_0 : 192.168.101.21 is UP
>>>> + ringnumber_1 : 192.168.102.21 is UP
>>>>
>>>> Migration summary:
>>>> * Node bl460g8n1:
>>>>    snmp1: migration-threshold=1 fail-count=1000000 last-failure='Mon May 11 12:44:28 2015'
>>>>
>>>> Failed actions:
>>>>     snmp1_start_0 on bl460g8n1 'unknown error' (1): call=5, status=Timed Out,
>>>>     exit-reason='none', last-rc-change='Mon May 11 12:43:31 2015', queued=0ms, exec=0ms
>>>> --------------------------------
>>>>
>>>>
>>>> The second crm_resource -C succeeded and the connection to the remote host was established.
>>>
>>> Then the node was online, it seems.
>>>
>>> Regards,
>>> Ulrich
>>>
>>>> --------------------------------
>>>> [root at bl460g8n1 ~]# crm_mon -1 -Af
>>>> Last updated: Mon May 11 12:44:54 2015
>>>> Last change: Mon May 11 12:44:48 2015
>>>> Stack: corosync
>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>> Version: 1.1.12-7a2e3ae
>>>> 2 Nodes configured
>>>> 3 Resources configured
>>>>
>>>>
>>>> Online: [ bl460g8n1 ]
>>>> RemoteOnline: [ snmp1 ]
>>>>
>>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
>>>> snmp1 (ocf::pacemaker:remote): Started bl460g8n1
>>>>
>>>> Node Attributes:
>>>> * Node bl460g8n1:
>>>> + ringnumber_0 : 192.168.101.21 is UP
>>>> + ringnumber_1 : 192.168.102.21 is UP
>>>> * Node snmp1:
>>>>
>>>> Migration summary:
>>>> * Node bl460g8n1:
>>>> * Node snmp1:
>>>> --------------------------------
>>>>
>>>> The gnutls packages on the host and the remote node were the following:
>>>>
>>>> gnutls-devel-3.3.8-12.el7.x86_64
>>>> gnutls-dane-3.3.8-12.el7.x86_64
>>>> gnutls-c++-3.3.8-12.el7.x86_64
>>>> gnutls-3.3.8-12.el7.x86_64
>>>> gnutls-utils-3.3.8-12.el7.x86_64
>>>>
>>>>
>>>> Best Regards,
>>>> Hideo Yamauchi.
>>>>
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>>>> Cc:
>>>>> Date: 2015/4/28, Tue 14:06
>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>>>
>>>>> Hi David,
>>>>>
>>>>> Even after changing the remote node to RHEL 7.1, the result was the same.
>>>>>
>>>>>
>>>>> This time I will try it with the pacemaker host node on RHEL 7.1.
>>>>>
>>>>>
>>>>> I noticed an interesting phenomenon.
>>>>> The remote node fails to reconnect on the first crm_resource.
>>>>> However, the remote node succeeds in reconnecting on the second crm_resource.
>>>>>
>>>>> I think there is some problem at the point where I first cut the connection
>>>>> with the remote node.
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>>>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>>>>> Cc:
>>>>>> Date: 2015/4/28, Tue 11:52
>>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>>>>
>>>>>> Hi David,
>>>>>> Thank you for comments.
>>>>>>> At first glance this looks gnutls related. GNUTLS is returning -50 during receive
>>>>>>> on the client side (pacemaker's side). -50 maps to 'invalid request'.
>>>>>>>
>>>>>>>> debug: crm_remote_recv_once: TLS receive failed: The request is invalid.
>>>>>>>
>>>>>>> We treat this error as fatal and destroy the connection. I've never encountered
>>>>>>> this error and I don't know what causes it. It's possible there's a bug in
>>>>>>> our gnutls usage... it's also possible there's a bug in the version of gnutls
>>>>>>> that is in use as well.
>>>>>> We built the remote node on RHEL 6.5.
>>>>>> Since this may be a problem with gnutls, I will confirm it on RHEL 7.1.
>>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org