[ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
Andrew Beekhof
andrew at beekhof.net
Tue Aug 4 00:16:29 EDT 2015
> On 12 May 2015, at 12:12 pm, renayama19661014 at ybb.ne.jp wrote:
>
> Hi All,
>
> The problem is that the buffer somehow becomes NULL after running crm_resource -C, after the remote node has been rebooted.
>
> I added a log message to the source code and confirmed this.
>
> ------------------------------------------------
> crm_remote_recv_once(crm_remote_t * remote)
> {
> (snip)
> /* automatically grow the buffer when needed */
> if(remote->buffer_size < read_len) {
> remote->buffer_size = 2 * read_len;
> crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
>
> remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
> CRM_ASSERT(remote->buffer != NULL);
> }
>
> #ifdef HAVE_GNUTLS_GNUTLS_H
> if (remote->tls_session) {
> if (remote->buffer == NULL) {
> crm_info("### YAMAUCHI buffer is NULL [buffer_zie[%d] readlen[%d]", remote->buffer_size, read_len);
> }
> rc = gnutls_record_recv(*(remote->tls_session),
> remote->buffer + remote->buffer_offset,
> remote->buffer_size - remote->buffer_offset);
> (snip)
> ------------------------------------------------
>
> May 12 10:54:01 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
> May 12 10:54:02 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
> May 12 10:54:04 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
Do you know if this behaviour still exists?
A LOT of work went into the remote node logic in the last couple of months; it's possible this was fixed as a side-effect.
>
> ------------------------------------------------
>
> gnutls_record_recv is handed the NULL buffer and returns an error.
>
> ------------------------------------------------
> (snip)
> ssize_t
> _gnutls_recv_int(gnutls_session_t session, content_type_t type,
> gnutls_handshake_description_t htype,
> gnutls_packet_t *packet,
> uint8_t * data, size_t data_size, void *seq,
> unsigned int ms)
> {
> int ret;
>
> if (packet == NULL && (type != GNUTLS_ALERT && type != GNUTLS_HEARTBEAT)
> && (data_size == 0 || data == NULL))
> return gnutls_assert_val(GNUTLS_E_INVALID_REQUEST);
>
> (snip)
> ssize_t
> gnutls_record_recv(gnutls_session_t session, void *data, size_t data_size)
> {
> return _gnutls_recv_int(session, GNUTLS_APPLICATION_DATA, -1, NULL,
> data, data_size, NULL,
> session->internals.record_timeout_ms);
> }
> (snip)
> ------------------------------------------------
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
> ----- Original Message -----
>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>> To: "users at clusterlabs.org" <users at clusterlabs.org>
>> Cc:
>> Date: 2015/5/11, Mon 16:45
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
>>
>> Hi Ulrich,
>>
>> Thank you for comments.
>>
>>> So your host and your resource are both named "snmp1"? I also don't
>>> have much experience with cleaning up resources for a node that is offline.
>>> What change should it make (while the node is offline)?
>>
>>
>> The remote resource and the remote node share the same name, "snmp1".
>>
>>
>> (snip)
>> primitive snmp1 ocf:pacemaker:remote \
>>     params server="snmp1" \
>>     op start interval="0s" timeout="60s" on-fail="ignore" \
>>     op monitor interval="3s" timeout="15s" \
>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>
>> primitive Host-rsc1 ocf:heartbeat:Dummy \
>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>
>> primitive Remote-rsc1 ocf:heartbeat:Dummy \
>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>
>> location loc1 Remote-rsc1 \
>>     rule 200: #uname eq snmp1
>> location loc3 Host-rsc1 \
>>     rule 200: #uname eq bl460g8n1
>> (snip)
>>
>> pacemaker_remoted on the snmp1 node is stopped with SIGTERM.
>> I then restart pacemaker_remoted on the snmp1 node.
>> After that I run the crm_resource command, but the snmp1 node remains offline.
>>
>> I think the correct behaviour is for the snmp1 node to come back online
>> after the crm_resource command has been executed.
>>
>>
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>>
>>
>> ----- Original Message -----
>>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>>> Cc:
>>> Date: 2015/5/11, Mon 15:39
>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: [Question] About movement of
>> pacemaker_remote.
>>>
>>>>>> <renayama19661014 at ybb.ne.jp> wrote on 11.05.2015 at 06:22 in
>>> message <361916.15877.qm at web200006.mail.kks.yahoo.co.jp>:
>>>> Hi All,
>>>>
>>>> I matched the OS version of the remote node to the host once again and
>>>> confirmed the behaviour with Pacemaker 1.1.13-rc2.
>>>>
>>>> It was the same even when the host ran RHEL 7.1 (bl460g8n1)
>>>> and the remote host ran RHEL 7.1 (snmp1).
>>>>
>>>> The first crm_resource -C fails.
>>>> --------------------------------
>>>> [root at bl460g8n1 ~]# crm_resource -C -r snmp1
>>>> Cleaning up snmp1 on bl460g8n1
>>>> Waiting for 1 replies from the CRMd. OK
>>>>
>>>> [root at bl460g8n1 ~]# crm_mon -1 -Af
>>>> Last updated: Mon May 11 12:44:31 2015
>>>> Last change: Mon May 11 12:43:30 2015
>>>> Stack: corosync
>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>> Version: 1.1.12-7a2e3ae
>>>> 2 Nodes configured
>>>> 3 Resources configured
>>>>
>>>>
>>>> Online: [ bl460g8n1 ]
>>>> RemoteOFFLINE: [ snmp1 ]
>>>
>>> So your host and your resource are both named "snmp1"? I also don't
>>> have much experience with cleaning up resources for a node that is offline.
>>> What change should it make (while the node is offline)?
>>>
>>>>
>>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1 (failure ignored)
>>>>
>>>> Node Attributes:
>>>> * Node bl460g8n1:
>>>> + ringnumber_0 : 192.168.101.21 is UP
>>>> + ringnumber_1 : 192.168.102.21 is UP
>>>>
>>>> Migration summary:
>>>> * Node bl460g8n1:
>>>> snmp1: migration-threshold=1 fail-count=1000000 last-failure='Mon
>>>> May 11 12:44:28 2015'
>>>>
>>>> Failed actions:
>>>> snmp1_start_0 on bl460g8n1 'unknown error' (1): call=5, status=Timed
>>>> Out, exit-reason='none', last-rc-change='Mon May 11 12:43:31 2015',
>>>> queued=0ms, exec=0ms
>>>> --------------------------------
>>>>
>>>>
>>>> The second crm_resource -C succeeded, and the connection to the remote
>>>> host was established.
>>>
>>> Then it seems the node was online.
>>>
>>> Regards,
>>> Ulrich
>>>
>>>> --------------------------------
>>>> [root at bl460g8n1 ~]# crm_mon -1 -Af
>>>> Last updated: Mon May 11 12:44:54 2015
>>>> Last change: Mon May 11 12:44:48 2015
>>>> Stack: corosync
>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>> Version: 1.1.12-7a2e3ae
>>>> 2 Nodes configured
>>>> 3 Resources configured
>>>>
>>>>
>>>> Online: [ bl460g8n1 ]
>>>> RemoteOnline: [ snmp1 ]
>>>>
>>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
>>>> snmp1 (ocf::pacemaker:remote): Started bl460g8n1
>>>>
>>>> Node Attributes:
>>>> * Node bl460g8n1:
>>>> + ringnumber_0 : 192.168.101.21 is UP
>>>> + ringnumber_1 : 192.168.102.21 is UP
>>>> * Node snmp1:
>>>>
>>>> Migration summary:
>>>> * Node bl460g8n1:
>>>> * Node snmp1:
>>>> --------------------------------
>>>>
>>>> The gnutls on the host and the remote node was the following version:
>>>>
>>>> gnutls-devel-3.3.8-12.el7.x86_64
>>>> gnutls-dane-3.3.8-12.el7.x86_64
>>>> gnutls-c++-3.3.8-12.el7.x86_64
>>>> gnutls-3.3.8-12.el7.x86_64
>>>> gnutls-utils-3.3.8-12.el7.x86_64
>>>>
>>>>
>>>> Best Regards,
>>>> Hideo Yamauchi.
>>>>
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "renayama19661014 at ybb.ne.jp"
>>> <renayama19661014 at ybb.ne.jp>
>>>>> To: Cluster Labs - All topics related to open-source clustering
>>> welcomed
>>>> <users at clusterlabs.org>
>>>>> Cc:
>>>>> Date: 2015/4/28, Tue 14:06
>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of
>>>> pacemaker_remote.
>>>>>
>>>>> Hi David,
>>>>>
>>>>> The result was the same even after changing the remote node to RHEL 7.1.
>>>>>
>>>>>
>>>>> This time I will try it with the Pacemaker host node on RHEL 7.1.
>>>>>
>>>>>
>>>>> I noticed an interesting phenomenon.
>>>>> The remote node fails to reconnect on the first crm_resource,
>>>>> but succeeds in reconnecting on the second crm_resource.
>>>>>
>>>>> I think there is some problem at the point where the connection with
>>>>> the remote node is first cut.
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "renayama19661014 at ybb.ne.jp"
>>>>> <renayama19661014 at ybb.ne.jp>
>>>>>> To: Cluster Labs - All topics related to open-source
>> clustering
>>> welcomed
>>>>> <users at clusterlabs.org>
>>>>>> Cc:
>>>>>> Date: 2015/4/28, Tue 11:52
>>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About
>> movement of
>>>>> pacemaker_remote.
>>>>>>
>>>>>> Hi David,
>>>>>> Thank you for comments.
>>>>>>> At first glance this looks gnutls related. GNUTLS is returning -50
>>>>>>> during receive on the client side (pacemaker's side). -50 maps to
>>>>>>> 'invalid request'.
>>>>>>>> debug: crm_remote_recv_once: TLS receive failed: The request is invalid.
>>>>>>> We treat this error as fatal and destroy the connection. I've never
>>>>>>> encountered this error and I don't know what causes it. It's possible
>>>>>>> there's a bug in our gnutls usage... it's also possible there's a bug in
>>>>>>> the version of gnutls that is in use as well.
>>>>>> We built the remote node on RHEL 6.5.
>>>>>> Since it may be a gnutls problem, I will confirm it on RHEL 7.1.
>>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started:
>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>