[ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
renayama19661014 at ybb.ne.jp
Tue May 12 02:12:55 UTC 2015
Hi All,
The problem seems to be that the buffer somehow becomes NULL when crm_resource -C is run after the remote node has been rebooted.
I added logging to the source code and confirmed it:
------------------------------------------------
crm_remote_recv_once(crm_remote_t * remote)
{
(snip)
    /* automatically grow the buffer when needed */
    if (remote->buffer_size < read_len) {
        remote->buffer_size = 2 * read_len;
        crm_trace("Expanding buffer to %u bytes", remote->buffer_size);

        remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
        CRM_ASSERT(remote->buffer != NULL);
    }

#ifdef HAVE_GNUTLS_GNUTLS_H
    if (remote->tls_session) {
        if (remote->buffer == NULL) {
            crm_info("### YAMAUCHI buffer is NULL [buffer_zie[%d] readlen[%d]", remote->buffer_size, read_len);
        }
        rc = gnutls_record_recv(*(remote->tls_session),
                                remote->buffer + remote->buffer_offset,
                                remote->buffer_size - remote->buffer_offset);
(snip)
------------------------------------------------
May 12 10:54:01 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
May 12 10:54:02 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
May 12 10:54:04 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
------------------------------------------------
Because buffer_size (1326) is already larger than read_len (40), the buffer-growing code above is skipped, and gnutls_record_recv() is then called with a NULL buffer, which triggers the error (GNUTLS_E_INVALID_REQUEST, -50):
------------------------------------------------
(snip)
ssize_t
_gnutls_recv_int(gnutls_session_t session, content_type_t type,
                 gnutls_handshake_description_t htype,
                 gnutls_packet_t *packet,
                 uint8_t * data, size_t data_size, void *seq,
                 unsigned int ms)
{
    int ret;

    if (packet == NULL && (type != GNUTLS_ALERT && type != GNUTLS_HEARTBEAT)
        && (data_size == 0 || data == NULL))
        return gnutls_assert_val(GNUTLS_E_INVALID_REQUEST);
(snip)
ssize_t
gnutls_record_recv(gnutls_session_t session, void *data, size_t data_size)
{
    return _gnutls_recv_int(session, GNUTLS_APPLICATION_DATA, -1, NULL,
                            data, data_size, NULL,
                            session->internals.record_timeout_ms);
}
(snip)
------------------------------------------------
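Just as an illustration (this is not a confirmed fix, and struct remote_buf / ensure_buffer() below are simplified stand-ins of my own, not the actual Pacemaker code), here is a minimal sketch of the kind of defensive check I mean: make sure the buffer is really allocated before gnutls_record_recv() is called, even when buffer_size is already non-zero.
------------------------------------------------
/* Illustration only: simplified stand-in for the receive buffer held in
 * crm_remote_t. */
#include <stdio.h>
#include <stdlib.h>

struct remote_buf {
    char  *buffer;
    size_t buffer_size;
    size_t buffer_offset;
};

/* Hypothetical helper: ensure the buffer exists and can hold read_len
 * bytes, even when buffer_size is already non-zero but the buffer
 * itself is NULL (the state seen in the log above). */
static int
ensure_buffer(struct remote_buf *remote, size_t read_len)
{
    if (remote->buffer == NULL || remote->buffer_size < read_len) {
        size_t new_size = remote->buffer_size;

        if (new_size < 2 * read_len) {
            new_size = 2 * read_len;   /* grow as the existing code does */
        }
        /* A real implementation would keep the old pointer on failure or
         * abort, as realloc_safe()/CRM_ASSERT() do. */
        remote->buffer = realloc(remote->buffer, new_size + 1);
        if (remote->buffer == NULL) {
            return -1;
        }
        remote->buffer_size = new_size;
    }
    return 0;
}

int
main(void)
{
    /* Mimic the logged state: buffer is NULL while buffer_size is 1326. */
    struct remote_buf remote = { NULL, 1326, 0 };

    if (ensure_buffer(&remote, 40) == 0) {
        printf("buffer ready: %zu bytes\n", remote.buffer_size);
    }
    free(remote.buffer);
    return 0;
}
------------------------------------------------
Of course the real question is why the buffer is NULL at this point at all (perhaps it is freed when the first connection is destroyed while buffer_size keeps its old value), so the proper fix may belong somewhere else.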
Best Regards,
Hideo Yamauchi.
----- Original Message -----
> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
> To: "users at clusterlabs.org" <users at clusterlabs.org>
> Cc:
> Date: 2015/5/11, Mon 16:45
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
>
> Hi Ulrich,
>
> Thank you for your comments.
>
>> So your host and your resource are both named "snmp1"? I also don't
>> have much experience with cleaning up resources for a node that is offline.
>> What change should it make (while the node is offline)?
>
>
> The remote resource and the remote node have the same name, "snmp1".
>
>
> (snip)
> primitive snmp1 ocf:pacemaker:remote \
>         params server="snmp1" \
>         op start interval="0s" timeout="60s" on-fail="ignore" \
>         op monitor interval="3s" timeout="15s" \
>         op stop interval="0s" timeout="60s" on-fail="ignore"
>
> primitive Host-rsc1 ocf:heartbeat:Dummy \
>         op start interval="0s" timeout="60s" on-fail="restart" \
>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>         op stop interval="0s" timeout="60s" on-fail="ignore"
>
> primitive Remote-rsc1 ocf:heartbeat:Dummy \
>         op start interval="0s" timeout="60s" on-fail="restart" \
>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>         op stop interval="0s" timeout="60s" on-fail="ignore"
>
> location loc1 Remote-rsc1 \
> rule 200: #uname eq snmp1
> location loc3 Host-rsc1 \
> rule 200: #uname eq bl460g8n1
> (snip)
>
> pacemaker_remoted on the snmp1 node is stopped with SIGTERM.
> Afterwards I restart pacemaker_remoted on the snmp1 node.
> Then I execute the crm_resource command, but the snmp1 node remains offline.
>
> After the crm_resource command has been executed, I think the correct behavior
> is for the snmp1 remote node to come back online.
>
>
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
>
>
> ----- Original Message -----
>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>> Cc:
>> Date: 2015/5/11, Mon 15:39
>> Subject: Antw: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>
>>>>> <renayama19661014 at ybb.ne.jp> wrote on 11.05.2015 at 06:22 in message
>>>>> <361916.15877.qm at web200006.mail.kks.yahoo.co.jp>:
>>> Hi All,
>>>
>>> I matched the OS version of the remote node to the host once again and
>>> confirmed it with Pacemaker 1.1.13-rc2.
>>>
>>> The result was the same even after I made the host RHEL 7.1 (bl460g8n1).
>>> I also made the remote host RHEL 7.1 (snmp1).
>>>
>>> The first crm_resource -C fails.
>>> --------------------------------
>>> [root at bl460g8n1 ~]# crm_resource -C -r snmp1
>>> Cleaning up snmp1 on bl460g8n1
>>> Waiting for 1 replies from the CRMd. OK
>>>
>>> [root at bl460g8n1 ~]# crm_mon -1 -Af
>>> Last updated: Mon May 11 12:44:31 2015
>>> Last change: Mon May 11 12:43:30 2015
>>> Stack: corosync
>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>> Version: 1.1.12-7a2e3ae
>>> 2 Nodes configured
>>> 3 Resources configured
>>>
>>>
>>> Online: [ bl460g8n1 ]
>>> RemoteOFFLINE: [ snmp1 ]
>>
>> So your host and your resource are both named "snmp1"? I also don't
>> have much experience with cleaning up resources for a node that is offline.
>> What change should it make (while the node is offline)?
>>
>>>
>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1 (failure ignored)
>>>
>>> Node Attributes:
>>> * Node bl460g8n1:
>>> + ringnumber_0 : 192.168.101.21 is UP
>>> + ringnumber_1 : 192.168.102.21 is UP
>>>
>>> Migration summary:
>>> * Node bl460g8n1:
>>> snmp1: migration-threshold=1 fail-count=1000000 last-failure='Mon May 11 12:44:28 2015'
>>>
>>> Failed actions:
>>> snmp1_start_0 on bl460g8n1 'unknown error' (1): call=5, status=Timed Out,
>>> exit-reason='none', last-rc-change='Mon May 11 12:43:31 2015', queued=0ms, exec=0ms
>>> --------------------------------
>>>
>>>
>>> The second crm_resource -C succeeded and connected to the remote host.
>>
>> Then the node was online, it seems.
>>
>> Regards,
>> Ulrich
>>
>>> --------------------------------
>>> [root at bl460g8n1 ~]# crm_mon -1 -Af
>>> Last updated: Mon May 11 12:44:54 2015
>>> Last change: Mon May 11 12:44:48 2015
>>> Stack: corosync
>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>> Version: 1.1.12-7a2e3ae
>>> 2 Nodes configured
>>> 3 Resources configured
>>>
>>>
>>> Online: [ bl460g8n1 ]
>>> RemoteOnline: [ snmp1 ]
>>>
>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
>>> snmp1 (ocf::pacemaker:remote): Started bl460g8n1
>>>
>>> Node Attributes:
>>> * Node bl460g8n1:
>>> + ringnumber_0 : 192.168.101.21 is UP
>>> + ringnumber_1 : 192.168.102.21 is UP
>>> * Node snmp1:
>>>
>>> Migration summary:
>>> * Node bl460g8n1:
>>> * Node snmp1:
>>> --------------------------------
>>>
>>> The gnutls on the host and the remote node was the following version:
>>>
>>> gnutls-devel-3.3.8-12.el7.x86_64
>>> gnutls-dane-3.3.8-12.el7.x86_64
>>> gnutls-c++-3.3.8-12.el7.x86_64
>>> gnutls-3.3.8-12.el7.x86_64
>>> gnutls-utils-3.3.8-12.el7.x86_64
>>>
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: "renayama19661014 at ybb.ne.jp"
>> <renayama19661014 at ybb.ne.jp>
>>>> To: Cluster Labs - All topics related to open-source clustering
>> welcomed
>>> <users at clusterlabs.org>
>>>> Cc:
>>>> Date: 2015/4/28, Tue 14:06
>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of
>>> pacemaker_remote.
>>>>
>>>> Hi David,
>>>>
>>>> Even after changing the remote node to RHEL 7.1, the result was the same.
>>>>
>>>>
>>>> This time I will try it with the Pacemaker host node on RHEL 7.1 as well.
>>>>
>>>>
>>>> I noticed an interesting phenomenon.
>>>> The remote node fails to reconnect on the first crm_resource.
>>>> However, the remote node succeeds in reconnecting on the second crm_resource.
>>>>
>>>> I think there is some problem at the point where the connection with the
>>>> remote node is first cut.
>>>>
>>>> Best Regards,
>>>> Hideo Yamauchi.
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "renayama19661014 at ybb.ne.jp"
>>>> <renayama19661014 at ybb.ne.jp>
>>>>> To: Cluster Labs - All topics related to open-source
> clustering
>> welcomed
>>>> <users at clusterlabs.org>
>>>>> Cc:
>>>>> Date: 2015/4/28, Tue 11:52
>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About
> movement of
>>>> pacemaker_remote.
>>>>>
>>>>> Hi David,
>>>>> Thank you for your comments.
>>>>>
>>>>>> At first glance this looks gnutls related. GNUTLS is returning -50 during receive
>>>>>> on the client side (pacemaker's side). -50 maps to 'invalid request'.
>>>>>> >debug: crm_remote_recv_once: TLS receive failed: The request is invalid.
>>>>>> >We treat this error as fatal and destroy the connection. I've never encountered
>>>>>> this error and I don't know what causes it. It's possible there's a bug in
>>>>>> our gnutls usage... it's also possible there's a bug in the version of gnutls
>>>>>> that is in use as well.
>>>>>
>>>>> We built the remote node on RHEL 6.5.
>>>>> Because it may be a problem with gnutls, I will confirm it on RHEL 7.1.
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>