[ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
renayama19661014 at ybb.ne.jp
Tue Aug 4 05:40:13 EDT 2015
Hi Andrew,
> Do you know if this behaviour still exists?
> A LOT of work went into the remote node logic in the last couple of months; it's
> possible this was fixed as a side-effect.
I have not confirmed it with the latest code yet.
I will confirm it.
Many Thanks!
Hideo Yamauchi.
----- Original Message -----
> From: Andrew Beekhof <andrew at beekhof.net>
> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Cc:
> Date: 2015/8/4, Tue 13:16
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
>
>
>> On 12 May 2015, at 12:12 pm, renayama19661014 at ybb.ne.jp wrote:
>>
>> Hi All,
>>
>> The problem seems to be that the buffer becomes NULL after running crm_resource -C, once the remote node has been rebooted.
>>
>> I added logging to the source code and confirmed it.
>>
>> ------------------------------------------------
>> crm_remote_recv_once(crm_remote_t * remote)
>> {
>> (snip)
>>     /* automatically grow the buffer when needed */
>>     if(remote->buffer_size < read_len) {
>>         remote->buffer_size = 2 * read_len;
>>         crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
>>
>>         remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
>>         CRM_ASSERT(remote->buffer != NULL);
>>     }
>>
>> #ifdef HAVE_GNUTLS_GNUTLS_H
>>     if (remote->tls_session) {
>>         if (remote->buffer == NULL) {
>>             crm_info("### YAMAUCHI buffer is NULL [buffer_zie[%d] readlen[%d]", remote->buffer_size, read_len);
>>         }
>>         rc = gnutls_record_recv(*(remote->tls_session),
>>                                 remote->buffer + remote->buffer_offset,
>>                                 remote->buffer_size - remote->buffer_offset);
>> (snip)
>> ------------------------------------------------
>>
>> May 12 10:54:01 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
>> May 12 10:54:02 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
>> May 12 10:54:04 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
>
> Do you know if this behaviour still exists?
> A LOT of work went into the remote node logic in the last couple of months; it's
> possible this was fixed as a side-effect.
>
>>
>> ------------------------------------------------
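
As an illustration only (a hypothetical sketch, not the actual fix discussed in this thread): the snippet above only reallocates when remote->buffer_size < read_len, so if remote->buffer has been freed and set to NULL somewhere else while buffer_size stays large (1326 here, against a 40-byte read), the grow branch is skipped and the NULL pointer is passed straight to gnutls_record_recv(). A defensive guard using only the names visible in that snippet (crm_remote_t, realloc_safe, CRM_ASSERT) could look like this:

------------------------------------------------
/* Hypothetical guard, for illustration only: make sure the receive
 * buffer exists before it is handed to gnutls_record_recv(). */
static void
ensure_recv_buffer(crm_remote_t *remote)
{
    if (remote->buffer == NULL) {
        /* buffer_size can still be >= read_len here, so the "grow"
         * branch above never runs; allocate the buffer explicitly. */
        remote->buffer = realloc_safe(NULL, remote->buffer_size + 1);
        remote->buffer_offset = 0;
        CRM_ASSERT(remote->buffer != NULL);
    }
}
------------------------------------------------
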
>>
>> gnutls_record_recv is then called with a NULL buffer and returns the error.
>>
>> ------------------------------------------------
>> (snip)
>> ssize_t
>> _gnutls_recv_int(gnutls_session_t session, content_type_t type,
>>                  gnutls_handshake_description_t htype,
>>                  gnutls_packet_t *packet,
>>                  uint8_t * data, size_t data_size, void *seq,
>>                  unsigned int ms)
>> {
>>     int ret;
>>
>>     if (packet == NULL && (type != GNUTLS_ALERT && type != GNUTLS_HEARTBEAT)
>>         && (data_size == 0 || data == NULL))
>>         return gnutls_assert_val(GNUTLS_E_INVALID_REQUEST);
>>
>> (snip)
>> ssize_t
>> gnutls_record_recv(gnutls_session_t session, void *data, size_t data_size)
>> {
>>     return _gnutls_recv_int(session, GNUTLS_APPLICATION_DATA, -1, NULL,
>>                             data, data_size, NULL,
>>                             session->internals.record_timeout_ms);
>> }
>> (snip)
>> ------------------------------------------------
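
Going only by the gnutls code quoted above, the NULL-data check in _gnutls_recv_int() runs before any session state is consulted, so GNUTLS_E_INVALID_REQUEST (-50) is returned straight to the caller. A hypothetical stand-alone check (not pacemaker code; the 1326-byte size simply mirrors the log above) would be:

------------------------------------------------
#include <stdio.h>
#include <gnutls/gnutls.h>

int main(void)
{
    gnutls_session_t session;
    ssize_t rc;

    gnutls_global_init();
    gnutls_init(&session, GNUTLS_CLIENT);

    /* NULL receive buffer, as in the crmd log above */
    rc = gnutls_record_recv(session, NULL, 1326);
    printf("rc = %zd (%s)\n", rc, gnutls_strerror((int) rc));

    gnutls_deinit(session);
    gnutls_global_deinit();
    return 0;
}
------------------------------------------------

If the quoted check really is the first thing _gnutls_recv_int() does, this should print rc = -50, "The request is invalid", matching the "TLS receive failed: The request is invalid" message seen later in the thread.
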
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>> ----- Original Message -----
>>> From: "renayama19661014 at ybb.ne.jp"
> <renayama19661014 at ybb.ne.jp>
>>> To: "users at clusterlabs.org" <users at clusterlabs.org>
>>> Cc:
>>> Date: 2015/5/11, Mon 16:45
>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About
> movement of pacemaker_remote.
>>>
>>> Hi Ulrich,
>>>
>>> Thank you for comments.
>>>
>>>> So your host and your resource are both named "snmp1"? I also don't
>>>> have much experience with cleaning up resources for a node that is offline. What
>>>> change should it make (while the node is offline)?
>>>
>>>
>>> The remote resource and the remote node both have the same name, "snmp1".
>>>
>>>
>>> (snip)
>>> primitive snmp1 ocf:pacemaker:remote \
>>>         params \
>>>                 server="snmp1" \
>>>         op start interval="0s" timeout="60s" on-fail="ignore" \
>>>         op monitor interval="3s" timeout="15s" \
>>>         op stop interval="0s" timeout="60s" on-fail="ignore"
>>>
>>> primitive Host-rsc1 ocf:heartbeat:Dummy \
>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>         op stop interval="0s" timeout="60s" on-fail="ignore"
>>>
>>> primitive Remote-rsc1 ocf:heartbeat:Dummy \
>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>         op stop interval="0s" timeout="60s" on-fail="ignore"
>>>
>>> location loc1 Remote-rsc1 \
>>>         rule 200: #uname eq snmp1
>>> location loc3 Host-rsc1 \
>>>         rule 200: #uname eq bl460g8n1
>>> (snip)
>>>
>>> The pacemaker_remoted on the snmp1 node is stopped with SIGTERM.
>>> I then restart pacemaker_remoted on the snmp1 node.
>>> After that I run the crm_resource command, but the snmp1 node remains offline.
>>>
>>> Once the crm_resource command has been run, I think the correct behaviour is for the snmp1 node to come back online.
>>>
>>>
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>>>> Cc:
>>>> Date: 2015/5/11, Mon 15:39
>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>>
>>>>>>> <renayama19661014 at ybb.ne.jp> wrote on 11.05.2015 at 06:22 in message
>>>> <361916.15877.qm at web200006.mail.kks.yahoo.co.jp>:
>>>>> Hi All,
>>>>>
>>>>> I matched the OS version of the remote node to the host once again and
>>>>> confirmed the behaviour with Pacemaker 1.1.13-rc2.
>>>>>
>>>>> It was the same even when I made the host RHEL7.1 (bl460g8n1).
>>>>> I made the remote host RHEL7.1 (snmp1).
>>>>>
>>>>> The first crm_resource -C fails.
>>>>> --------------------------------
>>>>> [root at bl460g8n1 ~]# crm_resource -C -r snmp1
>>>>> Cleaning up snmp1 on bl460g8n1
>>>>> Waiting for 1 replies from the CRMd. OK
>>>>>
>>>>> [root at bl460g8n1 ~]# crm_mon -1 -Af
>>>>> Last updated: Mon May 11 12:44:31 2015
>>>>> Last change: Mon May 11 12:43:30 2015
>>>>> Stack: corosync
>>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>>> Version: 1.1.12-7a2e3ae
>>>>> 2 Nodes configured
>>>>> 3 Resources configured
>>>>>
>>>>>
>>>>> Online: [ bl460g8n1 ]
>>>>> RemoteOFFLINE: [ snmp1 ]
>>>>
>>>> So your host and your resource are both named "snmp1"? I also don't
>>>> have much experience with cleaning up resources for a node that is offline. What
>>>> change should it make (while the node is offline)?
>>>>
>>>>>
>>>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1 (failure ignored)
>>>>>
>>>>> Node Attributes:
>>>>> * Node bl460g8n1:
>>>>> + ringnumber_0 : 192.168.101.21 is UP
>>>>> + ringnumber_1 : 192.168.102.21 is UP
>>>>>
>>>>> Migration summary:
>>>>> * Node bl460g8n1:
>>>>>    snmp1: migration-threshold=1 fail-count=1000000 last-failure='Mon May 11 12:44:28 2015'
>>>>>
>>>>> Failed actions:
>>>>>     snmp1_start_0 on bl460g8n1 'unknown error' (1): call=5, status=Timed Out, exit-reason='none', last-rc-change='Mon May 11 12:43:31 2015', queued=0ms, exec=0ms
>>>>> --------------------------------
>>>>>
>>>>>
>>>>> The second crm_resource -C succeeded and the connection to the remote host was established.
>>>>
>>>> Then the node was online, it seems.
>>>>
>>>> Regards,
>>>> Ulrich
>>>>
>>>>> --------------------------------
>>>>> [root at bl460g8n1 ~]# crm_mon -1 -Af
>>>>> Last updated: Mon May 11 12:44:54 2015
>>>>> Last change: Mon May 11 12:44:48 2015
>>>>> Stack: corosync
>>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>>> Version: 1.1.12-7a2e3ae
>>>>> 2 Nodes configured
>>>>> 3 Resources configured
>>>>>
>>>>>
>>>>> Online: [ bl460g8n1 ]
>>>>> RemoteOnline: [ snmp1 ]
>>>>>
>>>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
>>>>> snmp1 (ocf::pacemaker:remote): Started bl460g8n1
>>>>>
>>>>> Node Attributes:
>>>>> * Node bl460g8n1:
>>>>> + ringnumber_0 : 192.168.101.21 is UP
>>>>> + ringnumber_1 : 192.168.102.21 is UP
>>>>> * Node snmp1:
>>>>>
>>>>> Migration summary:
>>>>> * Node bl460g8n1:
>>>>> * Node snmp1:
>>>>> --------------------------------
>>>>>
>>>>> The gnutls on the host and the remote node was the following version.
>>>>>
>>>>> gnutls-devel-3.3.8-12.el7.x86_64
>>>>> gnutls-dane-3.3.8-12.el7.x86_64
>>>>> gnutls-c++-3.3.8-12.el7.x86_64
>>>>> gnutls-3.3.8-12.el7.x86_64
>>>>> gnutls-utils-3.3.8-12.el7.x86_64
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "renayama19661014 at ybb.ne.jp"
>>>> <renayama19661014 at ybb.ne.jp>
>>>>>> To: Cluster Labs - All topics related to open-source
> clustering
>>>> welcomed
>>>>> <users at clusterlabs.org>
>>>>>> Cc:
>>>>>> Date: 2015/4/28, Tue 14:06
>>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About
> movement of
>>>>> pacemaker_remote.
>>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> Even after changing the remote node to RHEL7.1, the result was the same.
>>>>>>
>>>>>>
>>>>>> This time I will try it with the Pacemaker host node on RHEL7.1 as well.
>>>>>>
>>>>>>
>>>>>> I noticed an interesting phenomenon.
>>>>>> The remote node fails to reconnect after the first crm_resource.
>>>>>> However, the remote node succeeds in reconnecting after the second crm_resource.
>>>>>>
>>>>>> I think there is some problem at the point where the connection with the remote node is first cut.
>>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "renayama19661014 at ybb.ne.jp"
>>>>>> <renayama19661014 at ybb.ne.jp>
>>>>>>> To: Cluster Labs - All topics related to open-source
>>> clustering
>>>> welcomed
>>>>>> <users at clusterlabs.org>
>>>>>>> Cc:
>>>>>>> Date: 2015/4/28, Tue 11:52
>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About
>
>>> movement of
>>>>>> pacemaker_remote.
>>>>>>>
>>>>>>> Hi David,
>>>>>>> Thank you for your comments.
>>>>>>>> At first glance this looks gnutls related. GNUTLS is returning -50 during receive
>>>>>>>> on the client side (pacemaker's side). -50 maps to 'invalid request'.
>>>>>>>> >debug: crm_remote_recv_once: TLS receive failed: The request is invalid.
>>>>>>>> >We treat this error as fatal and destroy the connection. I've never encountered
>>>>>>>> this error and I don't know what causes it. It's possible there's a bug in
>>>>>>>> our gnutls usage... it's also possible there's a bug in the version of gnutls
>>>>>>>> that is in use as well.
>>>>>>> We built the remote node on RHEL6.5.
>>>>>>> Because it may be a problem with gnutls, I will confirm it on RHEL7.1.
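
For context, an illustrative sketch only (not crmd's actual receive path): gnutls callers usually separate the transient GNUTLS_E_AGAIN / GNUTLS_E_INTERRUPTED codes from fatal ones such as GNUTLS_E_INVALID_REQUEST (-50) using gnutls_error_is_fatal(), which is the kind of decision behind "we treat this error as fatal and destroy the connection" above:

------------------------------------------------
#include <gnutls/gnutls.h>

/* Illustrative only: returns 1 to retry later, 0 on success, -1 on a
 * fatal error (e.g. -50, "The request is invalid"). */
static int
handle_recv(gnutls_session_t session, void *buf, size_t buf_len)
{
    ssize_t rc = gnutls_record_recv(session, buf, buf_len);

    if (rc >= 0) {
        return 0;   /* rc bytes of application data were received */
    }
    if (rc == GNUTLS_E_AGAIN || rc == GNUTLS_E_INTERRUPTED) {
        return 1;   /* transient: retry the read later */
    }
    if (gnutls_error_is_fatal((int) rc)) {
        return -1;  /* fatal: give up and tear the connection down */
    }
    return 1;       /* non-fatal warning; safe to keep the session */
}
------------------------------------------------
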
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Hideo Yamauchi.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list: Users at clusterlabs.org
>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
More information about the Users mailing list