[ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
renayama19661014 at ybb.ne.jp
Tue Aug 4 05:40:13 EDT 2015
Hi Andrew,
> Do you know if this behaviour still exists?
> A LOT of work went into the remote node logic in the last couple of months; it's
> possible this was fixed as a side-effect.
I have not confirmed it with the latest code yet.
I will confirm it.
Many Thanks!
Hideo Yamauchi.
----- Original Message -----
> From: Andrew Beekhof <andrew at beekhof.net>
> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Cc:
> Date: 2015/8/4, Tue 13:16
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
>
>
>> On 12 May 2015, at 12:12 pm, renayama19661014 at ybb.ne.jp wrote:
>>
>> Hi All,
>>
>> The problem seems to be that the buffer becomes NULL after running crm_resource -C, once the remote node has been rebooted.
>>
>> I added logging to the source code and confirmed it.
>>
>> ------------------------------------------------
>> crm_remote_recv_once(crm_remote_t * remote)
>> {
>> (snip)
>>     /* automatically grow the buffer when needed */
>>     if(remote->buffer_size < read_len) {
>>         remote->buffer_size = 2 * read_len;
>>         crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
>>
>>         remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
>>         CRM_ASSERT(remote->buffer != NULL);
>>     }
>>
>> #ifdef HAVE_GNUTLS_GNUTLS_H
>>     if (remote->tls_session) {
>>         if (remote->buffer == NULL) {
>>             crm_info("### YAMAUCHI buffer is NULL [buffer_zie[%d] readlen[%d]", remote->buffer_size, read_len);
>>         }
>>         rc = gnutls_record_recv(*(remote->tls_session),
>>                                 remote->buffer + remote->buffer_offset,
>>                                 remote->buffer_size - remote->buffer_offset);
>> (snip)
>> ------------------------------------------------
>>
>> May 12 10:54:01 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
>> May 12 10:54:02 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
>> May 12 10:54:04 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
>
> Do you know if this behaviour still exists?
> A LOT of work went into the remote node logic in the last couple of months; it's
> possible this was fixed as a side-effect.
>
>>
>> ------------------------------------------------
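
As an illustration only (a hypothetical sketch, not the actual fix discussed in this thread): the snippet above only reallocates when remote->buffer_size < read_len, so if remote->buffer has been freed and set to NULL somewhere else while buffer_size stays large (1326 here, against a 40-byte read), the grow branch is skipped and the NULL pointer is passed straight to gnutls_record_recv(). A defensive guard using only the names visible in that snippet (crm_remote_t, realloc_safe, CRM_ASSERT) could look like this:

------------------------------------------------
/* Hypothetical guard, for illustration only: make sure the receive
 * buffer exists before it is handed to gnutls_record_recv(). */
static void
ensure_recv_buffer(crm_remote_t *remote)
{
    if (remote->buffer == NULL) {
        /* buffer_size can still be >= read_len here, so the "grow"
         * branch above never runs; allocate the buffer explicitly. */
        remote->buffer = realloc_safe(NULL, remote->buffer_size + 1);
        remote->buffer_offset = 0;
        CRM_ASSERT(remote->buffer != NULL);
    }
}
------------------------------------------------
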
>>
>> gnutls_record_recv is then called with a NULL buffer and returns the error.
>>
>> ------------------------------------------------
>> (snip)
>> ssize_t
>> _gnutls_recv_int(gnutls_session_t session, content_type_t type,
>>                  gnutls_handshake_description_t htype,
>>                  gnutls_packet_t *packet,
>>                  uint8_t * data, size_t data_size, void *seq,
>>                  unsigned int ms)
>> {
>>     int ret;
>>
>>     if (packet == NULL && (type != GNUTLS_ALERT && type != GNUTLS_HEARTBEAT)
>>         && (data_size == 0 || data == NULL))
>>         return gnutls_assert_val(GNUTLS_E_INVALID_REQUEST);
>>
>> (snip)
>> ssize_t
>> gnutls_record_recv(gnutls_session_t session, void *data, size_t data_size)
>> {
>>     return _gnutls_recv_int(session, GNUTLS_APPLICATION_DATA, -1, NULL,
>>                             data, data_size, NULL,
>>                             session->internals.record_timeout_ms);
>> }
>> (snip)
>> ------------------------------------------------
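
Going only by the gnutls code quoted above, the NULL-data check in _gnutls_recv_int() runs before any session state is consulted, so GNUTLS_E_INVALID_REQUEST (-50) is returned straight to the caller. A hypothetical stand-alone check (not pacemaker code; the 1326-byte size simply mirrors the log above) would be:

------------------------------------------------
#include <stdio.h>
#include <gnutls/gnutls.h>

int main(void)
{
    gnutls_session_t session;
    ssize_t rc;

    gnutls_global_init();
    gnutls_init(&session, GNUTLS_CLIENT);

    /* NULL receive buffer, as in the crmd log above */
    rc = gnutls_record_recv(session, NULL, 1326);
    printf("rc = %zd (%s)\n", rc, gnutls_strerror((int) rc));

    gnutls_deinit(session);
    gnutls_global_deinit();
    return 0;
}
------------------------------------------------

If the quoted check really is the first thing _gnutls_recv_int() does, this should print rc = -50, "The request is invalid", matching the "TLS receive failed: The request is invalid" message seen later in the thread.
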
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>> ----- Original Message -----
>>> From: "renayama19661014 at ybb.ne.jp"
> <renayama19661014 at ybb.ne.jp>
>>> To: "users at clusterlabs.org" <users at clusterlabs.org>
>>> Cc:
>>> Date: 2015/5/11, Mon 16:45
>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About
> movement of pacemaker_remote.
>>>
>>> Hi Ulrich,
>>>
>>> Thank you for comments.
>>>
>>>> So your host and your resource are both named "snmp1"? I also don't
>>>> have much experience with cleaning up resources for a node that is offline. What
>>>> change should it make (while the node is offline)?
>>>
>>>
>>> The remote resource and the remote node both have the same name, "snmp1".
>>>
>>>
>>> (snip)
>>> primitive snmp1 ocf:pacemaker:remote \
>>>         params \
>>>                 server="snmp1" \
>>>         op start interval="0s" timeout="60s" on-fail="ignore" \
>>>         op monitor interval="3s" timeout="15s" \
>>>         op stop interval="0s" timeout="60s" on-fail="ignore"
>>>
>>> primitive Host-rsc1 ocf:heartbeat:Dummy \
>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>         op stop interval="0s" timeout="60s" on-fail="ignore"
>>>
>>> primitive Remote-rsc1 ocf:heartbeat:Dummy \
>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>         op stop interval="0s" timeout="60s" on-fail="ignore"
>>>
>>> location loc1 Remote-rsc1 \
>>>         rule 200: #uname eq snmp1
>>> location loc3 Host-rsc1 \
>>>         rule 200: #uname eq bl460g8n1
>>> (snip)
>>>
>>> The pacemaker_remoted on the snmp1 node is stopped with SIGTERM.
>>> I then restart pacemaker_remoted on the snmp1 node.
>>> After that I run the crm_resource command, but the snmp1 node remains offline.
>>>
>>> Once the crm_resource command has been run, I think the correct behaviour is for the snmp1 node to come back online.
>>>
>>>
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>>>> Cc:
>>>> Date: 2015/5/11, Mon 15:39
>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>>
>>>>>>> <renayama19661014 at ybb.ne.jp> wrote on 11.05.2015 at 06:22 in message
>>>> <361916.15877.qm at web200006.mail.kks.yahoo.co.jp>:
>>>>> Hi All,
>>>>>
>>>>> I matched the OS version of the remote node to the host once again and
>>>>> confirmed the behaviour with Pacemaker 1.1.13-rc2.
>>>>>
>>>>> It was the same even when I made the host RHEL7.1 (bl460g8n1).
>>>>> I made the remote host RHEL7.1 (snmp1).
>>>>>
>>>>> The first crm_resource -C fails.
>>>>> --------------------------------
>>>>> [root at bl460g8n1 ~]# crm_resource -C -r snmp1
>>>>> Cleaning up snmp1 on bl460g8n1
>>>>> Waiting for 1 replies from the CRMd. OK
>>>>>
>>>>> [root at bl460g8n1 ~]# crm_mon -1 -Af
>>>>> Last updated: Mon May 11 12:44:31 2015
>>>>> Last change: Mon May 11 12:43:30 2015
>>>>> Stack: corosync
>>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>>> Version: 1.1.12-7a2e3ae
>>>>> 2 Nodes configured
>>>>> 3 Resources configured
>>>>>
>>>>>
>>>>> Online: [ bl460g8n1 ]
>>>>> RemoteOFFLINE: [ snmp1 ]
>>>>
>>>> So your host and your resource are both named "snmp1"? I also don't
>>>> have much experience with cleaning up resources for a node that is offline. What
>>>> change should it make (while the node is offline)?
>>>>
>>>>>
>>>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1 (failure ignored)
>>>>>
>>>>> Node Attributes:
>>>>> * Node bl460g8n1:
>>>>> + ringnumber_0 : 192.168.101.21 is UP
>>>>> + ringnumber_1 : 192.168.102.21 is UP
>>>>>
>>>>> Migration summary:
>>>>> * Node bl460g8n1:
>>>>>    snmp1: migration-threshold=1 fail-count=1000000 last-failure='Mon May 11 12:44:28 2015'
>>>>>
>>>>> Failed actions:
>>>>>     snmp1_start_0 on bl460g8n1 'unknown error' (1): call=5, status=Timed Out, exit-reason='none', last-rc-change='Mon May 11 12:43:31 2015', queued=0ms, exec=0ms
>>>>> --------------------------------
>>>>>
>>>>>
>>>>> The second crm_resource -C succeeded and the connection to the remote host was established.
>>>>
>>>> Then the node was online, it seems.
>>>>
>>>> Regards,
>>>> Ulrich
>>>>
>>>>> --------------------------------
>>>>> [root at bl460g8n1 ~]# crm_mon -1 -Af
>>>>> Last updated: Mon May 11 12:44:54 2015
>>>>> Last change: Mon May 11 12:44:48 2015
>>>>> Stack: corosync
>>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>>> Version: 1.1.12-7a2e3ae
>>>>> 2 Nodes configured
>>>>> 3 Resources configured
>>>>>
>>>>>
>>>>> Online: [ bl460g8n1 ]
>>>>> RemoteOnline: [ snmp1 ]
>>>>>
>>>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
>>>>> snmp1 (ocf::pacemaker:remote): Started bl460g8n1
>>>>>
>>>>> Node Attributes:
>>>>> * Node bl460g8n1:
>>>>> + ringnumber_0 : 192.168.101.21 is UP
>>>>> + ringnumber_1 : 192.168.102.21 is UP
>>>>> * Node snmp1:
>>>>>
>>>>> Migration summary:
>>>>> * Node bl460g8n1:
>>>>> * Node snmp1:
>>>>> --------------------------------
>>>>>
>>>>> The gnutls on the host and the remote node was the following version.
>>>>>
>>>>> gnutls-devel-3.3.8-12.el7.x86_64
>>>>> gnutls-dane-3.3.8-12.el7.x86_64
>>>>> gnutls-c++-3.3.8-12.el7.x86_64
>>>>> gnutls-3.3.8-12.el7.x86_64
>>>>> gnutls-utils-3.3.8-12.el7.x86_64
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "renayama19661014 at ybb.ne.jp"
>>>> <renayama19661014 at ybb.ne.jp>
>>>>>> To: Cluster Labs - All topics related to open-source
> clustering
>>>> welcomed
>>>>> <users at clusterlabs.org>
>>>>>> Cc:
>>>>>> Date: 2015/4/28, Tue 14:06
>>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About
> movement of
>>>>> pacemaker_remote.
>>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> Even after changing the remote node to RHEL7.1, the result was the same.
>>>>>>
>>>>>>
>>>>>> This time I will try it with the Pacemaker host node on RHEL7.1 as well.
>>>>>>
>>>>>>
>>>>>> I noticed an interesting phenomenon.
>>>>>> The remote node fails to reconnect after the first crm_resource.
>>>>>> However, the remote node succeeds in reconnecting after the second crm_resource.
>>>>>>
>>>>>> I think there is some problem at the point where the connection with the remote node is first cut.
>>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "renayama19661014 at ybb.ne.jp"
>>>>>> <renayama19661014 at ybb.ne.jp>
>>>>>>> To: Cluster Labs - All topics related to open-source
>>> clustering
>>>> welcomed
>>>>>> <users at clusterlabs.org>
>>>>>>> Cc:
>>>>>>> Date: 2015/4/28, Tue 11:52
>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About
>
>>> movement of
>>>>>> pacemaker_remote.
>>>>>>>
>>>>>>> Hi David,
>>>>>>> Thank you for your comments.
>>>>>>>> At first glance this looks gnutls related. GNUTLS is returning -50 during receive
>>>>>>>> on the client side (pacemaker's side). -50 maps to 'invalid request'.
>>>>>>>> >debug: crm_remote_recv_once: TLS receive failed: The request is invalid.
>>>>>>>> >We treat this error as fatal and destroy the connection. I've never encountered
>>>>>>>> this error and I don't know what causes it. It's possible there's a bug in
>>>>>>>> our gnutls usage... it's also possible there's a bug in the version of gnutls
>>>>>>>> that is in use as well.
>>>>>>> We built the remote node on RHEL6.5.
>>>>>>> Because it may be a problem with gnutls, I will confirm it on RHEL7.1.
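
For context, an illustrative sketch only (not crmd's actual receive path): gnutls callers usually separate the transient GNUTLS_E_AGAIN / GNUTLS_E_INTERRUPTED codes from fatal ones such as GNUTLS_E_INVALID_REQUEST (-50) using gnutls_error_is_fatal(), which is the kind of decision behind "we treat this error as fatal and destroy the connection" above:

------------------------------------------------
#include <gnutls/gnutls.h>

/* Illustrative only: returns 1 to retry later, 0 on success, -1 on a
 * fatal error (e.g. -50, "The request is invalid"). */
static int
handle_recv(gnutls_session_t session, void *buf, size_t buf_len)
{
    ssize_t rc = gnutls_record_recv(session, buf, buf_len);

    if (rc >= 0) {
        return 0;   /* rc bytes of application data were received */
    }
    if (rc == GNUTLS_E_AGAIN || rc == GNUTLS_E_INTERRUPTED) {
        return 1;   /* transient: retry the read later */
    }
    if (gnutls_error_is_fatal((int) rc)) {
        return -1;  /* fatal: give up and tear the connection down */
    }
    return 1;       /* non-fatal warning; safe to keep the session */
}
------------------------------------------------
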
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Hideo Yamauchi.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list: Users at clusterlabs.org
>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
More information about the Users mailing list