[ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
renayama19661014 at ybb.ne.jp
Wed Aug 5 02:00:50 EDT 2015
Hi Andrew,
>> Do you know if this behaviour still exists?
>> A LOT of work went into the remote node logic in the last couple of months,
>> it's possible this was fixed as a side-effect.
>
> I have not confirmed it with the latest version yet.
> I will confirm it.
I confirmed it with the latest Pacemaker (pacemaker-eefdc909a41b571dc2e155f7b14b5ef0368f2de7).
The problem still occurs.
On the first cleanup, Pacemaker fails to connect to pacemaker_remote.
The second cleanup succeeds.
The problem does not seem to have been fixed.
I added my log statement to the latest source again.
-------
(snip)
static size_t
crm_remote_recv_once(crm_remote_t * remote)
{
    int rc = 0;
    size_t read_len = sizeof(struct crm_remote_header_v0);
    struct crm_remote_header_v0 *header = crm_remote_header(remote);

    if(header) {
        /* Stop at the end of the current message */
        read_len = header->size_total;
    }

    /* automatically grow the buffer when needed */
    if(remote->buffer_size < read_len) {
        remote->buffer_size = 2 * read_len;
        crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
        remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
        CRM_ASSERT(remote->buffer != NULL);
    }

#ifdef HAVE_GNUTLS_GNUTLS_H
    if (remote->tls_session) {
        if (remote->buffer == NULL) {
            crm_info("### YAMAUCHI buffer is NULL [buffer_zie[%d] readlen[%d]", remote->buffer_size, read_len);
        }
        rc = gnutls_record_recv(*(remote->tls_session),
                                remote->buffer + remote->buffer_offset,
                                remote->buffer_size - remote->buffer_offset);
(snip)
-------
My log message is printed when Pacemaker's first attempt to connect to the remote node fails.
It is not printed on the second connection attempt.
[root@sl7-01 ~]# tail -f /var/log/messages | grep YAMA
Aug 5 14:46:25 sl7-01 crmd[21306]: info: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
Aug 5 14:46:26 sl7-01 crmd[21306]: info: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
Aug 5 14:46:28 sl7-01 crmd[21306]: info: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
Aug 5 14:46:30 sl7-01 crmd[21306]: info: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
Aug 5 14:46:31 sl7-01 crmd[21306]: info: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
(snip)
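
The values in the log lines above (buffer is NULL, buffer_size is 1326, read_len is 40) suggest why the buffer is never reallocated: the grow check in crm_remote_recv_once compares sizes only, so a stale non-zero buffer_size hides the NULL pointer. The following is only a minimal sketch of that reasoning, not Pacemaker or gnutls code. The names struct remote_buf, size_only_check_reallocates, recv_args_check, ensure_buffer and SKETCH_E_INVALID_REQUEST are hypothetical, and the guard in ensure_buffer is an assumption about one possible workaround, not a confirmed upstream fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the three crm_remote_t fields used in the snippet. */
struct remote_buf {
    char *buffer;
    size_t buffer_size;
    size_t buffer_offset;
};

/* Stand-in for GNUTLS_E_INVALID_REQUEST (reported as -50 in this thread). */
#define SKETCH_E_INVALID_REQUEST (-50)

/* Same condition as the "grow the buffer" check in crm_remote_recv_once:
 * it looks only at buffer_size and never at the pointer itself. */
int size_only_check_reallocates(const struct remote_buf *r, size_t read_len)
{
    return r->buffer_size < read_len;
}

/* Mirrors the argument test quoted from _gnutls_recv_int for the
 * application-data case: a NULL pointer or zero length is rejected. */
int recv_args_check(const void *data, size_t data_size)
{
    if (data_size == 0 || data == NULL) {
        return SKETCH_E_INVALID_REQUEST;
    }
    return 0;
}

/* Assumed workaround (not the upstream fix): also allocate when the
 * pointer is NULL, even if buffer_size still holds an old value. */
void ensure_buffer(struct remote_buf *r, size_t read_len)
{
    if (r->buffer == NULL || r->buffer_size < read_len) {
        if (r->buffer_size < read_len) {
            r->buffer_size = 2 * read_len;
        }
        r->buffer = realloc(r->buffer, r->buffer_size + 1);
        assert(r->buffer != NULL);
    }
}
```

With buffer = NULL, buffer_size = 1326 and read_len = 40, size_only_check_reallocates returns 0, so nothing is allocated and recv_args_check rejects the NULL pointer. That would be consistent with the "request is invalid" error seen on the first cleanup. Where the buffer is freed without buffer_size being reset is not shown here and still needs to be found in the real source.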
Best Regards,
Hideo Yamauchi.
----- Original Message -----
> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Cc:
> Date: 2015/8/4, Tue 18:40
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
>
> Hi Andrew,
>
>> Do you know if this behaviour still exists?
>> A LOT of work went into the remote node logic in the last couple of months,
>> it's possible this was fixed as a side-effect.
>
> I have not confirmed it with the latest version yet.
> I will confirm it.
>
> Many Thanks!
> Hideo Yamauchi.
>
>
> ----- Original Message -----
>> From: Andrew Beekhof <andrew at beekhof.net>
>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Cc:
>> Date: 2015/8/4, Tue 13:16
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
>>
>>
>>> On 12 May 2015, at 12:12 pm, renayama19661014 at ybb.ne.jp wrote:
>>>
>>> Hi All,
>>>
>>> The problem appears to be that the buffer becomes NULL when crm_resource -C is run after the remote node has been rebooted.
>>>
>>> I added a log statement to the source code and confirmed it.
>>>
>>> ------------------------------------------------
>>> crm_remote_recv_once(crm_remote_t * remote)
>>> {
>>> (snip)
>>>     /* automatically grow the buffer when needed */
>>>     if(remote->buffer_size < read_len) {
>>>         remote->buffer_size = 2 * read_len;
>>>         crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
>>>
>>>         remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
>>>         CRM_ASSERT(remote->buffer != NULL);
>>>     }
>>>
>>> #ifdef HAVE_GNUTLS_GNUTLS_H
>>>     if (remote->tls_session) {
>>>         if (remote->buffer == NULL) {
>>>             crm_info("### YAMAUCHI buffer is NULL [buffer_zie[%d] readlen[%d]", remote->buffer_size, read_len);
>>>         }
>>>         rc = gnutls_record_recv(*(remote->tls_session),
>>>                                 remote->buffer + remote->buffer_offset,
>>>                                 remote->buffer_size - remote->buffer_offset);
>>> (snip)
>>> ------------------------------------------------
>>>
>>> May 12 10:54:01 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
>>> May 12 10:54:02 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
>>> May 12 10:54:04 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
>>
>> Do you know if this behaviour still exists?
>> A LOT of work went into the remote node logic in the last couple of months,
>> it's possible this was fixed as a side-effect.
>>
>>>
>>> ------------------------------------------------
>>>
>>> gnutls_record_recv is then called with a NULL buffer and returns an error.
>>>
>>> ------------------------------------------------
>>> (snip)
>>> ssize_t
>>> _gnutls_recv_int(gnutls_session_t session, content_type_t type,
>>>                  gnutls_handshake_description_t htype,
>>>                  gnutls_packet_t *packet,
>>>                  uint8_t * data, size_t data_size, void *seq,
>>>                  unsigned int ms)
>>> {
>>>     int ret;
>>>
>>>     if (packet == NULL && (type != GNUTLS_ALERT && type != GNUTLS_HEARTBEAT)
>>>         && (data_size == 0 || data == NULL))
>>>         return gnutls_assert_val(GNUTLS_E_INVALID_REQUEST);
>>>
>>> (snip)
>>> ssize_t
>>> gnutls_record_recv(gnutls_session_t session, void *data, size_t data_size)
>>> {
>>>     return _gnutls_recv_int(session, GNUTLS_APPLICATION_DATA, -1, NULL,
>>>                             data, data_size, NULL,
>>>                             session->internals.record_timeout_ms);
>>> }
>>> (snip)
>>> ------------------------------------------------
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>> To: "users at clusterlabs.org" <users at clusterlabs.org>
>>>> Cc:
>>>> Date: 2015/5/11, Mon 16:45
>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
>>>>
>>>> Hi Ulrich,
>>>>
>>>> Thank you for your comments.
>>>>
>>>>> So your host and your resource are both named "snmp1"? I also don't
>>>>> have much experience with cleaning up resources for a node that is offline.
>>>>> What change should it make (while the node is offline)?
>>>>
>>>>
>>>> Yes, the remote resource and the remote node are both named "snmp1".
>>>>
>>>>
>>>> (snip)
>>>> primitive snmp1 ocf:pacemaker:remote \
>>>>         params \
>>>>                 server="snmp1" \
>>>>         op start interval="0s" timeout="60s" on-fail="ignore" \
>>>>         op monitor interval="3s" timeout="15s" \
>>>>         op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>
>>>> primitive Host-rsc1 ocf:heartbeat:Dummy \
>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>         op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>
>>>> primitive Remote-rsc1 ocf:heartbeat:Dummy \
>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>         op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>
>>>> location loc1 Remote-rsc1 \
>>>>         rule 200: #uname eq snmp1
>>>> location loc3 Host-rsc1 \
>>>>         rule 200: #uname eq bl460g8n1
>>>> (snip)
>>>>
>>>> The pacemaker_remoted on the snmp1 node is stopped with SIGTERM.
>>>> I then restart pacemaker_remoted on the snmp1 node.
>>>> After that I run the crm_resource command, but the snmp1 node remains offline.
>>>>
>>>> I think the correct behaviour is for the remote node snmp1 to come online once the crm_resource command has been run.
>>>>
>>>>
>>>>
>>>> Best Regards,
>>>> Hideo Yamauchi.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>>>>> Cc:
>>>>> Date: 2015/5/11, Mon 15:39
>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>>>
>>>>>>>> <renayama19661014 at ybb.ne.jp> wrote on 11.05.2015 at 06:22 in message
>>>>> <361916.15877.qm at web200006.mail.kks.yahoo.co.jp>:
>>>>>> Hi All,
>>>>>>
>>>>>> I matched the OS version of the remote node to that of the host once again and
>>>>>> confirmed the behaviour with Pacemaker 1.1.13-rc2.
>>>>>>
>>>>>> The result was the same even with RHEL7.1 on the host (bl460g8n1).
>>>>>> The remote host (snmp1) was also RHEL7.1.
>>>>>>
>>>>>> The first crm_resource -C fails.
>>>>>> --------------------------------
>>>>>> [root@bl460g8n1 ~]# crm_resource -C -r snmp1
>>>>>> Cleaning up snmp1 on bl460g8n1
>>>>>> Waiting for 1 replies from the CRMd. OK
>>>>>>
>>>>>> [root@bl460g8n1 ~]# crm_mon -1 -Af
>>>>>> Last updated: Mon May 11 12:44:31 2015
>>>>>> Last change: Mon May 11 12:43:30 2015
>>>>>> Stack: corosync
>>>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>>>> Version: 1.1.12-7a2e3ae
>>>>>> 2 Nodes configured
>>>>>> 3 Resources configured
>>>>>>
>>>>>>
>>>>>> Online: [ bl460g8n1 ]
>>>>>> RemoteOFFLINE: [ snmp1 ]
>>>>>
>>>>> So your host and your resource are both named "snmp1"? I also don't
>>>>> have much experience with cleaning up resources for a node that is offline.
>>>>> What change should it make (while the node is offline)?
>>>>>
>>>>>>
>>>>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>>>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1 (failure ignored)
>>>>>>
>>>>>> Node Attributes:
>>>>>> * Node bl460g8n1:
>>>>>> + ringnumber_0 : 192.168.101.21 is UP
>>>>>> + ringnumber_1 : 192.168.102.21 is UP
>>>>>>
>>>>>> Migration summary:
>>>>>> * Node bl460g8n1:
>>>>>> snmp1: migration-threshold=1 fail-count=1000000 last-failure='Mon May 11 12:44:28 2015'
>>>>>>
>>>>>> Failed actions:
>>>>>> snmp1_start_0 on bl460g8n1 'unknown error' (1): call=5, status=Timed Out, exit-reason='none', last-rc-change='Mon May 11 12:43:31 2015', queued=0ms, exec=0ms
>>>>>> --------------------------------
>>>>>>
>>>>>>
>>>>>> The second crm_resource -C succeeded and connected to the remote host.
>>>>>
>>>>> Then the node was online, it seems.
>>>>>
>>>>> Regards,
>>>>> Ulrich
>>>>>
>>>>>> --------------------------------
>>>>>> [root@bl460g8n1 ~]# crm_mon -1 -Af
>>>>>> Last updated: Mon May 11 12:44:54 2015
>>>>>> Last change: Mon May 11 12:44:48 2015
>>>>>> Stack: corosync
>>>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>>>> Version: 1.1.12-7a2e3ae
>>>>>> 2 Nodes configured
>>>>>> 3 Resources configured
>>>>>>
>>>>>>
>>>>>> Online: [ bl460g8n1 ]
>>>>>> RemoteOnline: [ snmp1 ]
>>>>>>
>>>>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>>>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
>>>>>> snmp1 (ocf::pacemaker:remote): Started bl460g8n1
>>>>>>
>>>>>> Node Attributes:
>>>>>> * Node bl460g8n1:
>>>>>> + ringnumber_0 : 192.168.101.21 is UP
>>>>>> + ringnumber_1 : 192.168.102.21 is UP
>>>>>> * Node snmp1:
>>>>>>
>>>>>> Migration summary:
>>>>>> * Node bl460g8n1:
>>>>>> * Node snmp1:
>>>>>> --------------------------------
>>>>>>
>>>>>> The host and the remote node both had the following gnutls packages installed.
>>>>>>
>>>>>> gnutls-devel-3.3.8-12.el7.x86_64
>>>>>> gnutls-dane-3.3.8-12.el7.x86_64
>>>>>> gnutls-c++-3.3.8-12.el7.x86_64
>>>>>> gnutls-3.3.8-12.el7.x86_64
>>>>>> gnutls-utils-3.3.8-12.el7.x86_64
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>>>>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>>>>>> Cc:
>>>>>>> Date: 2015/4/28, Tue 14:06
>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Even after changing the remote node to RHEL7.1, the result was the same.
>>>>>>>
>>>>>>>
>>>>>>> This time I will also try it with RHEL7.1 on the Pacemaker host node.
>>>>>>>
>>>>>>>
>>>>>>> I noticed an interesting phenomenon.
>>>>>>> The remote node fails to reconnect on the first crm_resource.
>>>>>>> However, the remote node reconnects successfully on the second crm_resource.
>>>>>>>
>>>>>>> I suspect the problem lies at the point where the connection with the remote node is first cut.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Hideo Yamauchi.
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>>>>>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>>>>>>> Cc:
>>>>>>>> Date: 2015/4/28, Tue 11:52
>>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>>>>>>
>>>>>>>> Hi David,
>>>>>>>> Thank you for your comments.
>>>>>>>>> At first glance this looks gnutls related. GNUTLS is returning -50 during receive
>>>>>>>>> on the client side (pacemaker's side). -50 maps to 'invalid request'.
>>>>>>>>> debug: crm_remote_recv_once: TLS receive failed: The request is invalid.
>>>>>>>>> We treat this error as fatal and destroy the connection. I've never encountered
>>>>>>>>> this error and I don't know what causes it. It's possible there's a bug in
>>>>>>>>> our gnutls usage... it's also possible there's a bug in the version of gnutls
>>>>>>>>> that is in use as well.
>>>>>>>> We built the remote node on RHEL6.5.
>>>>>>>> Because it may be a gnutls problem, I will check it on RHEL7.1.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Hideo Yamauchi.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>>>
>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>> Getting started:
>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>