[ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.

Mon May 11 09:45:37 CEST 2015

Hi Ulrich,

Thank you for your comments.

> So your host and your resource are both named "snmp1"? I also don't 
> have much experience with cleaning up resources for a node that is offline. What 
> change should it make (while the node is offline)?

Yes, the remote resource and the remote node have the same name, "snmp1".

(snip)
primitive snmp1 ocf:pacemaker:remote \
        params \
                server="snmp1" \
        op start interval="0s" timeout="60s" on-fail="ignore" \
        op monitor interval="3s" timeout="15s" \
        op stop interval="0s" timeout="60s" on-fail="ignore"

primitive Host-rsc1 ocf:heartbeat:Dummy \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="ignore"

primitive Remote-rsc1 ocf:heartbeat:Dummy \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="ignore"

location loc1 Remote-rsc1 \
        rule 200: #uname eq snmp1
location loc3 Host-rsc1 \
        rule 200: #uname eq bl460g8n1
(snip)

I stop pacemaker_remoted on the snmp1 node with SIGTERM.
Afterwards, I restart pacemaker_remoted on the snmp1 node.
I then run the crm_resource command, but the snmp1 node remains offline.

I think the correct behavior is for the snmp1 node to come online after the crm_resource command has been executed.
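
For reference, the steps above can be sketched as a small shell script. This is only a sketch under assumptions: the way SIGTERM is sent (pkill -f here) and the systemd unit name pacemaker_remote are my assumptions, and the run/remote_steps/host_steps helpers are hypothetical names used for illustration. The crm_resource and crm_mon invocations are the ones shown in the quoted output further down.

```shell
#!/bin/sh
# Sketch of the reproduction steps. By default nothing is executed;
# set DRY_RUN=0 to really run the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command, and execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# To be run on the remote node (snmp1).
remote_steps() {
    # Assumption: SIGTERM is sent with pkill; -f matches the full command
    # line because the process name is longer than 15 characters.
    run pkill -TERM -f pacemaker_remoted
    # Assumption: pacemaker_remoted is managed by the pacemaker_remote unit.
    run systemctl start pacemaker_remote
}

# To be run on the cluster node (bl460g8n1).
host_steps() {
    # Clean up the remote connection resource, then show the cluster status.
    run crm_resource -C -r snmp1
    run crm_mon -1 -Af
}
```

Source the file, then call remote_steps on snmp1 and host_steps on bl460g8n1. In my case host_steps has to be run twice before snmp1 is shown as RemoteOnline.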

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
> Cc: 
> Date: 2015/5/11, Mon 15:39
> Subject: Antw: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
> 
>>>>  <renayama19661014 at ybb.ne.jp> wrote on 11.05.2015 at 06:22 in message
> <361916.15877.qm at web200006.mail.kks.yahoo.co.jp>:
>>  Hi All,
>> 
>>  I matched the OS version of the remote node to that of the host and
>>  tested again with Pacemaker 1.1.13-rc2.
>> 
>>  The result was the same even with the host on RHEL7.1 (bl460g8n1).
>>  The remote host was also on RHEL7.1 (snmp1).
>> 
>>  The first crm_resource -C fails.
>>  --------------------------------
>>  [root at bl460g8n1 ~]# crm_resource -C -r snmp1
>>  Cleaning up snmp1 on bl460g8n1
>>  Waiting for 1 replies from the CRMd. OK
>> 
>>  [root at bl460g8n1 ~]# crm_mon -1 -Af
>>  Last updated: Mon May 11 12:44:31 2015
>>  Last change: Mon May 11 12:43:30 2015
>>  Stack: corosync
>>  Current DC: bl460g8n1 - partition WITHOUT quorum
>>  Version: 1.1.12-7a2e3ae
>>  2 Nodes configured
>>  3 Resources configured
>> 
>> 
>>  Online: [ bl460g8n1 ]
>>  RemoteOFFLINE: [ snmp1 ]
> 
> So your host and your resource are both named "snmp1"? I also don't 
> have much experience with cleaning up resources for a node that is offline. What 
> change should it make (while the node is offline)?
> 
>> 
>>   Host-rsc1      (ocf::heartbeat:Dummy): Started bl460g8n1
>>   Remote-rsc1    (ocf::heartbeat:Dummy): Started bl460g8n1 (failure ignored)
>> 
>>  Node Attributes:
>>  * Node bl460g8n1:
>>      + ringnumber_0                      : 192.168.101.21 is UP
>>      + ringnumber_1                      : 192.168.102.21 is UP
>> 
>>  Migration summary:
>>  * Node bl460g8n1:
>>     snmp1: migration-threshold=1 fail-count=1000000 last-failure='Mon May 11 12:44:28 2015'
>> 
>>  Failed actions:
>>      snmp1_start_0 on bl460g8n1 'unknown error' (1): call=5, status=Timed Out, exit-reason='none', last-rc-change='Mon May 11 12:43:31 2015', queued=0ms, exec=0ms
>>  --------------------------------
>> 
>> 
>>  The second crm_resource -C succeeded, and the cluster connected to the remote host.
> 
> Then the node was online it seems.
> 
> Regards,
> Ulrich
> 
>>  --------------------------------
>>  [root at bl460g8n1 ~]# crm_mon -1 -Af
>>  Last updated: Mon May 11 12:44:54 2015
>>  Last change: Mon May 11 12:44:48 2015
>>  Stack: corosync
>>  Current DC: bl460g8n1 - partition WITHOUT quorum
>>  Version: 1.1.12-7a2e3ae
>>  2 Nodes configured
>>  3 Resources configured
>> 
>> 
>>  Online: [ bl460g8n1 ]
>>  RemoteOnline: [ snmp1 ]
>> 
>>   Host-rsc1      (ocf::heartbeat:Dummy): Started bl460g8n1
>>   Remote-rsc1    (ocf::heartbeat:Dummy): Started snmp1
>>   snmp1  (ocf::pacemaker:remote):        Started bl460g8n1
>> 
>>  Node Attributes:
>>  * Node bl460g8n1:
>>      + ringnumber_0                      : 192.168.101.21 is UP
>>      + ringnumber_1                      : 192.168.102.21 is UP
>>  * Node snmp1:
>> 
>>  Migration summary:
>>  * Node bl460g8n1:
>>  * Node snmp1:
>>  --------------------------------
>> 
>>  The gnutls packages on the host and the remote node were the following versions.
>> 
>>  gnutls-devel-3.3.8-12.el7.x86_64
>>  gnutls-dane-3.3.8-12.el7.x86_64
>>  gnutls-c++-3.3.8-12.el7.x86_64
>>  gnutls-3.3.8-12.el7.x86_64
>>  gnutls-utils-3.3.8-12.el7.x86_64
>> 
>> 
>>  Best Regards,
>>  Hideo Yamauchi.
>> 
>> 
>> 
>> 
>>  ----- Original Message -----
>>>  From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>  To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>>  Cc: 
>>>  Date: 2015/4/28, Tue 14:06
>>>  Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>> 
>>>  Hi David,
>>> 
>>>  The result was the same even after I changed the remote node to RHEL7.1.
>>> 
>>> 
>>>  This time I will try with the Pacemaker host node on RHEL7.1 as well.
>>> 
>>> 
>>>  I noticed an interesting phenomenon.
>>>  The reconnection to the remote node fails on the first crm_resource.
>>>  However, the reconnection succeeds on the second crm_resource.
>>> 
>>>  I think there is some problem at the point where the connection with
>>>  the remote node is first cut.
>>> 
>>>  Best Regards,
>>>  Hideo Yamauchi.
>>> 
>>> 
>>>  ----- Original Message -----
>>>>   From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>>   To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>>>   Cc: 
>>>>   Date: 2015/4/28, Tue 11:52
>>>>   Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>> 
>>>>   Hi David,
>>>>   Thank you for comments.
>>>>>   At first glance this looks gnutls related.  GNUTLS is returning -50 during receive
>>>>>   on the client side (pacemaker's side). -50 maps to 'invalid request'.
>>>>>   debug: crm_remote_recv_once:     TLS receive failed: The request is invalid.
>>>>>   We treat this error as fatal and destroy the connection. I've never encountered
>>>>>   this error and I don't know what causes it. It's possible there's a bug in
>>>>>   our gnutls usage... it's also possible there's a bug in the version of gnutls
>>>>>   that is in use as well.
>>>> 
>>>>   We built the remote node on RHEL6.5.
>>>>   Because it may be a gnutls problem, I will check it on RHEL7.1.
>>>> 
>>>>   Best Regards,
>>>>   Hideo Yamauchi.
>>>> 
>>>>   _______________________________________________
>>>>   Users mailing list: Users at clusterlabs.org 
>>>>   http://clusterlabs.org/mailman/listinfo/users 
>>>> 
>>>>   Project Home: http://www.clusterlabs.org 
>>>>   Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>>>   Bugs: http://bugs.clusterlabs.org 
>>>> 
>>> 
>> 
>