[Pacemaker] [PATCH] pingd calls "goto retry" if it gets EAGAIN or EINTR

Fri Apr 6 14:15:53 UTC 2012

----- Original Message -----
> From: "Junko IKEDA" <tsukishima.ha at gmail.com>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Wednesday, April 4, 2012 1:23:07 AM
> Subject: [Pacemaker] [PATCH] pingd calls "goto retry" if it gets EAGAIN or EINTR
> 
> Hi,
> 
> When I run pingd + Pacemaker 1.0.12, I can see this message
> sometimes.
> 
> stand_alone_ping: Node XXX.XXX.XXX.XXX is unreachable (read)
> 
> XXX.XXX.XXX.XXX is either pingd's target IP (which makes sense) or,
> somehow, 127.0.0.1.
> I found that when some application calls "ping" (the OS command)
> without any relation to Pacemaker, pingd manages to pick up that
> ping's error messages.
> These packets are not for pingd, so pingd says "unreachable".

I don't understand what this means.  Do you mean that some signal is interrupting pingd's recvmsg?  I'd like to understand in more concrete terms what exactly could cause this.

> pingd can retry with the next packet, and it will work well if there
> are no network problems.
>
> I referred to the Linux "ping" command and modified pingd to ignore
> the above message because it is confusing.
> To sum up, pingd will call "goto retry" if it gets EAGAIN or EINTR.
> 
> diff --git a/tools/pingd.c b/tools/pingd.c
> 
> index 5e64ba2..b90d26d 100644
> --- a/tools/pingd.c
> +++ b/tools/pingd.c
> @@ -862,7 +862,10 @@ ping_read(ping_node *node, int *lenp)
> 
>      if(bytes < 0) {
>  	crm_perror(LOG_DEBUG, "Read failed");
> -	if (saved_errno != EAGAIN && saved_errno != EINTR) {
> +	if (saved_errno == EAGAIN || saved_errno == EINTR) {

I'm not sure if this is correct.  I believe EAGAIN is the return code we get when the read timeout occurs.  With this logic, would we not get stuck in a retry loop if we never received anything?

It might be safe to do this for the EINTR return code, though. I don't know enough off the top of my head to say why it would occur in your situation.

Do you know what return code you are getting that causes this?
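
To make the concern concrete, here is a rough sketch of the behaviour I would expect instead: retry only on EINTR, a bounded number of times, and report EAGAIN as a timeout rather than looping on it. This is not pingd code. The names recv_retry_eintr, set_recv_timeout and enum read_status are made up for illustration, and it assumes the socket's read timeout is set with SO_RCVTIMEO, in which case recv() fails with EAGAIN (or EWOULDBLOCK) when nothing arrives in time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical result codes for this sketch; they are not part of pingd. */
enum read_status { READ_OK = 0, READ_TIMEOUT = -1, READ_ERROR = -2 };

/* Assumption: the read timeout is configured via SO_RCVTIMEO, so a
 * recv() with no data fails with EAGAIN/EWOULDBLOCK after msec. */
int set_recv_timeout(int fd, long msec)
{
    struct timeval tv;

    tv.tv_sec = msec / 1000;
    tv.tv_usec = (msec % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Retry only when a signal interrupted the call (EINTR), and at most
 * max_retries times. EAGAIN is reported to the caller as a timeout,
 * so a silent peer cannot keep us in the loop forever. */
enum read_status recv_retry_eintr(int fd, void *buf, size_t len,
                                  int max_retries, ssize_t *bytes_out)
{
    int attempts = 0;

    assert(max_retries >= 0);

    for (;;) {
        ssize_t bytes = recv(fd, buf, len, 0);

        if (bytes >= 0) {
            *bytes_out = bytes;
            return READ_OK;
        }
        if (errno == EINTR && attempts++ < max_retries) {
            continue;           /* interrupted by a signal: try again */
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return READ_TIMEOUT; /* nothing arrived before the timeout */
        }
        return READ_ERROR;       /* real failure, or too many EINTRs */
    }
}
```

With something along these lines the caller can still decide that the node is unreachable on READ_TIMEOUT, while a stray signal does not get reported as a failed ping.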

-- Vossel

> +		crm_info("Retrying...");
> +		goto retry;
> +	} else {
>  	    int rc = 0;
>  	    if(node->type == AF_INET6) {
>  		rc = process_icmp6_error(node, (struct sockaddr_in6*)&(node->addr));
> @@ -898,6 +901,9 @@ ping_read(ping_node *node, int *lenp)
>  	} else if(rc > 0) {
>  	    crm_free(packet);
>  	    return TRUE;
> +	} else {
> +	    crm_info("Retrying...");
> +	    goto retry;
>  	}
>  	
>      } else {
> 
> 
> 
> This problem is peculiar to pingd, and I know Pacemaker 1.1.x
> recommends using the ping RA.
> So if there are no objections, I'll ask Mori-san to commit this to
> the pacemaker-1.0 repo.
> 
> Thanks,
> Junko IKEDA
> 
> NTT DATA INTELLILINK CORPORATION