[Pacemaker] [PATCH] pingd calls "goto retry" if it gets EAGAIN or EINTR
David Vossel
dvossel at redhat.com
Fri Apr 6 16:15:53 CEST 2012
----- Original Message -----
> From: "Junko IKEDA" <tsukishima.ha at gmail.com>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Wednesday, April 4, 2012 1:23:07 AM
> Subject: [Pacemaker] [PATCH] pingd calls "goto retry" if it gets EAGAIN or EINTR
>
> Hi,
>
> When I run pingd + Pacemaker 1.0.12, I can see this message
> sometimes.
>
> stand_alone_ping: Node XXX.XXX.XXX.XXX is unreachable (read)
>
> XXX.XXX.XXX.XXX is pingd's target IP (which makes sense) or, somehow,
> 127.0.0.1.
> I found that when some other application calls "ping" (the OS command),
> unrelated to Pacemaker, pingd picks up the error messages meant for
> that ping.
> These packets are not for pingd, so pingd says "unreachable".
I don't understand what this means. Do you mean that some signal is interrupting pingd's recvmsg? I'd like to understand in more concrete terms exactly what could cause this.
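For concreteness, here is a sketch of one plausible mechanism. It is an assumption, not something confirmed in this thread: if pingd reads from a raw ICMP socket, the kernel hands it a copy of every ICMP packet delivered to the host, including replies and errors that belong to another process's ping. The usual defence is to compare the echo identifier before trusting a packet. The function name and constants below are hypothetical and are not taken from pingd.c.

```c
#include <stddef.h>
#include <stdint.h>

/* ICMPv4 header fields per RFC 792; names are illustrative only. */
enum { ICMP_TYPE_ECHO_REPLY = 0, ICMP_HDR_LEN = 8 };

/*
 * Returns 1 if buf holds an ICMP echo reply carrying our identifier,
 * 0 otherwise. buf must point at the ICMP header (past the IP header).
 */
int is_our_echo_reply(const uint8_t *buf, size_t len, uint16_t our_id)
{
    if (buf == NULL || len < ICMP_HDR_LEN) {
        return 0;   /* truncated packet */
    }
    if (buf[0] != ICMP_TYPE_ECHO_REPLY) {
        return 0;   /* some other ICMP type, e.g. destination unreachable */
    }
    /* The identifier occupies bytes 4-5 in network byte order. */
    uint16_t id = (uint16_t)((buf[4] << 8) | buf[5]);
    return id == our_id;
}
```

A reader that sees a packet failing this check would skip it and read the next one rather than report the node as unreachable.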
> pingd can retry with the next packet, and it will work well if there
> are no network problems.
>
> I referred to the Linux "ping" command and modified pingd to ignore the
> above message because it is confusing.
> To sum up, pingd will call "goto retry" if it gets EAGAIN or EINTR.
>
> diff --git a/tools/pingd.c b/tools/pingd.c
>
> index 5e64ba2..b90d26d 100644
> --- a/tools/pingd.c
> +++ b/tools/pingd.c
> @@ -862,7 +862,10 @@ ping_read(ping_node *node, int *lenp)
>
> if(bytes < 0) {
> crm_perror(LOG_DEBUG, "Read failed");
> - if (saved_errno != EAGAIN && saved_errno != EINTR) {
> + if (saved_errno == EAGAIN || saved_errno == EINTR) {
I'm not sure this is correct. I believe EAGAIN is the return code we get when the read timeout occurs. With this logic, would we not get stuck in a retry loop if we never received anything?
It might be safe to do this for the EINTR return code, though. I don't know enough off the top of my head to understand why EINTR would occur in your situation.
Do you know which return code you are getting that causes this?
-- Vossel
> + crm_info("Retrying...");
> + goto retry;
> + } else {
> int rc = 0;
> if(node->type == AF_INET6) {
> rc = process_icmp6_error(node, (struct
> sockaddr_in6*)&(node->addr));
> @@ -898,6 +901,9 @@ ping_read(ping_node *node, int *lenp)
> } else if(rc > 0) {
> crm_free(packet);
> return TRUE;
> + } else {
> + crm_info("Retrying...");
> + goto retry;
> }
>
> } else {
>
>
>
> This problem is peculiar to pingd, and I know Pacemaker 1.1.x
> recommends using the ping RA.
> So if there is no opposition, I'll ask Mori-san to commit this into
> pacemaker-1.0 repo.
>
> Thanks,
> Junko IKEDA
>
> NTT DATA INTELLILINK CORPORATION
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>