[Pacemaker] wrong device in stonith_admin -l

laurent+pacemaker at u-picardie.fr laurent+pacemaker at u-picardie.fr
Tue Dec 18 10:45:35 EST 2012


David Vossel <dvossel at redhat.com> writes:

>> Dec 12 01:12:37 elasticsearch-06 stonith-ng[18181]:   notice:
>> dynamic_list_search_cb: Disabling port list queries for
>> stonith-xen-eddu (1): failed:  255
>
> We discover what hosts an agent can fence by running this command internally in stonith.
>
> # agent -o list
>
> From there we expect an exit code of 0 and the list of nodes to be in the output.
> https://fedorahosted.org/cluster/wiki/FenceAgentAPI
>
> Looking at your logs, stonith-xen-eddu is returning -1 (255) as the return code when we issue the 'list' action.  That means we don't try to get the dynamic list again; we assume the 'list' action isn't supported. From there we fall back to using the 'status' action to dynamically determine if the agent can fence a particular host.  I'm guessing the 'status' action is returning true (return codes 0 or 2) for hosts you wouldn't expect the agent to be able to fence for some reason.

Hi,

OK, that makes sense.
The FenceAgentAPI doc gives extra information on top of this one:
http://hg.linux-ha.org/glue/file/67224d37df80/doc/stonith/README.external

Returning 1 from the gethosts action when the hostlist is empty does
the trick, and so does returning 1 from the status action.
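
For the archive, a minimal sketch of that change, not my actual plugin.
It assumes the hosts arrive in a space-separated $hostlist environment
variable, as in many external/* plugins; plugin_action is just a name
made up for this example.

```shell
#!/bin/sh
# Sketch of the gethosts/status part of an external STONITH plugin.
# Assumption: $hostlist holds the space-separated hosts this device
# is allowed to fence. plugin_action is a hypothetical helper name.

plugin_action() {
    case "$1" in
    gethosts)
        # Empty list: fail instead of returning rc=0 with no output.
        [ -n "$hostlist" ] || return 1
        for h in $hostlist; do
            echo "$h"
        done
        return 0
        ;;
    status)
        # With no host to fence, report the device as not usable.
        [ -n "$hostlist" ] || return 1
        return 0
        ;;
    *)
        # Other actions (on, off, reset, ...) are left out of this sketch.
        return 1
        ;;
    esac
}

# A real plugin would end with something like:
#   plugin_action "$1"; exit $?
```

With hostlist="node-a node-b" the gethosts action prints one host per
line and returns 0; with an empty hostlist both actions return 1.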

So I guess that explains both of my issues:
- after the timeout issue, the port list queries were disabled,
  falling back to the status action, which was always returning rc=0
- gethosts returning rc=0 with an empty hostlist also disables the
  port list queries
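
Here is how I picture the fallback David describes, as a rough shell
model. This is not stonith-ng code: can_fence, agent_list, agent_status
and list_disabled are names invented for the illustration, and the two
stubs simply reproduce my first issue (list failing with 255, status
always returning 0).

```shell
#!/bin/sh
# Model only. agent_list and agent_status are hypothetical stand-ins
# for running the agent's 'list' and 'status' actions.
agent_list()   { return 255; }   # 'list' action fails
agent_status() { return 0; }     # 'status' says yes for any host

list_disabled=0

can_fence() {
    host=$1
    if [ "$list_disabled" -eq 0 ]; then
        if hosts=$(agent_list); then
            # 'list' worked: the host must appear in its output.
            for h in $hosts; do
                [ "$h" = "$host" ] && return 0
            done
            return 1
        fi
        # Non-zero exit from 'list': never query it again.
        list_disabled=1
    fi
    # Fall back to 'status'; return codes 0 and 2 count as "can fence".
    agent_status "$host"
    rc=$?
    [ "$rc" -eq 0 ] || [ "$rc" -eq 2 ]
}
```

With these stubs, can_fence succeeds for any host name at all, which
matches the wrong devices I was seeing in stonith_admin -l.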

So I guess there's no need to file a new ticket :)
Thanks,

-- 
Laurent



