[Pacemaker] wrong device in stonith_admin -l

laurent+pacemaker at u-picardie.fr laurent+pacemaker at u-picardie.fr
Tue Dec 11 19:51:20 EST 2012


Hi,

I've just observed something weird.
A node is running a stonith resource for which gethosts returns an
empty host list, yet the output of stonith_admin -l still includes it
in the device list!

Result of "stonith_admin -l elasticsearch-05", run from
elasticsearch-06:
 stonith-xen-peatbull
 stonith-xen-eddu
2 devices found

stonith-xen-peatbull is a correct fencing device.
stonith-xen-eddu is a fencing device with an empty host list.

Running "my-xen0 gethosts" by hand with the stonith-xen-eddu parameters
returns no hosts, yet it exits with 0. (Is it correct to return 0 with
an empty host list?)
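
On the exit-code question, one option would be for the plugin itself to
report an empty list as a failure. A minimal sketch, assuming a POSIX
shell plugin; gethosts_checked is a hypothetical helper (not part of
my-xen0 or of the stonith plugins), and nothing in this thread confirms
that the external plugin interface expects a non-zero exit in this case:

```shell
# Hypothetical helper: run a host-listing command and treat an empty
# (or whitespace-only) result as a failure instead of exiting with 0.
gethosts_checked() {
    # Propagate the command's own failure, if any.
    hosts=$("$@") || return $?

    # Strip all whitespace; if nothing is left, no host was listed.
    if [ -z "$(printf '%s' "$hosts" | tr -d '[:space:]')" ]; then
        echo "gethosts returned an empty hostlist" >&2
        return 1
    fi

    printf '%s\n' "$hosts"
}
```

It could be called as
"gethosts_checked /usr/lib/stonith/plugins/external/my-xen0 gethosts",
with the same parameters set as when running the plugin by hand.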

Logs:
Dec 12 01:09:10 elasticsearch-06 stonith-ng[18181]:   notice: stonith_device_register: Added 'stonith-cluster-xen' to the device list (6 active devices)
Dec 12 01:09:10 elasticsearch-06 attrd[18183]:   notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Dec 12 01:09:10 elasticsearch-06 attrd[18183]:   notice: attrd_perform_update: Sent update 5: probe_complete=true
Dec 12 01:09:11 elasticsearch-06 stonith-ng[18181]:   notice: stonith_device_register: Added 'stonith-xen-eddu' to the device list (6 active devices)
Dec 12 01:09:11 elasticsearch-06 stonith-ng[18181]:   notice: stonith_device_register: Added 'stonith-xen-peatbull' to the device list (6 active devices)
Dec 12 01:09:12 elasticsearch-06 stonith: [18434]: info: external/my-xen0-ha device OK.
Dec 12 01:09:12 elasticsearch-06 crmd[18185]:   notice: process_lrm_event: LRM operation stonith-cluster-xen_start_0 (call=61,rc=0, cib-update=27, confirmed=true) ok
Dec 12 01:09:14 elasticsearch-06 stonith: [18465]: info: external_run_cmd: '/usr/lib/stonith/plugins/external/my-xen0 status' output: elasticsearch-05
Dec 12 01:09:14 elasticsearch-06 stonith: [18465]: info: external_run_cmd: '/usr/lib/stonith/plugins/external/my-xen0 status' output: elasticsearch-06
Dec 12 01:09:15 elasticsearch-06 stonith: [18465]: info: external/my-xen0 device OK.
Dec 12 01:09:15 elasticsearch-06 crmd[18185]:   notice: process_lrm_event: LRM operation stonith-xen-peatbull_start_0 (call=68, rc=0, cib-update=28, confirmed=true) ok
Dec 12 01:09:15 elasticsearch-06 stonith: [18458]: info: external/my-xen0 device OK.
Dec 12 01:09:15 elasticsearch-06 crmd[18185]:   notice: process_lrm_event: LRM operation stonith-xen-eddu_start_0 (call=66, rc=0, cib-update=29, confirmed=true) ok
Dec 12 01:12:34 elasticsearch-06 stonith-ng[18181]:   notice: dynamic_list_search_cb: Disabling port list queries for stonith-xen-kornog (1): (null) 
Dec 12 01:12:34 elasticsearch-06 stonith-ng[18181]:   notice: dynamic_list_search_cb: Disabling port list queries for stonith-xen-nikka (1): (null)
Dec 12 01:12:34 elasticsearch-06 stonith-ng[18181]:   notice: dynamic_list_search_cb: Disabling port list queries for stonith-xen-yoichi (1): (null)
Dec 12 01:12:34 elasticsearch-06 stonith: [19301]: CRIT: external_hostlist: 'my-xen0 gethosts' returned an empty hostlist
Dec 12 01:12:34 elasticsearch-06 stonith: [19301]: ERROR: Could not list hosts for external/my-xen0.
Dec 12 01:12:37 elasticsearch-06 stonith: [19332]: CRIT: external_hostlist: 'my-xen0 gethosts' returned an empty hostlist
Dec 12 01:12:37 elasticsearch-06 stonith: [19332]: ERROR: Could not list hosts for external/my-xen0.
Dec 12 01:12:37 elasticsearch-06 stonith-ng[18181]:   notice: dynamic_list_search_cb: Disabling port list queries for stonith-xen-eddu (1): failed:  255
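
To see at a glance which devices had their port list queries disabled,
the stonith-ng lines above can be filtered. A rough sketch, assuming
the log format shown in this message; disabled_devices is a made-up
name, and the log is read from standard input:

```shell
# Hypothetical helper: read syslog lines on stdin and print the name of
# each device for which stonith-ng disabled port list queries.
disabled_devices() {
    # Keep only the device name that follows the fixed message text,
    # then de-duplicate the result.
    sed -n 's/.*dynamic_list_search_cb: Disabling port list queries for \([^ ]*\) .*/\1/p' |
        sort -u
}
```

Fed the log excerpt above, it would list stonith-xen-eddu along with
stonith-xen-kornog, stonith-xen-nikka and stonith-xen-yoichi.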

David, I mentioned a node being wrongly fenced in the "stonith-timeout
duration 0 is too low" bug. Could this be related?

