[Pacemaker] [Patch] An error may occur when the stop of pingd is delayed.

renayama19661014 at ybb.ne.jp renayama19661014 at ybb.ne.jp
Wed Apr 10 03:54:02 EDT 2013


Hi All,

We confirmed that an error occurs when the stop of pingd is delayed.

The cause seems to be that pingd does not process SIGTERM until the stand_alone_ping processing has completed.

------------------------------------------------------------------------------------------------------------------------
Apr 11 00:48:33 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
Apr 11 00:48:36 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
Apr 11 00:48:39 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
Apr 11 00:48:42 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
Apr 11 00:48:45 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
Apr 11 00:48:48 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
(snip)
Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing /usr/lib64/heartbeat/crmd process group 2427 with signal 15
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_shutdown: Requesting shutdown
(snip)
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: te_rsc_command: Initiating action 9: stop prmPingd:0_stop_0 on rh64-heartbeat1 (local)
Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: cancel_op: operation monitor[5] on prmPingd:0 for client 2427, its parameters: CRM_meta_clone=[0] host_list=[192.168.40.1] name=[default_ping_set] attempts=[2] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[1] timeout=[2] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] multiplier=[100] CRM_meta_interval=[10000] CRM_meta_timeout=[60000]  cancelled
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: do_lrm_rsc_op: Performing key=9:4:0:948901c2-4e97-4715-9f6b-1611810f8ef7 op=prmPingd:0_stop_0 )
Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: rsc:prmPingd:0 stop[9] (pid 2570)
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=5, status=1, cib-update=0, confirmed=true) Cancelled
Apr 11 00:48:50 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: operation stop[9] on prmPingd:0 for client 2427: pid 2570 exited with return code 0
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=9, rc=0, cib-update=59, confirmed=true) ok
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: match_graph_event: Action prmPingd:0_stop_0 (9) confirmed on rh64-heartbeat1 (rc=0)
(snip)
Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing /usr/lib64/heartbeat/ccm process group 2422 with signal 15
Apr 11 00:48:50 rh64-heartbeat1 ccm: [2422]: info: received SIGTERM, going to shut down
Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: ERROR: send_ipc_message: IPC Channel to 2426 is not connected                        -------> ERROR
Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: info: attrd_update: Could not send update: default_ping_set=0 for localhost
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBWRITE process 2418 with signal 15
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBREAD process 2419 with signal 15
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBFIFO process 2417 with signal 15
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2417 exited. 3 remaining
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2418 exited. 2 remaining
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2419 exited. 1 remaining
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: rh64-heartbeat1 Heartbeat shutdown complete.
Apr 11 00:48:53 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 4 retries remaining                --------> pingd has not stopped yet
Apr 11 00:48:55 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 3 retries remaining
Apr 11 00:48:57 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 2 retries remaining
Apr 11 00:48:59 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 1 retries remaining
Apr 11 00:49:01 rh64-heartbeat1 pingd: [2505]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Apr 11 00:49:01 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 5 retries remaining
Apr 11 00:49:03 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 4 retries remaining
Apr 11 00:49:05 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 3 retries remaining
Apr 11 00:49:07 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 2 retries remaining
Apr 11 00:49:09 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 1 retries remaining
------------------------------------------------------------------------------------------------------------------------
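
In the log above, the stop operation runs at 00:48:50, but pingd only logs its signal 15 handler at 00:49:01, after the attrd retries have finished. That is consistent with a main loop that acts on signals only between work rounds. The following is a minimal Python sketch of that pattern, not pingd's actual code: run_main_loop and the "round" labels are hypothetical, and the SIGTERM the process sends to itself stands in for the one that presumably comes from the stop operation. It assumes a POSIX system and must run in the main thread.

```python
import os
import signal


def run_main_loop(max_rounds=3):
    """Simulate a loop that only acts on SIGTERM between work rounds."""
    events = []
    state = {"term_pending": False}

    def on_sigterm(signum, frame):
        # Only record the request; the real shutdown is deferred to the loop.
        state["term_pending"] = True

    previous = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        for round_no in range(1, max_rounds + 1):
            events.append(f"round {round_no} start")
            if round_no == 1:
                # Hypothetical stand-in for SIGTERM arriving while the
                # round (e.g. a ping or an attrd retry) is in progress.
                os.kill(os.getpid(), signal.SIGTERM)
            events.append(f"round {round_no} end")
            # The pending signal is examined only here, after the round.
            if state["term_pending"]:
                events.append("shutdown")
                break
    finally:
        # Restore whatever handler was installed before the sketch ran.
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
    return events


if __name__ == "__main__":
    for event in run_main_loop():
        print(event)
```

The "shutdown" entry can only appear after the round that was running when the signal arrived has ended, so a long round delays termination by its full remaining duration.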

To solve this problem, I added a check that confirms the pingd process has actually terminated.

I have attached a patch.
Please apply this patch to Pacemaker 1.0.
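
As an illustration of the idea only (the attached patch is not reproduced here and may differ), confirming termination can be sketched in Python as: send SIGTERM, then poll until the process is really gone or a timeout expires. The names stop_and_confirm, process_exists, timeout and poll_interval are hypothetical, and the sketch assumes a POSIX system.

```python
import os
import signal
import time


def process_exists(pid):
    """Return True if a process with this PID still exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_and_confirm(pid, timeout=10.0, poll_interval=0.1,
                     is_alive=process_exists):
    """Send SIGTERM, then wait until the process has really exited.

    Returns True once the process is gone, False if it is still
    running when the timeout expires.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # already stopped
    deadline = time.monotonic() + timeout
    while is_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

With a check of this kind, the stop would report success only after the daemon has exited, so the rest of the cluster stack is not shut down while the daemon is still trying to send updates. The is_alive parameter exists because an unreaped child process still answers signal 0; a caller that started the process itself can pass its own liveness check.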

Best Regards,
Hideo Yamauchi.



-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 2369.patch
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20130410/d97ab0c8/attachment-0002.ksh>

