[Pacemaker] [Patch] An error may occur when the stop of pingd is delayed.

Thu Apr 18 01:54:57 UTC 2013

Hi All,

I have sent a pull request for this patch.

 * https://github.com/ClusterLabs/pacemaker-1.0/pull/13

Best Regards,
Hideo Yamauchi.

--- On Wed, 2013/4/10, renayama19661014 at ybb.ne.jp <renayama19661014 at ybb.ne.jp> wrote:

> Hi All,
> 
> We confirmed that an error occurs when the stop of pingd is delayed.
> 
> The problem seems to be that pingd does not handle SIGTERM until the stand_alone_ping processing has completed.
> 
> ------------------------------------------------------------------------------------------------------------------------
> Apr 11 00:48:33 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> Apr 11 00:48:36 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> Apr 11 00:48:39 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> Apr 11 00:48:42 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> Apr 11 00:48:45 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> Apr 11 00:48:48 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> (snip)
> Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing /usr/lib64/heartbeat/crmd process group 2427 with signal 15
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_shutdown: Requesting shutdown
> (snip)
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: te_rsc_command: Initiating action 9: stop prmPingd:0_stop_0 on rh64-heartbeat1 (local)
> Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: cancel_op: operation monitor[5] on prmPingd:0 for client 2427, its parameters: CRM_meta_clone=[0] host_list=[192.168.40.1] name=[default_ping_set] attempts=[2] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[1] timeout=[2] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] multiplier=[100] CRM_meta_interval=[10000] CRM_meta_timeout=[60000]  cancelled
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: do_lrm_rsc_op: Performing key=9:4:0:948901c2-4e97-4715-9f6b-1611810f8ef7 op=prmPingd:0_stop_0 )
> Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: rsc:prmPingd:0 stop[9] (pid 2570)
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=5, status=1, cib-update=0, confirmed=true) Cancelled
> Apr 11 00:48:50 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: operation stop[9] on prmPingd:0 for client 2427: pid 2570 exited with return code 0
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=9, rc=0, cib-update=59, confirmed=true) ok
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: match_graph_event: Action prmPingd:0_stop_0 (9) confirmed on rh64-heartbeat1 (rc=0)
> (snip)
> Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing /usr/lib64/heartbeat/ccm process group 2422 with signal 15
> Apr 11 00:48:50 rh64-heartbeat1 ccm: [2422]: info: received SIGTERM, going to shut down
> Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: ERROR: send_ipc_message: IPC Channel to 2426 is not connected                        -------> ERROR
> Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: info: attrd_update: Could not send update: default_ping_set=0 for localhost
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBWRITE process 2418 with signal 15
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBREAD process 2419 with signal 15
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBFIFO process 2417 with signal 15
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2417 exited. 3 remaining
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2418 exited. 2 remaining
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2419 exited. 1 remaining
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: rh64-heartbeat1 Heartbeat shutdown complete.
> Apr 11 00:48:53 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 4 retries remaining                --------> pingd has not stopped yet
> Apr 11 00:48:55 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 3 retries remaining
> Apr 11 00:48:57 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 2 retries remaining
> Apr 11 00:48:59 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 1 retries remaining
> Apr 11 00:49:01 rh64-heartbeat1 pingd: [2505]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> Apr 11 00:49:01 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 5 retries remaining
> Apr 11 00:49:03 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 4 retries remaining
> Apr 11 00:49:05 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 3 retries remaining
> Apr 11 00:49:07 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 2 retries remaining
> Apr 11 00:49:09 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 1 retries remaining
> ------------------------------------------------------------------------------------------------------------------------
> 
> To solve this problem, I added a check that confirms the pingd process has actually exited.
> 
> I have attached a patch.
> Please merge this patch into Pacemaker 1.0.
> 
> Best Regards,
> Hideo Yamauchi.
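
For illustration only, confirming that a process has exited during a stop can be sketched in shell as below. This is not the code from the attached patch or the pull request. The function name stop_and_confirm, its arguments, and the default timeout are hypothetical. The sketch sends SIGTERM and then polls until the process is really gone, instead of reporting success immediately.

```shell
#!/bin/sh
# Hypothetical helper, NOT the code from the patch discussed above.
# Sends SIGTERM to a PID, then polls until the process has really exited.
# Returns 0 once the process is gone, 1 if it is still alive after the timeout.
stop_and_confirm() {
    pid=$1
    timeout=${2:-10}    # seconds to wait; the default is an assumption

    # Nothing to do if the process is already gone.
    kill -0 "$pid" 2>/dev/null || return 0

    kill -TERM "$pid" 2>/dev/null

    waited=0
    # "kill -0" sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # still running: let the caller report the failure
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the log above, the stop operation returned rc=0 at 00:48:50 while pingd [2505] kept logging until 00:49:09. A stop that waits in this way would return only after the process had gone, or fail if the timeout expired first.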