[Pacemaker] 1.1.12: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Andrew Beekhof
andrew at beekhof.net
Fri Aug 1 04:30:38 UTC 2014
On 1 Aug 2014, at 2:04 pm, Andrew Beekhof <andrew at beekhof.net> wrote:
>
> On 1 Aug 2014, at 7:47 am, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>>
>> On 31 Jul 2014, at 4:46 pm, Cédric Dufour - Idiap Research Institute <cedric.dufour at idiap.ch> wrote:
>>
>>> On 31/07/14 00:17, Andrew Beekhof wrote:
>>>> On 31 Jul 2014, at 2:48 am, Cédric Dufour - Idiap Research Institute <cedric.dufour at idiap.ch> wrote:
>>>>
>>>>> After packaging pacemaker 1.1.12 for Debian/Wheezy (along with corosync 1.4.6 and libqb 0.17.0), I have successfully initialized a new cluster.
>>>>>
>>>>> Back to a very simple test cluster, the only problem I have is with fencing, which fails altogether with "route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)" messages:
>>>>>
>>>>> root at bc1hs22a01:~ # tail /var/log/corosync.rsyslog
>>>>> Jul 30 18:41:41 bc1hs22a01 stonith_admin[5411]: notice: crm_log_args: Invoked: stonith_admin -F bc1hs22a02
>>>>> Jul 30 18:41:41 bc1hs22a01 stonithd[4754]: notice: handle_request: Client stonith_admin.5411.fe1388ed wants to fence (off) 'bc1hs22a02' with device '(any)'
>>>>> Jul 30 18:41:41 bc1hs22a01 stonithd[4754]: notice: initiate_remote_stonith_op: Initiating remote operation off for bc1hs22a02: 48b69f82-29ad-4c9a-af57-0e60ae5242e4 (0)
>>>>> Jul 30 18:41:41 bc1hs22a01 corosync[4686]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> rc=-2 is coming from send_client_ipc(void *conn, const AIS_Message * ais_msg)
>>>>
>>>> specifically:
>>>>
>>>> if (conn == NULL) {
>>>> rc = -2;
>>>>
>>>> So the plugin thinks that stonith-ng isn't connected.
>>>> More logs?
>>>>
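For readers following along, the check quoted above can be sketched in isolation. This is a simplified stand-in, not the plugin's actual source: the name sketch_send_client_ipc and the string payload are hypothetical, and only the "conn == NULL gives rc = -2" behaviour is taken from the thread.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the plugin's send routine. The real function
 * takes an AIS_Message; a plain string keeps this sketch standalone. */
static int sketch_send_client_ipc(void *conn, const char *payload)
{
    int rc = -1;                /* generic failure unless set below */

    if (conn == NULL) {
        rc = -2;                /* no IPC connection recorded for this destination */
    } else if (payload != NULL) {
        /* Real code would write to the IPC channel; here we only print. */
        printf("delivering: %s\n", payload);
        rc = 0;
    }
    return rc;
}
```

Calling it with a NULL connection returns -2, which is the value that shows up in the "ipc delivery failed (rc=-2)" warning; any non-NULL pointer takes the delivery path instead.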
>>>
>>> I have completed a full restart of the cluster in order to provide the logs at each step; see attached log files:
>>> (from node_1/DC)
>>> - node_1-corosync-start.log
>>> - node_1-pacemaker-start.log
>>> - node_1-corosync-node_2_join.log
>>> - node_1-pacemaker-node_2_join.log
>>> (from node_2)
>>> - node_2-corosync-start.log
>>> - node_2-pacemaker-start.log
>>>
>>> The problem already manifests itself in the DC start log (because of a previous fencing attempt) at 08:19:21 and 08:19:42:
>>>
>>> root at bc1hs22a01:~ # fgrep 'ipc delivery failed' node_1-corosync-start.log
>>> Jul 31 08:19:21 bc1hs22a01 corosync[31057]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>> Jul 31 08:19:42 bc1hs22a01 corosync[31057]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>
>>> While it would seem (to me) that the stonith plugin successfully connected to the CIB:
>>
>> It's not the CIB that's the issue:
>>
>>>>> Jul 30 18:41:41 bc1hs22a01 corosync[4686]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>
>> That's the pacemaker plugin inside corosync (which uses a completely different IPC mechanism).
>
> It looks like there is a name mismatch:
>
> Jul 31 08:19:20 bc1hs22a01 corosync[31057]: [pcmk ] info: pcmk_ipc: Recorded connection 0x2543e30 for stonithd/0
> Jul 31 08:19:20 bc1hs22a01 corosync[31057]: [pcmk ] debug: process_ais_message: Msg[1] (dest=local:ais, from=bc1hs22a01:stonithd.31092, remote=true, size=6): 31092
> ...
> Jul 31 08:19:21 bc1hs22a01 corosync[31057]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
> Jul 31 08:19:42 bc1hs22a01 corosync[31057]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>
> Could you try the following patch?
Actually, try this one instead:
https://github.com/beekhof/pacemaker/commit/21830a0
>
> diff --git a/lib/ais/plugin.c b/lib/ais/plugin.c
> index 3d4f369..560e18b 100644
> --- a/lib/ais/plugin.c
> +++ b/lib/ais/plugin.c
> @@ -1508,6 +1508,9 @@ route_ais_message(const AIS_Message * msg, gboolean local_origin)
>          /* te messages are routed via the crm */
>          dest = crm_msg_crmd;
>
> +    } else if (dest == crm_msg_stonith_ng) {
> +        dest = crm_msg_stonithd;
> +
>      } else if (dest >= SIZEOF(pcmk_children)) {
>          /* Transient client */
>
>
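The name mismatch and the remap in the quoted patch can be illustrated with a small standalone sketch. It assumes, as the logs suggest, that the daemon's connection is recorded under the stonithd slot while messages are addressed to stonith-ng. Only the two enum names come from the patch; the table layout and the sketch_* names are hypothetical and much simpler than the real plugin.

```c
#include <stddef.h>

/* Simplified destination ids. Only the two names used in the patch are
 * taken from the thread; the rest of the layout is an assumption. */
enum sketch_dest { crm_msg_stonithd, crm_msg_stonith_ng, SKETCH_DEST_MAX };

/* One connection slot per destination id (NULL means "not connected"). */
static void *sketch_conns[SKETCH_DEST_MAX];

/* Return the connection for dest. With apply_remap set, stonith-ng is
 * redirected to the stonithd slot first, mirroring the quoted patch. */
static void *sketch_lookup(enum sketch_dest dest, int apply_remap)
{
    if (apply_remap && dest == crm_msg_stonith_ng) {
        dest = crm_msg_stonithd;
    }
    return sketch_conns[dest];
}
```

With a connection stored only in the crm_msg_stonithd slot, a lookup for crm_msg_stonith_ng returns NULL without the remap (the rc=-2 case) and the stored connection with it.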
>>
>> FWIW, the plugin is extremely deprecated; you're encouraged to use pacemaker+cman or begin working towards corosync2 + pacemakerd.
>>
>>>
>>> root at bc1hs22a01:~ # fgrep cib_native_signon_raw node_1-pacemaker-start.log
>>> Jul 31 08:19:20 [31096] bc1hs22a01 crmd: debug: cib_native_signon_raw: Connection unsuccessful (0 (nil))
>>> Jul 31 08:19:20 [31096] bc1hs22a01 crmd: debug: cib_native_signon_raw: Connection to CIB failed: Transport endpoint is not connected
>>> Jul 31 08:19:20 [31092] bc1hs22a01 stonithd: debug: cib_native_signon_raw: Connection unsuccessful (0 (nil))
>>> Jul 31 08:19:20 [31092] bc1hs22a01 stonithd: debug: cib_native_signon_raw: Connection to CIB failed: Transport endpoint is not connected
>>> Jul 31 08:19:21 [31096] bc1hs22a01 crmd: debug: cib_native_signon_raw: Connection to CIB successful
>>> Jul 31 08:19:21 [31092] bc1hs22a01 stonithd: debug: cib_native_signon_raw: Connection to CIB successful
>>> Jul 31 08:19:25 [31094] bc1hs22a01 attrd: debug: cib_native_signon_raw: Connection to CIB successful
>>>
>>> Best,
>>>
>>> Cédric
>>>
>>> <node_1-corosync-start.log><node_1-pacemaker-start.log><node_1-corosync-node_2_join.log><node_1-pacemaker-node_2_join.log><node_2-corosync-start.log><node_2-pacemaker-start.log>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org