[Pacemaker] crmd does abort if a stopped node is specified

Thu May 8 06:19:40 EDT 2014

Hi, Andrew

I read the code.
Currently, the "startup-fencing" setting is read only once, after the
process starts.
https://github.com/ClusterLabs/pacemaker/blob/master/lib/pengine/unpack.c#L455

In Pacemaker-1.0, the setting was re-read every time unpack_nodes() was called.
https://github.com/ClusterLabs/pacemaker-1.0/blob/master/lib/pengine/unpack.c#L194

As a result, "startup-fencing" cannot be changed while the cluster is running.
This looks like a regression.

I have made a fix for this problem here:
https://github.com/ClusterLabs/pacemaker/pull/512

Is this fix acceptable?

Regards,
Yusuke

2014-05-08 15:59 GMT+09:00 Yusuke Iida <yusk.iida at gmail.com>:
> Hi, Andrew
>
> I defined the node IDs using the method described in my previous mail
> and loaded the configuration.
>
> With this, crmd added the node as OFFLINE without dumping core.
>
> However, the added OFFLINE node was fenced even though
> "startup-fencing=false" was set.
> I did not expect it to be fenced here.
> Why does "startup-fencing=false" have no effect?
>
> I have attached a crm_report captured when the problem occurred.
>
> The Pacemaker version used is:
> https://github.com/ClusterLabs/pacemaker/commit/9fa1ed36e373768e84bee47b5d21b0bf80f608b7
>
> Regards,
> Yusuke
>
> 2014-05-08 8:58 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>>
>> On 7 May 2014, at 7:53 pm, Yusuke Iida <yusk.iida at gmail.com> wrote:
>>
>>> Hi, Andrew
>>>
>>> I would also like to define, in a crmsh file, a node that has not yet
>>> joined the cluster.
>>>
>>> From this mail thread, I understand that a node definition requires
>>> its uuid, as follows:
>>>
>>> # cat node.crm
>>> ### Cluster Option ###
>>> property no-quorum-policy="ignore" \
>>>        stonith-enabled="true" \
>>>        startup-fencing="false" \
>>>        crmd-transition-delay="2s"
>>>
>>> node $id=131 vm01
>>> node $id=132 vm02
>>> (snip)
>>>
>>> Is this the correct way to set the ID of a node that has not yet
>>> joined a cluster running on the corosync stack?
>>
>> I don't know how crmsh works, sorry.
>>
>>> Is it sufficient to specify the nodelist and nodeid in corosync.conf?
>>
>> That is my understanding, yes.
>>
>>>
>>> # cat corosync.conf
>>> (snip)
>>> nodelist {
>>>  node {
>>>    ring0_addr: 192.168.101.131
>>>    ring1_addr: 192.168.102.131
>>>    nodeid: 131
>>>  }
>>>  node {
>>>    ring0_addr: 192.168.101.132
>>>    ring1_addr: 192.168.102.132
>>>    nodeid: 132
>>>  }
>>> }
>>>
>>> Regards,
>>> Yusuke
>>>
>>> 2014-04-24 12:33 GMT+09:00 Kazunori INOUE <kazunori.inoue3 at gmail.com>:
>>>> 2014-04-23 19:32 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>>>>>
>>>>> On 23 Apr 2014, at 7:17 pm, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
>>>>>
>>>>>> 2014-04-22 0:45 GMT+09:00 David Vossel <dvossel at redhat.com>:
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Kazunori INOUE" <kazunori.inoue3 at gmail.com>
>>>>>>>> To: "pm" <pacemaker at oss.clusterlabs.org>
>>>>>>>> Sent: Friday, April 18, 2014 4:49:42 AM
>>>>>>>> Subject: [Pacemaker] crmd does abort if a stopped node is specified
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> crmd aborts if I load a CIB that specifies a stopped node.
>>>>>>>>
>>>>>>>> # crm_mon -1
>>>>>>>> Last updated: Fri Apr 18 11:51:36 2014
>>>>>>>> Last change: Fri Apr 18 11:51:30 2014
>>>>>>>> Stack: corosync
>>>>>>>> Current DC: pm103 (3232261519) - partition WITHOUT quorum
>>>>>>>> Version: 1.1.11-cf82673
>>>>>>>> 1 Nodes configured
>>>>>>>> 0 Resources configured
>>>>>>>>
>>>>>>>> Online: [ pm103 ]
>>>>>>>>
>>>>>>>> # cat test.cli
>>>>>>>> node pm103
>>>>>>>> node pm104
>>>>>>>>
>>>>>>>> # crm configure load update test.cli
>>>>>>>>
>>>>>>>> Apr 18 11:52:42 pm103 crmd[11672]:    error: crm_int_helper: Characters left over after parsing 'pm104': 'pm104'
>>>>>>>> Apr 18 11:52:42 pm103 crmd[11672]:    error: crm_abort: crm_get_peer: Triggered fatal assert at membership.c:420 : id > 0 || uname != NULL
>>>>>>>> Apr 18 11:52:42 pm103 pacemakerd[11663]:    error: child_waitpid: Managed process 11672 (crmd) dumped core
>>>>>>>>
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x00000033da432925 in raise () from /lib64/libc.so.6
>>>>>>>> #1  0x00000033da434105 in abort () from /lib64/libc.so.6
>>>>>>>> #2  0x00007f30241b7027 in crm_abort (file=0x7f302440b0b3
>>>>>>>> "membership.c", function=0x7f302440b5d0 "crm_get_peer", line=420,
>>>>>>>> assert_condition=0x7f302440b27e "id > 0 || uname != NULL", do_core=1,
>>>>>>>> do_fork=0) at utils.c:1177
>>>>>>>> #3  0x00007f30244048ee in crm_get_peer (id=0, uname=0x0) at membership.c:420
>>>>>>>> #4  0x00007f3024402238 in crm_peer_uname (uuid=0x113e7c0 "pm104") at
>>>>>>>
>>>>>>> Is the uuid for your cluster nodes supposed to be the same as the uname? We're treating the uuid in this situation as if it were a number, which it clearly is not.
>>>>>>
>>>>>> OK, I got it.
>>>>>>
>>>>>> By the way, is there a way to find out a node's id before starting Pacemaker?
>>>>>
>>>>> Normally it comes from corosync, so not really :-(
>>>>
>>>> It seems the only way is to specify the nodeid in the nodelist
>>>> directive in corosync.conf.
>>>>
>>>> nodelist {
>>>>  node {
>>>>    ring0_addr: 192.168.101.143
>>>>    nodeid: 3
>>>>  }
>>>>  node {
>>>>    ring0_addr: 192.168.101.144
>>>>    nodeid: 4
>>>>  }
>>>> }
>>>>
>>>> Thanks!
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> -- Vossel
>>>>>>>
>>>>>>>
>>>>>>>> cluster.c:386
>>>>>>>> #5  0x000000000043afbd in abort_transition_graph
>>>>>>>> (abort_priority=1000000, abort_action=tg_restart, abort_text=0x44d2f4
>>>>>>>> "Non-status change", reason=0x113e4b0, fn=0x44df07 "te_update_diff",
>>>>>>>> line=382) at te_utils.c:518
>>>>>>>> #6  0x000000000043caa4 in te_update_diff (event=0x10f2240
>>>>>>>> "cib_diff_notify", msg=0x1137660) at te_callbacks.c:382
>>>>>>>> #7  0x00007f302461d1bc in cib_native_notify (data=0x10ef750,
>>>>>>>> user_data=0x1137660) at cib_utils.c:733
>>>>>>>> #8  0x00000033db83d6bc in g_list_foreach () from /lib64/libglib-2.0.so.0
>>>>>>>> #9  0x00007f3024620191 in cib_native_dispatch_internal
>>>>>>>> (buffer=0xe61ea8 "<notify t=\"cib_notify\" subt=\"cib_diff_notify\"
>>>>>>>> cib_op=\"cib_apply_diff\" cib_rc=\"0\"
>>>>>>>> cib_object_type=\"diff\"><cib_generation><generation_tuple epoch=\"4\"
>>>>>>>> num_updates=\"0\" admin_epoch=\"0\" validate-with=\"pacem"...,
>>>>>>>> length=1708, userdata=0xe5eb90) at cib_native.c:123
>>>>>>>> #10 0x00007f30241dee72 in mainloop_gio_callback (gio=0xf61ea0,
>>>>>>>> condition=G_IO_IN, data=0xe601b0) at mainloop.c:639
>>>>>>>> #11 0x00000033db83feb2 in g_main_context_dispatch () from
>>>>>>>> /lib64/libglib-2.0.so.0
>>>>>>>> #12 0x00000033db843d68 in ?? () from /lib64/libglib-2.0.so.0
>>>>>>>> #13 0x00000033db844275 in g_main_loop_run () from /lib64/libglib-2.0.so.0
>>>>>>>> #14 0x0000000000406469 in crmd_init () at main.c:154
>>>>>>>> #15 0x00000000004062b0 in main (argc=1, argv=0x7fff908829f8) at main.c:121
>>>>>>>>
>>>>>>>> Is this the expected behavior?
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Kazunori INOUE
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>
>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>>
>>>
>>>
>>>
>>> --
>>> ----------------------------------------
>>> METRO SYSTEMS CO., LTD
>>>
>>> Yusuke Iida
>>> Mail: yusk.iida at gmail.com
>>> ----------------------------------------
>>>