[Pacemaker] crmd does abort if a stopped node is specified

Andrew Beekhof andrew at beekhof.net
Wed May 7 23:58:41 UTC 2014


On 7 May 2014, at 7:53 pm, Yusuke Iida <yusk.iida at gmail.com> wrote:

> Hi, Andrew
> 
> I would also like to define, in a crmsh file, a node that has not yet
> joined the cluster.
> 
> From this mail thread, I understand that a uuid is required when
> configuring a node, as follows.
> 
> # cat node.crm
> ### Cluster Option ###
> property no-quorum-policy="ignore" \
>        stonith-enabled="true" \
>        startup-fencing="false" \
>        crmd-transition-delay="2s"
> 
> node $id=131 vm01
> node $id=132 vm02
> (snip)
> 
> Is this the right way to set the ID of a node that has not yet joined
> a cluster that uses the corosync stack?

I don't know how crmsh works, sorry.

> Is it sufficient to specify the nodelist and nodeid in corosync.conf?

That is my understanding, yes.

> 
> # cat corosync.conf
> (snip)
> nodelist {
>  node {
>    ring0_addr: 192.168.101.131
>    ring1_addr: 192.168.102.131
>    nodeid: 131
>  }
>  node {
>    ring0_addr: 192.168.101.132
>    ring1_addr: 192.168.102.132
>    nodeid: 132
>  }
> }
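
A minimal sketch of how the nodeid values could be read back out of a nodelist like the one quoted above, for example to cross-check them against the ids used in a crmsh file. It assumes the simple "section { key: value }" layout shown in this thread. The function name parse_nodelist is hypothetical and is not part of corosync or pacemaker.

```python
def parse_nodelist(conf_text):
    """Return one dict per "node { }" block found inside "nodelist { }".

    Hypothetical helper: it only understands the plain
    "name {" / "key: value" / "}" layout used in this thread.
    """
    nodes = []
    depth = 0            # current brace nesting level
    in_nodelist = False  # True while inside the top-level nodelist section
    current = None       # dict for the node block being read, if any
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            name = line[:-1].strip()
            depth += 1
            if depth == 1 and name == "nodelist":
                in_nodelist = True
            elif depth == 2 and in_nodelist and name == "node":
                current = {}
        elif line == "}":
            if depth == 2 and current is not None:
                nodes.append(current)
                current = None
            elif depth == 1:
                in_nodelist = False
            depth -= 1
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return nodes


SAMPLE = """
nodelist {
  node {
    ring0_addr: 192.168.101.131
    nodeid: 131
  }
  node {
    ring0_addr: 192.168.101.132
    nodeid: 132
  }
}
"""

for node in parse_nodelist(SAMPLE):
    print(node["nodeid"], node["ring0_addr"])
```

The values are kept as strings; whether and how they are then written into a crmsh node definition is left open here, since that depends on crmsh.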
> 
> Regards,
> Yusuke
> 
> 2014-04-24 12:33 GMT+09:00 Kazunori INOUE <kazunori.inoue3 at gmail.com>:
>> 2014-04-23 19:32 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>>> 
>>> On 23 Apr 2014, at 7:17 pm, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
>>> 
>>>> 2014-04-22 0:45 GMT+09:00 David Vossel <dvossel at redhat.com>:
>>>>> 
>>>>> ----- Original Message -----
>>>>>> From: "Kazunori INOUE" <kazunori.inoue3 at gmail.com>
>>>>>> To: "pm" <pacemaker at oss.clusterlabs.org>
>>>>>> Sent: Friday, April 18, 2014 4:49:42 AM
>>>>>> Subject: [Pacemaker] crmd does abort if a stopped node is specified
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> crmd aborts if I load a CIB that specifies a stopped node.
>>>>>> 
>>>>>> # crm_mon -1
>>>>>> Last updated: Fri Apr 18 11:51:36 2014
>>>>>> Last change: Fri Apr 18 11:51:30 2014
>>>>>> Stack: corosync
>>>>>> Current DC: pm103 (3232261519) - partition WITHOUT quorum
>>>>>> Version: 1.1.11-cf82673
>>>>>> 1 Nodes configured
>>>>>> 0 Resources configured
>>>>>> 
>>>>>> Online: [ pm103 ]
>>>>>> 
>>>>>> # cat test.cli
>>>>>> node pm103
>>>>>> node pm104
>>>>>> 
>>>>>> # crm configure load update test.cli
>>>>>> 
>>>>>> Apr 18 11:52:42 pm103 crmd[11672]:    error: crm_int_helper:
>>>>>> Characters left over after parsing 'pm104': 'pm104'
>>>>>> Apr 18 11:52:42 pm103 crmd[11672]:    error: crm_abort: crm_get_peer:
>>>>>> Triggered fatal assert at membership.c:420 : id > 0 || uname != NULL
>>>>>> Apr 18 11:52:42 pm103 pacemakerd[11663]:    error: child_waitpid:
>>>>>> Managed process 11672 (crmd) dumped core
>>>>>> 
>>>>>> (gdb) bt
>>>>>> #0  0x00000033da432925 in raise () from /lib64/libc.so.6
>>>>>> #1  0x00000033da434105 in abort () from /lib64/libc.so.6
>>>>>> #2  0x00007f30241b7027 in crm_abort (file=0x7f302440b0b3
>>>>>> "membership.c", function=0x7f302440b5d0 "crm_get_peer", line=420,
>>>>>> assert_condition=0x7f302440b27e "id > 0 || uname != NULL", do_core=1,
>>>>>> do_fork=0) at utils.c:1177
>>>>>> #3  0x00007f30244048ee in crm_get_peer (id=0, uname=0x0) at membership.c:420
>>>>>> #4  0x00007f3024402238 in crm_peer_uname (uuid=0x113e7c0 "pm104") at
>>>>> 
>>>>> Is the uuid for your cluster nodes supposed to be the same as the uname?  We're treating the uuid in this situation as if it should be a number, which it clearly is not.
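
To illustrate the failure described above, here is a simplified model in Python. It is based only on the log lines and backtrace in this thread (the "Characters left over after parsing" error, the call crm_get_peer (id=0, uname=0x0), and the assert condition "id > 0 || uname != NULL"). It is not Pacemaker's actual source, and the names parse_leading_int, get_peer and peer_uname are hypothetical stand-ins.

```python
import re


def parse_leading_int(text):
    """Parse a leading integer, strtol-style; return (value, leftover)."""
    match = re.match(r"\s*[+-]?\d+", text)
    if match is None:
        return 0, text  # nothing numeric: value 0, whole string left over
    return int(match.group()), text[match.end():]


def get_peer(node_id, uname):
    """Stand-in for the lookup guarded by 'id > 0 || uname != NULL'."""
    if not (node_id > 0 or uname is not None):
        raise AssertionError("id > 0 || uname != NULL")
    return (node_id, uname)


def peer_uname(uuid):
    """Treat the uuid as a numeric node id, as described in the thread."""
    node_id, leftover = parse_leading_int(uuid)
    if leftover:
        print("Characters left over after parsing %r: %r" % (uuid, leftover))
    # Only the numeric id is passed on; no uname is known at this point.
    return get_peer(node_id, None)


for uuid in ("131", "pm104"):
    try:
        print(uuid, "->", peer_uname(uuid))
    except AssertionError as exc:
        print(uuid, "-> assert failed:", exc)
```

In this model a numeric uuid such as "131" passes the check, while "pm104" parses to 0 with no uname available, which is the combination the assert rejects.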
>>>> 
>>>> OK, I got it.
>>>> 
>>>> By the way, is there a way to find out a node's id before starting pacemaker?
>>> 
>>> Normally it comes from corosync, so not really :-(
>> 
>> It seems the only way is to specify the nodeid in the nodelist directive
>> in corosync.conf.
>> 
>> nodelist {
>>  node {
>>    ring0_addr: 192.168.101.143
>>    nodeid: 3
>>  }
>>  node {
>>    ring0_addr: 192.168.101.144
>>    nodeid: 4
>>  }
>> }
>> 
>> Thanks!
>> 
>>> 
>>>> 
>>>>> 
>>>>> -- Vossel
>>>>> 
>>>>> 
>>>>>> cluster.c:386
>>>>>> #5  0x000000000043afbd in abort_transition_graph
>>>>>> (abort_priority=1000000, abort_action=tg_restart, abort_text=0x44d2f4
>>>>>> "Non-status change", reason=0x113e4b0, fn=0x44df07 "te_update_diff",
>>>>>> line=382) at te_utils.c:518
>>>>>> #6  0x000000000043caa4 in te_update_diff (event=0x10f2240
>>>>>> "cib_diff_notify", msg=0x1137660) at te_callbacks.c:382
>>>>>> #7  0x00007f302461d1bc in cib_native_notify (data=0x10ef750,
>>>>>> user_data=0x1137660) at cib_utils.c:733
>>>>>> #8  0x00000033db83d6bc in g_list_foreach () from /lib64/libglib-2.0.so.0
>>>>>> #9  0x00007f3024620191 in cib_native_dispatch_internal
>>>>>> (buffer=0xe61ea8 "<notify t=\"cib_notify\" subt=\"cib_diff_notify\"
>>>>>> cib_op=\"cib_apply_diff\" cib_rc=\"0\"
>>>>>> cib_object_type=\"diff\"><cib_generation><generation_tuple epoch=\"4\"
>>>>>> num_updates=\"0\" admin_epoch=\"0\" validate-with=\"pacem"...,
>>>>>> length=1708, userdata=0xe5eb90) at cib_native.c:123
>>>>>> #10 0x00007f30241dee72 in mainloop_gio_callback (gio=0xf61ea0,
>>>>>> condition=G_IO_IN, data=0xe601b0) at mainloop.c:639
>>>>>> #11 0x00000033db83feb2 in g_main_context_dispatch () from
>>>>>> /lib64/libglib-2.0.so.0
>>>>>> #12 0x00000033db843d68 in ?? () from /lib64/libglib-2.0.so.0
>>>>>> #13 0x00000033db844275 in g_main_loop_run () from /lib64/libglib-2.0.so.0
>>>>>> #14 0x0000000000406469 in crmd_init () at main.c:154
>>>>>> #15 0x00000000004062b0 in main (argc=1, argv=0x7fff908829f8) at main.c:121
>>>>>> 
>>>>>> Is this all right?
>>>>>> 
>>>>>> Best Regards,
>>>>>> Kazunori INOUE
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>> 
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>> 
> 
> 
> 
> -- 
> ----------------------------------------
> METRO SYSTEMS CO., LTD
> 
> Yusuke Iida
> Mail: yusk.iida at gmail.com
> ----------------------------------------
