[Pacemaker] pacemaker service start failed.

Thu Oct 25 20:31:23 EDT 2012

On Thu, Oct 25, 2012 at 11:14 PM, Yuusuke Iida
<iidayuus at intellilink.co.jp> wrote:
> Hi, Andrew
>
>
> (2012/10/25 9:54), Andrew Beekhof wrote:
>>
>> On Mon, Oct 22, 2012 at 10:29 PM, Yuusuke Iida
>> <iidayuus at intellilink.co.jp> wrote:
>>>
>>> Hi, Vossel
>>>
>>> (2012/10/20 0:42), David Vossel wrote:
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>>
>>>>> From: "Yuusuke Iida" <iidayuus at intellilink.co.jp>
>>>>> To: "pacemaker at oss" <pacemaker at oss.clusterlabs.org>
>>>>> Cc: shimazakik at intellilink.co.jp
>>>>> Sent: Friday, October 19, 2012 1:43:25 AM
>>>>> Subject: [Pacemaker] pacemaker service start failed.
>>>>>
>>>>> Hi, Andrew
>>>>>
>>>>> I built the latest version of Pacemaker.
>>>>> Pacemaker then failed to start.
>>>>>
>>>>> I think this was caused by the following change.
>>>>>
>>>>> https://github.com/ClusterLabs/pacemaker/commit/4f88cb1049e898726472a91fff834dcccbd6f665
>>>>>
>>>>> I have confirmed the behavior with the following versions.
>>>>>
>>>>> OS: RHEL6.3
>>>>> # pacemakerd -F
>>>>> Pacemaker 1.1.8 (Build: bd68c20)
>>>>>    Supporting:  agent-manpages ncurses libqb-logging libqb-ipc
>>>>>    lha-fencing
>>>>>    heartbeat corosync-native
>>>>> # corosync -v
>>>>> Corosync Cluster Engine, version '2.1.0.1-20c58'
>>>>>
>>>>> I collected a crm_report for this run.
>>>>>
>>>>> Has the way to use pacemaker changed?
>>>>> Is there a problem with my configuration?
>>>>
>>>>
>>>> Looking at your corosync.conf, what happens if you un-comment the udpu
>>>> transport and node list? I'm curious to know if this is a problem limited to
>>>> the use of corosync with multicast for some reason.
>>>
>>> I un-commented them and started pacemaker, but it failed just like last time.
>>>
>>> As far as I can see in "lib/cluster/corosync.c", nodeid always seems to be
>>> handled as 0.
>>
>>
>> nodeid 0 is a way of saying "our node"
>>
>>>
>>> Shouldn't pcmk_nodeid be used here, as in the patch I attached?
>>
>>
>> It shouldn't be necessary.
>> I run the exact same setup (multicast, no nodelist) and it seems to work
>> fine.
>>
>> How are node names mapped to IP addresses?  DNS or /etc/hosts?
>
> In my environment, the IP address that ring0 uses is not mapped to a node
> name.
>
>
>>
>> Could you turn on debug and see if you're getting this message please?
>>
>>     crm_debug("Unable to get node address for nodeid %u: %s",
>>               nodeid, cs_strerror(rc));
>>
>> I'm using DNS but I thought /etc/hosts worked too.
>
> When I added the IP address used for ring0 to /etc/hosts, I confirmed
> that pacemaker started successfully.
>

[moved first question to the end]

> Was there any problem with the conventional method of using uname()?

The problem with uname() is that your peers don't know the value until
you send it to them.
This creates a conceptual race condition: how do you send a message
to (or fence) a peer whose name you don't know yet?

> Will it always be necessary in future to configure a mapping from the ring0
> IP address to a node name?

In a word, "no" :-)

There are a couple of options:

- you can specify the names to use in corosync.conf (nodelist);
  using a nodelist doesn't prevent you from using multicast

- you can set up /etc/hosts as you did above

- I have just now reinstated the uname() default for corosync 2.0
  cluster types.  It didn't occur to me that people wouldn't set up
  anything :-)
  The patch is: https://github.com/beekhof/pacemaker/commit/9a81945
  Can you give it a try?
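
For the nodelist option, a rough sketch of what the corosync.conf section
might look like is below. The addresses and node names are made up for
illustration, and the "name" key is my assumption about what Pacemaker
looks up for each node; please check corosync.conf(5) for your version
before relying on it.

```
# corosync.conf fragment (illustrative only; addresses and names are hypothetical)
nodelist {
    node {
        ring0_addr: 192.168.0.1   # address this node uses on ring0
        nodeid: 1
        name: node1               # assumed key for the node name
    }
    node {
        ring0_addr: 192.168.0.2
        nodeid: 2
        name: node2
    }
}
```

The /etc/hosts option amounts to the same mapping kept outside corosync,
e.g. a line such as "192.168.0.1  node1" on every node (again with
hypothetical values).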