[ClusterLabs] Antw: Delayed first monitoring
Andrew Beekhof
andrew at beekhof.net
Sun Aug 16 22:22:43 UTC 2015
> On 13 Aug 2015, at 5:01 pm, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>
> However, this does not make sense at all. Presumably, Pacemaker should get along with LSB scripts that come from the system repository, right?
Explicitly no.
We get along only with /LSB compliant/ init scripts.
Not all meet these criteria.
Debian’s init scripts were some of the biggest offenders for many many years.
A program such as Pacemaker needs (for example) sane return codes, for start to actually complete before returning, and for starting something that's already started not to be an error.
A human can gloss over these things; Pacemaker is not smart enough to know when these kinds of errors are OK.
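
As a rough sketch of what those three requirements look like in an init script: the function names, PIDFILE, START_TIMEOUT and the stand-in launch_daemon below are all made up for illustration and are not taken from any distribution's script. The status codes follow the LSB convention (0 = running, 1 = dead but pid file exists, 3 = not running).

```shell
#!/bin/sh
# Illustrative only: PIDFILE, START_TIMEOUT and all function names are hypothetical.
PIDFILE="${PIDFILE:-/tmp/demo-daemon.pid}"
START_TIMEOUT="${START_TIMEOUT:-10}"   # seconds to wait for the daemon to come up

# Stand-in for the real daemon: its pid file only appears after a short delay,
# which is exactly the situation that trips up an immediate monitor.
launch_daemon() {
    ( sleep 1; sleep 30 </dev/null >/dev/null 2>&1 & echo $! > "$PIDFILE" ) </dev/null >/dev/null 2>&1 &
}

# LSB status: 0 = running, 1 = dead but pid file exists, 3 = not running.
demo_status() {
    [ -f "$PIDFILE" ] || return 3
    pid=$(cat "$PIDFILE" 2>/dev/null)
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1
}

# LSB start: succeed if already running, and do not return success
# until the service is actually up (or give up after START_TIMEOUT).
demo_start() {
    if demo_status; then
        return 0            # starting something already started is not an error
    fi
    launch_daemon || return 1
    i=0
    while [ "$i" -lt "$START_TIMEOUT" ]; do
        if demo_status; then
            return 0        # start has really completed
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1                # generic failure: daemon never came up
}
```

With a start that behaves like this, the first monitor that runs after a successful start already sees status 0, so no extra delay is needed on the cluster side.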
>
> Therefore, there is no way to modify the LSB script, because changes to the LSB script are erased by every package update.
>
>
> I believe the systematic approach would be to introduce delayed monitoring or something similar into Pacemaker. I am surprised that nobody has come across this problem already.
>
>
> Milos
>
> On 13.8.2015 at 08:44, Ulrich Windl wrote:
>> I think the start script has to be fixed to return success when httpd is
>> actually running.
>>
>>>>> Miloš Kozák <milos.kozak at lejmr.com> wrote on 12.08.2015 at 16:03 in
>> message
>> <55CB521A.8090304 at lejmr.com>:
>>> Hi,
>>>
>>> I have set up Corosync+CMAN+Pacemaker on CentOS 6.5 in order to
>>> provide high availability for OpenNebula. However, I am facing a
>>> strange problem which arises from my lack of knowledge.
>>>
>>> In the log I can see that when I create a resource based on an init
>>> script, typically:
>>>
>>> pcs resource create httpd lsb:httpd
>>>
>>> The httpd daemon gets started, but the monitor is initiated at the same
>>> time and the resource is identified as not running. This behaviour makes
>>> sense once we realize that starting the daemon takes some time. In this
>>> particular case, I get exit code 2, which in the LSB status convention
>>> means the program is dead but its lock file exists. The effect is that
>>> the httpd resource gets restarted.
>>>
>>> My workaround is an extra sleep in the status function of the init script,
>>> but I don't like this solution at all! Do you have any idea how to tackle
>>> this problem in a proper way? I expected an op attribute which would
>>> specify a delay between service start and the first monitor, but I could
>>> not find one.
>>>
>>> Thank you, Milos
>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org