[Pacemaker] Java application failover problem
Andrew Beekhof
andrew at beekhof.net
Wed Jul 10 00:25:16 UTC 2013
On 09/07/2013, at 10:29 PM, Martin Gazak <martin.gazak at microstep-mis.sk> wrote:
> On 7/9/2013 12:56 PM Andrew Beekhof wrote:
>>
>> On 09/07/2013, at 8:49 PM, Martin Gazak <martin.gazak at microstep-mis.sk> wrote:
>>
>>> On 7/9/2013 12:42 PM Andrew Beekhof wrote:
>>>>
>>>> On 09/07/2013, at 5:05 PM, Martin Gazak <martin.gazak at microstep-mis.sk> wrote:
>>
>> It looks to be a bug in 1.1.7; you'll want to contact SUSE so they can get the fix from upstream.
>
> Dear Andrew,
> thanks for your effort.
>
> May I ask three questions?
>
> - Which version did you use to detect the bug? You labeled it just
> "current version".
1.1.10-rc6
>
> - We have downloaded the corosync SuSE packages 1.1.8 and 1.1.9. Could you
> please confirm whether one (or both) of these SuSE versions has this bug fixed?
I have no idea.
If you install them and run:
crm_simulate -Sx /var/lib/pengine/pe-input-2819.bz2
and it returns the same as what I got, then it's fixed.
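
For anyone wanting to make that comparison less error-prone than reading two outputs by eye, here is a minimal sketch. It assumes the output you capture contains "LogActions:" lines in the same shape as the ones quoted further down in this thread; the names extract_actions and same_decisions are made up for illustration and are not part of Pacemaker.

```python
import re

# Matches the action part of a "LogActions" line as quoted in this thread, e.g.
#   notice: LogActions: Promote ims:1 (Slave -> Master ims1)
# Any timestamp/host/pid prefix or ">>" quoting in front of it is ignored.
ACTION_RE = re.compile(r"LogActions:\s+(\w+)\s+(\S+)\s+\((.*?)\)\s*$")


def extract_actions(text):
    """Return (action, resource, detail) tuples in order of appearance."""
    actions = []
    for line in text.splitlines():
        match = ACTION_RE.search(line)
        if match:
            action, resource, detail = match.groups()
            # Collapse whitespace so column alignment does not affect equality.
            actions.append((action, resource, " ".join(detail.split())))
    return actions


def same_decisions(output_a, output_b):
    """True if both outputs list the same actions in the same order."""
    return extract_actions(output_a) == extract_actions(output_b)


# The result reported in this thread for 1.1.10-rc6 on pe-input-2819.
EXPECTED = """\
notice: LogActions: Demote ims:0 (Master -> Stopped ims0)
notice: LogActions: Promote ims:1 (Slave -> Master ims1)
notice: LogActions: Start ims-ip (ims1)
notice: LogActions: Start ims-ip-src (ims1)
"""

# The result logged by the 1.1.7 cluster for the same input.
LOGGED_1_1_7 = """\
Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Recover ims:0 (Master ims0)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip (Started ims0)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip-src (Started ims0)
"""
```

Paste whatever your installed version prints into a string and pass it to same_decisions() together with EXPECTED. A False result for pe-input-2819 would suggest that package still makes the old decision.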
> Or do you need the package itself as an attachment to inspect it?
> Or is there a way to check whether our package has the bug fixed?
>
> - We are going to test the 1.1.9 package with the stress tests anyway.
> As I wrote to you, this situation happened extremely rarely on the testing
> cluster (but often enough to cause trouble in a production environment).
> Do you have any idea how to reproduce this situation in a deterministic
> way?
It might be a timing issue.
> Just blindly killing the master instance of the application from cron does
> not help: the system survived 70+ correct failovers over the weekend.
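
If it is a timing issue, one thing that might be worth trying is to stop killing on a fixed cron schedule and randomise the interval instead, so the kills land at varying offsets relative to whatever else the cluster is doing. This is only a sketch of that idea and not a known reproducer: jittered_delays, run_kills and the kill_master callback are hypothetical names, and the numbers are arbitrary.

```python
import random
import time


def jittered_delays(count, base_seconds, jitter_seconds, seed=None):
    """Return `count` delays of base_seconds plus a random 0..jitter_seconds offset."""
    rng = random.Random(seed)  # a fixed seed makes a run repeatable
    return [base_seconds + rng.uniform(0.0, jitter_seconds) for _ in range(count)]


def run_kills(delays, kill_master, sleep=time.sleep):
    """Wait out each delay, then call kill_master(); return the number of kills issued."""
    issued = 0
    for delay in delays:
        sleep(delay)
        kill_master()  # caller-supplied: however you kill the master instance today
        issued += 1
    return issued
```

For example, run_kills(jittered_delays(100, 600, 60, seed=1), kill_master) would issue 100 kills spaced 10 to 11 minutes apart. Recording the seed lets you replay the same schedule if one of the runs ends in the bad state.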
>
> Best regards
>
> Martin Gazak
>
>
>>
>> Your version:
>>
>> Jul 04 23:45:02 ims0 pengine: [3933]: WARN: unpack_rsc_op: Processing failed op ims:0_last_failure_0 on ims0: not running (7)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Recover ims:0 (Master ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip (Started ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip-src (Started ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4036: PEngine Input stored in: /var/lib/pengine/pe-input-2819.bz2
>>
>> vs. the current version:
>>
>> notice: LogActions: Demote ims:0 (Master -> Stopped ims0)
>> notice: LogActions: Promote ims:1 (Slave -> Master ims1)
>> notice: LogActions: Start ims-ip (ims1)
>> notice: LogActions: Start ims-ip-src (ims1)
>>
>> and
>>
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Recover ims:0 (Master ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip (Started ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Start ims-ip-src (ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4037: PEngine Input stored in: /var/lib/pengine/pe-input-2820.bz2
>>
>>
>> vs. the current version:
>>
>> notice: LogActions: Demote ims:0 (Master -> Stopped ims0)
>> notice: LogActions: Promote ims:1 (Slave -> Master ims1)
>> notice: LogActions: Start ims-ip (ims1)
>> notice: LogActions: Start ims-ip-src (ims1)
>>
>
>
> --
>
> Regards,
>
> Martin Gazak
> MicroStep-MIS, spol. s r.o.
> System Development Manager
> Tel.: +421 2 602 00 128
> Fax: +421 2 602 00 180
> martin.gazak at microstep-mis.sk
> http://www.microstep-mis.com