[Pacemaker] Java application failover problem

Martin Gazak martin.gazak at microstep-mis.sk
Tue Jul 9 08:29:14 EDT 2013


On 7/9/2013 12:56 PM, Andrew Beekhof wrote:
> 
> On 09/07/2013, at 8:49 PM, Martin Gazak <martin.gazak at microstep-mis.sk> wrote:
> 
>> On 7/9/2013 12:42 PM, Andrew Beekhof wrote:
>>>
>>> On 09/07/2013, at 5:05 PM, Martin Gazak <martin.gazak at microstep-mis.sk> wrote:
> 
> It looks to be a bug in 1.1.7, you'll want to contact SUSE so they can get the fix from upstream.

Dear Andrew,
thank you for your effort.

May I ask three questions?

- Which version did you use to detect the bug? You labeled it just
"current version".

- We have downloaded the corosync SuSE packages 1.1.8 and 1.1.9. Could you
please confirm whether one (or both) of these SuSE versions has the bug fixed?
Or do you need the package itself as an attachment to inspect it?
Or is there a way to check whether our package has the bug fixed?

- We are going to test the 1.1.9 package with the stress tests anyway.
As I wrote, this situation occurred extremely rarely on the testing
cluster (but often enough to cause trouble in a production environment).
Do you have any idea how to reproduce it deterministically?
Blindly killing the master instance of the application from cron does
not help: the system correctly survived 70+ failovers over the weekend.
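
To tell the two outcomes apart during such stress tests, the pengine
LogActions lines quoted below could be classified automatically. The
following is only a minimal sketch in Python, assuming the log format
shown in this thread; the names parse_actions and classify are
hypothetical helpers, not part of Pacemaker.

```python
import re

# Matches pengine "LogActions" lines such as
#   "... notice: LogActions: Recover ims:0\t(Master ims0)"
# The pattern is an assumption based only on the log excerpts in this thread.
ACTION_RE = re.compile(r"LogActions:\s+(\w+)\s+(\S+)\s+\((.*?)\)")


def parse_actions(lines):
    """Return (action, resource, detail) tuples for every LogActions line."""
    actions = []
    for line in lines:
        match = ACTION_RE.search(line)
        if match:
            actions.append(match.groups())
    return actions


def classify(actions):
    """Label one transition as 'failover', 'in-place recovery', or 'other'."""
    if any(action == "Promote" for action, _, _ in actions):
        # A slave instance is promoted, i.e. the master role moves.
        return "failover"
    if any(action == "Recover" and detail.startswith("Master")
           for action, _, detail in actions):
        # The failed master is recovered on the same node.
        return "in-place recovery"
    return "other"
```

Feeding it the LogActions lines of one transition (for example, everything
logged before a process_pe_message line) labels that transition; counting
the labels over a weekend of cron-driven kills would show whether the
in-place recovery ever occurred.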

Best regards

Martin Gazak


> 
> Your version:
> 
> Jul 04 23:45:02 ims0 pengine: [3933]: WARN: unpack_rsc_op: Processing failed op ims:0_last_failure_0 on ims0: not running (7)
> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Recover ims:0	(Master ims0)
> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip	(Started ims0)
> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip-src	(Started ims0)
> Jul 04 23:45:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4036: PEngine Input stored in: /var/lib/pengine/pe-input-2819.bz2
> 
> vs. the current version:
> 
>   notice: LogActions: 	Demote  ims:0	(Master -> Stopped ims0)
>   notice: LogActions: 	Promote ims:1	(Slave -> Master ims1)
>   notice: LogActions: 	Start   ims-ip	(ims1)
>   notice: LogActions: 	Start   ims-ip-src	(ims1)
> 
> and
> 
> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Recover ims:0	(Master ims0)
> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip	(Started ims0)
> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Start   ims-ip-src	(ims0)
> Jul 04 23:45:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4037: PEngine Input stored in: /var/lib/pengine/pe-input-2820.bz2
> 
> 
> vs. the current version:
> 
>   notice: LogActions: 	Demote  ims:0	(Master -> Stopped ims0)
>   notice: LogActions: 	Promote ims:1	(Slave -> Master ims1)
>   notice: LogActions: 	Start   ims-ip	(ims1)
>   notice: LogActions: 	Start   ims-ip-src	(ims1)
> 


-- 

Regards,

Martin Gazak
MicroStep-MIS, spol. s r.o.
System Development Manager
Tel.: +421 2 602 00 128
Fax: +421 2 602 00 180
martin.gazak at microstep-mis.sk
http://www.microstep-mis.com



