[Pacemaker] Lots of Issues with Live Pacemaker Cluster

Dejan Muhamedagic dejanmm at fastmail.fm
Mon Mar 14 12:35:49 EDT 2011


Hi,

On Mon, Mar 14, 2011 at 10:57:27AM -0000, Darren.Mansell at opengi.co.uk wrote:
> Hello everyone.
> 
> I built and put into production without adequate testing a 2 node
> cluster running Ubuntu 10.04 LTS with Pacemaker and associated packages
> from the Ubuntu-HA-maintainers repo
> (https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa). 

It's not good to go live without sufficient testing. Testing is
at least as important as anything else, if not more so. If there
isn't enough time for testing, it's better to go without
clustering.

> I've always had many problems with my build, mainly because it was
> over-complicated and I didn't have adequate time to test it and tweak it
> before putting it live. If I list my problems below, could anyone have a
> look and see if there is anything obvious? Thanks.
>  
[...]

> 2. Crm shell won't load from a text file. When I use crm configure
> < crm.txt, it will run through the file, complaining about the default
> timeout being less than 240, but doesn't load anything. So I go into the
> crm shell and set default-action-timeout to 240, commit and exit and do
> the same. This time it just exits silently, without loading the config.

Strange. I assume that you run version 1.0.x, which I don't use
very often, but I cannot recall seeing this problem.

> If I go into the crm shell and use load replace crm.txt it will work.

Loading from a file was really meant to be done with "configure
load". If there are errors or warnings in the configuration,
what happens then depends on the check-* options, which control
the semantic checks.
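
A minimal sketch of the "configure load" route, assuming the file is
named crm.txt as in the report. The wrapper function load_crm_config
and the DRY_RUN variable are hypothetical names used for illustration
only; they are not part of the crm shell. By default the sketch just
prints the command it would run, so it is safe to try on a box
without a cluster:

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the crm shell): builds the
# "configure load" command line and, by default, only prints it.
# Set DRY_RUN=0 to actually run it against the live cluster.
load_crm_config() {
    file=$1              # configuration file, e.g. crm.txt
    method=${2:-replace} # "replace" or "update"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "crm configure load $method $file"
    else
        crm configure load "$method" "$file"
    fi
}

load_crm_config crm.txt
```

With "replace" the file replaces the current configuration, while
"update" merges it into what is already there. The check-* options
should be the ones at the options level (check-frequency and
check-mode in the versions I know of), but that may differ on 1.0.x,
so it is worth confirming with "crm options help" first.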

> 3. Crm shell tab completes don't work unless you put an incorrect
> entry in first. I'm sure this is a python readline problem, as it also
> happens in SLE 11 HAE SP1 (but not in pre-SP1). I assume everyone
> associated (Dejan?) is aware of the problem, but highlighting it just in
> case.

No, I'm not aware of it. Tab completion works here, though a bit
differently from 1.0 due to lazy creation of the completion
tables. You need to enter another level at least once before the
tab completion is going to work for that level. For instance,
it won't work in this case:

crm(live)# resource <TAB><TAB>

But it would once the user enters the resource level:

crm(live)resource# <TAB><TAB>
bye           failcount     move          restart       unmigrate
cd            help          param         show          unmove
cleanup       list          promote       start         up
demote        manage        quit          status        utilization
end           meta          refresh       stop
exit          migrate       reprobe       unmanage

Can you elaborate on what you mean by "put an incorrect entry in
first"?

Thanks,

Dejan

> I've attached my crm config, cib XML, /etc/drbd.conf for reference.
> Please forgive my SSH STONITH, I've not had chance to get the IBM RSA
> configured on it yet.
> 
> Thanks all!
> 
> Best regards,
> 
> Darren Mansell
