[Pacemaker] Issues with constraints - working for start/stop, being ignored on "failures"

Wed Jun 9 19:12:36 EDT 2010

On 07.06.2010 03:07, Tim Serong wrote:
> On 6/2/2010 at 11:10 AM, Cnut Jansen <work at cnutjansen.eu> wrote:
>    
>> About those ":start" specifiers on the mount resources' order
>> constraints, you're of course right, and I already knew about that.
>> They're just leftovers from some tests I did (probably searching for
>> (other?) workarounds or something), which, since to my knowledge they
>> are harmlessly redundant, I have so far always forgotten to remove
>> when making other, more relevant/important changes. You know, because
>> the crm shell (which I currently use for editing my configuration)
>> cancels all resource monitor operations on the node it is started on,
>> I prefer to avoid starting it as much as possible, since afterwards I
>> always have to make sure all monitor operations are running again
>> (i.e. switch the cluster's maintenance-mode on/off, or switch the node
>> to standby and back online).
>>      
> Say what?  The CRM shell shouldn't be canceling ops...
>    
That's what I had expected too. But the more I got used to it, while
still not finding anything that even mentioned it and thus having to
make my own assumptions, the more I considered it possible that it was
simply intended behaviour (maybe because one shouldn't run the crm shell
against a live CIB anyway, only against shadow CIBs, or something like
that), and as such so obvious to everyone else that no one thought it
worth adding a warning note about it to any of the step-by-step
tutorials, for dumbheads like me. d-#
But it is 100% reproducible on our office's current testing cluster
(SLES 11 SP0, kept up to date).

Meanwhile I got to "enjoy" some unexpected "holidays" (sick at home) and
used some of them productively to start setting up a cluster with
somewhat more recent software (i.e. Pacemaker 1.0.8, as shipped with
Debian Squeeze/testing), and there I haven't found any unexpected
cancellations of monitor ops so far. So I guess it might really just be
due to a bug in older Pacemaker versions or something.
We'll see when I'm back in the office and upgrade our testing cluster to
SLES 11 SP1.

>> About those 0-scores: unfortunately they're necessary, since they're
>> (afaik) the official workaround to prevent instances of clone
>> resources from also being restarted on nodes where that is unnecessary.
>> With the scores set to "inf" instead, when I for example put one node
>> into standby and/or back online, most clone resources would also be
>> restarted on the other node. That's not acceptable for production.
>> As far as I remember reading, this behaviour was only changed in
>> Pacemaker 1.0.7, which isn't shipped with SLES 11 yet. I'm hoping
>> SLES 11 SP1 will change that, but I haven't found any reliable
>> information about its Pacemaker version yet.
>>      
> SLES 11 SP1 and the SLE High Availability Extension 11 SP1 are now
> available for download from http://download.novell.com/ - this includes
> Pacemaker 1.1.2.
>    
Yeah, I know. It's what we finally decided to wait for regarding all
the problems still unresolved so far, hoping that many of them would be
solved by more recent cluster software. (-;

For example, regarding the order constraints, I expect to be able to
change the scores back to inf then without clones being restarted
unnecessarily (changed in Pacemaker 1.0.7). Then my order constraint
issues might(!) already be solved as well, since (as far as I remember
from my testing) they were already OK in SP0 with inf scores.

P.S.: Wrong newsgroup, I know; but since there are Novell guys here and
upgrading to SP1 was just mentioned: d-;
Why does the SLES 11 upgrade HowTo (
http://www.novell.com/support/documentLink.do?externalID=7005410 ; I
tried the zypper way) work correctly for SLES itself, and even show
SLE-HAE SP1 stuff during that "<product>" grep (and even install
something related to it: on a second try I got output saying that the
HAE SP1 product stuff, whose exact name I don't remember right now, was
already installed), but afterwards I only see SP1 repositories for SLES,
not for SLE-HAE (still only SP0)... while our company's Novell account
already shows 5-6 HAE installations on that one machine?! o_O