[Pacemaker] new node causes spurious evil

Matthew O'Connor matt at ecsorl.com
Sat May 12 04:49:04 UTC 2012


My question:  Why does a node that is not allowed to start a resource 
still attempt to run a monitor on that resource?  Is there a way to 
change this behavior?  (The resource agents in question: 
ocf:heartbeat:iSCSITarget and ocf:heartbeat:iSCSILogicalUnit)

The details:
I have two nodes, ds01 and ds02, running and happy.  When I add a 
third node called gw05, things start falling apart.  I've configured an 
asymmetric opt-in cluster per the documentation, and have explicit rules 
about what can start where.  ds01 and ds02 are configured with a variety 
of resources.  gw05 is not configured with any - it's effectively a 
blank node.
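
To illustrate, here is a minimal sketch of this kind of opt-in setup in 
crm shell syntax.  The group name g_iscsi1, the constraint IDs, and the 
scores are placeholders for illustration, not my actual configuration:

```
# Opt-in cluster: nothing may run anywhere unless a constraint allows it.
property symmetric-cluster="false"

# Hypothetical group g_iscsi1 is allowed on ds01 (preferred) and ds02.
location l_g_iscsi1_ds01 g_iscsi1 200: ds01
location l_g_iscsi1_ds02 g_iscsi1 100: ds02

# No constraint mentions gw05, so nothing should be started there.
```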

With ds01 and ds02 running and in a stable state with their resources, 
bringing gw05 online (even in standby mode) causes many things to fall 
apart.  First, gw05 reports a monitor error for a resource that was 
never supposed to run there.  That resource belongs to a group that 
was alive and well on ds01; the group died, but one of the group members 
was left alive on ds01 (?!).  Nothing could be migrated to ds02, or away 
from gw05.  After running "service pacemaker stop" on the command line 
and doing a resource cleanup on the group from one of the remaining ds?? 
nodes, everything went back to normal.

(I've simplified the details here - the actual configuration is slightly 
more complex, with two resource groups instead of one.  Both groups die: 
one dies completely, and the other leaves a dangling IP address resource 
on the node it started on.  gw05 never starts anything, and isn't 
supposed to, but it's the one reporting the errors and evidently killing 
the resources.)

Now, I've tried location statements to explicitly exclude gw05 from 
starting any of the resources it's complaining about, and used copious 
order and colocation statements, to no avail.  The kicker is: when I 
finally gave in and installed one "missing" package (that should not 
have been required on gw05), the monitor worked again and things stopped 
failing.

More specifics: the packages iscsitarget and iscsitarget-dkms were 
required for gw05 to stop killing my resources.  I have an 
ocf:heartbeat:iSCSITarget, an ocf:heartbeat:iSCSILogicalUnit, and a 
virtual IP address in each of two groups.  ds01 and ds02 share the load 
for these groups, and are the ONLY nodes allowed to run them.  gw05 
should not be trying to start these, or to run ANY monitors for the 
resources in those groups, IMO.  Using -inf location statements for both 
the group and the group members had no effect.  This suggests to me that 
any new node I bring into the cluster will need to have these extra 
packages installed.
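
As a sketch, the explicit exclusions I mean look roughly like this in 
crm shell syntax.  The resource and constraint names are placeholders, 
not my real ones:

```
# Hypothetical names: g_iscsi1 is a group, p_target1 is one of its members.
# Ban the whole group from gw05.
location l_g_iscsi1_not_gw05 g_iscsi1 -inf: gw05

# Ban an individual group member from gw05 as well.
location l_p_target1_not_gw05 p_target1 -inf: gw05
```

Neither form stopped the monitor errors coming from gw05.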

If this is an RTFM question, I apologize.  I've been reading the 
manual, honestly, and this behavior totally bewilders me.  Would setting 
is-managed="false" in the resource defaults help?  I am loath to 
add another step to the current "turn this resource on here" chain.

Thanks!
-- Matt




