[Pacemaker] The effects of /var being full on failure detection

Ryan Thomson ryan at pet.ubc.ca
Mon Feb 7 17:24:18 UTC 2011


Hi Brett,

>> My question is this: Would /var being full on the passive node have played a role in the cluster's inability to failover during the soft lockup condition on the active node? Or perhaps we hit a condition in which our configuration of pacemaker was unable to detect this type of failure? I'm basically trying to figure out if /var being full on the passive node played a role in the lack of failover or if our configuration is inadequate at detecting the type of failure we experienced.
>
> I'd say absolutely yes. /var being full probably stopped cluster
> traffic or at the least, changes to the cib from being accepted (from
> memory cib changes are written to temp files in /var/lib/heartbeat/crm/...).

Thanks for the feedback. This is what I suspected but I wasn't sure if 
my suspicions were correct. Too bad I don't have a test/dev pacemaker 
environment to test this situation with, otherwise I could be 100% sure 
instead of 99% sure.

> It can certainly stop ssh sessions from being established.

That it did!

>>
>> Thoughts?
>
> Just for the list (since I'm sure you've done this or similar already)
> I'd suggest you use SNMP monitoring and add an SNMP trap for /var
> being 95% full.

Yep, it's something we're on top of.
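
For anyone on the list without SNMP in place, here is a minimal sketch of 
the same 95% check in Python, meant to be run from cron or a similar 
scheduler. The function names and the default threshold are my own 
illustration, not part of Pacemaker or any SNMP agent, and the percentage 
may differ slightly from what df reports because of reserved blocks.

```python
import shutil


def usage_percent(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a total size of zero.
        return 0.0
    return 100.0 * usage.used / usage.total


def is_over_threshold(path, threshold=95.0):
    """True if the filesystem holding `path` is at or above `threshold` percent."""
    return usage_percent(path) >= threshold


def check(path, threshold=95.0):
    """Return a warning string if `path` is too full, otherwise None."""
    pct = usage_percent(path)
    if pct >= threshold:
        return "WARNING: %s is %.1f%% full (threshold %.1f%%)" % (
            path, pct, threshold)
    return None
```

Calling check("/var") on each cluster node and mailing any non-None result 
would give roughly the same early warning as the trap, assuming the 
scheduler itself does not depend on free space in /var.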

> A useful addition is to mount /var/log on a different
> disk/partition/logical volume from /var, that way even if your logs
> fill up, the system should still continue to function for a while.

We have /var mounted separately, but not /var/log. Interesting idea. 
Our /var problem was twofold. First, we had enabled debug logging and 
iptables logging to diagnose a previous problem and neglected to turn 
them off again afterwards, which caused unusually high log volume. 
Second, we never enabled logrotate for the firewall log, so it just 
grew and grew without being rotated out. Tough way to be reminded of 
improper configuration...

--Ryan



