[Pacemaker] ifstatus OCF RA

Vladislav Bogdanov bubble at hoster-ok.com
Sat Mar 19 13:10:39 EDT 2011


Hi,

just bumping this to be not forgotten.
RA runs fine for almost a month, several simulated network outages were
passed with full success, so it could be included in some package. I
think pacemaker, because this RA uses pacemaker-specific calls.

Andrew?

(I can support this one)

23.02.2011 11:53, Vladislav Bogdanov wrote:
> Hi Lars,
> 
> thank you for your time and for so detailed review.
> 
> Just to dot half of i's (where it is about coding style):
> 1. I strongly prefer to cleanly separate data access from main logic by API.
> 2. I prefer to have non-void functions to return result explicitly
> ("main" too). This will prevent "correct" return code from being lost on
> subsequent code modifications.
> 3. I insist on all variables inside functions to be local. This helps to
> avoid some hardly-debugable logic errors.
> 4. I prefer not to touch global variables inside of lower-layer
> functions. To be honest, I prefer to pass everything needed by function
> in its parameters. This sometimes is hard to do, so sometimes I prefer
> to use global variables relatively high at stack rather then pass them
> down through several more functions.
> 5) I prefer to use sub-shells for recursive function calls.
> 
> Please look at one more attached revision, and also in comments inline.
> 
> 
> ...
>>> <longdesc lang="en">
>>> Every time the monitor action is run, this resource agent records (in the CIB) speed of active network interfaces from a list.
>>
>>
>> Where "active" is ...
>>  what, exactly?
>>  Add some hint on the intended use case and purpose,
>>  maybe add an example or two. This is the long description, after all.
> 
> Done.
> 
>>
>>  Also note that this is
>>   - linux specific
>>   - requires kernel >= 2.6.33,
>>   afaict no /sys/class/net/*/speed before that
> 
> Ahm, was not aware about that. I need to look again at this because I
> need this to run on RHEL6 too. Anybody knows, does it has this sysfs?
> Let's delay with this for a bit.
> 
>>
>>> </longdesc>
>>> <shortdesc lang="en">Network interface status</shortdesc>
>>
>> again, "interface status" may be a bit misleading.
> 
> Fixed.
> 
>>
>>> Weight of each 10Mbps in interface speed (1Gbps = 100 * 10Mbps).
>>
>> The way you implemented it, you get the speed, then do some math with
>> it, just to arrive at the same value you just read...
>> Why not just give one point per Mbps by default?
> 
> Rewrote.
>>
>>> With default value 1Gbps interface will be counted as 1000.
>>> </longdesc>
>>> <shortdesc lang="en">Weight of 10Mbps interface</shortdesc>
>>
>> can someone come up with better wording here?
> 
> I tried.
> 
>>
>>> <parameter name="dampen" unique="0">
>>> <longdesc lang="en">
>>> The time to wait (dampening) further changes occur
>>
>> This english apparently needs fixing
>> already wherever it was copied from ;-)
> 
> Hopefully done (although english is not my native lang).
> 
>>
>>
>>> </longdesc>
>>> <shortdesc lang="en">Dampening interval</shortdesc>
>>> <content type="integer" default="${OCF_RESKEY_dampen_default}"/>
>>> </parameter>
>>>
>>> <parameter name="debug" unique="0">
>>> <longdesc lang="en">
>>> Enables to use default attrd_updater verbose logging on every call.
>>
>>  "If enabled, ..."
> 
> Re-worded.
>>
>>> </longdesc>
>>> <shortdesc lang="en">Verbose logging</shortdesc>
>>> <content type="string" default="false"/>
>>> </parameter>
>>>
>>> </parameters>
>>>
>>> <actions>
>>> <action name="start"   timeout="30" />
>>> <action name="stop"    timeout="30" />
>>> <action name="reload"  timeout="30" />
>>> <action name="monitor" depth="0"  timeout="30" interval="10"/>
>>> <action name="meta-data"  timeout="5" />
>>> <action name="validate-all"  timeout="30" />
>>> </actions>
>>> </resource-agent>
>>> END
>>> }
>>>
>>> usage() {
>>>     cat <<END
>>> usage: $0 {start|stop|monitor|migrate_to|migrate_from|validate-all|meta-data}
>>>
>>> Expects to have a fully populated OCF RA-compliant environment set.
>>> END
>>
>> BTW:
>> cat does an extra fork/exec, and "here documents" go via tmpfiles,
>> which can break things, or slow things down quite a bit.
>> most of the time, I prefer the typically built in echo "
>> ...
>> .... "
>>
>> But that has nothing to do with this RA, even less with its usage().
>> Actually it comes down to a matter of taste entirely,
>> so just ignore this statement ;-)
> 
> Let's leave it as-is.
> 
>>
>>> }
>>>
>>> start() {
>>>     monitor
>>>     if [ $? -eq $OCF_SUCCESS ]; then
>>>         return $OCF_SUCCESS
>>>     fi
>>
>> 	monitor && return
> 
> Ahm, I'd leave it as-is. It is a bit more readable and probably does not
> cost any extra CPU cycles.
> 
>>
>>>     ha_pseudo_resource ${ha_pseudo_resource_name} start
>>>     update
>>
>> 	the update has been done in monitor already?
> 
> Nope. ha_pseudo_resource ${ha_pseudo_resource_name} monitor returns 7
> there so update is not called.
> 
>>
>>> }
>>>
>>> stop() {
>>>     ha_pseudo_resource ${ha_pseudo_resource_name} stop
>>>     attrd_updater -D -n ${OCF_RESKEY_name} -d ${OCF_RESKEY_dampen} ${attrd_options}
>>>     return $OCF_SUCCESS
>>
>> do we want to consider the exit code of attrd_updater, or not?
>> if not, what do we do about non-zero exit code?
>> does it have a useful reliable exit code, at all?
>> you don't ignore it in update.
>> Though you ignore the return value of update ...
>>
>> Yeah, right, ping does it the same way probably (I did not check; does
>> it?) -- good to have a reason to review that, too ;-)
> 
> We did our best, and failure of attr_updater means much more serious
> problems than this RA han handle.
> 
>>
>>> }
>>>
>>> monitor() {
>>> 	local ret
>>>     ha_pseudo_resource ${ha_pseudo_resource_name} monitor
>>>     ret=$?
>>>     if [ ${ret} -eq $OCF_SUCCESS ] ; then
>>>         update
>>>     fi
>>>     return ${ret}
>>
>> 	ha_pseudo_resource $ha_pseudo_resource_name monitor || return
>> 	update
>>
>> 	and, if that's really necessary, ignore the return code of update:
>> 	return $OCF_SUCCESS
> 
> No, we are interested in NOT_RUNNING too.
> Let's leave as-is.
> 
>>
>>> }
>>>
>>> validate() {
>>
>> check for linux,
>> and kernel >= 2.6.33,
>> and sysfs present,
>> and if not, exit with not installed or something?
> 
> Again, let's delay it for a bit.
> 
>>
>> Ah. just noticed you do that explicitly early on, anyways.
>>>     # Check the check interval
>>>     if ocf_is_decimal "${OCF_RESKEY_CRM_meta_interval}" && [ ${OCF_RESKEY_CRM_meta_interval} -gt 0 ]; then
>>>         :
>>>     else
>>>         ocf_log err "Invalid check interval ${OCF_RESKEY_interval}. It should be positive integer!"
>>>         exit $OCF_ERR_CONFIGURED
>>>     fi
>>>
>>>     # Check the intarfaces list
>>
>> s/tar/ter
> 
> Thanks.
> 
>>
>>>     if [ "x" = "x${OCF_RESKEY_iface}" ]; then 
>>
>> You declared yourself as bash.
>> no need for this "x" = "x$VAR" nonsense in bash...
>> in fact, even in sh, I find test -z preferable.
> 
> Blindly copied it. Done.
> 
>>
>>>         ocf_log err "Empty iface parameter.  Please specify some network interface to check"
>>
>> also check for strange characters in iface,
>> it will be used as regex/sed/awk expression later.
> 
> Let it be TODO.
> 
>>
>>>         exit $OCF_ERR_CONFIGURED
>>>     fi
>>>
>>>     return $OCF_SUCCESS
>>> }
>>>
>>> iface_get_speed() {
>>>     local iface=$1
>>>     local operstate
>>>     local carrier
>>>     local speed
>>>
>>>     if [ ! -e "/sys/class/net/${iface}" ] ; then
>>>         echo 0
>>>     elif iface_is_bridge ${iface} ; then
>>>         bridge_get_speed ${iface}
>>>     elif iface_is_bond ${iface} ; then
>>>         bond_get_speed ${iface}
>>>     elif iface_is_vlan ${iface} ; then
>>>         iface_get_speed $( vlan_get_phy ${iface} )
>>>     else
>>>         read operstate < "/sys/class/net/${iface}/operstate"
>>>         read carrier < "/sys/class/net/${iface}/carrier"
>>>         if [ "${operstate}" != "up" ] || [ "${carrier}" != "1" ] ; then
>>>             speed="0"
>>>         else
>>>             read speed < "/sys/class/net/${iface}/speed"
>>>         fi
>>>         echo ${speed}
>>
>> I'd prefer to have speed non-local,
> 
> I'd not. Let's be safe with recursions.
> 
>> and be a return value.
> 
> Does bash already support 10000 to be function return code? ;)
> 
>>
>> then you can do away with the $(iface_get_speed) later as well.
>>
>>>     fi
>>> }
>>>
>>> iface_is_vlan() {
>>>     local iface=$1
>>>     [ -e "/proc/net/vlan/${iface}" ] && return 0 || return 1
>>> }
>>>
>>> iface_is_bridge() {
>>>     local iface=$1
>>>     [ -e "/sys/class/net/${iface}/bridge" ] && return 0 || return 1
>>> }
>>>
>>> iface_is_bond() {
>>>     local iface=$1
>>>     [ -e "/sys/class/net/${iface}/bonding" ] && return 0 || return 1
>>> }
>>
>> All these functions:
>> iface_is_vlan()		{ [ -e "/proc/net/vlan/$1" ]; }
>> iface_is_bridge()	{ [ -e "/sys/class/net/$1/bridge" ]; }
>> iface_is_bond()		{ [ -e "/sys/class/net/$1/bonding" ]; }
> 
> Please see above about implicit return codes. This should not take any
> CPU cycles.
> 
>>
>>> vlan_get_phy() {
>>>     local iface=$1
>>>     grep "^${iface} " "/proc/net/vlan/config" | sed -r 's/.*\| +(.*)/\1/'
>>
>> probably safe to assume that on a box running linux kernel >= 2.6.33
>> the sed supports -r, but you never know ;-)
>> also, if you use sed aleady, have it do the pattern matching as well?
>>
>> vlan_get_phy() { sed -ne "s/^$1 .*| *//p" < /proc/net/vlan/config; }
> 
> Ahm, good point, thanks.
> 
>>
>>> }
>>>
>>> bridge_is_stp_enabled() {
>>>     local iface=$1
>>>     local stp
>>>     read stp < "/sys/class/net/${iface}/bridge/stp_state"
>>>     [ "${stp}" = "1" ] && return 0 || return 1
>>
>> how about
>> bridge_is_stp_enabled() { grep 1 < /sys/class/net/$1/bridge/stp_state; }
> 
> This will be slower.
> 
>>
>> maybe add 2>/dev/null ?
>>
>> or, as we are bash:
>> bridge_is_stp_enabled()	{ [[ $(</sys/class/net/$1/bridge/stp_state) = 1 ]]; }
> 
> I'd say this is unreadable for me. I'd avoid non-trivial constructs with
> no performance gain. And again, implicit vs explicit.
> 
>>
>> but some may prefer the more verbose thing.
>> if so, at least do away with the && return || return.
>> shell function return value is the "exit status" of the last command
>> executed, so [[ $stp = 1 ]] is sufficient.
> 
> But more error-prone on subsequent code modifications.
> 
>>
>>> }
>>>
>>> bridge_get_root_ports() {
>>>     local bridge=$1
>>>     local root_id
>>>     local root_ports=""
>>>     local bridge_id
>>>
>>>     read root_id < "/sys/class/net/${bridge}/bridge/root_id"
>>>
>>>     for port in /sys/class/net/${bridge}/brif/* ; do
>>>         read bridge_id < "${port}/designated_bridge"
>>>         if [ "${bridge_id}" = "${root_id}" ] ; then
>>>             root_ports="${root_ports} ${port##*/}"
>>>         fi
>>>     done
>>
>> Yes, I think here the more verbose style is appropriate ;-)
>>
>>>     echo "${root_ports# }"
> 
> I'd leave this as-is. And, I remade this chunk of code, so this
> construct is (probably) really needed now. Anyways, I hate implicit
> assumptions.
> 
>>
>> Though this is not necessary: echo $root_ports would do this just fine.
>> and, again, I'd prefer a non-local variable to pass the value
>> over some $(subshell) thingy.
> 
> Did it another way.
> 
>>
>>> }
>>>
>>> # From /inlude/linux/if_bridge.h:
>>> #define BR_STATE_DISABLED 0
>>> #define BR_STATE_LISTENING 1
>>> #define BR_STATE_LEARNING 2
>>> #define BR_STATE_FORWARDING 3
>>> #define BR_STATE_BLOCKING 4
>>>
>>> bridge_get_active_ports() {
>>>     local bridge=$1
>>>     shift 1
>>>     local ports="$*"
>>>     local active_ports=""
>>>     local port_state
>>>     local stp_state=bridge_is_stp_enabled ${bridge}
>>
>> some double quote missing there?
> 
> I should be temporarily insane while writing that ;)
> Fixed.
> 
>>
>>>     local warn=0
>>>
>>>     if [ -z "${ports}" ] || [ "${ports}" = "detect" ] ; then
>>>         ports=$( bridge_get_root_ports ${bridge} )
>>
>> 	if you did non-local root_ports, this would become
>> 	bridge_get_root_ports $bridge
>> 	# root ports now in $root_ports,
>> 	# so you can assign port=$root_ports,
>> 	# or just use $root_ports as is.
> 
> Rewrote this.
> 
>>
>>>     fi
>>>
>>>     for port in $ports ; do
>>
>> there. no need to trim white space anyways, as I thought.
> 
> I'd leave it.
> 
>>
>>>         if [ ! -e "/sys/class/net/${bridge}/brif/${port}" ] ; then
>>>             ocf_log warning "Port ${port} doesn't belong to bridge ${bridge}"
>>>             continue
>>>         fi
>>>         read port_state < "/sys/class/net/${bridge}/brif/${port}/state"
>>>         if [ "${port_state}" = "3" ] ; then
>>>             if [ -n "${active_ports}" ] && ${stp_state} ; then
>>
>> no need to re-check stp state for each iteration.
>> do that once, and reference the result here.
> 
> This nonsense is fixed.
> 
>>
>>>                 warn=1
>>>             fi
>>>             active_ports="${active_ports} ${port}"
>>>         fi
>>>     done
>>>     if [ ${warn} -eq 1 ] ; then
>>>         ocf_log warning "More then one upstream port in bridge '${bridge}' is in forwarding state while STP is enabled: ${active_ports}" 
>>>     fi
>>>     echo "${active_ports# }"
>>
>> again, no need to trim white space, and I prefer non-local documented
>> variables over $(subshell echo) style assignments.
>>
>>> }
>>>
>>> bridge_get_speed() {
>>>     local $iface=$1
>>>
>>>     if ! iface_is_bridge ${iface} ; then
>>>         echo 0
>>>         return
>>>     fi
>>>
>>>     local ports=$( bridge_get_active_ports ${iface} ${OCF_RESKEY_bridge_ports} )
> 
> Still leaving this as subshell, will rethink later.
> 
>>>     for port in ${ports} ; do
>>>         : $(( aggregate_speed += $( iface_get_speed ${port} ) ))
>>>     done
>>>     echo ${aggregate_speed}
>>
>> This can be rewritten without subshells.
> 
> Error-prone because of recursions.
> I'd leave it as-is.
> 
>>
>> It could even be done without bashisms,
>> (there is only one bashism in there, afaics)
>> but who cares, we are bash anyways ;-)
>>
>>> }
>>>
>>> bond_get_slaves() {
>>>     local iface=$1
>>>     local slaves
>>>     read slaves < "/sys/class/net/${iface}/bonding/slaves"
>>>     echo ${slaves}
>>
>> if that is what you wanted, why not just say cat?
> 
> It is slower.
> 
>> or, drop this function completely, as it only adds noise,
>> and later do
>> 	slaves=$(< /sys/class/net/${iface}/bonding/slaves)
>> avoiding any subshells?
> 
> Data access separation.
> 
>>
>>> }
>>>
>>> bond_get_active_iface() {
>>>     local iface=$1
>>>     local active
>>>     read active < "/sys/class/net/${iface}/bonding/active_slave"
>>>     echo ${active}
>>> }
>>
>> similar.
>>
>>>
>>> bond_is_balancing() {
>>>     local iface=$1
>>>     read mode mode_index < "/sys/class/net/${iface}/bonding/mode"
>>
>>
>> You declare all variables in all your tiny to-be-used-as-subshell
>> functions as local (needlessly, as they are visible in that subshell
>> only anyways... but keep them local, we want to get rid of the subshell
>> echo style assignments), but then here omit the local? how come?
> 
> Having function variables local is simply a good habit I think. Like
> having curly braces around one-statement blocks in C.
> I rewrote some functions so they can be used without subshell at a cost
> of additional variable.
> 
>>
>> not that it matters, really, they are used only here, anyways.
>>
>>>     case ${mode} in
>>>         "balance-rr"|"balance-xor"|"802.3ad"|"balance-tlb"|"balance-alb")
>>>             return 0
>>>             ;;
>>>         *)
>>>             return 1
>>>             ;;
>>>     esac
>>> }
>>>
>>> bond_get_speed() {
>>>     local iface=$1
>>>     local aggregate_speed=0
>>>
>>>     if ! iface_is_bond ${iface} ; then
>>>         echo 0
>>>         return
>>>     fi
>>>
>>>     local slaves=$( bond_get_slaves ${iface} )
>>>     if bond_is_balancing ${iface} ; then
>>>         for slave in ${slaves} ; do
>>>             : $(( aggregate_speed += $( iface_get_speed ${slave} ) ))
>>>         done
>>>         # Bonding is unable to get speed*n
>>>         : $(( aggregate_speed = aggregate_speed*8/10 ))
>>>     else
>>>         : $(( aggregate_speed = $( iface_get_speed $( bond_get_active_iface ${iface} ) ) ))
>>>     fi
>>>     echo ${aggregate_speed} 
>>
>> Again, I'd prefer this to be handled without subshells.
> 
> Recursion.
> 
>>
>>> }
>>>
>>> update() {
>>>     local speed=$( iface_get_speed ${OCF_RESKEY_iface} )
>>>
>>>     : $(( score = speed * ${OCF_RESKEY_weight_base} / 10 ))
>>
>> hmmm...
>> score = speed * 10 / 10
>> but I complained about that earlier already ;-)
>>
>>>     attrd_updater -n ${OCF_RESKEY_name} -v ${score} -d ${OCF_RESKEY_dampen} ${attrd_options}
>>>     rc=$?
>>>     case ${rc} in
>>>         0)
>>>             ocf_is_true ${OCF_RESKEY_debug} && ocf_log debug "Updated ${OCF_RESKEY_name} = ${score}"
>>>             ;;
>>>         *)
>>>             ocf_log warn "Could not update ${OCF_RESKEY_name} = ${score}: rc=${rc}"
>>>             ;;
>>>     esac
>>>     return ${rc}
>>
>> all that fun just to ignore the return value of this function anyways?
> 
> No. It was not ignored.
> But I fixed implicit to explicit return in start().
> 
>>
>>
>>> }
>>>
>>> if [ `uname` != "Linux" ] ; then
>>>     ocf_log err "This RA works only on linux."
>>>     exit $OCF_ERR_INSTALLED
>>> fi
>>>
>>> if ! ocf_is_true ${OCF_RESKEY_CRM_meta_globally_unique} ; then
>>>     : ${ha_pseudo_resource_name:="ifspeed-${OCF_RESKEY_name}"}
>>> else
>>>     : ${ha_pseudo_resource_name:="ifspeed-${OCF_RESOURCE_INSTANCE}"}
>>> fi
>>
>> At least one of those clone possibilities is probably nonsense,
>> but wellm, that's how this ha_pseudo_resource stuff works.
> 
> Dunno, just blindly copied this.
> 
>>
>>> attrd_options='-q'
>>> if ocf_is_true ${OCF_RESKEY_debug} ; then
>>>     attrd_options=''
>>> fi
>>>
>>> case $__OCF_ACTION in
>>>     meta-data)
>>>         meta_data
>>>         exit $OCF_SUCCESS
>>>         ;;
>>>     start)
>>>         start
>>>         ;;
>>>     stop)
>>>         stop
>>>         ;;
>>>     monitor)
>>>         monitor
>>>         ;;
>>>     reload)
>>>         start
>>>         ;;
>>>     validate-all)
>>>         validate
>>>         ;;
>>>     usage|help)
>>>         usage
>>>         exit $OCF_SUCCESS
>>>         ;;
>>>     *)
>>>         usage
>>>         exit $OCF_ERR_UNIMPLEMENTED
>>>         ;;
>>> esac
>>> exit $?
>>
>> The $? is not necessary there.
> 
> I prefer to have it explicitly.
> 
>>
>>
>>
>> Good job!
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker





More information about the Pacemaker mailing list