[Pacemaker] ifstatus OCF RA

Wed Apr 13 09:06:38 UTC 2011

On Sat, Mar 19, 2011 at 6:10 PM, Vladislav Bogdanov
<bubble at hoster-ok.com> wrote:
> Hi,
>
> just bumping this to be not forgotten.

Actually I'd missed that this was a Pacemaker specific one and
therefore something I needed to look at :-)

> RA runs fine for almost a month, several simulated network outages were
> passed with full success, so it could be included in some package. I
> think pacemaker, because this RA uses pacemaker-specific calls.
>
> Andrew?

Looks pretty reasonable.
Can you resend with hg headers (ie. "hg export") so that you can have
the appropriate credit?
It should be added to extra/resources

>
> (I can support this one)
>
> 23.02.2011 11:53, Vladislav Bogdanov wrote:
>> Hi Lars,
>>
>> thank you for your time and for so detailed review.
>>
>> Just to dot half of i's (where it is about coding style):
>> 1. I strongly prefer to cleanly separate data access from main logic by API.
>> 2. I prefer to have non-void functions to return result explicitly
>> ("main" too). This will prevent "correct" return code from being lost on
>> subsequent code modifications.
>> 3. I insist on all variables inside functions to be local. This helps to
>> avoid some hardly-debugable logic errors.
>> 4. I prefer not to touch global variables inside of lower-layer
>> functions. To be honest, I prefer to pass everything needed by function
>> in its parameters. This sometimes is hard to do, so sometimes I prefer
>> to use global variables relatively high at stack rather then pass them
>> down through several more functions.
>> 5) I prefer to use sub-shells for recursive function calls.
>>
>> Please look at one more attached revision, and also in comments inline.
>>
>>
>> ...
>>>> <longdesc lang="en">
>>>> Every time the monitor action is run, this resource agent records (in the CIB) speed of active network interfaces from a list.
>>>
>>>
>>> Where "active" is ...
>>>  what, exactly?
>>>  Add some hint on the intended use case and purpose,
>>>  maybe add an example or two. This is the long description, after all.
>>
>> Done.
>>
>>>
>>>  Also note that this is
>>>   - linux specific
>>>   - requires kernel >= 2.6.33,
>>>   afaict no /sys/class/net/*/speed before that
>>
>> Ahm, was not aware about that. I need to look again at this because I
>> need this to run on RHEL6 too. Anybody knows, does it has this sysfs?
>> Let's delay with this for a bit.
>>
>>>
>>>> </longdesc>
>>>> <shortdesc lang="en">Network interface status</shortdesc>
>>>
>>> again, "interface status" may be a bit misleading.
>>
>> Fixed.
>>
>>>
>>>> Weight of each 10Mbps in interface speed (1Gbps = 100 * 10Mbps).
>>>
>>> The way you implemented it, you get the speed, then do some math with
>>> it, just to arrive at the same value you just read...
>>> Why not just give one point per Mbps by default?
>>
>> Rewrote.
>>>
>>>> With default value 1Gbps interface will be counted as 1000.
>>>> </longdesc>
>>>> <shortdesc lang="en">Weight of 10Mbps interface</shortdesc>
>>>
>>> can someone come up with better wording here?
>>
>> I tried.
>>
>>>
>>>> <parameter name="dampen" unique="0">
>>>> <longdesc lang="en">
>>>> The time to wait (dampening) further changes occur
>>>
>>> This english apparently needs fixing
>>> already wherever it was copied from ;-)
>>
>> Hopefully done (although english is not my native lang).
>>
>>>
>>>
>>>> </longdesc>
>>>> <shortdesc lang="en">Dampening interval</shortdesc>
>>>> <content type="integer" default="${OCF_RESKEY_dampen_default}"/>
>>>> </parameter>
>>>>
>>>> <parameter name="debug" unique="0">
>>>> <longdesc lang="en">
>>>> Enables to use default attrd_updater verbose logging on every call.
>>>
>>>  "If enabled, ..."
>>
>> Re-worded.
>>>
>>>> </longdesc>
>>>> <shortdesc lang="en">Verbose logging</shortdesc>
>>>> <content type="string" default="false"/>
>>>> </parameter>
>>>>
>>>> </parameters>
>>>>
>>>> <actions>
>>>> <action name="start"   timeout="30" />
>>>> <action name="stop"    timeout="30" />
>>>> <action name="reload"  timeout="30" />
>>>> <action name="monitor" depth="0"  timeout="30" interval="10"/>
>>>> <action name="meta-data"  timeout="5" />
>>>> <action name="validate-all"  timeout="30" />
>>>> </actions>
>>>> </resource-agent>
>>>> END
>>>> }
>>>>
>>>> usage() {
>>>>     cat <<END
>>>> usage: $0 {start|stop|monitor|migrate_to|migrate_from|validate-all|meta-data}
>>>>
>>>> Expects to have a fully populated OCF RA-compliant environment set.
>>>> END
>>>
>>> BTW:
>>> cat does an extra fork/exec, and "here documents" go via tmpfiles,
>>> which can break things, or slow things down quite a bit.
>>> most of the time, I prefer the typically built in echo "
>>> ...
>>> .... "
>>>
>>> But that has nothing to do with this RA, even less with its usage().
>>> Actually it comes down to a matter of taste entirely,
>>> so just ignore this statement ;-)
>>
>> Let's leave it as-is.
>>
>>>
>>>> }
>>>>
>>>> start() {
>>>>     monitor
>>>>     if [ $? -eq $OCF_SUCCESS ]; then
>>>>         return $OCF_SUCCESS
>>>>     fi
>>>
>>>      monitor && return
>>
>> Ahm, I'd leave it as-is. It is a bit more readable and probably does not
>> cost any extra CPU cycles.
>>
>>>
>>>>     ha_pseudo_resource ${ha_pseudo_resource_name} start
>>>>     update
>>>
>>>      the update has been done in monitor already?
>>
>> Nope. ha_pseudo_resource ${ha_pseudo_resource_name} monitor returns 7
>> there so update is not called.
>>
>>>
>>>> }
>>>>
>>>> stop() {
>>>>     ha_pseudo_resource ${ha_pseudo_resource_name} stop
>>>>     attrd_updater -D -n ${OCF_RESKEY_name} -d ${OCF_RESKEY_dampen} ${attrd_options}
>>>>     return $OCF_SUCCESS
>>>
>>> do we want to consider the exit code of attrd_updater, or not?
>>> if not, what do we do about non-zero exit code?
>>> does it have a useful reliable exit code, at all?
>>> you don't ignore it in update.
>>> Though you ignore the return value of update ...
>>>
>>> Yeah, right, ping does it the same way probably (I did not check; does
>>> it?) -- good to have a reason to review that, too ;-)
>>
>> We did our best, and failure of attr_updater means much more serious
>> problems than this RA han handle.
>>
>>>
>>>> }
>>>>
>>>> monitor() {
>>>>     local ret
>>>>     ha_pseudo_resource ${ha_pseudo_resource_name} monitor
>>>>     ret=$?
>>>>     if [ ${ret} -eq $OCF_SUCCESS ] ; then
>>>>         update
>>>>     fi
>>>>     return ${ret}
>>>
>>>      ha_pseudo_resource $ha_pseudo_resource_name monitor || return
>>>      update
>>>
>>>      and, if that's really necessary, ignore the return code of update:
>>>      return $OCF_SUCCESS
>>
>> No, we are interested in NOT_RUNNING too.
>> Let's leave as-is.
>>
>>>
>>>> }
>>>>
>>>> validate() {
>>>
>>> check for linux,
>>> and kernel >= 2.6.33,
>>> and sysfs present,
>>> and if not, exit with not installed or something?
>>
>> Again, let's delay it for a bit.
>>
>>>
>>> Ah. just noticed you do that explicitly early on, anyways.
>>>>     # Check the check interval
>>>>     if ocf_is_decimal "${OCF_RESKEY_CRM_meta_interval}" && [ ${OCF_RESKEY_CRM_meta_interval} -gt 0 ]; then
>>>>         :
>>>>     else
>>>>         ocf_log err "Invalid check interval ${OCF_RESKEY_interval}. It should be positive integer!"
>>>>         exit $OCF_ERR_CONFIGURED
>>>>     fi
>>>>
>>>>     # Check the intarfaces list
>>>
>>> s/tar/ter
>>
>> Thanks.
>>
>>>
>>>>     if [ "x" = "x${OCF_RESKEY_iface}" ]; then
>>>
>>> You declared yourself as bash.
>>> no need for this "x" = "x$VAR" nonsense in bash...
>>> in fact, even in sh, I find test -z preferable.
>>
>> Blindly copied it. Done.
>>
>>>
>>>>         ocf_log err "Empty iface parameter.  Please specify some network interface to check"
>>>
>>> also check for strange characters in iface,
>>> it will be used as regex/sed/awk expression later.
>>
>> Let it be TODO.
>>
>>>
>>>>         exit $OCF_ERR_CONFIGURED
>>>>     fi
>>>>
>>>>     return $OCF_SUCCESS
>>>> }
>>>>
>>>> iface_get_speed() {
>>>>     local iface=$1
>>>>     local operstate
>>>>     local carrier
>>>>     local speed
>>>>
>>>>     if [ ! -e "/sys/class/net/${iface}" ] ; then
>>>>         echo 0
>>>>     elif iface_is_bridge ${iface} ; then
>>>>         bridge_get_speed ${iface}
>>>>     elif iface_is_bond ${iface} ; then
>>>>         bond_get_speed ${iface}
>>>>     elif iface_is_vlan ${iface} ; then
>>>>         iface_get_speed $( vlan_get_phy ${iface} )
>>>>     else
>>>>         read operstate < "/sys/class/net/${iface}/operstate"
>>>>         read carrier < "/sys/class/net/${iface}/carrier"
>>>>         if [ "${operstate}" != "up" ] || [ "${carrier}" != "1" ] ; then
>>>>             speed="0"
>>>>         else
>>>>             read speed < "/sys/class/net/${iface}/speed"
>>>>         fi
>>>>         echo ${speed}
>>>
>>> I'd prefer to have speed non-local,
>>
>> I'd not. Let's be safe with recursions.
>>
>>> and be a return value.
>>
>> Does bash already support 10000 to be function return code? ;)
>>
>>>
>>> then you can do away with the $(iface_get_speed) later as well.
>>>
>>>>     fi
>>>> }
>>>>
>>>> iface_is_vlan() {
>>>>     local iface=$1
>>>>     [ -e "/proc/net/vlan/${iface}" ] && return 0 || return 1
>>>> }
>>>>
>>>> iface_is_bridge() {
>>>>     local iface=$1
>>>>     [ -e "/sys/class/net/${iface}/bridge" ] && return 0 || return 1
>>>> }
>>>>
>>>> iface_is_bond() {
>>>>     local iface=$1
>>>>     [ -e "/sys/class/net/${iface}/bonding" ] && return 0 || return 1
>>>> }
>>>
>>> All these functions:
>>> iface_is_vlan()              { [ -e "/proc/net/vlan/$1" ]; }
>>> iface_is_bridge()    { [ -e "/sys/class/net/$1/bridge" ]; }
>>> iface_is_bond()              { [ -e "/sys/class/net/$1/bonding" ]; }
>>
>> Please see above about implicit return codes. This should not take any
>> CPU cycles.
>>
>>>
>>>> vlan_get_phy() {
>>>>     local iface=$1
>>>>     grep "^${iface} " "/proc/net/vlan/config" | sed -r 's/.*\| +(.*)/\1/'
>>>
>>> probably safe to assume that on a box running linux kernel >= 2.6.33
>>> the sed supports -r, but you never know ;-)
>>> also, if you use sed aleady, have it do the pattern matching as well?
>>>
>>> vlan_get_phy() { sed -ne "s/^$1 .*| *//p" < /proc/net/vlan/config; }
>>
>> Ahm, good point, thanks.
>>
>>>
>>>> }
>>>>
>>>> bridge_is_stp_enabled() {
>>>>     local iface=$1
>>>>     local stp
>>>>     read stp < "/sys/class/net/${iface}/bridge/stp_state"
>>>>     [ "${stp}" = "1" ] && return 0 || return 1
>>>
>>> how about
>>> bridge_is_stp_enabled() { grep 1 < /sys/class/net/$1/bridge/stp_state; }
>>
>> This will be slower.
>>
>>>
>>> maybe add 2>/dev/null ?
>>>
>>> or, as we are bash:
>>> bridge_is_stp_enabled()      { [[ $(</sys/class/net/$1/bridge/stp_state) = 1 ]]; }
>>
>> I'd say this is unreadable for me. I'd avoid non-trivial constructs with
>> no performance gain. And again, implicit vs explicit.
>>
>>>
>>> but some may prefer the more verbose thing.
>>> if so, at least do away with the && return || return.
>>> shell function return value is the "exit status" of the last command
>>> executed, so [[ $stp = 1 ]] is sufficient.
>>
>> But more error-prone on subsequent code modifications.
>>
>>>
>>>> }
>>>>
>>>> bridge_get_root_ports() {
>>>>     local bridge=$1
>>>>     local root_id
>>>>     local root_ports=""
>>>>     local bridge_id
>>>>
>>>>     read root_id < "/sys/class/net/${bridge}/bridge/root_id"
>>>>
>>>>     for port in /sys/class/net/${bridge}/brif/* ; do
>>>>         read bridge_id < "${port}/designated_bridge"
>>>>         if [ "${bridge_id}" = "${root_id}" ] ; then
>>>>             root_ports="${root_ports} ${port##*/}"
>>>>         fi
>>>>     done
>>>
>>> Yes, I think here the more verbose style is appropriate ;-)
>>>
>>>>     echo "${root_ports# }"
>>
>> I'd leave this as-is. And, I remade this chunk of code, so this
>> construct is (probably) really needed now. Anyways, I hate implicit
>> assumptions.
>>
>>>
>>> Though this is not necessary: echo $root_ports would do this just fine.
>>> and, again, I'd prefer a non-local variable to pass the value
>>> over some $(subshell) thingy.
>>
>> Did it another way.
>>
>>>
>>>> }
>>>>
>>>> # From /inlude/linux/if_bridge.h:
>>>> #define BR_STATE_DISABLED 0
>>>> #define BR_STATE_LISTENING 1
>>>> #define BR_STATE_LEARNING 2
>>>> #define BR_STATE_FORWARDING 3
>>>> #define BR_STATE_BLOCKING 4
>>>>
>>>> bridge_get_active_ports() {
>>>>     local bridge=$1
>>>>     shift 1
>>>>     local ports="$*"
>>>>     local active_ports=""
>>>>     local port_state
>>>>     local stp_state=bridge_is_stp_enabled ${bridge}
>>>
>>> some double quote missing there?
>>
>> I should be temporarily insane while writing that ;)
>> Fixed.
>>
>>>
>>>>     local warn=0
>>>>
>>>>     if [ -z "${ports}" ] || [ "${ports}" = "detect" ] ; then
>>>>         ports=$( bridge_get_root_ports ${bridge} )
>>>
>>>      if you did non-local root_ports, this would become
>>>      bridge_get_root_ports $bridge
>>>      # root ports now in $root_ports,
>>>      # so you can assign port=$root_ports,
>>>      # or just use $root_ports as is.
>>
>> Rewrote this.
>>
>>>
>>>>     fi
>>>>
>>>>     for port in $ports ; do
>>>
>>> there. no need to trim white space anyways, as I thought.
>>
>> I'd leave it.
>>
>>>
>>>>         if [ ! -e "/sys/class/net/${bridge}/brif/${port}" ] ; then
>>>>             ocf_log warning "Port ${port} doesn't belong to bridge ${bridge}"
>>>>             continue
>>>>         fi
>>>>         read port_state < "/sys/class/net/${bridge}/brif/${port}/state"
>>>>         if [ "${port_state}" = "3" ] ; then
>>>>             if [ -n "${active_ports}" ] && ${stp_state} ; then
>>>
>>> no need to re-check stp state for each iteration.
>>> do that once, and reference the result here.
>>
>> This nonsense is fixed.
>>
>>>
>>>>                 warn=1
>>>>             fi
>>>>             active_ports="${active_ports} ${port}"
>>>>         fi
>>>>     done
>>>>     if [ ${warn} -eq 1 ] ; then
>>>>         ocf_log warning "More then one upstream port in bridge '${bridge}' is in forwarding state while STP is enabled: ${active_ports}"
>>>>     fi
>>>>     echo "${active_ports# }"
>>>
>>> again, no need to trim white space, and I prefer non-local documented
>>> variables over $(subshell echo) style assignments.
>>>
>>>> }
>>>>
>>>> bridge_get_speed() {
>>>>     local $iface=$1
>>>>
>>>>     if ! iface_is_bridge ${iface} ; then
>>>>         echo 0
>>>>         return
>>>>     fi
>>>>
>>>>     local ports=$( bridge_get_active_ports ${iface} ${OCF_RESKEY_bridge_ports} )
>>
>> Still leaving this as subshell, will rethink later.
>>
>>>>     for port in ${ports} ; do
>>>>         : $(( aggregate_speed += $( iface_get_speed ${port} ) ))
>>>>     done
>>>>     echo ${aggregate_speed}
>>>
>>> This can be rewritten without subshells.
>>
>> Error-prone because of recursions.
>> I'd leave it as-is.
>>
>>>
>>> It could even be done without bashisms,
>>> (there is only one bashism in there, afaics)
>>> but who cares, we are bash anyways ;-)
>>>
>>>> }
>>>>
>>>> bond_get_slaves() {
>>>>     local iface=$1
>>>>     local slaves
>>>>     read slaves < "/sys/class/net/${iface}/bonding/slaves"
>>>>     echo ${slaves}
>>>
>>> if that is what you wanted, why not just say cat?
>>
>> It is slower.
>>
>>> or, drop this function completely, as it only adds noise,
>>> and later do
>>>      slaves=$(< /sys/class/net/${iface}/bonding/slaves)
>>> avoiding any subshells?
>>
>> Data access separation.
>>
>>>
>>>> }
>>>>
>>>> bond_get_active_iface() {
>>>>     local iface=$1
>>>>     local active
>>>>     read active < "/sys/class/net/${iface}/bonding/active_slave"
>>>>     echo ${active}
>>>> }
>>>
>>> similar.
>>>
>>>>
>>>> bond_is_balancing() {
>>>>     local iface=$1
>>>>     read mode mode_index < "/sys/class/net/${iface}/bonding/mode"
>>>
>>>
>>> You declare all variables in all your tiny to-be-used-as-subshell
>>> functions as local (needlessly, as they are visible in that subshell
>>> only anyways... but keep them local, we want to get rid of the subshell
>>> echo style assignments), but then here omit the local? how come?
>>
>> Having function variables local is simply a good habit I think. Like
>> having curly braces around one-statement blocks in C.
>> I rewrote some functions so they can be used without subshell at a cost
>> of additional variable.
>>
>>>
>>> not that it matters, really, they are used only here, anyways.
>>>
>>>>     case ${mode} in
>>>>         "balance-rr"|"balance-xor"|"802.3ad"|"balance-tlb"|"balance-alb")
>>>>             return 0
>>>>             ;;
>>>>         *)
>>>>             return 1
>>>>             ;;
>>>>     esac
>>>> }
>>>>
>>>> bond_get_speed() {
>>>>     local iface=$1
>>>>     local aggregate_speed=0
>>>>
>>>>     if ! iface_is_bond ${iface} ; then
>>>>         echo 0
>>>>         return
>>>>     fi
>>>>
>>>>     local slaves=$( bond_get_slaves ${iface} )
>>>>     if bond_is_balancing ${iface} ; then
>>>>         for slave in ${slaves} ; do
>>>>             : $(( aggregate_speed += $( iface_get_speed ${slave} ) ))
>>>>         done
>>>>         # Bonding is unable to get speed*n
>>>>         : $(( aggregate_speed = aggregate_speed*8/10 ))
>>>>     else
>>>>         : $(( aggregate_speed = $( iface_get_speed $( bond_get_active_iface ${iface} ) ) ))
>>>>     fi
>>>>     echo ${aggregate_speed}
>>>
>>> Again, I'd prefer this to be handled without subshells.
>>
>> Recursion.
>>
>>>
>>>> }
>>>>
>>>> update() {
>>>>     local speed=$( iface_get_speed ${OCF_RESKEY_iface} )
>>>>
>>>>     : $(( score = speed * ${OCF_RESKEY_weight_base} / 10 ))
>>>
>>> hmmm...
>>> score = speed * 10 / 10
>>> but I complained about that earlier already ;-)
>>>
>>>>     attrd_updater -n ${OCF_RESKEY_name} -v ${score} -d ${OCF_RESKEY_dampen} ${attrd_options}
>>>>     rc=$?
>>>>     case ${rc} in
>>>>         0)
>>>>             ocf_is_true ${OCF_RESKEY_debug} && ocf_log debug "Updated ${OCF_RESKEY_name} = ${score}"
>>>>             ;;
>>>>         *)
>>>>             ocf_log warn "Could not update ${OCF_RESKEY_name} = ${score}: rc=${rc}"
>>>>             ;;
>>>>     esac
>>>>     return ${rc}
>>>
>>> all that fun just to ignore the return value of this function anyways?
>>
>> No. It was not ignored.
>> But I fixed implicit to explicit return in start().
>>
>>>
>>>
>>>> }
>>>>
>>>> if [ `uname` != "Linux" ] ; then
>>>>     ocf_log err "This RA works only on linux."
>>>>     exit $OCF_ERR_INSTALLED
>>>> fi
>>>>
>>>> if ! ocf_is_true ${OCF_RESKEY_CRM_meta_globally_unique} ; then
>>>>     : ${ha_pseudo_resource_name:="ifspeed-${OCF_RESKEY_name}"}
>>>> else
>>>>     : ${ha_pseudo_resource_name:="ifspeed-${OCF_RESOURCE_INSTANCE}"}
>>>> fi
>>>
>>> At least one of those clone possibilities is probably nonsense,
>>> but wellm, that's how this ha_pseudo_resource stuff works.
>>
>> Dunno, just blindly copied this.
>>
>>>
>>>> attrd_options='-q'
>>>> if ocf_is_true ${OCF_RESKEY_debug} ; then
>>>>     attrd_options=''
>>>> fi
>>>>
>>>> case $__OCF_ACTION in
>>>>     meta-data)
>>>>         meta_data
>>>>         exit $OCF_SUCCESS
>>>>         ;;
>>>>     start)
>>>>         start
>>>>         ;;
>>>>     stop)
>>>>         stop
>>>>         ;;
>>>>     monitor)
>>>>         monitor
>>>>         ;;
>>>>     reload)
>>>>         start
>>>>         ;;
>>>>     validate-all)
>>>>         validate
>>>>         ;;
>>>>     usage|help)
>>>>         usage
>>>>         exit $OCF_SUCCESS
>>>>         ;;
>>>>     *)
>>>>         usage
>>>>         exit $OCF_ERR_UNIMPLEMENTED
>>>>         ;;
>>>> esac
>>>> exit $?
>>>
>>> The $? is not necessary there.
>>
>> I prefer to have it explicitly.
>>
>>>
>>>
>>>
>>> Good job!
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>