[ClusterLabs] custom resource agent FAILED (blocked)

Bishoy Mikhael b.s.mikhael at gmail.com
Thu Apr 12 17:38:21 EDT 2018


Hi All,

I'm trying to create a resource agent to promote a standby HDFS namenode to
active when the virtual IP failover to another node.

I've taken the skeleton from the Dummy OCF agent.

The modifications I've done to the Dummy agent are as follows:

HDFSHA_start() {
    HDFSHA_monitor
    if [ $? =  $OCF_SUCCESS ]; then
/opt/hadoop/sbin/hdfs-ha.sh start
return $OCF_SUCCESS
    fi
}

HDFSHA_stop() {
    HDFSHA_monitor
    if [ $? =  $OCF_SUCCESS ]; then
/opt/hadoop/sbin/hdfs-ha.sh stop
    fi
    return $OCF_SUCCESS
}

HDFSHA_monitor() {
# Monitor _MUST!_ differentiate correctly between running
# (SUCCESS), failed (ERROR) or _cleanly_ stopped (NOT RUNNING).
# That is THREE states, not just yes/no.
active_nn=$(hdfs haadmin -getAllServiceState | grep active | cut -d":" -f 1)
current_node=$(uname -n)
if [[ ${active_nn} == ${current_node} ]]; then
   return $OCF_SUCCESS
fi
}

HDFSHA_validate() {

    return $OCF_SUCCESS
}


I've created the resource as follows:

# pcs resource create hdfs-ha ocf:heartbeat:HDFSHA op monitor interval=30s


The resource fails right away as follows:


# pcs status

Cluster name: hdfs_cluster

Stack: corosync

Current DC: taulog (version 1.1.16-12.el7_4.8-94ff4df) - partition with
quorum

Last updated: Thu Apr 12 03:30:57 2018

Last change: Thu Apr 12 03:30:54 2018 by root via cibadmin on lingcod


3 nodes configured

2 resources configured


Online: [ dentex lingcod taulog ]


Full list of resources:


 VirtualIP (ocf::heartbeat:IPaddr2): Started taulog

 hdfs-ha (ocf::heartbeat:HDFSHA): FAILED (blocked)[ taulog dentex ]


Failed Actions:

* hdfs-ha_stop_0 on taulog 'insufficient privileges' (4): call=12,
status=complete, exitreason='none',

    last-rc-change='Thu Apr 12 03:17:37 2018', queued=0ms, exec=1ms

* hdfs-ha_stop_0 on dentex 'insufficient privileges' (4): call=10,
status=complete, exitreason='none',

    last-rc-change='Thu Apr 12 03:17:43 2018', queued=0ms, exec=1ms



Daemon Status:

  corosync: active/enabled

  pacemaker: active/enabled

  pcsd: active/enabled

I debug the resource as follows, and it returns 0

# pcs resource debug-monitor hdfs-ha

Operation monitor for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0

 >  stderr: DEBUG: hdfs-ha monitor : 0


# pcs resource debug-stop hdfs-ha

Operation stop for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0

 >  stderr: DEBUG: hdfs-ha stop : 0


# pcs resource debug-start hdfs-ha

Operation start for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0

 >  stderr: DEBUG: hdfs-ha start : 0



I don't understand what am I doing wrong!


Regards,

Bishoy Mikhael
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180412/6795728d/attachment.html>


More information about the Users mailing list