[ClusterLabs] custom resource agent FAILED (blocked)

Fri Apr 13 01:29:40 EDT 2018

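In the start function, you need to start the resource when monitor does
not return success. As written, HDFSHA_start only runs "hdfs-ha.sh start"
when HDFSHA_monitor already reports success.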
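
A minimal sketch of that change follows. It has not been run against a
real cluster. The return codes are normally supplied by ocf-shellfuncs and
are set here to their standard OCF values only so the snippet stands alone.
HDFS_HA_SCRIPT is a hypothetical variable wrapping the script path from the
quoted agent. Note also that a shell function whose last command is an "if"
with no matching branch returns 0, so the quoted monitor appears to report
success even on a standby node; the sketch returns OCF_NOT_RUNNING
explicitly instead.

```shell
# Standard OCF return codes (normally provided by ocf-shellfuncs).
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

# Path from the original post; the variable name is an assumption.
: "${HDFS_HA_SCRIPT:=/opt/hadoop/sbin/hdfs-ha.sh}"

HDFSHA_monitor() {
    # Host name of the namenode currently reported as active, if any.
    active_nn=$(hdfs haadmin -getAllServiceState | grep active | cut -d":" -f1)
    current_node=$(uname -n)
    if [ "$active_nn" = "$current_node" ]; then
        return "$OCF_SUCCESS"
    fi
    # Explicit code when this node is not the active namenode.
    return "$OCF_NOT_RUNNING"
}

HDFSHA_start() {
    # Already active on this node: nothing to do.
    if HDFSHA_monitor; then
        return "$OCF_SUCCESS"
    fi
    # Monitor did not return success, so start the resource.
    "$HDFS_HA_SCRIPT" start || return "$OCF_ERR_GENERIC"
    return "$OCF_SUCCESS"
}

HDFSHA_stop() {
    # Only call the stop script if this node is currently active.
    if HDFSHA_monitor; then
        "$HDFS_HA_SCRIPT" stop || return "$OCF_ERR_GENERIC"
    fi
    return "$OCF_SUCCESS"
}
```

This monitor still only separates "active here" from "not active here". A
failed "hdfs haadmin" call is not reported as an error in this sketch, so
the three-state requirement from the Dummy agent comment is only partly
covered.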

2018-04-12 23:38 GMT+02:00 Bishoy Mikhael <b.s.mikhael at gmail.com>:

> Hi All,
>
> I'm trying to create a resource agent to promote a standby HDFS namenode
> to active when the virtual IP fails over to another node.
>
> I've taken the skeleton from the Dummy OCF agent.
>
> The modifications I've done to the Dummy agent are as follows:
>
> HDFSHA_start() {
>     HDFSHA_monitor
>     if [ $? =  $OCF_SUCCESS ]; then
>         /opt/hadoop/sbin/hdfs-ha.sh start
>         return $OCF_SUCCESS
>     fi
> }
>
> HDFSHA_stop() {
>     HDFSHA_monitor
>     if [ $? =  $OCF_SUCCESS ]; then
>         /opt/hadoop/sbin/hdfs-ha.sh stop
>     fi
>     return $OCF_SUCCESS
> }
>
> HDFSHA_monitor() {
>     # Monitor _MUST!_ differentiate correctly between running
>     # (SUCCESS), failed (ERROR) or _cleanly_ stopped (NOT RUNNING).
>     # That is THREE states, not just yes/no.
>     active_nn=$(hdfs haadmin -getAllServiceState | grep active | cut -d":" -f 1)
>     current_node=$(uname -n)
>     if [[ ${active_nn} == ${current_node} ]]; then
>         return $OCF_SUCCESS
>     fi
> }
>
> HDFSHA_validate() {
>
>     return $OCF_SUCCESS
> }
>
>
> I've created the resource as follows:
>
> # pcs resource create hdfs-ha ocf:heartbeat:HDFSHA op monitor interval=30s
>
>
> The resource fails right away as follows:
>
> # pcs status
> Cluster name: hdfs_cluster
> Stack: corosync
> Current DC: taulog (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
> Last updated: Thu Apr 12 03:30:57 2018
> Last change: Thu Apr 12 03:30:54 2018 by root via cibadmin on lingcod
>
> 3 nodes configured
> 2 resources configured
>
> Online: [ dentex lingcod taulog ]
>
> Full list of resources:
>
>  VirtualIP (ocf::heartbeat:IPaddr2): Started taulog
>  hdfs-ha (ocf::heartbeat:HDFSHA): FAILED (blocked)[ taulog dentex ]
>
> Failed Actions:
> * hdfs-ha_stop_0 on taulog 'insufficient privileges' (4): call=12, status=complete, exitreason='none',
>     last-rc-change='Thu Apr 12 03:17:37 2018', queued=0ms, exec=1ms
> * hdfs-ha_stop_0 on dentex 'insufficient privileges' (4): call=10, status=complete, exitreason='none',
>     last-rc-change='Thu Apr 12 03:17:43 2018', queued=0ms, exec=1ms
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> When I debug the resource as follows, each operation returns 0:
>
> # pcs resource debug-monitor hdfs-ha
> Operation monitor for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0
>  >  stderr: DEBUG: hdfs-ha monitor : 0
>
> # pcs resource debug-stop hdfs-ha
> Operation stop for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0
>  >  stderr: DEBUG: hdfs-ha stop : 0
>
> # pcs resource debug-start hdfs-ha
> Operation start for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0
>  >  stderr: DEBUG: hdfs-ha start : 0
>
> I don't understand what I am doing wrong!
>
>
> Regards,
>
> Bishoy Mikhael
>


-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^