[ClusterLabs] custom resource agent FAILED (blocked)
emmanuel segura
emi2fast at gmail.com
Fri Apr 13 01:29:40 EDT 2018
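The start function needs to start the resource when monitor does not return
success. Your version does the opposite: it runs the start script only when
monitor already reports success.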
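
A rough sketch of what I mean, not a tested agent. Note also that the quoted
monitor has no explicit return when the node is not the active namenode, so
the function falls through with status 0 and always looks like success. In
the sketch below, HDFSHA_active_node and the HDFSHA_SCRIPT variable are names
I made up for illustration; the OCF_* defaults are only there so the snippet
runs on its own (a real agent gets them from ocf-shellfuncs). It only tells
"running" from "not running"; detecting a failed state is left out.

```shell
#!/bin/sh
# Sketch only. In a real agent these codes come from ocf-shellfuncs;
# the defaults below just keep the example self-contained.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

# Path taken from the original post; the variable itself is an assumption
# so the script can be swapped out when trying the functions by hand.
: "${HDFSHA_SCRIPT:=/opt/hadoop/sbin/hdfs-ha.sh}"

# Hypothetical helper: print the host name of the active namenode, if any.
HDFSHA_active_node() {
    hdfs haadmin -getAllServiceState 2>/dev/null | grep active | cut -d":" -f 1
}

HDFSHA_monitor() {
    active_nn=$(HDFSHA_active_node)
    if [ "$active_nn" = "$(uname -n)" ]; then
        return "$OCF_SUCCESS"
    fi
    # Explicit return: without it the function ends with status 0.
    return "$OCF_NOT_RUNNING"
}

HDFSHA_start() {
    # Already active on this node: nothing to do.
    if HDFSHA_monitor; then
        return "$OCF_SUCCESS"
    fi
    # Not active here, so this is the case where we actually start.
    "$HDFSHA_SCRIPT" start || return "$OCF_ERR_GENERIC"
    return "$OCF_SUCCESS"
}

HDFSHA_stop() {
    # Only call the stop script if the resource is active on this node.
    if HDFSHA_monitor; then
        "$HDFSHA_SCRIPT" stop || return "$OCF_ERR_GENERIC"
    fi
    # Stopping something that is not running counts as success.
    return "$OCF_SUCCESS"
}
```

With this shape, a monitor on a node that is not the active namenode returns
7 (not running) instead of 0, and start only calls the script in that case.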
2018-04-12 23:38 GMT+02:00 Bishoy Mikhael <b.s.mikhael at gmail.com>:
> Hi All,
>
> I'm trying to create a resource agent to promote a standby HDFS namenode
> to active when the virtual IP fails over to another node.
>
> I've taken the skeleton from the Dummy OCF agent.
>
> The modifications I've done to the Dummy agent are as follows:
>
> HDFSHA_start() {
> HDFSHA_monitor
> if [ $? = $OCF_SUCCESS ]; then
> /opt/hadoop/sbin/hdfs-ha.sh start
> return $OCF_SUCCESS
> fi
> }
>
> HDFSHA_stop() {
> HDFSHA_monitor
> if [ $? = $OCF_SUCCESS ]; then
> /opt/hadoop/sbin/hdfs-ha.sh stop
> fi
> return $OCF_SUCCESS
> }
>
> HDFSHA_monitor() {
> # Monitor _MUST!_ differentiate correctly between running
> # (SUCCESS), failed (ERROR) or _cleanly_ stopped (NOT RUNNING).
> # That is THREE states, not just yes/no.
> active_nn=$(hdfs haadmin -getAllServiceState | grep active | cut -d":" -f 1)
> current_node=$(uname -n)
> if [[ ${active_nn} == ${current_node} ]]; then
> return $OCF_SUCCESS
> fi
> }
>
> HDFSHA_validate() {
>
> return $OCF_SUCCESS
> }
>
>
> I've created the resource as follows:
>
> # pcs resource create hdfs-ha ocf:heartbeat:HDFSHA op monitor interval=30s
>
>
> The resource fails right away as follows:
>
>
> # pcs status
> Cluster name: hdfs_cluster
> Stack: corosync
> Current DC: taulog (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
> Last updated: Thu Apr 12 03:30:57 2018
> Last change: Thu Apr 12 03:30:54 2018 by root via cibadmin on lingcod
>
> 3 nodes configured
> 2 resources configured
>
> Online: [ dentex lingcod taulog ]
>
> Full list of resources:
>
> VirtualIP (ocf::heartbeat:IPaddr2): Started taulog
> hdfs-ha (ocf::heartbeat:HDFSHA): FAILED (blocked)[ taulog dentex ]
>
> Failed Actions:
> * hdfs-ha_stop_0 on taulog 'insufficient privileges' (4): call=12, status=complete, exitreason='none',
>     last-rc-change='Thu Apr 12 03:17:37 2018', queued=0ms, exec=1ms
> * hdfs-ha_stop_0 on dentex 'insufficient privileges' (4): call=10, status=complete, exitreason='none',
>     last-rc-change='Thu Apr 12 03:17:43 2018', queued=0ms, exec=1ms
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled
>
> I debugged the resource as follows, and each operation returns 0:
>
> # pcs resource debug-monitor hdfs-ha
> Operation monitor for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0
> > stderr: DEBUG: hdfs-ha monitor : 0
>
> # pcs resource debug-stop hdfs-ha
> Operation stop for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0
> > stderr: DEBUG: hdfs-ha stop : 0
>
> # pcs resource debug-start hdfs-ha
> Operation start for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0
> > stderr: DEBUG: hdfs-ha start : 0
>
>
>
> I don't understand what I am doing wrong!
>
>
> Regards,
>
> Bishoy Mikhael
>
--
.~.
/V\
// \\
/( )\
^`~'^