[Pacemaker] Issue with controlling resource Oracle Database Express

Fri May 7 05:54:27 EDT 2010

Hi,

I finally got Pacemaker up and running on CentOS 5.4. I am currently using
Heartbeat, but I want to switch to OpenAIS (Corosync). There were some
problems related to Python and XML: strace shows that crm still tries to
open some files which don't exist, and I also had to create some symbolic
links because of bad paths. I will try to summarize these problems in
another thread. For now, both nodes of my Active/Passive cluster, with
shared storage running OCFS2 as the filesystem, are online using the
following configuration.

[root@tidevfnkv1 python2.4]# cat /var/lib/heartbeat/crm/cib.xml
<cib validate-with="pacemaker-1.0" crm_feature_set="3.0.1" have-quorum="1" dc-uuid="e7cf0526-5304-45f1-b9ee-0ee9fe69c834" admin_epoch="0" epoch="29" num_updates="0" cib-last-written="Fri May  7 05:05:28 2010">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="Heartbeat"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node type="normal" uname="tidevfnkv2" id="e7cf0526-5304-45f1-b9ee-0ee9fe69c834">
        <instance_attributes id="nodes-e7cf0526-5304-45f1-b9ee-0ee9fe69c834">
          <nvpair name="standby" id="nodes-e7cf0526-5304-45f1-b9ee-0ee9fe69c834-standby" value="on"/>
        </instance_attributes>
      </node>
      <node type="normal" uname="tidevfnkv1" id="db65bdf6-ecd2-4bcb-9ef5-451681ec2906">
        <instance_attributes id="nodes-db65bdf6-ecd2-4bcb-9ef5-451681ec2906">
          <nvpair name="standby" id="nodes-db65bdf6-ecd2-4bcb-9ef5-451681ec2906-standby" value="off"/>
        </instance_attributes>
      </node>
    </nodes>
    <resources>
      <group id="ip_fnkv_cluster">
        <primitive class="ocf" id="failover-ip" provider="heartbeat" type="IPaddr">
          <instance_attributes id="failover-ip-instance_attributes">
            <nvpair id="failover-ip-instance_attributes-ip" name="ip" value="172.28.140.113"/>
          </instance_attributes>
          <operations>
            <op id="failover-ip-monitor-10s" interval="10s" name="monitor"/>
          </operations>
        </primitive>
        <primitive class="lsb" id="failover-apache" type="httpd">
          <operations>
            <op id="failover-apache-monitor-15s" interval="15s" name="monitor"/>
          </operations>
        </primitive>
      </group>
      <primitive class="ocf" id="pingd" provider="pacemaker" type="pingd">
        <instance_attributes id="pingd-instance_attributes">
          <nvpair id="pingd-instance_attributes-host_list" name="host_list" value="172.28.140.10"/>
          <nvpair id="pingd-instance_attributes-multiplier" name="multiplier" value="100"/>
        </instance_attributes>
        <operations>
          <op id="pingd-monitor-15s" interval="15s" name="monitor" timeout="5s"/>
        </operations>
      </primitive>
      <primitive class="ocf" id="failover-oracle" provider="heartbeat" type="oracle">
        <instance_attributes id="failover-oracle-instance_attributes">
          <nvpair id="failover-oracle-instance_attributes-sid" name="sid" value="XE"/>
          <nvpair id="failover-oracle-instance_attributes-home" name="home" value="/usr/lib/oracle/xe/app/oracle/product/10.2.0/server"/>
          <nvpair id="failover-oracle-instance_attributes-user" name="user" value="oracle"/>
        </instance_attributes>
        <operations>
          <op id="failover-oracle-monitor-5s" interval="5s" name="monitor" on-fail="restart" timeout="30s"/>
        </operations>
      </primitive>
    </resources>
    <constraints>
      <rsc_location id="ip_fnkv_cluster_on_connected_node" rsc="ip_fnkv_cluster">
        <rule boolean-op="or" id="ip_fnkv_cluster_on_connected_node-rule" score="-INFINITY">
          <expression attribute="pingd" id="ip_fnkv_cluster_on_connected_node-expression" operation="not_defined"/>
          <expression attribute="pingd" id="ip_fnkv_cluster_on_connected_node-expression-0" operation="lte" value="0"/>
        </rule>
      </rsc_location>
    </constraints>
    <op_defaults/>
    <rsc_defaults/>
  </configuration>

OK, the output of crm_mon is as follows:
============
Last updated: Fri May  7 05:27:20 2010
Stack: Heartbeat
Current DC: tidevfnkv2 (e7cf0526-5304-45f1-b9ee-0ee9fe69c834) - partition with quorum
Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
2 Nodes configured, unknown expected votes
3 Resources configured.
============

Online: [ tidevfnkv2 tidevfnkv1 ]

 Resource Group: ip_fnkv_cluster
     failover-ip        (ocf::heartbeat:IPaddr):        Started tidevfnkv2
     failover-apache    (lsb:httpd):    Started tidevfnkv2
pingd   (ocf::pacemaker:pingd): Started tidevfnkv1

Failed actions:
    failover-oracle_start_0 (node=tidevfnkv2, call=18, rc=1, status=complete): unknown error
    failover-oracle_monitor_5000 (node=tidevfnkv1, call=42, rc=7, status=complete): not running
    failover-oracle_start_0 (node=tidevfnkv1, call=44, rc=1, status=complete): unknown error
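The rc values in the failed actions above are standard OCF resource agent exit codes: rc=1 is OCF_ERR_GENERIC, which crm_mon prints as "unknown error", and rc=7 is OCF_NOT_RUNNING. A small lookup like the following can help when reading logs or testing an agent by hand; it is a sketch, and the function name ocf_rc_name is hypothetical, not part of any package.

```shell
#!/bin/bash
# ocf_rc_name: hypothetical helper that maps an OCF resource agent exit code
# to its symbolic name as defined by the OCF resource agent API.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "unknown ($1)" ;;
    esac
}
```

For example, "ocf_rc_name 1" prints OCF_ERR_GENERIC and "ocf_rc_name 7" prints OCF_NOT_RUNNING.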

The problem is that the Oracle database does not start. I should say that
we use the free Express Edition (XE); it was not my decision to select
this database, but that is the reality. It seems that the oracle resource
agent is not designed for the Express Edition, only for the full version
of the database. There is also a second resource agent for the Oracle
listener.
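One way to check where the stock oracle agent fails with XE is to run it by hand the way the cluster would: the action is the first argument, OCF_ROOT points at the OCF tree, and each CIB parameter is exported as OCF_RESKEY_<name>. The sketch below does that under "bash -x", so every line the agent executes is written to a trace file. The helper name run_ra, the default trace path /tmp/ra-trace.log and the OCF_ROOT default of /usr/lib/ocf are assumptions; the parameter values are those of the failover-oracle primitive.

```shell
#!/bin/bash
# run_ra: hypothetical helper that runs an OCF resource agent by hand with the
# failover-oracle parameters, writing a shell trace (bash -x) to a log file.
# The body is a subshell, so the exported variables do not leak to the caller.
run_ra() (
    agent="$1"
    action="$2"
    trace="${3:-/tmp/ra-trace.log}"
    export OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}"
    export OCF_RESOURCE_INSTANCE=failover-oracle
    # CIB parameters are passed to agents as OCF_RESKEY_<name> variables.
    export OCF_RESKEY_sid=XE
    export OCF_RESKEY_home=/usr/lib/oracle/xe/app/oracle/product/10.2.0/server
    export OCF_RESKEY_user=oracle
    # stderr (including the xtrace output) goes to the trace file; the agent's
    # exit code becomes the exit code of run_ra.
    bash -x "$agent" "$action" 2>"$trace"
)
```

For example, "run_ra /usr/lib/ocf/resource.d/heartbeat/oracle start; echo $?" gives the agent's exit code, and /tmp/ra-trace.log shows which command inside the agent failed. This should only be done while the cluster is not managing the resource at the same time, so a manual start does not race with Pacemaker. If the installed packages include ocf-tester, it runs an agent through the standard actions in a similar way.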

However, Oracle Database XE is installed with built-in scripts to start
and stop the database: startdb.sh starts both the listener and the
instance, while stopdb.sh shuts down the instance. Here is the code of
both scripts:

startdb.sh:
#!/bin/bash
#
#       svaggu 09/28/05 -  Creation
#	svaggu 11/09/05 -  dba groupd check is added
#

xsetroot -cursor_name watch
case $PATH in
    "") PATH=/bin:/usr/bin:/sbin:/etc
        export PATH ;;
esac

SAVE_LLP=$LD_LIBRARY_PATH

ORACLE_HOME=/usr/lib/oracle/xe/app/oracle/product/10.2.0/server
ORACLE_SID=XE
LSNR=$ORACLE_HOME/bin/lsnrctl
SQLPLUS=$ORACLE_HOME/bin/sqlplus
export ORACLE_HOME
export ORACLE_SID
LOG="$ORACLE_HOME_LISTNER/listener.log"
user=`/usr/bin/whoami`
group=`/usr/bin/groups $user | grep dba`
if test -z "$group"
then
	xterm -T "Warning" -n "Warning" -hold -e "echo Operation failed. $user is not a member of \'dba\' group."
else
# Starting Oracle Database 10g Express Edition instance and Listener
	$SQLPLUS -s /nolog @$ORACLE_HOME/config/scripts/startdb.sql > /dev/null 2>&1
	if [ ! `ps -ef | grep tns | cut -f1 -d" " | grep -q oracle` ]
	then
		$LSNR start > /dev/null 2>&1
	else
		echo ""
	fi
fi
	xsetroot -cursor_name left_ptr

startdb.sql:
connect / as sysdba
startup
exit
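The listener check in startdb.sh does not work the way it reads: "grep -q" prints nothing, so the backquotes always expand to an empty string, the test collapses to "[ ! ]", which is always true, and "$LSNR start" is attempted on every run. Reusing that logic in a resource agent would carry the same flaw. A corrected check tests the command's exit status directly; the sketch below assumes pgrep (procps) is available and that the listener runs as the oracle user, and its function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical corrected version of the listener check from startdb.sh.
ORACLE_HOME=/usr/lib/oracle/xe/app/oracle/product/10.2.0/server
LSNR=$ORACLE_HOME/bin/lsnrctl

# Succeeds only if a process named tnslsnr owned by the given user exists.
# pgrep never matches itself, so no "grep -v grep" trick is needed.
listener_running() {
    pgrep -u "${1:-oracle}" tnslsnr >/dev/null 2>&1
}

# Test the pipeline's exit status directly instead of wrapping it in [ ... ].
start_listener_if_needed() {
    if ! listener_running oracle; then
        "$LSNR" start >/dev/null 2>&1
    fi
}
```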

stopdb.sh:
#!/bin/bash
#
#       svaggu 09/28/05 -  Creation
#       svaggu 11/09/05 -  dba groupd check is added
#

xsetroot -cursor_name watch

case $PATH in
    "") PATH=/bin:/usr/bin:/sbin:/etc
        export PATH ;;
esac

SAVE_LLP=$LD_LIBRARY_PATH

ORACLE_HOME=/usr/lib/oracle/xe/app/oracle/product/10.2.0/server
ORACLE_SID=XE
SQLPLUS=$ORACLE_HOME/bin/sqlplus
export ORACLE_HOME
export ORACLE_SID
user=`/usr/bin/whoami`
group=`/usr/bin/groups $user | grep dba`
if test -z "$group"
then
        xterm -T "Warning" -n "Warning" -hold -e "echo Operation failed. $user is not a member of \'dba\' group."
else
# Stop Oracle Database 10g Express Edition instance
	$SQLPLUS -s /nolog @$ORACLE_HOME/config/scripts/stopdb.sql > /dev/null 2>&1
fi

xsetroot -cursor_name left_ptr

stopdb.sql:
connect / as sysdba
shutdown immediate
exit

OK, the resource agent I am using is located at
/usr/lib/ocf/resource.d/heartbeat/oracle. What I want to do is create my
own resource agent, let me name it "oraclexe". Here are my questions:
1.) Is it possible to create my own resource agent "oraclexe" (which will
start both the listener and the database instance) just by creating a new
shell script in the /usr/lib/ocf/resource.d/heartbeat/ directory?
2.) Is there a way to debug/trace resource agents in case they do not
work as expected?
3.) Do you have another approach or solution to my issue?
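An OCF resource agent is an executable, usually a shell script, that takes the action (start, stop, monitor, meta-data, ...) as its first argument, reads its parameters from OCF_RESKEY_<name> environment variables, and reports the outcome through the standard OCF exit codes. A minimal sketch of an "oraclexe" agent along those lines follows. It is hypothetical and rests on several assumptions: the XE software is owned by the oracle user, "su - oracle" works non-interactively when run as root, pgrep is installed, and the instance's process monitor is named <prefix>_pmon_<SID> (the actual name should be checked with "ps -ef | grep pmon"). It drops the xsetroot and xterm calls of the XE scripts, since a cluster resource has no X display.

```shell
#!/bin/bash
# oraclexe: hypothetical minimal OCF resource agent sketch for Oracle 10g XE.
# Standard OCF exit codes (normally provided by the OCF shell functions).
OCF_SUCCESS=0; OCF_ERR_GENERIC=1; OCF_ERR_UNIMPLEMENTED=3
OCF_ERR_INSTALLED=5; OCF_NOT_RUNNING=7

# Parameters arrive as OCF_RESKEY_<name>; defaults mirror failover-oracle.
oraclexe_params() {
    SID="${OCF_RESKEY_sid:-XE}"
    ORA_HOME="${OCF_RESKEY_home:-/usr/lib/oracle/xe/app/oracle/product/10.2.0/server}"
    ORA_USER="${OCF_RESKEY_user:-oracle}"
    PMON="_pmon_${SID}\$"   # assumed to match ora_pmon_<SID> or xe_pmon_<SID>
}

oraclexe_meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="oraclexe">
<version>0.1</version>
<longdesc lang="en">Oracle Database XE listener and instance (sketch).</longdesc>
<shortdesc lang="en">Oracle XE</shortdesc>
<parameters>
<parameter name="sid"><longdesc lang="en">ORACLE_SID</longdesc>
<shortdesc lang="en">sid</shortdesc><content type="string" default="XE"/></parameter>
<parameter name="home"><longdesc lang="en">ORACLE_HOME</longdesc>
<shortdesc lang="en">home</shortdesc><content type="string"/></parameter>
<parameter name="user"><longdesc lang="en">Oracle software owner</longdesc>
<shortdesc lang="en">user</shortdesc><content type="string" default="oracle"/></parameter>
</parameters>
<actions>
<action name="start" timeout="120s"/>
<action name="stop" timeout="120s"/>
<action name="monitor" interval="30s" timeout="30s"/>
<action name="validate-all" timeout="5s"/>
<action name="meta-data" timeout="5s"/>
</actions>
</resource-agent>
EOF
}

# Succeeds if a process owned by the Oracle user matches the regex in $1.
oraclexe_proc() { pgrep -u "$ORA_USER" -f "$1" >/dev/null 2>&1; }
# Runs an XE binary as the Oracle user with its environment; stdin passes through.
oraclexe_su() {
    su - "$ORA_USER" -c "ORACLE_HOME='$ORA_HOME' ORACLE_SID='$SID' '$ORA_HOME'/bin/$*"
}
# Runs one statement as SYSDBA, as startdb.sql and stopdb.sql do.
oraclexe_sql() {
    printf 'connect / as sysdba\n%s\nexit\n' "$1" | oraclexe_su "sqlplus -s /nolog" >/dev/null 2>&1
}

oraclexe_monitor() {
    oraclexe_params
    oraclexe_proc "$PMON" || return $OCF_NOT_RUNNING
    oraclexe_proc tnslsnr || return $OCF_ERR_GENERIC   # instance up, listener gone
    return $OCF_SUCCESS
}

oraclexe_start() {
    oraclexe_params
    oraclexe_proc tnslsnr || oraclexe_su "lsnrctl start" >/dev/null 2>&1
    oraclexe_proc "$PMON" || oraclexe_sql startup
    oraclexe_monitor || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

oraclexe_stop() {
    oraclexe_params
    oraclexe_proc "$PMON" && oraclexe_sql "shutdown immediate"
    oraclexe_proc tnslsnr && oraclexe_su "lsnrctl stop" >/dev/null 2>&1
    # Report success only once both the instance and the listener are gone.
    if oraclexe_proc "$PMON" || oraclexe_proc tnslsnr; then
        return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}

oraclexe_validate() {
    oraclexe_params
    [ -x "$ORA_HOME/bin/sqlplus" ] && [ -x "$ORA_HOME/bin/lsnrctl" ] || return $OCF_ERR_INSTALLED
    id "$ORA_USER" >/dev/null 2>&1 || return $OCF_ERR_INSTALLED
    return $OCF_SUCCESS
}

oraclexe_main() {
    case "$1" in
        meta-data)    oraclexe_meta_data ;;
        start)        oraclexe_validate || return $?; oraclexe_start ;;
        stop)         oraclexe_stop ;;
        monitor)      oraclexe_monitor ;;
        validate-all) oraclexe_validate ;;
        *)  echo "usage: $0 {start|stop|monitor|validate-all|meta-data}" >&2
            return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}

# Pacemaker always passes an action; without one, only the functions are defined.
if [ "$#" -gt 0 ]; then
    oraclexe_main "$1"
    exit $?
fi
```

Installed as, for example, /usr/lib/ocf/resource.d/custom/oraclexe with mode 755 (the provider directory name "custom" is only an example), it would be referenced with class="ocf" provider="custom" type="oraclexe" in place of the heartbeat oracle agent. Keeping a home-grown agent in its own provider directory rather than in heartbeat/ prevents package updates from overwriting it.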

Thank you very much. In any case, I have to say that I went deeper into
the documentation of Pacemaker, Corosync, OpenAIS, ClusterGlue and the
CRM, and I think this is very good stuff. Thank you for your hard work.

Best regards,

Ladislav Jech