[Pacemaker] Postgres RA won't start
Amar Prasovic
amar at linux.org.ba
Tue Oct 11 14:10:24 UTC 2011
Hello everyone,
I tried to configure postgres RA and I ran into some problems.
I configured several resources in my cluster config where pgsql was set to
run last, after DRBD, Filesystem, IPAddr2 and nginx.
Here is what it looks like in crm configure:
crm(live)configure# show
node webnode01 \
attributes standby="off"
node webnode02 \
attributes standby="off"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.10.80" cidr_netmask="32" \
op monitor interval="30s"
primitive drbd_res ocf:linbit:drbd \
params drbd_resource="yorxs" \
op monitor interval="60s" \
op start interval="0s" timeout="240s" \
op stop interval="0s" timeout="100s"
primitive fs_res ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/srv" fstype="ext4" \
op start interval="0s" timeout="60s" \
op stop interval="0s" timeout="60s" \
op monitor interval="60s" timeout="40s"
primitive nginx_res ocf:heartbeat:nginx \
params configfile="/etc/nginx/nginx.conf" httpd="/usr/local/sbin/nginx" status10url="http:/127.0.0.1" \
op monitor interval="10s" timeout="30s" \
op start interval="0" timeout="40s" \
op stop interval="0" timeout="60s"
primitive postgres_res ocf:heartbeat:pgsql \
params psql="/bin/psql" pgdata="/var/lib/postgres/8.4/main" logfile="/var/log/postgres/postgres.log" \
op start interval="0" timeout="120s" \
op stop interval="0" timeout="120s" \
op monitor interval="30s" timeout="30s"
group cluster_1 fs_res ClusterIP nginx_res postgres_res
ms drbd_cluster drbd_res \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location prefer_webnode01 cluster_1 50: webnode01
location prefer_webnode01_drbd drbd_cluster 50: webnode01
colocation cluster_1_on_drbd inf: cluster_1 drbd_cluster:Master
order cluster_1_after_drbd inf: drbd_cluster:promote cluster_1:start
property $id="cib-bootstrap-options" \
dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1318326771"
However, when I run this config, everything except pgsql starts without
problems. For pgsql, I get the following error.
In crm_mon:
Online: [ webnode02 webnode01 ]
Master/Slave Set: drbd_cluster
Masters: [ webnode01 ]
Slaves: [ webnode02 ]
Resource Group: cluster_1
fs_res (ocf::heartbeat:Filesystem): Started webnode01
ClusterIP (ocf::heartbeat:IPaddr2): Started webnode01
nginx_res (ocf::heartbeat:nginx): Started webnode01
postgres_res (ocf::heartbeat:pgsql): Stopped
Failed actions:
postgres_res_start_0 (node=webnode01, call=84, rc=5, status=complete): not installed
postgres_res_start_0 (node=webnode02, call=66, rc=5, status=complete): not installed
In /var/log/syslog:
webnode01 log # cat syslog |grep postgres_res
Oct 11 11:39:34 webnode01 crmd: [921]: info: do_lrm_rsc_op: Performing key=6:93:7:933bf2ab-00d0-435c-a24f-85897e0c9725 op=postgres_res_monitor_0 )
Oct 11 11:39:34 webnode01 lrmd: [914]: info: rsc:postgres_res:27: probe
Oct 11 11:39:34 webnode01 crmd: [921]: info: process_lrm_event: LRM operation postgres_res_monitor_0 (call=27, rc=7, cib-update=36, confirmed=true) not running
Oct 11 11:39:50 webnode01 crmd: [921]: info: do_lrm_rsc_op: Performing key=39:96:0:933bf2ab-00d0-435c-a24f-85897e0c9725 op=postgres_res_start_0 )
Oct 11 11:39:50 webnode01 lrmd: [914]: info: rsc:postgres_res:39: start
Oct 11 11:39:50 webnode01 crmd: [921]: info: process_lrm_event: LRM operation postgres_res_start_0 (call=39, rc=5, cib-update=47, confirmed=true) not installed
Oct 11 11:39:50 webnode01 attrd: [918]: info: find_hash_entry: Creating hash entry for fail-count-postgres_res
Oct 11 11:39:50 webnode01 attrd: [918]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-postgres_res (INFINITY)
Oct 11 11:39:50 webnode01 attrd: [918]: info: attrd_perform_update: Sent update 63: fail-count-postgres_res=INFINITY
Oct 11 11:39:50 webnode01 attrd: [918]: info: find_hash_entry: Creating hash entry for last-failure-postgres_res
Oct 11 11:39:50 webnode01 attrd: [918]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-postgres_res (1318325990)
Oct 11 11:39:50 webnode01 attrd: [918]: info: attrd_perform_update: Sent update 66: last-failure-postgres_res=1318325990
Oct 11 11:39:50 webnode01 crmd: [921]: info: do_lrm_rsc_op: Performing key=4:97:0:933bf2ab-00d0-435c-a24f-85897e0c9725 op=postgres_res_stop_0 )
Oct 11 11:39:50 webnode01 lrmd: [914]: info: rsc:postgres_res:40: stop
Oct 11 11:39:50 webnode01 crmd: [921]: info: process_lrm_event: LRM operation postgres_res_stop_0 (call=40, rc=0, cib-update=49, confirmed=true) ok
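For reference when reading the rc= values in this output: they are the
standard OCF resource agent exit codes. The helper below is only a reading
aid sketched for this post. The function name ocf_rc_name is made up and is
not part of Pacemaker; it just maps the numeric codes to their OCF names:

```shell
#!/bin/sh
# ocf_rc_name: hypothetical helper (not shipped with Pacemaker) that
# translates an OCF resource agent exit code into its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "unknown" ;;
    esac
}

# The three codes that appear in the syslog excerpt: probe, start, stop.
for rc in 7 5 0; do
    echo "rc=$rc -> $(ocf_rc_name "$rc")"
done
```

Read this way, the probe (rc=7) only reports that postgres is not running,
while the start fails with rc=5, which an agent generally returns when
something it needs on that node (a binary, a directory) appears to be
missing.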
Additional info:
/etc/postgresql, /etc/postgresql-common and /var/lib/postgresql are symlinks
on both nodes; the actual directories are on the shared DRBD disk.
Postgres starts without any problems from the init script on both nodes.
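Since "not installed" (rc=5) usually points at a missing file or binary, one
thing that can be checked by hand is whether the paths given in the
postgres_res params exist on the active node while the DRBD filesystem is
mounted. A minimal sketch: report_paths is a made-up helper name, and the
paths are copied from the config in this post. Note that pgdata there uses
/var/lib/postgres while the symlink mentioned here is /var/lib/postgresql;
whether that matters is something to verify, not a diagnosis.

```shell
#!/bin/sh
# report_paths: hypothetical helper that prints "ok <path>" or
# "MISSING <path>" for every path given as an argument.
report_paths() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "ok $p"
        else
            echo "MISSING $p"
        fi
    done
}

# Paths taken from the postgres_res primitive:
#   psql binary, pgdata directory, and the directory holding the logfile.
report_paths /bin/psql \
             /var/lib/postgres/8.4/main \
             /var/log/postgres
```

Running this on both webnode01 and webnode02 should show whether either node
is missing one of the configured paths.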
Thanks a lot in advance for any advice.
--
Amar Prasovic
Gaißacher Straße 17
D - 81371 München