[ClusterLabs] ocf:heartbeat:pgsql not starting
Darren Kinley
dkinley at mdacorporation.com
Thu Aug 11 21:44:03 UTC 2016
Hi,
I have PostgreSQL 9.3 replication set up and I'm trying to put it under Pacemaker control
using the ocf:heartbeat:pgsql resource agent provided by SLES 12 SP1.
This is the crmsh script that I used to configure Pacemaker:
configure cib new pgsql_cfg --force
configure primitive res-ars-pgsql ocf:heartbeat:pgsql \
pgctl="/usr/lib/postgresql93/bin/pg_ctl" \
psql="/usr/lib/postgresql93/bin/psql" \
pgdata="/var/lib/pgsql/data/" \
rep_mode="sync" \
node_list="ars1 ars2" \
restore_command="cp /var/lib/pgsql/pg_archive/%f %p" \
primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
master_ip="192.168.244.223" \
restart_on_promote='true' \
pghost="191.168.244.223" \
repuser="postgres" \
check_wal_receiver='true' \
monitor_user='postgres' \
monitor_password='xxx' \
op start timeout="120s" interval="0s" on-fail="restart" \
op monitor timeout="120s" interval="4s" on-fail="restart" \
op monitor timeout="120s" interval="3s" on-fail="restart" role="Master" \
op promote timeout="120s" interval="0s" on-fail="restart" \
op demote timeout="120s" interval="0s" on-fail="stop" \
op stop timeout="120s" interval="0s" on-fail="block" \
op notify timeout="90s" interval="0s"
configure ms ms-ars-pgsql res-ars-pgsql \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
configure colocation col-ars-pgsql-with-drbd inf: ms-ars-pgsql:Master ms-ars-drbd:Master
configure cib commit pgsql_cfg
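For completeness, this is a sketch of how I can sanity-check what actually landed in the CIB after the commit (the crm_resource --get-parameter call is an assumption on my part about how to read a single parameter back):

# show the committed primitive and master/slave definitions
crm configure show res-ars-pgsql ms-ars-pgsql

# read one parameter back, to confirm the agent won't fall back to its
# built-in /usr/bin/pg_ctl default
crm_resource --resource res-ars-pgsql --get-parameter pgctl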
I have a ~postgres/.pgpass file in place.
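For reference, the entry format is hostname:port:database:username:password, and the file has to be mode 0600 for libpq to use it. A sketch of what I expect the replication entry to look like (the port 5432 and the password here are placeholders, not necessarily my real values):

# ~postgres/.pgpass (chmod 0600)
# hostname:port:database:username:password
192.168.244.223:5432:replication:postgres:xxx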
The pgsql resources remain stopped on both nodes. Only once during the 12 hours I've been working on this
did both nodes try to bring up PostgreSQL (both in recovery mode) before Pacemaker shut them both down again.
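For context, this is the one-shot status command I've been watching while testing (standard crm_mon flags, as far as I know):

# one-shot cluster status, including inactive resources and fail counts
crm_mon -1 -r -f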
When running ocf-tester, I believe I'm supposed to name the master/slave resource:
ars2:/usr/lib/ocf/resource.d/heartbeat # ocf-tester -v -n ms-ars-pgsql `pwd`/pgsql
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/pgsql...
Testing permissions with uid nobody
Testing: meta-data
Testing: meta-data
...
<XML removed/>
...
Testing: validate-all
Checking current state
Testing: stop
INFO: waiting for server to shut down.... done server stopped
INFO: PostgreSQL is down
Testing: monitor
INFO: PostgreSQL is down
Testing: monitor
ocf-exit-reason:Setup problem: couldn't find command: /usr/bin/pg_ctl
Testing: start
INFO: server starting
INFO: PostgreSQL start command sent.
INFO: PostgreSQL is started.
Testing: monitor
Testing: monitor
INFO: Don't check /var/lib/pgsql/data during probe
Testing: notify
Checking for demote action
ocf-exit-reason:Not in a replication mode.
Checking for promote action
ocf-exit-reason:Not in a replication mode.
Testing: demotion of started resource
ocf-exit-reason:Not in a replication mode.
* rc=6: Demoting a start resource should not fail
Testing: promote
ocf-exit-reason:Not in a replication mode.
* rc=6: Promote failed
Testing: demote
ocf-exit-reason:Not in a replication mode.
* rc=6: Demote failed
Aborting tests
The 'Not in a replication mode' errors disagree with the rep_mode="sync" setting on res-ars-pgsql above, and the complaint about /usr/bin/pg_ctl ignores the pgctl path I configured.
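I suspect ocf-tester does not read parameters from the CIB at all, which would explain both the /usr/bin/pg_ctl default and the replication-mode complaint. A sketch of how I think the same test could be run with the parameters passed explicitly (assuming ocf-tester's -o name=value option behaves as I expect):

ocf-tester -v -n ms-ars-pgsql \
  -o pgctl="/usr/lib/postgresql93/bin/pg_ctl" \
  -o psql="/usr/lib/postgresql93/bin/psql" \
  -o pgdata="/var/lib/pgsql/data/" \
  -o rep_mode="sync" \
  -o node_list="ars1 ars2" \
  -o master_ip="192.168.244.223" \
  -o repuser="postgres" \
  /usr/lib/ocf/resource.d/heartbeat/pgsql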
I'm not sure whether the pacemaker.log entries for these CIB changes are needed, but here is the relevant excerpt:
Aug 11 09:19:53 [2757] ars2 pengine: info: clone_print: Master/Slave Set: ms-ars-pgsql [res-ars-pgsql]
Aug 11 09:19:53 [2757] ars2 pengine: info: short_print: Stopped: [ ars1 ars2 ]
Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full: res-ars-pgsql:0 has failed INFINITY times on ars1
Aug 11 09:19:53 [2757] ars2 pengine: warning: common_apply_stickiness: Forcing ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full: ms-ars-pgsql has failed INFINITY times on ars1
Aug 11 09:19:53 [2757] ars2 pengine: warning: common_apply_stickiness: Forcing ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full: res-ars-pgsql:0 has failed INFINITY times on ars2
Aug 11 09:19:53 [2757] ars2 pengine: warning: common_apply_stickiness: Forcing ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full: ms-ars-pgsql has failed INFINITY times on ars2
Aug 11 09:19:53 [2757] ars2 pengine: warning: common_apply_stickiness: Forcing ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
Aug 11 09:19:53 [2757] ars2 pengine: info: rsc_merge_weights: ms-ars-drbd: Rolling back scores from ms-ars-pgsql
Aug 11 09:19:53 [2757] ars2 pengine: info: master_color: Promoting res-ars-drbd:1 (Master ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: master_color: ms-ars-drbd: Promoted 1 instances of a possible 1 to master
Aug 11 09:19:53 [2757] ars2 pengine: info: native_color: res-ars-pgsql:0: Rolling back scores from ms-ars-drbd
Aug 11 09:19:53 [2757] ars2 pengine: info: native_color: Resource res-ars-pgsql:0 cannot run anywhere
Aug 11 09:19:53 [2757] ars2 pengine: info: native_color: res-ars-pgsql:1: Rolling back scores from ms-ars-drbd
Aug 11 09:19:53 [2757] ars2 pengine: info: native_color: Resource res-ars-pgsql:1 cannot run anywhere
Aug 11 09:19:53 [2757] ars2 pengine: info: master_color: ms-ars-pgsql: Promoted 0 instances of a possible 1 to master
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-mgmt-vip (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-mgmt-app (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-vip (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-drbd:0 (Slave ars1)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-drbd:1 (Master ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-lvm (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-fs_dropbox (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-fs_svndata (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-pgsql:0 (Stopped)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-pgsql:1 (Stopped)
Aug 11 09:19:53 [2758] ars2 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Aug 11 09:19:53 [2758] ars2 crmd: notice: do_te_invoke: Processing graph 222 (ref=pe_calc-dc-1470932393-1349) derived from /var/lib/pacemaker/pengine/pe-input-625.bz2
and here is the corresponding excerpt from /var/log/messages:
2016-08-11T09:19:53.146603-07:00 ars-2 crmd[2758]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
2016-08-11T09:19:53.152322-07:00 ars-2 pengine[2757]: notice: On loss of CCM Quorum: Ignore
2016-08-11T09:19:53.153078-07:00 ars-2 pengine[2757]: warning: Forcing ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
2016-08-11T09:19:53.153266-07:00 ars-2 pengine[2757]: warning: Forcing ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
2016-08-11T09:19:53.153395-07:00 ars-2 pengine[2757]: warning: Forcing ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
2016-08-11T09:19:53.153547-07:00 ars-2 pengine[2757]: warning: Forcing ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
2016-08-11T09:19:53.155568-07:00 ars-2 crmd[2758]: notice: Processing graph 222 (ref=pe_calc-dc-1470932393-1349) derived from /var/lib/pacemaker/pengine/pe-input-625.bz2
2016-08-11T09:19:53.155768-07:00 ars-2 pengine[2757]: notice: Calculated Transition 222: /var/lib/pacemaker/pengine/pe-input-625.bz2
2016-08-11T09:19:53.155927-07:00 ars-2 crmd[2758]: notice: Transition 222 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-625.bz2): Complete
2016-08-11T09:19:53.156085-07:00 ars-2 crmd[2758]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
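The INFINITY fail counts above presumably explain why the resources "cannot run anywhere". A minimal sketch of how I understand the fail counts can be inspected and cleared with crmsh before another attempt (assuming the failcount/cleanup subcommands work the way I expect):

# show the current fail count for the pgsql resource on each node
crm resource failcount res-ars-pgsql show ars1
crm resource failcount res-ars-pgsql show ars2

# clear the failures so the policy engine will try to place the resource again
crm resource cleanup ms-ars-pgsql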
Can anyone provide thoughts on how to debug this?
Should I give up on the SLES-provided RA and use PAF instead?
Thanks,
Darren