[Pacemaker] Help with config please
Alex Samad - Yieldbroker
Alex.Samad at yieldbroker.com
Wed Jul 9 08:10:45 UTC 2014
Hi
Configuring Pacemaker on CentOS 6.5:
pacemaker-cli-1.1.10-14.el6_5.3.x86_64
pacemaker-1.1.10-14.el6_5.3.x86_64
pacemaker-libs-1.1.10-14.el6_5.3.x86_64
pacemaker-cluster-libs-1.1.10-14.el6_5.3.x86_64
This is my config:
Cluster Name: ybrp
Corosync Nodes:
Pacemaker Nodes:
devrp1 devrp2
Resources:
Resource: ybrpip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=10.172.214.50 cidr_netmask=24 nic=eth0 clusterip_hash=sourceip-sourceport
Meta Attrs: stickiness=0,migration-threshold=3,failure-timeout=600s
Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpip-monitor-interval-5s)
Clone: ybrpstat-clone
Meta Attrs: globally-unique=false clone-max=2 clone-node-max=1
Resource: ybrpstat (class=ocf provider=yb type=proxy)
Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpstat-monitor-interval-5s)
Stonith Devices:
Fencing Levels:
Location Constraints:
Ordering Constraints:
start ybrpstat-clone then start ybrpip (Mandatory) (id:order-ybrpstat-clone-ybrpip-mandatory)
Colocation Constraints:
ybrpip with ybrpstat-clone (INFINITY) (id:colocation-ybrpip-ybrpstat-clone-INFINITY)
Cluster Properties:
cluster-infrastructure: cman
dc-version: 1.1.10-14.el6_5.3-368c726
last-lrm-refresh: 1404892739
no-quorum-policy: ignore
stonith-enabled: false
I have my own resource agent file, and I start/stop the proxy service outside of Pacemaker!
I had an interesting problem: I did a VMware update on the Linux box, which interrupted network activity.
Part of the monitor function in my script is to 1) test whether the proxy process is running, and 2) fetch a status page from the proxy and confirm it returns a 200.
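Roughly, the monitor looks like this (a simplified sketch -- the real process name, status URL and paths differ):

proxy_monitor() {
    # 1) is the proxy process actually running?
    if ! pgrep -f yb-proxy >/dev/null 2>&1; then    # process name is illustrative
        return $OCF_NOT_RUNNING                     # rc=7, as seen in the logs
    fi
    # 2) fetch the status page and make sure it answers 200
    http_code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/status)  # URL illustrative
    if [ "$http_code" = "200" ]; then
        return $OCF_SUCCESS                         # rc=0, proxy healthy
    fi
    return $OCF_ERR_GENERIC                         # rc=1, proxy up but not healthy
}

(The OCF_* return-code variables come from the resource-agents ocf-shellfuncs that the agent sources.)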
This is what I got in /var/log/messages
Jul 9 06:16:13 devrp1 crmd[6849]: warning: update_failcount: Updating failcount for ybrpstat on devrp2 after failed monitor: rc=7 (update=value++, time=1404850573)
Jul 9 06:16:13 devrp1 crmd[6849]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jul 9 06:16:13 devrp1 pengine[6848]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on devrp2: not running (7)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on devrp1: unknown error (1)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Restart ybrpip#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Recover ybrpstat:0#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: process_pe_message: Calculated Transition 1054: /var/lib/pacemaker/pengine/pe-input-235.bz2
Jul 9 06:16:13 devrp1 pengine[6848]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on devrp2: not running (7)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on devrp1: unknown error (1)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Restart ybrpip#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Recover ybrpstat:0#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: process_pe_message: Calculated Transition 1055: /var/lib/pacemaker/pengine/pe-input-236.bz2
Jul 9 06:16:13 devrp1 pengine[6848]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on devrp2: not running (7)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on devrp1: unknown error (1)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Restart ybrpip#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Recover ybrpstat:0#011(Started devrp2)
And it stayed this way for the next 12 hours, until I got on.
I poked around, and to fix it I ran this:
/usr/sbin/pcs resource cleanup ybrpip
/usr/sbin/pcs resource cleanup ybrpstat
Basically I cleaned up the errors and off it went all by itself.
So my question is: how do I configure it, or what do I need to change in the resource agent script, to send a temporary error back to Pacemaker so that it would have kept trying to check the status of the proxy?
It seems to me it tried once and then failed... although the log says it failed after 1000000 failures. How can I change that to infinite, and where is the interval setting for this? Because in the config above it looks to me like it should be infinite.
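I'm guessing the relevant knobs are the same migration-threshold / failure-timeout meta attributes I already set on ybrpip, just applied to the clone as well -- something along these lines, if I have the pcs syntax right:

/usr/sbin/pcs resource meta ybrpstat-clone migration-threshold=INFINITY failure-timeout=600s

But I'm not sure whether that, or a different return code from the monitor, is what would have stopped it from being forced away permanently.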
Thanks
Alex