[Pacemaker] Help with config please

Wed Jul 9 08:17:46 UTC 2014

Please ignore this. Some of my assumptions were wrong, and I have found some more info :)


> -----Original Message-----
> From: Alex Samad - Yieldbroker [mailto:Alex.Samad at yieldbroker.com]
> Sent: Wednesday, 9 July 2014 6:11 PM
> To: pacemaker at oss.clusterlabs.org
> Subject: [Pacemaker] Help with config please
> 
> Hi
> 
> I am configuring Pacemaker on CentOS 6.5 with these packages:
> pacemaker-cli-1.1.10-14.el6_5.3.x86_64
> pacemaker-1.1.10-14.el6_5.3.x86_64
> pacemaker-libs-1.1.10-14.el6_5.3.x86_64
> pacemaker-cluster-libs-1.1.10-14.el6_5.3.x86_64
> 
> This is my config:
> Cluster Name: ybrp
> Corosync Nodes:
> 
> Pacemaker Nodes:
>  devrp1 devrp2
> 
> Resources:
>  Resource: ybrpip (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=10.172.214.50 cidr_netmask=24 nic=eth0 clusterip_hash=sourceip-sourceport
>   Meta Attrs: stickiness=0,migration-threshold=3,failure-timeout=600s
>   Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpip-monitor-interval-5s)
>  Clone: ybrpstat-clone
>   Meta Attrs: globally-unique=false clone-max=2 clone-node-max=1
>   Resource: ybrpstat (class=ocf provider=yb type=proxy)
>    Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpstat-monitor-interval-5s)
> 
> Stonith Devices:
> Fencing Levels:
> 
> Location Constraints:
> Ordering Constraints:
>   start ybrpstat-clone then start ybrpip (Mandatory) (id:order-ybrpstat-clone-ybrpip-mandatory)
> Colocation Constraints:
>   ybrpip with ybrpstat-clone (INFINITY) (id:colocation-ybrpip-ybrpstat-clone-INFINITY)
> 
> Cluster Properties:
>  cluster-infrastructure: cman
>  dc-version: 1.1.10-14.el6_5.3-368c726
>  last-lrm-refresh: 1404892739
>  no-quorum-policy: ignore
>  stonith-enabled: false
> 
> 
> I have my own resource file, and I start/stop the proxy service outside of
> pacemaker!
> 
> I had an interesting problem: I did a VMware update on the Linux box,
> which interrupted network activity.
> 
> The monitor function in my script 1) tests whether the proxy process is
> running and 2) gets a status page from the proxy and confirms it returns 200.
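
For reference, a minimal sketch of that two-step monitor. The process name,
status URL and function names are placeholders I made up, not what the real
agent uses; the exit codes are the standard OCF ones (0 success, 1 generic
error, 7 not running).

```shell
#!/bin/sh
# Sketch of an OCF-style monitor for a proxy. All names are assumptions.

# Standard OCF exit codes (normally provided by ocf-shellfuncs).
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

# Hypothetical settings; adjust to the real proxy.
PROXY_PROCESS="${PROXY_PROCESS:-proxy}"
STATUS_URL="${STATUS_URL:-http://127.0.0.1/status}"

# Step 1: is the proxy process running?
proxy_process_running() {
    pgrep -x "$PROXY_PROCESS" >/dev/null 2>&1
}

# Step 2: fetch the status page and print only the HTTP status code.
proxy_status_code() {
    curl -s -o /dev/null -m 5 -w '%{http_code}' "$STATUS_URL"
}

proxy_monitor() {
    if ! proxy_process_running; then
        return "$OCF_NOT_RUNNING"
    fi
    code=$(proxy_status_code)
    if [ "$code" = "200" ]; then
        return "$OCF_SUCCESS"
    fi
    # Process is up but the status page is not healthy.
    return "$OCF_ERR_GENERIC"
}
```

As far as I know there is no "temporary error" return code for monitor on a
plain resource: anything other than 0 counts as a failure and triggers
recovery. How often the cluster retries is decided by cluster settings, not
by the script.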
> 
> 
> This is what I got in /var/log/messages
> 
> Jul  9 06:16:13 devrp1 crmd[6849]:  warning: update_failcount: Updating
> failcount for ybrpstat on devrp2 after failed monitor: rc=7
> (update=value++, time=1404850573)
> Jul  9 06:16:13 devrp1 crmd[6849]:   notice: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Jul  9 06:16:13 devrp1 pengine[6848]:   notice: unpack_config: On loss of CCM
> Quorum: Ignore
> Jul  9 06:16:13 devrp1 pengine[6848]:  warning: unpack_rsc_op: Processing
> failed op monitor for ybrpstat:0 on devrp2: not running (7)
> Jul  9 06:16:13 devrp1 pengine[6848]:  warning: unpack_rsc_op: Processing
> failed op start for ybrpstat:1 on devrp1: unknown error (1)
> Jul  9 06:16:13 devrp1 pengine[6848]:  warning: common_apply_stickiness:
> Forcing ybrpstat-clone away from devrp1 after 1000000 failures
> (max=1000000)
> Jul  9 06:16:13 devrp1 pengine[6848]:  warning: common_apply_stickiness:
> Forcing ybrpstat-clone away from devrp1 after 1000000 failures
> (max=1000000)
> Jul  9 06:16:13 devrp1 pengine[6848]:   notice: LogActions: Restart
> ybrpip#011(Started devrp2)
> Jul  9 06:16:13 devrp1 pengine[6848]:   notice: LogActions: Recover
> ybrpstat:0#011(Started devrp2)
> Jul  9 06:16:13 devrp1 pengine[6848]:   notice: process_pe_message:
> Calculated Transition 1054: /var/lib/pacemaker/pengine/pe-input-235.bz2
> Jul  9 06:16:13 devrp1 pengine[6848]:   notice: unpack_config: On loss of CCM
> Quorum: Ignore
> Jul  9 06:16:13 devrp1 pengine[6848]:  warning: unpack_rsc_op: Processing
> failed op monitor for ybrpstat:0 on devrp2: not running (7)
> Jul  9 06:16:13 devrp1 pengine[6848]:  warning: unpack_rsc_op: Processing
> failed op start for ybrpstat:1 on devrp1: unknown error (1)
> Jul  9 06:16:13 devrp1 pengine[6848]:  warning: common_apply_stickiness:
> Forcing ybrpstat-clone away from devrp1 after 1000000 failures
> (max=1000000)
> Jul  9 06:16:13 devrp1 pengine[6848]:  warning: common_apply_stickiness:
> Forcing ybrpstat-clone away from devrp1 after 1000000 failures
> (max=1000000)
> Jul  9 06:16:13 devrp1 pengine[6848]:   notice: LogActions: Restart
> ybrpip#011(Started devrp2)
> Jul  9 06:16:13 devrp1 pengine[6848]:   notice: LogActions: Recover
> ybrpstat:0#011(Started devrp2)
> Jul  9 06:16:13 devrp1 pengine[6848]:   notice: process_pe_message:
> Calculated Transition 1055: /var/lib/pacemaker/pengine/pe-input-236.bz2
> Jul  9 06:16:13 devrp1 pengine[6848]:   notice: unpack_config: On loss of CCM
> Quorum: Ignore
> Jul  9 06:16:13 devrp1 pengine[6848]:  warning: unpack_rsc_op: Processing
> failed op monitor for ybrpstat:0 on devrp2: not running (7)
> Jul  9 06:16:13 devrp1 pengine[6848]:  warning: unpack_rsc_op: Processing
> failed op start for ybrpstat:1 on devrp1: unknown error (1)
> Jul  9 06:16:13 devrp1 pengine[6848]:  warning: common_apply_stickiness:
> Forcing ybrpstat-clone away from devrp1 after 1000000 failures
> (max=1000000)
> Jul  9 06:16:13 devrp1 pengine[6848]:  warning: common_apply_stickiness:
> Forcing ybrpstat-clone away from devrp1 after 1000000 failures
> (max=1000000)
> Jul  9 06:16:13 devrp1 pengine[6848]:   notice: LogActions: Restart
> ybrpip#011(Started devrp2)
> Jul  9 06:16:13 devrp1 pengine[6848]:   notice: LogActions: Recover
> ybrpstat:0#011(Started devrp2)
> 
> 
> And it stayed this way for the next 12 hours, until I logged on.
> 
> I poked around, and to fix it I ran this:
>         /usr/sbin/pcs resource cleanup ybrpip
>         /usr/sbin/pcs resource cleanup ybrpstat
> 
> Basically, I cleaned up the errors and off it went all by itself.
> 
> So my question is: how do I configure it, or what do I need to change in the
> resource script file, to send a temporary error back to pacemaker so that it
> keeps trying to check the status of the proxy?
> 
> It seems to me it tried once and then failed, although the log says it failed
> after 1000000 failures. How can I change that to infinite, and where is the
> interval setting for this? In the config above it looks to me like it should
> be infinite.
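
On the 1000000 figure: in Pacemaker, 1000000 is how INFINITY is represented,
so "after 1000000 failures (max=1000000)" most likely means the fail count
was set straight to INFINITY, not that a million checks ran. With the default
start-failure-is-fatal=true, a single failed start (the "failed op start for
ybrpstat:1 on devrp1" line) does exactly that, and the resource stays banned
from that node until the failure is cleaned up or expires. Note also that
failure-timeout=600s in the config above is set on ybrpip, not on ybrpstat.
A rough sketch of settings that might let the cluster retry on its own is
below. The values and the helper function name are illustrative assumptions
and have not been tested on this cluster.

```shell
#!/bin/sh
# Sketch only: values are illustrative, and apply_retry_settings is a
# hypothetical helper name. PCS can be overridden for a dry run.
PCS="${PCS:-/usr/sbin/pcs}"

apply_retry_settings() {
    # Treat a failed start like any other failure instead of setting the
    # fail count to INFINITY on the first one.
    "$PCS" property set start-failure-is-fatal=false || return 1

    # Give ybrpstat its own threshold and expiry; in the posted config
    # these are only set on ybrpip.
    "$PCS" resource meta ybrpstat migration-threshold=3 failure-timeout=600s || return 1

    # Show the current fail counts for the resource.
    "$PCS" resource failcount show ybrpstat
}
```

Calling apply_retry_settings with PCS=echo prints the three commands instead
of running them. Expired failures are only acted on when the cluster next
re-evaluates its state, which by default happens at least every 15 minutes
(cluster-recheck-interval), so recovery after failure-timeout is not instant.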
> 
> 
> Thanks
> Alex