[Pacemaker] Restart of resources
Frank Brendel
Frank.Brendel at eurolog.com
Tue Jan 28 13:44:22 UTC 2014
Does no one have an idea?
Or can someone at least tell me whether this is possible at all?
Thanks
Frank
On 23.01.2014 10:50, Frank Brendel wrote:
> Hi list,
>
> I'm having trouble configuring a resource that is allowed to fail
> once every two minutes.
> The documentation states that I have to configure migration-threshold
> and failure-timeout to achieve this.
> Here is the configuration for the resource.
>
> # pcs config
> Cluster Name: mycluster
> Corosync Nodes:
>
> Pacemaker Nodes:
> Node1 Node2 Node3
>
> Resources:
> Clone: resClamd-clone
> Meta Attrs: clone-max=3 clone-node-max=1 interleave=true
> Resource: resClamd (class=lsb type=clamd)
> Meta Attrs: failure-timeout=120s migration-threshold=2
> Operations: monitor on-fail=restart interval=60s (resClamd-monitor-on-fail-restart)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
> Ordering Constraints:
> Colocation Constraints:
>
> Cluster Properties:
> cluster-infrastructure: cman
> dc-version: 1.1.10-14.el6_5.1-368c726
> last-lrm-refresh: 1390468150
> stonith-enabled: false
>
> # pcs resource defaults
> resource-stickiness: INFINITY
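To make the question concrete, here is a simplified model of the behaviour this configuration is expected to produce, based on how the documentation is read in this message. Under that reading, a failure stops counting once it is older than failure-timeout. The resource is banned from the node only when the failures that still count reach migration-threshold. This is an assumption-based sketch and not Pacemaker's policy-engine code. The function name `expected_action` is hypothetical.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical model of the EXPECTED behaviour of
# failure-timeout + migration-threshold. NOT Pacemaker's actual logic.

FAILURE_TIMEOUT=120      # failure-timeout=120s (resource meta attribute)
MIGRATION_THRESHOLD=2    # migration-threshold=2 (resource meta attribute)

# expected_action NOW FAIL_TIME...
# NOW is the epoch time of the current failure; FAIL_TIME... are the epoch
# times of all failures so far, including NOW. Prints "restart" or "ban".
expected_action() {
    local now=$1
    shift
    local counted=0 t
    for t in "$@"; do
        # Only failures younger than failure-timeout are counted.
        if (( now - t < FAILURE_TIMEOUT )); then
            counted=$((counted + 1))
        fi
    done
    if (( counted >= MIGRATION_THRESHOLD )); then
        echo ban
    else
        echo restart
    fi
}
```

With the two last-failure timestamps reported in the logs of this message, `expected_action 1390468950 1390468520 1390468950` prints `restart`, because the first failure is 430 s old. Two failures only 60 s apart, as in `expected_action 1390468580 1390468520 1390468580`, print `ban`.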
>
> # pcs status
> Cluster name: mycluster
> Last updated: Thu Jan 23 10:12:49 2014
> Last change: Thu Jan 23 10:11:40 2014 via cibadmin on Node2
> Stack: cman
> Current DC: Node2 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 3 Nodes configured
> 3 Resources configured
>
>
> Online: [ Node1 Node2 Node3 ]
>
> Full list of resources:
>
> Clone Set: resClamd-clone [resClamd]
> Started: [ Node1 Node2 Node3 ]
>
>
> Stopping the clamd daemon sets the failcount to 1, and the daemon is
> started again. So far, so good.
>
>
> # service clamd stop
> Stopping Clam AntiVirus Daemon: [ OK ]
>
> /var/log/messages
> Jan 23 10:15:20 Node1 crmd[6075]: notice: process_lrm_event: Node1-resClamd_monitor_60000:305 [ clamd is stopped\n ]
> Jan 23 10:15:20 Node1 attrd[6073]: notice: attrd_cs_dispatch: Update relayed from Node2
> Jan 23 10:15:20 Node1 attrd[6073]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-resClamd (1)
> Jan 23 10:15:20 Node1 attrd[6073]: notice: attrd_perform_update: Sent update 177: fail-count-resClamd=1
> Jan 23 10:15:20 Node1 attrd[6073]: notice: attrd_cs_dispatch: Update relayed from Node2
> Jan 23 10:15:20 Node1 attrd[6073]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-resClamd (1390468520)
> Jan 23 10:15:20 Node1 attrd[6073]: notice: attrd_perform_update: Sent update 179: last-failure-resClamd=1390468520
> Jan 23 10:15:20 Node1 crmd[6075]: notice: process_lrm_event: Node1-resClamd_monitor_60000:305 [ clamd is stopped\n ]
> Jan 23 10:15:21 Node1 crmd[6075]: notice: process_lrm_event: LRM operation resClamd_stop_0 (call=310, rc=0, cib-update=110, confirmed=true) ok
> Jan 23 10:15:30 elmailtst1 crmd[6075]: notice: process_lrm_event: LRM operation resClamd_start_0 (call=314, rc=0, cib-update=111, confirmed=true) ok
> Jan 23 10:15:30 elmailtst1 crmd[6075]: notice: process_lrm_event: LRM operation resClamd_monitor_60000 (call=317, rc=0, cib-update=112, confirmed=false) ok
>
> # pcs status
> Cluster name: mycluster
> Last updated: Thu Jan 23 10:16:48 2014
> Last change: Thu Jan 23 10:11:40 2014 via cibadmin on Node1
> Stack: cman
> Current DC: Node2 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 3 Nodes configured
> 3 Resources configured
>
>
> Online: [ Node1 Node2 Node3 ]
>
> Full list of resources:
>
> Clone Set: resClamd-clone [resClamd]
> Started: [ Node1 Node2 Node3 ]
>
> Failed actions:
> resClamd_monitor_60000 on Node1 'not running' (7): call=305, status=complete, last-rc-change='Thu Jan 23 10:15:20 2014', queued=0ms, exec=0ms
>
> # pcs resource failcount show resClamd
> Failcounts for resClamd
> Node1: 1
>
>
> After 7 minutes I let it fail again. As I understood it, the daemon
> should have been restarted again as well, but it wasn't.
>
>
> # service clamd stop
> Stopping Clam AntiVirus Daemon: [ OK ]
>
> Jan 23 10:22:30 Node1 crmd[6075]: notice: process_lrm_event: LRM operation resClamd_monitor_60000 (call=317, rc=7, cib-update=113, confirmed=false) not running
> Jan 23 10:22:30 Node1 crmd[6075]: notice: process_lrm_event: Node1-resClamd_monitor_60000:317 [ clamd is stopped\n ]
> Jan 23 10:22:30 Node1 attrd[6073]: notice: attrd_cs_dispatch: Update relayed from Node2
> Jan 23 10:22:30 Node1 attrd[6073]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-resClamd (2)
> Jan 23 10:22:30 Node1 attrd[6073]: notice: attrd_perform_update: Sent update 181: fail-count-resClamd=2
> Jan 23 10:22:30 Node1 attrd[6073]: notice: attrd_cs_dispatch: Update relayed from Node2
> Jan 23 10:22:30 Node1 attrd[6073]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-resClamd (1390468950)
> Jan 23 10:22:30 Node1 attrd[6073]: notice: attrd_perform_update: Sent update 183: last-failure-resClamd=1390468950
> Jan 23 10:22:30 Node1 crmd[6075]: notice: process_lrm_event: Node1-resClamd_monitor_60000:317 [ clamd is stopped\n ]
> Jan 23 10:22:30 Node1 crmd[6075]: notice: process_lrm_event: LRM operation resClamd_stop_0 (call=322, rc=0, cib-update=114, confirmed=true) ok
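The two last-failure-resClamd values written by attrd can be compared directly. The bash arithmetic below shows that the failures were 430 seconds apart, well beyond failure-timeout=120s. The variable names are illustrative only, and the timestamps are copied from the logs in this message.

```bash
#!/usr/bin/env bash
# last-failure-resClamd values copied from the attrd log lines (epoch seconds).
first_failure=1390468520
second_failure=1390468950
failure_timeout=120   # failure-timeout=120s from the resource meta attributes

gap=$(( second_failure - first_failure ))
echo "gap between failures: ${gap}s"
if (( gap > failure_timeout )); then
    echo "first failure is older than failure-timeout"
fi
```

The 430 s gap matches the 7 minutes 10 seconds between 10:15:20 and 10:22:30. Despite this gap, fail-count-resClamd went from 1 to 2 instead of starting over, which reached migration-threshold=2.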
>
> # pcs status
> Cluster name: mycluster
> Last updated: Thu Jan 23 10:22:41 2014
> Last change: Thu Jan 23 10:11:40 2014 via cibadmin on Node1
> Stack: cman
> Current DC: Node2 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 3 Nodes configured
> 3 Resources configured
>
>
> Online: [ Node1 Node2 Node3 ]
>
> Full list of resources:
>
> Clone Set: resClamd-clone [resClamd]
> Started: [ Node2 Node3 ]
> Stopped: [ Node1 ]
>
> Failed actions:
> resClamd_monitor_60000 on Node1 'not running' (7): call=317, status=complete, last-rc-change='Thu Jan 23 10:22:30 2014', queued=0ms, exec=0ms
>
>
> What's wrong with my configuration?
>
>
> Thanks in advance
> Frank
>