[Pacemaker] Problem with pingd.

Tue Feb 23 01:05:54 EST 2010

Hi Andrew,
I am adding the log messages that I get when I commit the crm configuration,
along with the crm_verify -LV output, for your consideration. My crm
configuration is attached. It shows that the resources cannot run anywhere.
What should I do?

crm_verify -LV snippet
-------------------------------
root@node1:~# crm_verify -LV
crm_verify[10393]: 2010/02/23_11:27:44 WARN: native_color: Resource vir-ip
cannot run anywhere
crm_verify[10393]: 2010/02/23_11:27:44 WARN: native_color: Resource
slony-fail cannot run anywhere
crm_verify[10393]: 2010/02/23_11:27:44 WARN: native_color: Resource
slony-fail2 cannot run anywhere
Warnings found during check: config may not be valid
root@node1:~# crm_verify -LV
crm_verify[10760]: 2010/02/23_11:32:50 WARN: native_color: Resource vir-ip
cannot run anywhere
crm_verify[10760]: 2010/02/23_11:32:50 WARN: native_color: Resource
slony-fail cannot run anywhere
crm_verify[10760]: 2010/02/23_11:32:50 WARN: native_color: Resource
slony-fail2 cannot run anywhere
Warnings found during check: config may not be valid
--------------------------------------------------------------

Log snippet
-------------------------------------------------
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: - <cib
admin_epoch="0" epoch="285" num_updates="33" >
Feb 23 11:25:48 node1 crmd: [1629]: info: abort_transition_graph:
need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: -
<configuration >
Feb 23 11:25:48 node1 crmd: [1629]: info: need_abort: Aborting on change to
admin_epoch
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: -
<constraints >
Feb 23 11:25:48 node1 crmd: [1629]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: -
<rsc_location id="vir-ip-with-pingd" >
Feb 23 11:25:48 node1 crmd: [1629]: info: do_state_transition: All 2 cluster
nodes are eligible to run resources.
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff:
-         <rule score="-1000" id="vir-ip-with-pingd-rule" />
Feb 23 11:25:48 node1 crmd: [1629]: info: do_pe_invoke: Query 187:
Requesting the current CIB: S_POLICY_ENGINE
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: -
</rsc_location>
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: -
</constraints>
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: -
</configuration>
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: -
</cib>
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: + <cib
admin_epoch="0" epoch="286" num_updates="1" >
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: +
<configuration >
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: +
<constraints >
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: +
<rsc_location id="vir-ip-with-pingd" >
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff:
+         <rule score="-INFINITY" id="vir-ip-with-pingd-rule" />
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: +
</rsc_location>
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: +
</constraints>
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: +
</configuration>
Feb 23 11:25:48 node1 cib: [1625]: info: log_data_element: cib:diff: +
</cib>
Feb 23 11:25:48 node1 cib: [1625]: info: cib_process_request: Operation
complete: op cib_replace for section constraints (origin=local/cibadmin/2,
version=0.286.1): ok (rc=0)
Feb 23 11:25:48 node1 crmd: [1629]: info: do_pe_invoke_callback: Invoking
the PE: ref=pe_calc-dc-1266904548-176, seq=12, quorate=1
Feb 23 11:25:48 node1 pengine: [6277]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Feb 23 11:25:48 node1 pengine: [6277]: info: unpack_config: Node scores:
'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Feb 23 11:25:48 node1 pengine: [6277]: info: determine_online_status: Node
node2 is online
Feb 23 11:25:48 node1 pengine: [6277]: info: determine_online_status: Node
node1 is online
Feb 23 11:25:48 node1 pengine: [6277]: info: unpack_rsc_op:
slony-fail2_monitor_0 on node1 returned 0 (ok) instead of the expected
value: 7 (not running)
Feb 23 11:25:48 node1 pengine: [6277]: notice: unpack_rsc_op: Operation
slony-fail2_monitor_0 found resource slony-fail2 active on node1
Feb 23 11:25:48 node1 pengine: [6277]: info: unpack_rsc_op:
pgsql:1_monitor_0 on node1 returned 0 (ok) instead of the expected value: 7
(not running)
Feb 23 11:25:48 node1 pengine: [6277]: notice: unpack_rsc_op: Operation
pgsql:1_monitor_0 found resource pgsql:1 active on node1
Feb 23 11:25:48 node1 pengine: [6277]: notice: native_print: vir-ip
(ocf::heartbeat:IPaddr2):    Started node1
Feb 23 11:25:48 node1 pengine: [6277]: notice: native_print: slony-fail
(lsb:slony_failover):    Started node1
Feb 23 11:25:48 node1 pengine: [6277]: notice: clone_print: Clone Set:
pgclone
Feb 23 11:25:48 node1 pengine: [6277]: notice: print_list:     Started: [
node2 node1 ]
Feb 23 11:25:48 node1 pengine: [6277]: notice: native_print: slony-fail2
(lsb:slony_failover2):    Started node1
Feb 23 11:25:48 node1 pengine: [6277]: notice: clone_print: Clone Set:
pingclone
Feb 23 11:25:48 node1 pengine: [6277]: notice: print_list:     Started: [
node2 node1 ]
Feb 23 11:25:48 node1 pengine: [6277]: info: native_merge_weights: vir-ip:
Rolling back scores from slony-fail
Feb 23 11:25:48 node1 pengine: [6277]: info: native_merge_weights: vir-ip:
Rolling back scores from slony-fail2
Feb 23 11:25:48 node1 pengine: [6277]: WARN: native_color: Resource vir-ip
cannot run anywhere
Feb 23 11:25:48 node1 pengine: [6277]: WARN: native_color: Resource
slony-fail cannot run anywhere
Feb 23 11:25:48 node1 pengine: [6277]: WARN: native_color: Resource
slony-fail2 cannot run anywhere
Feb 23 11:25:48 node1 pengine: [6277]: notice: LogActions: Stop resource
vir-ip(node1)
Feb 23 11:25:48 node1 pengine: [6277]: notice: LogActions: Stop resource
slony-fail    (node1)
Feb 23 11:25:48 node1 pengine: [6277]: notice: LogActions: Leave resource
pgsql:0    (Started node2)
Feb 23 11:25:48 node1 pengine: [6277]: notice: LogActions: Leave resource
pgsql:1    (Started node1)
Feb 23 11:25:48 node1 pengine: [6277]: notice: LogActions: Stop resource
slony-fail2    (node1)
Feb 23 11:25:48 node1 pengine: [6277]: notice: LogActions: Leave resource
pingd:0    (Started node2)
Feb 23 11:25:48 node1 pengine: [6277]: notice: LogActions: Leave resource
pingd:1    (Started node1)
Feb 23 11:25:48 node1 lrmd: [1626]: info: rsc:slony-fail:41: stop
Feb 23 11:25:48 node1 cib: [10242]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-8.raw
Feb 23 11:25:48 node1 crmd: [1629]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Feb 23 11:25:48 node1 lrmd: [1626]: info: rsc:slony-fail2:42: stop
Feb 23 11:25:48 node1 crmd: [1629]: info: unpack_graph: Unpacked transition
22: 4 actions in 4 synapses
Feb 23 11:25:48 node1 crmd: [1629]: info: do_te_invoke: Processing graph 22
(ref=pe_calc-dc-1266904548-176) derived from
/var/lib/pengine/pe-warn-101.bz2
Feb 23 11:25:48 node1 crmd: [1629]: info: te_rsc_command: Initiating action
11: stop slony-fail_stop_0 on node1 (local)
Feb 23 11:25:48 node1 crmd: [1629]: info: do_lrm_rsc_op: Performing
key=11:22:0:fd31c6bc-df43-4481-8b69-2c54c50075fb op=slony-fail_stop_0 )
Feb 23 11:25:48 node1 crmd: [1629]: info: te_rsc_command: Initiating action
28: stop slony-fail2_stop_0 on node1 (local)
Feb 23 11:25:48 node1 crmd: [1629]: info: do_lrm_rsc_op: Performing
key=28:22:0:fd31c6bc-df43-4481-8b69-2c54c50075fb op=slony-fail2_stop_0 )
Feb 23 11:25:48 node1 lrmd: [10244]: WARN: For LSB init script, no
additional parameters are needed.
Feb 23 11:25:48 node1 lrmd: [10243]: WARN: For LSB init script, no
additional parameters are needed.
Feb 23 11:25:48 node1 crmd: [1629]: info: process_lrm_event: LRM operation
slony-fail_stop_0 (call=41, rc=0, cib-update=188, confirmed=true) complete
ok
Feb 23 11:25:48 node1 cib: [10242]: info: write_cib_contents: Wrote version
0.286.0 of the CIB to disk (digest: aaddbe7aeaf08365be5bbbdb4931295e)
Feb 23 11:25:48 node1 crmd: [1629]: info: match_graph_event: Action
slony-fail_stop_0 (11) confirmed on node1 (rc=0)
Feb 23 11:25:48 node1 pengine: [6277]: WARN: process_pe_message: Transition
22: WARNINGs found during PE processing. PEngine Input stored in:
/var/lib/pengine/pe-warn-101.bz2
Feb 23 11:25:48 node1 pengine: [6277]: info: process_pe_message:
Configuration WARNINGs found during PE processing.  Please run "crm_verify
-L" to identify issues.
Feb 23 11:25:48 node1 crmd: [1629]: info: process_lrm_event: LRM operation
slony-fail2_stop_0 (call=42, rc=0, cib-update=189, confirmed=true) complete
ok
Feb 23 11:25:48 node1 lrmd: [1626]: info: rsc:vir-ip:43: stop
Feb 23 11:25:48 node1 crmd: [1629]: info: match_graph_event: Action
slony-fail2_stop_0 (28) confirmed on node1 (rc=0)
Feb 23 11:25:48 node1 crmd: [1629]: info: te_rsc_command: Initiating action
10: stop vir-ip_stop_0 on node1 (local)
Feb 23 11:25:48 node1 crmd: [1629]: info: do_lrm_rsc_op: Performing
key=10:22:0:fd31c6bc-df43-4481-8b69-2c54c50075fb op=vir-ip_stop_0 )
Feb 23 11:25:48 node1 crmd: [1629]: info: process_lrm_event: LRM operation
vir-ip_monitor_15000 (call=31, rc=-2, cib-update=0, confirmed=true)
Cancelled unknown exec error
Feb 23 11:25:48 node1 cib: [10242]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.gwOpFZ (digest:
/var/lib/heartbeat/crm/cib.UtFyLu)
Feb 23 11:25:48 node1 IPaddr2[10249]: [10285]: INFO: ip -f inet addr delete
192.168.10.10/24 dev eth0
Feb 23 11:25:48 node1 IPaddr2[10249]: [10287]: INFO: ip -o -f inet addr show
eth0
Feb 23 11:25:48 node1 crmd: [1629]: info: process_lrm_event: LRM operation
vir-ip_stop_0 (call=43, rc=0, cib-update=190, confirmed=true) complete ok
Feb 23 11:25:48 node1 crmd: [1629]: info: match_graph_event: Action
vir-ip_stop_0 (10) confirmed on node1 (rc=0)
Feb 23 11:25:48 node1 crmd: [1629]: info: te_pseudo_action: Pseudo action 6
fired and confirmed
Feb 23 11:25:48 node1 crmd: [1629]: info: run_graph:
====================================================
Feb 23 11:25:48 node1 crmd: [1629]: notice: run_graph: Transition 22
(Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-warn-101.bz2): Complete
Feb 23 11:25:48 node1 crmd: [1629]: info: te_graph_trigger: Transition 22 is
now complete
Feb 23 11:25:48 node1 crmd: [1629]: info: notify_crmd: Transition 22 status:
done - <null>
Feb 23 11:25:48 node1 crmd: [1629]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 23 11:25:48 node1 crmd: [1629]: info: do_state_transition: Starting
PEngine Recheck Timer
Feb 23 11:27:11 node1 cib: [1625]: info: cib_stats: Processed 88 operations
(8295.00us average, 0% utilization) in the last 10min
------------------------------------------------------------

crm_mon snippet
------------------------------------------
============
Last updated: Tue Feb 23 11:27:56 2010
Stack: Heartbeat
Current DC: node1 (ac87f697-5b44-4720-a8af-12a6f2295930) - partition with
quorum
Version: 1.0.5-3840e6b5a305ccb803d29b468556739e75532d56
2 Nodes configured, unknown expected votes
5 Resources configured.
============

Online: [ node2 node1 ]

Clone Set: pgclone
    Started: [ node2 node1 ]
Clone Set: pingclone
    Started: [ node2 node1 ]
-------------------------------------------------------------
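
Regarding the advice quoted below (do the comparison as a number instead of
as a string): a minimal sketch of how the location constraint might be written
with an explicit numeric type. This assumes the crm shell in use accepts a type
prefix (string:, number:, version:) in front of the comparison operator; the
constraint and rule ids are the ones from the attached configuration. Whether
this alone is enough on 1.0.5 is not confirmed here.

```
location vir-ip-with-pingd vir-ip \
        rule $id="vir-ip-with-pingd-rule" -inf: not_defined pingd or pingd number:lte 0
```

In the CIB XML this should correspond to a type="number" attribute on the
pingd expression inside the rule.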

On Tue, Feb 23, 2010 at 9:38 AM, Jayakrishnan <jayakrishnanlll at gmail.com> wrote:

> Sir,
> I hesitate to ask, but how can I tell Pacemaker to compare as a number
> instead of as a string?
> I changed -inf: to -10000 in the pingd location constraint, but the same
> problem persists.
> I also changed the global resource stickiness to 10000, but it is still not
> working.
>
> With thanks,
> Jayakrishnan.L
>
>
> On Tue, Feb 23, 2010 at 1:04 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>> On Mon, Feb 22, 2010 at 6:46 PM, Jayakrishnan <jayakrishnanlll at gmail.com>
>> wrote:
>> > Sir,
>> > I have set up a 2-node cluster with Heartbeat 2.99 and Pacemaker 1.0.5.
>> > I am using Ubuntu 9.10. Both packages are installed from the Ubuntu karmic
>> > repository.
>> > My packages are:
>> >
>> > heartbeat                   2.99.2+sles11r9-5ubuntu1
>> > heartbeat-common                     2.99.2+sles11r9-5ubuntu1
>> > heartbeat-common-dev                 2.99.2+sles11r9-5ubuntu1
>> > heartbeat-dev                        2.99.2+sles11r9-5ubuntu1
>> > libheartbeat2                        2.99.2+sles11r9-5ubuntu1
>> > libheartbeat2-dev                    2.99.2+sles11r9-5ubuntu1
>> > pacemaker-heartbeat                  1.0.5+hg20090813-0ubuntu4
>> > pacemaker-heartbeat-dev              1.0.5+hg20090813-0ubuntu4
>> >
>> > My ha.cf file and crm configuration are both attached to this mail.
>> >
>> > I am building a PostgreSQL database cluster with Slony replication. eth1
>> > is my heartbeat link; a crossover cable connects the servers on eth1.
>> > eth0 is my external network, where the cluster IP gets assigned.
>> > server1 --> hostname node1
>> > node1  192.168.10.129 eth1
>> > 192.168.1.1 --> eth0
>> >
>> >
>> > server2 --> hostname node2
>> > node2  192.168.10.130 eth1
>> > 192.168.1.2 --> eth0
>> >
>> > Now when I pull out my eth1 cable, I need a failover to the other
>> > node. For that I have configured pingd as follows, but it is not working.
>> > My resources do not start at all when I give the rule as
>> > rule -inf: not_defined pingd or pingd lte 0
>>
>> You need to get 1.0.7 or tell pacemaker to do the comparison as a
>> number instead of as a string.
>>
>> >
>> > I tried changing -inf: to inf: and then the resources started, but
>> > resource failover does not take place when I pull out the eth1 cable.
>> >
>> > Please check my configuration and kindly point out what I am missing.
>> > Please note that I am using a default resource stickiness of INFINITY,
>> > which is compulsory for Slony replication.
>> >
>> > My ha.cf file
>> > ------------------------------------------------------------------
>> >
>> > autojoin none
>> > keepalive 2
>> > deadtime 15
>> > warntime 10
>> > initdead 64
>> > initdead 64
>> > bcast eth1
>> > auto_failback off
>> > node node1
>> > node node2
>> > crm respawn
>> > use_logd yes
>> > ____________________________________________
>> >
>> > My crm configuration
>> >
>> > node $id="3952b93e-786c-47d4-8c2f-a882e3d3d105" node2 \
>> >         attributes standby="off"
>> > node $id="ac87f697-5b44-4720-a8af-12a6f2295930" node1 \
>> >         attributes standby="off"
>> > primitive pgsql lsb:postgresql-8.4 \
>> >         meta target-role="Started" resource-stickness="inherited" \
>> >         op monitor interval="15s" timeout="25s" on-fail="standby"
>> > primitive pingd ocf:pacemaker:pingd \
>> >         params name="pingd" hostlist="192.168.10.1 192.168.10.75" \
>> >         op monitor interval="15s" timeout="5s"
>> > primitive slony-fail lsb:slony_failover \
>> >         meta target-role="Started"
>> > primitive slony-fail2 lsb:slony_failover2 \
>> >         meta target-role="Started"
>> > primitive vir-ip ocf:heartbeat:IPaddr2 \
>> >         params ip="192.168.10.10" nic="eth0" cidr_netmask="24" broadcast="192.168.10.255" \
>> >         op monitor interval="15s" timeout="25s" on-fail="standby" \
>> >         meta target-role="Started"
>> > clone pgclone pgsql \
>> >         meta notify="true" globally-unique="false" interleave="true" target-role="Started"
>> > clone pingclone pingd \
>> >         meta globally-unique="false" clone-max="2" clone-node-max="1"
>> > location vir-ip-with-pingd vir-ip \
>> >         rule $id="vir-ip-with-pingd-rule" inf: not_defined pingd or pingd lte 0
>> > colocation ip-with-slony inf: slony-fail vir-ip
>> > colocation ip-with-slony2 inf: slony-fail2 vir-ip
>> > order ip-b4-slony2 inf: vir-ip slony-fail2
>> > order slony-b4-ip inf: vir-ip slony-fail
>> > property $id="cib-bootstrap-options" \
>> >         dc-version="1.0.5-3840e6b5a305ccb803d29b468556739e75532d56" \
>> >         cluster-infrastructure="Heartbeat" \
>> >         no-quorum-policy="ignore" \
>> >         stonith-enabled="false" \
>> >         last-lrm-refresh="1266851027"
>> > rsc_defaults $id="rsc-options" \
>> >         resource-stickiness="INFINITY"
>> >
>> > _____________________________________
>> >
>> > My crm status:
>> > __________________________
>> >
>> > crm(live)# status
>> >
>> >
>> > ============
>> > Last updated: Mon Feb 22 23:15:56 2010
>> > Stack: Heartbeat
>> > Current DC: node2 (3952b93e-786c-47d4-8c2f-a882e3d3d105) - partition with quorum
>> > Version: 1.0.5-3840e6b5a305ccb803d29b468556739e75532d56
>> > 2 Nodes configured, unknown expected votes
>> > 5 Resources configured.
>> > ============
>> >
>> > Online: [ node2 node1 ]
>> >
>> > Clone Set: pgclone
>> >     Started: [ node1 node2 ]
>> > Clone Set: pingclone
>> >     Started: [ node2 node1 ]
>> >
>> > ============================
>> >
>> > Please help me out.
>> > --
>>
>
>

-- 
Regards,

Jayakrishnan. L

Visit: www.jayakrishnan.bravehost.com
-------------- next part --------------
node $id="3952b93e-786c-47d4-8c2f-a882e3d3d105" node2 \
        attributes standby="off"
node $id="ac87f697-5b44-4720-a8af-12a6f2295930" node1 \
        attributes standby="off"
primitive pgsql lsb:postgresql-8.4 \
        meta target-role="Started" resource-stickness="inherited" \
        op monitor interval="15s" timeout="25s" on-fail="standby"
primitive pingd ocf:pacemaker:pingd \
        params name="pingd" multiplier="100" hostlist="192.168.10.1 192.168.10.69" \
        op monitor interval="15s" timeout="5s"
primitive slony-fail lsb:slony_failover \
        meta target-role="Started"
primitive slony-fail2 lsb:slony_failover2 \
        meta target-role="Started"
primitive vir-ip ocf:heartbeat:IPaddr2 \
        params ip="192.168.10.10" nic="eth0" cidr_netmask="24" broadcast="192.168.10.255" \
        op monitor interval="15s" timeout="25s" on-fail="standby" \
        meta target-role="Started"
clone pgclone pgsql \
        meta notify="true" globally-unique="false" interleave="true" target-role="Started"
clone pingclone pingd \
        meta globally-unique="false" clone-max="2" clone-node-max="1"
location vir-ip-with-pingd vir-ip \
        rule $id="vir-ip-with-pingd-rule" -inf: not_defined pingd or pingd lte 0
colocation ip-with-slony inf: slony-fail vir-ip
colocation ip-with-slony2 inf: slony-fail2 vir-ip
order ip-b4-slony2 inf: vir-ip slony-fail2
order slony-b4-ip inf: vir-ip slony-fail
property $id="cib-bootstrap-options" \
        dc-version="1.0.5-3840e6b5a305ccb803d29b468556739e75532d56" \
        cluster-infrastructure="Heartbeat" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        last-lrm-refresh="1266851027"
rsc_defaults $id="rsc-options" \
        resource-stickiness="INFINITY"

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ha.cf
Type: application/octet-stream
Size: 10563 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100223/dcddc8b8/attachment-0001.obj>