[Pacemaker] Cannot start VirtualDomain resource after restart
emmanuel segura
emi2fast at gmail.com
Wed Jun 20 16:11:19 CEST 2012
I don't know, but the failure is in the lx0_monitor_0 operation, so I'd ask
someone with more experience than me: does Pacemaker run a monitor
operation before a start?
Maybe when you restarted the resource something went wrong, the resource
failed, and after that it stayed blocked because of
================
on-fail="block"
================
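If that is what happened, the failure should have left a fail count behind.
A minimal check-and-clear sequence (a sketch, assuming the crm shell and
crm_mon that ship with pacemaker 1.1.6) could look like this:

# crm_mon -1 -f
# crm resource cleanup lx0

crm_mon with -f prints per-node fail counts and any failed actions, and
"resource cleanup" clears the recorded operation history and failcount for
lx0 so the policy engine can schedule it again.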
2012/6/20 Kadlecsik József <kadlecsik.jozsef at wigner.mta.hu>
> On Wed, 20 Jun 2012, emmanuel segura wrote:
>
> > Why do you say there is no error in the message?
> > =========================================================
> > Jun 20 11:57:25 atlas4 lrmd: [17568]: info: operation monitor[35] on lx0 for client 17571: pid 30179 exited with return code 7
> > Jun 20 11:57:25 atlas4 crmd: [17571]: debug: create_operation_update: do_update_resource: Updating resouce lx0 after complete monitor op (interval=0)
> > Jun 20 11:57:25 atlas4 crmd: [17571]: info: process_lrm_event: LRM operation lx0_monitor_0 (call=35, rc=7, cib-update=61, confirmed=true) not running
>
> I interpreted those lines as a check that the resource hasn't been
> started yet (confirmed=true). And indeed, it's not running, so the return
> code is OCF_NOT_RUNNING.
>
> There's no log message about an attempt to start the resource.
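That reading matches how the probe works: lx0_monitor_0 is the one-off
probe (the interval=0 monitor) Pacemaker runs to learn whether the resource
is already active anywhere, and rc=7 (OCF_NOT_RUNNING) is the normal answer
for a stopped domain, not an error. To see why the policy engine never
follows up with a start, it might help to look at the allocation scores on
the live CIB (a sketch, assuming the crm_simulate tool shipped with
pacemaker 1.1.x; -L reads the live cluster, -s shows the scores):

# crm_simulate -L -s

If lx0 ends up with -INFINITY on every node, some constraint or recorded
failure is pinning it down; if it has a positive score on a node but still
never starts, the aborted transition is worth tracing in the DC's logs.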
>
> Best regards,
> Jozsef
>
> > 2012/6/20 Kadlecsik József <kadlecsik.jozsef at wigner.mta.hu>
> > Hello,
> >
> > Somehow, after a "crm resource restart" which did *not* start the
> > resource but only stopped it, a VirtualDomain resource cannot be
> > started anymore. The most baffling part is that I do not see an error
> > message. The resource in question, named 'lx0', can be started directly
> > via virsh/libvirt, and libvirtd is running on all cluster nodes.
> >
> > We run corosync 1.4.2-1~bpo60+1 and pacemaker 1.1.6-2~bpo60+1 (Debian).
> >
> > # crm status
> > ============
> > Last updated: Wed Jun 20 15:14:44 2012
> > Last change: Wed Jun 20 14:07:40 2012 via cibadmin on atlas0
> > Stack: openais
> > Current DC: atlas0 - partition with quorum
> > Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> > 7 Nodes configured, 7 expected votes
> > 18 Resources configured.
> > ============
> >
> > Online: [ atlas0 atlas1 atlas2 atlas3 atlas4 atlas5 atlas6 ]
> >
> > kerberos (ocf::heartbeat:VirtualDomain): Started atlas0
> > stonith-atlas3 (stonith:ipmilan): Started atlas4
> > stonith-atlas1 (stonith:ipmilan): Started atlas4
> > stonith-atlas2 (stonith:ipmilan): Started atlas4
> > stonith-atlas0 (stonith:ipmilan): Started atlas4
> > stonith-atlas4 (stonith:ipmilan): Started atlas3
> > mailman (ocf::heartbeat:VirtualDomain): Started atlas6
> > indico (ocf::heartbeat:VirtualDomain): Started atlas0
> > papi (ocf::heartbeat:VirtualDomain): Started atlas1
> > wwwd (ocf::heartbeat:VirtualDomain): Started atlas2
> > webauth (ocf::heartbeat:VirtualDomain): Started atlas3
> > caladan (ocf::heartbeat:VirtualDomain): Started atlas4
> > radius (ocf::heartbeat:VirtualDomain): Started atlas5
> > mail0 (ocf::heartbeat:VirtualDomain): Started atlas6
> > stonith-atlas5 (stonith:apcmastersnmp): Started atlas4
> > stonith-atlas6 (stonith:apcmastersnmp): Started atlas4
> > w0 (ocf::heartbeat:VirtualDomain): Started atlas2
> >
> > # crm resource show
> > kerberos (ocf::heartbeat:VirtualDomain) Started
> > stonith-atlas3 (stonith:ipmilan) Started
> > stonith-atlas1 (stonith:ipmilan) Started
> > stonith-atlas2 (stonith:ipmilan) Started
> > stonith-atlas0 (stonith:ipmilan) Started
> > stonith-atlas4 (stonith:ipmilan) Started
> > mailman (ocf::heartbeat:VirtualDomain) Started
> > indico (ocf::heartbeat:VirtualDomain) Started
> > papi (ocf::heartbeat:VirtualDomain) Started
> > wwwd (ocf::heartbeat:VirtualDomain) Started
> > webauth (ocf::heartbeat:VirtualDomain) Started
> > caladan (ocf::heartbeat:VirtualDomain) Started
> > radius (ocf::heartbeat:VirtualDomain) Started
> > mail0 (ocf::heartbeat:VirtualDomain) Started
> > stonith-atlas5 (stonith:apcmastersnmp) Started
> > stonith-atlas6 (stonith:apcmastersnmp) Started
> > w0 (ocf::heartbeat:VirtualDomain) Started
> > lx0 (ocf::heartbeat:VirtualDomain) Stopped
> >
> > # crm configure show
> > node atlas0 \
> > attributes standby="false" \
> > utilization memory="24576"
> > node atlas1 \
> > attributes standby="false" \
> > utilization memory="24576"
> > node atlas2 \
> > attributes standby="false" \
> > utilization memory="24576"
> > node atlas3 \
> > attributes standby="false" \
> > utilization memory="24576"
> > node atlas4 \
> > attributes standby="false" \
> > utilization memory="24576"
> > node atlas5 \
> > attributes standby="off" \
> > utilization memory="20480"
> > node atlas6 \
> > attributes standby="off" \
> > utilization memory="20480"
> > primitive caladan ocf:heartbeat:VirtualDomain \
> > params config="/etc/libvirt/crm/caladan.xml" hypervisor="qemu:///system" \
> > meta allow-migrate="true" target-role="Started" is-managed="true" \
> > op start interval="0" timeout="120s" \
> > op stop interval="0" timeout="120s" \
> > op monitor interval="10s" timeout="40s" depth="0" \
> > op migrate_to interval="0" timeout="240s" on-fail="block" \
> > op migrate_from interval="0" timeout="240s" on-fail="block" \
> > utilization memory="4608"
> > primitive indico ocf:heartbeat:VirtualDomain \
> > params config="/etc/libvirt/crm/indico.xml" hypervisor="qemu:///system" \
> > meta allow-migrate="true" target-role="Started" is-managed="true" \
> > op start interval="0" timeout="120s" \
> > op stop interval="0" timeout="120s" \
> > op monitor interval="10s" timeout="40s" depth="0" \
> > op migrate_to interval="0" timeout="240s" on-fail="block" \
> > op migrate_from interval="0" timeout="240s" on-fail="block" \
> > utilization memory="5120"
> > primitive kerberos ocf:heartbeat:VirtualDomain \
> > params config="/etc/libvirt/qemu/kerberos.xml" hypervisor="qemu:///system" \
> > meta allow-migrate="true" target-role="Started" is-managed="true" \
> > op start interval="0" timeout="120s" \
> > op stop interval="0" timeout="120s" \
> > op monitor interval="10s" timeout="40s" depth="0" \
> > op migrate_to interval="0" timeout="240s" on-fail="block" \
> > op migrate_from interval="0" timeout="240s" on-fail="block" \
> > utilization memory="4608"
> > primitive lx0 ocf:heartbeat:VirtualDomain \
> > params config="/etc/libvirt/crm/lx0.xml" hypervisor="qemu:///system" \
> > meta allow-migrate="true" target-role="Started" is-managed="true" \
> > op start interval="0" timeout="120s" \
> > op stop interval="0" timeout="120s" \
> > op monitor interval="10s" timeout="40s" depth="0" \
> > op migrate_to interval="0" timeout="240s" on-fail="block" \
> > op migrate_from interval="0" timeout="240s" on-fail="block" \
> > utilization memory="4608"
> > primitive mail0 ocf:heartbeat:VirtualDomain \
> > params config="/etc/libvirt/crm/mail0.xml" hypervisor="qemu:///system" \
> > meta allow-migrate="true" target-role="Started" is-managed="true" \
> > op start interval="0" timeout="120s" \
> > op stop interval="0" timeout="120s" \
> > op monitor interval="10s" timeout="40s" depth="0" \
> > op migrate_to interval="0" timeout="240s" on-fail="block" \
> > op migrate_from interval="0" timeout="240s" on-fail="block" \
> > utilization memory="4608"
> > primitive mailman ocf:heartbeat:VirtualDomain \
> > params config="/etc/libvirt/crm/mailman.xml" hypervisor="qemu:///system" \
> > meta allow-migrate="true" target-role="Started" is-managed="true" \
> > op start interval="0" timeout="120s" \
> > op stop interval="0" timeout="120s" \
> > op monitor interval="10s" timeout="40s" depth="0" \
> > op migrate_to interval="0" timeout="240s" on-fail="block" \
> > op migrate_from interval="0" timeout="240s" on-fail="block" \
> > utilization memory="5120"
> > primitive papi ocf:heartbeat:VirtualDomain \
> > params config="/etc/libvirt/crm/papi.xml" hypervisor="qemu:///system" \
> > meta allow-migrate="true" target-role="Started" is-managed="true" \
> > op start interval="0" timeout="120s" \
> > op stop interval="0" timeout="120s" \
> > op monitor interval="10s" timeout="40s" depth="0" \
> > op migrate_to interval="0" timeout="240s" on-fail="block" \
> > op migrate_from interval="0" timeout="240s" on-fail="block" \
> > utilization memory="6144"
> > primitive radius ocf:heartbeat:VirtualDomain \
> > params config="/etc/libvirt/crm/radius.xml" hypervisor="qemu:///system" \
> > meta allow-migrate="true" target-role="Started" is-managed="true" \
> > op start interval="0" timeout="120s" \
> > op stop interval="0" timeout="120s" \
> > op monitor interval="10s" timeout="40s" depth="0" \
> > op migrate_to interval="0" timeout="240s" on-fail="block" \
> > op migrate_from interval="0" timeout="240s" on-fail="block" \
> > utilization memory="4608"
> > primitive stonith-atlas0 stonith:ipmilan \
> > params hostname="atlas0" ipaddr="192.168.40.20" port="623" auth="md5" priv="admin" login="root" password="XXXXX" \
> > op start interval="0" timeout="120s" \
> > meta target-role="Started"
> > primitive stonith-atlas1 stonith:ipmilan \
> > params hostname="atlas1" ipaddr="192.168.40.21" port="623" auth="md5" priv="admin" login="root" password="XXXX" \
> > op start interval="0" timeout="120s" \
> > meta target-role="Started"
> > primitive stonith-atlas2 stonith:ipmilan \
> > params hostname="atlas2" ipaddr="192.168.40.22" port="623" auth="md5" priv="admin" login="root" password="XXXX" \
> > op start interval="0" timeout="120s" \
> > meta target-role="Started"
> > primitive stonith-atlas3 stonith:ipmilan \
> > params hostname="atlas3" ipaddr="192.168.40.23" port="623" auth="md5" priv="admin" login="root" password="XXXX" \
> > op start interval="0" timeout="120s" \
> > meta target-role="Started"
> > primitive stonith-atlas4 stonith:ipmilan \
> > params hostname="atlas4" ipaddr="192.168.40.24" port="623" auth="md5" priv="admin" login="root" password="XXXX" \
> > op start interval="0" timeout="120s" \
> > meta target-role="Started"
> > primitive stonith-atlas5 stonith:apcmastersnmp \
> > params ipaddr="192.168.40.252" port="161" community="XXXX" pcmk_host_list="atlas5" pcmk_host_check="static-list"
> > primitive stonith-atlas6 stonith:apcmastersnmp \
> > params ipaddr="192.168.40.252" port="161" community="XXXX" pcmk_host_list="atlas6" pcmk_host_check="static-list"
> > primitive w0 ocf:heartbeat:VirtualDomain \
> > params config="/etc/libvirt/crm/w0.xml" hypervisor="qemu:///system" \
> > meta allow-migrate="true" target-role="Started" \
> > op start interval="0" timeout="120s" \
> > op stop interval="0" timeout="120s" \
> > op monitor interval="10s" timeout="40s" depth="0" \
> > op migrate_to interval="0" timeout="240s" on-fail="block" \
> > op migrate_from interval="0" timeout="240s" on-fail="block" \
> > utilization memory="4608"
> > primitive webauth ocf:heartbeat:VirtualDomain \
> > params config="/etc/libvirt/crm/webauth.xml" hypervisor="qemu:///system" \
> > meta allow-migrate="true" target-role="Started" is-managed="true" \
> > op start interval="0" timeout="120s" \
> > op stop interval="0" timeout="120s" \
> > op monitor interval="10s" timeout="40s" depth="0" \
> > op migrate_to interval="0" timeout="240s" on-fail="block" \
> > op migrate_from interval="0" timeout="240s" on-fail="block" \
> > utilization memory="4608"
> > primitive wwwd ocf:heartbeat:VirtualDomain \
> > params config="/etc/libvirt/crm/wwwd.xml" hypervisor="qemu:///system" \
> > meta allow-migrate="true" target-role="Started" is-managed="true" \
> > op start interval="0" timeout="120s" \
> > op stop interval="0" timeout="120s" \
> > op monitor interval="10s" timeout="40s" depth="0" \
> > op migrate_to interval="0" timeout="240s" on-fail="block" \
> > op migrate_from interval="0" timeout="240s" on-fail="block" \
> > utilization memory="5120"
> > location location-stonith-atlas0 stonith-atlas0 -inf: atlas0
> > location location-stonith-atlas1 stonith-atlas1 -inf: atlas1
> > location location-stonith-atlas2 stonith-atlas2 -inf: atlas2
> > location location-stonith-atlas3 stonith-atlas3 -inf: atlas3
> > location location-stonith-atlas4 stonith-atlas4 -inf: atlas4
> > location location-stonith-atlas5 stonith-atlas5 -inf: atlas5
> > location location-stonith-atlas6 stonith-atlas6 -inf: atlas6
> > property $id="cib-bootstrap-options" \
> > dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> > cluster-infrastructure="openais" \
> > expected-quorum-votes="7" \
> > stonith-enabled="true" \
> > no-quorum-policy="stop" \
> > last-lrm-refresh="1340193431" \
> > symmetric-cluster="true" \
> > maintenance-mode="false" \
> > stop-all-resources="false" \
> > is-managed-default="true" \
> > placement-strategy="balanced"
> >
> > # crm_verify -L -VV
> > [...]
> > crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave w0 (Started atlas2)
> > crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas6 (Started atlas4)
> > crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas5 (Started atlas4)
> > crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas4 (Started atlas3)
> > crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas3 (Started atlas4)
> > crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas2 (Started atlas4)
> > crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas1 (Started atlas4)
> > crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas0 (Started atlas4)
> > crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Start lx0 (atlas4)
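So the policy engine does intend to start lx0 on atlas4, yet the log below
only ever shows the probe. One way to rule out the agent/libvirt side would
be to drive the resource agent by hand on atlas4 with the same parameters
(a minimal sketch, assuming the stock resource-agents install path; OCF
agents read their parameters from OCF_RESKEY_* environment variables):

# export OCF_ROOT=/usr/lib/ocf
# export OCF_RESKEY_config=/etc/libvirt/crm/lx0.xml
# export OCF_RESKEY_hypervisor=qemu:///system
# /usr/lib/ocf/resource.d/heartbeat/VirtualDomain monitor; echo $?
# /usr/lib/ocf/resource.d/heartbeat/VirtualDomain start; echo $?

If the manual start succeeds (exit code 0), the agent and libvirt are fine
and the problem is on the Pacemaker side: a constraint, leftover state, or
a transition that is aborted before the start action runs.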
> >
> > I have tried to delete the resource and add it again, but that did not help.
> > The corresponding log entries:
> >
> > Jun 20 11:57:25 atlas4 crmd: [17571]: info: delete_resource: Removing resource lx0 for 28654_crm_resource (internal) on atlas0
> > Jun 20 11:57:25 atlas4 lrmd: [17568]: debug: lrmd_rsc_destroy: removing resource lx0
> > Jun 20 11:57:25 atlas4 crmd: [17571]: debug: delete_rsc_entry: sync: Sending delete op for lx0
> > Jun 20 11:57:25 atlas4 crmd: [17571]: info: notify_deleted: Notifying 28654_crm_resource on atlas0 that lx0 was deleted
> > Jun 20 11:57:25 atlas4 crmd: [17571]: WARN: decode_transition_key: Bad UUID (crm-resource-28654) in sscanf result (3) for 0:0:crm-resource-28654
> > Jun 20 11:57:25 atlas4 crmd: [17571]: debug: create_operation_update: send_direct_ack: Updating resouce lx0 after complete delete op (interval=60000)
> > Jun 20 11:57:25 atlas4 crmd: [17571]: info: send_direct_ack: ACK'ing resource op lx0_delete_60000 from 0:0:crm-resource-28654: lrm_invoke-lrmd-1340186245-16
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] mcasted message added to pending queue
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] mcasted message added to pending queue
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering 10d5 to 10d7
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering MCAST message with seq 10d6 to pending delivery queue
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering MCAST message with seq 10d7 to pending delivery queue
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Received ringid(192.168.40.60:22264) seq 10d6
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Received ringid(192.168.40.60:22264) seq 10d7
> > Jun 20 11:57:25 atlas4 crmd: [17571]: debug: notify_deleted: Triggering a refresh after 28654_crm_resource deleted lx0 from the LRM
> > Jun 20 11:57:25 atlas4 cib: [17567]: debug: cib_process_xpath: Processing cib_query op for //cib/configuration/crm_config//cluster_property_set//nvpair[@name='last-lrm-refresh'] (/cib/configuration/crm_config/cluster_property_set/nvpair[6])
> >
> > Jun 20 11:57:25 atlas4 lrmd: [17568]: debug: on_msg_add_rsc:client [17571] adds resource lx0
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering 149e to 149f
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering MCAST message with seq 149f to pending delivery queue
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Received ringid(192.168.40.60:22264) seq 14a0
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering 149f to 14a0
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering MCAST message with seq 14a0 to pending delivery queue
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] releasing messages up to and including 149e
> > Jun 20 11:57:25 atlas4 crmd: [17571]: info: do_lrm_rsc_op: Performing key=26:10266:7:e7426ec7-3bae-4a4b-a4ae-c3f80f17e058 op=lx0_monitor_0 )
> > Jun 20 11:57:25 atlas4 lrmd: [17568]: debug: on_msg_perform_op:2396: copying parameters for rsc lx0
> > Jun 20 11:57:25 atlas4 lrmd: [17568]: debug: on_msg_perform_op: add an operation operation monitor[35] on lx0 for client 17571, its parameters: crm_feature_set=[3.0.5] config=[/etc/libvirt/crm/lx0.xml] CRM_meta_timeout=[20000] hypervisor=[qemu:///system] to the operation list.
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] releasing messages up to and including 149f
> > Jun 20 11:57:25 atlas4 lrmd: [17568]: info: rsc:lx0 probe[35] (pid 30179)
> > Jun 20 11:57:25 atlas4 VirtualDomain[30179]: INFO: Domain name "lx0" saved to /var/run/resource-agents/VirtualDomain-lx0.state.
> > Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] releasing messages up to and including 14bc
> > Jun 20 11:57:25 atlas4 VirtualDomain[30179]: DEBUG: Virtual domain lx0 is currently shut off.
> > Jun 20 11:57:25 atlas4 lrmd: [17568]: WARN: Managed lx0:monitor process 30179 exited with return code 7.
> > Jun 20 11:57:25 atlas4 lrmd: [17568]: info: operation monitor[35] on lx0 for client 17571: pid 30179 exited with return code 7
> > Jun 20 11:57:25 atlas4 crmd: [17571]: debug: create_operation_update: do_update_resource: Updating resouce lx0 after complete monitor op (interval=0)
> > Jun 20 11:57:25 atlas4 crmd: [17571]: info: process_lrm_event: LRM operation lx0_monitor_0 (call=35, rc=7, cib-update=61, confirmed=true) not running
> > Jun 20 11:57:25 atlas4 crmd: [17571]: debug: update_history_cache: Appending monitor op to history for 'lx0'
> > Jun 20 11:57:25 atlas4 crmd: [17571]: debug: get_xpath_object: No match for //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
> >
> > What can be wrong in the setup/configuration? And what on earth happened?
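A couple of CIB-side checks might narrow it down (a sketch, assuming the
pacemaker 1.1.6 command-line tools): look at what the cluster currently has
stored for lx0, and whether the delete/re-add left any constraint or status
entries behind:

# crm_resource --resource lx0 --query-xml
# cibadmin -Q | grep -i lx0
# crm configure show | grep -i lx0

If the status section still carries an old failed operation for lx0, or a
stray constraint references it, that would explain why the probe runs but a
start never gets scheduled.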
> >
> > Best regards,
> > Jozsef
> > --
> > E-mail : kadlecsik.jozsef at wigner.mta.hu
> > PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
> > Address: Wigner Research Centre for Physics, Hungarian Academy
> > of Sciences
> > H-1525 Budapest 114, POB. 49, Hungary
> >
> >
> >
> >
> >
> > --
> > esta es mi vida e me la vivo hasta que dios quiera
> >
> >
>
> --
> E-mail : kadlecsik.jozsef at wigner.mta.hu
> PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
> H-1525 Budapest 114, POB. 49, Hungary
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>
--
esta es mi vida e me la vivo hasta que dios quiera