[ClusterLabs] All IP resources deleted once a fenced node rejoins
Ken Gaillot
kgaillot at redhat.com
Fri Jan 15 18:08:31 CET 2016
On 01/15/2016 05:02 AM, Arjun Pandey wrote:
> Based on corosync logs from orana (the node that did the actual
> fencing and is the current master node)
>
> I also tried looking at pengine outputs based on crm_simulate. Up until
> the fenced node rejoins, things look good.
>
> [root@ucc1 orana]# crm_simulate -S --xml-file
> ./pengine/pe-input-1450.bz2 -u kamet
> Current cluster status:
> Node kamet: pending
> Online: [ orana ]
Above, "pending" means that the node has started to join the cluster,
but has not yet fully joined.
> Jan 13 19:32:53 [4295] orana pengine: info: probe_resources:
> Action probe_complete-kamet on kamet is unrunnable (pending)
Any action on kamet is unrunnable until it finishes joining the cluster.
> Jan 13 19:32:59 [4292] orana stonith-ng: info:
> crm_update_peer_proc: pcmk_cpg_membership: Node kamet[2] -
> corosync-cpg is now online
The pacemaker daemons on orana each report when they see kamet come up
at the corosync level. Here, stonith-ng sees it.
> Jan 13 19:32:59 [4291] orana cib: info:
> crm_update_peer_proc: pcmk_cpg_membership: Node kamet[2] -
> corosync-cpg is now online
Now, the cib sees it.
> Jan 13 19:33:00 [4296] orana crmd: info:
> crm_update_peer_proc: pcmk_cpg_membership: Node kamet[2] -
> corosync-cpg is now online
Now, crmd sees it.
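If you want to confirm from a longer log which daemons have seen a node
come up at the corosync level, something like the following would do it.
This is only a sketch: the function name and regular expression are my
own, and it assumes each log entry is on a single line (not wrapped as in
this mail).

```python
import re

# Matches an unwrapped log line such as:
#   Jan 13 19:32:59 [4292] orana stonith-ng: info: crm_update_peer_proc:
#   pcmk_cpg_membership: Node kamet[2] - corosync-cpg is now online
ONLINE_RE = re.compile(
    r"\[\d+\]\s+\S+\s+(?P<daemon>[\w-]+):\s+info:\s+crm_update_peer_proc:.*"
    r"Node (?P<node>\S+)\[\d+\] - corosync-cpg is now online"
)


def daemons_seeing_node(lines, node):
    """Return the daemons, in log order, that reported `node` online."""
    seen = []
    for line in lines:
        m = ONLINE_RE.search(line)
        if m and m.group("node") == node and m.group("daemon") not in seen:
            seen.append(m.group("daemon"))
    return seen
```

Fed the three messages quoted above, it should list stonith-ng, cib and
crmd, in that order.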
>>>> [Arjun] Why does pengine declare that the following monitor actions are now unrunnable?
>
> Jan 13 19:33:00 [4295] orana pengine: warning: custom_action:
> Action foo:0_monitor_0 on kamet is unrunnable (pending)
At this point, pengine still hasn't seen kamet join yet, so actions on
it are still unrunnable.
> Jan 13 19:33:00 [4296] orana crmd: info: join_make_offer:
> join-2: Sending offer to kamet
Having seen kamet at the corosync level, crmd now offers cluster-level
membership to kamet.
> Jan 13 19:33:00 [4291] orana cib: info:
> cib_process_replace: Replacement 0.4.0 from kamet not applied to
> 0.74.1: current epoch is greater than the replacement
> Jan 13 19:33:00 [4291] orana cib: warning:
> cib_process_request: Completed cib_replace operation for section
> 'all': Update was older than existing configuration (rc=-205,
> origin=kamet/cibadmin/2, version=0.74.1)
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op:
> Diff: --- 0.74.1 2
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op:
> Diff: +++ 0.75.0 (null)
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/nodes/node[@id='kamet']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/nodes/node[@id='orana']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='fence-uc-orana']/meta_attributes[@id='fence-uc-orana-meta_attributes']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='fence-uc-kamet']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='C-3']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='C-FLT']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='C-FLT2']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='E-3']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='MGMT-FLT']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='M-FLT']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='M-FLT2']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='S-FLT']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/resources/primitive[@id='S-FLT2']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-3-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-C-3-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-FLT-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-C-FLT-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-C-FLT2-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-C-FLT2-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-E-3-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-E-3-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-MGMT-FLT-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-MGMT-FLT-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-M-FLT-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-M-FLT-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-M-FLT2-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-M-FLT2-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-S-FLT-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-S-FLT-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-S-FLT2-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-S-FLT2-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-fence-uc-orana-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_colocation[@id='colocation-fence-uc-kamet-foo-master-INFINITY']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-fence-uc-kamet-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: --
> /cib/configuration/constraints/rsc_order[@id='order-fence-uc-orana-foo-master-mandatory']
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: +
> /cib: @epoch=75, @num_updates=0
> Jan 13 19:33:00 [4291] orana cib: info: cib_perform_op: +
> /cib/configuration/resources/primitive[@id='fence-uc-orana']/instance_attributes[@id='fence-uc-orana-instance_attributes']/nvpair[@id='fence-uc-orana-instance_attributes-delay']:
> @value=0
> Jan 13 19:33:00 [4291] orana cib: info:
> cib_process_request: Completed cib_replace operation for section
> configuration: OK (rc=0, origin=kamet/cibadmin/2, version=0.75.0)
The above is the problem. You can see all the resources being deleted
from the CIB ("--" indicates lines being removed from the CIB, and "+"
indicates lines being added or changed). For some reason, the cluster
used a much older CIB from kamet to replace the current one used by the
cluster (the first replace attempt from kamet was version 0.4.0, against
0.74.1 on orana).
I'm not sure why this happened; it may be a bug.
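For reference, the three numbers in those version strings are
admin_epoch.epoch.num_updates, and they are compared left to right, so
0.4.0 is far older than 0.74.1. A minimal sketch of that ordering (the
function names are made up for illustration, and this is not pacemaker's
actual code):

```python
def parse_cib_version(text):
    """Split 'admin_epoch.epoch.num_updates' into a tuple of ints."""
    admin_epoch, epoch, num_updates = (int(part) for part in text.split("."))
    return (admin_epoch, epoch, num_updates)


def is_newer(candidate, current):
    """True if `candidate` sorts after `current`, comparing left to right."""
    return parse_cib_version(candidate) > parse_cib_version(current)


print(is_newer("0.4.0", "0.74.1"))   # False: the copy from kamet is older
print(is_newer("0.75.0", "0.74.1"))  # True
```

That ordering accounts for the first cib_replace (section 'all') being
refused with rc=-205. It does not by itself explain why the later replace
of the configuration section completed.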
What version of pacemaker are you using?
Check the permissions on /var/lib/pacemaker/cib and the files in it on
both nodes. I'd expect everything to be owned and writeable by the
hacluster user.
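A rough way to run that check on each node, assuming Python 3 is
available there. The helper names are my own and nothing here ships with
pacemaker; it only looks at the owner and the owner-write bit.

```python
import os
import pwd
import stat


def uid_of(user):
    """Look up the numeric uid for a user name (e.g. 'hacluster')."""
    return pwd.getpwnam(user).pw_uid


def check_cib_dir(path, expected_uid):
    """Return (entry, problem) pairs for the directory and the files in it."""
    problems = []
    entries = [path] + [os.path.join(path, name) for name in sorted(os.listdir(path))]
    for entry in entries:
        st = os.stat(entry)
        if st.st_uid != expected_uid:
            problems.append((entry, "owned by uid %d, expected %d" % (st.st_uid, expected_uid)))
        elif not st.st_mode & stat.S_IWUSR:
            problems.append((entry, "not writable by its owner"))
    return problems


# Example use on a cluster node (run as root):
#   for entry, problem in check_cib_dir("/var/lib/pacemaker/cib", uid_of("hacluster")):
#       print(entry, problem)
```

An empty result means everything in the directory is owned and writable
by that user.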
>>>>>> [Arjun] What do the following logs signify?
> Jan 13 19:33:00 [4292] orana stonith-ng: info:
> stonith_device_remove: Device 'C-3' not found (2 active devices)
These are not important in themselves, but are follow-up effects from
the resources being removed from the CIB above. Whenever the CIB
changes, stonith-ng will re-check what fencing devices are available.
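To see exactly which elements a replace removed (and so which "not found"
messages to expect from stonith-ng), the "--" lines can be pulled out of
the cib log. Again just a sketch under the assumption that each log entry
is on one line; the names are hypothetical.

```python
import re

# Matches an unwrapped deletion line such as:
#   ... cib: info: cib_perform_op: -- /cib/configuration/resources/primitive[@id='C-3']
REMOVED_RE = re.compile(r"cib_perform_op:\s+--\s+(?P<xpath>/cib/\S+)")
# Picks the tag and id of the last element in the XPath.
ID_RE = re.compile(r"/(?P<tag>\w+)\[@id='(?P<id>[^']+)'\]$")


def removed_elements(lines):
    """Return (tag, id) pairs for every element the diff reports as removed."""
    removed = []
    for line in lines:
        m = REMOVED_RE.search(line)
        if not m:
            continue
        last = ID_RE.search(m.group("xpath"))
        if last:
            removed.append((last.group("tag"), last.group("id")))
    return removed
```

Filtering the result for the tag "primitive" gives the list of resources
that disappeared from the configuration.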