[ClusterLabs] Transition aborted when disabling resource

Ken Gaillot kgaillot at redhat.com
Sat Sep 30 18:02:27 EDT 2017


On Thu, 2017-09-28 at 10:49 +0000, Roberto Muñoz Gomez wrote:
> > 
> > > > It is common to get a "Transition aborted" error when try to
> > > disable
> > > > a resource?
> > > 
> > > Yes, "Transition aborted" is not an error (notice the log is at
> > > "info:"
> > > level), just an indication that something in the configuration or
> > > status changed, and requires a new transition calculation.
> > > 
> > > 
> > > If its not an error, then why the resource is not disable
> > > afterwards?
> > > The command seems to be forgotten.
> >  
> > It is very well possible that there is an error.
> > The transition being aborted is not an error because a
> > configuration
> > change would mean that pacemaker (pengine in particular) has to
> > reconsider the decisions made before. The calculated transition is
> > kind of a write-up was has to be done next. 
> > The fact that the transition is being aborted makes it actually
> > very
> > reasonable that the configuration had been updated.
> > 
> > Did you check the config (like  target-role) or did you just see
> > the
> > service still running by other means (systemctl status, ps, ...)?
> > 
> > Possible errors that prevent a service from being stopped should
> > appear in the logs after pacemaker has calculated a new transition
> > and faces problems executing it (e.g. resource-agent having
> > problems to stop the resource).
> > 
> > 
> > 
> > 
> > Maybe I miss something, this is the full log when disabling the
> > resource. We have several "disable" commands at the same time
> > because we need to stop the application to perform some End-Of-Day
> > tasks, so in the middle of one "disable" another comes and then I
> > guess the "Transtion aborted" happens.
> > 
> > 
> > Sep 25 23:50:07 [4492] vttwinformlrz1        cib:     info:
> > crm_compress_string: Compressed 1106686 bytes into 72331 (ratio 
> > 15:1) in 171ms
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      Diff: --- 0.18638.0 2
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      Diff: +++ 0.18639.0 (null)
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      --
> > /cib/configuration/resources/group[@id='ctpinet']/primitive[@id='ct
> > pinetfh']/meta_attributes[@id='ctpinetfh-meta_attributes']
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      +  /cib:  @epoch=18639
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      ++
> > /cib/configuration/resources/group[@id='cmecp2']/primitive[@id='cme
> > cp2dv']/meta_attributes[@id='cmecp2dv-meta_attributes']:  <nvpair
> > id="cmecp2dv-meta_attributes-target-role" name="target-role"
> > value="Stopped"/>

FYI, this line above is what "pcs resource disable" does ... set
target-role to Stopped

> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_process_request: Completed cib_replace operation for section
> > configuration: OK (rc=0, origin=vttwinformlrz1/cibadmin/2,
> > version=0.18639.0)
> > Sep 25 23:50:08 [4501] vttwinformlrz1       crmd:     info:
> > abort_transition_graph:      Transition aborted by deletion of
> > meta_attributes[@id='ctpinetfh-meta_attributes']: Configuration
> > change | cib=0.18639.0 source=te_update_diff:456
> > path=/cib/configuration/resources/group[@id='ctpinet']/primitive[@i
> > d='ctpinetfh']/meta_attributes[@id='ctpinetfh-meta_attributes']
> > complete=false
> > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng:     info:
> > stonith_device_remove:       Device 'ctpinetfh' not found (1 active
> > devices)

Yes, these stonith_device_remove messages annoy me, but I haven't
gotten around to improving them yet. They mean nothing significant ...
stonithd just has to check if any resource modified in the CIB was a
fencing device. If it wasn't, it logs this "not found" message.

> > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng:     info:
> > update_cib_stonith_devices_v2:       Updating device list from the
> > cib: create meta_attributes[@id='cmecp2dv-meta_attributes']
> > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng:     info:
> > cib_devices_update:  Updating devices to version 0.18639.0
> > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng:     info:
> > cib_device_update:   Device fence1 has been disabled on
> > vttwinformlrz1: score=-INFINITY
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_process_request: Forwarding cib_replace operation for section
> > configuration to all (origin=local/cibadmin/2)
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > crm_compress_string: Compressed 1106837 bytes into 72490 (ratio
> > 15:1) in 171ms
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      Diff: --- 0.18639.0 2
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      Diff: +++ 0.18640.0 (null)
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      --
> > /cib/configuration/resources/group[@id='cmecp2']/primitive[@id='cme
> > cp2dv']/meta_attributes[@id='cmecp2dv-
> > meta_attributes']/nvpair[@id='cmecp2dv-meta_attributes-target-
> > role']

This is interesting: something just deleted the target-role that was
set earlier in the same second. So, now the resource won't be stopped.

> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      +  /cib:  @epoch=18640
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      ++
> > /cib/configuration/resources/group[@id='cftmd1']/primitive[@id='cft
> > md1s1']:  <meta_attributes id="cftmd1s1-meta_attributes"/>
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:     
> > ++                                                                 
> >                <nvpair id="cftmd1s1-meta_attributes-target-role"
> > name="target-role" value="Stopped"/>

Here, a different resource is disabled.

> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:     
> > ++                                                                 
> >              </meta_attributes>
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      ++
> > /cib/configuration/resources/group[@id='cinet2']/primitive[@id='cin
> > et2dv']/meta_attributes[@id='cinet2dv-meta_attributes']:  <nvpair
> > id="cinet2dv-meta_attributes-target-role" name="target-role"
> > value="Stopped"/>
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_process_request: Completed cib_replace operation for section
> > configuration: OK (rc=0, origin=vttwinformlrz1/cibadmin/2,
> > version=0.18640.0)
> > Sep 25 23:50:08 [4501] vttwinformlrz1       crmd:     info:
> > abort_transition_graph:      Transition aborted by deletion of
> > nvpair[@id='cmecp2dv-meta_attributes-target-role']: Configuration
> > change | cib=0.18640.0 source=te_update_diff:456
> > path=/cib/configuration/resources/group[@id='cmecp2']/primitive[@id
> > ='cmecp2dv']/meta_attributes[@id='cmecp2dv-
> > meta_attributes']/nvpair[@id='cmecp2dv-meta_attributes-target-
> > role'] complete=false

Then the original resource's target-role is deleted again. This is what
 to investigate -- what is deleting the target-role. It could be
something calling "pcs resource enable".

> > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng:     info:
> > stonith_device_remove:       Device 'cmecp2dv' not found (1 active
> > devices)
> > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng:     info:
> > update_cib_stonith_devices_v2:       Updating device list from the
> > cib: create primitive[@id='cftmd1s1']
> > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng:     info:
> > cib_devices_update:  Updating devices to version 0.18640.0
> > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng:     info:
> > cib_device_update:   Device fence1 has been disabled on
> > vttwinformlrz1: score=-INFINITY
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_process_request: Forwarding cib_replace operation for section
> > configuration to all (origin=local/cibadmin/2)
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > crm_compress_string: Compressed 1106885 bytes into 72394 (ratio
> > 15:1) in 171ms
> > Sep 25 23:50:08 [4497] vttwinformlrz1       lrmd:     info:
> > log_finished:        finished - rsc:ctpinetibfh action:stop
> > call_id:6205 pid:10248 exit-code:0 exec-time:1234ms queue-time:0ms
> > Sep 25 23:50:08 [4501] vttwinformlrz1       crmd:   notice:
> > process_lrm_event:   Result of stop operation for ctpinetibfh on
> > vttwinformlrz1: 0 (ok) | call=6205 key=ctpinetibfh_stop_0
> > confirmed=true cib-update=24192
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      Diff: --- 0.18640.0 2
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      Diff: +++ 0.18640.1 (null)
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      +  /cib:  @num_updates=1
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_perform_op:      + 
> > /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_reso
> > urce[@id='cftmd1s1']/lrm_rsc_op[@id='cftmd1s1_last_0']: 
> > @operation_key=cftmd1s1_start_0, @operation=start, @transition-
> > key=513:17221:0:d060d698-76d6-4a95-8f54-b0cd908aa999, @transition-
> > magic=0:0;513:17221:0:d060d698-76d6-4a95-8f54-b0cd908aa999, @call-
> > id=10460, @last-run=1506376207, @last-rc-change=1506376207, @exec-
> > time=1190
> > Sep 25 23:50:08 [4492] vttwinformlrz1        cib:     info:
> > cib_process_request: Completed cib_modify operation for section
> > status: OK (rc=0, origin=vttwinformlrz2/crmd/9922,
> > version=0.18640.1)
> > Sep 25 23:50:08 [4501] vttwinformlrz1       crmd:     info:
> > match_graph_event:   Action cftmd1s1_start_0 (513) confirmed on
> > vttwinformlrz2 (rc=0)
> > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng:     info:
> > update_cib_stonith_devices_v2:       Updating device list from the
> > cib: modify lrm_rsc_op[@id='cftmd1s1_last_0']
> > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng:     info:
> > cib_devices_update:  Updating devices to version 0.18640.1
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > I noticed the service is still running because the EOD taks fails
> > and the next day the program is still running and it should not.
> > The forward lines in the log do not show anything regarding this
> > resource, it only shows it is 'Started'
>  
> Well we see that the target-role is properly set to 'Stopped' in the
> CIB.
> As we don't see the pengine running triggered by this config change I
> would
> suppose that the logs you have attached are not from the DC where the
> pengine would be running. (e.g. 'Current DC' shown by crm_mon)
> 
> 
> Yes it was the DC becasuse the log in the other node is even shorter.
> I have noticed the line: "Sep 25 23:50:08 [4495] vttwinformlrz1
> stonith-ng:     info: stonith_device_remove:       Device 'ctpinetfh'
> not found (1 active devices)"
> 
> The resource is not a stonith device. Why try to remove it?
> 
> I have set up some logs in the resource agent to trace if the stop
> method is called.
> 
> 
> Regards,
> Klaus
> 
> > Regards
> > Roberto
-- 
Ken Gaillot <kgaillot at redhat.com>







More information about the Users mailing list