[Pacemaker] Dangling last-failure transient attribute

Mon Nov 26 01:17:53 UTC 2012

Could be a bug. crm_report?

On Tue, Nov 20, 2012 at 6:30 PM, Vladislav Bogdanov
<bubble at hoster-ok.com> wrote:
> Hi,
>
> Looking at pengine inputs (06229e9), I noticed that there are transient
> last-failure-<rsc_id> attributes for resources that last failed a long
> time ago (more than 60000 seconds).
>
> Example is:
>   <node_state id="1107559690" uname="vd01-c" in_ccm="true" crmd="online"
> join="member" expected="member" crm-debug-origin="do_state_transition">
>     <transient_attributes id="1107559690">
>       <instance_attributes id="status-1107559690">
>         <nvpair id="status-1107559690-probe_complete"
> name="probe_complete" value="true"/>
>         <nvpair
> id="status-1107559690-last-failure-bubble-test01.vds-ok.com-vm"
> name="last-failure-bubble-test01.vds-ok.com-vm" value="1353335921"/>
>       </instance_attributes>
>     </transient_attributes>
>
> date +%s shows 1353396142, so 1353396142 - 1353335921 = 60221 seconds.
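
The age calculation above can be reproduced with a short Python sketch.
The two timestamps are the ones quoted in the message; the variable
names are purely illustrative and not part of Pacemaker.

```python
from datetime import timedelta

# Value of the last-failure-<rsc_id> nvpair from the CIB excerpt above.
last_failure = 1353335921
# Output of `date +%s` reported in the message.
now = 1353396142

# Age of the attribute in seconds, and as hours:minutes:seconds.
age_seconds = now - last_failure
age = timedelta(seconds=age_seconds)

print(age_seconds, age)
```

So the attribute had been sitting in the status section for well over
16 hours when the snapshot was taken.
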
>
> According to the code (update_failcount() and handle_failcount_op()),
> that attribute is always added/removed in a pair with
> fail-count-<rsc_id>, but the latter is not in the list for any node.
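
One way to check for this condition is sketched below, assuming the
status section is available as XML text (for example, taken from a
saved pengine input). The helper name dangling_last_failures and the
sample string are hypothetical. The sample is a trimmed, well-formed
version of the excerpt above, and the attribute naming
(last-failure-<rsc_id> / fail-count-<rsc_id>) is taken from this
message only.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the node_state excerpt quoted above.
STATUS_XML = """
<node_state id="1107559690" uname="vd01-c">
  <transient_attributes id="1107559690">
    <instance_attributes id="status-1107559690">
      <nvpair id="status-1107559690-probe_complete"
              name="probe_complete" value="true"/>
      <nvpair id="status-1107559690-last-failure-bubble-test01.vds-ok.com-vm"
              name="last-failure-bubble-test01.vds-ok.com-vm"
              value="1353335921"/>
    </instance_attributes>
  </transient_attributes>
</node_state>
"""


def dangling_last_failures(xml_text):
    """Return (uname, rsc_id, timestamp) for every last-failure-<rsc_id>
    attribute that has no fail-count-<rsc_id> on the same node.

    Hypothetical helper for illustration; not a Pacemaker API.
    """
    root = ET.fromstring(xml_text)
    found = []
    # iter() also yields the root element itself if it is a node_state.
    for node in root.iter("node_state"):
        attrs = {nv.get("name"): nv.get("value") for nv in node.iter("nvpair")}
        for name, value in attrs.items():
            if not name.startswith("last-failure-"):
                continue
            rsc_id = name[len("last-failure-"):]
            if "fail-count-" + rsc_id not in attrs:
                found.append((node.get("uname"), rsc_id, int(value)))
    return found


print(dangling_last_failures(STATUS_XML))
```

With the excerpt above, this reports the single last-failure attribute
on vd01-c, since no matching fail-count attribute is present.
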
>
> Is that a bug or a feature? Or am I just missing something?
>
> Best,
> Vladislav