[Pacemaker] error with cib synchronisation on disk
Халезов Иван
i.khalezov at rts.ru
Thu May 16 11:31:47 UTC 2013
On 16.05.2013 07:14, Andrew Beekhof wrote:
> On 15/05/2013, at 9:53 PM, Халезов Иван <i.khalezov at rts.ru> wrote:
>
>> Hello everyone!
>>
>> Some problems occurred while synchronising the CIB configuration to disk.
>> I see these errors in Pacemaker's logfile:
> What were the messages before this?
> Did it happen once or many times?
> At startup or while the cluster was running?
I had updated the cluster configuration just before, so the logfile
contains some output about that (the excerpt below does not start from
the beginning, because the full output is rather large):
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - <primitive
id="Security_A" >
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: -
<meta_attributes id="Security_A-meta_attributes" >
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - <nvpair
id="Security_A-meta_attributes-target-role" name="target-role"
value="Stopped" __crm_diff_marker__="removed:top" />
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </meta_attributes>
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </primitive>
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - <primitive
id="Security_B" >
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: -
<meta_attributes id="SPBEX_Security_B-meta_attributes" >
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - <nvpair
id="Security_B-meta_attributes-target-role" name="target-role"
value="Started" __crm_diff_marker__="removed:top" />
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </meta_attributes>
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </primitive>
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </group>
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </resources>
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </configuration>
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </cib>
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + <cib
epoch="496" num_updates="1" admin_epoch="0"
validate-with="pacemaker-1.2" cib-last-written="Mon May 13 18:50:25
2013" crm_feature_set="3.0.6" update-origin="iblade6.net.rts"
update-client="cibadmin" have-quorum="1" dc-uuid="2130706433" >
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + <configuration >
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + <resources >
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + <group
id="FAST_SENDERS" >
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: +
<meta_attributes id="FAST_SENDERS-meta_attributes"
__crm_diff_marker__="added:top" >
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + <nvpair
id="FAST_SENDERS-meta_attributes-target-role" name="target-role"
value="Started" />
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + </meta_attributes>
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + </group>
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + </resources>
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + </configuration>
May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + </cib>
May 14 13:29:13 iblade6 cib[2848]: info: cib_process_request:
Operation complete: op cib_replace for section resources
(origin=local/cibadmin/2, version=0.496.1): ok (rc=0)
May 14 13:29:13 iblade6 pengine[2852]: notice: LogActions: Start
Trades_INCR_A#011(iblade6.net.rts)
May 14 13:29:13 iblade6 pengine[2852]: notice: LogActions: Start
Trades_INCR_B#011(iblade6.net.rts)
May 14 13:29:13 iblade6 pengine[2852]: notice: LogActions: Start
Security_A#011(iblade6.net.rts)
May 14 13:29:13 iblade6 pengine[2852]: notice: LogActions: Start
Security_B#011(iblade6.net.rts)
May 14 13:29:13 iblade6 crmd[2853]: notice: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
May 14 13:29:13 iblade6 crmd[2853]: info: do_te_invoke: Processing
graph 41 (ref=pe_calc-dc-1368523753-125) derived from
/var/lib/pengine/pe-input-452.bz2
May 14 13:29:13 iblade6 crmd[2853]: info: te_rsc_command: Initiating
action 80: start Trades_INCR_A_start_0 on iblade6.net.rts (local)
May 14 13:29:13 iblade6 cluster: error: validate_cib_digest: Digest
comparision failed: expected 2c91194022c98636f90df9dd5e7176c6
(/var/lib/heartbeat/crm/cib.Zm249H), calculated
bc160870924630b3907c8cb1c3128eee
May 14 13:29:13 iblade6 cluster: error: retrieveCib: Checksum of
/var/lib/heartbeat/crm/cib.a024wF failed! Configuration contents ignored!
May 14 13:29:13 iblade6 cluster: error: retrieveCib: Usually this is
caused by manual changes, please refer to
http://clusterlabs.org/wiki/FAQ#cib_changes_detected
May 14 13:29:13 iblade6 cluster: error: crm_abort:
write_cib_contents: Triggered fatal assert at io.c:662 :
retrieveCib(tmp1, tmp2, FALSE) != NULL
May 14 13:29:13 iblade6 pengine[2852]: notice: process_pe_message:
Transition 41: PEngine Input stored in: /var/lib/pengine/pe-input-452.bz2
May 14 13:29:13 iblade6 cib[2848]: error: cib_diskwrite_complete:
Disk write failed: status=134, signo=6, exitcode=0
May 14 13:29:13 iblade6 cib[2848]: error: cib_diskwrite_complete:
Disabling disk writes after write failure
It happened twice during the last week, both times while the cluster was running.
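
As a side note, the `status=134, signo=6, exitcode=0` values in the last `cib_diskwrite_complete` lines are consistent with the writer child being killed by signal 6 (SIGABRT on Linux), which matches the fatal assert reported by `crm_abort`. Below is a minimal Python sketch of that decoding, assuming the conventional Linux wait-status layout. `decode_wait_status` is a hypothetical helper written for illustration, not part of Pacemaker:

```python
def decode_wait_status(status: int) -> dict:
    """Split a raw wait(2)-style status word into its parts.

    Assumes the conventional Linux layout:
      bits 0-6  -> terminating signal number (0 means a normal exit)
      bit  7    -> core-dump flag
      bits 8-15 -> exit code
    Stopped/continued statuses are not handled in this sketch.
    """
    return {
        "signo": status & 0x7F,
        "core_dumped": bool(status & 0x80),
        "exitcode": (status >> 8) & 0xFF,
    }


# Value taken from the log line: "Disk write failed: status=134, ..."
info = decode_wait_status(134)
print(info)  # signal 6 is SIGABRT on Linux
```

With 134 = 0x86, the low seven bits give signal 6 and bit 7 is set, so the child appears to have aborted rather than exited with an error code.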
>> May 14 13:29:13 iblade6 cluster: error: validate_cib_digest: Digest comparision failed: expected 2c91194022c98636f90df9dd5e7176c6 (/var/lib/heartbeat/crm/cib.Zm249H), calculated bc160870924630b3907c8cb1c3128eee
>> May 14 13:29:13 iblade6 cluster: error: retrieveCib: Checksum of /var/lib/heartbeat/crm/cib.a024wF failed! Configuration contents ignored!
>> May 14 13:29:13 iblade6 cluster: error: retrieveCib: Usually this is caused by manual changes, please refer to http://clusterlabs.org/wiki/FAQ#cib_changes_detected
>> May 14 13:29:13 iblade6 cluster: error: crm_abort: write_cib_contents: Triggered fatal assert at io.c:662 : retrieveCib(tmp1, tmp2, FALSE) != NULL
>> May 14 13:29:13 iblade6 pengine[2852]: notice: process_pe_message: Transition 41: PEngine Input stored in: /var/lib/pengine/pe-input-452.bz2
>> May 14 13:29:13 iblade6 cib[2848]: error: cib_diskwrite_complete: Disk write failed: status=134, signo=6, exitcode=0
>> May 14 13:29:13 iblade6 cib[2848]: error: cib_diskwrite_complete: Disabling disk writes after write failure
>>
>>
>> I didn't find anything about it at this link: http://clusterlabs.org/wiki/FAQ#cib_changes_detected
>>
>> What could be the reason for this error?
>> Why can the checksum of a CIB file be wrong?
>> Is it a problem with the HDD, a Pacemaker bug, or something else? (There are no disk or filesystem errors in syslog.)
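
For illustration only, the kind of check behind the "Digest comparision failed: expected ..., calculated ..." message can be sketched in Python as below. This is a rough sketch under stated assumptions: it hashes the raw bytes of a file with MD5, whereas Pacemaker may compute its digest over a serialized form of the XML, so the values it produces are not expected to match the ones in the log. The names `file_md5` and `digest_matches` are hypothetical.

```python
import hashlib


def file_md5(path: str) -> str:
    """Return the hex MD5 digest of a file's raw bytes, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_matches(path: str, expected_hex: str) -> bool:
    """Compare the calculated digest with a previously recorded one."""
    return file_md5(path) == expected_hex.strip().lower()
```

Any change to the content between recording the expected digest and recalculating it (a manual edit, a concurrent write, or a truncated file) would make such a comparison fail.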
>>
>> I have had two such incidents during the last week.
>>
>>
>> My cluster installation: CentOS 6.4 x86_64, pacemaker 1.1.7, corosync 2.3.0
>>
>> Thank you in advance!
>>
>> Ivan Khalezov.
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
Ivan Khalezov