[Pacemaker] RFC: What part of the XML configuration do you hate the most?

Satomi TANIGUCHI taniguchis at intellilink.co.jp
Thu Nov 27 07:27:26 UTC 2008


Hi Andrew,

I found another behavior that is caused by the cluster forgetting that a
resource is supposed to stay stopped.

For example, take a node that runs both a primitive and a master/slave resource,
each with on-fail set to "standby".
When the master/slave resource fails, all resources on the failed node are
stopped, and the master/slave resource's fail-count is increased.
But then only the primitive resource restarts on the failed node, because its
fail-count has not been increased and the cluster forgets that the resource is
supposed to stay stopped...

When a failover (F/O) occurs for a resource that is _not_ master/slave,
pengine creates a single graph that stops and restarts the resource.
For a master/slave resource, it creates two graphs:
one for the resource's stop process and another for the restart process.
When it creates the graph for the restart process,
nothing remembers that the resources are supposed to stay stopped on the
failed node.

This behavior is the same as (or similar to) what you were worried about, isn't it?

To avoid this behavior, the status of the node has to be updated before the
restart process.
As a trial, I created a patch (against pacemaker-dev 366b14d79780),
and I attached the graph produced with the patched pacemaker.
It's not a "general" solution, just for reference...
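
To make the reasoning above concrete, here is a small toy model in Python. It is
only a sketch under stated assumptions: it is not pengine code and not the
attached patch, and the names (Node, allowed, restart_pass, "prim", "ms") are
hypothetical. It assumes the restart pass decides placement only from what is
recorded in the node's status, namely a standby flag and per-resource
fail-counts.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for the recorded status of one cluster node."""
    name: str
    standby: bool = False
    # resource name -> number of failures recorded on this node
    fail_counts: dict = field(default_factory=dict)


def allowed(node, resource):
    """Toy rule: a resource may run here unless the node is in standby
    or the resource itself has a recorded failure on this node."""
    return not node.standby and node.fail_counts.get(resource, 0) == 0


def restart_pass(node, resources):
    """Second graph: computed only from the recorded status, with no
    memory of why the first graph stopped everything."""
    return [r for r in resources if allowed(node, r)]


resources = ["prim", "ms"]  # one primitive, one master/slave resource

# Without the fix: only the master/slave fail-count was recorded.
node = Node("node1")
node.fail_counts["ms"] = 1
restarted = restart_pass(node, resources)

# With the idea described above: the node status is updated before
# the restart pass, so nothing is started on the failed node.
node.standby = True
restarted_after_fix = restart_pass(node, resources)

print(restarted)            # the primitive comes back on the failed node
print(restarted_after_fix)  # nothing restarts once standby is recorded
```

In this model the primitive restarts only because its own fail-count is still
zero and nothing else in the status says the node should be avoided.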


Regards,
Satomi TANIGUCHI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: expand_on-fail.patch
Type: text/x-patch
Size: 8467 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081127/e07f2e8c/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pe-warn-0.left.gif
Type: image/gif
Size: 96129 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081127/e07f2e8c/attachment-0002.gif>

