[Pacemaker] resources are restarted without obvious reasons

Tue Oct 5 09:15:07 UTC 2010

On Fri, Oct 1, 2010 at 9:53 AM, Pavlos Parissis
<pavlos.parissis at gmail.com> wrote:
> Hi,
> It seems that it happens every time the PE rechecks the conf:
> 09:23:55 crmd: [3473]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped!
>
> and then check_rsc_parameters() forces a restart of my resources:
>
> 09:23:55 pengine: [3979]: notice: check_rsc_parameters: Forcing restart of pbx_02 on node-02, provider changed: heartbeat -> <null>
> 09:23:55 pengine: [3979]: notice: DeleteRsc: Removing pbx_02 from node-02
> 09:23:55 pengine: [3979]: notice: check_rsc_parameters: Forcing restart of pbx_01 on node-01, provider changed: heartbeat -> <null>

Could be a bug in the code that detects changes to the resource definition.
Could you file a bug please?
    http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

> Looking at the code, I can't tell whether the issue is in the actual conf
> or whether I am hitting a bug.
> static gboolean
> check_rsc_parameters(resource_t *rsc, node_t *node, xmlNode *rsc_entry,
>              pe_working_set_t *data_set)
> {
>     int attr_lpc = 0;
>     gboolean force_restart = FALSE;
>     gboolean delete_resource = FALSE;
>
>     const char *value = NULL;
>     const char *old_value = NULL;
>     const char *attr_list[] = {
>         XML_ATTR_TYPE,
>         XML_AGENT_ATTR_CLASS,
>         XML_AGENT_ATTR_PROVIDER
>     };
>
>     for(; attr_lpc < DIMOF(attr_list); attr_lpc++) {
>         value = crm_element_value(rsc->xml, attr_list[attr_lpc]);
>         old_value = crm_element_value(rsc_entry, attr_list[attr_lpc]);
>         if(value == old_value /* i.e. both NULL */
>            || crm_str_eq(value, old_value, TRUE)) {
>             continue;
>         }
>
>         force_restart = TRUE;
>         crm_notice("Forcing restart of %s on %s, %s changed: %s -> %s",
>                rsc->id, node->details->uname, attr_list[attr_lpc],
>                crm_str(old_value), crm_str(value));
>     }
>     if(force_restart) {
>         /* make sure the restart happens */
>         stop_action(rsc, node, FALSE);
>         set_bit(rsc->flags, pe_rsc_start_pending);
>         delete_resource = TRUE;
>     }
>     return delete_resource;
> }
>
>
> On 1 October 2010 09:13, Pavlos Parissis <pavlos.parissis at gmail.com> wrote:
>>
>> Hi
>> Could this be related to the possible bug mentioned here [1]?
>>
>> BTW, here is the Pacemaker configuration:
>> node $id="b8ad13a6-8a6e-4304-a4a1-8f69fa735100" node-02
>> node $id="d5557037-cf8f-49b7-95f5-c264927a0c76" node-01
>> node $id="e5195d6b-ed14-4bb3-92d3-9105543f9251" node-03
>> primitive drbd_01 ocf:linbit:drbd \
>>         params drbd_resource="drbd_pbx_service_1" \
>>         op monitor interval="30s" \
>>         op start interval="0" timeout="240s" \
>>         op stop interval="0" timeout="120s"
>> primitive drbd_02 ocf:linbit:drbd \
>>         params drbd_resource="drbd_pbx_service_2" \
>>         op monitor interval="30s" \
>>         op start interval="0" timeout="240s" \
>>         op stop interval="0" timeout="120s"
>> primitive fs_01 ocf:heartbeat:Filesystem \
>>         params device="/dev/drbd1" directory="/pbx_service_01" fstype="ext3" \
>>         meta migration-threshold="3" failure-timeout="60" \
>>         op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
>>         op start interval="0" timeout="60s" \
>>         op stop interval="0" timeout="60s"
>> primitive fs_02 ocf:heartbeat:Filesystem \
>>         params device="/dev/drbd2" directory="/pbx_service_02" fstype="ext3" \
>>         meta migration-threshold="3" failure-timeout="60" \
>>         op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
>>         op start interval="0" timeout="60s" \
>>         op stop interval="0" timeout="60s"
>> primitive ip_01 ocf:heartbeat:IPaddr2 \
>>         params ip="192.168.78.10" cidr_netmask="24" broadcast="192.168.78.255" \
>>         meta failure-timeout="120" migration-threshold="3" \
>>         op monitor interval="5s"
>> primitive ip_02 ocf:heartbeat:IPaddr2 \
>>         params ip="192.168.78.20" cidr_netmask="24" broadcast="192.168.78.255" \
>>         op monitor interval="5s"
>> primitive pbx_01 lsb:test-01 \
>>         meta failure-timeout="60" migration-threshold="3" target-role="Started" \
>>         op monitor interval="20s" \
>>         op start interval="0" timeout="60s" \
>>         op stop interval="0" timeout="60s"
>> primitive pbx_02 lsb:test-02 \
>>         meta failure-timeout="60" migration-threshold="3" target-role="Started" \
>>         op monitor interval="20s" \
>>         op start interval="0" timeout="60s" \
>>         op stop interval="0" timeout="60s"
>> group pbx_service_01 ip_01 fs_01 pbx_01 \
>>         meta target-role="Started"
>> group pbx_service_02 ip_02 fs_02 pbx_02 \
>>         meta target-role="Started"
>> ms ms-drbd_01 drbd_01 \
>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
>> ms ms-drbd_02 drbd_02 \
>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
>> location PrimaryNode-drbd_01 ms-drbd_01 100: node-01
>> location PrimaryNode-drbd_02 ms-drbd_02 100: node-02
>> location PrimaryNode-pbx_service_01 pbx_service_01 200: node-01
>> location PrimaryNode-pbx_service_02 pbx_service_02 200: node-02
>> location SecondaryNode-drbd_01 ms-drbd_01 0: node-03
>> location SecondaryNode-drbd_02 ms-drbd_02 0: node-03
>> location SecondaryNode-pbx_service_01 pbx_service_01 10: node-03
>> location SecondaryNode-pbx_service_02 pbx_service_02 10: node-03
>> colocation fs_01-on-drbd_01 inf: fs_01 ms-drbd_01:Master
>> colocation fs_02-on-drbd_02 inf: fs_02 ms-drbd_02:Master
>> order pbx_service_01-after-drbd_01 inf: ms-drbd_01:promote pbx_service_01:start
>> order pbx_service_02-after-drbd_02 inf: ms-drbd_02:promote pbx_service_02:start
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
>>         cluster-infrastructure="Heartbeat" \
>>         stonith-enabled="false" \
>>         symmetric-cluster="false" \
>>         last-lrm-refresh="1285323745"
>> rsc_defaults $id="rsc-options" \
>>
>> Cheers,
>> Pavlos
>>
>> [1] http://oss.clusterlabs.org/pipermail/pacemaker/2010-September/007624.html
>>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>