[Pacemaker] A patch of crm_mon for the trouble actions.

Andrew Beekhof andrew at beekhof.net
Tue Sep 14 15:41:22 EDT 2010


Perfect. Pushed. Thanks!

   http://hg.clusterlabs.org/pacemaker/1.1/rev/d932da0b886b
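
For anyone reading the condition in the unpack.c patch quoted below: it keeps a failed operation out of data_set->failed only when the node is both shut down and offline. Here is a minimal standalone sketch of that predicate. The struct node_state and the function should_record_failure are hypothetical names used for illustration only; they are not part of Pacemaker, and the two fields merely stand in for node->details->shutdown and node->details->online.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two flags the patch tests
 * (node->details->shutdown and node->details->online). */
struct node_state {
    bool shutdown;
    bool online;
};

/* Same truth table as the patched condition
 *   (shutdown == FALSE) || (online == TRUE)
 * i.e. skip the failure only when the node is shut down AND offline. */
static bool should_record_failure(const struct node_state *n)
{
    return !n->shutdown || n->online;
}
```

By De Morgan's law this is the negation of the test used in the earlier crm_mon.c patch, (shutdown == TRUE) && (online == FALSE), so both patches hide the same set of failed actions; only the place where the filtering happens differs.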

2010/9/14  <renayama19661014 at ybb.ne.jp>:
> Hi Andrew,
>
> Thank you for your comment.
>
>> Thanks for the explanation, I think you're right that we shouldn't be
>> showing these failed actions.
>> I think we want to do it in the PE though, eg. stop them from making
>> it into the failed_ops list in the first place.
>
> Does your answer mean that the following patch is the more appropriate fix?
>
> diff -r 9b95463fde99 lib/pengine/unpack.c
> --- a/lib/pengine/unpack.c      Mon Sep 13 13:07:16 2010 +0900
> +++ b/lib/pengine/unpack.c      Tue Sep 14 11:04:25 2010 +0900
> @@ -1427,7 +1427,9 @@
>                        crm_xml_add(xml_op, XML_ATTR_UNAME, node->details->uname);
>                        if(actual_rc_i != EXECRA_NOT_INSTALLED
>                           || is_set(data_set->flags, pe_flag_symmetric_cluster)) {
> -                           add_node_copy(data_set->failed, xml_op);
> +                           if ((node->details->shutdown == FALSE) || (node->details->online == TRUE)) {
> +                               add_node_copy(data_set->failed, xml_op);
> +                           }
>                        }
>                }
>                break;
> @@ -1533,7 +1535,9 @@
>                                 id, node->details->uname,
>                                 execra_code2string(actual_rc_i), actual_rc_i);
>                        crm_xml_add(xml_op, XML_ATTR_UNAME, node->details->uname);
> -                       add_node_copy(data_set->failed, xml_op);
> +                       if ((node->details->shutdown == FALSE) || (node->details->online == TRUE)) {
> +                           add_node_copy(data_set->failed, xml_op);
> +                       }
>
>                        if(*on_fail < action->on_fail) {
>                                *on_fail = action->on_fail;
>
>
>
> Best Regards,
> Hideo Yamauchi.
>
> --- Andrew Beekhof <andrew at beekhof.net> wrote:
>
>> Thanks for the explanation, I think you're right that we shouldn't be
>> showing these failed actions.
>> I think we want to do it in the PE though, eg. stop them from making
>> it into the failed_ops list in the first place.
>>
>> On Mon, Sep 13, 2010 at 10:37 AM,  <renayama19661014 at ybb.ne.jp> wrote:
>> > Hi Andrew,
>> >
>> > Thank you for your comment.
>> >
>> >> I assume this is for the stonith-enabled=true case, since offline
>> >> nodes are ignored for stonith-enabled=false.
>> >> Once the node is shot, then its status section is erased and no failed
>> >> actions will be shown... so why do we need this patch?
>> >
>> > I know that the failure information disappears once a node has been shot successfully.
>> > I also know that, in the case of stonith-enabled=false, it is not displayed once a node
>> > goes offline.
>> >
>> > (snip)
>> >         if(this_node->details->online || is_set(data_set->flags, pe_flag_stonith_enabled)) {
>> >                 /* offline nodes run no resources...
>> >                  * unless stonith is enabled in which case we need to
>> >                  *   make sure rsc start events happen after the stonith
>> >                  */
>> >                 crm_debug_3("Processing lrm resource entries");
>> >                 unpack_lrm_resources(this_node, lrm_rsc, data_set);
>> >         }
>> >         );
>> > (snip)
>> >
>> > However, crm_mon still displays the failed action information after a node has been shut
>> > down, in the case where there is no need to shoot the node.
>> > (The fail count disappears at that point, but the failed action stays.)
>> >
>> > # srv01 had a monitor error.
>> >
>> > Migration summary:
>> > * Node srv04:
>> > * Node srv02:
>> > * Node srv01:
>> >    prmApPostgreSQLDB1: migration-threshold=1 fail-count=1
>> > * Node srv03:
>> >
>> > Failed actions:
>> >     prmApPostgreSQLDB1_monitor_10000 (node=srv01, call=81, rc=7, status=complete): not running
>> >
>> > # Next, the service on srv01 was stopped.
>> >
>> > Migration summary: ---> The fail count of srv01 disappears
>> > * Node srv04:
>> > * Node srv02:
>> > * Node srv03:
>> >
>> > Failed actions: ---> The failed action stays
>> >     prmApPostgreSQLDB1_monitor_10000 (node=srv01, call=81, rc=7, status=complete): not running
>> >
>> > Our users do not expect to see failure information for a node that was stopped normally.
>> >
>> > In the case of stonith-enabled=true, should a node on which a failure occurred keep
>> > displaying its failed action information until it is shot?
>> > Won't users be confused when failure information is displayed for a node that was
>> > stopped normally?
>> >
>> > Best Regards,
>> > Hideo Yamauchi.
>> >
>> > --- Andrew Beekhof <andrew at beekhof.net> wrote:
>> >
>> >> 2010/9/13 <renayama19661014 at ybb.ne.jp>:
>> >> > Hi,
>> >> >
>> >> > I am contributing a patch for the crm_mon command.
>> >> >
>> >> > It changes crm_mon so that failed actions are not displayed for a node that is offline
>> >> > because it was shut down.
>> >> >
>> >> > Please review the patch.
>> >> > If there are no problems, please include this change in the development version.
>> >>
>> >> Hmmm.
>> >> I'm not sure about this patch.
>> >>
>> >> I assume this is for the stonith-enabled=true case, since offline
>> >> nodes are ignored for stonith-enabled=false.
>> >> Once the node is shot, then its status section is erased and no failed
>> >> actions will be shown... so why do we need this patch?
>> >>
>> >> >
>> >> >
>> >> > diff -r 9b95463fde99 tools/crm_mon.c
>> >> > --- a/tools/crm_mon.c   Mon Sep 13 13:07:16 2010 +0900
>> >> > +++ b/tools/crm_mon.c   Mon Sep 13 13:07:59 2010 +0900
>> >> > @@ -829,6 +829,7 @@
>> >> >      int configured_resources = 0;
>> >> >      int print_opts = pe_print_ncurses;
>> >> >      const char *quorum_votes = "unknown";
>> >> > +    gboolean is_failed_first_disp = TRUE;
>> >> >
>> >> >      if(as_console) {
>> >> >         blank_screen();
>> >> > @@ -989,16 +990,28 @@
>> >> >      }
>> >> >
>> >> >      if(xml_has_children(data_set->failed)) {
>> >> > -       print_as("\nFailed actions:\n");
>> >> >         xml_child_iter(data_set->failed, xml_op,
>> >> >                        int val = 0;
>> >> > +                      node_t *failed_node = NULL;
>> >> >                        const char *id = ID(xml_op);
>> >> >                        const char *last = crm_element_value(xml_op, "last_run");
>> >> >                        const char *node = crm_element_value(xml_op, XML_ATTR_UNAME);
>> >> >                        const char *call = crm_element_value(xml_op, XML_LRM_ATTR_CALLID);
>> >> >                        const char *rc   = crm_element_value(xml_op, XML_LRM_ATTR_RC);
>> >> >                        const char *status = crm_element_value(xml_op, XML_LRM_ATTR_OPSTATUS);
>> >> > -
>> >> > +
>> >> > +                      failed_node = pe_find_node(data_set->nodes, node);
>> >> > +                      if (failed_node != NULL) {
>> >> > +                          if ((failed_node->details->shutdown == TRUE) && (failed_node->details->online == FALSE)) {
>> >> > +                              continue;
>> >> > +                          }
>> >> > +                      }
>> >> > +
>> >> > +                      if (is_failed_first_disp){
>> >> > +                          is_failed_first_disp = FALSE;
>> >> > +                          print_as("\nFailed actions:\n");
>> >> > +                      }
>> >> > +
>> >> >                        val = crm_parse_int(status, "0");
>> >> >                        print_as("    %s (node=%s, call=%s, rc=%s, status=%s",
>> >> >                                 id, node, call, rc, op_status2text(val));
>> >> >
>> >> >
>> >> >
>> >> > Best Regards,
>> >> > Hideo Yamauchi.
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> >> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >> >
>> >> > Project Home: http://www.clusterlabs.org
>> >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>> >> >
>> >> >
>> >>
>> >
>> >
> === The following messages have been omitted ===
>



