[Pacemaker] A patch of crm_mon for the trouble actions.

renayama19661014 at ybb.ne.jp renayama19661014 at ybb.ne.jp
Tue Sep 14 20:32:02 EDT 2010


Hi Andrew,

> Perfect. Pushed. Thanks!
> 
>    http://hg.clusterlabs.org/pacemaker/1.1/rev/d932da0b886b

Thanks!!

Hideo Yamauchi.

--- Andrew Beekhof <andrew at beekhof.net> wrote:

> Perfect. Pushed. Thanks!
> 
>    http://hg.clusterlabs.org/pacemaker/1.1/rev/d932da0b886b
> 
> 2010/9/14  <renayama19661014 at ybb.ne.jp>:
> > Hi Andrew,
> >
> > Thank you for your comment.
> >
> >> Thanks for the explanation, I think you're right that we shouldn't be
> >> showing these failed actions.
> >> I think we want to do it in the PE though, eg. stop them from making
> >> it into the failed_ops list in the first place.
> >
> > Do you mean that the following patch is the more appropriate fix?
> >
> > diff -r 9b95463fde99 lib/pengine/unpack.c
> > --- a/lib/pengine/unpack.c      Mon Sep 13 13:07:16 2010 +0900
> > +++ b/lib/pengine/unpack.c      Tue Sep 14 11:04:25 2010 +0900
> > @@ -1427,7 +1427,9 @@
> >                        crm_xml_add(xml_op, XML_ATTR_UNAME, node->details->uname);
> >                        if(actual_rc_i != EXECRA_NOT_INSTALLED
> >                           || is_set(data_set->flags, pe_flag_symmetric_cluster)) {
> > -                           add_node_copy(data_set->failed, xml_op);
> > +                           if ((node->details->shutdown == FALSE) || (node->details->online == TRUE)) {
> > +                               add_node_copy(data_set->failed, xml_op);
> > +                           }
> >                        }
> >                }
> >                break;
> > @@ -1533,7 +1535,9 @@
> >                                 id, node->details->uname,
> >                                 execra_code2string(actual_rc_i), actual_rc_i);
> >                        crm_xml_add(xml_op, XML_ATTR_UNAME, node->details->uname);
> > -                       add_node_copy(data_set->failed, xml_op);
> > +                       if ((node->details->shutdown == FALSE) || (node->details->online == TRUE)) {
> > +                           add_node_copy(data_set->failed, xml_op);
> > +                       }
> >
> >                        if(*on_fail < action->on_fail) {
> >                                *on_fail = action->on_fail;
> >
> >
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> > --- Andrew Beekhof <andrew at beekhof.net> wrote:
> >
> >> Thanks for the explanation, I think you're right that we shouldn't be
> >> showing these failed actions.
> >> I think we want to do it in the PE though, eg. stop them from making
> >> it into the failed_ops list in the first place.
> >>
> >> On Mon, Sep 13, 2010 at 10:37 AM,  <renayama19661014 at ybb.ne.jp> wrote:
> >> > Hi Andrew,
> >> >
> >> > Thank you for your comment.
> >> >
> >> >> I assume this is for the stonith-enabled=true case, since offline
> >> >> nodes are ignored for stonith-enabled=false.
> >> >> Once the node is shot, then its status section is erased and no failed
> >> >> actions will be shown... so why do we need this patch?
> >> >
> >> > I know that the failure information disappears once the node has been shot successfully.
> >> > I also know that, with stonith-enabled=false, it is not displayed once the node goes
> >> > offline.
> >> >
> >> > (snip)
> >> >                if(this_node->details->online || is_set(data_set->flags, pe_flag_stonith_enabled)) {
> >> >                        /* offline nodes run no resources...
> >> >                         * unless stonith is enabled in which case we need to
> >> >                         *   make sure rsc start events happen after the stonith
> >> >                         */
> >> >                        crm_debug_3("Processing lrm resource entries");
> >> >                        unpack_lrm_resources(this_node, lrm_rsc, data_set);
> >> >                }
> >> >                );
> >> > (snip)
> >> >
> >> > However, when a node is shut down and does not need to be shot, crm_mon still displays
> >> > its failed action information.
> >> > (The fail count disappears at that point, but the failed action remains.)
> >> >
> >> > # srv01 had a monitor error.
> >> >
> >> > Migration summary:
> >> > * Node srv04:
> >> > * Node srv02:
> >> > * Node srv01:
> >> >    prmApPostgreSQLDB1: migration-threshold=1 fail-count=1
> >> > * Node srv03:
> >> >
> >> > Failed actions:
> >> >     prmApPostgreSQLDB1_monitor_10000 (node=srv01, call=81, rc=7, status=complete): not running
> >> >
> >> > # Next, the service on srv01 was stopped.
> >> >
> >> > Migration summary: ---> The fail count of srv01 disappears
> >> > * Node srv04:
> >> > * Node srv02:
> >> > * Node srv03:
> >> >
> >> > Failed actions: ---> The failed action stays
> >> >     prmApPostgreSQLDB1_monitor_10000 (node=srv01, call=81, rc=7, status=complete): not running
> >> >
> >> > Our users do not expect to see failure information for a node that was stopped normally.
> >> >
> >> > In the case of stonith-enabled=true, should the node where the failure happened keep
> >> > displaying failed action information until it is shot?
> >> > Wouldn't users be confused when failure information is displayed for a node that was
> >> > stopped normally?
> >> >
> >> > Best Regards,
> >> > Hideo Yamauchi.
> >> >
> >> > --- Andrew Beekhof <andrew at beekhof.net> wrote:
> >> >
> >> >> 2010/9/13  <renayama19661014 at ybb.ne.jp>:
> >> >> > Hi,
> >> >> >
> >> >> > I am contributing a patch for the crm_mon command.
> >> >> >
> >> >> > It changes crm_mon so that failed actions are not displayed for a node that is
> >> >> > offline because it was shut down.
> >> >> >
> >> >> > Please review the patch.
> >> >> > If there is no problem, please merge this change into the development version.
> >> >>
> >> >> Hmmm.
> >> >> I'm not sure about this patch.
> >> >>
> >> >> I assume this is for the stonith-enabled=true case, since offline
> >> >> nodes are ignored for stonith-enabled=false.
> >> >> Once the node is shot, then its status section is erased and no failed
> >> >> actions will be shown... so why do we need this patch?
> >> >>
> >> >> >
> >> >> >
> >> >> > diff -r 9b95463fde99 tools/crm_mon.c
> >> >> > --- a/tools/crm_mon.c   Mon Sep 13 13:07:16 2010 +0900
> >> >> > +++ b/tools/crm_mon.c   Mon Sep 13 13:07:59 2010 +0900
> >> >> > @@ -829,6 +829,7 @@
> >> >> >      int configured_resources = 0;
> >> >> >      int print_opts = pe_print_ncurses;
> >> >> >      const char *quorum_votes = "unknown";
> >> >> > +    gboolean is_failed_first_disp = TRUE;
> >> >> >
> >> >> >      if(as_console) {
> >> >> >         blank_screen();
> >> >> > @@ -989,16 +990,28 @@
> >> >> >      }
> >> >> >
> >> >> >      if(xml_has_children(data_set->failed)) {
> >> >> > -       print_as("\nFailed actions:\n");
> >> >> >         xml_child_iter(data_set->failed, xml_op,
> >> >> >                        int val = 0;
> >> >> > +                      node_t *failed_node = NULL;
> >> >> >                        const char *id = ID(xml_op);
> >> >> >                        const char *last = crm_element_value(xml_op, "last_run");
> >> >> >                        const char *node = crm_element_value(xml_op, XML_ATTR_UNAME);
> 
=== The rest of this message has been omitted ===




