[Pacemaker] A patch of crm_mon for the trouble actions.

renayama19661014 at ybb.ne.jp renayama19661014 at ybb.ne.jp
Mon Sep 13 08:37:15 UTC 2010


Hi Andrew,

Thank you for your comment.

> I assume this is for the stonith-enabled=true case, since offline
> nodes are ignored for stonith-enabled=false.
> Once the node is shot, then its status section is erased and no failed
> actions will be shown... so why do we need this patch?

I know that the failure information disappears once a node has been shot successfully.
I also know that, in the case of stonith-enabled=false, it is not displayed once a node goes
offline.

(snip)
		if(this_node->details->online || is_set(data_set->flags, pe_flag_stonith_enabled)) {
			/* offline nodes run no resources...
			 * unless stonith is enabled in which case we need to
			 *   make sure rsc start events happen after the stonith
			 */
			crm_debug_3("Processing lrm resource entries");
			unpack_lrm_resources(this_node, lrm_rsc, data_set);
		}
		);
(snip)

However, when a node does not need to be shot and is simply shut down, crm_mon still displays its
failed action information.
(The fail-count disappears at that point, but the failed action remains.)

 # A monitor error occurred on srv01.

Migration summary: 
* Node srv04:  
* Node srv02:  
* Node srv01:  
   prmApPostgreSQLDB1: migration-threshold=1 fail-count=1 
* Node srv03:  
 
Failed actions: 
    prmApPostgreSQLDB1_monitor_10000 (node=srv01, call=81, rc=7, status=complete): not running

 # Next, the service on srv01 was stopped.

Migration summary: ---> The fail-count of srv01 disappears
* Node srv04:  
* Node srv02:  
* Node srv03:  
 
Failed actions: ---> The failed action stays
    prmApPostgreSQLDB1_monitor_10000 (node=srv01, call=81, rc=7, status=complete): not running

Our users do not expect to see failure information for a node that stopped normally.

In the case of stonith-enabled=true, should a node on which a failure occurred keep displaying its
failed action information until it is shot?
And if failure information is displayed for a node that stopped normally, won't the user be confused?
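
To make the behaviour I am asking about concrete, here is a minimal standalone sketch of the
filtering that the patch applies. It is not the crm_mon code itself: struct node_state,
struct failed_op, find_node(), hide_failed_op() and print_failed_actions() are hypothetical names
used only for illustration. Only the shutdown/offline test and the delayed printing of the
"Failed actions:" header follow the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the node details used by the patch. */
struct node_state {
    const char *uname;
    bool online;
    bool shutdown;   /* true when the node was stopped cleanly */
};

/* Hypothetical, simplified stand-in for one entry of the failed-operation list. */
struct failed_op {
    const char *id;
    const char *node;
};

/* Look a node up by name; returns NULL when it is not known. */
static const struct node_state *find_node(const struct node_state *nodes,
                                          size_t n_nodes, const char *uname)
{
    for (size_t i = 0; i < n_nodes; i++) {
        if (strcmp(nodes[i].uname, uname) == 0) {
            return &nodes[i];
        }
    }
    return NULL;
}

/* A failed action is hidden only when its node is known, was shut down
 * cleanly, and is now offline. Unknown nodes are still shown. */
static bool hide_failed_op(const struct node_state *node)
{
    return node != NULL && node->shutdown && !node->online;
}

/* Print the failed actions that should stay visible and return how many
 * were printed. The header is printed only before the first visible entry,
 * so an empty section produces no output at all. */
static int print_failed_actions(FILE *out,
                                const struct node_state *nodes, size_t n_nodes,
                                const struct failed_op *ops, size_t n_ops)
{
    bool header_printed = false;
    int shown = 0;

    assert(out != NULL);

    for (size_t i = 0; i < n_ops; i++) {
        const struct node_state *node = find_node(nodes, n_nodes, ops[i].node);

        if (hide_failed_op(node)) {
            continue;
        }
        if (!header_printed) {
            fprintf(out, "\nFailed actions:\n");
            header_printed = true;
        }
        fprintf(out, "    %s (node=%s)\n", ops[i].id, ops[i].node);
        shown++;
    }
    return shown;
}
```

With srv01 marked as shut down and offline, the monitor failure of prmApPostgreSQLDB1 on srv01 is
skipped and the header is not printed. If srv01 is still online, the entry is shown as before.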

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof <andrew at beekhof.net> wrote:

> 2010/9/13  <renayama19661014 at ybb.ne.jp>:
> > Hi,
> >
> > I am contributing a patch for the crm_mon command.
> >
> > It changes crm_mon so that failed actions are not displayed for a node that is offline
> > because it was shut down.
> >
> > Please review the patch.
> > If there are no problems, please include this change in the development version.
> 
> Hmmm.
> I'm not sure about this patch.
> 
> I assume this is for the stonith-enabled=true case, since offline
> nodes are ignored for stonith-enabled=false.
> Once the node is shot, then its status section is erased and no failed
> actions will be shown... so why do we need this patch?
> 
> >
> >
> > diff -r 9b95463fde99 tools/crm_mon.c
> > --- a/tools/crm_mon.c   Mon Sep 13 13:07:16 2010 +0900
> > +++ b/tools/crm_mon.c   Mon Sep 13 13:07:59 2010 +0900
> > @@ -829,6 +829,7 @@
> >      int configured_resources = 0;
> >      int print_opts = pe_print_ncurses;
> >      const char *quorum_votes = "unknown";
> > +    gboolean is_failed_first_disp = TRUE;
> >
> >      if(as_console) {
> >          blank_screen();
> > @@ -989,16 +990,28 @@
> >      }
> >
> >      if(xml_has_children(data_set->failed)) {
> > -        print_as("\nFailed actions:\n");
> >          xml_child_iter(data_set->failed, xml_op,
> >                         int val = 0;
> > +                       node_t *failed_node = NULL;
> >                         const char *id = ID(xml_op);
> >                         const char *last = crm_element_value(xml_op, "last_run");
> >                         const char *node = crm_element_value(xml_op, XML_ATTR_UNAME);
> >                         const char *call = crm_element_value(xml_op, XML_LRM_ATTR_CALLID);
> >                         const char *rc   = crm_element_value(xml_op, XML_LRM_ATTR_RC);
> >                         const char *status = crm_element_value(xml_op, XML_LRM_ATTR_OPSTATUS);
> > -
> > +
> > +                       failed_node = pe_find_node(data_set->nodes, node);
> > +                       if (failed_node != NULL) {
> > +                           if ((failed_node->details->shutdown == TRUE) && (failed_node->details->online == FALSE)) {
> > +                               continue;
> > +                           }
> > +                       }
> > +
> > +                       if (is_failed_first_disp){
> > +                           is_failed_first_disp = FALSE;
> > +                           print_as("\nFailed actions:\n");
> > +                       }
> > +
> >                         val = crm_parse_int(status, "0");
> >                         print_as("    %s (node=%s, call=%s, rc=%s, status=%s",
> >                                  id, node, call, rc, op_status2text(val));
> >
> >
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> >
> >
> 