[Pacemaker] Problem with configuring stonith rcd_serial

Dejan Muhamedagic dejanmm at fastmail.fm
Wed Nov 3 18:21:31 UTC 2010


Hi,

On Wed, Nov 03, 2010 at 05:08:50PM +0100, Eberhard Kuemmerle wrote:
> On 3 Nov 2010 11:06, Dejan Muhamedagic wrote:
> > On Tue, Nov 02, 2010 at 06:45:08PM +0100, Dejan Muhamedagic wrote:
> >
> >> On Tue, Nov 02, 2010 at 04:26:40PM +0100, Eberhard Kuemmerle wrote:
> >>
> >>> On 2 Nov 2010 16:15 02.11.2010 16:18, Eberhard Kuemmerle wrote:
> >>>
> >>>> Hi,
> >>>> here is what you requested:
> >>>>
> >>>> TEST 1:
> >>>> stonith -t rcd_serial -p "test /dev/ttyS0 rts 2000" test
> >>>> ** (process:2928): DEBUG: rcd_serial_set_config:called
> >>>> Alarm clock
> >>>> # echo $?
> >>>> 142
> >>>>
> >>>> TEST 2:
> >>>> stonith -t rcd_serial hostlist="node2" ttydev="/dev/ttyS0" dtr_rts="rts"
> >>>> msduration="2000" -S
> >>>> ** (process:6851): DEBUG: rcd_serial_set_config:called
> >>>> stonith: rcd_serial device OK.
> >>>> # echo $?
> >>>> 0
> >>>>
> >>>> TEST 3:
> >>>> stonith -t rcd_serial hostlist="node2" ttydev="/dev/ttyS0" dtr_rts="rts"
> >>>> msduration="2000" -T reset node2
> >>>> ** (process:8142): DEBUG: rcd_serial_set_config:called
> >>>> Alarm clock
> >>>> # echo $?
> >>>> 142
> >>>>
> >>>> TEST 1 as well as TEST 2 caused a reboot of node2!
> >>>>
> >>>>
> >>> SORRY, that's wrong!
> >>> I wanted to say:
> >>> TEST 1 as well as TEST 3 caused a reboot of node2!
> >>>
> >> Well, then there seems to be a problem with rcd_serial.
> >> According to the exit code (142 = 128 + 14), it seems like the
> >> plugin instance gets killed by the ALRM signal. The signal
> >> should've been caught, but there is something wrong with the
> >> registration of the signal handler.
> >>
> >> Looks like this fails unexpectedly:
> >>
> >> #if !defined(HAVE_POSIX_SIGNALS)
> >>
> >> because our autoconf doesn't do tests for signal implementation.
> >>
> >> Can you please try the attached patch? You'll have to rebuild
> >> the package for that.
> >>
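
For anyone who wants to see the 128 + 14 arithmetic without touching the
serial line, here is a minimal sketch. It is not rcd_serial code: the
helper name describe_status and the two rc_* variables are made up for
illustration, and it assumes a POSIX sh on Linux, where SIGALRM is
signal 14. The first child has no handler and is killed by the signal;
the second installs one with trap and exits normally.

```shell
#!/bin/sh
# Translate a shell exit status into a readable description.
# Statuses above 128 mean the process was terminated by signal (status - 128).
describe_status() {
    if [ "$1" -gt 128 ]; then
        echo "killed by signal $(( $1 - 128 ))"
    else
        echo "exited with status $1"
    fi
}

# Case 1: no handler installed, so SIGALRM terminates the child shell.
rc_unhandled=0
sh -c 'kill -ALRM $$' || rc_unhandled=$?
describe_status "$rc_unhandled"

# Case 2: a handler is installed with trap, so the child survives the
# signal and reaches its own "exit 0".
rc_handled=0
sh -c 'trap ":" ALRM; kill -ALRM $$; exit 0' || rc_handled=$?
describe_status "$rc_handled"
```

The first case corresponds to what TEST 1 and TEST 3 showed. The second
is how things should look once the handler registration works. To map a
signal number back to its name, use kill -l with the number.
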
> > If you've wondered which patch, here's finally one.
> >
> > Thanks,
> >
> > Dejan
> >
> > -------------- next part --------------
> > A non-text attachment was scrubbed...
> > Name: have-posix-signals.patch
> > Type: text/x-diff
> > Size: 1032 bytes
> > Desc: not available
> > URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20101103/a5cd5005/attachment-0001.bin>
> >
> Wow, success!
> 
> With your patch and additionally replacing 'dtr|rts' by 'dtr_rts' in
> rcd_serial.c, everything works fine!!!

Great.

> There are still some strange entries in /var/log/messages, but the
> STONITH action is performed correctly!
> 
> Just for your information, here are the messages:
> 
> Nov  3 16:41:50 node2 pengine: [5327]: WARN: stage6: Scheduling Node
> node1 for STONITH
> Nov  3 16:41:50 node2 stonith-ng: [5323]: WARN: parse_host_line: Could
> not parse (0 2): ** (process:8669): DEBUG: rcd_serial_set_config:called
> Nov  3 16:41:50 node2 stonith-ng: [5323]: WARN: parse_host_line: Could
> not parse (3 18): (process:8669): DEBUG: rcd_serial_set_config:called
> Nov  3 16:41:50 node2 stonith-ng: [5323]: WARN: parse_host_line: Could
> not parse (0 0):
> Nov  3 16:41:50 node2 pengine: [5327]: WARN: process_pe_message:
> Transition 102: WARNINGs found during PE processing. PEngine Input
> stored in: /var/lib/pengine/pe-warn-0.bz2
> Nov  3 16:41:52 node2 crmd: [5328]: notice: crmd_peer_update: Status
> update: Client node1/crmd now has status [offline] (DC=true)
> Nov  3 16:41:52 node2 crmd: [5328]: notice: run_graph: Transition 102
> (Complete=11, Pending=0, Fired=0, Skipped=23, Incomplete=11,
> Source=/var/lib/pengine/pe-warn-0.bz2): Stopped
> Nov  3 16:41:52 node2 lrmd: [5325]: ERROR: crm_abort: crm_strdup_fn:
> Triggered assert at utils.c:964 : src != NULL
> Nov  3 16:41:52 node2 lrmd: [5325]: ERROR: crm_strdup_fn: Could not
> perform copy at st_client.c:514 (stonith_api_device_metadata)

I guess that these two were fixed in the meantime. Can you post
the output of "crmd version"?

Thanks,

Dejan

> Nov  3 16:41:52 node2 lrmd: [5325]: WARN: stonith_api_device_metadata:
> no short description in rcd_serial's metadata.
> 
> Thank you very much!
>   Eberhard
> 
> 
> 
> 


