[Pacemaker] CTS outputs Single search timed out.
Andrew Beekhof
andrew at beekhof.net
Thu Jan 27 08:03:10 UTC 2011
2011/1/27 nozawat <nozawat at gmail.com>:
> Hi
>
> I was able to complete CTS.
> The problem points were the following.
> 1) Python 2.5 or later is required.
> However, RHEL 5.5 only ships Python 2.4,
> so I ran CTS with Python 2.6.5 on RHEL 6.0.
Ah!
> 2) The following environment variables are necessary:
> * cluster_log=/share/ha/logs/ha-log-local7
> * cluster_hosts="cts0101 cts0102"
CTS doesn't do anything with environment variables - unless you're
using my cts-run() function from the release testing page.
It is also sufficient to set them on the command line (as you did) with:
--nodes "cts0201 cts0202" and --logfile /share/ha/logs/ha-log-local7
> 3) I ended up adding stonith-enabled to cib-bootstrap-options so that the
> supplied cib.xml would load cleanly.
> The following errors occur otherwise:
> -----
> Jan 27 12:02:39 BadNews: Jan 27 12:01:17 cts0201 pengine: [14630]: ERROR:
> unpack_resources: Resource start-up disabled since no STONITH resources have
> been defined
> Jan 27 12:02:39 BadNews: Jan 27 12:01:17 cts0201 pengine: [14630]: ERROR:
> unpack_resources: Either configure some or disable STONITH with the
> stonith-enabled option
> Jan 27 12:02:39 BadNews: Jan 27 12:01:17 cts0201 pengine: [14630]: ERROR:
> unpack_resources: NOTE: Clusters with shared data need STONITH to ensure
> data integrity
> ----
That looks pretty normal; we don't test with --stonith no
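Presumably that property was set to false, to match --stonith no. For
reference, here is a minimal Python sketch that merely prints the
corresponding cib-bootstrap-options fragment (the nvpair id is an arbitrary,
illustrative choice, and merging it into cib.xml is left out):
---
# Emit the crm_config property set that disables STONITH.
# Illustrative only; the nvpair id below is an arbitrary choice.
import xml.etree.ElementTree as ET

props = ET.Element("cluster_property_set", id="cib-bootstrap-options")
ET.SubElement(props, "nvpair",
              id="cib-bootstrap-options-stonith-enabled",
              name="stonith-enabled", value="false")

print(ET.tostring(props, encoding="unicode"))
---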
> The log from when I ran CTS is as follows.
> ---
> [buildbot at bbs02 /usr/share/pacemaker/tests/cts]$ python CTSlab.py --nodes
> "cts0201 cts0202" --at-boot 1 --stack corosync --stonith no --logfile
> /share/ha/logs/ha-log-local7 --syslog-facility local7 --cib-filename
> /share/ha/cib.xml 10
> Jan 27 15:48:41 Random seed is: 1296110921
> Jan 27 15:48:41 >>>>>>>>>>>>>>>> BEGINNING 10 TESTS
> Jan 27 15:48:41 Stack: corosync (flatiron)
> Jan 27 15:48:41 Schema: pacemaker-1.0
> Jan 27 15:48:41 Scenario: Random Test Execution
> Jan 27 15:48:41 Random Seed: 1296110921
> Jan 27 15:48:41 System log files: /share/ha/logs/ha-log-local7
> Jan 27 15:48:41 Cluster nodes:
> Jan 27 15:48:41 * cts0201
> Jan 27 15:48:41 * cts0202
> Jan 27 15:48:53 Testing for syslog logs
> Jan 27 15:48:53 Testing for remote logs
> Jan 27 15:49:31 Continuing with remote-based log reader
> Jan 27 15:49:42 Stopping Cluster Manager on all nodes
> Jan 27 15:49:42 Starting Cluster Manager on all nodes.
> Jan 27 15:49:42 Starting crm-flatiron on node cts0201
> Jan 27 15:51:07 Starting crm-flatiron on node cts0202
> Jan 27 15:52:40 Running test SimulStop (cts0202) [ 1]
> Jan 27 15:53:39 Running test NearQuorumPoint (cts0202) [ 2]
> Jan 27 15:55:34 Running test ComponentFail (cts0201) [ 3]
> Jan 27 15:57:40 Running test Reattach (cts0202) [ 4]
> Jan 27 16:02:35 Running test SimulStop (cts0201) [ 5]
> Jan 27 16:03:26 Running test SpecialTest1 (cts0201) [ 6]
> Jan 27 16:06:19 Running test ComponentFail (cts0201) [ 7]
> Jan 27 16:07:17 Running test SpecialTest1 (cts0201) [ 8]
> Jan 27 16:10:47 Running test ComponentFail (cts0201) [ 9]
> Jan 27 16:11:42 BadNews: Jan 27 16:11:03 cts0201 crmd: [23399]: ERROR:
> stonithd_op_result_ready: not signed on
Was there a bug here?
Otherwise it looks good, glad you were able to get it going in the end.
> Jan 27 16:11:45 Running test ResourceRecover (cts0202) [ 10]
> Jan 27 16:11:46 No active resources on cts0202
> Jan 27 16:12:03 Stopping Cluster Manager on all nodes
> Jan 27 16:12:03 Stopping crm-flatiron on node cts0201
> Jan 27 16:12:26 Stopping crm-flatiron on node cts0202
> Jan 27 16:13:09 ****************
> Jan 27 16:13:09 Overall Results:{'failure': 0, 'skipped': 0, 'success': 10, 'BadNews': 1}
> Jan 27 16:13:09 ****************
> Jan 27 16:13:09 Test Summary
> Jan 27 16:13:09 Test Flip: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 0}
> Jan 27 16:13:09 Test Restart: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 0}
> Jan 27 16:13:09 Test StartOnebyOne: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 0}
> Jan 27 16:13:09 Test SimulStart: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 0}
> Jan 27 16:13:09 Test SimulStop: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 2}
> Jan 27 16:13:09 Test StopOnebyOne: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 0}
> Jan 27 16:13:09 Test RestartOnebyOne: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 0}
> Jan 27 16:13:09 Test PartialStart: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 0}
> Jan 27 16:13:09 Test Standby: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 0}
> Jan 27 16:13:09 Test ResourceRecover: {'auditfail': 0, 'failure': 0, 'skipped': 1, 'calls': 1}
> Jan 27 16:13:09 Test ComponentFail: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 3}
> Jan 27 16:13:09 Test Reattach: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 1}
> Jan 27 16:13:09 Test SpecialTest1: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 2}
> Jan 27 16:13:09 Test NearQuorumPoint: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'calls': 1}
> Jan 27 16:13:09 <<<<<<<<<<<<<<<< TESTS COMPLETED
> -----
>
> The URL I referred to is as follows.
> http://www.clusterlabs.org/wiki/Release_Testing
>
> Regards,
> Tomo
>
> 2011/1/26 20:01 nozawat <nozawat at gmail.com>:
>>
>> Hi Andrew
>>
>> Where is the log file name used by cts_log_watcher.py set?
>> cts_log_watcher.py is created in /tmp, but the filename inside it does not
>> seem to have been changed from /var/log/messages.
>> Or perhaps the filename is not being passed in by CTSlab.py.
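As an aside, the installed file does not have to contain the log path at all
if the path is handed over on each invocation. A purely hypothetical sketch
of that idea follows (this is not the cts_log_watcher.py that CTS installs,
and the --filename/--offset options are made up for illustration):
---
# Hypothetical remote log reader: the log path arrives as an argument every
# time the script is run over ssh, so nothing inside the file needs editing.
import argparse

def read_since(path, offset):
    # Return the new end-of-file position plus everything appended
    # after byte position 'offset'.
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
        return f.tell(), data.decode("utf-8", "replace").splitlines()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--filename", default="/var/log/messages")
    parser.add_argument("--offset", type=int, default=0)
    args = parser.parse_args()
    end, lines = read_since(args.filename, args.offset)
    for line in lines:
        print(line)
    print("offset=%d" % end)
---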
>>
>> Regards,
>> Tomo
>>
>>> 2011/1/22 11:15 nozawat <nozawat at gmail.com>:
>>>
>>> Hi
>>>
>>> Thank you for your reply.
>>> I stopped the script with Ctrl+C after "Audit LogAudit FAILED" was output.
>>> * bbs01-console.log -> central server console log
>>> * ha-log-local7-bbs01 -> central server
>>> * ha-log-local7-cts0101 -> cts server 1
>>> * ha-log-local7-cts0102 -> cts server 2
>>>
>>> The real file name of the server log is ha-log-local7.
>>> I renamed the files in order to send them by email.
>>> They are created under /share/ha/logs on all of the servers.
>>>
>>> BTW, a file is created in /tmp.
>>> Is there any problem with the permissions shown below?
>>> ---
>>> [11:14:23][root at bbs01 ~]$ ll /tmp
>>> -rw-r--r-- 1 root root 1612 Jan 22 10:44 cts_log_watcher.py
>>> [11:12:39][root at cts0101 ~]$ ll /tmp
>>> -rw-r--r-- 1 root root 1612 Jan 22 10:44 cts_log_watcher.py
>>> [11:13:36][root at cts0102 ~]$ ll /tmp
>>> -rw-r--r-- 1 root root 1612 Jan 22 10:44 cts_log_watcher.py
>>> ---
>>>
>>> Regards,
>>> Tomo
>>>
>>> 2011/1/22 Andrew Beekhof <andrew at beekhof.net>
>>>>
>>>> On Fri, Jan 21, 2011 at 4:38 PM, nozawat <nozawat at gmail.com> wrote:
>>>> > Hi
>>>> >
>>>> > Thank you for your reply.
>>>> > I am logging to a central server and also on both of the nodes under
>>>> > test.
>>>> > The test message is output on both the central server and the test
>>>> > nodes.
>>>>
>>>> Can we see it please?
>>>>
>>>> >
>>>> > Regards,
>>>> > Tomo
>>>> >
>>>> > 2011/1/21 Andrew Beekhof <andrew at beekhof.net>
>>>> >>
>>>> >> On Fri, Jan 21, 2011 at 6:03 AM, nozawat <nozawat at gmail.com> wrote:
>>>> >> > Hi
>>>> >> >
>>>> >> > I ran CTS in the following environment.
>>>> >> > * OS:RHEL5.5-x86_64
>>>> >> > * pacemaker-1.0.9.1-1.15.el5
>>>> >> > * TDN(bbs01)
>>>> >> > * TNNs(cts0101 cts0102)
>>>> >> >
>>>> >> > It is probably a phenomenon like the following one:
>>>> >> > http://www.gossamer-threads.com/lists/linuxha/pacemaker/69322
>>>> >> >
>>>> >> > Passwordless SSH login -> OK.
>>>> >> > Syslog message transfer via syslog-ng -> OK.
>>>> >>
>>>> >> You're logging to a central server? The same server you're running
>>>> >> CTS on?
>>>> >> If so, what are the contents of /share/ha/logs/ha-log-local7 on that
>>>> >> machine? Because that is where CTS is looking.
>>>> >>
>>>> >> >
>>>> >> > -------
>>>> >> > $ python /usr/share/pacemaker/tests/cts/CTSlab.py --nodes "cts0101
>>>> >> > cts0102"
>>>> >> > --at-boot 1 --stack heartbeat --stonith no --logfile
>>>> >> > /share/ha/logs/ha-log-local7 --syslog-facility local7 1
>>>> >> > Jan 21 13:23:08 Random seed is: 1295583788
>>>> >> > Jan 21 13:23:08 >>>>>>>>>>>>>>>> BEGINNING 1 TESTS
>>>> >> > Jan 21 13:23:08 Stack: heartbeat
>>>> >> > Jan 21 13:23:08 Schema: pacemaker-1.0
>>>> >> > Jan 21 13:23:08 Scenario: Random Test Execution
>>>> >> > Jan 21 13:23:08 Random Seed: 1295583788
>>>> >> > Jan 21 13:23:08 System log files: /share/ha/logs/ha-log-local7
>>>> >> > Jan 21 13:23:08 Cluster nodes:
>>>> >> > Jan 21 13:23:08 * cts0101
>>>> >> > Jan 21 13:23:08 * cts0102
>>>> >> > Jan 21 13:23:12 Testing for syslog logs
>>>> >> > Jan 21 13:23:12 Testing for remote logs
>>>> >> > Jan 21 13:24:16 Restarting logging on: ['cts0101', 'cts0102']
>>>> >> > Jan 21 13:25:49 Restarting logging on: ['cts0101', 'cts0102']
>>>> >> > Jan 21 13:28:21 Restarting logging on: ['cts0101', 'cts0102']
>>>> >> > Jan 21 13:31:54 Restarting logging on: ['cts0101', 'cts0102']
>>>> >> > Jan 21 13:35:54 ERROR: Cluster logging unrecoverable.
>>>> >> > Jan 21 13:35:54 Audit LogAudit FAILED.
>>>> >> > -----
>>>> >> >
>>>> >> > I ran it with heartbeat, but a similar error occurs with corosync.
>>>> >> > I get the "Single search timed out" error in the log, and it appears
>>>> >> > to retry.
>>>> >> > -----
>>>> >> > Jan 21 13:23:11 bbs01 CTS: debug: Audit DiskspaceAudit passed.
>>>> >> > Jan 21 13:23:12 bbs01 CTS: Testing for syslog logs
>>>> >> > Jan 21 13:23:12 bbs01 CTS: Testing for remote logs
>>>> >> > Jan 21 13:23:12 bbs01 CTS: debug: lw:
>>>> >> > cts0101:/share/ha/logs/ha-log-local7:
>>>> >> > Installing /tmp/cts_log_watcher.py on cts0101
>>>> >> > Jan 21 13:23:12 bbs01 CTS: debug: lw:
>>>> >> > cts0102:/share/ha/logs/ha-log-local7:
>>>> >> > Installing /tmp/cts_log_watcher.py on cts0102
>>>> >> > Jan 21 13:23:13 cts0102 logger: Test message from cts0102
>>>> >> > Jan 21 13:23:13 cts0101 logger: Test message from cts0101
>>>> >> > Jan 21 13:23:44 bbs01 CTS: debug: lw: LogAudit: Single search timed
>>>> >> > out:
>>>> >> > timeout=30, start=1295583793, limit=1295583824, now=1295583824
>>>> >> > Jan 21 13:24:16 bbs01 CTS: debug: lw: LogAudit: Single search timed
>>>> >> > out:
>>>> >> > timeout=30, start=1295583824, limit=1295583855, now=1295583856
>>>> >> > Jan 21 13:24:16 bbs01 CTS: Restarting logging on: ['cts0101',
>>>> >> > 'cts0102']
>>>> >> > Jan 21 13:24:16 bbs01 CTS: debug: cmd: async: target=cts0101,
>>>> >> > rc=22203:
>>>> >> > /etc/init.d/syslog-ng restart 2>&1 > /dev/null
>>>> >> > Jan 21 13:24:16 bbs01 CTS: debug: cmd: async: target=cts0102,
>>>> >> > rc=22204:
>>>> >> > /etc/init.d/syslog-ng restart 2>&1 > /dev/null
>>>> >> > Jan 21 13:25:17 cts0102 logger: Test message from cts0102
>>>> >> > Jan 21 13:25:17 cts0101 logger: Test message from cts0101
>>>> >> > -----
>>>> >> >
>>>> >> > The test cases do seem to be carried out after this error.
>>>> >> > However, the script finishes with an error because "Audit LogAudit
>>>> >> > FAILED" occurs.
>>>> >> > Is this the expected outcome for a CTS run?
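Conceptually, the "Single search timed out" debug lines above describe a
bounded search: keep scanning new log lines for the expected pattern until a
limit derived from start + timeout is reached, then report the timestamps and
let the caller retry. A minimal sketch of that idea (not the actual CTS code;
the readline callable stands in for whatever feeds lines from the remote log
reader):
---
# Bounded single search: scan lines until the pattern appears or the
# deadline passes, then report the same fields the debug message shows.
import re
import time

def single_search(readline, pattern, timeout=30):
    start = int(time.time())
    limit = start + timeout
    regex = re.compile(pattern)
    while int(time.time()) <= limit:
        line = readline()          # next line from the log, or None
        if line and regex.search(line):
            return line
        time.sleep(1)
    now = int(time.time())
    print("Single search timed out: timeout=%d, start=%d, limit=%d, now=%d"
          % (timeout, start, limit, now))
    return None
---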
>>>> >> >
>>>> >> > Regards,
>>>> >> > Tomo
>>>> >> >
>>>> >> >