Testing & Test data
Searchdaimon has collected some data sets useful for testing. More information is available at http://www.opentestsearch.com/test-sets/ . We normally use the "English Wikipedia as html files" set for quick tests, and all three of "English Wikipedia as html files", "Enron files" and "File samples" when a more thorough test is needed.
attack.pl is a script that runs a list of queries against the ES. It is normally run before and after a change to see what impact the change had on search results. It takes a host name and a file of queries.
Some queries that match the test collection are available in the meta/testqueries.txt file (more information about why we use some of these queries is available at http://www.opentestsearch.com/example-queries/ ).
For example:
perl perl/attack.pl localhost meta/testqueries.txt
This uses each line in meta/testqueries.txt as a query and runs a search against localhost for each of those queries. The output will look something like this:
----------------------------------------------------
| Query                       | Time    | Hits | T |
----------------------------------------------------
| "to be or not to be"        | 0.12568 | 0    | 0 |
| desert sky report           | 0.34322 | 41   | 0 |
| Doug Birdsall party         | 0.12558 | 0    | 0 |
| gas base contract           | 0.30690 | 112  | 0 |
| "State and local officials" | 0.12390 | 0    | 0 |
Argument | Description
-u user | User. Username and password separated by a ":". For example -u demo:12345
Developers often want to make improvements that are not supposed to change which documents are returned for queries. For example, an improvement that speeds up disk I/O shouldn't change which documents are returned for any query.
If the number of documents has changed after your improvement, your change may have unintended consequences you are not aware of. It is therefore recommended that you record the number of hits for a set of queries, make the change, and then check whether any search results have changed.
You can use the -s (silent) switch and redirect the results to a file to help check for such errors:
perl perl/attack.pl -s localhost meta/testqueries.txt > ~/attach.pre.txt
Make your changes, recompile, etc.
perl perl/attack.pl -s localhost meta/testqueries.txt > ~/attach.post.txt
Diff the two files:
diff -u ~/attach.pre.txt ~/attach.post.txt
If you don't see any changes, the improvement probably doesn't interfere with the search results.
However, if you get output like this, you must investigate:
--- /home/boitho/attach.pre.txt 2013-07-10 20:39:52.000000000 +0200
+++ /home/boitho/attach.post.txt 2013-07-10 21:02:13.000000000 +0200
@@ -36,7 +36,7 @@
 | the the the the the the the the the the | 0.00000 | 5,765 | 0 |
 | the | 0.00000 | 5,765 | 0 |
 | of | 0.00000 | 5,775 | 0 |
-| and | 0.00000 | 5,751 | 0 |
+| and | 0.00000 | 5,750 | 0 |
 | to | 0.00000 | 5,749 | 0 |
 | in | 0.00000 | 5,735 | 0 |
 | that | 0.00000 | 5,584 | 0 |
@@ -45,7 +45,7 @@
 | he | 0.00000 | 4,116 | 0 |
 | for | 0.00000 | 5,704 | 0 |
 | it | 0.00000 | 5,578 | 0 |
-| with | 0.00000 | 5,669 | 0 |
+| with | 0.00000 | Error | 0 |
 | as | 0.00000 | 5,656 | 0 |
 | his | 0.00000 | 4,168 | 0 |
 | on | 0.00000 | 5,696 | 0 |
Here we see that all queries returned the same number of results, except that the search for "and" returned one result fewer and the search for "with" failed altogether. This means that your improvement did change the search results. If that was not what you intended, you must investigate.
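When the query list is long, it can help to list only the affected queries instead of reading the whole diff. The following is a minimal sketch, not part of the ES source tree: changed_hits is a hypothetical helper name, and it assumes the row layout shown above (| query | time | hits | T |), that no query contains a "|" character, and that each query appears once per file. Queries present in only one of the files are not reported.

```shell
# changed_hits PRE POST
# Hypothetical helper: prints "query: old -> new" for every query whose
# Hits column differs between two saved attack.pl outputs.
changed_hits() {
    awk -F'|' '
        # Strip leading and trailing blanks from a table cell.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

        NF < 5 { next }                  # separator lines and blank lines
        { q = trim($2); h = trim($4) }   # column 2 = query, column 4 = hits
        q == "Query" { next }            # header row, if present

        # First file: remember the hit count for each query.
        NR == FNR { before[q] = h; next }

        # Second file: report queries whose hit count changed.
        # The "" concatenation forces a string comparison.
        (q in before) && (before[q] "") != (h "") {
            printf "%s: %s -> %s\n", q, before[q], h
        }
    ' "$1" "$2"
}
```

It could be called as changed_hits ~/attach.pre.txt ~/attach.post.txt once both files have been recorded.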
It is of course fine to make improvements that change the search results; that is quite common. Just be sure that the change is what you intended. A simple speed improvement should not change the search results, and if it does, something has gone wrong. On the other hand, a fix to a file filter that lets it extract more text from a file is great, and will probably change the number of documents returned for many queries.
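If the before/after comparison described above is run often, the diff step can be wrapped so that its exit status gives a verdict. This is only a sketch: compare_runs is a hypothetical helper name, not something shipped with the ES. It relies on the standard diff exit codes (0 for identical files, 1 for differences, greater than 1 for trouble such as a missing file).

```shell
# compare_runs PRE POST
# Hypothetical helper: diffs two saved attack.pl outputs, prints a verdict,
# and returns the exit status of diff.
compare_runs() {
    rc=0
    diff -u "$1" "$2" || rc=$?
    case $rc in
        0) echo "OK: search results unchanged" ;;
        1) echo "CHANGED: search results differ, check that this was intended" ;;
        *) echo "ERROR: diff could not compare the files" ;;
    esac
    return "$rc"
}
```

For example, compare_runs ~/attach.pre.txt ~/attach.post.txt prints the unified diff (if any) followed by the verdict line. Whether a CHANGED verdict is a problem still depends on what the change was meant to do.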