Reuters tutorial
- Reuters tutorial
- Step 1: Talk to Solr
- Step 2: Add a results widget
- Step 3: Add a pager widget
- Step 4: Add a tagcloud widget
- Step 5: Display the current filters
- Step 6: Add a free-text widget
- Step 7: Add an autocomplete widget
- Step 8: Add a map widget
- Step 9: Add a calendar widget
- Step 10: Extra credit
In this tutorial, we'll go step-by-step through building the AJAX Solr demo site.
Before we start, we write the HTML to which the JavaScript widgets will attach themselves. In practice, this HTML will often be the non-JavaScript version of your search interface, which you now want to improve with unobtrusive JS. The demo uses jQuery and jQuery UI.
Next part of the tutorial: Now, let's talk to Solr!
This section is not needed to go through the tutorial.
If you use Chef, see this recipe for deploying Solr.
If you want to run a local instance of the Solr server used in this demo, download a Solr index of the Reuters data:
Replace the data directory of your Solr instance with the data directory from one of the above tarballs. Alternatively, you can index the data yourself.
For Solr 4, you can use the configuration files distributed with AJAX Solr. If you are not using Solr 4, or want to use your own configuration files, add the following to the schema.xml file in the conf directory of your Solr instance (for example, solr-home/example/solr/collection1/conf/schema.xml):
<field name="places" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="countryCodes" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="topics" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="organisations" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="exchanges" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="companies" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="allText" type="text_general" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<copyField source="title" dest="allText"/>
<copyField source="text" dest="allText"/>
<copyField source="places" dest="allText"/>
<copyField source="topics" dest="allText"/>
<copyField source="companies" dest="allText"/>
<copyField source="exchanges" dest="allText"/>
In Solr > 3.5, replace the date field definition with the following (changes type to pdate):
<field name="date" type="pdate" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
In Solr > 3.5, add:
<field name="dateline" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
In Solr 4, add an optional copyField for dateline:
<copyField source="dateline" dest="allText"/>
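Before starting Solr, it can help to confirm that the fields listed above actually made it into schema.xml. The following is a minimal sketch, not part of AJAX Solr: the function name missing_fields is hypothetical, and it assumes that each definition is written with name as the first attribute, as in the snippets above.

```shell
# Hypothetical helper: print every field name from the argument list
# that has no <field name="..."> definition in the given schema file.
# Assumes the name attribute comes first, as in the snippets above.
missing_fields() {
  schema=$1
  shift
  for name in "$@"; do
    if ! grep -q "<field name=\"$name\"" "$schema"; then
      echo "$name"
    fi
  done
}
```

For example, missing_fields solr-home/example/solr/collection1/conf/schema.xml places countryCodes topics organisations exchanges companies allText prints nothing when every field is defined.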
In this example you'll be using two copies of Solr. One copy, solr-4x, will be a modern Solr instance with the schema.xml changes described above, and another copy, solr-js-r824380, will be an older Solr instance.
These partial instructions are based on the SolrJS wiki but have been modified. The commands below download the data from the Reuters-21578 Text Categorization Collection and check out old SolrJS code. The instructions don't yet include adding the Reuters data to the Solr index, because those commands have not been tested. A starting point for that follows the commands below.
svn checkout -r 824380 http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/ solr-js-r824380
cd solr-js-r824380 # remember this as the Top Level Solr Directory
ant dist # creates jar files used in later step
If you get an error message about get-nni, remove any reference to nni-1.0.0.jar from contrib/clustering/build.xml and try again.
This old Reuters injector code is based on Solr 3x, so in order to allow us to inject data into Solr 4x, we need to make a slight adjustment in the code.
Back up and edit the file solr-js-r824380/client/javascript/example/reuters/importer/java/org/apache/solr/solrjs/ReutersService.java. We'll add one line of Java code, shown as line 107 below:
104 public ReutersService(String solrUrl) {
105 try {
106 this.solrServer = new CommonsHttpSolrServer(solrUrl);
107 ((CommonsHttpSolrServer) this.solrServer).setParser( new org.apache.solr.client.solrj.impl.XMLResponseParser() ); // Remove "javabin" error
108 this.solrServer.ping();
109 } catch (Exception e) {
110 throw new RuntimeException("unable to connect to solr server: " + solrUrl, e );
111 }
112 }
This change is compiled when you run the next ant command.
Return to the top-level directory solr-js-r824380 to download and import the data:
# Run from solr-js-r824380 directory
cd client/javascript/example/reuters/testdata
curl -O http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz
tar xf reuters21578.tar.gz
cd ../../..
You should now be back in the directory solr-js-r824380/client/javascript.
Before importing, confirm that your Solr 4x instance is running on your local machine (localhost) on the default Solr port, 8983, and decide whether you need to set a collection name. If you are using the Solr 4x default core, collection1, you don't need to add it to the URL, and the old 3x Reuters code should work as is. If you're not sure, or want to specify a different collection or core name, edit the ant build.xml file. (The default collection is currently defined by the defaultCoreName attribute in solr-4x/example/solr/solr.xml, but this may change or be removed in future versions.)
To change the collection name (or the machine name or TCP/IP port number of your running Solr 4x server), back up and edit the file solr-js-r824380/client/javascript/build.xml to look like this, where my_collection is the name you want to use:
112 <java classname="org.apache.solr.solrjs.ReutersService" fork="true" dir="example/reuters/testdata">
113 <arg value="http://localhost:8983/solr/my_collection" />
We're almost ready to inject data. Make sure your Solr 4x instance is running with the modified schema.xml, and make sure the machine name, port and collection name (if not the default) have been changed in build.xml.
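One way to check that Solr is reachable at the URL you configured is to call its ping handler before importing. The sketch below is an illustration under stated assumptions: the function names solr_url and solr_is_up are hypothetical, curl must be installed, and the core is assumed to expose the /admin/ping handler that the stock example configuration defines.

```shell
# Hypothetical helper: build the Solr base URL from host, port, and an
# optional collection name. With no collection name, the URL points at
# the default core.
solr_url() {
  host=${1:-localhost}
  port=${2:-8983}
  collection=${3:-}
  if [ -n "$collection" ]; then
    echo "http://$host:$port/solr/$collection"
  else
    echo "http://$host:$port/solr"
  fi
}

# Hypothetical helper: succeed only if the ping handler answers with an
# HTTP success status. Assumes curl and an /admin/ping handler.
solr_is_up() {
  curl -sf -o /dev/null "$(solr_url "$@")/admin/ping"
}
```

For example, solr_is_up localhost 8983 my_collection && echo "Solr is reachable" prints the message only when the ping succeeds.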
Issue the command:
# Run from solr-js-r824380/client/javascript directory
ant reuters-import
If you get errors, switch to the Solr 4x window and look at the errors there. The most common mistake is a field missing from the 4x schema.xml.
If you get an error about "javabin", make sure you've made the change to ReutersService.java discussed above. Setting the server parser to XMLResponseParser allows Solr 3x clients to talk to Solr 4x servers.
The main class being run is solr-js-r824380/client/javascript/example/reuters/importer/java/org/apache/solr/solrjs/ReutersService.java, which defines the importer that the ant reuters-import command runs. The instructions above only set up the data.
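To see whether the import added any documents, you can ask Solr for a match-all count. This is a rough sketch rather than a tested part of the procedure: the function names num_found and count_docs are hypothetical, curl is assumed to be installed, and the query assumes the default select handler with JSON output (wt=json).

```shell
# Hypothetical helper: read a Solr JSON response on stdin and print the
# value of numFound (the total number of matching documents).
num_found() {
  sed -n 's/.*"numFound": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: run a match-all query that returns no rows and
# print the document count. $1 is the Solr base URL.
count_docs() {
  curl -s "$1/select?q=*:*&rows=0&wt=json" | num_found
}
```

For example, count_docs http://localhost:8983/solr prints the number of indexed documents for the default core. A count of 0, or no output at all, suggests the import did not reach this core.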
(Attribution: The demo site is based in part on the SolrJS demo site.)