This code was developed and tested in the Linux environment (Red Hat Enterprise Linux 5).

# Requirements

Maven 3.x: (to compile most of the repositories)

Maven 2.x: (to compile the PwaLucene project)
* download from http://maven.apache.org/download.html
* add the bin directory to PATH (you need a version of Maven 2 in order to compile the PwaLucene project)

Apache Ant:
* `yum install ant`
* (see more at http://ant.apache.org/manual/install.html)

Subversion:
* `yum install subversion`
* (see more at http://subversion.apache.org/packages.html)

Java SE 6.x (at least): (you will need a version of Java 6 to compile the PwaLucene project)
* `export JAVA_HOME=/usr/java/default`
* (see more at http://www.oracle.com/technetwork/java/javase/downloads/index.html)

Apache Tomcat (5.5 at least):
* (see http://tomcat.apache.org/download-55.cgi)

# Step-by-step

Check out Hadoop (branch-0.14):
* `git clone -b branch-0.14 https://github.com/arquivo/hadoop-common`

Install Hadoop: (compile with Java 8 and Maven 3)
* `cd hadoop-common`
* `mvn install`

This version of Hadoop (http://hadoop.apache.org/) must be used for all MapReduce processing.
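Because the projects need different toolchains (Java 6 with Maven 2 for PwaLucene, Java 8 with Maven 3 for Hadoop and PwaArchiveAccess), it can help to confirm which `java` and `mvn` are on PATH before each build. The following is a minimal sketch, not part of the project; the function names `java_major`, `maven_major` and `check_toolchain` are hypothetical helpers, and the parsing assumes the usual first-line formats of `java -version` and `mvn -v`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report the major versions of the active java and mvn.

# Extract the Java major version from `java -version` output.
# Handles the legacy "1.x" scheme ("1.6.0_45" -> 6) and the newer one ("11.0.2" -> 11).
java_major() {
  local v
  v=$(printf '%s\n' "$1" | sed -nE 's/.*version "([^"]+)".*/\1/p' | head -n 1)
  case "$v" in
    "") return 1 ;;
    1.*) v=${v#1.}; printf '%s\n' "${v%%.*}" ;;
    *) printf '%s\n' "${v%%[.+_-]*}" ;;
  esac
}

# Extract the Maven major version from `mvn -v` output ("Apache Maven 3.6.3 (...)" -> 3).
maven_major() {
  local v
  v=$(printf '%s\n' "$1" | sed -nE 's/.*Apache Maven ([0-9]+)\..*/\1/p' | head -n 1)
  [ -n "$v" ] || return 1
  printf '%s\n' "$v"
}

# Fail unless the java and mvn found on PATH have the expected major versions.
# Usage: check_toolchain <java-major> <maven-major>
check_toolchain() {
  local want_java=$1 want_mvn=$2 got_java got_mvn
  got_java=$(java_major "$(java -version 2>&1)") || { echo "cannot determine Java version" >&2; return 1; }
  got_mvn=$(maven_major "$(mvn -v 2>&1)") || { echo "cannot determine Maven version" >&2; return 1; }
  if [ "$got_java" != "$want_java" ] || [ "$got_mvn" != "$want_mvn" ]; then
    echo "expected Java $want_java / Maven $want_mvn, found Java $got_java / Maven $got_mvn" >&2
    return 1
  fi
  echo "OK: Java $got_java / Maven $got_mvn"
}
```

For example, `check_toolchain 8 3` before building Hadoop or PwaArchiveAccess, and `check_toolchain 6 2` before building PwaLucene.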

Check out PwaLucene and PwaArchiveAccess:

Install PwaLucene: (requires Maven 2.x and Java 6, see Requirements)
* `cd pwa-technologies/PwaLucene`
* `mvn install`

Install PwaArchiveAccess: (compile with Java 8 and Maven 3)
* `cd pwa-technologies/PwaArchive-access`
* `mvn install`
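Switching between the Java 6/Maven 2 and Java 8/Maven 3 toolchains by hand is easy to get wrong. One possible approach, sketched below under assumptions, is to run each build in a subshell with its own `JAVA_HOME` and PATH. The helper names `with_toolchain` and `build_pwa` are hypothetical, and the install locations (`/opt/jdk1.6.0`, `/opt/apache-maven-2.2.1`, `/opt/jdk1.8.0`, `/opt/apache-maven-3`) are placeholder examples that you would override with your own paths.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with JAVA_HOME set and the given JDK and
# Maven bin directories placed first on PATH. It runs in a subshell, so the
# caller's environment is left unchanged.
# Usage: with_toolchain <java-home> <maven-home> <command> [args...]
with_toolchain() {
  local java_home=$1 maven_home=$2
  shift 2
  (
    export JAVA_HOME="$java_home"
    export PATH="$java_home/bin:$maven_home/bin:$PATH"
    "$@"
  )
}

# PLACEHOLDER install locations: override via the environment to match your machine.
JDK6_HOME=${JDK6_HOME:-/opt/jdk1.6.0}
MAVEN2_HOME=${MAVEN2_HOME:-/opt/apache-maven-2.2.1}
JDK8_HOME=${JDK8_HOME:-/opt/jdk1.8.0}
MAVEN3_HOME=${MAVEN3_HOME:-/opt/apache-maven-3}

# Build PwaLucene (Java 6 + Maven 2), then PwaArchiveAccess (Java 8 + Maven 3).
# Call from the directory that contains pwa-technologies; stops at the first failure.
build_pwa() {
  (cd pwa-technologies/PwaLucene && with_toolchain "$JDK6_HOME" "$MAVEN2_HOME" mvn install) &&
  (cd pwa-technologies/PwaArchive-access && with_toolchain "$JDK8_HOME" "$MAVEN3_HOME" mvn install)
}
```

For example, `JDK6_HOME=/path/to/jdk6 MAVEN2_HOME=/path/to/maven2 build_pwa` after sourcing the script. Using subshells keeps each build's `JAVA_HOME` from leaking into the next one.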

The builds generate the following artifacts (paths relative to the directory containing pwa-technologies):
* `pwa-technologies/PwaArchive-access/projects/nutchwax/nutchwax-job/target/nutchwax-job-0.11.0-SNAPSHOT.jar`
* `pwa-technologies/PwaArchive-access/projects/nutchwax/nutchwax-webapp/target/nutchwax-webapp-0.11.0-SNAPSHOT.war`
* `pwa-technologies/PwaArchive-access/projects/wayback/wayback-webapp/target/wayback-1.2.1.war`
* `pwa-technologies/PwaLucene/target/pwalucene-1.0.0-SNAPSHOT.jar`
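A quick way to confirm that all four artifacts were produced is to test for each path. The sketch below assumes the artifact names stay exactly as listed (they carry version numbers, so they may change in other releases); `missing_artifacts` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each expected artifact that does not exist.
# Returns 1 if any artifact is missing, 0 otherwise.
# Usage: missing_artifacts [base-dir]   (default: current directory,
#        i.e. the directory that contains pwa-technologies)
missing_artifacts() {
  local base=${1:-.} status=0 path
  for path in \
    pwa-technologies/PwaArchive-access/projects/nutchwax/nutchwax-job/target/nutchwax-job-0.11.0-SNAPSHOT.jar \
    pwa-technologies/PwaArchive-access/projects/nutchwax/nutchwax-webapp/target/nutchwax-webapp-0.11.0-SNAPSHOT.war \
    pwa-technologies/PwaArchive-access/projects/wayback/wayback-webapp/target/wayback-1.2.1.war \
    pwa-technologies/PwaLucene/target/pwalucene-1.0.0-SNAPSHOT.jar
  do
    if [ ! -f "$base/$path" ]; then
      echo "missing: $path"
      status=1
    fi
  done
  return "$status"
}
```

Running `missing_artifacts` with no output and a zero exit status means every listed artifact is in place.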

Link the bundled Nutch into the nutch-trec project. This is only necessary if you will use the TREC datasets for tests:
* `cd pwa-technologies/PwaArchive-access`
* `ln -s ../../projects/nutchwax/nutchwax-thirdparty/nutch/ projects/nutch-trec/`
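Running that `ln -s` a second time fails because the link already exists. A minimal idempotent variant is sketched below; `link_nutch_trec` is a hypothetical helper name, and it spells out the link name `projects/nutch-trec/nutch`, which is what the step creates when `projects/nutch-trec/` is an existing directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create projects/nutch-trec/nutch -> nutchwax's bundled nutch,
# skipping the step if the symlink is already there.
# Usage: link_nutch_trec [PwaArchive-access-dir]   (default: current directory)
link_nutch_trec() {
  local dir=${1:-.}
  local link="$dir/projects/nutch-trec/nutch"
  if [ -L "$link" ]; then
    echo "already linked: $link"
    return 0
  fi
  if [ -e "$link" ]; then
    echo "refusing to replace existing non-symlink: $link" >&2
    return 1
  fi
  # Relative target, resolved from projects/nutch-trec/, as in the step above.
  ln -s ../../projects/nutchwax/nutchwax-thirdparty/nutch "$link"
}
```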