This repository has been archived by the owner on May 24, 2024. It is now read-only.
carrot2/folder2index
folder2index
------------

Converts PDF, TXT or HTML documents to a Lucene index (for use with Carrot2 Clustering Workbench).

Quick usage guide
-----------------

- install Apache Maven.
- run: mvn clean package
- cd target
- prepare a folder FOO with your PDF, HTML or plain text files. Prepare an empty folder BAR for the index.
- run: java -jar folder2index-0.0.2.jar --folder FOO --index BAR --use-tika

The index will be created. Download and open Carrot2 Workbench.

http://project.carrot2.org/download.html

Select Lucene as the document source and pick the correct fields for the title, content and URL (pick file path as the URL field).

http://download.carrot2.org/head/manual/index.html#section.getting-started.lucene

Select other input options (how many results to cluster, query or *:*) and run your clustering.
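Before running the converter, it can help to confirm that FOO actually contains documents of the expected types and that BAR is empty. The following is a minimal sketch of such a check, not part of folder2index itself: the class name PreflightCheck, its method names and the extension list are assumptions made for illustration, and with --use-tika the tool may well accept more file types than the ones listed here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

/** Hypothetical helper: sanity-checks the FOO and BAR folders before indexing. */
public class PreflightCheck {
    // Assumed extension list, based on the "PDF, TXT or HTML" wording of the README.
    static final List<String> EXTENSIONS = Arrays.asList(".pdf", ".txt", ".html", ".htm");

    /** True if the file name ends with one of the assumed extensions (case-insensitive). */
    static boolean isSupported(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return EXTENSIONS.stream().anyMatch(lower::endsWith);
    }

    /** Counts regular files under the folder (recursively) that look like supported documents. */
    static long countSupported(Path folder) throws IOException {
        try (Stream<Path> files = Files.walk(folder)) {
            return files.filter(Files::isRegularFile)
                        .filter(p -> isSupported(p.getFileName().toString()))
                        .count();
        }
    }

    /** True if the path is an existing directory with no entries. */
    static boolean isEmptyDirectory(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PreflightCheck FOO BAR");
            System.exit(2);
        }
        Path documents = Paths.get(args[0]);
        Path index = Paths.get(args[1]);
        System.out.println("Supported documents found: " + countSupported(documents));
        System.out.println("Index folder is empty: " + isEmptyDirectory(index));
    }
}
```

Run it with the same two folders you intend to pass to --folder and --index. A count of zero or a non-empty index folder suggests fixing the folders before starting the conversion.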
About
Creates a Lucene index out of files from a local folder