The Geoparser is a software tool that can process information from any type of file, extract geographic coordinates, and visualize locations on a map. Users who want a geographical representation of their information or data can search for locations with the Geoparser, either through a search index or by uploading files from their computer. The Geoparser parses the files and visualizes cities or latitude-longitude points on the map. Once the information is parsed and the points are plotted, users can filter their results by density, or by searching for a keyword and applying a "facet" to the parsed information. On the map, users can click a location point to reveal more information about the location and how it relates to their search.
docker build -t nasajplmemex/geo-parser --no-cache -f Dockerfile .
docker-compose up -d
- Visit http://localhost:8000 in your browser
GeoParser has been updated with a new, easy-to-use Docker install and an example that downloads and runs the COVID-19 literature data and displays its locations. Use that example to explore and test GeoParser on a real dataset.
- Python 2.7
- pip
- Django
- Tika Python
- Install Python requirements
pip install -r requirements.txt
- Run Solr. Change directory to where you cloned the project:
cd Solr/solr-5.3.1/
./bin/solr start
- Clone lucene-geo-gazetteer repo
git clone https://github.com/chrismattmann/lucene-geo-gazetteer.git
cd lucene-geo-gazetteer
mvn install assembly:assembly
Add lucene-geo-gazetteer/src/main/bin to your PATH environment variable.
Make sure it is working:
lucene-geo-gazetteer --help
usage: lucene-geo-gazetteer
-b,--build <gazetteer file> The Path to the Geonames allCountries.txt
-h,--help Print this message.
-i,--index <directoryPath> The path to the Lucene index directory to either create or read
-s,--search <set of location names> Location names to search the Gazetteer for
- You will now need to build a Gazetteer using the Geonames.org dataset (1.2 GB).
cd lucene-geo-gazetteer
curl -O http://download.geonames.org/export/dump/allCountries.zip
unzip allCountries.zip
lucene-geo-gazetteer -i geoIndex -b allCountries.txt
Make sure it is working:
lucene-geo-gazetteer -s Pasadena Texas
[
{"Texas" : [ "Texas", "-91.92139", "18.05333" ]},
{"Pasadena" : [ "Pasadena", "-74.06446", "4.6964" ]}
]
Now start the lucene-geo-gazetteer server:
lucene-geo-gazetteer -server
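The search output shown in this step is JSON, so it can be consumed from a script. The following is a minimal Python sketch and not part of GeoParser itself: the helper names `build_search_command` and `parse_search_output` are hypothetical. It assumes each entry has the form `{query: [matched name, longitude, latitude]}`. That ordering is inferred from the sample output (the first number for Texas, -91.92139, is outside the valid latitude range, so it is read as a longitude) rather than taken from documented behavior.

```python
import json


def build_search_command(names):
    """Return the argument list for a gazetteer search (the -s flag from --help)."""
    return ["lucene-geo-gazetteer", "-s"] + list(names)


def parse_search_output(text):
    """Turn the JSON printed by a search into (query, name, latitude, longitude) tuples.

    Assumption: each entry looks like {query: [name, longitude, latitude]}.
    """
    results = []
    for entry in json.loads(text):
        for query, (name, lon, lat) in entry.items():
            results.append((query, name, float(lat), float(lon)))
    return results


# Sample output copied from the search step in this README.
sample = '''[
  {"Texas": ["Texas", "-91.92139", "18.05333"]},
  {"Pasadena": ["Pasadena", "-74.06446", "4.6964"]}
]'''

for query, name, lat, lon in parse_search_output(sample):
    print("%s -> %s (lat %.5f, lon %.5f)" % (query, name, lat, lon))
```

To run a real search, the list returned by `build_search_command(["Pasadena", "Texas"])` could be passed to `subprocess.check_output`, assuming the tool is on your PATH and the index has been built.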
- Run the Tika server as described in https://cwiki.apache.org/confluence/display/TIKA/GeoTopicParser on port 8001. The port can be configured via config.txt.
- Make sure you can extract locations from Tika Server
curl -T /path/to/polar.geot -H "Content-Disposition: attachment; filename=polar.geot" http://localhost:8001/rmeta
You can obtain the file [here](https://raw.githubusercontent.com/chrismattmann/geotopicparser-utils/master/geotopics/polar.geot).
The output should look like this:
[
  {
    "Content-Type":"application/geotopic",
    "Geographic_LATITUDE":"39.76",
    "Geographic_LONGITUDE":"-98.5",
    "Geographic_NAME":"United States",
    "Optional_LATITUDE1":"27.33931",
    "Optional_LONGITUDE1":"-108.60288",
    "Optional_NAME1":"China",
    "X-Parsed-By":[
      "org.apache.tika.parser.DefaultParser",
      "org.apache.tika.parser.geo.topic.GeoParser"
    ],
    "X-TIKA:parse_time_millis":"1634",
    "resourceName":"polar.geot"
  }
]
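The metadata returned by `/rmeta` can be turned into a list of points for further processing. The following minimal Python sketch is not GeoParser's own code: the function name `extract_locations` is hypothetical, and it assumes that additional locations always follow the `Optional_NAME<n>`, `Optional_LATITUDE<n>`, `Optional_LONGITUDE<n>` pattern seen in the sample output.

```python
import json
import re


def extract_locations(rmeta_text):
    """Collect (name, latitude, longitude) tuples from a Tika /rmeta JSON response."""
    locations = []
    for doc in json.loads(rmeta_text):
        # Primary location reported for the document.
        if "Geographic_NAME" in doc:
            locations.append((doc["Geographic_NAME"],
                              float(doc["Geographic_LATITUDE"]),
                              float(doc["Geographic_LONGITUDE"])))
        # Alternative locations: Optional_NAME1, Optional_NAME2, ...
        # (assumed naming pattern, based on the sample output only)
        suffixes = []
        for key in doc:
            match = re.match(r"Optional_NAME(\d+)$", key)
            if match:
                suffixes.append(int(match.group(1)))
        for n in sorted(suffixes):
            locations.append((doc["Optional_NAME%d" % n],
                              float(doc["Optional_LATITUDE%d" % n]),
                              float(doc["Optional_LONGITUDE%d" % n])))
    return locations


# Trimmed copy of the sample response shown in this README.
sample = '''[{
  "Content-Type": "application/geotopic",
  "Geographic_LATITUDE": "39.76",
  "Geographic_LONGITUDE": "-98.5",
  "Geographic_NAME": "United States",
  "Optional_LATITUDE1": "27.33931",
  "Optional_LONGITUDE1": "-108.60288",
  "Optional_NAME1": "China"
}]'''

for name, lat, lon in extract_locations(sample):
    print("%s: %s, %s" % (name, lat, lon))
```

Returning plain tuples keeps the sketch independent of any mapping library; the same list could feed whatever plotting or indexing step comes next.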
- Run Django server
python manage.py runserver
- Open http://localhost:8000/ in your browser.

Note: Please refer to the wiki page of this GitHub repository, which serves as a guide on how to use GeoParser.