This repository contains materials for the Irchel Geoparser workshop.
For more information about the Irchel Geoparser, visit the project website: geoparser.app
Please follow the instructions below to set up your environment for the workshop.
First, verify that you have Python installed on your system. Open a terminal (Command Prompt, PowerShell, Terminal, etc.) and run:
- Windows:
python --version
- macOS/Linux:
python3 --version
This command prints the installed Python version. For this workshop, you need Python 3.9 through 3.12; versions above 3.12 are not supported.
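The same requirement can also be checked from inside Python. The following is a minimal sketch, not part of the workshop materials; the helper name `is_supported` is hypothetical and only illustrates the 3.9 to 3.12 range stated above.

```python
import sys

# Supported range stated in the instructions: Python 3.9 through 3.12 (inclusive).
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 12)


def is_supported(version=None):
    """Return True if (major, minor) lies within the supported range."""
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return MIN_VERSION <= major_minor <= MAX_VERSION


current = f"{sys.version_info[0]}.{sys.version_info[1]}"
if is_supported():
    print(f"Python {current} is within the supported range.")
else:
    print(f"Python {current} is outside the supported range (3.9 to 3.12).")
```

Comparing `(major, minor)` tuples avoids the pitfalls of comparing version strings, where "3.10" would sort before "3.9".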
Important: We recommend using a Python installation placed directly on your system rather than one managed by Anaconda or a similar distribution. This keeps your setup consistent with the workshop instructions and makes it easier for us to assist you if any issues arise.
If Python is missing or its version is outside the supported range, install it as follows:
- Go to the Python Downloads page.
- Download the latest supported version (up to Python 3.12).
  - Do not download versions higher than 3.12, as they are not supported yet.
- Run the installer:
  - Windows Users:
    - During installation, make sure to check the box that says "Add Python to PATH".
  - macOS/Linux Users:
    - Follow the standard installation procedure for your system.
After installation, verify the installation:
- Windows:
python --version
- macOS/Linux:
python3 --version
You should see the installed Python version in the output.
Create a directory for the workshop files and navigate to it.
- Windows (Command Prompt):
Open Command Prompt and run:
mkdir %USERPROFILE%\geoparser-workshop
cd %USERPROFILE%\geoparser-workshop
- Windows (PowerShell):
Open PowerShell and run:
mkdir $Env:UserProfile\geoparser-workshop
cd $Env:UserProfile\geoparser-workshop
- macOS/Linux (Terminal):
Open Terminal and run:
mkdir ~/geoparser-workshop
cd ~/geoparser-workshop
We'll create a virtual environment to isolate the workshop's dependencies and avoid conflicts with other installations on your system.
- Windows:
python -m venv geoparser-env
- macOS/Linux:
python3 -m venv geoparser-env
This command creates a virtual environment named geoparser-env in your current directory (geoparser-workshop).
Next, activate the virtual environment:
- Windows:
geoparser-env\Scripts\activate
- macOS/Linux:
source geoparser-env/bin/activate
You should now see (geoparser-env) at the beginning of your command prompt, indicating that the virtual environment is active.
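If the prompt prefix is hard to spot, activation can also be confirmed from Python itself. This is a minimal sketch rather than part of the workshop materials; the function name `in_virtual_environment` is hypothetical. It relies on the standard behaviour that `sys.prefix` and `sys.base_prefix` differ inside an environment created with `venv`.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_environment())
print("Interpreter location:", sys.executable)
```

When the environment is active, the interpreter location printed on the second line should sit inside the geoparser-env directory.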
With the virtual environment activated, you can now install the necessary Python packages.
pip install geoparser
Verify the installation and version:
pip show geoparser
Ensure that the version of geoparser is 0.2.0. If an older version is installed or you encounter issues, please reach out to us.
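The version check can also be done programmatically with the standard library's `importlib.metadata`, which reads the same package metadata that `pip show` reports. The sketch below is illustrative only; `installed_version` and `EXPECTED_VERSION` are hypothetical names introduced here.

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED_VERSION = "0.2.0"  # version named in the instructions above


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("geoparser")
if found is None:
    print("geoparser is not installed in this environment.")
elif found == EXPECTED_VERSION:
    print(f"geoparser {found} is installed as expected.")
else:
    print(f"geoparser {found} is installed, but {EXPECTED_VERSION} is expected.")
```

Run this with the virtual environment active; otherwise it inspects whichever Python installation happens to be first on your PATH.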
We will use Jupyter for running the notebooks and Folium for mapping:
pip install jupyter folium
Even if you have Jupyter installed elsewhere, it's important to install it in this virtual environment.
In the tutorial, we will use two spaCy models for English texts.
- Install the small model:
python -m spacy download en_core_web_sm
- Install the transformer-based model:
python -m spacy download en_core_web_trf
You can now load the package via spacy.load('en_core_web_sm')
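spaCy installs downloaded models as regular Python packages, so their presence can be checked without loading them. The following is a minimal sketch under that assumption; the helper `missing_models` is a hypothetical name and not part of spaCy or the workshop materials.

```python
from importlib.util import find_spec

# The two models named in the instructions above.
MODELS = ["en_core_web_sm", "en_core_web_trf"]


def missing_models(names):
    """Return the model packages that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_models(MODELS)
if missing:
    print("Missing spaCy models:", ", ".join(missing))
else:
    print("Both spaCy models are installed.")
```

This only confirms that the packages are present; loading a model with `spacy.load` remains the definitive test that it works.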
For this workshop, we'll use the GeoNames gazetteer (a dictionary of place names):
python -m geoparser download geonames
Note: The last step of the setup process involves some database operations that are not reflected using progress bars but are still executing in the background. It may appear that the setup is stuck, but it isn't. Please wait until you see:
Database setup complete.
Download the following files from this repository and save them in your geoparser-workshop directory:
- Tutorial Notebooks:
01_Geoparser_Basics.ipynb
02_Geoparser_FineTuning.ipynb
- Annotation Data:
training_annotations_incomplete.json
test_annotations.json
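Before launching the notebooks, it can help to confirm that all four files landed in the right place. The sketch below is one possible check, assuming it is run from inside the geoparser-workshop directory; `missing_files` and `REQUIRED_FILES` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = [
    "01_Geoparser_Basics.ipynb",
    "02_Geoparser_FineTuning.ipynb",
    "training_annotations_incomplete.json",
    "test_annotations.json",
]


def missing_files(directory, names=REQUIRED_FILES):
    """Return the required file names that are not present in directory."""
    directory = Path(directory)
    return [name for name in names if not (directory / name).is_file()]


# Assumption: the current working directory is geoparser-workshop.
missing = missing_files(Path.cwd())
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All workshop files are in place.")
```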
With all the files in place and the virtual environment still active, launch Jupyter Lab (still within the same geoparser-workshop directory):
jupyter lab
This will open Jupyter Lab in your default web browser, showing the contents of your geoparser-workshop directory.
You are now ready to start the workshop!
During this workshop we will use data from GeoCorpora for training and testing models:
Wallgrün, J. O., Karimzadeh, M., MacEachren, A. M., & Pezanowski, S. (2017). GeoCorpora: building a corpus to test and train microblog geoparsers. International Journal of Geographical Information Science, 32(1), 1–29. https://doi.org/10.1080/13658816.2017.1368523