This is the repository accompanying the DSAIT4050: Information Retrieval course at TU Delft.
Here, we publish any hands-on material (i.e., Jupyter notebooks). The notebooks will be released alongside the lecture, so be sure to keep checking this repository every week!
Did you find a bug or an error in one of the notebooks? We're happy to accept pull requests!
What you'll find here:
- Project material (scaffolding project, final project)
- Introduction to PyTerrier series
What you'll find on Brightspace:
- Slides
- Assignments
- Announcements
- Any other lecture-related material and discussions
We recommend running the notebooks locally. You'll need up-to-date versions of
- Python,
- JDK,
- a notebook viewer, such as JupyterLab or Visual Studio Code.
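Before opening the notebooks, a quick check of the local setup can help. The following is a minimal sketch, not an official course script: the function name `check_prerequisites` and the `min_python` threshold are hypothetical and only serve as an illustration. Finding `java` on the `PATH` or a set `JAVA_HOME` variable is only a rough indicator that a JDK is installed.

```python
import os
import shutil
import sys


def check_prerequisites(min_python=(3, 9)):
    """Collect basic facts about the local Python and Java setup.

    min_python is an illustrative threshold, not a course requirement.
    """
    return {
        # Version of the interpreter running this script, e.g. "3.11.4".
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # True if the interpreter is at least as new as min_python.
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # Full path of the java executable on PATH, or None if not found.
        "java_path": shutil.which("java"),
        # Value of JAVA_HOME, or None if the variable is not set.
        "java_home": os.environ.get("JAVA_HOME"),
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

If both `java_path` and `java_home` come back as `None`, the JDK is probably missing or not visible to Python.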
Alternatively, you can use Google Colab to run the notebooks in the cloud. However, this requires a Google account. Note that Colab environments are not persistent, i.e., you'll have to download files you don't want to lose.
We'll collect common issues and respective solutions here.
This is an issue of `ir_datasets` which seems to happen on Windows only. There is a fix already, but it hasn't been merged. A possible workaround is to set Python to use UTF-8 by default.
TL;DR: Set the environment variable `PYTHONUTF8=1`.
- Open the environment variable settings.
- Create a new user environment variable.
- Use `PYTHONUTF8` as the variable name and `1` as the value.
You have to restart Python (i.e., the notebook server) after this.
Note that this setting may have unexpected effects on other Python scripts, so it's best to revert this after you're done with the notebooks.
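To confirm that the workaround is active after restarting Python, you can inspect the interpreter's UTF-8 mode from a notebook cell. This is a minimal sketch assuming Python 3.7 or newer, where UTF-8 mode is available; the helper name `utf8_status` is hypothetical.

```python
import locale
import sys


def utf8_status():
    """Report whether Python's UTF-8 mode is on and which encoding is preferred."""
    return {
        # True when UTF-8 mode is enabled, e.g. via PYTHONUTF8=1.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding Python uses by default when opening text files.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in utf8_status().items():
        print(f"{key}: {value}")
```

If `utf8_mode` is still `False`, the variable was most likely not picked up, for example because the notebook server was not restarted from a fresh session.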