Practical lecture material for DSAIT4050 (Information Retrieval) at TU Delft

wis-delft/ms-information-retrieval

DSAIT4050: Information retrieval

This is the repository accompanying the DSAIT4050: Information Retrieval course at TU Delft.

We publish all hands-on material (i.e., Jupyter notebooks) here. The notebooks are released alongside the lectures, so be sure to check this repository every week!

Did you find a bug or an error in one of the notebooks? We're happy to accept pull requests!

What you'll find here:

  • Project material (scaffolding project, final project)
  • Introduction to PyTerrier series

What you'll find on Brightspace:

  • Slides
  • Assignments
  • Announcements
  • Any other lecture-related material and discussions

Content

#   File                                            Title                      Colab
00  scaffolding/00-task.ipynb                       Scaffolding project
01  intro-pyterrier/01-setup.ipynb                  Setup                      Open in Colab
02  intro-pyterrier/02-indexing-retrieval.ipynb     Indexing & retrieval       Open in Colab
03  intro-pyterrier/03-datasets.ipynb               Datasets                   Open in Colab
04  intro-pyterrier/04-evaluation-experiments.ipynb Evaluation & experiments   Open in Colab
05  intro-pyterrier/05-transformers.ipynb           Transformers               Open in Colab
06  intro-pyterrier/06-learning_to_rank.ipynb       Learning to rank           Open in Colab
07  intro-pyterrier/07-neural_models.ipynb          Neural ranking models      Open in Colab

How to run the notebooks

We recommend running the notebooks locally. You'll need up-to-date versions of

Alternatively, you can use Google Colab to run the notebooks in the cloud. However, this requires a Google account. Note that Colab environments are not persistent, i.e., you'll have to download files you don't want to lose.
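Since Colab environments are not persistent, one way to keep your results is to bundle them into a single archive before the session ends. The following is a minimal sketch, not part of the course material: the helper names `archive_outputs` and `download_in_colab` and the directory names are hypothetical. It relies on `google.colab.files.download`, which is only importable inside Colab, so the download step is skipped elsewhere.

```python
import shutil
from pathlib import Path


def archive_outputs(src_dir, archive_stem="notebook_outputs"):
    """Zip the contents of src_dir and return the path of the archive.

    archive_stem is the archive path without the ".zip" suffix
    (hypothetical default name).
    """
    return Path(shutil.make_archive(str(archive_stem), "zip", root_dir=str(src_dir)))


def download_in_colab(path):
    """Trigger a browser download if running inside Colab.

    Returns True if the download was requested, False otherwise.
    """
    try:
        from google.colab import files  # only available inside Colab
    except ImportError:
        return False
    files.download(str(path))
    return True
```

For example, `download_in_colab(archive_outputs("results"))` would zip a (hypothetical) `results` folder and offer it for download when run in Colab.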

Troubleshooting

We collect common issues and their solutions here.

UnicodeDecodeError: 'charmap' codec can't decode [...]

This is an issue in ir_datasets that seems to occur on Windows only. A fix already exists, but it hasn't been merged yet. A possible workaround is to make Python use UTF-8 by default.

TL;DR: Set the environment variable PYTHONUTF8=1.

Step-by-step guide

  1. Open the environment variable settings.
  2. Create a new user environment variable.
  3. Use PYTHONUTF8 as variable name and 1 as value.

You have to restart Python (i.e., the notebook server) after this.
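To confirm that the setting took effect after the restart, you can run a quick check in a notebook cell. This is a minimal sketch; the helper name `utf8_mode_report` is hypothetical. It uses `sys.flags.utf8_mode`, which is available in Python 3.7 and later.

```python
import locale
import os
import sys


def utf8_mode_report():
    """Collect the values that show whether Python's UTF-8 mode is active."""
    return {
        # Value of the environment variable as seen by this process (None if unset)
        "PYTHONUTF8": os.environ.get("PYTHONUTF8"),
        # True if the interpreter was started in UTF-8 mode
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Default encoding used when opening text files without an explicit encoding
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in utf8_mode_report().items():
        print(f"{key}: {value}")
```

If `utf8_mode` is still `False`, the notebook server was most likely started before the variable was set.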

Note that this setting may have unexpected effects on other Python scripts, so it's best to revert this after you're done with the notebooks.
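If you would rather not change the setting for your whole user account, one alternative is to set PYTHONUTF8 only for a single child process. The sketch below is an assumption-based illustration, not part of the course material: the helper name `run_with_utf8` is hypothetical, and it starts a new Python interpreter with the variable set in that process's environment only.

```python
import os
import subprocess
import sys


def run_with_utf8(args):
    """Run the current Python interpreter with PYTHONUTF8=1 for that process only.

    args is the list of command-line arguments passed to the interpreter.
    """
    env = dict(os.environ, PYTHONUTF8="1")  # copy of the environment; the parent is untouched
    return subprocess.run(
        [sys.executable, *args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    result = run_with_utf8(["-c", "import sys; print(sys.flags.utf8_mode)"])
    print(result.stdout.strip())
```

The same idea applies to starting the notebook server from a shell in which the variable is set only for that session.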
