Resources used for my talk at Pycon Spain 2018
The main objective of the talk was to introduce the audience into the gensim library for Topic Modelling through a use case where a dataset of movies' reviews from IMDB is analyzed to automatically discover fine-grained topics that can be used to better contextualize movies' profiles.
The Notebook shows how to train LDA models using gensim and how a first noise and useless model can be improved mainly by using different NLP techniques for filtering the initial vocabulary trying to keep only those terms that are semantically valuable in this domain.
- Uncompress the movies' reviews dataset at resources/aclImdb folder
- Build the Dockerfile using a tag name of your selection
- Run the built docker image in this way:
docker run --rm -p 8888:8888 -v "$PWD":/home/jovyan/work jupyter/[YOUR_TAG]
- Select the Notebook at 'work' folder in the Jupyter Dashboard