topicmodeler

This repository contains a collection of Topic Modeling tools developed for the IntelComp H2020 project. The toolbox is built in Python and includes both a command-line interface and a PyQt6-based graphical user interface for training topic models using an expert-in-the-loop approach. It offers tools for preprocessing (e.g., ad-hoc stopword removal and equivalence substitution), for training topic models with state-of-the-art implementations, and for curating the resulting models.
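
As an illustration of the two preprocessing operations mentioned above, here is a minimal sketch. It is not the toolbox's implementation: the `preprocess` function, the order of operations (substitution first, then stopword removal), and the sample data are all assumptions made for this example.

```python
def preprocess(tokens, stopwords, equivalences):
    """Apply equivalence substitution, then drop stopwords.

    Hypothetical helper for illustration; the toolbox may order or
    implement these steps differently.
    """
    # Replace each token by its equivalent term when one is defined.
    substituted = [equivalences.get(tok, tok) for tok in tokens]
    # Discard every token that appears in the stopword list.
    return [tok for tok in substituted if tok not in stopwords]


# Illustrative data only; real wordlists would be loaded from files.
stopwords = {"the", "of", "study"}
equivalences = {"nets": "network", "networks": "network"}
tokens = ["the", "study", "of", "neural", "networks"]
print(preprocess(tokens, stopwords, equivalences))
```

Substituting before filtering means a stopword list only needs to contain the canonical form of a term, which is one reason to pick this order in a sketch like this.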

Getting Started

To start using the application, execute the following command:

python main_script --p project_folder --parquet parquet_folder --wdlist wordlists_folder

main_script: Refers to the script for the desired application version:
- Use ITMT_mainCMD.py for the command-line interface.
- Use ITMT_mainGUI.py for the graphical user interface.
project_folder: Path to a new or existing project where the application's output will be saved.
parquet_folder: Path to the downloaded parquet datasets.
wordlists_folder: Path to the folder containing wordlists.
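
For scripted launches, the invocation above can be assembled programmatically. The following is a minimal sketch rather than part of the toolbox: the helper name `build_command` and the example paths are hypothetical, and only the script names and the `--p`, `--parquet`, and `--wdlist` flags are taken from the command shown above.

```python
import sys
from pathlib import Path


def build_command(interface, project_folder, parquet_folder, wordlists_folder):
    """Return the argument list for launching the chosen interface.

    `build_command` is a hypothetical helper; only the script names and
    the three flags come from this README.
    """
    scripts = {"cmd": "ITMT_mainCMD.py", "gui": "ITMT_mainGUI.py"}
    if interface not in scripts:
        raise ValueError(f"interface must be one of {sorted(scripts)}")
    return [
        sys.executable,  # the Python interpreter running this script
        scripts[interface],
        "--p", str(Path(project_folder)),
        "--parquet", str(Path(parquet_folder)),
        "--wdlist", str(Path(wordlists_folder)),
    ]


# Example paths are placeholders; replace them with your own folders.
command = build_command("cmd", "my_project", "parquet_data", "wordlists")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root, which avoids quoting problems when folder names contain spaces.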

For the graphical user interface, you can also invoke the application without parameters:

python ITMT_mainGUI.py

In this case, you can select the required parameters from the application's front page.

User Interfaces

Command Line User Interface

The command-line user interface is menu-based. Users can navigate the application by entering the number associated with each functionality. When input is required, the user will be prompted accordingly.

As the user interacts with the application, new options will be presented. The logic of operation for the console-based interface is defined in the configuration file located at /config/ITMTmenu.yaml.
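
To illustrate how a number-driven menu can be described by a configuration structure, here is a minimal sketch. The menu layout, its keys (`title`, `options`), the entry names, and the `navigate` function are all assumptions made for illustration; they do not reflect the actual schema of ITMTmenu.yaml or the toolbox's code.

```python
# Hypothetical menu definition; the real ITMTmenu.yaml may be organized differently.
MENU = {
    "title": "Main menu",
    "options": [
        {
            "title": "Corpus management",
            "options": [
                {"title": "List available corpora"},
                {"title": "Create training dataset"},
            ],
        },
        {"title": "Train topic model"},
    ],
}


def navigate(menu, choices):
    """Follow a sequence of 1-based numeric choices and return the titles visited."""
    path = [menu["title"]]
    node = menu
    for choice in choices:
        # Leaf entries have no "options" key, so any further choice is invalid.
        options = node.get("options", [])
        if not 1 <= choice <= len(options):
            raise ValueError(f"invalid option {choice} at '{node['title']}'")
        node = options[choice - 1]
        path.append(node["title"])
    return path


print(" > ".join(navigate(MENU, [1, 2])))
```

Keeping the menu as data, as the configuration file does, means new options can be added without touching the navigation logic.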

Graphical User Interface

The graphical user interface comprises four main subwindows, each corresponding to a distinct functionality of the application, along with an additional subwindow for the welcome page. These subwindows are accessible via buttons on the left menu.

Docker Integration

All scripts have also been dockerized so that they can run as independent modules or be integrated within an external frontend (see the interactive-model-trainer repository for an example).

Available Topic Modeling implementations

Name	Implementation
CTM (Bianchi et al. 2021)	Contextualized Topic Models
BERTopic (Grootendorst 2022)	BERTopic
Mallet-LDA (Blei et al. 2003)	Gensim
NeuralLDA (Srivastava and Sutton 2017)	PyTorchAVITM
ProdLDA (Srivastava and Sutton 2017)	PyTorchAVITM
SparkLDA	MLlib

Acknowledgements

This work has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 101004870, and from Grant TED2021-132366B-I00 funded by MCIN/AEI/10.13039/501100011033 and by the "European Union NextGenerationEU/PRTR".
