Introducing language models and word embeddings 🧑‍💻

This repo contains a Jupyter notebook that introduces language models and word embeddings by training a word2vec model on datasets of 100K and 1M sentences from German news articles.

Prerequisites

Python and JupyterLab installed on your machine.

Instructions

Start JupyterLab from your terminal (typically with "jupyter lab").
Clone this repo.
Download this folder from Wortschatz Leipzig, unpack it, and save the file "deu_news_2022_1M-sentences.txt" in the "data" folder. The file is not included in this repo because it exceeds 100 MB.
Navigate to this repo using the file manager inside JupyterLab.
Open "Notebook.ipynb" and enjoy!
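Before opening the notebook, it can help to check that the corpus file ended up in the right place. The following is a minimal sketch, not code from the notebook: the helper name read_sentences is hypothetical, and it assumes each line of the Leipzig file has the form "<id><TAB><sentence>". Lines without a tab are treated as a bare sentence.

```python
from pathlib import Path
import re

# Location taken from the instructions above.
DATA_FILE = Path("data") / "deu_news_2022_1M-sentences.txt"

# Simple word tokenizer; \w also matches German umlauts and ß.
TOKEN_RE = re.compile(r"\w+")


def read_sentences(path, limit=None):
    """Yield each sentence as a list of lowercased tokens.

    `limit` caps the number of lines read from the file.
    """
    with open(path, encoding="utf-8") as handle:
        for count, line in enumerate(handle):
            if limit is not None and count >= limit:
                break
            # Assumption: "<id>\t<sentence>"; without a tab the whole line is used.
            _, _, text = line.rstrip("\n").rpartition("\t")
            tokens = TOKEN_RE.findall(text.lower())
            if tokens:
                yield tokens


if __name__ == "__main__":
    if DATA_FILE.exists():
        for tokens in read_sentences(DATA_FILE, limit=3):
            print(tokens)
    else:
        print(f"Missing {DATA_FILE}: download and unpack the corpus first.")
```

Token lists like these are the usual input for a word2vec implementation. How the notebook itself tokenizes and trains may differ.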

License: CC BY 4.0

yannickfrommherz/Sprachmodelle-und-Word-Embeddings
