DH_collab

A show-and-tell for code from our related but separate DH group projects.

Remember to credit people and write your names on what you make <3

Description of contents

clustering_documents This is very much a work in progress: an attempt to use a k-means clustering algorithm on parts of the OB corpus.
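A minimal sketch of k-means on documents, in plain Python so it runs without extra packages. It assumes documents are plain strings (the toy sentences below are invented, not OB corpus text) and represents each one as an L2-normalized bag-of-words vector. It uses a deterministic farthest-first seeding rather than the random or k-means++ seeding a library such as scikit-learn would use. All function names are hypothetical and not taken from the clustering_documents code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters only
    return re.findall(r"[a-z]+", text.lower())


def vectorize(docs):
    # Shared vocabulary, then one L2-normalized term-count vector per document
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w, c in Counter(tokenize(d)).items():
            v[index[w]] = float(c)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(vectors, k):
    # Farthest-first seeding: start at the first vector, then repeatedly add
    # the vector whose distance to its nearest chosen centroid is largest
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, max_iter=100):
    centroids = init_centroids(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "the prisoner stole a silver watch",
    "stole a watch and a silver spoon",
    "the victim was struck with a knife",
    "struck him with a knife in the street",
]
labels = kmeans(vectorize(docs), k=2)
print(labels)  # [0, 0, 1, 1]
```

The two theft sentences and the two assault sentences end up in separate clusters. On real corpus text, tf-idf weighting and stop-word removal would usually matter more than the choice of seeding.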

tf-idf Contains a notebook that runs a multiword search query (written as regex patterns) over any portion of the OB corpus. It calculates some basic word-frequency statistics and finally computes tf-idf scores for the search terms in the retrieved documents (this is still to be completed).

The code is written by Soeren Fomsgaard and Stella Verkijk.
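A minimal sketch of the tf-idf search pipeline in plain Python. It makes three assumptions not confirmed by the notebook: a document counts as a hit only when every regex pattern matches; tf is the number of pattern matches divided by the document's word count; and idf is log(N/df) computed over the whole corpus rather than only the hits. The function names and toy documents are hypothetical.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"[a-z]+")


def search(docs, patterns):
    # Assumed retrieval rule: a document is a hit only if every pattern matches
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [i for i, d in enumerate(docs) if all(rx.search(d) for rx in compiled)]


def word_frequencies(docs, ids):
    # Basic statistics: word counts summed over the retrieved documents
    counts = Counter()
    for i in ids:
        counts.update(WORD.findall(docs[i].lower()))
    return counts


def tfidf(docs, ids, patterns):
    # tf  = pattern matches in a document / number of words in that document
    # idf = log(N / df), with N and df counted over the whole corpus, because
    #       every retrieved document contains the terms (idf would be 0 there)
    n = len(docs)
    scores = {}
    for p in patterns:
        rx = re.compile(p, re.IGNORECASE)
        df = sum(1 for d in docs if rx.search(d))
        idf = math.log(n / df) if df else 0.0
        for i in ids:
            n_words = len(WORD.findall(docs[i].lower())) or 1
            tf = sum(1 for _ in rx.finditer(docs[i])) / n_words
            scores[(i, p)] = tf * idf
    return scores


docs = [
    "The prisoner stole a silver watch.",
    "He stole a silver spoon and a silver cup.",
    "The victim was struck with a knife.",
    "Nothing was stolen.",
]
patterns = [r"\bstole\b", r"\bsilver\b"]
hits = search(docs, patterns)         # [0, 1]
freqs = word_frequencies(docs, hits)  # freqs["silver"] == 3
scores = tfidf(docs, hits, patterns)
```

The word boundaries in `\bstole\b` keep "stolen" from matching, which is why the last document is not retrieved. Dropping the `\b` anchors would widen the search to inflected forms.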

speech Quirine Smit's work on speech extraction and gender identification?

occupations Vivian Claes' code for extracting occupations from both pre- and post-1834 records, accounting for the change in formatting between the two periods. The SPARQL script (found in the .txt file) works better than the XML code.
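A minimal sketch of the date-dependent idea behind the XML route, using Python's standard `xml.etree.ElementTree`. The markup is entirely hypothetical: the `trial`, `person` and `occupation` names and the attribute-versus-child-element split are invented to stand for "old format" and "new format" and are not the OB corpus's real schema. It also assumes that 1834 itself already uses the new format. It does not reproduce the SPARQL script.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are invented to illustrate
# two formats and are NOT the OB corpus's real schema.
SAMPLE = """<records>
  <trial year="1820"><person name="John Smith" occupation="labourer"/></trial>
  <trial year="1850">
    <person name="Mary Jones"><occupation>servant</occupation></person>
  </trial>
</records>"""


def extract_occupations(xml_text, cutoff=1834):
    # Assumption: records dated before `cutoff` use the old format
    # (occupation as an attribute), later ones the new format (child element)
    root = ET.fromstring(xml_text)
    results = []
    for trial in root.iter("trial"):
        year = int(trial.get("year"))
        for person in trial.iter("person"):
            if year < cutoff:
                occupation = person.get("occupation")
            else:
                occupation = person.findtext("occupation")
            if occupation:
                results.append((year, person.get("name"), occupation.strip()))
    return results


print(extract_occupations(SAMPLE))
# [(1820, 'John Smith', 'labourer'), (1850, 'Mary Jones', 'servant')]
```

Keeping the format switch in one place makes it easy to move the cutoff or add a third format if the records turn out to change more than once.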
