
The show and tell for code from our similar but different DH group projects.


SorenKF/DH_collab


DH_collab


Remember to credit people and write your names on what you make <3

Description of contents

clustering_documents: Very much a work in progress. An attempt to use a k-means clustering algorithm on parts of the OB corpus.
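A minimal sketch of k-means clustering of documents, assuming plain-text documents and using only the Python standard library. The helper names (`tokenize`, `vectorize`, `kmeans`) and the toy documents are hypothetical and not taken from the clustering_documents code. The sketch uses length-normalized term-frequency vectors and a deterministic farthest-first initialization instead of random seeding, so runs are reproducible.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(docs):
    """Map each document to a unit-length term-frequency vector over a shared vocabulary."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_farthest_first(vectors, k):
    """Start from the first vector, then repeatedly add the vector farthest from all chosen centroids."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means: alternate assignment and centroid-update steps; return one label per vector."""
    centroids = init_farthest_first(vectors, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

For example, `kmeans(vectorize(["stolen watch stolen watch", "watch stolen from shop", "assault with a knife", "knife assault in street"]), 2)` groups the two theft texts together and the two assault texts together. For real corpus work, scikit-learn's `TfidfVectorizer` and `KMeans` would be the usual replacements for these hand-rolled helpers.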

tf-idf: Contains a notebook that runs a multiword search query (written as regex patterns) over any portion of the OB corpus. It calculates some basic word-frequency statistics and finally computes tf-idf for the search terms in the retrieved documents (this step is still to be completed).

The code was written by Soeren Fomsgaard and Stella Verkijk.
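As a rough, hypothetical sketch of the three steps in the tf-idf notebook (regex retrieval, word frequencies, tf-idf), the Python below assumes the corpus is loaded as a dict mapping document ids to plain text. `search`, `word_frequencies` and `tf_idf` are illustrative names, not the notebook's own functions. It treats the number of regex matches as the term frequency and uses the common idf variant log(N / df), where df is the number of documents the pattern matches.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def search(docs, patterns):
    """Return ids of documents matching every regex pattern (a multiword AND query)."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [doc_id for doc_id, text in docs.items()
            if all(rx.search(text) for rx in compiled)]


def word_frequencies(docs, doc_ids):
    """Aggregate word counts over the retrieved documents."""
    total = Counter()
    for doc_id in doc_ids:
        total.update(tokenize(docs[doc_id]))
    return total


def tf_idf(pattern, doc_id, docs):
    """tf = regex matches / tokens in the document; idf = log(N / documents matching the pattern)."""
    rx = re.compile(pattern, re.IGNORECASE)
    text = docs[doc_id]
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    tf = len(rx.findall(text)) / len(tokens)
    df = sum(1 for t in docs.values() if rx.search(t))
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf
```

With a toy corpus such as `{"t1": "The prisoner stole a silver watch.", "t2": "A watch and a handkerchief were stolen.", "t3": "He was indicted for assault."}`, the query `[r"\bwatch\b", r"\bsto(le|len)\b"]` retrieves `t1` and `t2`.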

speech: Quirine Smit's work on speech extraction and gender identification(?).

occupation: Vivian Claes' code for extracting occupations before and after 1834, accounting for the change in formatting at that point. The SPARQL script (found in the .txt file) works better than the XML code.
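A hypothetical sketch of the XML route, using Python's standard `xml.etree.ElementTree`. The markup here (a `year` attribute on `<trial>`, occupations tagged as `<rs type="occupation">`) is an assumption for illustration only, not a description of the OB corpus files or of Vivian Claes' code. The sketch simply sorts occupation strings into a "before 1834" and a "from 1834" bucket, so that each period could then be handled by its own formatting rules.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

CUTOFF_YEAR = 1834

# Hypothetical sample markup; the real corpus structure may differ.
SAMPLE = """<proceedings>
  <trial year="1790"><rs type="occupation">labourer</rs> of <rs type="place">London</rs></trial>
  <trial year="1840"><rs type="occupation">servant</rs></trial>
</proceedings>"""


def occupations_by_period(xml_text):
    """Collect occupation strings, split by whether the trial year is before 1834 or not.

    ASSUMPTION: each <trial> has a year="..." attribute and occupations are
    tagged as <rs type="occupation">; the 1834 formatting change is not modelled.
    """
    root = ET.fromstring(xml_text)
    buckets = defaultdict(list)
    for trial in root.iter("trial"):
        period = "before_1834" if int(trial.get("year")) < CUTOFF_YEAR else "from_1834"
        for rs in trial.iter("rs"):
            if rs.get("type") == "occupation" and rs.text:
                buckets[period].append(rs.text.strip())
    return dict(buckets)
```

On the sample above this yields `{"before_1834": ["labourer"], "from_1834": ["servant"]}`; the `<rs type="place">` element is ignored.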
