The repository contains the scripts used in the master's thesis "Kus on kodukoht? Analysing the Meaning of Home and Improving OCR Quality in Estonian Exile Newspapers Published in Sweden".

lauranemvalts/lnu_ma

Information about the repository

The repository contains the scripts used in the master's thesis "Kus on kodukoht? Analysing the Meaning of Home and Improving OCR Quality in Estonian Exile Newspapers Published in Sweden". The thesis was written as part of the digital humanities master's programme at Linnaeus University in the spring semester of 2024. Its sources are the largest Estonian exile newspapers published in Sweden between 1944 and 1991: Teataja/Eesti Teataja, Välis-Eesti and Eesti Päevaleht/Stockholms-Tidningen Eestlastele. The scripts were used to answer the first research question of the thesis, which examines the context in which Estonian exile newspapers talk about home and places in occupied Estonia.

The scripts are divided into two folders by method. The scripts in the text analysis folder are used for text analysis, and those in the spatial analysis folder are used for named entity recognition, data cleaning and geocoding. The thesis is primarily an experiment, so there was no specific aim to create a coherent workflow; as a result, there is generally one script per analysis phase. This approach makes it easier to find a tool when a user wants to perform only one step of the analysis, for example identifying named entities (NEs) with EstNLTK or filtering unique results. The scripts are written in both Python and R, a choice based on the author's previous experience and skills.
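As an illustration of what a single-step script such as "filter unique results" might look like, here is a minimal sketch in Python. It is not taken from the repository: the function name `unique_entities`, the input format (a plain list of entity strings) and the normalisation rules (whitespace collapsing, case-insensitive matching) are all assumptions made for this example.

```python
from collections import Counter


def unique_entities(entities):
    """Collapse repeated entity strings into (name, count) pairs.

    Hypothetical helper, not part of the repository. Matching is
    case-insensitive and ignores surrounding or repeated whitespace;
    the first spelling encountered is kept for display.
    """
    counts = Counter()
    first_seen = {}  # normalised key -> first spelling seen (insertion-ordered)
    for raw in entities:
        name = " ".join(raw.split())  # trim and collapse whitespace
        if not name:
            continue  # skip empty strings
        key = name.casefold()
        counts[key] += 1
        first_seen.setdefault(key, name)
    return [(first_seen[key], counts[key]) for key in first_seen]


if __name__ == "__main__":
    # Made-up sample input, standing in for named entity recognition output.
    sample = ["Tallinn", "tallinn ", "Tartu", "", "Pärnu", "Tartu"]
    for name, count in unique_entities(sample):
        print(name, count)
```

Keeping the counts alongside the unique names is one possible design choice; it means the frequency of each place is still available to a later step such as geocoding.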

The full texts of the exile newspapers were obtained from the National Library of Estonia; instructions for accessing and using the material can be found in Digilab. The scripts were developed using material from a previous project and with the help of the large language models GPT-3.5 and GPT-4o in the ChatGPT environment.

The full texts of the newspapers may be used only under the same conditions as indicated in the digital archive of the National Library of Estonia (example). The scripts created are licensed under CC BY 4.0.
