Abstract

This thesis investigates the applicability of computational methods to the exploration of large unstructured text corpora. As the number of available documents grows, finding relevant documents and the connections between them becomes significantly more difficult. Institutions such as the German tax offices have access to leaked data, for instance the Panama Papers or the Bahamas Leaks, which contain huge numbers of documents and valuable information yet to be extracted. However, such institutions, as well as companies and individuals, lack the resources to examine each document individually in order to find a specific one or to identify the key topics of a corpus. Computational methods such as text mining or topic analysis may help to overcome this obstacle. This thesis proposes an approach for finding relevant documents that share common topics in a large unstructured text corpus. The approach combines several methods, including textual embeddings, transformation of images, and clustering techniques. As a result of this work, a web interface is provided that enables comparison of the examined methods by querying a database for similar documents.

KlaraGtknst/bachelor-thesis
