A list of things that I read. Focus is mostly on text analysis and its applications.

janrn/reading-list

Reading list

Scientific papers

| Title | Venue/Year | Summary | Keywords |
| --- | --- | --- | --- |
| Topic Modeling in Embedding Spaces | Transactions of the Association for Computational Linguistics, Volume 8, 2020 | The embedded topic model (ETM) combines word embeddings with traditional topic models. Particularly on datasets with a large vocabulary (a NYT dataset is discussed), it outperforms LDA (measured through a complexity score). Because of how word embeddings work, stopwords are also much less of a concern: they are grouped into their own topic rather than mixed into every topic. On the qualitative side, the interpretability of the topics seems to be quite high as well. | Topic modeling on large vocabulary, ETM, alternative to LDA |
| Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too! | Empirical Methods in Natural Language Processing (EMNLP), 2020 | The authors propose an alternative to LDA: clustering pre-trained word embeddings to obtain topics. According to their evaluations, it performs as well as classical approaches, but with a lower runtime. | Topic modeling |
| Evaluation criteria to assess the value of identification sources for horizon scanning | International Journal of Technology Assessment in Health Care, 2010 | Deciding whether a source is useful for horizon scanning is a complex and non-standardized process. The authors present a system based on the Analytic Hierarchy Process (AHP, by Saaty) in which horizon-scanning experts rank sources. According to the authors, this manual and time-consuming process could be seen as groundwork for automated decision systems or for technological support for scholars. | Horizon scanning, source identification, source ranking |
| Applying Text Mining for Identifying Future Signals of Land Administration | Land, 2019 | How can future signals be detected through text mining? The authors recommend 1) keyword filtering (based on term frequency and document frequency) in combination with 2) topic modeling. Keywords (and the topics that include them) are ranked by degree of visibility (DoV) and degree of diffusion (DoD), and visualized through keyword emergence and issue maps. Weak signals in particular still need a combined quantitative and qualitative approach. | Text mining, topic modeling, future signal, signal detection |
| Identification of future signal based on the quantitative and qualitative text mining: a case study on ethical issues in artificial intelligence | Quality & Quantity, 2017 | Identifying future signals and categorizing them as strong, weak, latent, or well-known but not strong is important for researchers and policy makers. The paper combines quantitative metrics (DoV and DoD) with a qualitative approach (interpreting keywords and reviewing source material to verify the keyword clusters that form topics based on co-occurrences). The authors acknowledge that the methodological foundations of finding future patterns are still "challenging". Semantic analysis would probably be helpful. | Future signal, signal detection, qualitative |
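
As a note to self on the "Tired of Topic Models?" entry above: the core idea of clustering word embeddings into topics can be sketched in a few lines. This is only a minimal illustration and not the authors' pipeline. The 2-D vectors are invented toy values standing in for real pre-trained embeddings, the function names `kmeans` and `topics_from_embeddings` are hypothetical, and ranking words by distance to the centroid is a simplifying assumption.

```python
import math

# Toy 2-D "embeddings", invented for illustration only. Real pre-trained
# vectors (e.g. word2vec or GloVe) have hundreds of dimensions.
EMBEDDINGS = {
    "goal": (0.90, 0.10),
    "match": (0.80, 0.20),
    "team": (0.85, 0.15),
    "vote": (0.10, 0.90),
    "election": (0.20, 0.80),
    "senate": (0.15, 0.85),
}


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with a deterministic, evenly strided init."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


def topics_from_embeddings(embeddings, k, top_n=3):
    """Treat each cluster as a topic; list the words closest to its centroid."""
    words = list(embeddings)
    points = [embeddings[w] for w in words]
    labels, centroids = kmeans(points, k)
    topics = []
    for c in range(k):
        members = [w for w, lab in zip(words, labels) if lab == c]
        members.sort(key=lambda w: math.dist(embeddings[w], centroids[c]))
        topics.append(members[:top_n])
    return topics


topics = topics_from_embeddings(EMBEDDINGS, k=2)
for number, topic_words in enumerate(topics):
    print(number, topic_words)
```

With these toy vectors the sports-like and politics-like words end up in separate clusters. On real embeddings one would more likely use an existing k-means implementation and a larger `k`.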
