A list of things I have read. The focus is mostly on text analysis and its applications.

Title | Venue/Year | Summary | Keywords |
---|---|---|---|
Topic Modeling in Embedding Spaces | Transactions of the Association for Computational Linguistics, Volume 8, 2020 | The embedded topic model (ETM) combines word embeddings with traditional topic models. Particularly on datasets with a large vocabulary (a NYT dataset is discussed), it outperforms LDA (measured through perplexity). Due to how word embeddings work, stopwords are also much less of a concern, as they are grouped in their own topic rather than mixed into every topic. On the qualitative side, the interpretability of the topics seems to be quite high as well. | Topic modeling on large vocabulary, ETM, alternative to LDA |
Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too! | Empirical Methods in Natural Language Processing (EMNLP), 2020 | The authors propose an alternative to LDA: clustering pre-trained word embeddings to obtain topics. According to their evaluations, it performs as well as classical approaches at a lower runtime. | Topic modeling |
Evaluation criteria to assess the value of identification sources for horizon scanning | International Journal of Technology Assessment in Health Care, 2010 | Deciding whether a source is useful for horizon scanning is a complex and non-standardized process. The authors present a system based on the Analytic Hierarchy Process (AHP, by Saaty) that ranks sources using the judgments of horizon scanning experts. This manual and time-consuming process could, according to the authors, be seen as groundwork for automated decision systems or technological support for scholars. | Horizon scanning, source identification, source ranking |
Applying Text Mining for Identifying Future Signals of Land Administration | Land, 2019 | How can future signals be detected through text mining? The authors recommend 1) keyword filtering (based on term frequency and document frequency) in combination with 2) topic modeling. Keywords (and the topics that include them) are ranked by degree of visibility (DoV) and degree of diffusion (DoD). Results are visualized through keyword emergence maps and keyword issue maps. Weak signals in particular still need a combined quantitative and qualitative approach. | Text mining, topic modeling, future signal, signal detection |
Identification of future signal based on the quantitative and qualitative text mining: a case study on ethical issues in artificial intelligence | Quality & Quantity, 2017 | Identifying future signals and categorizing them as strong, weak, latent, or well-known but not strong is important for researchers and policy makers. The authors combine quantitative metrics (DoV and DoD) with a qualitative approach (interpreting keywords and reviewing source material to verify the keyword clusters that form topics based on co-occurrences). They acknowledge that the methodological foundations of finding future patterns are still "challenging". Semantic analysis would probably be helpful. | Future signal, signal detection, qualitative |
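
Since DoV and DoD come up in two of the entries above, here is a minimal sketch of how they might be computed. It is not code from either paper. It assumes one commonly used formulation: term frequency (for DoV) or document frequency (for DoD) per period, divided by the number of documents in that period and discounted by a time weight so that recent periods count more. The function name `signal_metrics`, the default `time_weight=0.05`, and the toy corpus are assumptions for illustration only.

```python
from collections import Counter


def signal_metrics(docs_by_period, time_weight=0.05):
    """Compute DoV and DoD per keyword and period (assumed formulation).

    docs_by_period: list of periods, oldest first. Each period is a list of
    documents, and each document is a list of tokens.
    Returns {keyword: {"dov": [...], "dod": [...]}} with one value per period.
    """
    n = len(docs_by_period)
    vocab = {tok for period in docs_by_period for doc in period for tok in doc}
    metrics = {kw: {"dov": [], "dod": []} for kw in vocab}

    for j, period in enumerate(docs_by_period, start=1):
        n_docs = len(period)
        # Term frequency: every occurrence of a token in the period counts.
        tf = Counter(tok for doc in period for tok in doc)
        # Document frequency: each document counts a token at most once.
        df = Counter(tok for doc in period for tok in set(doc))
        # Older periods (small j) are discounted more strongly.
        discount = 1 - time_weight * (n - j)
        for kw in vocab:
            if n_docs == 0:
                dov = dod = 0.0
            else:
                dov = tf[kw] / n_docs * discount
                dod = df[kw] / n_docs * discount
            metrics[kw]["dov"].append(dov)
            metrics[kw]["dod"].append(dod)
    return metrics


if __name__ == "__main__":
    # Hypothetical toy corpus: two periods with two tokenized documents each.
    periods = [
        [["ai", "ethics"], ["ai"]],
        [["ai", "ai", "bias"], ["bias"]],
    ]
    for keyword, values in sorted(signal_metrics(periods).items()):
        print(keyword, values)
```

In this sketch, a keyword that appears in few documents overall but whose DoV and DoD rise from one period to the next would be a candidate weak signal. Whether it is a meaningful one still needs the qualitative review described in the summaries.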