The TensorFlow implementation of copula-regularized GSM/NVCTM.
We provide the model implementations and experiment results in this repository. We trained the models for longer periods instead of early stopping in order to obtain more accurate NPMI scores, although the early-stopped parameters can already generate coherent and distinguishable topics with higher NPMI. Further analysis is given below as a supplement to the original paper.
i) Since the K-variate topic vectors lie on the (K-1)-simplex, the zero points of the regularization term (the intersections of the log-copula-PDF surface and the simplex) should lie inside the simplex. We introduce a constant parameter C to scale the positions of these zero points; in general, we do not want the topic distributions to become one-hot, which harms the soft-clustering nature of topic models. Due to the shape of the Clayton copula's PDF, the regularization term obtained by taking its logarithm has a steep surface near the simplex's vertices. This is problematic for training: the gradient increases dramatically when some topic vectors are regularized towards one-hot vectors, causing NaNs (gradient explosion). Furthermore, we found it helpful to incorporate the loss-weighting methods widely used in multi-task learning to stabilize the training of copula-regularized topic models, and we apply gradient clipping to alleviate the gradient explosion. We also found that assigning a smaller weight to the regularization term yields higher NPMI scores (the topics remain distinguishable according to their top words) but also higher perplexity. However, it is possible to tune our models so as to reach lower perplexities than the baselines. A sketch of the weighted objective with gradient clipping is given below.
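The following is a minimal TF1-style sketch (not the repository's exact code) of combining the weighted copula regularization with gradient clipping; `recon_loss`, `kl_loss`, `copula_reg`, and `reg_weight` are illustrative names, and the sign of the copula term should follow the paper's formulation.

```python
import tensorflow as tf

def build_train_op(recon_loss, kl_loss, copula_reg,
                   reg_weight=0.1, clip_norm=5.0, lr=1e-3):
    """Weight the copula term and clip gradients to avoid explosion near the
    simplex vertices (TF1 graph mode, as used by the GSM/NVCTM code bases)."""
    # A smaller reg_weight empirically trades perplexity for higher NPMI.
    loss = recon_loss + kl_loss + reg_weight * copula_reg
    optimizer = tf.compat.v1.train.AdamOptimizer(lr)
    grads, variables = zip(*optimizer.compute_gradients(loss))
    clipped, _ = tf.clip_by_global_norm(grads, clip_norm)
    train_op = optimizer.apply_gradients(zip(clipped, variables))
    return loss, train_op
```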
ii) We found it tricky to reach a balance point where both perplexity and NPMI are satisfactory for AEVB-based topic models. Since the main purpose of modelling topics on short texts is better interpretability, we pay more attention to the NPMI scores. The short-text topic models we surveyed, such as the Dual-Sparse Topic Model, Dirichlet Multinomial Mixture with Variational Manifold Regularization, and GraphBTM, did not report perplexities; their authors argued that perplexity is not a good metric for evaluating short-text topic modelling (possibly because the perplexities are much higher than for general topic models, according to our experiments on GraphBTM). We observed that our copula-regularized topic models achieve a significant improvement in NPMI (results are available and we provide the testing script), while the perplexity remains relatively acceptable. This behavior is quite similar to ProdLDA, which obtains impressive NPMI scores on StackOverflow and Snippets while its perplexity is higher. A sketch of the NPMI computation is given below.
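For reference, here is a minimal sketch of the standard NPMI topic-coherence computation we focus on, assuming `topics` is a list of top-word lists and `docs` a list of tokenized reference documents; the testing script in this repository may differ in details such as smoothing.

```python
import itertools
import numpy as np

def npmi_coherence(topics, docs, eps=1e-12):
    """Average NPMI over word pairs in each topic's top words, then over topics."""
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]
    def p(*words):
        # Document-level co-occurrence probability.
        return sum(all(w in ds for w in words) for ds in doc_sets) / n_docs
    topic_scores = []
    for top_words in topics:
        pair_scores = []
        for wi, wj in itertools.combinations(top_words, 2):
            p_ij, p_i, p_j = p(wi, wj), p(wi), p(wj)
            if p_ij == 0:
                pair_scores.append(-1.0)  # common convention for no co-occurrence
                continue
            pmi = np.log(p_ij / (p_i * p_j + eps))
            pair_scores.append(pmi / (-np.log(p_ij) + eps))
        topic_scores.append(np.mean(pair_scores))
    return float(np.mean(topic_scores))
```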
iii) We found that setting lambda as a learnable variable leads to topic vectors with an apparently lower average standard deviation, and the perplexity drops as well. We also argue that the copula regularization may not help boost the NPMI scores on some datasets for NVCTM, where the Centralized Transformation Flow (CTF) is utilized. We are still looking for a rigorous mathematical explanation. A sketch of the learnable-lambda parameterization is given below.
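A minimal sketch of how a fixed versus learnable copula parameter can be set up, assuming the Clayton parameter (lambda here) must stay positive; the variable names are illustrative, not the repository's exact ones.

```python
import numpy as np
import tensorflow as tf

def make_copula_lambda(init_value=1.0, learnable=False):
    """Return the Clayton copula parameter, either fixed or trainable (> 0)."""
    if not learnable:
        return tf.constant(init_value, dtype=tf.float32)
    # Inverse-softplus initialization; softplus keeps the learned lambda positive.
    raw_init = np.log(np.expm1(init_value)).astype(np.float32)
    raw = tf.Variable(raw_init, name="copula_lambda_raw")
    return tf.nn.softplus(raw)
```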
iv) For GraphBTM, we did not use the stochastic sampling of mini-corpora. Instead, we read the data sequentially batch by batch and represent each document by building a graph from its biterm dictionary, so that we can obtain the corresponding latent representation for each document to compute perplexity and perform text classification. A sketch of this per-document biterm extraction is given below.
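A minimal sketch of the sequential, per-document biterm extraction described above; the `window` size and token representation are illustrative assumptions rather than the repository's exact settings.

```python
from collections import Counter

def doc_biterms(tokens, window=15):
    """Collect the biterm dictionary (unordered word pairs) of a single document."""
    biterms = Counter()
    for i, wi in enumerate(tokens):
        for wj in tokens[i + 1:i + window]:
            if wi != wj:
                biterms[tuple(sorted((wi, wj)))] += 1
    return biterms

def iterate_batches(docs, batch_size=64):
    """Read documents sequentially batch by batch, so every document gets a
    latent representation for perplexity computation and text classification."""
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        yield [doc_biterms(d) for d in batch]
```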
v) If the original link to the TagMyNews dataset is broken, please refer to https://github.com/vijaynandwani/News-Classification or https://github.com/ZhaoyangLyu/POPQORN.
The variances of the experimental results are acceptable. We print the results on the training, validation, and test sets step by step for convenience of observation. It is also possible to add the topic diversity regularization proposed in the original GSM paper and see how it works; a sketch is given below.
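The following is a minimal sketch of an angle-based topic diversity regularizer of the kind referenced for GSM (mean minus variance of pairwise angles between topic-word vectors), added to the training objective with some weight; the exact formulation in the original paper may differ.

```python
import tensorflow as tf

def topic_diversity_regularizer(topic_word, eps=1e-12):
    """topic_word: [K, V] tensor of topic-word weights; returns a scalar that is
    larger when topics are more diverse and more evenly spread."""
    k = tf.shape(topic_word)[0]
    normalized = tf.nn.l2_normalize(topic_word, axis=1)           # unit rows
    cosine = tf.matmul(normalized, normalized, transpose_b=True)  # [K, K]
    angles = tf.acos(tf.clip_by_value(cosine, -1.0 + eps, 1.0 - eps))
    # Exclude self-pairs (zero angles on the diagonal) from the statistics.
    mask = tf.cast(1.0 - tf.eye(k), tf.bool)
    off_diag = tf.boolean_mask(angles, mask)
    mean_angle = tf.reduce_mean(off_diag)
    var_angle = tf.reduce_mean(tf.square(off_diag - mean_angle))
    return mean_angle - var_angle
```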
We thank @FengJiaChunFromSYSU for implementing the Neural Variational Correlated Topic Model.