Allow use of truncated Dictionary for coherence measures #1349
Merged
menshikh-iv merged 34 commits into piskvorky:develop from macks22:develop on Jun 14, 2017
+1,481 -438
Commits
Commits on May 22, 2017
- piskvorky#1342: Allow use of truncated `Dictionary` for coherence calculation by avoiding lookup of tokens not in the topic token lists. (committed by Sweeney, Mack)
- piskvorky#1342: Do not produce sliding windows for texts with no relevant words, and ensure each relevant word has a set in the `per_topic_postings` dict. (committed by Sweeney, Mack)
- committed by Sweeney, Mack
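A minimal sketch of the two May 22 changes, assuming a plain `dict` of token to id stands in for a truncated gensim `Dictionary`, topics arrive as token lists whose tokens are all in that dict, and windows are numbered consecutively across texts. The function `build_window_postings` is hypothetical and is not gensim's implementation. Only topic tokens are looked up, irrelevant or unknown tokens are masked out, texts with no relevant words produce no windows, and every relevant id starts with an empty set in `per_topic_postings`.

```python
from typing import Dict, List, Optional, Set


def build_window_postings(texts: List[List[str]],
                          topics: List[List[str]],
                          token2id: Dict[str, int],
                          window_size: int) -> Dict[int, Set[int]]:
    """Hypothetical sketch: map each relevant word id to the ids of the
    sliding windows it occurs in."""
    # Only topic tokens are looked up, so corpus tokens missing from a
    # truncated dictionary never cause a KeyError.
    relevant_ids = {token2id[token] for topic in topics for token in topic}

    # Every relevant word gets a set, even if it never appears in a window.
    per_topic_postings: Dict[int, Set[int]] = {i: set() for i in relevant_ids}

    window_id = 0
    for text in texts:
        # Irrelevant or unknown tokens become None so positions are preserved.
        ids: List[Optional[int]] = [
            token2id[t] if t in token2id and token2id[t] in relevant_ids else None
            for t in text
        ]
        if all(i is None for i in ids):
            continue  # no relevant words: produce no windows for this text
        num_windows = max(1, len(ids) - window_size + 1)
        for start in range(num_windows):
            for word_id in ids[start:start + window_size]:
                if word_id is not None:
                    per_topic_postings[word_id].add(window_id)
            window_id += 1
    return per_topic_postings
```

With `token2id = {"apple": 0, "banana": 1, "cherry": 2}` and the topic `["apple", "banana"]`, a text containing `"date"` is processed without error even though the dictionary has no entry for it.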
Commits on May 24, 2017
- committed by Sweeney, Mack
- committed by Sweeney, Mack
- committed by Sweeney, Mack
- committed by Sweeney, Mack
Commits on May 25, 2017
- committed by Sweeney, Mack
- committed by Sweeney, Mack
Commits on May 27, 2017
Commits on May 30, 2017
- piskvorky#1342: Cleanup, documentation improvements, proper caching of the accumulator in `CoherenceModel`, and various test fixes. (committed by Sweeney, Mack)
- authored
- piskvorky#1342: Do not swallow `KeyboardInterrupt` naively in `WikiCorpus.get_texts`; instead, log a warning and do not set `length`. (committed by Sweeney, Mack)
- piskvorky#1342: Formatting fixes (hanging indent in `coherencemodel` and non-empty blank lines in `text_analysis`). (committed by Sweeney, Mack)
- piskvorky#1342: Improve `CoherenceModel` documentation and minor refactor for variable interpretability. (committed by Sweeney, Mack)
- piskvorky#1342: Optimize word occurrence accumulation and fix a bug with repeated counting of tokens that occur more than once in a window. (committed by Sweeney, Mack)
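The repeated-counting fix can be illustrated with a small sketch, assuming texts are already lists of word ids. The function `count_window_occurrences` is hypothetical, not gensim's code. It counts, for each id, how many windows contain it; converting each window to a `set` before updating the counter keeps a token that appears twice in one window from being counted twice.

```python
from collections import Counter
from typing import Iterable, List


def count_window_occurrences(texts: Iterable[List[int]], window_size: int) -> Counter:
    """Hypothetical sketch: count, for each word id, the number of sliding
    windows that contain it (not the number of times it appears)."""
    counts: Counter = Counter()
    for ids in texts:
        num_windows = max(1, len(ids) - window_size + 1)
        for start in range(num_windows):
            window = ids[start:start + window_size]
            # set() collapses repeats, so a token occurring twice in one
            # window adds only 1 for that window.
            counts.update(set(window))
    return counts
```

For `[[5, 5, 7]]` with `window_size=2`, id 5 lies in two windows and is counted 2, whereas counting raw occurrences would give 3.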
Commits on May 31, 2017
- piskvorky#1342: Minor bug fixes and improved logging in the `text_analysis` module; cleaned up spacing in `coherencemodel`. (committed by Sweeney, Mack)
- piskvorky#1342: Optimize data structures used for window set tracking and avoid undue network traffic by moving relevancy filtering and token conversion to the master process. (committed by Sweeney, Mack)
- committed by Sweeney, Mack
- piskvorky#1342: Further optimize word co-occurrence accumulation by using a `collections.Counter` instance for accumulation within a batch. (committed by Sweeney, Mack)
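A sketch of batch-level accumulation with `collections.Counter`, assuming windows are lists of word ids and that a co-occurrence means two distinct ids share a window. The function name and the batching scheme are illustrative assumptions, not gensim's code. Each batch fills its own `Counter`, which is then merged into the running total with a single `update` call.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence


def accumulate_cooccurrences(windows: Sequence[List[int]], batch_size: int) -> Counter:
    """Hypothetical sketch: count how many windows contain each unordered
    pair of word ids, accumulating each batch in its own Counter."""
    total: Counter = Counter()
    for start in range(0, len(windows), batch_size):
        batch_counts: Counter = Counter()
        for window in windows[start:start + batch_size]:
            # Sorted unique ids give each unordered pair one canonical key
            # and ignore repeats within a window.
            unique_ids = sorted(set(window))
            batch_counts.update(combinations(unique_ids, 2))
        # Merge the whole batch into the running total in one step.
        total.update(batch_counts)
    return total
```

Because `Counter.update` adds counts rather than replacing them, the result is the same for any `batch_size`; batching only changes how often the total is touched.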
Commits on Jun 1, 2017
- piskvorky#1342: Clean up logging in the `text_analysis` module and remove the empty line at the end of the `util` module. (committed by Sweeney, Mack)
- committed by Sweeney, Mack
- committed by Sweeney, Mack
- committed by Sweeney, Mack
- piskvorky#1342: Realized the Python 3 compatibility issue was due to the `Dictionary` mapping to different ids, so fixed the `probability_estimation` tests to be agnostic of this. Also fixed an issue … (committed by Sweeney, Mack)
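One way to make such tests independent of id assignment is to compare results keyed by token rather than by id. The helper below is a hypothetical sketch, assuming results are dicts keyed by word id and that a reverse id-to-token mapping is available; it is not the approach the commit necessarily took.

```python
from typing import Dict, Mapping


def by_token(counts: Mapping[int, int], id2token: Mapping[int, str]) -> Dict[str, int]:
    """Hypothetical helper: re-key id-based results by token so a test does
    not depend on which ids a dictionary happened to assign."""
    return {id2token[word_id]: count for word_id, count in counts.items()}
```

Two dictionaries that assign swapped ids to `"human"` and `"graph"` then yield equal token-keyed results, even though the id-keyed dicts differ.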
Commits on Jun 2, 2017
Commits on Jun 4, 2017
- committed by Sweeney, Mack
Commits on Jun 5, 2017
- committed by Sweeney, Mack
Commits on Jun 6, 2017
Commits on Jun 7, 2017
Commits on Jun 8, 2017
- piskvorky#1342: Hanging indents and switch out `union` with `update` for unique ids from topic segments. (committed by Sweeney, Mack)
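The `union` to `update` switch can be sketched as follows, assuming each topic segment is a list of `(w_prime, w_star)` pairs of integer ids; the sketch does not handle segments whose elements are themselves collections of ids. The function `collect_unique_ids` is hypothetical. `set.update` mutates the set in place, while `ids = ids.union(other)` allocates a new set on every call.

```python
from typing import Iterable, List, Set, Tuple


def collect_unique_ids(segmented_topics: Iterable[List[Tuple[int, int]]]) -> Set[int]:
    """Hypothetical sketch: collect every word id appearing in any segment
    pair. Assumes each segment is a list of (w_prime, w_star) int pairs."""
    unique_ids: Set[int] = set()
    for segment in segmented_topics:
        for w_prime, w_star in segment:
            # update() adds in place; rebinding with union() would build
            # a fresh set for each pair.
            unique_ids.update((w_prime, w_star))
    return unique_ids
```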
Commits on Jun 9, 2017
- committed by Sweeney, Mack