fixes #518: tiny 10µsec fudge against 0 elapsed #584

Merged 1 commit · Jan 23, 2016
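Background: `timeit.default_timer()` can have coarse resolution on some platforms, so two nearby calls may return the identical value, and a words-per-second report then divides by zero (#518). Back-dating the start timestamp by 10µs keeps the elapsed time positive. A minimal illustrative sketch (standard library only, not gensim code):

from timeit import default_timer

def words_per_sec(word_count, start):
    # words/sec; raises ZeroDivisionError if the timer did not advance
    return word_count / (default_timer() - start)

start = default_timer()               # unfudged: elapsed may be exactly 0.0
try:
    print(words_per_sec(10000, start))
except ZeroDivisionError:
    print("timer did not advance between calls")

start = default_timer() - 0.00001     # the PR's fudge: elapsed is always >= 10µs
print(words_per_sec(10000, start))    # finite, never raises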
1 change: 1 addition & 0 deletions CHANGELOG.txt
@@ -14,6 +14,7 @@ Changes
- `utils.HAS_PATTERN` flag moved to `utils.has_pattern()`
* Threadsafe Word2Vec/Doc2Vec finish-check to avoid hang/unending Word2Vec/Doc2Vec training (Gordon Mohr, #571)
* Tuned `TestWord2VecModel.test_cbow_hs()` against random failures (Gordon Mohr, #531)
+ * Prevent ZeroDivisionError when `default_timer()` indicates no elapsed time (Gordon Mohr, #518)
* Forwards compatibility for NumPy > 1.10 (Matti Lyra, #494, #513)
- LdaModel and LdaMulticore produce a large number of DeprecationWarnings from
.inference() because the term ids in each chunk returned from utils.grouper
2 changes: 1 addition & 1 deletion gensim/models/doc2vec.py
@@ -626,7 +626,7 @@ def scan_vocab(self, documents, progress_per=10000, trim_rule=None):
document_no = -1
total_words = 0
min_reduce = 1
- interval_start = default_timer()
+ interval_start = default_timer() - 0.00001  # guard against next sample being identical
interval_count = 0
vocab = defaultdict(int)
for document_no, document in enumerate(documents):
6 changes: 3 additions & 3 deletions gensim/models/word2vec.py
@@ -807,7 +807,7 @@ def job_producer():
thread.start()

example_count, trained_word_count, raw_word_count = 0, 0, word_count
- start, next_report = default_timer(), 1.0
+ start, next_report = default_timer() - 0.00001, 1.0

while unfinished_worker_count > 0:
report = progress_queue.get() # blocks if workers too slow
@@ -844,7 +844,7 @@ def job_producer():
elapsed = default_timer() - start
logger.info(
"training on %i raw words (%i effective words) took %.1fs, %.0f effective words/s",
- raw_word_count, trained_word_count, elapsed, trained_word_count / elapsed if elapsed else 0.0)
+ raw_word_count, trained_word_count, elapsed, trained_word_count / elapsed)
if job_tally < 10 * self.workers:
logger.warn("under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay")

@@ -968,7 +968,7 @@ def worker_loop():
self.clear_sims()
logger.info(
"scoring %i sentences took %.1fs, %.0f sentences/s",
- sentence_count, elapsed, sentence_count / elapsed if elapsed else 0.0)
+ sentence_count, elapsed, sentence_count / elapsed)
return sentence_scores[:sentence_count]

def clear_sims(self):
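The word2vec.py logging hunks pair the guarded division (`trained_word_count / elapsed if elapsed else 0.0`) with a plain division; once `start` is back-dated, `elapsed` can no longer be zero, so a guard at each report site adds nothing. A side-by-side sketch of the two styles (illustrative only, variable names assumed):

from timeit import default_timer

trained_word_count = 100000

# guard at every report site: denominator checked each time, reports 0.0 on a stalled timer
start = default_timer()
elapsed = default_timer() - start
rate = trained_word_count / elapsed if elapsed else 0.0

# back-dated start, as in this PR: elapsed is positive by construction, no per-site guard
start = default_timer() - 0.00001
elapsed = default_timer() - start      # always >= 0.00001
rate = trained_word_count / elapsed
print(rate)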