The _raw_word_count method is breaking word2vec model #1106

Closed
edilsonacjr opened this issue Jan 24, 2017 · 3 comments
Labels
bug Issue described a bug

Comments


edilsonacjr commented Jan 24, 2017

The old code for the function _raw_word_count is breaking the word2vec model:

File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/gensim/models/word2vec.py", line 744, in
return sum(len(sentence) for sentence in job)
TypeError: object of type 'map' has no len()

A fix was already implemented. Why has it not been released yet?

From issue #535:

    def _raw_word_count(self, job):
        """Return the number of words in a given job."""
        return sum(len(sentence.words) for sentence in job)

Contributor

tmylk commented Jan 25, 2017

Dear @edilsonacjr

What type of input are you passing as sentences? I'm keen to reproduce the error. Do you have a code snippet?

The code has been unchanged for the last 12 months. In word2vec (https://github.com/RaRe-Technologies/gensim/blame/2a70e3a726404cd4230542a35cfd2dc4d63da6f1/gensim/models/word2vec.py#L747), len(sentence) was added in #535. The change to len(sentence.words) affects only doc2vec, not word2vec.
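
To illustrate why the two models count words differently, here is a minimal sketch. `TaggedDoc`, `count_word2vec`, and `count_doc2vec` are hypothetical names made up for this example; the only assumption taken from the thread is that a doc2vec-style document keeps its tokens in a `.words` attribute, while a word2vec sentence is itself a sequence of tokens.

```python
from collections import namedtuple

# Hypothetical stand-in for a doc2vec-style document: tokens live in `.words`.
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])


def count_word2vec(job):
    """Each sentence is a sequence of tokens, so len(sentence) counts words."""
    return sum(len(sentence) for sentence in job)


def count_doc2vec(job):
    """Each document wraps its tokens, so the count must use `.words`."""
    return sum(len(document.words) for document in job)


sentences = [["a", "b", "c"], ["d", "e"]]
documents = [TaggedDoc(words=s, tags=[i]) for i, s in enumerate(sentences)]

w2v_total = count_word2vec(sentences)
d2v_total = count_doc2vec(documents)

# Applying the word2vec-style count to wrapped documents measures the
# wrapper (two fields per namedtuple), not the tokens inside it.
misapplied_total = count_word2vec(documents)

print(w2v_total, d2v_total, misapplied_total)
```

Neither counting function helps when a sentence is a `map` object, since that type has no length at all.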

tmylk added the bug label Jan 25, 2017
@edilsonacjr
Author

I just realized the problem: I'm passing 'map' objects directly as sentences, instead of lists:

walks = [map(str, walk) for walk in walks]
model = Word2Vec(walks, size=args.dimensions, window=args.window_size, min_count=0, sg=1, workers=args.workers,
                 iter=args.iter)
model.save_word2vec_format(args.output)

Thank you anyway.
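
A minimal sketch of the corresponding fix, assuming Python 3: materialize each walk as a list of string tokens before handing it to the model. The example walks are hypothetical, and the gensim call is left out so the snippet runs on its own.

```python
# Hypothetical example walks over integer node ids.
walks = [[0, 1, 2], [2, 3]]

# Build real lists of string tokens instead of lazy map objects.
# Equivalent alternative: [list(map(str, walk)) for walk in walks]
walks = [[str(node) for node in walk] for walk in walks]

# Every sentence now supports len(), so a word count like the one in the
# traceback succeeds.
total_words = sum(len(walk) for walk in walks)

print(walks)
print(total_words)
```

The resulting `walks` can then be passed to `Word2Vec` exactly as in the snippet above. Lists also avoid a second pitfall of `map` objects: they are one-shot iterators, so they would be empty on any later pass over the sentences.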

Contributor

tmylk commented Jan 27, 2017

Glad you figured it out! Closing as "not an issue".

tmylk closed this as completed Jan 27, 2017