Trouble re-loading Doc2Vec Model Generated In earlier Version #1977

Closed
fsfeng opened this issue Mar 13, 2018 · 10 comments
Labels: bug, difficulty medium

Comments

@fsfeng

fsfeng commented Mar 13, 2018

I trained a Doc2Vec model in an earlier version of gensim and now cannot load it. I tried both Doc2Vec.load(fname) and Doc2Vec.load(fname, mmap='r'). Both produce the same errors:

AttributeError: Can't get attribute 'DocvecsArray' on <module 'gensim.models.doc2vec' from 'DIR/python/miniconda3/lib/python3.6/site-packages/gensim/models/doc2vec.py'>

AttributeError: 'Doc2Vec' object has no attribute 'running_training_loss'

Any help would be much appreciated. Thanks!
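
The first error is the kind that Python's pickle module raises when a saved object refers to a class that its module no longer defines. A minimal sketch of that mechanism, using only the standard library; the module name fake_doc2vec is a hypothetical stand-in, and nothing here touches gensim itself:

```python
import pickle
import sys
import types

# Hypothetical stand-in module (an assumption for illustration, not gensim).
mod = types.ModuleType("fake_doc2vec")
sys.modules["fake_doc2vec"] = mod


class DocvecsArray:
    def __init__(self):
        self.count = 3


# Make the class look as if it were defined inside the stand-in module.
DocvecsArray.__module__ = "fake_doc2vec"
DocvecsArray.__qualname__ = "DocvecsArray"
mod.DocvecsArray = DocvecsArray

# "Old release": the object is pickled while the class still exists.
payload = pickle.dumps(DocvecsArray())

# "New release": the class has been renamed or removed from the module.
del mod.DocvecsArray

try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    # pickle stores only the module and class name, so the lookup fails here.
    load_error = str(exc)

print(load_error)
```

Pickle stores a reference to the class by name rather than the class itself, so a rename or removal between releases is enough to break loading.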

@menshikh-iv
Contributor

Hello, can you share your model (or install the current develop branch and check again)? This was probably already resolved in #1974, but I'm not sure.

@menshikh-iv added the "need info" label Mar 13, 2018
@fsfeng
Author

fsfeng commented Mar 13, 2018

Hi, I tried installing the develop branch but am getting exactly the same problem.

Not sure what you mean by sharing my model, but here's the code I used to create it.

import datetime

import gensim
from gensim.models.doc2vec import TaggedDocument


class DocIterator(object):
    """Yield one TaggedDocument per text, tagged with the matching label."""

    def __init__(self, doc_list, labels_list):
        self.labels_list = labels_list
        self.doc_list = doc_list

    def __iter__(self):
        for idx, doc in enumerate(self.doc_list):
            # print 'creating tagged document...%d' % idx
            yield TaggedDocument(words=doc.split(), tags=[self.labels_list[idx]])


# TEXT and LABELS are defined elsewhere (not shown in this post).
iterator = DocIterator(TEXT, LABELS)

# %time is an IPython magic that reports how long the statement took.
%time model = gensim.models.Doc2Vec(size=100, window=10, min_count=50, workers=4, alpha=0.025, min_alpha=0.025)
%time model.build_vocab(iterator)

print('done building vocabulary')
print('start training the model')

for epoch in range(10):
    starttime = datetime.datetime.now()
    print("start", starttime)
    print('epoch %d..' % (epoch+1))
    # train() runs twice per epoch: once before and once after the decay.
    model.train(iterator)
    # Manual learning-rate decay: lower alpha by 0.002 and pin min_alpha to it.
    model.alpha -= 0.002
    model.min_alpha = model.alpha
    model.train(iterator)
    endtime = datetime.datetime.now()
    print("end", endtime, "took", endtime-starttime)

model.save("DIR/doc2vec_1030.model")

This was done a few months ago and now when I try to re-load the model I'm getting the error reported above.

@menshikh-iv
Contributor

@fsfeng by "sharing the model" I mean uploading doc2vec_1030.model to GitHub, Google Drive, etc. and posting a link here (this is needed to investigate and fix the issue).

Which gensim version did you use?

CC: @manneshiva this is probably a bug, please fix it ASAP.

@marlohaering

I encountered the same error. I trained my model with gensim version 1.0.1 and I'm trying to reload it with gensim version 3.4.0. Unfortunately, I cannot share my model.

@menshikh-iv
Contributor

@marlohaering that's expected behavior; we don't keep backward compatibility for models across that many versions.
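
Since compatibility depends on which release saved the model, it can help to record that version next to the model file. A minimal sketch, assuming a sidecar JSON file; the helper names and the ".version.json" suffix are hypothetical and not part of gensim:

```python
import json
import tempfile
from pathlib import Path


def write_version_sidecar(model_path, library_version):
    """Record the library version next to a saved model (hypothetical helper)."""
    sidecar = Path(str(model_path) + ".version.json")
    sidecar.write_text(json.dumps({"library_version": library_version}))
    return sidecar


def saved_with_version(model_path):
    """Return the recorded version, or None if no sidecar file exists."""
    sidecar = Path(str(model_path) + ".version.json")
    if not sidecar.exists():
        return None
    return json.loads(sidecar.read_text())["library_version"]


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "doc2vec_1030.model"
    model_path.write_bytes(b"")  # placeholder standing in for a real saved model
    write_version_sidecar(model_path, "1.0.1")
    recorded = saved_with_version(model_path)
    missing = saved_with_version(Path(tmp) / "other.model")
```

In real use the version string would come from the installed library at save time, for example the value of gensim.__version__, so the question "which version trained this?" can be answered later without guessing.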

@menshikh-iv
Contributor

@fsfeng I'm still waiting for a model from you (it is needed to reproduce the error). Also, which gensim version did you use for training?

@fsfeng
Author

fsfeng commented Mar 22, 2018 via email

@fsfeng
Author

fsfeng commented Mar 22, 2018 via email

@menshikh-iv
Contributor

@fsfeng thanks for the information, now I understand what is happening.

@menshikh-iv added the "bug" and "difficulty medium" labels and removed the "need info" label Mar 24, 2018
menshikh-iv pushed a commit that referenced this issue Apr 12, 2018
…g old Gensim versions. Fix #2000, #1977 (#2012)

* adds default values for attributes

* ignore values for attributes that do not exist

* adds unit test

* fixes default values for missing attributes for older gensim models

* adds unit test cases for loading really old gensim models

* adds test cases for loading all old models

* adds more tests post loading

* handles loading d2v models saved using version <=0.12.2

* fix `max_rawint` value and PEP8 errors

* adds saving and loading back tests

* adds comments and fixes `max_rawint`

* fix PEP8
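
The first items in the commit message describe a common pattern for loading old pickles: backfill the attributes that the old release never saved. The sketch below illustrates that pattern on a hypothetical Model class; it is not the code from #2012, and the attribute names and default values are assumptions chosen for illustration:

```python
class Model:
    """Hypothetical model class (an assumption for illustration, not gensim's Doc2Vec)."""

    # Attributes added in later releases, with the value to assume when absent.
    _DEFAULTS = {"running_training_loss": 0.0, "compute_loss": False}

    def __init__(self, vector_size=100):
        self.vector_size = vector_size
        self.running_training_loss = 0.0
        self.compute_loss = False

    def __setstate__(self, state):
        # Restore what was saved, then backfill anything the old release lacked.
        self.__dict__.update(state)
        for name, default in self._DEFAULTS.items():
            self.__dict__.setdefault(name, default)


# State as an old release might have saved it: the newer attributes are missing.
old_state = {"vector_size": 50}

# pickle creates the object without calling __init__ and then hands the saved
# state to __setstate__; the same two steps are done by hand here.
restored = Model.__new__(Model)
restored.__setstate__(old_state)

print(restored.vector_size, restored.running_training_loss, restored.compute_loss)
```

Without the backfill step, the first access to running_training_loss on the restored object would raise the AttributeError reported at the top of this issue.
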
@menshikh-iv
Contributor

fixed in #2012

darindf pushed a commit to darindf/gensim that referenced this issue Apr 23, 2018
…g old Gensim versions. Fix piskvorky#2000, piskvorky#1977 (piskvorky#2012)
