Fail to use it in Google Colab #109

Open
ajason08 opened this issue Apr 21, 2020 · 3 comments

@ajason08

Hello,
Thank you for your effort in putting together this Python version.
I am struggling to run the first example.

My code (just 3 lines) can be reproduced with this notebook.

Can you please help me understand what is wrong?

Thank you!

@IronFarm

Hi

L.19 in example.py only works in Python 2.
Try replacing yield line.lower().translate(None, delchars).split(' ')
with yield line.lower().translate({ord(x): None for x in delchars}).split(' ')
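
For context, a minimal self-contained sketch of that change (delchars here is just a hypothetical string of punctuation; in example.py it holds whatever characters are being deleted):

# Python 2: line.lower().translate(None, delchars) deleted every character in delchars.
# Python 3: str.translate() takes a mapping of code points; mapping to None deletes them.
delchars = ".,;:!?()[]{}\"'"  # hypothetical character set for this sketch

line = "Hello, world! (GloVe)"
table = {ord(x): None for x in delchars}
tokens = line.lower().translate(table).split(' ')
print(tokens)  # ['hello', 'world', 'glove']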

You're probably going to run into more issues down the line, though, as this code was written for Python 2 and appears to no longer be maintained.

Cheers

@IronFarm

(The issue further down in your code is that model.fit() expects a list of lists, not a list of strings. Each document should be represented as a list of words.)
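
As a hypothetical illustration of the expected input shape:

from glove import Corpus

# Each document is a list of word tokens; the whole corpus is a list of such lists.
documents = [
    ["the", "quick", "brown", "fox"],
    ["jumps", "over", "the", "lazy", "dog"],
]

corpus = Corpus()
corpus.fit(documents, window=5)  # builds the word-word co-occurrence matrix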

@ajason08
Author

ajason08 commented Apr 21, 2020

Now working as expected!
Thank you

I paste my working code here for future reference.

!pip install glove_python
!curl -o my_corpus.txt https://norvig.com/big.txt

from glove import Corpus, Glove

# Create a corpus object
corpus = Corpus()

""" The learner corpus.fit() expects a list of lists of strings,
    not one big string nor a list of strings.
    Each document should be represented as a list of words: [[doc1_words], [doc2_words], ...]
    The code below turns a txt file into this format,
    though more efficient alternatives exist (see the sketch after this code). """

with open("my_corpus.txt", 'r') as f:
    words = f.read().split()

# Split the word list into num_docs roughly equal chunks.
num_docs = 10
chunk_size = len(words) // num_docs  # integer division may drop a few trailing words
doc_list = []
last_index = 0
for i in range(num_docs):
    upper_index = chunk_size * (i + 1)
    doc_list.append(words[last_index:upper_index])
    last_index = upper_index  # advance the start of the next chunk

print("number of docs in doc_list:",len(doc_list))
print("first doc fragment:", doc_list[0][0:11])


# Fit the corpus to build the co-occurrence matrix used by GloVe
corpus.fit(doc_list, window=10)
glove = Glove(no_components=5, learning_rate=0.05) 
glove.fit(corpus.matrix, epochs=30, no_threads=1, verbose=True)
glove.add_dictionary(corpus.dictionary)
glove.save('glove.model')

glove = Glove.load('glove.model')
x = glove.most_similar("Sherlock", number=10)
x
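
As noted in the comment above, there are more efficient ways to build doc_list. A simpler alternative (just a sketch, assuming that treating each non-empty line of the file as one document is acceptable):

# Alternative sketch: one document per non-empty line, tokenized on whitespace.
with open("my_corpus.txt", 'r') as f:
    doc_list = [line.split() for line in f if line.strip()]

corpus = Corpus()
corpus.fit(doc_list, window=10)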
