Recipes & FAQ
##Add your useful code snippets and recipes here. You can also post a short question -- please only ask questions that can be fully answered in a sentence or two. No open-ended questions or discussions here.
###Q: How many times does a feature with id 123 appear in a corpus?
A: `total_sum = sum(dict(doc).get(123, 0) for doc in corpus)`
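
A minimal sketch of the one-liner above on a toy corpus. The corpus contents and the `doc_count` variable are made up for illustration; any iterable of documents in gensim's bag-of-words format (lists of `(feature_id, count)` 2-tuples) should behave the same way.

```python
# Toy corpus in gensim's bag-of-words format: each document is a list of
# (feature_id, count) 2-tuples. The ids and counts below are invented.
corpus = [
    [(7, 1), (123, 2)],
    [(5, 4)],
    [(123, 3), (200, 1)],
]

feature_id = 123

# Total number of occurrences of the feature across the whole corpus.
total_sum = sum(dict(doc).get(feature_id, 0) for doc in corpus)

# Hypothetical extra: number of documents containing the feature at least once.
doc_count = sum(1 for doc in corpus if dict(doc).get(feature_id, 0) > 0)

print(total_sum, doc_count)
```

Note that this makes one full pass over the corpus, so for a streamed corpus it reads every document once.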
###Q: How do you calculate the vector length of a term?
A: (note that "vector length" only makes sense for non-zero vectors):
- If the input vector `vec` is in gensim sparse format (a list of 2-tuples): `length = math.sqrt(sum(val**2 for _, val in vec))`, or use `length = gensim.matutils.veclen(vec)`.
- If the input vector is a numpy array: `length = gensim.matutils.blas_nrm2(vec)`
- If the input vector is in a `scipy.sparse` format: `length = numpy.sqrt(numpy.sum(vec.tocsr().data**2))`
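
The sparse-format formula above can be checked with the standard library alone. This is only a sketch: the vector values are invented, and the dense expansion is a hand-rolled stand-in used for comparison, not a gensim call.

```python
import math

# A vector in gensim sparse format: (feature_id, value) 2-tuples.
# The values are made up so that the expected length is easy to see (3-4-5).
vec = [(0, 3.0), (4, 4.0)]

# Euclidean length computed directly on the sparse representation.
length = math.sqrt(sum(val ** 2 for _, val in vec))

# Cross-check: expand to a dense list (5 dimensions assumed) and recompute.
dense = [0.0] * 5
for idx, val in vec:
    dense[idx] = val
dense_length = math.sqrt(sum(v ** 2 for v in dense))

# Scaling to unit length is only defined for non-zero vectors.
if length > 0:
    unit_vec = [(idx, val / length) for idx, val in vec]
else:
    unit_vec = vec

print(length, dense_length)
```

The implicit zeros contribute nothing to the sum of squares, which is why the sparse and dense results agree.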
###Q: How do you calculate the vector V in LSI space?
A: With the singular value decomposition of your corpus `X` being `X=U*S*V^T`, doing `lsi[X]` computes `U^T*X` (the columns of `U` are orthonormal, so `U^T` acts as the inverse of `U` here), which equals `S*V^T`. Laid out with one row per document, that is `V*S`. So if you want `V`, divide `lsi[X]` by `S`:
`V = gensim.matutils.corpus2dense(lsi[X], len(lsi.projection.s)).T / lsi.projection.s`, to get `V` as a 2d numpy array.
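
The algebra behind this recipe can be sanity-checked with plain numpy. This sketch uses `numpy.linalg.svd` on a small random matrix in place of a trained `LsiModel`, so `U_k`, `s_k` and `projected` are stand-ins for `lsi.projection.u`, `lsi.projection.s` and the dense form of `lsi[X]`; the matrix sizes and the number of topics `k` are arbitrary assumptions.

```python
import numpy as np

# Toy term-document matrix: 6 terms (rows) x 4 documents (columns).
rng = np.random.default_rng(0)
X = rng.random((6, 4))

# Thin SVD: X = U * diag(s) * Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the top k singular triplets, as a truncated LSI model would.
k = 2
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

# Projecting the corpus into LSI space: U_k^T * X = diag(s_k) * V_k^T.
# Transposed to one row per document, this is V_k * diag(s_k).
projected = (U_k.T @ X).T

# Dividing each column by its singular value recovers V.
V = projected / s_k

print(np.allclose(V, V_k))
```

The division relies on the kept singular values being non-zero, which holds for this random matrix but is worth checking on real data.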