
How to output word vectors? #4

Open
Wenenen opened this issue Aug 5, 2019 · 5 comments

Comments

@Wenenen

Wenenen commented Aug 5, 2019

After loading this model, how do I output word vectors? Google's model can output 768-dimensional vectors; how can I do that here?

@zzy14
Member

zzy14 commented Aug 5, 2019

You can refer to this issue.

@Wenenen
Author

Wenenen commented Aug 7, 2019

Thanks, I can output word embeddings now, but these should be character embeddings, right? How do I combine them into an embedding for a word, e.g. '民事'?

    import torch
    from pytorch_pretrained_bert import BertModel  # assumption: adjust to the BERT package in use

    model = BertModel.from_pretrained('ms')
    embedding = model.embeddings.word_embeddings
    print(embedding)
    input_ids = torch.LongTensor([[1, 2, 4, 5], [0, 3, 2, 9]])  # avoid shadowing the builtin `input`
    print(embedding(input_ids))

Embedding(23283, 768, padding_idx=0)
tensor([[[ 0.0121, -0.0052,  0.0428,  ...,  0.0085, -0.0226,  0.0671],
         [-0.0381, -0.0242,  0.0140,  ...,  0.0077,  0.0074,  0.0635],
         [ 0.0075,  0.0054, -0.0041,  ..., -0.0073, -0.0303,  0.0289],
         [-0.0139, -0.0260, -0.0086,  ...,  0.0424,  0.0187,  0.0521]],

        [[ 0.0027, -0.0153,  0.0268,  ...,  0.0323, -0.0142,  0.0039],
         [ 0.0278,  0.0139, -0.0009,  ...,  0.0170, -0.0452,  0.0104],
         [-0.0381, -0.0242,  0.0140,  ...,  0.0077,  0.0074,  0.0635],
         [-0.0139,  0.0054,  0.0031,  ...,  0.0092,  0.0022,  0.0549]]],
       grad_fn=&lt;EmbeddingBackward&gt;)
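One common workaround (not something this repo provides) is to mean-pool the per-character vectors of a word. A minimal sketch below, using a stand-in embedding table instead of the real checkpoint; the character ids for '民' and '事' are purely illustrative, not from the actual vocabulary:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for model.embeddings.word_embeddings:
# a small character-level embedding table (vocab 100, dim 768).
char_embeddings = nn.Embedding(100, 768, padding_idx=0)

# Illustrative ids only: suppose the tokenizer maps '民' -> 11 and '事' -> 12.
char_ids = torch.LongTensor([11, 12])

char_vecs = char_embeddings(char_ids)  # shape (2, 768), one row per character
word_vec = char_vecs.mean(dim=0)       # shape (768,), pooled vector for '民事'
print(word_vec.shape)
```

The same mean pooling can be applied to the contextual hidden states of the characters instead of the static table, which usually gives a more useful word representation.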

@zzy14
Member

zzy14 commented Aug 7, 2019

BERT itself can only provide character vectors, since it is built on a character-level vocabulary; only by training with a word-level vocabulary could you obtain word vectors.

@Wenenen
Author

Wenenen commented Aug 8, 2019

Thanks a lot. What methods are there for training word vectors? Would the results be better than character-based ones? And is it possible to train a word-based model on top of the character-based one?

@Yyy11181

Can these be used directly as embeddings? @zzy14


3 participants