
How to output word vectors? #4

Open
Wenenen opened this issue Aug 5, 2019 · 5 comments

Comments

@Wenenen

Wenenen commented Aug 5, 2019

After loading this model, how do I output word vectors? Google's model can output 768-dimensional vectors; how can I do that here?

@zzy14
Member

zzy14 commented Aug 5, 2019

You can refer to this issue.

@Wenenen
Author

Wenenen commented Aug 7, 2019

Thanks, I can output word embeddings now, but these should be character embeddings, right? How do I combine them into an embedding for a word, e.g. '民事'?

    import torch
    from pytorch_pretrained_bert import BertModel  # assumption: adjust to the BERT package in use

    model = BertModel.from_pretrained('ms')
    embedding = model.embeddings.word_embeddings
    print(embedding)
    input_ids = torch.LongTensor([[1, 2, 4, 5], [0, 3, 2, 9]])  # avoid shadowing the builtin `input`
    print(embedding(input_ids))

Embedding(23283, 768, padding_idx=0)
tensor([[[ 0.0121, -0.0052,  0.0428,  ...,  0.0085, -0.0226,  0.0671],
         [-0.0381, -0.0242,  0.0140,  ...,  0.0077,  0.0074,  0.0635],
         [ 0.0075,  0.0054, -0.0041,  ..., -0.0073, -0.0303,  0.0289],
         [-0.0139, -0.0260, -0.0086,  ...,  0.0424,  0.0187,  0.0521]],

        [[ 0.0027, -0.0153,  0.0268,  ...,  0.0323, -0.0142,  0.0039],
         [ 0.0278,  0.0139, -0.0009,  ...,  0.0170, -0.0452,  0.0104],
         [-0.0381, -0.0242,  0.0140,  ...,  0.0077,  0.0074,  0.0635],
         [-0.0139,  0.0054,  0.0031,  ...,  0.0092,  0.0022,  0.0549]]],
       grad_fn=&lt;EmbeddingBackward&gt;)
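One common workaround (not something this repo provides) is to mean-pool the per-character vectors of a word. A minimal sketch below, using a stand-in embedding table instead of the real checkpoint; the character ids for '民' and '事' are purely illustrative, not from the actual vocabulary:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for model.embeddings.word_embeddings:
# a small character-level embedding table (vocab 100, dim 768).
char_embeddings = nn.Embedding(100, 768, padding_idx=0)

# Illustrative ids only: suppose the tokenizer maps '民' -> 11 and '事' -> 12.
char_ids = torch.LongTensor([11, 12])

char_vecs = char_embeddings(char_ids)  # shape (2, 768), one row per character
word_vec = char_vecs.mean(dim=0)       # shape (768,), pooled vector for '民事'
print(word_vec.shape)
```

The same mean pooling can be applied to the contextual hidden states of the characters instead of the static table, which usually gives a more useful word representation.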

@zzy14
Member

zzy14 commented Aug 7, 2019

BERT itself can only provide character vectors, since it is built on a character-level vocabulary; only by training with a word-level vocabulary could you obtain word vectors.

@Wenenen
Author

Wenenen commented Aug 8, 2019

Thanks a lot. What methods are there for training word vectors? Would the results be better than character-based ones? And is it possible to train a word-based model on top of the character-based one?

@Yyy11181

Can these be used directly as embeddings? @zzy14


3 participants