
tokenizer cannot load multiple samples #73

Open
myaijarvis opened this issue Apr 12, 2023 · 0 comments
The tokenizer cannot load multiple samples at once.

import torch

from datasets.bert_dataset import BertDataset
from models.modeling_glycebert import GlyceBertModel

CHINESEBERT_PATH = 'C:\\Users\\jarvis\\.cache\\huggingface\\hub\\models--ShannonAI--ChineseBERT-base\\snapshots\\aa8b6fa9c3427f77b0911b07ab35f2b1b8bf248b'
tokenizer = BertDataset(CHINESEBERT_PATH)
chinese_bert = GlyceBertModel.from_pretrained(CHINESEBERT_PATH)

sentence1 = '我喜欢猫'
sentence2 = '我喜欢猫2'
sentences = [sentence1, sentence2]

# Accumulate per-sentence ids into batch tensors
input_ids_b = torch.Tensor()
pinyin_ids_b = torch.Tensor()
for sent in sentences:
    input_ids, pinyin_ids = tokenizer.tokenize_sentence(sent)
    length = input_ids.shape[0]
    input_ids = input_ids.view(1, length)
    pinyin_ids = pinyin_ids.view(1, length, 8)
    input_ids_b = torch.cat([input_ids_b, input_ids], 0)
    pinyin_ids_b = torch.cat([pinyin_ids_b, pinyin_ids], 0)

output_hidden = chinese_bert.forward(input_ids_b, pinyin_ids_b)[0]
print(output_hidden)

C:\Anaconda3\envs\chinese_bert\python.exe D:/workspace/python/ChineseBert/my_test.py
Traceback (most recent call last):
  File "D:/workspace/python/ChineseBert/my_test.py", line 22, in <module>
    output_hidden = chinese_bert.forward(input_ids_b, pinyin_ids_b)[0]
  File "D:\workspace\python\ChineseBert\models\modeling_glycebert.py", line 146, in forward
    inputs_embeds=inputs_embeds
  File "C:\Anaconda3\envs\chinese_bert\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\workspace\python\ChineseBert\models\fusion_embedding.py", line 64, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "C:\Anaconda3\envs\chinese_bert\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Anaconda3\envs\chinese_bert\lib\site-packages\torch\nn\modules\sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Anaconda3\envs\chinese_bert\lib\site-packages\torch\nn\functional.py", line 1852, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.FloatTensor instead (while checking arguments for embedding)

Process finished with exit code 1
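The root cause appears to be the dtype: `torch.Tensor()` creates an empty *float* tensor, so `torch.cat` promotes the Long `input_ids` returned by the tokenizer to float, which `nn.Embedding` then rejects. A minimal sketch of a fix is to collect the per-sentence tensors in Python lists and concatenate once, which preserves the Long dtype (the id values below are hypothetical placeholders standing in for `tokenize_sentence` output):

```python
import torch

# Hypothetical placeholder ids; in the real script these come from
# tokenizer.tokenize_sentence(sent), which returns LongTensors.
ids = [torch.tensor([101, 2769, 102]), torch.tensor([101, 2769, 103])]
pinyins = [torch.zeros(3 * 8, dtype=torch.long), torch.zeros(3 * 8, dtype=torch.long)]

# Concatenate once from lists instead of cat-ing onto an empty float
# torch.Tensor(); the batch keeps dtype torch.long as nn.Embedding requires.
input_ids_b = torch.cat([t.view(1, -1) for t in ids], dim=0)
pinyin_ids_b = torch.cat([t.view(1, -1, 8) for t in pinyins], dim=0)

assert input_ids_b.dtype == torch.long
```

An equivalent one-line workaround is to cast before the forward pass, e.g. `input_ids_b = input_ids_b.long()`. Note also that for sentences tokenizing to different lengths, the per-sentence tensors would additionally need padding to a common length before `torch.cat` can stack them.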
