Question-Answer Dataset Format? #38

Open
szh-max opened this issue Jun 5, 2023 · 0 comments

szh-max commented Jun 5, 2023

Hello, have you used the model for question answering? If so, what format was the dataset in?

I am using the RETRO model, and the dataset I created looks like this:

question: Where is the Alberta Basin located?
answer: It is located in western Canada, between latitudes 49° to 60°.

question: What is the area of the Alberta Basin?
answer: The area is approximately 748,889 square kilometers.

question: What type of basin is the Alberta Basin?
answer: It is a foreland basin.
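One way to keep the training text consistent is to render every pair in exactly the same `question: ... answer: ...` layout and store each pair as its own UTF-8 text file. The sketch below assumes the training wrapper reads plain-text documents from a folder (RETRO-pytorch's `TrainingWrapper` takes `documents_path` and `glob` arguments for this; check the installed version). The helper names `format_qa_pair`, `build_qa_documents` and `write_documents`, and the `qa_00000.txt` file naming, are hypothetical.

```python
from pathlib import Path


def format_qa_pair(question: str, answer: str) -> str:
    """Render one pair in the same layout that will be used for prompts at inference time."""
    return f"question: {question.strip()}\nanswer: {answer.strip()}\n"


def build_qa_documents(pairs):
    """Map a hypothetical file name (one file per pair) to the text of that pair."""
    return {f"qa_{i:05d}.txt": format_qa_pair(q, a) for i, (q, a) in enumerate(pairs)}


def write_documents(docs, out_dir):
    """Write each document to its own UTF-8 .txt file under out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, text in docs.items():
        (out / name).write_text(text, encoding="utf-8")
```

For example, `write_documents(build_qa_documents(pairs), './text_folder')` produces files that `documents_path = './text_folder'` with `glob = '**/*.txt'` would pick up. The important part is that the layout is identical in the training files and in the inference prompts; whether one pair per file or many pairs per file works better is worth testing.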

However, generation with the following code produces bad output:

```python
import torch

# `tokenizer` and `wrapper` are created earlier in the training script (not shown)
prefix = 'Where is the Alberta Basin'
# encode without special tokens and add a batch dimension -> shape (1, seq_len)
prompt = torch.LongTensor(tokenizer.encode(prefix, add_special_tokens=False)).unsqueeze(0)
sampled = wrapper.generate(prompt, filter_thres = 0.1, temperature = 0.1) # (1, <2049) terminates early if all
print(sampled)
print('#######')
print(tokenizer.decode(sampled.squeeze(), skip_special_tokens=True))
```
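The prefix `'Where is the Alberta Basin'` in the snippet above does not follow the `question: ... answer:` layout of the training pairs, which is one plausible reason the model drifts into repeating the template instead of answering. A minimal sketch, using the hypothetical helpers `build_qa_prompt` and `extract_answer`, builds the prompt in the training layout and cuts the decoded text at the next `question:` marker. The regular expressions tolerate a space before the colon because the decoded output in this issue renders the marker as `question :`.

```python
import re

# Hypothetical helpers: a prompt in the training layout, plus answer extraction.
_ANSWER = re.compile(r"answer\s*:", re.IGNORECASE)
_QUESTION = re.compile(r"question\s*:", re.IGNORECASE)


def build_qa_prompt(question: str) -> str:
    """End the prompt right after 'answer:' so the next generated tokens are the answer."""
    return f"question: {question.strip()}\nanswer:"


def extract_answer(decoded: str) -> str:
    """Return the text between the first 'answer:' and the following 'question:'."""
    m = _ANSWER.search(decoded)
    if m is None:
        return ""
    rest = decoded[m.end():]
    nxt = _QUESTION.search(rest)  # the model may continue with a new question
    if nxt is not None:
        rest = rest[:nxt.start()]
    return rest.strip()
```

The prompt string would then go through the same `tokenizer.encode(..., add_special_tokens=False)` and `wrapper.generate(...)` calls as before, and `extract_answer(tokenizer.decode(sampled.squeeze(), skip_special_tokens=True))` keeps only the first answer.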

The output is garbled:

```
Where is the Alberta Basin: 。 question : 。 question : ? answer : : 。 question : 。 question : 。 question : : 。 question : 。 question : : 。 question : 。 question : : 。 question : 。 question : : 。 question :stion : 。 question : 。 question :ion : : : 。 question : 。 question : : 。 question : : 的 ? answer : 。 question : : 。 question : 。 question : 。 question : 。 question : 。 question : : : 。 question : 。 question :stion : 。 question : 。 question : 。
```
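The generation call passes `filter_thres = 0.1` and `temperature = 0.1`. Assuming `generate` filters logits with a top-k rule where k = max(int((1 - filter_thres) * vocab_size), 1), which is how the `top_k` helper in lucidrains' repositories is written (verify against the installed code), a threshold of 0.1 keeps roughly 90% of the vocabulary while 0.9 keeps only the top 10%. The pure-Python sketch below reproduces that rule plus temperature scaling; `top_k_filter`, `softmax` and `sample_next` are illustrative names, not the library's API.

```python
import math
import random


def top_k_filter(logits, thres=0.9):
    """Keep the k largest logits, k = max(int((1 - thres) * len(logits)), 1); mask the rest with -inf."""
    k = max(int((1 - thres) * len(logits)), 1)
    keep = set(sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k])
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0, so masked tokens get zero probability
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, filter_thres=0.9, temperature=1.0, rng=random):
    """Top-k filter, divide by temperature, then sample one token id."""
    filtered = top_k_filter(logits, filter_thres)
    probs = softmax([x / temperature for x in filtered])
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With a temperature as low as 0.1 the distribution is already sharply peaked, so the filter threshold is one setting worth checking rather than a guaranteed fix for the repetition.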
Thank you!
