Issue while executing qg_utils.py: ValueError: invalid literal for int() with base 10: 'where' #7

Open
mriganktiwari opened this issue Jan 18, 2021 · 6 comments

Comments

@mriganktiwari

While executing qg_utils.py, line 132 of https://github.com/W4ngatang/qags/blob/master/qg_utils.py raises the following error:

ValueError: invalid literal for int() with base 10: 'where'

The tokens in the variable tok_str are plain strings (words) rather than integer IDs, so the int() conversion fails.

Is this not the expected type for the elements of tok_str?
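
For readers who want to see the failure in isolation, here is a minimal sketch that reproduces the same ValueError outside the repository. The sample string and the helper name to_token_ids are assumptions made for illustration; they are not copied from qg_utils.py.

```python
# Hypothetical stand-in for the contents of tok_str: a plain-text question,
# not a sequence of integer token IDs. Not taken from qg_utils.py.
tok_str = "where was the meeting held ?"


def to_token_ids(text):
    """Convert a whitespace-separated string of token IDs into a list of ints."""
    return [int(tok) for tok in text.split()]


message = None
try:
    to_token_ids(tok_str)
except ValueError as err:
    # int() fails on the first non-numeric token, here the word 'where'.
    message = str(err)
    print(message)
```

The conversion works for input such as "10 20 30" and only fails when the string already holds decoded words, which matches the error reported above.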

@sonsus

sonsus commented Mar 22, 2021

I'm facing the same issue. I believe this is a leftover from the author's original pipeline (the GPT tokenizer decoding follows right after), considering that the log file we pass in contains the questions as plain text and that there are lines replacing <s> and <mask>.

@W4ngatang correct me if I'm wrong.

@gaozhiguang

gaozhiguang commented Mar 30, 2021

I'm facing the same issue. I believe this is some kind of legacy from the author (including GPT tokenizer decoding follows after) considering the log file we put contains plain texts as questions, and the fact that there is replacing lines for <s> and <mask>.

@W4ngatang correct me if I'm wrong.

Have you solved the problem? @mriganktiwari @sonsus

@bigabig

bigabig commented Apr 1, 2021

Hey, I just encountered the same problem. Is there a solution?

@g-vallejo

Hi everyone,

I'm the next one with the same issue. Has anyone been able to solve it?

My workaround was to write the raw string to the gen_fh file instead of decoding it. Any comments on that?
Best,
Gisela

@mriganktiwari
Author

I never found a solution, and I moved away from this a long time ago. If someone finds one, please post it here.

@Zhou-Zoey

My solution is to delete the tokenize step in qg_utils.py (lines 135-136), because I assume that the questions in the log file are already what we need.

Wondering whether I am correct. :)
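
The two workarounds in this thread (writing the raw string, or deleting the tokenize step) both amount to skipping the decode when the log already holds plain text. A hedged sketch of that idea follows: it decodes only when every token parses as an integer and otherwise passes the text through. The names decode_or_passthrough, decode_fn, and toy_decode are hypothetical, and nothing here is confirmed as the author's intended behavior.

```python
def decode_or_passthrough(tok_str, decode_fn):
    """Decode tok_str if it is a sequence of integer token IDs; otherwise return it as is.

    decode_fn is a hypothetical callable that maps a list of ints to text.
    It stands in for whatever tokenizer decoding the script performs.
    """
    pieces = tok_str.split()
    try:
        token_ids = [int(piece) for piece in pieces]
    except ValueError:
        # At least one token is not numeric, so assume the text is already decoded.
        return tok_str.strip()
    return decode_fn(token_ids)


# Toy decoder used only to exercise the helper; a real run would use the
# tokenizer that produced the IDs.
toy_vocab = {0: "where", 1: "is", 2: "it"}


def toy_decode(token_ids):
    return " ".join(toy_vocab[i] for i in token_ids)


decoded = decode_or_passthrough("0 1 2", toy_decode)
passed_through = decode_or_passthrough("where is it ?", toy_decode)
```

One caveat with this design: a question made up entirely of digits would be treated as token IDs, so the check is a heuristic rather than a guarantee.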
