Fixes to to_tf_dataset
#3085
Conversation
Hi! Can you give some details about why you need these changes?
Hey, sorry, I should have explained! I've been getting a lot of […]
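For context, the code path under discussion is typically reached with a call like the sketch below; the dataset, checkpoint, and column choices are illustrative assumptions, not taken from this PR:

```python
# Illustrative sketch only: the dataset, checkpoint and column names are
# assumptions for this example, not taken from the PR itself.
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
collator = DataCollatorWithPadding(tokenizer, return_tensors="tf")

dataset = load_dataset("glue", "sst2", split="train")
dataset = dataset.map(lambda batch: tokenizer(batch["sentence"], truncation=True), batched=True)

tf_dataset = dataset.to_tf_dataset(
    columns=["input_ids", "attention_mask"],  # requested columns are checked against dataset.features
    label_cols=["labels"],                    # "labels" is created by the collator from the raw "label" column
    batch_size=8,
    shuffle=True,
    collate_fn=collator,
)
```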
Ok I see :)
 for col in cols_to_retain:
-    if col not in self.features:
+    if col not in self.features and not (col in ("attention_mask", "labels") and collate_fn is not None):
         raise ValueError(f"Couldn't find column {col} in dataset.")
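For what it's worth, the motivation for the exemption seems to be that some collators create columns that never existed in `dataset.features`. A minimal sketch, assuming a `transformers` masked-LM collator (the example inputs are made up):

```python
# Assumed example, not from this PR: collators can emit columns such as
# "attention_mask" and "labels" that are absent from dataset.features,
# which is what the special-cased names in the check above account for.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, return_tensors="tf")

raw_rows = [
    {"input_ids": tokenizer("hello world")["input_ids"]},
    {"input_ids": tokenizer("a slightly longer sentence")["input_ids"]},
]
batch = collator(raw_rows)
# The batch includes "attention_mask" and "labels" even though the raw rows
# only contained "input_ids".
print(list(batch.keys()))
```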
Why hardcode some column names here? It feels hacky.
Changing the collate_fn function could break this, no?
It's very hacky, yeah. I need to change this to make it work properly, but I was under time pressure to get notebooks and everything ready in time to record videos for the course.
I think a better solution would be to take a `remove_columns` list instead of `columns`, and then I wouldn't have to worry so much about new columns being added by the data collator - I assume that all of those are being kept. WDYT?
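To make the suggestion concrete, here is a rough sketch of the two call styles, reusing the `dataset` and `collator` names from the earlier sketch; note that `remove_columns` is a hypothetical argument illustrating the proposal, not an existing `to_tf_dataset()` parameter:

```python
# Hypothetical comparison of the two APIs discussed above. `remove_columns` is
# NOT a real to_tf_dataset() argument here; it only illustrates the proposal.

# Current allow-list style: collator-created columns ("attention_mask", "labels")
# have to be special-cased because they are not in dataset.features.
tf_dataset = dataset.to_tf_dataset(
    columns=["input_ids", "attention_mask"],
    label_cols=["labels"],
    batch_size=8,
    shuffle=True,
    collate_fn=collator,
)

# Proposed deny-list style: drop only the raw columns you don't want, and keep
# whatever the collator produces without hardcoding any names.
tf_dataset = dataset.to_tf_dataset(
    remove_columns=["sentence", "idx"],  # assumed raw columns, for illustration
    batch_size=8,
    shuffle=True,
    collate_fn=collator,
)
```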
Thanks for adding these comments :)
Let's have this for now!