-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New schema: Add chat
schema
#679
Conversation
seacrowd/utils/schemas/chat.py
Outdated
{ | ||
"id": datasets.Value("string"), | ||
"input": datasets.Sequence({ | ||
"role": datasets.ClassLabel(names=["system", "user", "assistant"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the role should be implemented as string
instead of ClassLabel
due to 2 reasons:
- There might be some cases where the role can be multiple for a single-sequence dialogues
- The actual data (when the examples are generated) will result in an integer instead of string
seacrowd/utils/schemas/chat.py
Outdated
# 2. defining meta as dict of key with intended colname meta and its val with dataset.Features class | ||
# in `_info` Dataloader method then populate it with the values in `_general_examples` Dataloader method | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add whitespace at EoF
seacrowd/utils/schemas/__init__.py
Outdated
@@ -42,4 +44,4 @@ | |||
"text2text_features", | |||
"video_features", | |||
"tod_features", | |||
] | |||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add whitespace at EoF
note to Holy & Sam: btw we might want to revisit the ToD (Task-Oriented Dialogue) & DS (Dialogue System) tasks later on whether we should use this schema too (since HF tokenizers already support the format of |
Hi @sabilmakbar, sorry for the late response. Replied on #635. |
Hi @patrickamadeus and @sabilmakbar, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) in 31 hours, so it'd be great if we could wrap up the reviewing and merge this PR before then. |
add whitespace at EoF
change the `role` field schema from ClassLabel to string
did the changes bcs the initial assignee had no response |
Adding new
chat
schema to support #635