Allow non-default dataset configurations #277

Merged 4 commits on Jul 16, 2023

Conversation

cg123
Contributor

@cg123 cg123 commented Jul 14, 2023

Just a tiny change to make it possible to use HuggingFace datasets with multiple configurations. For example, to train on the 'enron_emails' configuration of The Pile, you could use this configuration:

datasets:
  - path: EleutherAI/pile
    name: enron_emails
    type: completion

@winglian
Collaborator

🔥 This has been on my todo list, thank you! Does this break functionality if the dataset doesn't have a named configuration?

@cg123
Contributor Author

cg123 commented Jul 15, 2023

Nope! This should still work exactly the same as before if the name field is not present.

The default value of the name argument is None, so load_dataset should get the exact same arguments for existing configs. I've tested a partial run of the openllama 3b config to be sure.
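
To make the backward-compatibility argument concrete, here is a minimal sketch of the pass-through described above. It is not the PR's actual diff: the helper name load_configured_dataset, the injectable loader parameter, and the placeholder path some/dataset are all assumptions made for illustration. The only library fact relied on is that datasets.load_dataset accepts a name argument that defaults to None.

```python
from typing import Any, Callable, Dict, Optional


def load_configured_dataset(
    ds_cfg: Dict[str, Any],
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical helper: forward an optional `name` key to the loader.

    `ds_cfg` is one entry of the `datasets:` list from the YAML config.
    """
    if loader is None:
        # Imported lazily so the sketch can be exercised without the
        # Hugging Face `datasets` package or network access.
        from datasets import load_dataset

        loader = load_dataset
    # dict.get returns None when the key is absent, which matches the
    # default of `name`, so configs without the field behave as before.
    return loader(ds_cfg["path"], name=ds_cfg.get("name"))


def recording_loader(path: str, name: Optional[str] = None) -> Dict[str, Any]:
    """Stand-in loader that records the arguments it receives."""
    return {"path": path, "name": name}


# Entry with a named configuration, as in the example from this PR.
with_name = load_configured_dataset(
    {"path": "EleutherAI/pile", "name": "enron_emails", "type": "completion"},
    loader=recording_loader,
)

# Entry without `name` ("some/dataset" is a placeholder path).
without_name = load_configured_dataset(
    {"path": "some/dataset", "type": "completion"},
    loader=recording_loader,
)
```

With the stand-in loader, the first call receives name="enron_emails" and the second receives name=None, the same value the loader would have used had the argument been omitted.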

@NanoCode012
Collaborator

Hey, nice work! Could you also please add this somewhere in the Readme for visibility?

Collaborator

Thank you for this example. Could you also add this name: parameter within the "all yaml options" below?

@winglian winglian merged commit 334af62 into axolotl-ai-cloud:main Jul 16, 2023
@winglian winglian added the enhancement New feature or request label Jul 22, 2023
mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request Dec 15, 2023
Allow non-default dataset configurations

3 participants