Granite code support #1336
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1336
Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures

As of commit 2041515 with merge base b809b69, the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Also, I used the following script, `convert_existing_checkpoint.py`, to perform conversion of a pre-existing HF snapshot:

```python
#!/usr/bin/env python
"""
Simple script to convert an existing HF snapshot into torchchat format
"""

# Standard
import argparse
from pathlib import Path

# Local
from torchchat.cli.convert_hf_checkpoint import (
    convert_hf_checkpoint,
    convert_hf_checkpoint_to_tune,
)


def main():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("checkpoint_dir", help="Directory containing HF checkpoint")
    parser.add_argument("--name", "-n", default=None, help="Name to use for the model")
    parser.add_argument(
        "--torchtune", "-t",
        action="store_true",
        default=False,
        help="Convert to torchtune format",
    )
    args = parser.parse_args()

    if args.torchtune:
        convert_hf_checkpoint_to_tune(
            model_dir=Path(args.checkpoint_dir), model_name=args.name
        )
    else:
        convert_hf_checkpoint(model_dir=Path(args.checkpoint_dir), model_name=args.name)


if __name__ == "__main__":
    main()
```
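For reference, the same conversion can also be invoked directly from Python rather than through the script. This is just an illustrative sketch; the snapshot directory and model name below are placeholders, not values from this PR:

```python
# Illustrative only: calling the torchchat conversion helper directly.
# The directory and name are placeholders, not paths/aliases from this PR.
from pathlib import Path

from torchchat.cli.convert_hf_checkpoint import convert_hf_checkpoint

convert_hf_checkpoint(
    model_dir=Path("/path/to/hf/snapshot"),  # placeholder HF snapshot directory
    model_name="granite-code-3b",            # placeholder model name
)
```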
I confirmed that it was falling back to the …
A pointer to this PR and the example commands from the PR description would make a good starting point for docs/new_model.md to (at least partially?) address #1038 / #1041, in conjunction with some explanatory text:

* Explain how to add to the model list....
* if added to …
@Jack-Khuu I'm a bit stumped trying to get the 8B model working. I'm trying to mentally diff the Attention implementation in torchchat vs transformers to see if I can find anything that would indicate something behaving differently with Grouped Query Attention. I'm not really following the different way that the …

Results with 3B: …

NOTE (because I feel compelled): The above snippet uses zero-width-spaces to escape the triple backticks inside the code blocks, so copy-paste at your own peril!

Results with 8B: …
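For reference, the grouped-query attention detail being compared here is sketched below. This is the general technique only (key/value heads are shared across groups of query heads and expanded before the attention product), not torchchat's or transformers' exact implementation:

```python
# General GQA sketch, not torchchat's or transformers' actual code: each of the
# n_kv_heads key/value heads is repeated n_rep times so its shape matches the
# query heads before scaled-dot-product attention.
import torch


def expand_kv_heads(kv: torch.Tensor, n_rep: int) -> torch.Tensor:
    """(batch, n_kv_heads, seq, head_dim) -> (batch, n_kv_heads * n_rep, seq, head_dim)."""
    if n_rep == 1:
        return kv
    bsz, n_kv_heads, seq_len, head_dim = kv.shape
    return (
        kv[:, :, None, :, :]
        .expand(bsz, n_kv_heads, n_rep, seq_len, head_dim)
        .reshape(bsz, n_kv_heads * n_rep, seq_len, head_dim)
    )
```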
Thanks for the details @gabe-l-hart, I'll try to give it a gander this weekend. It's weird that 3B works, but 8B doesn't. I assume they use the same template, so that at least clears that part.
Looks like this has been open for several weeks now. It's been on @varunfb and @Jack-Khuu's plate for a while, but they've been swamped with other work.
Thanks @byjlw! I definitely understand juggling priorities. The path to adding new models in the …

For the specific issues for Granite Code, the place I'm a bit stuck is trying to figure out why the …
Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* Use the right tokenizer_file name
* Use the right transformer_params_key based on the file name in model_params
* Use the updated name to indicate HF tokenizers

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Something isn't quite working with this model yet, but the config should be accurate at this point. Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
It was implicitly being pulled in via lm_eval -> transformers, but it's better to have it explicit since we use it directly Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
…HF tokenizers

This is a much simplified version of the corresponding logic in transformers. I opted for this so that the full transformers dependency is not added here. CITE: https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L1522 Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
This will allow the jinja2 templates for HF tokenizers to be applied without needing to hard-code the formatter logic. This will likely need to be duplicated in the embedded code version of chat. Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
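For context, a stripped-down version of that jinja2 rendering looks roughly like the sketch below. It illustrates the approach rather than the code on this branch; transformers wraps the same idea in `apply_chat_template` with a sandboxed environment and extra validation:

```python
# Rough sketch of rendering an HF-style chat template with jinja2. The real
# implementations (in transformers and on this branch) add sandboxing and more
# template variables; this only shows the core idea.
from jinja2 import Environment


def render_chat_template(template_str, messages, add_generation_prompt=True):
    template = Environment().from_string(template_str)
    return template.render(messages=messages, add_generation_prompt=add_generation_prompt)
```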
It was getting pulled in implicitly via flask and lm_eval -> transformers, but better to have it explicit. Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
I just rebased on …
Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
In generate, there were a number of commented-out log lines. These are safe to leave in as long as lazy string interpolation is used. Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Have you tried bisecting the 3B fail? Even if the change was legit and necessary, the type of change that would break the 3B model might give insight into how to "fix" both the 3B and 8B models?
I'm a bit surprised by this because ChatGPT had this to say (understanding that I'm quoting ChatGPT about an IBM model to an IBMer, so skating on seriously thin ice!!!):

> what tokenization scheme does the ibm granite model use

So in theory, SentencePiece should do the trick? Is it the pre and post processing with regexps? (I think I saw some discussion about regexps in one of your PRs or issues?) In any event, it's cool that we have HF tokenizers because they are a proper superset of SentencePiece+TikToken. (I think @lessw2020 and @kwen2501 had also added some HF tokenizer support for distributed if I remember correctly?)
That's on my todo list for my next chunk of uninterrupted dev time! I'm hoping that will be today.
Heh, as you know I'm sure, IBM is a big place, so I'm definitely doing a lot of learning myself in this space. My info from the models team is that we've been using the …
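(For reference, loading one of these HF tokenizers with the `tokenizers` library looks roughly like the sketch below; the tokenizer.json path is a placeholder, not a file shipped with this PR.)

```python
# Minimal sketch of using the Hugging Face `tokenizers` library directly;
# the path is a placeholder.
from tokenizers import Tokenizer

tok = Tokenizer.from_file("/path/to/tokenizer.json")
encoding = tok.encode("def hello_world():")
print(encoding.ids)
print(tok.decode(encoding.ids))
```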
And disable it for Granite Code models Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
… in classes

The formatted strings may not be perfectly 1:1 with the previous impl, but they should be in line with the official model guidelines:

* https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3
* https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-2

Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
There's no formal execution framework for pytest yet, but these were helpful in ensuring that the formatting was working correctly! To run them, install pytest and run `pytest tests/` Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
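As an illustration only (not the actual tests added on this branch), a pytest-style chat-formatting check has roughly this shape, here using a placeholder jinja2 template rather than a real Granite template:

```python
# Illustrative pytest sketch: a self-contained check of chat-template rendering
# with a placeholder template, not one of the tests under tests/ in this branch.
from jinja2 import Environment

PLACEHOLDER_TEMPLATE = "{% for m in messages %}<|{{ m.role }}|>\n{{ m.content }}\n{% endfor %}"


def test_template_renders_each_message():
    rendered = Environment().from_string(PLACEHOLDER_TEMPLATE).render(
        messages=[{"role": "user", "content": "hello"}]
    )
    assert "<|user|>" in rendered
    assert "hello" in rendered
```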
@Jack-Khuu @mikekgfb @byjlw I figured out where the issues were coming from. It was two things:
To get to the bottom of all of this, I also tweaked the logging a bit. There was already a hard-coded logging config call in …

NOTE: Many of the existing log lines were using f-strings which will cause the string to be interpolated regardless of whether the logger/level are enabled. I switched all of these to use lazy interpolation with percent-encoding so that it's safe to have them uncommented without a performance hit.

Finally, I was getting lost trying to make sure I didn't break anything in the chat templating, so I bit the bullet and added some basic unit tests. They only cover the chat formatting, but they're a place to start. I did not go any further with unit testing, including not adding …
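To make the f-string vs. lazy interpolation point concrete, the pattern is shown below (a generic example, not a literal line from the diff):

```python
# Generic example of the logging change described above: the %-style form
# defers string formatting until the logger actually emits the record, so
# disabled debug lines cost almost nothing.
import logging

logger = logging.getLogger(__name__)
tokens = [1, 2, 3]

logger.debug(f"decoded tokens: {tokens}")   # f-string: formatted even when DEBUG is off
logger.debug("decoded tokens: %s", tokens)  # lazy: formatted only if DEBUG is enabled
```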
```
@@ -9,6 +9,10 @@ gguf
# Tiktoken tokenizer for Llama 3 and other advanced models
tiktoken

# Tokenizers and jinja2 for other non-llama models that use HF tokenizers
```
I added these here, but did not add `pytest` (yet). I think there's a pending conversation about introducing optional dependency sets, so it would make sense to add a `test` or `dev` set at that point, but I didn't want to accidentally carry `pytest` along as a runtime dependency.
```python
import os
import sys

# Make sure tests can import torchchat
```
This would be a lot cleaner if we move to having a `pyproject.toml` or `setup.py` to bundle `torchchat` as a package that could be installed with `pip install -e .`
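For reference, the kind of path shim under discussion typically looks like the sketch below (assuming the tests directory sits one level below the repo root, which may not match this PR's layout exactly); packaging torchchat would make it unnecessary:

```python
# Typical shape of a test-time path shim (layout assumed, not necessarily the
# exact lines in this PR): prepend the repo root so `import torchchat` resolves
# without an installed package.
import os
import sys

REPO_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, REPO_ROOT)
```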
```python
    return tokens


B_INST, E_INST = "[INST]", "[/INST]"
```
I moved these into the class as members to enforce the encapsulation that they should only be used in the context of this formatter
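A rough sketch of the encapsulation described, with class and method names that are assumptions rather than necessarily those in the PR:

```python
# Sketch only: the Llama 2 instruction/system markers scoped to the formatter
# class rather than left as module-level constants. Names are illustrative.
class Llama2ChatFormatter:
    B_INST, E_INST = "[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

    def format_user_turn(self, content: str) -> str:
        return f"{self.B_INST} {content.strip()} {self.E_INST}"
```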
Thanks @gabe-l-hart! Will be able to share the details soon and will have them as RFCs on GH so everyone can comment and contribute.
Dependencies
This PR is part of a sequence in support of adding Granite Code. It depends on merging the following PRs:
Issues
Closes #1262
Description
This PR adds support for Granite Code in 3B and 8B sizes. Given current limitations with the export of tokenizers, they will only work in the Python environment with this PR.
Discussion
Usage
To test these models, I ran both with the aliases and by pointing directly at the checkpoint/tokenizer:
Open Questions
There are several outstanding issues, beyond the upstream tokenizers PR, that need to be solved before this PR is ready for full review:
In `chat` mode, the models produce very unreliable results, sometimes generating a single token while other times generating a reasonable result but stopping mid-sentence before reaching the max token limit. My current hypothesis is that the chat template is not being used anywhere and we're therefore using the `llama` chat template automatically.