
Script to figure out model sequence length limits #1579

Merged
merged 7 commits into from
Jun 28, 2023

Conversation

JosselinSomervilleRoberts
Contributor

This script is used to find the limits required by the window service.

@percyliang
Contributor

Can we make this a scenario that we run officially (kind of like a unit test) rather than a one-off script?

@JosselinSomervilleRoberts
Contributor Author

Sure, I could look into that. For now, since @yifanmai needs this (I think) to fix the canary runs, should we keep it as a script? I could then make it into a scenario in a different PR.

@yifanmai
Collaborator

This can't be a scenario because scenarios cannot access the tokenizer. synthetic_efficiency has the same issue; we have to manually generate a static file for each tokenizer.

@yifanmai yifanmai changed the title Script to figure out limits Script to figure out model sequence length limits May 23, 2023
Collaborator

@yifanmai yifanmai left a comment


@teetone could you also take a look at this?


# model_name, tokenizer_name, prefix and suffix are passed as arguments
parser = argparse.ArgumentParser()
parser.add_argument("--model_name", type=str, default="writer/palmyra-base")
Collaborator


Optional suggestion: helm-run and helm-summarize use hyphens for flags; you could consider doing that here for consistency. Note that argparse will automatically convert hyphenated flag names to underscores, e.g. args.model_name.
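A minimal sketch of the argparse behavior described above (the flag name and default mirror the script's; the no-argument parse call is just for illustration):

```python
import argparse

# argparse converts hyphens in a long option name to underscores
# when deriving the attribute name on the parsed namespace.
parser = argparse.ArgumentParser()
parser.add_argument("--model-name", type=str, default="writer/palmyra-base")

args = parser.parse_args([])  # parse no CLI args, so the default applies
print(args.model_name)        # the hyphenated flag is exposed as model_name
```

So switching the flag to `--model-name` would not require any changes to code that reads `args.model_name`.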

Contributor Author


Not sure I understand this; I will just skip it for now.

return lower_bound + max_prompt_length


def check_limits(
Collaborator


Is this just for double-checking? I.e., if the measurement was performed correctly, should this method always return that the limits are correct?

Contributor Author


Normally it should. It was just for me to check that my implementation was correct and that I did not have an off-by-one problem in my binary search.
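The binary search and the double-check discussed above can be sketched as follows. This is not the PR's actual implementation; `can_fit` is a hypothetical predicate standing in for a real tokenize-and-send-request probe against the model:

```python
def find_max_prompt_length(can_fit, lower_bound: int, upper_bound: int) -> int:
    """Return the largest n in [lower_bound, upper_bound] with can_fit(n)."""
    assert can_fit(lower_bound), "lower_bound itself must be accepted"
    lo, hi = lower_bound, upper_bound
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if can_fit(mid):
            lo = mid       # mid is accepted; the answer is at least mid
        else:
            hi = mid - 1   # mid is rejected; the answer is below mid
    return lo


def check_limits(can_fit, max_prompt_length: int) -> bool:
    """Double-check the result: the limit fits, and limit + 1 does not."""
    return can_fit(max_prompt_length) and not can_fit(max_prompt_length + 1)
```

For example, with a true limit of 2048 tokens, `find_max_prompt_length(lambda n: n <= 2048, 1, 10000)` returns 2048, and `check_limits` then confirms that 2048 fits while 2049 does not, which is exactly the kind of off-by-one verification described above.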


@yifanmai yifanmai requested a review from teetone May 23, 2023 20:30
@yifanmai yifanmai merged commit 259a355 into main Jun 28, 2023
@yifanmai yifanmai deleted the josselin-script-limits branch June 28, 2023 18:18
3 participants