fix: Prevent going past token limit in OpenAI calls in PromptNode #4179

sjrl · 2023-02-16T13:25:23Z

Related Issues

fixes feat: PromptModel should ensure user's prompts don't overflow the token limit #4059

Proposed Changes:

Refactor code used to interact with OpenAI API. Aggregating code into one file and removing duplicate code.
retry_with_exponential_backoff has been moved to one location. Wraps the new util function openai_request which makes the request to the OpenAI API and handles the raising of appropriate errors and retries. Tested locally to make sure the retry mechanism still works.
Updated to use MODEL_TO_ENCODING added in tiktoken==0.2.0 so we can automatically look up the correct tokenizer for the requested model.
Prevent token overflow when calling OpenAI API in PromptNode. I accomplished this by truncating the prompt to fit within the max_token_limit. I considered using the solution in the OpenAIAnswerGenerator since it removes documents from the context until it fits within the max token length, which I think is a better solution for Retrieval Augmented QA. However, for the PromptNode it is not easily possible to determine the use case at model invocation time so I opted to truncate the end of the prompt instead and log a warning to the user.

How did you test it?

Add a new test to check truncation
Refactorings are tested using existing tests

Notes for the reviewer

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added tests that demonstrate the correct behavior of the change
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
I documented my code
I ran pre-commit hooks and fixed any issue

…ion.

… variable.

… PromptNode and OpenAIAnswerGenerator

…automatically determine correct tokenizer for the requested model

…en_limit

TuanaCelik · 2023-02-17T11:06:25Z

Hey @sjrl - I just tried this branch out with @julian-risch and by calling the following:

result = prompter.prompt(prompt_template=template, tweets=twitter_stream)

We get the following response from OpenAI:

Status code: 400
Response body: {
  "error": {
    "message": "This model's maximum context length is 4097 tokens, however you requested 4100 tokens (4000 in your prompt; 100 for the completion). Please reduce your prompt; or completion length.",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

sjrl · 2023-02-17T11:59:53Z

Hey @TuanaCelik thanks for checking this and catching this bug! I didn't take into account the answer length into the token limit. I'll do that now.

…ncation amount. Moved truncation higher up to PromptNode level.

…en_limit

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

…en_limit

mathislucka · 2023-02-28T08:08:43Z

@sjrl just a question, wouldn't we need the actual vocabulary of GPT-3.5 to check the number of tokens reliably. GPT-2 might tokenize differently, am I wrong? Or is this more of an approximation and we are fine with some divergence?

sjrl · 2023-02-28T08:42:41Z

@sjrl just a question, wouldn't we need the actual vocabulary of GPT-3.5 to check the number of tokens reliably. GPT-2 might tokenize differently, am I wrong? Or is this more of an approximation and we are fine with some divergence?

That is a great point @mathislucka, which is why we also support for the tiktoken library which is the official OpenAI repo for tokenizing GPT-3.5 models. However, we have the GPT-2 tokenizer as a fallback because the tiktoken library does not have wheels built for ARM64 Linux (issue here) so we aren't able to provide it in the Haystack Docker Image.

sjrl · 2023-02-28T08:43:43Z

haystack/utils/openai_utils.py

+    if "davinci" in model_name:
+        max_tokens_limit = 4000
+        if USE_TIKTOKEN:
+            tokenizer_name = MODEL_TO_ENCODING.get(model_name, "p50k_base")


@mathislucka This is where we automatically load the correct GPT-3.5 tokenizer if the tiktoken library is available.

julian-risch

Almost ready to go. 👍 The test case for the token limit is failing because the log message is different than the one we check for in the test. This needs to be changed. There are two lines with tt = PromptTemplate that you could refactor to prompt_template = PromptTemplate to increase readability. Other than that the changes look good to me.

test/nodes/test_prompt_node.py

julian-risch

Looks good to me! 🚀

…en_limit

zoltan-fedor · 2023-03-12T00:04:55Z

@sjrl, @julian-risch ,

I believe this PR has broken the use of flan-t5 models, which do not have a token limit (well, there is one in the tokenizer, see https://huggingface.co/google/flan-t5-xl/blob/main/tokenizer_config.json#L106, but that is NOT limiting what that model can actually handle, so for that type of model automatically using the limit from the tokenizer is wrong).

This is because the T5 models use a relative attention mechanism and so can handle any sequence with the only constraint being the memory of the GPU. (see google-research/text-to-text-transfer-transformer#273)

Now automatically the 512 token limit from the tokenizer is applied to the prompt supplied to the flan-t5 models.

Shouldn't there be a way to overwrite this new token limit coming from the tokenizer for scenarios like this?

sjrl added 3 commits February 16, 2023 10:20

Refactoring to remove duplicate code when using OpenAI API

05780e1

Adding docstrings

7b425c4

Fix mypy issue

f4a8e58

github-actions bot added topic:LLM topic:retriever type:documentation Improvements on the docs labels Feb 16, 2023

sjrl marked this pull request as ready for review February 16, 2023 13:26

sjrl requested a review from a team as a code owner February 16, 2023 13:26

sjrl requested review from julian-risch and removed request for a team February 16, 2023 13:26

Moved retry mechanism to openai_request function in openai_utils

5e851ab

github-actions bot added the topic:tests label Feb 16, 2023

sjrl added 8 commits February 16, 2023 14:55

Migrate OpenAI embedding encoder to use the openai_request util funct…

6b8dbb0

…ion.

Adding docstrings.

c274885

pylint import errors

4479f41

More pylint import errors

34a7943

Move construction of headers into openai_request and api_key as input…

31f707a

… variable.

Made _openai_text_completion_tokenization_details so can be resued in…

feb1a24

… PromptNode and OpenAIAnswerGenerator

Add prompt truncation to the PromptNode.

952214e

Removed commented out test.

68e18fe

github-actions bot removed the topic:tests label Feb 16, 2023

Bump version of tiktoken to 0.2.0 so we can use MODEL_TO_ENCODING to …

7453b9a

…automatically determine correct tokenizer for the requested model

github-actions bot added topic:build/distribution topic:dependencies labels Feb 16, 2023

sjrl added 2 commits February 17, 2023 08:55

Merge branch 'main' of github.com:deepset-ai/haystack into openai/tok…

8e017b4

…en_limit

Change one method back to public

88444a0

sjrl added 2 commits February 17, 2023 13:16

Fixed bug in token length truncation. Included answer length into tru…

2793ffb

…ncation amount. Moved truncation higher up to PromptNode level.

Merge branch 'main' of github.com:deepset-ai/haystack into openai/tok…

017acb9

…en_limit

sjrl and others added 4 commits February 23, 2023 13:25

Update haystack/utils/openai_utils.py

922d141

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

Updated docstrings, and added integration marks

1dd9485

Merge branch 'main' of github.com:deepset-ai/haystack into openai/tok…

2261bf9

…en_limit

Remove comment

bf37ff2

sjrl requested a review from agnieszka-m February 27, 2023 16:12

agnieszka-m approved these changes Feb 28, 2023

View reviewed changes

sjrl commented Feb 28, 2023

View reviewed changes

sjrl requested a review from julian-risch February 28, 2023 08:44

julian-risch requested changes Feb 28, 2023

View reviewed changes

test/nodes/test_prompt_node.py Outdated Show resolved Hide resolved

Update test

ff51d35

This was referenced Feb 28, 2023

Consistent naming for counting tokens #4297

Closed

In OpenAIAnswerGenerator avoid tokenizing all documents #4298

Closed

sjrl added 2 commits March 1, 2023 09:24

Fix test

aa80052

Update test

8134d76

sjrl requested a review from julian-risch March 1, 2023 10:25

sjrl mentioned this pull request Mar 1, 2023

feat: Add Azure as OpenAI endpoint #4170

Merged

6 tasks

julian-risch approved these changes Mar 1, 2023

View reviewed changes

sjrl and others added 5 commits March 2, 2023 09:49

Merge branch 'main' of github.com:deepset-ai/haystack into openai/tok…

e47de9f

…en_limit

Merge branch 'main' of github.com:deepset-ai/haystack into openai/tok…

1185ce0

…en_limit

Updated openai_request function to work with the azure api

ae5d933

Fixed error in _openai_encodery.py

5a50ce5

Merge branch 'main' into openai/token_limit

3c8db04

vblagoje merged commit 1a42166 into main Mar 3, 2023

vblagoje deleted the openai/token_limit branch March 3, 2023 12:49

TuanaCelik mentioned this pull request Mar 10, 2023

Prompt Truncation Bug for models with smaller max_seq_length - truncation of all prompt #4379

Closed

1 task

sjrl mentioned this pull request Mar 13, 2023

Revisit token limit for HuggingFace models #4388

Closed

LisaGLH mentioned this pull request Jun 21, 2023

Token Limit in Embedding Calculation doesn't correspond to newest OpenAI limits #5184

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Prevent going past token limit in OpenAI calls in PromptNode #4179

fix: Prevent going past token limit in OpenAI calls in PromptNode #4179

sjrl commented Feb 16, 2023 •

edited

Loading

TuanaCelik commented Feb 17, 2023

sjrl commented Feb 17, 2023

mathislucka commented Feb 28, 2023

sjrl commented Feb 28, 2023

sjrl Feb 28, 2023

julian-risch left a comment

julian-risch left a comment

zoltan-fedor commented Mar 12, 2023 •

edited

Loading

fix: Prevent going past token limit in OpenAI calls in PromptNode #4179

fix: Prevent going past token limit in OpenAI calls in PromptNode #4179

Conversation

sjrl commented Feb 16, 2023 • edited Loading

Related Issues

Proposed Changes:

How did you test it?

Notes for the reviewer

Checklist

TuanaCelik commented Feb 17, 2023

sjrl commented Feb 17, 2023

mathislucka commented Feb 28, 2023

sjrl commented Feb 28, 2023

sjrl Feb 28, 2023

Choose a reason for hiding this comment

julian-risch left a comment

Choose a reason for hiding this comment

julian-risch left a comment

Choose a reason for hiding this comment

zoltan-fedor commented Mar 12, 2023 • edited Loading

sjrl commented Feb 16, 2023 •

edited

Loading

zoltan-fedor commented Mar 12, 2023 •

edited

Loading