Gateway error when processing non-200 model response #722

Open
mkrueger12 opened this issue Nov 1, 2024 · 3 comments
Labels
bug Something isn't working triage

Comments


mkrueger12 commented Nov 1, 2024

What Happened?

Issue Description

The gateway throws an unhandled SyntaxError instead of parsing the error message when accessing Claude models on GCP Vertex AI via the streaming endpoint. The issue occurs whenever the model returns a non-200 response.

Environment

  • Gateway Endpoint: /v1/chat/completions
  • Model: anthropic.claude-3-5-sonnet@20240620
  • Provider: GCP Vertex AI
  • Gateway is locally hosted

Configuration

{
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "provider": "vertex-ai",
      "vertex_project_id": "dev",
      "vertex_service_account_json": "SA",
      "vertex_region": "us-east5",
      "weight": 1
    }
  ]
}

Error Output

Gateway logs:

event: error
2024-11-01 11:43:41 ^
2024-11-01 11:43:41 
2024-11-01 11:43:41 SyntaxError: Unexpected token 'e', "event: err"... is not valid JSON
2024-11-01 11:43:41     at JSON.parse (<anonymous>)
2024-11-01 11:43:41     at Xt (file:///app/build/start-server.js:2:71350)
2024-11-01 11:43:41     at file:///app/build/start-server.js:2:146756
2024-11-01 11:43:41     at async file:///app/build/start-server.js:2:146361

Application logs:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Root Cause Analysis

  • The error occurs when the model returns a non-200 response
  • Verified by running parallel requests directly against the GCP API
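The failure mode in the gateway logs can be reproduced in isolation: the raw SSE frame begins with `event: error`, which is not valid JSON, so a bare `JSON.parse` on it throws. A minimal sketch (the chunk text is taken from the logs below):

```typescript
// The frame the provider sends on overload, per the gateway logs:
const chunk =
  'event: error\n' +
  'data: {"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}}';

// Parsing the raw frame throws, because "event: error" is not JSON.
let threwSyntaxError = false;
try {
  JSON.parse(chunk);
} catch (e) {
  threwSyntaxError = e instanceof SyntaxError;
}
console.log(threwSyntaxError); // the parse fails as in the gateway logs
```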

Additional Notes

  • This issue reproduces with other configurations, including fallback mode
  • The error is not specific to the provided configuration

What Should Have Happened?

The gateway should return an appropriate error response to the application.

Hitting the GCP API directly returns:

529 {"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}}
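Ideally the gateway would surface this upstream error to the client instead of crashing mid-stream, e.g. as a JSON error body along these lines (a sketch only; the exact shape the gateway should emit is an assumption, not its documented behavior):

```json
{
  "status": 529,
  "error": {
    "type": "overloaded_error",
    "message": "Overloaded",
    "provider": "vertex-ai"
  }
}
```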

Relevant Code Snippet

No response

Your Twitter/LinkedIn

https://www.linkedin.com/in/maxkrueger1/

@mkrueger12 mkrueger12 added the bug Something isn't working label Nov 1, 2024
@github-actions github-actions bot added the triage label Nov 1, 2024
mkrueger12 commented Nov 2, 2024

2024-11-02 14:21:05 Your AI Gateway is now running on http://localhost:8787 🚀
2024-11-02 14:21:07 chunk event: error
2024-11-02 14:21:07 data: {"type":"error","error":{"details":null,"type":"overloaded_error","message":"Overloaded"}           }
2024-11-02 14:21:07 undefined:1
2024-11-02 14:21:07 event: error
2024-11-02 14:21:07 ^
2024-11-02 14:21:07 
2024-11-02 14:21:07 SyntaxError: Unexpected token 'e', "event: err"... is not valid JSON
2024-11-02 14:21:07     at JSON.parse (<anonymous>)
2024-11-02 14:21:07     at Xt (file:///app/build/start-server.js:2:71373)
2024-11-02 14:21:07     at file:///app/build/start-server.js:2:146779
2024-11-02 14:21:07     at async file:///app/build/start-server.js:2:146384


mkrueger12 commented Nov 2, 2024

Here is the file where the error is occurring: src/providers/google-vertex-ai/chatComplete.ts

This method does not have error handling:

export const VertexAnthropicChatCompleteStreamChunkTransform: (
  response: string,
  fallbackId: string,
  streamState: Record<string, boolean>
) => string | undefined = (responseChunk, fallbackId, streamState) => {
  let chunk = responseChunk.trim();
  
  if (
    chunk.startsWith('event: ping') ||
    chunk.startsWith('event: content_block_stop') ||
    chunk.startsWith('event: vertex_event')
  ) {
    return;
  }

  if (chunk.startsWith('event: message_stop')) {
    return 'data: [DONE]\n\n';
  }

  chunk = chunk.replace(/^event: content_block_delta[\r\n]*/, '');
  chunk = chunk.replace(/^event: content_block_start[\r\n]*/, '');
  chunk = chunk.replace(/^event: message_delta[\r\n]*/, '');
  chunk = chunk.replace(/^event: message_start[\r\n]*/, '');
  chunk = chunk.replace(/^data: /, '');
  chunk = chunk.trim();

  // BUG: throws a SyntaxError when the chunk is an `event: error` frame
  // (e.g. Anthropic's overloaded_error), since that event prefix is never
  // stripped above and the remaining text is not valid JSON.
  const parsedChunk: AnthropicChatCompleteStreamResponse = JSON.parse(chunk);

  if (
    parsedChunk.type === 'content_block_start' &&
    parsedChunk.content_block?.type === 'text'
  ) {
    streamState.containsChainOfThoughtMessage = true;
    return;
  }

  if (parsedChunk.type === 'message_start' && parsedChunk.message?.usage) {
    return (
      `data: ${JSON.stringify({
        id: fallbackId,
        object: 'chat.completion.chunk',
        created: Math.floor(Date.now() / 1000),
        model: parsedChunk.message?.usage,
        provider: GOOGLE_VERTEX_AI,
        choices: [
          {
            delta: {
              content: '',
            },
            index: 0,
            logprobs: null,
            finish_reason: null,
          },
        ],
        usage: {
          prompt_tokens: parsedChunk.message?.usage?.input_tokens,
        },
      })}` + '\n\n'
    );
  }

  if (parsedChunk.type === 'message_delta' && parsedChunk.usage) {
    return (
      `data: ${JSON.stringify({
        id: fallbackId,
        object: 'chat.completion.chunk',
        created: Math.floor(Date.now() / 1000),
        model: '',
        provider: GOOGLE_VERTEX_AI,
        choices: [
          {
            index: 0,
            delta: {},
            finish_reason: parsedChunk.delta?.stop_reason,
          },
        ],
        usage: {
          completion_tokens: parsedChunk.usage?.output_tokens,
        },
      })}` + '\n\n'
    );
  }

  const toolCalls = [];
  const isToolBlockStart: boolean =
    parsedChunk.type === 'content_block_start' &&
    !!parsedChunk.content_block?.id;
  const isToolBlockDelta: boolean =
    parsedChunk.type === 'content_block_delta' &&
    !!parsedChunk.delta.partial_json;
  const toolIndex: number = streamState.containsChainOfThoughtMessage
    ? parsedChunk.index - 1
    : parsedChunk.index;

  if (isToolBlockStart && parsedChunk.content_block) {
    toolCalls.push({
      index: toolIndex,
      id: parsedChunk.content_block.id,
      type: 'function',
      function: {
        name: parsedChunk.content_block.name,
        arguments: '',
      },
    });
  } else if (isToolBlockDelta) {
    toolCalls.push({
      index: toolIndex,
      function: {
        arguments: parsedChunk.delta.partial_json,
      },
    });
  }

  return (
    `data: ${JSON.stringify({
      id: fallbackId,
      object: 'chat.completion.chunk',
      created: Math.floor(Date.now() / 1000),
      model: '',
      provider: GOOGLE_VERTEX_AI,
      choices: [
        {
          delta: {
            content: parsedChunk.delta?.text,
            tool_calls: toolCalls.length ? toolCalls : undefined,
          },
          index: 0,
          logprobs: null,
          finish_reason: parsedChunk.delta?.stop_reason ?? null,
        },
      ],
    })}` + '\n\n'
  );
};
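One way the transform could guard against this is to recognize `event: error` frames before the unguarded `JSON.parse`. The sketch below is illustrative only, not the maintainers' actual fix; the helper name and return shape are assumptions:

```typescript
// Hypothetical payload shape of an Anthropic SSE error frame, as seen in
// the logs above. Illustrative only; not the gateway's actual types.
type SSEErrorPayload = {
  type: string;
  error?: { type?: string; message?: string };
};

// Returns the parsed error payload if the chunk is an `event: error`
// frame, or undefined for any other (or malformed) chunk. A transform
// could call this first and emit a proper error response to the client
// instead of crashing on JSON.parse.
function parseErrorChunk(responseChunk: string): SSEErrorPayload | undefined {
  const chunk = responseChunk.trim();
  if (!chunk.startsWith('event: error')) return undefined;
  const dataLine = chunk
    .split(/[\r\n]+/)
    .find((line) => line.startsWith('data: '));
  if (!dataLine) return undefined;
  try {
    return JSON.parse(dataLine.slice('data: '.length)) as SSEErrorPayload;
  } catch {
    // Malformed JSON in the error frame: report nothing rather than throw.
    return undefined;
  }
}
```

With a guard like this at the top of `VertexAnthropicChatCompleteStreamChunkTransform`, the gateway could forward an OpenAI-style error chunk and terminate the stream cleanly instead of aborting the connection.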

narengogi (Contributor) commented

> Here is the file where the error is occurring: src/providers/google-vertex-ai/chatComplete.ts
> This method does not have error handling:

Thanks for reporting this @mkrueger12 and thanks for being so detailed in the description!!
I'll fix this.

Usually no provider returns an error inside a stream chunk, so there is no error handling done here. But it's Google; they always have to do something weird with their API standards.
