This repository has been archived by the owner on Feb 5, 2023. It is now read-only.

[New Feature] Add default retry to every /api/ask endpoint to utilize connection pool. #23

Open
ahmetkca opened this issue Dec 28, 2022 · 11 comments

Comments

@ahmetkca

I think we can add a default number of retries to each incoming request to the /api/ask endpoint. Instead of returning Content-Type: 'application/json' we could return Content-Type: 'text/event-stream'. With this change it might be slightly slower than the original, but we would try at least 3 times with different agents from the connection pool.

For example, proposed change:

The /api/ask endpoint's content type would be text/event-stream instead of application/json:

curl "http://localhost:8080/api/ask" -X POST --header 'Authorization: <API_KEY>' -d '{"content": "Hello world"}'

The default minimum number of retries is 3.

In this example the /api/ask endpoint failed with the first two agents and succeeded with the third:

data: retry #1 failed.

data: retry #2 failed.

data: {"message": { ... }, "conversation_id": " ... "}

data: [DONE]

I believe this would utilize the connection pool even better.

@acheong08
Owner

The retry functionality has been added to https://github.com/ChatGPT-Hackers/ChatGPT-API-server/tree/dev

@acheong08
Owner

I'm still testing it out

@acheong08
Owner

There is a bug I don't know how to fix

@0xRaduan
Contributor

0xRaduan commented Dec 29, 2022

IMO, you need to use some sort of blocking mechanism, so you can quickly retrieve websockets that are available at the moment, or just passively wait for a new one to become available.

One way to do that is to use a sort of priority queue, where you grade connections on just this one parameter (available/not available). And make sure that updates on whether a websocket is in use can be written to it in a multi-threaded fashion.

Also, a good feature to have would be tracking the number of requests already made to a particular connection. I am not aware of the exact number of requests per hour that are permitted, but in my testing I have hit limits within an hour. In that case we don't really want to route requests to that websocket.

@acheong08
Owner

I was going to write a queue system but I'm not quite sure how to implement it correctly. Gin is inherently multi-threaded, though, and there is already a blocking mechanism in place for the connection pool.

@acheong08
Owner

Also, a good feature to have would be tracking the number of requests already made to a particular connection. I am not aware of the exact number of requests per hour that are permitted, but in my testing I have hit limits within an hour. In that case we don't really want to route requests to that websocket.

Since it cycles through connections oldest first, each connection should see a similar number of requests. If limits are hit, all existing connections would also be rate limited on subsequent requests.

@icycandy

icycandy commented Jan 4, 2023

In my experience, one conversation_id is bound to one OpenAI account, and one conversation_id can be used multiple times, representing a long multi-round conversation. So different accounts may be used at different frequencies.

Another observation is that when the API server has not received a request for a period of time, the first request returns {"id": "65f76efa-e0cb-47c1-a054-6f6b5fd5888d", "message": "error", "data": "Wrong response code"}, but the immediately following request returns normally. My guess is that the connection becomes invalid after being idle for a long time (does the Firefox tab need a refresh?).

If we track the request rate of each account, then:

  • if one account is rate limited, we can redirect requests to other accounts
  • if one account has had no requests for a period of time (such as 10 minutes), we can send a fake request to keep it alive

@acheong08
Owner

The error handling can be done on the client side: if you get an "error" message, sleep a second or two and then try again. Doing this from the server could clog up the connections and compete with real requests, introducing additional downtime and errors.

@acheong08
Owner

if one account was rate limited, we can redirect request to other accounts

Possible. Will consider it.

@icycandy

icycandy commented Jan 4, 2023

How about the other one: regularly send fake request to idle connection?

@icycandy

icycandy commented Jan 4, 2023

How about the other one: regularly send fake request to idle connection?

I'm not sure whether regularly sending a fake request will help keep the connection alive. Ignore me.


4 participants