This repository has been archived by the owner on Oct 19, 2024. It is now read-only.

[FEATURE] OPT-175B service authentication and new priority queue #852

Merged (2 commits) on Jan 27, 2023

JubilantJerry
Contributor

See issue #700. I made some design choices based on my comment there; please check whether they make sense.

@JubilantJerry
Contributor Author

JubilantJerry commented Jan 11, 2023

Example informal test that I've done:

I set the weight for API key "key1" to the default (10) and the weight for "key2" to 3 times that (30).

I ran the following on the command line:

for i in $(seq 0 20); do (sleep 0.$((100 + ($RANDOM % 900))); curl -d '{"model": "default", "prompt": ["This year,", "This month,", "This day,", "Today, we"], "max_tokens": 16, "api_key": "key2"}' -H 'Content-Type:application/json' localhost:20001/completions -s > /dev/null && echo "AAAAAAA$i" ) & done
for i in $(seq 0 20); do (sleep 0.$((100 + ($RANDOM % 900))); curl -d '{"model": "default", "prompt": ["This year,", "This month,", "This day,", "Today, we"], "max_tokens": 17, "api_key": "key1"}' -H 'Content-Type:application/json' localhost:20001/completions -s > /dev/null && echo "BBBBBBB$i" ) & done
for i in $(seq 0 20); do (sleep 0.$((100 + ($RANDOM % 900))); curl -d '{"model": "default", "prompt": ["This year,", "This month,", "This day,", "Today, we"], "api_key": "key1", "top_k": 10}' -H 'Content-Type:application/json' localhost:20001/logprobs -s > /dev/null && echo "CCCCCCC$i" ) & done
for i in $(seq 0 20); do (sleep 0.$((100 + ($RANDOM % 900))); curl -d '{"model": "default", "prompt": ["This year,", "This month,", "This day,", "Today, we"], "max_tokens": 18}' -H 'Content-Type:application/json' localhost:20001/completions -s > /dev/null && echo "DDDDDDD$i" ) & done
for i in $(seq 0 20); do (sleep 0.$((100 + ($RANDOM % 900))); curl -d '{"model": "default", "prompt": ["This year,", "This month,", "This day,", "Today, we"], "top_k": 10}' -H 'Content-Type:application/json' localhost:20001/logprobs -s > /dev/null && echo "EEEEEEE$i" ) & done

At the same time, I made requests from several tabs of the web interface.

The result is that:

  1. The web interface requests are handled with low latency.
  2. The AAAAAAA requests are served about 3 times as often as the BBBBBBB and CCCCCCC requests.
  3. The DDDDDDD and EEEEEEE requests are served about one tenth as often as the BBBBBBB requests.
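
As a rough illustration of why weights of 10 and 30 would give service ratios like the ones above, here is a minimal sketch of weighted fair scheduling using stride scheduling. It is not taken from this PR's code and may differ from what the PR actually implements. The function name, the `no_key` label, and the weight of 1 for requests without an API key are assumptions (the last one is only inferred from the observed factor of 10), and the sketch assumes every queue always has requests waiting.

```python
import heapq
from collections import Counter
from fractions import Fraction


def simulate_weighted_service(weights, num_served):
    """Serve num_served requests from always-backlogged queues.

    Each queue carries a virtual time that advances by 1/weight every
    time it is served; the queue with the smallest virtual time goes
    next, so a queue with 3x the weight is served 3x as often.
    """
    # Heap entries are (virtual time of the queue's next request, name).
    heap = [(Fraction(1, w), name) for name, w in weights.items()]
    heapq.heapify(heap)
    served = Counter()
    for _ in range(num_served):
        vtime, name = heapq.heappop(heap)
        served[name] += 1
        heapq.heappush(heap, (vtime + Fraction(1, weights[name]), name))
    return served


# Assumed weights: key2 = 30, key1 = 10 (from the test description),
# no_key = 1 (hypothetical, inferred from the observed 10x ratio).
weights = {"key2": 30, "key1": 10, "no_key": 1}
served = simulate_weighted_service(weights, 410)
for name in weights:
    print(name, served[name])
```

Exact fractions are used instead of floats so that ties in virtual time are resolved deterministically rather than by rounding error.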

@merrymercy
Member

Thanks for your contribution. I replied to your comments. I will review this PR later this week.

@merrymercy merrymercy self-assigned this Jan 11, 2023
@JubilantJerry JubilantJerry force-pushed the pr_branch branch 2 times, most recently from 1fb4654 to 6c33a6d Compare January 11, 2023 15:58
…a-projects#700)

This commit also adds some checks to improve the robustness of the server
process against adversarial inputs, since there are plans to open the API
to non-key users.
@merrymercy
Member

merrymercy commented Jan 27, 2023

The code looks well-documented and well-tested. The design is what we wanted in #700, and it carefully considers a lot of potential issues.
Thanks for your contribution! I will run more tests and try to deploy it on our website. I will merge this first, as the code is of very high quality.

Contributor Author

@JubilantJerry JubilantJerry left a comment

That's good to hear. Unlike the rest of Alpa, I haven't used automated test suites here, but hopefully the informal tests give good enough coverage. I realize that one of the comments is wrong, but that can be fixed elsewhere (I'll include it in #870 in case that feature gets added).

api_key_weights = keys["api_key_weights"]

# Scheduling
# Each authentication choice - endpoint pair contains a separate queue,
Contributor Author

Found an error in the comment: there is a separate queue for each authentication choice, not for each pair of authentication choice and endpoint.
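
To make the corrected description concrete, here is a small sketch of queues keyed by authentication choice only. It is an illustration, not the PR's data structure: `queues`, `enqueue`, and the payload fields are hypothetical names.

```python
from collections import defaultdict, deque

# Hypothetical layout: one FIFO queue per authentication choice,
# shared by every endpoint that is called with that choice.
queues = defaultdict(deque)


def enqueue(auth_choice, endpoint, payload):
    """Append a request to the queue of its authentication choice."""
    queues[auth_choice].append((endpoint, payload))


enqueue("key1", "/completions", {"max_tokens": 17})
enqueue("key1", "/logprobs", {"top_k": 10})
enqueue("key2", "/completions", {"max_tokens": 16})

for auth_choice, queue in queues.items():
    print(auth_choice, [endpoint for endpoint, _ in queue])
```

Under this layout, the `/completions` and `/logprobs` requests made with "key1" wait in the same queue, so they compete with each other for that key's share of service.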
