Clean up server code #5762
Comments
CC @phymbert and @ggerganov: do you have any ideas for the html / js stuff? Personally, I have never touched this part, because I use my own frontend implementation built with vitejs.
This is not needed. The reason to embed them as byte arrays in .hpp files is to have the UI embedded in the application. Otherwise, the app would need to read external files.
The UI code is fine - I wouldn't look to change it for now.
I would probably start with:
void update_system_prompt();
void notify_system_prompt_changed();
void process_system_prompt_data();
These would be better named as:
void system_prompt_update();
void system_prompt_notify();
void system_prompt_process();
Similar thoughts about variable names. For example, we have:
But then also:
It might seem like an unimportant or opinionated change, but IMO having consistency in the names significantly improves the code quality and makes it easier to understand the relations between things.
Overall, just try to establish some patterns and consistency. No need for abstractions.
I mean that these .hpp files, instead of being kept inside the repo, can be generated from the source files (html / js) at build time, much like the way we generate
Thanks for the confirmation, I'll leave them untouched. Your suggestions regarding renaming are clear to me and I'll have a look into this in the next few days. I also agree that there is no need for more abstractions at the moment. The
Motivation
As seen in #4216, one of the important tasks is to refactor / clean up the server code so that it's easier to maintain. However, without a detailed plan, I feel it's unlikely to be achieved.
This issue is created so that we can discuss how to refactor or clean up the code.
The goal is to help existing and new contributors to easily find out where to work in the code base.
Current architecture
The current server implementation has 2 threads: one for the HTTP part and one for inference.
llama_server_queue.post(task)
llama_server_response.send(result)
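The two-thread flow above can be sketched roughly as follows. This is a minimal illustration, not the actual server code: only the `post(task)` and `send(result)` calls come from the text, while `blocking_queue`, `server_queue`, `server_response`, `recv`, `inference_loop`, `handle_request` and the `task` / `result` fields are hypothetical stand-ins.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical payloads; the real structs carry far more state.
struct task   { int id; std::string prompt; };
struct result { int id; std::string text; };

// Minimal blocking queue: the mutex is owned by the struct itself.
template <typename T>
struct blocking_queue {
    std::mutex mtx;
    std::condition_variable cv;
    std::deque<T> items;

    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            items.push_back(std::move(item));
        }
        cv.notify_one();
    }

    // Blocks until an item is available.
    T pop() {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [this] { return !items.empty(); });
        T item = std::move(items.front());
        items.pop_front();
        return item;
    }
};

// Assumed shapes standing in for llama_server_queue / llama_server_response.
struct server_queue {
    blocking_queue<task> q;
    void post(task t) { q.push(std::move(t)); }
};

struct server_response {
    blocking_queue<result> q;
    void send(result r) { q.push(std::move(r)); }
    result recv() { return q.pop(); }
};

// Inference thread: takes n_tasks tasks, "runs" each one, sends the result back.
void inference_loop(server_queue & queue, server_response & response, int n_tasks) {
    for (int i = 0; i < n_tasks; ++i) {
        task t = queue.q.pop();
        response.send({t.id, "echo: " + t.prompt});
    }
}

// HTTP thread side: post a task, then wait for its result.
result handle_request(server_queue & queue, server_response & response, task t) {
    queue.post(std::move(t));
    return response.recv();
}
```

In this sketch the HTTP side never touches a mutex directly; all locking stays inside the queue struct, which is the point of binding the mutexes to the two structs.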
Ideas
Feel free to suggest any ideas that you find helpful (please keep in mind that we are not introducing new features here, just rewriting the code):
- Abstract out llama_server_queue and llama_server_response; mutexes are now bound to these 2 structs (already finished): Server: try to refactor server.cpp #5065
- Rename and move structs to utils.hpp: Clean up server code #5762 (comment), Server: normalize naming #5779
- Investigate httplib to see if we can use more functions that already exist in this lib. For example, CORS can be done using set_post_routing_handler (the same idea as "middleware" in high-level web frameworks)
- Merge handlers of /v1/{endpoints} and /{endpoints} to prevent code duplication: Add "/chat/completions" as alias for "/v1/chat/completions" #5722
- No more hard-coding js files into hpp, as these files pollute the code base. They should be converted to hpp by using code generation (like how build-info.cpp is generated in common.cpp): build: generate hex dump of server assets during build #6661
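As an illustration of the last idea, here is a minimal sketch of a generator that turns an asset's bytes into the text of a header defining a byte array. The function name `asset_to_hpp` and the output layout are assumptions made for this example. The linked PR may well do this differently (for instance with an existing hex dump tool driven by the build system).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Convert raw asset bytes into C++ source text that defines a byte array
// and its length. `symbol` is a hypothetical identifier chosen by the caller.
std::string asset_to_hpp(const std::string & symbol, const std::string & bytes) {
    std::string out = "unsigned char " + symbol + "[] = {";
    char buf[8];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Format each byte as a two-digit hex literal, e.g. 0x48.
        std::snprintf(buf, sizeof(buf), "0x%02x",
                      static_cast<unsigned>(static_cast<unsigned char>(bytes[i])));
        if (i != 0) {
            out += ", ";
        }
        out += buf;
    }
    out += "};\n";
    out += "unsigned int " + symbol + "_len = " + std::to_string(bytes.size()) + ";\n";
    return out;
}
```

A build step could read each html / js file, pass its contents through a function like this, and write the result to a generated .hpp in the build directory, so that the generated files no longer need to be committed.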