Clean up server code #5762

ngxson · 2024-02-28T10:32:39Z

Motivation

As seen on #4216 , one of the important task is to refactor / clean up the server code so that it's easier to maintain. However, without a detailed plan, personally I feel like it's unlikely to be archived.

This issue is created so that we can discuss about how to refactor or clean up the code.

The goal is to help existing and new contributors to easily find out where to work in the code base.

Current architecture

The current server implementation has 2 thread: one for HTTP part and one for inference.

The direction from HTTP ==> inference thread is done by llama_server_queue.post(task)
The direction from inference ==> HTTP thread is done by llama_server_response.send(result)

Ideas

Feel free to suggest any ideas that you find helpful (please keep in mind that we do not introduce new features here, just to re-write the code):

Abstract out llama_server_queue and llama_server_response, mutexes are now bound to these 2 structs (already finished)
Server: try to refactor server.cpp #5065
Renaming and move structs to utils.hpp: Clean up server code #5762 (comment)
Server: normalize naming #5779
Investigate httplib to see if we can use more functions already exist in this lib, for example CORS can be done using set_post_routing_handler (the same idea with "middleware" in high level web frameworks)
Merge handlers of /v1/{endpoints} and /{endpoints} to prevent code duplications
Add "/chat/completions" as alias for "/v1/chat/completions" #5722
No more hard-coding js files into hpp, as these files pollute the code base. They should be converted to hpp by using code generation (like how build-info.cpp is generated in common.cpp)
build: generate hex dump of server assets during build #6661

The text was updated successfully, but these errors were encountered:

ngxson · 2024-02-28T11:06:41Z

CC @phymbert and @ggerganov

Do you have any idea for the html / js stuff? Personally I never touched this part before, because I have my own frontend implementation on my side using vitejs.

ggerganov · 2024-02-28T12:23:52Z

No more hard-coding js files into hpp, as these files pollute the code base. They should be converted to hpp by using code generation

This is not needed. The reason to embed them as byte arrays in .hpp files is to have the UI embedded in the application. Otherwise, the app would need to read external files

Our html / js code is not that big, is it worth to split them into separated html / js files? In fact, html + css + js can be in one single file, but should we do that in our case?

Do you have any idea for the html / js stuff? Personally I never touched this part before, because I have my own frontend implementation on my side using vitejs.

The UI code is fine - I wouldn't look to change it for now

I would probably start with:

Better naming of things. We normally prefix names with some common part. As an example, now we have the following methods:

void update_system_prompt();
void notify_system_prompt_changed();
void process_system_prompt_data();

These would be better named as:

void system_prompt_update();
void system_prompt_notify();
void system_prompt_process();

Similar thoughts about variable names. For example, we have:

n_decoded
n_predicted
n_remain

But then also:

num_prompt_tokens
num_prompt_tokens_processed
sent_count

It might seem like unimportant or opinionated change, but IMO having consistency in the names significantly improves the code quality and makes it easier to understand the relations between things

server.cpp has a lot of stuff that should be in utils.hpp
Normalize the coding style with the rest of the project (put opening braces on the same line and add spaces before and after pointers).
Just move things around so that logically similar things are together. As example, looking at server.cpp here is what I see:
- There is struct server_params at the top of the file
- Further down we have struct llama_client_slot which uses struct slot_params located in another file. Probably bring these together and rename llama_client_slot to server_slot
- Then we have struct llama_metrics which would probably be better to call server_metrics
- Near the end we have struct token_translator which seems more suitable for the utils.hpp file

Overall, just try to establish some patterns and consistency. No need for abstractions

ngxson · 2024-02-28T12:56:26Z

This is not needed. The reason to embed them as byte arrays in .hpp files is to have the UI embedded in the application. Otherwise, the app would need to read external files

I mean these .hpp file instead of being inside the repo, they can be generated from the source file (html / js) in build time, much like the way we generate build-info.cpp

The UI code is fine - I wouldn't look to change it for now

Thanks for the confirmation, I'll leave them untouched.

Your suggestions regarding renaming are clear for me and I'll having a look into this in the next days, I also agree that there is no need for more abstractions at the moment.

The utils.hpp does not have much inside it because initially I don't want the move to introduce too many breaking changes to other opening PR. But with your confirmation I'm more confident now to start moving struct to utils.hpp progressively.

ngxson added the enhancement New feature or request label Feb 28, 2024

ngxson added the good first issue Good for newcomers label Feb 28, 2024

ngxson mentioned this issue Mar 12, 2024

server : improvements and maintenance #4216

Open

10 tasks

HanClinto mentioned this issue Jun 5, 2024

Feature Request: Multi session chat support #7758

Closed

4 tasks

ngxson closed this as completed Dec 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean up server code #5762

Clean up server code #5762

ngxson commented Feb 28, 2024 •

edited

Loading

ngxson commented Feb 28, 2024

ggerganov commented Feb 28, 2024 •

edited

Loading

ngxson commented Feb 28, 2024 •

edited

Loading

Clean up server code #5762

Clean up server code #5762

Comments

ngxson commented Feb 28, 2024 • edited Loading

Motivation

Current architecture

Ideas

ngxson commented Feb 28, 2024

ggerganov commented Feb 28, 2024 • edited Loading

ngxson commented Feb 28, 2024 • edited Loading

ngxson commented Feb 28, 2024 •

edited

Loading

ggerganov commented Feb 28, 2024 •

edited

Loading

ngxson commented Feb 28, 2024 •

edited

Loading