Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TransferEngine] Refactor code to hide transport logics from user APIs #51

Merged
merged 22 commits into from
Jan 2, 2025

Conversation

alogfans
Copy link
Collaborator

@alogfans alogfans commented Dec 26, 2024

This PR refactors the code of TransferEngine, so that the following methods are implemented by TransferEngine, i.e., users do not need to obtain Transport pointers beforing calling these methods. This also enables using multiple transports (e.g., rdma and shm) in the same transfer batch.

    BatchID allocateBatchID(size_t batch_size);
    int freeBatchID(BatchID batch_id);
    int submitTransfer(BatchID batch_id,
                       const std::vector<TransferRequest> &entries) ;
    int getTransferStatus(BatchID batch_id, size_t task_id,
                          TransferStatus &status);

We also extract metadata drivers (etcd/redis/http) seperately. User can fill the metadata server parameters with etcd://, redis:// or http:// respectively.

This patch introduces major modification to TransferEngine, and still WIP.

Update on 2024/12/30: We add a patch to extract topology logics outside rdma transport, so that it can be used for other transports/medias

@doujiang24
Copy link
Contributor

Awesome, glad to see this happen.
As we talked in chat, we could add more kinds of devices and protocols base on this PR in the feature.

@alogfans alogfans marked this pull request as ready for review December 27, 2024 05:49
int index = 0;
for (auto &entry : local_topology_.getHcaList()) {
if (entry == local_nic_name) {
context = context_list_[index];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add break after this line would be better?

Copy link
Contributor

@doujiang24 doujiang24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, lgtm~

@ShangmingCai ShangmingCai self-requested a review January 2, 2025 08:25
@stmatengss
Copy link
Collaborator

LGTM

Copy link
Collaborator

@ShangmingCai ShangmingCai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have run some verification tests. Except for metadata_server will be configured to etcd by default when no prefix is given, other changes LGTM.

I will propose a PR to modify the related part in the vllm integration doc later.

@ShangmingCai ShangmingCai merged commit 64ddda4 into kvcache-ai:main Jan 2, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants