Add ucx support (prototype) #18631

Open
wants to merge 1 commit into
base: main

lucyge2022
Contributor

Add UcpServer to accept UCX (UCP) protocol connections over the network.

Author: Lucy Ge <lucy.ge@alluxio.com>
Date:   Mon Jun 24 15:23:06 2024 -0700

    Squashed commit of the following:

    commit 36f0ea4
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Thu Dec 14 15:31:43 2023 -0800

        add test to alloc gpu mem and call readRMA; mark the test as ignored for now, until a hardware fabric env is available

    commit f0ba8ac
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Tue Dec 12 17:07:21 2023 -0800

        1. make LocalCacheManager.cache() available in interface
        2. allocate a direct ByteBuffer and register it with UcpMemory, so callers get a UcpMemory-wrapped buffer; this avoids the ucx failure to allocate user mem ( mm_sysv.c:114  UCX  ERROR   failed to allocate 4096 bytes with mm for user memory )
        3. make UcxReadTest#testClientServerr do random unaligned read, and add testStandaloneServer as a sanity test for standalone UcpServer
        4. downgrade debugging logs' level from info to debug
        5. remove standalone testing process class UcpClientTest
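
Item 2 of commit f0ba8ac can be illustrated with a minimal sketch of the allocation step only. It is not the code from this PR: the class and method names, the 4096-byte page size, and the rounding of the capacity to a page multiple are all assumptions made for illustration. Registering the buffer with UCX is left as a comment because it needs the native library at runtime.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the PR: allocates the off-heap buffer that
// would afterwards be registered with UCX.
final class DirectBufferSketch {
  // Assumed page size; the commit message only mentions a 4096-byte allocation failure.
  static final int PAGE_SIZE = 4096;

  // Rounds a positive length up to the next multiple of PAGE_SIZE.
  static int roundUpToPage(int length) {
    return Math.addExact(length, PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
  }

  // Allocates a direct (off-heap) buffer whose capacity is a page multiple and
  // whose limit is the length the caller asked for.
  static ByteBuffer allocateForRegistration(int length) {
    if (length <= 0) {
      throw new IllegalArgumentException("length must be positive: " + length);
    }
    ByteBuffer buf = ByteBuffer.allocateDirect(roundUpToPage(length));
    buf.limit(length);
    // The UCX registration step (wrapping this buffer in a UcpMemory) would go
    // here; it is omitted so the sketch runs without the native UCX library.
    return buf;
  }

  public static void main(String[] args) {
    ByteBuffer buf = allocateForRegistration(5000);
    System.out.println(buf.isDirect() + " " + buf.capacity() + " " + buf.limit());
  }
}
```

The point of the sketch is that the JVM, not UCX, owns the allocation; UCX only has to pin and register memory that already exists.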

    commit 7154d4f
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Fri Dec 8 22:47:37 2023 -0800

        add getUcpMemory in wrapper cachemgr implementations

    commit 5b72806
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Fri Dec 8 22:27:51 2023 -0800

        instantiate listener in start() instead of constructor

    commit 1def30d
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Fri Dec 8 21:43:36 2023 -0800

        fixes for ucp server module

    commit 99eea03
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Thu Dec 7 17:39:46 2023 -0800

        additional changes to make UcpServer a module

    commit a7237fb
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Thu Dec 7 13:05:14 2023 -0800

        WIP - make UcpServer a module

    commit c6cd05a
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Wed Dec 6 17:19:31 2023 -0800

        stash changes - worker version using alluxioworker process to start standalone ucp server

    commit f293e49
    Author: LucyGe <lucy.ge@alluxio.com>
    Date:   Wed Dec 6 21:13:50 2023 +0000

        fix compile error and add start scripts for UcpServer / UcpClientTest

    commit 3a2c618
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Wed Dec 6 11:04:14 2023 -0800

        stash local changes to debug stressucxbench

    commit 9bc39c5
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Thu Nov 30 17:01:14 2023 -0800

        1. add cache / getUcpMemory api in CacheManager interface
        2. add error case handling in getUcpMemory
        3. add UcxConnectionPool
        4. ReadRequestRMAHandler should break without error if it can't serve the requested read len
        5. have UcpServer own its own cachemanager instead of relying on worker, add temporary prefill func to warm up cache
        6. fix UcxDataReader to return correct read len
        7. add StressUcxBench
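
Items 4 and 6 of commit 9bc39c5 describe a read loop that stops quietly on a short read and reports the number of bytes actually read. The sketch below shows one way such a loop could look. `PagedReadSketch`, `PageSource`, and `inMemory` are hypothetical names for illustration, and the in-memory source stands in for the cache manager; none of this is taken from the PR's code.

```java
import java.nio.ByteBuffer;

final class PagedReadSketch {
  // Hypothetical stand-in for a page cache lookup: copies up to len bytes of
  // the given page into dst and returns how many bytes were copied.
  interface PageSource {
    int read(long pageIndex, int offsetInPage, int len, ByteBuffer dst);
  }

  // Reads up to length bytes starting at offset, one page at a time.
  // Returns the number of bytes actually read, which may be less than length.
  static int read(PageSource source, long pageSize, long offset, int length, ByteBuffer dst) {
    int total = 0;
    while (total < length) {
      long pos = offset + total;
      long pageIndex = pos / pageSize;
      int offsetInPage = (int) (pos % pageSize);
      int want = (int) Math.min(length - total, pageSize - offsetInPage);
      int got = source.read(pageIndex, offsetInPage, want, dst);
      total += got;
      if (got < want) {
        break; // the rest cannot be served; stop without raising an error
      }
    }
    return total;
  }

  // In-memory PageSource over a byte array, used only to exercise the loop.
  static PageSource inMemory(byte[] data, long pageSize) {
    return (pageIndex, offsetInPage, len, dst) -> {
      long start = pageIndex * pageSize + offsetInPage;
      int n = (int) Math.max(0, Math.min((long) len, data.length - start));
      if (n > 0) {
        dst.put(data, (int) start, n);
      }
      return n;
    };
  }

  public static void main(String[] args) {
    byte[] data = new byte[10_000];
    ByteBuffer dst = ByteBuffer.allocate(5000);
    // Request runs past the end of the data, so fewer bytes come back.
    System.out.println(read(inMemory(data, 4096), 4096, 9000, 5000, dst));
  }
}
```

Returning the byte count instead of throwing lets the caller decide whether a short read is an error or simply the end of the cached data.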

    commit 4eb130b
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Fri Nov 17 14:27:09 2023 -0800

        WIP - making multi-iteration read UT work

    commit c573235
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Wed Nov 8 15:07:00 2023 -0800

        Initial working version of ReadRequestRMA for both client and server + add UT UcxReadTest

    commit 3a13395
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Fri Nov 3 17:18:16 2023 -0700

        mv UT class

    commit 2150e4d
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Fri Nov 3 17:00:05 2023 -0700

        WIP - add readRMA in reader client + add related read UT

    commit 131441d
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Fri Nov 3 12:24:18 2023 -0700

        1. fixes on the buffer to send back info in accepting conn
        2. fix for UcxConnectionTest.testEstablishConnection, it's now working
        to test the UcxConnection establishment logic

    commit 653da91
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Wed Nov 1 16:55:56 2023 -0700

        test file name change

    commit 5bbe0fc
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Wed Nov 1 15:51:43 2023 -0700

        sort pom

    commit e0d8592
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Wed Nov 1 15:47:07 2023 -0700

        compile errors

    commit 49944e2
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Wed Nov 1 15:04:30 2023 -0700

        add missing files

    commit 29106a7
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Wed Nov 1 15:02:41 2023 -0700

        WIP - add init new conn / accept incoming conn logic

    commit 2104843
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Tue Oct 31 11:08:20 2023 -0700

        WIP - 1) add RMA read request handler 2) tag establishment fixes

    commit 9e2648e
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Wed Oct 25 21:38:02 2023 -0700

        WIP - basic skeleton

    commit c4206be
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Mon Oct 23 10:22:48 2023 -0700

        add missing file in refactoring

    commit 2928a78
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Mon Oct 23 10:10:18 2023 -0700

        WIP - refactor

    commit 2e7950b
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Wed Oct 18 16:21:13 2023 -0700

        worker end-to-end read workflow version of ucpserver/ucxDataReader

    commit f443ec6
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Wed Oct 11 16:45:50 2023 -0700

        WIP:
        1. req sendTagged and recvTagged should have same buffersize
        2. use different tag for different client inetaddr
        3. start recvReq on accepting conn on server side, and keep recvReq for
        the same client one after another
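
Items 1 and 2 of commit f443ec6 state two rules: both sides of a tagged request use the same buffer size, and each client address gets its own tag. The commit does not say how tags are derived, so the sketch below is only one possible scheme, with an assumed registry that hands out increasing tags per address. `ClientTagRegistry`, `REQUEST_BUFFER_SIZE`, and its value are hypothetical.

```java
import java.net.InetSocketAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical registry, not the PR's code: assigns one stable tag per client address.
final class ClientTagRegistry {
  // Assumed size shared by the sending and receiving side of a request, so a
  // tagged receive is always posted with a buffer that matches the send.
  static final int REQUEST_BUFFER_SIZE = 4096;

  private final ConcurrentHashMap<InetSocketAddress, Long> tags = new ConcurrentHashMap<>();
  private final AtomicLong nextTag = new AtomicLong(1);

  // Returns the tag for this client, allocating a new one on first sight.
  long tagFor(InetSocketAddress client) {
    return tags.computeIfAbsent(client, k -> nextTag.getAndIncrement());
  }

  public static void main(String[] args) {
    ClientTagRegistry registry = new ClientTagRegistry();
    InetSocketAddress a = InetSocketAddress.createUnresolved("10.0.0.1", 5000);
    InetSocketAddress b = InetSocketAddress.createUnresolved("10.0.0.2", 5000);
    System.out.println(registry.tagFor(a) + " " + registry.tagFor(b) + " " + registry.tagFor(a));
  }
}
```

A counter-based registry cannot collide the way a hash of the address could, at the cost of keeping per-client state on the server.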

    commit 37ee296
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Tue Oct 10 21:21:37 2023 -0700

        WIP - add UcpServer / UcpClientTest standalone main()

    commit 9d58cc4
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Mon Oct 9 14:54:57 2023 -0700

        WIP -
        1. use correct dependency in pom
        2. add test to initially test client/server

    commit 3d5c353
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Fri Oct 6 12:37:37 2023 -0700

        use abs path from local for now

    commit 95a69e0
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Tue Sep 19 14:17:39 2023 -0700

        jar change

    commit 1472081
    Author: Lucy Ge <lucy.ge@alluxio.com>
    Date:   Tue Sep 19 14:01:03 2023 -0700

        ucp server/client WIP