
dev(hansbug): add stream support for parallelizing the calculations in the tree #10

Merged
merged 5 commits from dev/stream into main on Aug 14, 2022

Conversation

@HansBug (Member) commented Aug 12, 2022

Here is an example:

import time

import numpy as np
import torch

import treetensor.torch as ttorch

N, M, T = 200, 2, 50  # N tree keys, M CUDA streams, T timing iterations
S1, S2, S3 = 512, 1024, 2048  # matrix sizes: each matmul is (S1, S2) @ (S2, S3)


def test_min():
    # Benchmark ttorch.matmul on a tree with only N // M keys (the per-stream share of work).
    a = ttorch.randn({f'a{i}': (S1, S2) for i in range(N // M)}, device='cuda')
    b = ttorch.randn({f'a{i}': (S2, S3) for i in range(N // M)}, device='cuda')

    result = []
    for i in range(T):
        _start_time = time.time()

        _ = ttorch.matmul(a, b)
        torch.cuda.synchronize()

        _end_time = time.time()
        result.append(_end_time - _start_time)

    print('time cost: mean({}) std({})'.format(np.mean(result), np.std(result)))


def test_native():
    # Benchmark plain dicts of tensors: torch.matmul applied key by key on the default stream.
    a = {f'a{i}': torch.randn(S1, S2, device='cuda') for i in range(N)}
    b = {f'a{i}': torch.randn(S2, S3, device='cuda') for i in range(N)}

    result = []
    for i in range(T):
        _start_time = time.time()

        for key in a.keys():
            _ = torch.matmul(a[key], b[key])
        torch.cuda.synchronize()

        _end_time = time.time()
        result.append(_end_time - _start_time)

    print('time cost: mean({}) std({})'.format(np.mean(result), np.std(result)))


def test_linear():
    # Benchmark ttorch.matmul on the full tree of N keys without extra streams.
    a = ttorch.randn({f'a{i}': (S1, S2) for i in range(N)}, device='cuda')
    b = ttorch.randn({f'a{i}': (S2, S3) for i in range(N)}, device='cuda')

    result = []
    for i in range(T):
        _start_time = time.time()

        _ = ttorch.matmul(a, b)
        torch.cuda.synchronize()

        _end_time = time.time()
        result.append(_end_time - _start_time)

    print('time cost: mean({}) std({})'.format(np.mean(result), np.std(result)))


def test_stream():
    # Benchmark ttorch.matmul on the full tree of N keys with M CUDA streams enabled.
    a = ttorch.randn({f'a{i}': (S1, S2) for i in range(N)}, device='cuda')
    b = ttorch.randn({f'a{i}': (S2, S3) for i in range(N)}, device='cuda')

    ttorch.stream(M)  # enable M CUDA streams for subsequent tree operations (the API added in this PR)
    result = []
    for i in range(T):
        _start_time = time.time()

        _ = ttorch.matmul(a, b)
        torch.cuda.synchronize()

        _end_time = time.time()
        result.append(_end_time - _start_time)

    print('time cost: mean({}) std({})'.format(np.mean(result), np.std(result)))


def warmup():
    # Warm up the GPU: run a few matmuls so CUDA initialization does not skew the timings.
    a = torch.randn(1024, 1024).cuda()
    b = torch.randn(1024, 1024).cuda()
    for _ in range(20):
        c = torch.matmul(a, b)


if __name__ == '__main__':
    warmup()
    test_min()
    test_native()
    test_linear()
    test_stream()

To be honest, though, the practical effect of this stream support is quite fragile: it depends heavily on tensor size (both too large and too small perform poorly), and it also needs a GPU with enough headroom; used carelessly it can easily turn into a pessimization. In short, it is hard to get right. If we want to make this practical, it needs further study.
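For context, the kind of multi-stream dispatch this feature relies on can be sketched in plain PyTorch. The sketch below is an illustration only, not the implementation in this PR: it round-robins the per-key matmuls over a fixed number of CUDA streams and then makes the default stream wait on them, which is the pattern whose payoff depends on tensor size and GPU capacity as noted above. The function name matmul_multi_stream is hypothetical.

import torch


def matmul_multi_stream(a, b, num_streams):
    # Illustrative sketch only (plain PyTorch, not the ttorch.stream implementation):
    # queue each per-key matmul on one of `num_streams` CUDA streams so the kernels
    # can overlap on the GPU, then make the default stream wait for all of them.
    streams = [torch.cuda.Stream() for _ in range(num_streams)]
    # Side streams must first wait for work already queued on the current (default)
    # stream, e.g. the kernels that produced the input tensors.
    for s in streams:
        s.wait_stream(torch.cuda.current_stream())
    out = {}
    for idx, key in enumerate(a):
        with torch.cuda.stream(streams[idx % num_streams]):
            out[key] = torch.matmul(a[key], b[key])
    for s in streams:
        torch.cuda.current_stream().wait_stream(s)
    return out

With the dicts from test_native, matmul_multi_stream(a, b, M) would be roughly the hand-rolled analogue of calling ttorch.stream(M) before ttorch.matmul(a, b).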

@HansBug HansBug added the enhancement label Aug 12, 2022
@HansBug HansBug requested a review from PaParaZz1 August 12, 2022 15:14
@HansBug HansBug self-assigned this Aug 12, 2022
codecov bot commented Aug 14, 2022

Codecov Report

Merging #10 (6a3a7dd) into main (5d34647) will decrease coverage by 0.64%.
The diff coverage is 92.89%.

@@            Coverage Diff             @@
##             main      #10      +/-   ##
==========================================
- Coverage   97.84%   97.19%   -0.65%     
==========================================
  Files          32       33       +1     
  Lines        1576     1607      +31     
==========================================
+ Hits         1542     1562      +20     
- Misses         34       45      +11     
Flag Coverage Δ
unittests 97.19% <92.89%> (-0.65%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
treetensor/torch/stream.py 52.17% <52.17%> (ø)
treetensor/torch/tensor.py 99.36% <98.50%> (+<0.01%) ⬆️
treetensor/torch/__init__.py 97.50% <100.00%> (+0.13%) ⬆️
treetensor/torch/funcs/comparison.py 100.00% <100.00%> (ø)
treetensor/torch/funcs/construct.py 100.00% <100.00%> (ø)
treetensor/torch/funcs/math.py 100.00% <100.00%> (ø)
treetensor/torch/funcs/matrix.py 100.00% <100.00%> (ø)
treetensor/torch/funcs/operation.py 100.00% <100.00%> (ø)


@HansBug HansBug merged commit 554b8ca into main Aug 14, 2022
@HansBug HansBug deleted the dev/stream branch August 14, 2022 04:02