Distributed fails when nSets > cpu_count() in docker #66

Open
2 tasks
dimalvovs opened this issue Nov 27, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@dimalvovs
Contributor

The latest version available fails when run in distributed mode in docker if:

  1. nSets >= mp.cpu_count()-1
  2. nSets = 1
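
To make the two conditions above concrete, here is a minimal sketch of a pre-flight check. The helper name `n_sets_at_risk` is hypothetical and not part of PyCoGAPS; it only encodes the conditions as reported in this issue for the docker image and says nothing about why the failure happens.

```python
import multiprocessing as mp


def n_sets_at_risk(n_sets, n_cpus=None):
    """Return True if n_sets matches a failure condition reported in this issue.

    Hypothetical helper, not part of PyCoGAPS. It mirrors the report only:
    condition 1 is nSets >= cpu_count() - 1, condition 2 is nSets == 1.
    """
    if n_cpus is None:
        n_cpus = mp.cpu_count()
    return n_sets == 1 or n_sets >= n_cpus - 1


if __name__ == "__main__":
    cpus = mp.cpu_count()
    for n in range(1, cpus + 1):
        if n_sets_at_risk(n, cpus):
            note = "matches a reported failure condition"
        else:
            note = "no reported failure condition"
        print(f"nSets={n} with {cpus} CPUs: {note}")
```

With 4 CPUs this flags nSets=1, 3 and 4 and leaves nSets=2 unflagged, which is the pattern observed in the docker image.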

How to reproduce 1

  1. Run the latest image with 4 CPUs: docker run -it --cpus=4 --entrypoint /bin/bash ghcr.io/fertiglab/pycogaps
  2. Validate that the standard config works:
echo "if __name__ == '__main__':
    from PyCoGAPS.parameters import *
    from PyCoGAPS.pycogaps_main import CoGAPS
    import scanpy as sc

    modsimpath = 'data/ModSimData.txt'
    modsim = sc.read_text(modsimpath)

    params = CoParams(path=modsimpath)
    params.printParams()

    setParams(params, {
        'nIterations':10000,
        'seed': 42,
        'nPatterns': 3
    })

    params.printParams()
    start = time.time()
    result = CoGAPS(modsimpath, params)
    end = time.time()
    print('TIME:', end - start)

    result.write('data/dist_modsim.h5ad')" > test.py

python3 test.py

output:

...
GapsResult result object with 25 features and 20 samples
3 patterns were learned

TIME: 0.9117803573608398

test_standard.log

  3. Validate that the distributed config works with 1 < nSets < mp.cpu_count()-1:
# see how many cores we have
root@7f3405e40da1:/pycogaps# python3
Python 3.8.18 (default, Sep 20 2023, 11:41:31) 
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing as mp
>>> mp.cpu_count()
4  

Run distributed with nSets=2:

echo "if __name__ == '__main__':
    from PyCoGAPS.parameters import *
    from PyCoGAPS.pycogaps_main import CoGAPS
    import scanpy as sc

    modsimpath = 'data/ModSimData.txt'
    modsim = sc.read_text(modsimpath)

    params = CoParams(path=modsimpath)
    params.printParams()

    setParams(params, {
        'nIterations':10000,
        'seed': 42,
        'nPatterns': 3,
        'useSparseOptimization': True,
        'distributed': 'genome-wide'
    })

    params.setDistributedParams(nSets=2)
    params.printParams()
    start = time.time()
    result = CoGAPS(modsimpath, params)
    end = time.time()
    print('TIME:', end - start)

    result.write('data/dist_modsim.h5ad')">test_2nsets.py

python3 test_2nsets.py

output:

...
GapsResult result object with 13 features and 20 samples
2 patterns were learned

Stitching results together...
TIME: 2.979750871658325

test_2nsets.log

  4. Observe that nSets=1 and nSets>2 fail:
    nSets=1
    nSets=3
    nSets=4
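
The mp.cpu_count() value used in the steps above reports the CPUs the OS exposes. As far as I know it does not take into account a CPU quota such as the one set by docker run --cpus=4, so the numbers can differ inside a container. A minimal sketch for comparing them, assuming a Linux container with cgroup v2 mounted at /sys/fs/cgroup (the path and both helper names are assumptions, and nothing here is PyCoGAPS API):

```python
import multiprocessing as mp
import os


def parse_cpu_max(text):
    """Parse the cgroup v2 cpu.max format '<quota> <period>' into a CPU count.

    Returns None when the quota is 'max' (no limit).
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def cpu_report(cpu_max_path="/sys/fs/cgroup/cpu.max"):
    """Collect the CPU counts visible from inside the current process."""
    report = {"cpu_count": mp.cpu_count()}
    # sched_getaffinity exists on Linux only; it reflects cpuset pinning.
    if hasattr(os, "sched_getaffinity"):
        report["affinity"] = len(os.sched_getaffinity(0))
    # The cgroup file is absent on cgroup v1 hosts and outside Linux.
    try:
        with open(cpu_max_path) as fh:
            report["cgroup_quota"] = parse_cpu_max(fh.read())
    except (OSError, ValueError):
        report["cgroup_quota"] = None
    return report


if __name__ == "__main__":
    print(cpu_report())
```

Running this in the container and on the host next to each failing nSets value would show whether the failures track the reported core count or the quota.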
@dimalvovs added the bug label on Nov 27, 2023
@dimalvovs
Contributor Author

I was able to reproduce the above on a different machine with 12 cores.

@dimalvovs
Contributor Author

Setting .pool(processes=4) instead of .pool(processes=nsets) does not change the behaviour.

@dimalvovs
Contributor Author

dimalvovs commented Dec 1, 2023

Results of running in distributed mode with nsets from 1 to 4:

Arch.    nsets=1  nsets=2  nsets=3  nsets=4
docker   fail     pass     fail     fail
mb12c    fail     pass     pass     pass
git ub fail
win
linux

docker - an image from ghcr.io run on a Mac host
mb12c - 12-core 2023 MacBook Pro running macOS Sonoma 14.1.1 (23B81)
git ub - GitHub Ubuntu worker, also run in docker
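
A matrix like the one above could be collected with a small driver that runs one script per nsets value and records the exit status. This is a sketch only: the `run_matrix` helper and the test_<n>nsets.py naming pattern are assumptions (only test_2nsets.py appears in this issue), and a non-zero exit code or a timeout is taken to mean fail.

```python
import subprocess
import sys


def run_matrix(commands, timeout=600):
    """Run each command and map its label to 'pass' or 'fail'.

    commands: dict of label -> argv list. A non-zero exit code or a
    timeout counts as 'fail'.
    """
    results = {}
    for label, argv in commands.items():
        try:
            proc = subprocess.run(argv, capture_output=True, timeout=timeout)
            results[label] = "pass" if proc.returncode == 0 else "fail"
        except subprocess.TimeoutExpired:
            results[label] = "fail"
    return results


if __name__ == "__main__":
    # Hypothetical script names following the test_2nsets.py pattern.
    commands = {
        f"nsets={n}": [sys.executable, f"test_{n}nsets.py"] for n in range(1, 5)
    }
    for label, outcome in run_matrix(commands).items():
        print(label, outcome)
```

Running each value in its own process keeps a crash or hang at one nsets value from affecting the others.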
