
[Question]: Does SPU support bigdata or parallel computing #761

Closed
xyz-scorpio opened this issue Jul 9, 2024 · 4 comments

Comments

@xyz-scorpio

Feature Request Type

Performance

Have you searched existing issues?

Yes

Is your feature request related to a problem?


Describe features you want to add to SPU

Just want to know whether SPU supports big data / parallel computing, and if so, in what way it supports such a thing. Thanks.


@tpppppub
Member

There are various granularities of parallelism and vectorization within the SPU. It's not clear what your specific requirements are. How much data do you need to process and what granularity of parallelism do you need?

@xyz-scorpio
Author

xyz-scorpio commented Jul 10, 2024

> There are various granularities of parallelism and vectorization within the SPU. It's not clear what your specific requirements are. How much data do you need to process and what granularity of parallelism do you need?

[image: SPU workflow diagram]

Let us take the SPU workflow as an example. My questions are:
i) What is the upper bound on the data scale that SPU can handle? If I write code to train a model with TensorFlow on very large datasets (e.g. ~TB) from different parties, how does SPU handle that? Will it do data parallelism automatically?
ii) How many resources can an SPU VM take advantage of? Say I have 4 AWS EC2 instances: can one SPU VM use all 4, or just one instance? And what is the parallelism model of the SPU VM?

@anakinxc
Contributor


  1. We do not have a hard limit on data size. It is up to frameworks like TensorFlow to handle such large data properly, through batching or other techniques.
  2. At this point, one SPU VM can only use a single machine. Currently, SPU supports data-level parallelism (DLP) and instruction-level parallelism (ILP) within that one machine.
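To illustrate point 1, here is a minimal, framework-agnostic sketch of caller-side batching: since SPU sets no hard limit on input size, the calling framework streams a large dataset through in fixed-size chunks. The names `iter_batches` and `secure_step` are illustrative, not SPU API; `secure_step` stands in for a function that would be traced and executed on the SPU runtime.

```python
import numpy as np

def iter_batches(data, batch_size):
    """Yield consecutive fixed-size batches of `data` (the unit of data-level parallelism)."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

def secure_step(batch):
    # Stand-in for a computation dispatched to the SPU runtime; here we just
    # compute a per-batch mean so the sketch stays runnable on its own.
    return float(batch.mean())

data = np.arange(10, dtype=np.float64)   # pretend this is a ~TB dataset, loaded lazily
partials = [secure_step(b) for b in iter_batches(data, batch_size=4)]
print(partials)  # → [1.5, 5.5, 8.5]
```

The caller is then responsible for combining the per-batch partial results (e.g. a weighted average for means, or gradient accumulation for training).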

@anakinxc
Contributor

Closing due to inactivity.
