Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use multi-thread for cloud experiments and mutli-process for local ones #29

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

DonggeLiu
Copy link
Collaborator

With clould experiments enabled, local program is left with no computationally heavy tasks. Most of the time, it only waits for cloud build results and writes them to a file.
using multi-thread can achieve higher parallelism (e.g., avoid overhead in creation, management, context switching) and require less resource (e.g., reduce memory usage).

@DonggeLiu DonggeLiu requested a review from oliverchang January 29, 2024 05:35
elif args.cloud_experiment_name:
# Use multi-threads for cloud experiments, because each thread only needs to
# wait for cloud build results or conduct simple I/O tasks.
with ThreadPoolExecutor(max_workers=NUM_EXP) as executor:
Copy link
Collaborator

@oliverchang oliverchang Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can consolidate the two code paths more?

e.g. something like:

if ...:
  pool =  ThreadPool
else:
  pool = Pool

Not sure what the difference is exactly between ThreadPoolExecutor and ThreadPool

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.ThreadPool

I reckon the doc implies ThreadPoolExecutor is more modern and supports thread-level parallelism better?
I can definitely try to consolidate the two code parts to make them look better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made an attempt to clean up the related code in run_all_experiments.py a bit, will do the same in run_one_experiment.py.
The code should be more readable and have less repetition, but I did not find a perfect way to consolidate them further without over-engineering this.

Please let me know if there are other options.

@oliverchang
Copy link
Collaborator

Can we use https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor instead to make the code paths more consistent for the process case?

@DonggeLiu
Copy link
Collaborator Author

Can we use https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor instead to make the code paths more consistent for the process case?

Yep, sure.
Thanks!

@DonggeLiu DonggeLiu marked this pull request as draft February 26, 2024 23:52
@DonggeLiu
Copy link
Collaborator Author

Not high priority, converting it to a draft for now and will come back to this later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants