Fix DDP logging #2826

Closed
edenlightning opened this issue Aug 4, 2020 · 2 comments · Fixed by #3819
Labels: bug, distributed, priority: 0
Comments

@edenlightning
Contributor

Add a global_zero_only=True flag. If it is false, create individual log files, each prefixed with the machine number.
Write a logging callback that does a map-reduce.
Can we do this in the metrics?
Aggregate all tensors on global zero first (this might run into memory issues).
Gather each output individually in CPU memory.
We want to preserve the fact that logging happens on global rank 0.
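
The flag proposed above could be sketched roughly as follows. This is only an illustration of the idea, not Lightning's API: the helper name `resolve_log_path`, its parameters, and the `rank{N}_` prefix format are all hypothetical, and it assumes the global rank is what gets used as the file prefix.

```python
from pathlib import Path
from typing import Optional


def resolve_log_path(
    log_dir: str,
    name: str,
    global_rank: int,
    global_zero_only: bool = True,
) -> Optional[Path]:
    """Return the file this process should log to, or None if it should not log.

    Hypothetical helper: the name and the prefix format are assumptions.
    """
    if global_zero_only:
        # Default behaviour: only global rank 0 writes a log file.
        return Path(log_dir) / name if global_rank == 0 else None
    # Otherwise every process writes its own file, prefixed with its rank.
    return Path(log_dir) / f"rank{global_rank}_{name}"
```

With the default, `resolve_log_path("logs", "metrics.csv", 1)` returns `None`, so non-zero ranks stay silent and logging remains on rank 0.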

@edenlightning added the bug, allowed_pre_1.0, and distributed labels on Aug 4, 2020
@edenlightning added this to the 0.9.x milestone on Sep 1, 2020
@edenlightning
Contributor Author

@justusschock

@justusschock
Member

@edenlightning Reducing is something we're currently working on in metrics.

But the logging should be separate, I guess. Currently we also gather all results to all devices.
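
To make the separation concrete, here is a minimal pure-Python sketch of the reduce step on its own, with no logging and no real inter-process communication. It assumes the per-rank outputs have already been gathered onto global rank 0 as plain dicts of floats with the same keys on every rank; the function name `reduce_on_global_zero` is hypothetical.

```python
from typing import Dict, List


def reduce_on_global_zero(per_rank_outputs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over all ranks.

    Hypothetical helper; assumes it runs on global rank 0 only and that
    every rank reported the same metric keys.
    """
    totals: Dict[str, float] = {}
    # Process one rank's output at a time instead of stacking everything.
    for outputs in per_rank_outputs:
        for key, value in outputs.items():
            totals[key] = totals.get(key, 0.0) + value
    world_size = len(per_rank_outputs)
    return {key: total / world_size for key, total in totals.items()}
```

A logger would then only need the reduced dict, for example `reduce_on_global_zero([{"loss": 1.0}, {"loss": 3.0}])`, without knowing how the values were gathered.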
