
moving from deepcopy to pickle files in to_job method #246

Merged

19 commits merged from mnt/memory_tojob into nipype:master on Jun 2, 2020

Conversation

djarecka (Collaborator) commented May 8, 2020

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Summary

this is a part of an effort to decrease memory load (#226)

to_job creates pickle files for a task (without input) and for the input (right now the input is saved together with the original task) instead of creating deep copies. The files are loaded and the specific input is set before running the task/wf.
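
As an illustration of the idea, here is a minimal, self-contained sketch with toy classes and illustrative helper names (the real helpers added in this PR, e.g. load_task, differ in their details):

import pickle
import tempfile
from pathlib import Path

class ToyTask:
    """Stand-in for a pydra task: holds one set of inputs per state index."""
    def __init__(self, name, state_inputs):
        self.name = name
        self.state_inputs = state_inputs
        self.inputs = None

    def run(self):
        return f"{self.name} ran with inputs {self.inputs}"

def save_task(task, pkl_dir):
    # pickle the task once instead of deep-copying it for every state index
    task_pkl = Path(pkl_dir) / "_task.pklz"
    with open(task_pkl, "wb") as fp:
        pickle.dump(task, fp)
    return task_pkl

def load_and_run(task_pkl, ind=None):
    # load the pickled task, set the input for the requested state index, run it
    with open(task_pkl, "rb") as fp:
        task = pickle.load(fp)
    if ind is not None:
        task.inputs = task.state_inputs[ind]
    return task.run()

with tempfile.TemporaryDirectory() as tmpdir:
    pkl = save_task(ToyTask("average", state_inputs=[{"x": 1}, {"x": 2}]), tmpdir)
    print(load_and_run(pkl, ind=0))  # average ran with inputs {'x': 1}
    print(load_and_run(pkl, ind=1))  # average ran with inputs {'x': 2}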

Checklist

  • All tests passing
  • I have added tests to cover my changes
  • I have updated documentation (if necessary)
  • My code follows the code style of this project
    (we are using black: you can pip install pre-commit,
    run pre-commit install in the pydra directory
    and black will be run automatically with each commit)

Acknowledgment

  • I acknowledge that this contribution will be available under the Apache 2 license.

…sk has a state, but instead a soft copy of the task and the input is saved in pickle files, so it can be loaded later (in specific workers)
djarecka changed the title from "[wip] moving from deepcopy to pickle files in to_job method" to "moving from deepcopy to pickle files in to_job method" on May 8, 2020
codecov bot commented May 8, 2020

Codecov Report

Merging #246 into master will decrease coverage by 0.87%.
The diff coverage is 87.02%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #246      +/-   ##
==========================================
- Coverage   86.74%   85.86%   -0.88%     
==========================================
  Files          19       19              
  Lines        3077     3156      +79     
  Branches      824      837      +13     
==========================================
+ Hits         2669     2710      +41     
- Misses        256      285      +29     
- Partials      152      161       +9     
Flag         Coverage Δ
#unittests   85.86% <87.02%> (-0.88%) ⬇️

Impacted Files              Coverage Δ
pydra/engine/helpers.py     85.01% <83.09%> (-2.08%) ⬇️
pydra/engine/submitter.py   88.28% <84.61%> (-3.23%) ⬇️
pydra/engine/workers.py     78.60% <92.10%> (-6.14%) ⬇️
pydra/conftest.py           77.77% <100.00%> (-22.23%) ⬇️
pydra/engine/core.py        89.95% <100.00%> (ø)
pydra/utils/profiler.py     38.16% <0.00%> (-5.35%) ⬇️
pydra/__init__.py           93.33% <0.00%> (+33.33%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f042738...d17b593. Read the comment docs.

satra (Contributor) commented May 9, 2020

just some thoughts here:

deepcopy in to_job may be essential to ensure that tasks inside a Workflow are copied instead of referenced, since to_job applies to all Tasks. a shallow copy will not make copies of nested objects. depending on when a copy happens, this may require deepcopy, but i haven't thought through the entire workflow.
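
For illustration only (toy classes, not pydra's): a shallow copy creates a new outer object but keeps references to the nested objects, so mutating a nested node through the copy also changes the original, while deepcopy duplicates the whole object graph:

import copy

class Node:
    def __init__(self, inputs):
        self.inputs = inputs

class Graph:
    def __init__(self, nodes):
        self.nodes = nodes

g = Graph([Node({"x": 1})])
shallow = copy.copy(g)      # new Graph, but the same Node objects inside
deep = copy.deepcopy(g)     # fully independent copies of the nested Nodes

shallow.nodes[0].inputs["x"] = 99
print(g.nodes[0].inputs["x"])     # 99 -- the original changed through the shallow copy
print(deep.nodes[0].inputs["x"])  # 1  -- the deep copy is unaffected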

i don't think getting input ind should happen in the worker task load func. workers should just process a single task that they are given. so architecturally i would like that function not to evaluate inputs corresponding to indices, but just run a specific job.

finally, it will be important to understand how workflow splits are being handled, as all of that runs on the main thread and not in workers. so, given:

wf = Workflow("foo")
wf.add(FunctionTask(...))
wf.split(...)

then wf will make copies in the main thread, and any nested workflow with splits will also make copies in the main thread.

so this will require creating a concept diagram of how a workflow is executed and identifying ways to minimize memory usage on the main thread. perhaps understanding the async workflow execution is more important first than fixing the memory issue. it will then help to think about how best to handle the memory issue.

djarecka (Collaborator, Author) commented May 9, 2020

@satra - you are right that tasks within wf are referenced, and this was the reason why deepcopy had to be there, but now I'm saving the task in the pickle file, so the pickling library handles the copying (as we were discussing on Wednesday).
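
A small illustration of that point: a pickle round trip serializes nested objects by value, so the loaded object is fully independent of the original, much like a deepcopy:

import pickle

inner = {"inputs": [1, 2, 3]}
task = {"name": "n1", "nested": inner}

restored = pickle.loads(pickle.dumps(task))
restored["nested"]["inputs"].append(99)

print(inner["inputs"])               # [1, 2, 3] -- original untouched
print(restored["nested"]["inputs"])  # [1, 2, 3, 99]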

I'm not asking for the input in the worker part anymore, could you point me to the line?

I will think about your comments, but the first thing I have to fix is slurm; I made some changes (not committed), but it doesn't work...

Outdated review comments on pydra/engine/workers.py, pydra/engine/core.py, and pydra/engine/helpers.py were resolved.
djarecka added 3 commits May 31, 2020 11:39
…e task with full input; submitter passes the same pickle file, but different indices for tasks with states (so load_and_run would be able to set proper inputs)
Outdated review comments on pydra/engine/core.py and pydra/engine/helpers.py were resolved.
Comment on lines 518 to 519
task = load_task(task_pkl=task_pkl, ind=ind)
task._run(rerun=rerun, **kwargs)
satra (Contributor):

i would put this in a try/except clause, since things could fail for various reasons, and then update the result file appropriately whether things crashed or not.
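
A hedged sketch of that suggestion (assuming load_task can be imported from pydra.engine.helpers, where the resolved review threads place it; the crash-file name and error payload below are illustrative, not the code that was eventually committed):

import pickle
import traceback
from pathlib import Path

from pydra.engine.helpers import load_task  # helper used in the snippet above (assumed location)

def load_and_run(task_pkl, ind=None, rerun=False, **kwargs):
    task = load_task(task_pkl=task_pkl, ind=ind)
    try:
        task._run(rerun=rerun, **kwargs)
    except Exception:
        # record the failure next to the pickled task so the submitter can see
        # that the job crashed instead of just finding no result
        crash_file = Path(task_pkl).parent / "_error.pklz"  # illustrative name
        with open(crash_file, "wb") as fp:
            pickle.dump(traceback.format_exc(), fp)
        raise
    # per the later discussion, return a pointer to the pickled result
    # rather than the task object itself
    return task.output_dir / "_result.pklz"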

djarecka (Collaborator, Author):

ok

djarecka (Collaborator, Author):

which result file do you have in mind?

satra (Contributor):

the result file created by the task. in case the task crashes, a result file created by the code that traps that exception.

djarecka (Collaborator, Author):

so you want to write an error into _result.pklz? not sure if I follow

djarecka (Collaborator, Author):

i've added a simple crashfile


if not task.result():
    raise Exception("Something went wrong")
return task
satra (Contributor):

shouldn't this return a result pickle file, not task?

djarecka (Collaborator, Author):

it returns the task after running it. it's likely that it doesn't have to return anything

satra (Contributor):

still returning task though, not result. i still think this should return a pointer to a pickled result file. and the result file should have a result object that indicates that the task has crashed.

djarecka (Collaborator, Author):

i was just double-checking this; the returned object is not used anywhere, so I'm removing this for now. it really could be anything here...

djarecka (Collaborator, Author):

actually I could put the file name there, so it could be used when the function is called outside of a worker or the submitter

satra (Contributor):

i think it's ok to return the result file name. however, see the previous comment about improving what load_and_run traps.

An outdated review comment on pydra/engine/helpers.py was resolved.
djarecka (Collaborator, Author) commented Jun 2, 2020

@satra - let me know if I can merge this when tests pass

djarecka merged commit 63d2bd6 into nipype:master on Jun 2, 2020
djarecka deleted the mnt/memory_tojob branch on December 30, 2022