moving from deepcopy to pickle files in to_job method #246
Conversation
…sk has a state, but instead a soft copy of the task and the input is saved in pickle files, so it can be loaded later (in specific workers)
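To make the approach concrete, here is a minimal sketch of what the pickling side might look like. `to_job` is the method named in this PR, but the body, the file name, and the cache-dir layout below are illustrative assumptions, not pydra's actual code:

```python
# Sketch only: instead of deep-copying the task for every job, pickle one
# "soft copy" of the task (with its input) once, and let workers load it later.
import cloudpickle as cp
from pathlib import Path

def to_job(task, cache_dir):
    """Pickle the task once so workers can load it (plus a state index) later."""
    task_pkl = Path(cache_dir) / "_task.pklz"  # hypothetical file name
    task_pkl.write_bytes(cp.dumps(task))
    return task_pkl
```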
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #246      +/-   ##
==========================================
- Coverage   86.74%   85.86%   -0.88%
==========================================
  Files          19       19
  Lines        3077     3156      +79
  Branches      824      837      +13
==========================================
+ Hits         2669     2710      +41
- Misses        256      285      +29
- Partials      152      161       +9
```
Continue to review full report at Codecov.
just some thoughts here:
i don't think getting input … finally it will be important to understand how workflow splits are being handled, as all of that runs on the main thread and not in workers. so

```python
wf = Workflow("foo")
wf.add(FunctionTask(...))
wf.splits(...)
```

then … so this will require creating a concept diagram of how a workflow is executed and identifying ways to minimize memory usage on the main thread. perhaps understanding the async workflow execution is more important first than fixing the memory issue. it will help then to think about how best to handle the memory issue.
@satra - you are right that tasks within wf are referenced, and this was the reason why I'm not asking for the input in the worker part anymore. could you point me to the line? I will think about your comments, but the first thing I have to fix is the
…t to decrease the time)
…copy): modifying create_pyscript to use load_and_run for task with a state; changes in _prepare_runscript and run_el to deal with a tuple instead of a runnable
…e task with full input; submitter passes the same pickle file, but different indices for tasks with states (so load_and_run would be able to set proper inputs)
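A hedged sketch of the loading side these commits describe: `load_task(task_pkl, ind)` appears in the diff below, but the body here (cloudpickle, the `get_input_el` helper, clearing the state) is an illustrative assumption, not pydra's implementation:

```python
import cloudpickle as cp
from pathlib import Path

def load_task(task_pkl, ind=None):
    """Load a pickled task; if ind is given, fix the input for that state index."""
    task = cp.loads(Path(task_pkl).read_bytes())
    if ind is not None:
        # hypothetical helper: pick the input values for this state index, then
        # overwrite the inputs so the task runs as a single stateless job
        inputs = task.get_input_el(ind)  # assumed to return a dict of field values
        for name, value in inputs.items():
            setattr(task.inputs, name, value)
        task.state = None
    return task
```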
pydra/engine/helpers.py
Outdated
```python
task = load_task(task_pkl=task_pkl, ind=ind)
task._run(rerun=rerun, **kwargs)
```
i would put this in a try/except clause, since things could fail for various reasons, and then update the result file appropriately whether things crashed or not.
ok
which result file do you have in mind?
the result file created by the task. in case the task crashes, a result file created by the code that traps that exception.
so you want to write an error into `_result.pklz`? not sure if I follow
i've added a simple crashfile
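A minimal sketch of the try/except suggested in this thread, assuming the crashfile is just a pickle of the traceback written next to where the result would live; the file name and the `output_dir` attribute are assumptions, not the actual change in this PR:

```python
import traceback
from pathlib import Path
import cloudpickle as cp

def load_and_run(task_pkl, ind=None, rerun=False, **kwargs):
    task = load_task(task_pkl=task_pkl, ind=ind)  # load_task as sketched earlier
    try:
        task._run(rerun=rerun, **kwargs)
    except Exception:
        # trap the crash and record it, so callers can tell a crash apart
        # from a silently missing result file
        crashfile = Path(task.output_dir) / "_error.pklz"  # hypothetical name
        crashfile.write_bytes(cp.dumps(traceback.format_exc()))
        raise
    return task
```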
pydra/engine/helpers.py
Outdated
```python
if not task.result():
    raise Exception("Something went wrong")
return task
```
shouldn't this return a result pickle file, not task?
it returns the task after running it. it likely doesn't have to return anything
still returning task though, not result. i still think this should return a pointer to a pickled result file. and the result file should have a result object that indicates that the task has crashed.
i was just double checking this; the returned object is not used anywhere, so I'm removing this for now. it really could be anything here...
actually I could put the file name there, so it could be used when the function is called outside a worker or submitter
i think it's ok to return the result file name. however, see the previous comment about improving what load_and_run traps.
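Putting the return-value discussion together, the tail of `load_and_run` might then look like the sketch below; `_result.pklz` is the file name mentioned in this thread, while `output_dir` is again an assumed attribute:

```python
from pathlib import Path

def load_and_run(task_pkl, ind=None, rerun=False, **kwargs):
    task = load_task(task_pkl=task_pkl, ind=ind)  # as sketched earlier
    task._run(rerun=rerun, **kwargs)
    # return a pointer to the pickled result rather than the task object,
    # useful when this function is called outside a worker or submitter
    return Path(task.output_dir) / "_result.pklz"
```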
@satra - let me know if I can merge this when tests pass
Types of changes

Summary

this is a part of an effort to decrease memory load (#226)

`to_job` creates pickle files for a task (without input) and the input (right now the input is saved together with the original task) instead of creating deep copies. The files are loaded and the specific input is set before running the task/wf.

Checklist

(we are using `black`: you can `pip install pre-commit`, run `pre-commit install` in the `pydra` directory, and `black` will be run automatically with each commit)

Acknowledgment