Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import plugin: append logs for bulk import rather than overwriting #223

Merged
merged 2 commits into from
Jun 29, 2020

Conversation

sbesson
Copy link
Member

@sbesson sbesson commented Jun 23, 2020

Fixes #214

Rather than unconditionally open log files in w mode, this allows to pass the opening mode via a key/value parameter.
In the case of a bulk import, the first iteration will write logs in w mode but all subsequent iterations should append to the log files.

@@ -583,7 +583,11 @@ def bulk_import(self, command_args, xargs):
# FIXME: this assumes 'omero'
print(sys.argv[0], "import", rv, file=o)
else:
self.do_import(command_args, xargs)
if incr == 0:
mode = "w"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overwriting on the first of multiple imports and subsequently appending is certainly an improvement.
I wonder a bit if one ever wants to overwrite, though that may be part of a larger refactoring. Two other options would be: (1) failing if an overwrite would occur or (2) auto-incrementing the file name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like (2) as an alternate behavior. It is inline with the classical behavior of distributed tooling like GNU parallel. One implementation would be to support regexp arguments like bin/omero import --bulk bulk --file log_%d.log --err log_%d.err, try to substitute the increment number and fallback to writing to single files in appending mode. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about merge this as it's definitely an improvement, then think about whether it's worth the additional complexity? We've had bulk import for ages but the issue was only discovered last month, so either no-one uses it which suggests it's not worth adding more complexity, or no-one's noticed it in which case this fix is fine.

Copy link
Member

@joshmoore joshmoore Jun 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSTM. (I think I often opted to used parallel rather than bulk which perhaps is what left bulk in an incomplete state.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I never used that, just redirect out and err to a "log" file, assuming everything important is printed out on the command line.

src/omero/plugins/import.py Outdated Show resolved Hide resolved
Co-authored-by: Simon Li <orpheus+devel@gmail.com>
Copy link
Member

@manics manics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, LGTM

@sbesson sbesson merged commit 59cf539 into ome:master Jun 29, 2020
@sbesson sbesson deleted the bulk_file_errs branch June 29, 2020 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

import --log/--err logs are overwritten with bulk import
4 participants