-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Selection of format and compression in save data widget #3147
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3147 +/- ##
=========================================
Coverage ? 82.73%
=========================================
Files ? 344
Lines ? 59374
Branches ? 0
=========================================
Hits ? 49125
Misses ? 10249
Partials ? 0 |
Orange/widgets/data/owsave.py
Outdated
else: | ||
self.error() | ||
|
||
|
||
def get_writer_selected_for_testing(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this method doing here?
Duplicate of get_writer_selected
?
Orange/widgets/data/owsave.py
Outdated
|
||
@classmethod | ||
def get_writers(cls, sparse): | ||
return [f for f in FileFormat.formats | ||
if getattr(f, 'write_file', None) and getattr(f, "EXTENSIONS", None) | ||
and (not sparse or getattr(f, 'SUPPORT_SPARSE_DATA', False))] | ||
|
||
def get_writer_selected(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You now use get_writer_selected
instead of get_writers
which makes the above method dead code (I see only one use in a test).
Also, note that the previous method checked for compatibility with sparse data, which I can not find anywhere in the new code. The options in the combo box should probably be refreshed when the data input changes and only show compatible options.
Orange/widgets/data/owsave.py
Outdated
} | ||
COMPRESSIONS_NAMES = ["gzip (.gz)", "bbzip2 (.bz2)", "Izma (.xz)"] | ||
COMPRESSIONS = { | ||
"gzip (.gz)": ".gz", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have two sets of constants (COMPRESSIONS_NAMES and COMPRESSIONS) with duplicated values that need to match.
Why not just have a single list of tuples, COMPRESSIONS, and use everything from there?
You should also avoid hard coding the compressions here, when they need to match something in another part of the library. You should just map names to be shown in the widget to existing compression extensions which you get by from Orange.data.io import Compression
.
E.g. something like:
COMPRESSIONS = [
"gzip ({})".format(Compression.GZIP): Compression.GZIP,
...
]
Orange/widgets/data/owsave.py
Outdated
"Comma-seperated values (.csv)": ".csv", | ||
"Pickle (.pkl)": ".pkl", | ||
} | ||
COMPRESSIONS_NAMES = ["gzip (.gz)", "bbzip2 (.bz2)", "Izma (.xz)"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bbzip2 -> bzip2
Izma -> lzma
the last one could also be uppercase (LZMA), since it is an acronym, but for consistency lower case is also fine. The first letter however is L not I (visible difference in some fonts and not others)!
Orange/widgets/data/owsave.py
Outdated
os.path.join(self.last_dir or os.path.expanduser("~"), | ||
getattr(self.data, 'name', '')) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too many blank lines
Orange/widgets/data/owsave.py
Outdated
type = FILE_TYPES[self.filetype] | ||
compression = COMPRESSIONS[self.compression] if self.compress else '' | ||
writer = FileFormat.get_reader(type) | ||
writer.EXTENSIONS = [writer.EXTENSIONS[writer.EXTENSIONS.index(type + compression)]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are looking up an index of something in the list and then indexing that list with the found index... you will probably be getting that "something" back most of the time ;)
Writing this directly writer.EXTENSIONS = [type + compression]
looks a bit dangerous since you are overwriting the list, but it should always be one of the available options anyway. Probably it would not be bad to have an assert before this to make sure that the chosen extension was really supported (more for the benefit of someone reading the code).
Orange/widgets/data/owsave.py
Outdated
|
||
gui.comboBox( | ||
box, self, "compression", | ||
callback=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of callback=None
for both of these combo boxes, you could call a function that fixes self.filename.
E.g. I save a tab file with gzip compression to test.tab.gz
, then when changing to csv, the stored filename could be updated to test.csv.gz
and similarly for changes to compression.
Currently, after changing the type and clicking Save again, the file is stored with the previous type since that is what the stored filename reflects...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there, but I still see 2 things that could be improved.
Orange/widgets/data/owsave.py
Outdated
writer = FileFormat.get_reader(self.type_ext) | ||
try: | ||
writer.EXTENSIONS = [ | ||
writer.EXTENSIONS[writer.EXTENSIONS.index(self.type_ext + self.compress_ext)]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still have the problem mentioned in #3147 (comment)
Instead of the try/except, you should just check
ext = self.type_ext + self.compress_ext
if ext not in writer.EXTENSIONS:
self.Error.not_supported_extension()
return None
writer.EXTENSIONS = [ext]
return writer
BTW, just a minor suggestion, but a better name for not_supported_extension
would be unsupported_extension
.
Orange/widgets/data/owsave.py
Outdated
FILE_TYPES = [ | ||
("Tab-delimited (.tab)", ".tab", False), | ||
("Comma-seperated values (.csv)", ".csv", False), | ||
("Pickle (.pkl)", ".pkl", True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, these are hard coded constants, which could become incorrect after changes elsewhere.
How did you know where to write False and where True? If you checked the attribute SUPPORT_SPARSE_DATA in reader classes it is better to import and reference that here directly...
Maybe even construct the whole triple directly from the class:
FILE_TYPES = [
("{} ({})".format(w.DESCRIPTION, w.EXTENSIONS[0]),
w.EXTENSIONS[0],
w.SUPPORT_SPARSE_DATA)
for w in (TabReader, CSVReader, PickleReader)
]
Please test the above code and fix if necessary. It was written here as a suggestion without trying it out.
Also, I think the description for PickledReader could be changed to "Pickled Orange data".
Issue
Resolves issue #3136
Description of changes
Updated widget according to above issue.
Does not include changes for biolab/orange3-network#80.
Includes