[ENH] Selection of format and compression in save data widget #3147

golbog · 2018-07-23T12:26:41Z

Issue

Resolves issue #3136

Description of changes

Updated the widget according to the issue above.
Does not include changes for biolab/orange3-network#80.

Includes

Code changes
Tests
Documentation

CLAassistant · 2018-07-23T12:26:48Z

All committers have signed the CLA.

codecov-io · 2018-07-26T09:17:22Z

Codecov Report

❗ No coverage uploaded for pull request base (master@dc51c3d).
The diff coverage is 91.48%.

@@            Coverage Diff            @@
##             master    #3147   +/-   ##
=========================================
  Coverage          ?   82.73%           
=========================================
  Files             ?      344           
  Lines             ?    59374           
  Branches          ?        0           
=========================================
  Hits              ?    49125           
  Misses            ?    10249           
  Partials          ?        0

lanzagar · 2018-08-01T09:18:46Z

Orange/widgets/data/owsave.py

            else:
                self.error()


+    def get_writer_selected_for_testing(self):


What is this method doing here?
Duplicate of get_writer_selected?

lanzagar · 2018-08-01T09:35:28Z

Orange/widgets/data/owsave.py


    @classmethod
    def get_writers(cls, sparse):
        return [f for f in FileFormat.formats
                if getattr(f, 'write_file', None) and getattr(f, "EXTENSIONS", None)
                and (not sparse or getattr(f, 'SUPPORT_SPARSE_DATA', False))]

+    def get_writer_selected(self):


You now use get_writer_selected instead of get_writers, which makes the above method dead code (I see only one use, in a test).
Also, note that the previous method checked for compatibility with sparse data, which I cannot find anywhere in the new code. The options in the combo box should probably be refreshed when the data input changes, so that only compatible options are shown.
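The sparse-compatibility filter mentioned above could be sketched roughly as follows. The writer classes and the name compatible_writers are hypothetical stand-ins so that the snippet runs on its own; only the attribute names (write_file, EXTENSIONS, SUPPORT_SPARSE_DATA) are taken from the quoted get_writers code.

```python
# Hypothetical stand-ins for format classes; not Orange's real classes.
class TabWriter:
    EXTENSIONS = (".tab",)
    SUPPORT_SPARSE_DATA = False

    @staticmethod
    def write_file(filename, data):
        raise NotImplementedError


class PickleWriter:
    EXTENSIONS = (".pkl",)
    SUPPORT_SPARSE_DATA = True

    @staticmethod
    def write_file(filename, data):
        raise NotImplementedError


class ReadOnlyFormat:
    # No write_file attribute, so it must never be offered for saving.
    EXTENSIONS = (".xyz",)


def compatible_writers(formats, sparse):
    """Keep formats that can write and, for sparse data, support sparse."""
    return [f for f in formats
            if getattr(f, "write_file", None)
            and getattr(f, "EXTENSIONS", None)
            and (not sparse or getattr(f, "SUPPORT_SPARSE_DATA", False))]


FORMATS = [TabWriter, PickleWriter, ReadOnlyFormat]
dense_options = compatible_writers(FORMATS, sparse=False)
sparse_options = compatible_writers(FORMATS, sparse=True)
```

In the widget, a function like this would presumably be called whenever the input data changes, and its result used to repopulate the combo box.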

lanzagar · 2018-08-01T09:48:19Z

Orange/widgets/data/owsave.py

+}
+COMPRESSIONS_NAMES = ["gzip (.gz)", "bbzip2 (.bz2)", "Izma (.xz)"]
+COMPRESSIONS = {
+    "gzip (.gz)": ".gz",


You have two sets of constants (COMPRESSIONS_NAMES and COMPRESSIONS) with duplicated values that need to match.
Why not just have a single list of tuples, COMPRESSIONS, and use everything from there?

You should also avoid hard-coding the compressions here when they need to match something in another part of the library. Instead, map the names shown in the widget to the existing compression extensions, which you get with from Orange.data.io import Compression.
E.g. something like:

COMPRESSIONS = [("gzip ({})".format(Compression.GZIP), Compression.GZIP), ...]
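As a minimal sketch of the single-list idea, the snippet below builds the labels from the extensions so the two cannot drift apart. The Compression class here is a local stand-in so the example runs without Orange; the attribute names BZIP2 and XZ and all three values are assumptions (only GZIP appears in the comment above), and extension_for is a hypothetical helper.

```python
class Compression:
    """Local stand-in for Orange.data.io.Compression (names/values assumed)."""
    GZIP = ".gz"
    BZIP2 = ".bz2"
    XZ = ".xz"


# One list of (label, extension) pairs; labels are derived from the
# extensions instead of being typed twice.
COMPRESSIONS = [
    ("gzip ({})".format(Compression.GZIP), Compression.GZIP),
    ("bzip2 ({})".format(Compression.BZIP2), Compression.BZIP2),
    ("lzma ({})".format(Compression.XZ), Compression.XZ),
]

# What a combo box would display, in order.
labels = [label for label, _ in COMPRESSIONS]


def extension_for(index):
    """Map a combo box index back to the compression extension."""
    return COMPRESSIONS[index][1]
```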

lanzagar · 2018-08-01T09:51:53Z

Orange/widgets/data/owsave.py

+    "Comma-seperated values (.csv)": ".csv",
+    "Pickle (.pkl)": ".pkl",
+}
+COMPRESSIONS_NAMES = ["gzip (.gz)", "bbzip2 (.bz2)", "Izma (.xz)"]


bbzip2 -> bzip2
Izma -> lzma
The last one could also be uppercase (LZMA), since it is an acronym, but lowercase is fine for consistency. The first letter, however, is L, not I (the difference is visible in some fonts and not in others)!

lanzagar · 2018-08-01T10:00:15Z

Orange/widgets/data/owsave.py

+                    os.path.join(self.last_dir or os.path.expanduser("~"),
+                                 getattr(self.data, 'name', ''))
+
+


Too many blank lines.

lanzagar · 2018-08-01T14:33:14Z

Orange/widgets/data/owsave.py

+        type = FILE_TYPES[self.filetype]
+        compression = COMPRESSIONS[self.compression] if self.compress else ''
+        writer = FileFormat.get_reader(type)
+        writer.EXTENSIONS = [writer.EXTENSIONS[writer.EXTENSIONS.index(type + compression)]]


You are looking up the index of something in the list and then indexing that list with the found index... you will probably be getting that "something" back most of the time ;)
Writing this directly as writer.EXTENSIONS = [type + compression] looks a bit dangerous, since you are overwriting the list, but the value should always be one of the available options anyway. It would probably be good to have an assert before this to make sure that the chosen extension is really supported (more for the benefit of someone reading the code).
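To illustrate the point, here is a small self-contained sketch showing that the index round trip returns the same value, together with the assert-based variant. The list of extensions and the name choose_extension are made up for the example.

```python
def choose_extension(supported, type_ext, compression_ext=""):
    """Build the full extension and check that it is a supported one."""
    ext = type_ext + compression_ext
    # The assert documents the expectation for readers of the code.
    assert ext in supported, "unsupported extension: {}".format(ext)
    return ext


# Illustrative values only; not read from any real writer class.
supported = [".tab", ".tab.gz", ".tab.bz2", ".tab.xz"]
wanted = ".tab" + ".gz"

# Looking up the index and indexing with it gives back the same string.
roundtrip = supported[supported.index(wanted)]
```

Note that list.index raises ValueError for a missing item, which is presumably why the original lookup appeared to act as a validity check.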

lanzagar · 2018-08-01T14:37:40Z

Orange/widgets/data/owsave.py

+
+        gui.comboBox(
+            box, self, "compression",
+            callback=None,


Instead of callback=None for both of these combo boxes, you could call a function that fixes self.filename.
For example, if I save a tab file with gzip compression as test.tab.gz and then switch to csv, the stored filename could be updated to test.csv.gz, and similarly for changes to compression.
Currently, after changing the type and clicking Save again, the file is stored with the previous type, since that is what the stored filename reflects...
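A callback of that kind might look roughly like the sketch below. The function name adjust_filename and the two tuples of known suffixes are hypothetical; in the widget the suffixes would come from the format and compression definitions rather than being listed by hand.

```python
import os

# Illustrative suffix lists; assumptions for this sketch only.
KNOWN_TYPES = (".tab", ".csv", ".pkl")
KNOWN_COMPRESSIONS = (".gz", ".bz2", ".xz")


def adjust_filename(filename, type_ext, compression_ext=""):
    """Replace the type/compression suffixes of filename with new ones."""
    if not filename:
        return filename
    base = filename
    root, ext = os.path.splitext(base)
    # Strip a trailing compression suffix first, if present.
    if ext in KNOWN_COMPRESSIONS:
        base = root
        root, ext = os.path.splitext(base)
    # Then strip a known file type suffix, if present.
    if ext in KNOWN_TYPES:
        base = root
    return base + type_ext + compression_ext
```

With this, switching test.tab.gz to csv yields test.csv.gz, and turning compression off yields test.tab.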

lanzagar

Almost there, but I still see 2 things that could be improved.

lanzagar · 2018-08-10T14:24:08Z

Orange/widgets/data/owsave.py

+        writer = FileFormat.get_reader(self.type_ext)
+        try:
+            writer.EXTENSIONS = [
+                writer.EXTENSIONS[writer.EXTENSIONS.index(self.type_ext + self.compress_ext)]]


You still have the problem mentioned in my earlier comment on #3147 (looking up an index and then indexing the same list with it).

Instead of the try/except, you should just check

    ext = self.type_ext + self.compress_ext
    if ext not in writer.EXTENSIONS:
        self.Error.not_supported_extension()
        return None
    writer.EXTENSIONS = [ext]
    return writer

BTW, just a minor suggestion, but a better name for not_supported_extension would be unsupported_extension.

lanzagar · 2018-08-10T14:41:27Z

Orange/widgets/data/owsave.py

+FILE_TYPES = [
+    ("Tab-delimited (.tab)", ".tab", False),
+    ("Comma-seperated values (.csv)", ".csv", False),
+    ("Pickle (.pkl)", ".pkl", True),


Again, these are hard coded constants, which could become incorrect after changes elsewhere.
How did you know where to write False and where True? If you checked the attribute SUPPORT_SPARSE_DATA in reader classes it is better to import and reference that here directly...
Maybe even construct the whole triple directly from the class:

FILE_TYPES = [
    ("{} ({})".format(w.DESCRIPTION, w.EXTENSIONS[0]),
     w.EXTENSIONS[0],
     w.SUPPORT_SPARSE_DATA)
    for w in (TabReader, CSVReader, PickleReader)
]

Please test the above code and fix if necessary. It was written here as a suggestion without trying it out.

Also, I think the description for PickleReader could be changed to "Pickled Orange data".
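Since the suggestion above was written without being run, here is a self-contained sketch of it. The three reader classes are local stand-ins, and their DESCRIPTION, EXTENSIONS and SUPPORT_SPARSE_DATA values are assumptions for illustration; the real values should be read from the classes in Orange.data.io.

```python
# Local stand-ins; attribute values are assumed, not copied from Orange.
class TabReader:
    DESCRIPTION = "Tab-separated values"
    EXTENSIONS = (".tab", ".tsv")
    SUPPORT_SPARSE_DATA = False


class CSVReader:
    DESCRIPTION = "Comma-separated values"
    EXTENSIONS = (".csv",)
    SUPPORT_SPARSE_DATA = False


class PickleReader:
    DESCRIPTION = "Pickled Orange data"
    EXTENSIONS = (".pkl", ".pickle")
    SUPPORT_SPARSE_DATA = True


# Each entry: (label shown in the widget, extension, supports sparse data).
FILE_TYPES = [
    ("{} ({})".format(w.DESCRIPTION, w.EXTENSIONS[0]),
     w.EXTENSIONS[0],
     w.SUPPORT_SPARSE_DATA)
    for w in (TabReader, CSVReader, PickleReader)
]
```

Building the triples this way means a change to a reader's description or sparse support shows up in the widget without editing a second table.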

Updated according to issue biolab#3136

3a43361

golbog added 2 commits July 25, 2018 13:55

Adapted tests according to GUI changes

b2ed4af

reformating for pylint

cecb88a

lanzagar requested changes Aug 1, 2018

lanzagar added this to the 3.16 milestone Aug 3, 2018

golbog added 2 commits August 6, 2018 13:18

Fixes according to the review

2e4c184

Pylint fixes

6902f7b

lanzagar requested changes Aug 10, 2018

golbog added 2 commits August 24, 2018 10:19

Changed ext check and FILE_TYPES not hard coded anymore

35fd10a

Minor fixes

8687b30

lanzagar approved these changes Aug 24, 2018

lanzagar merged commit 565c22c into biolab:master Aug 24, 2018

This was referenced Aug 29, 2018

Save Data enhancement #3136

Closed

Redesign of Save widget (Orange) biolab/orange3-single-cell#242

Closed

markotoplak mentioned this pull request Sep 17, 2018

Save Data crashes on no data #3256

Closed
