Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Selection of format and compression in save data widget #3147

Merged
merged 7 commits into from
Aug 24, 2018

Conversation

golbog
Copy link
Contributor

@golbog golbog commented Jul 23, 2018

Issue

Resolves issue #3136

Description of changes

Updated widget according to above issue.
Does not include changes for biolab/orange3-network#80.

Includes
  • Code changes
  • Tests
  • Documentation

@CLAassistant
Copy link

CLAassistant commented Jul 23, 2018

CLA assistant check
All committers have signed the CLA.

@codecov-io
Copy link

codecov-io commented Jul 26, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@dc51c3d). Click here to learn what that means.
The diff coverage is 91.48%.

@@            Coverage Diff            @@
##             master    #3147   +/-   ##
=========================================
  Coverage          ?   82.73%           
=========================================
  Files             ?      344           
  Lines             ?    59374           
  Branches          ?        0           
=========================================
  Hits              ?    49125           
  Misses            ?    10249           
  Partials          ?        0

else:
self.error()


def get_writer_selected_for_testing(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this method doing here?
Duplicate of get_writer_selected?


@classmethod
def get_writers(cls, sparse):
return [f for f in FileFormat.formats
if getattr(f, 'write_file', None) and getattr(f, "EXTENSIONS", None)
and (not sparse or getattr(f, 'SUPPORT_SPARSE_DATA', False))]

def get_writer_selected(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You now use get_writer_selected instead of get_writers which makes the above method dead code (I see only one use in a test).
Also, note that the previous method checked for compatibility with sparse data, which I can not find anywhere in the new code. The options in the combo box should probably be refreshed when the data input changes and only show compatible options.

}
COMPRESSIONS_NAMES = ["gzip (.gz)", "bbzip2 (.bz2)", "Izma (.xz)"]
COMPRESSIONS = {
"gzip (.gz)": ".gz",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have two sets of constants (COMPRESSIONS_NAMES and COMPRESSIONS) with duplicated values that need to match.
Why not just have a single list of tuples, COMPRESSIONS, and use everything from there?

You should also avoid hard coding the compressions here, when they need to match something in another part of the library. You should just map names to be shown in the widget to existing compression extensions which you get by from Orange.data.io import Compression.
E.g. something like:

COMPRESSIONS = [
    "gzip ({})".format(Compression.GZIP): Compression.GZIP,
    ...
]

"Comma-seperated values (.csv)": ".csv",
"Pickle (.pkl)": ".pkl",
}
COMPRESSIONS_NAMES = ["gzip (.gz)", "bbzip2 (.bz2)", "Izma (.xz)"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bbzip2 -> bzip2
Izma -> lzma
the last one could also be uppercase (LZMA), since it is an acronym, but for consistency lower case is also fine. The first letter however is L not I (visible difference in some fonts and not others)!

os.path.join(self.last_dir or os.path.expanduser("~"),
getattr(self.data, 'name', ''))


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too many blank lines

type = FILE_TYPES[self.filetype]
compression = COMPRESSIONS[self.compression] if self.compress else ''
writer = FileFormat.get_reader(type)
writer.EXTENSIONS = [writer.EXTENSIONS[writer.EXTENSIONS.index(type + compression)]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are looking up an index of something in the list and then indexing that list with the found index... you will probably be getting that "something" back most of the time ;)
Writing this directly writer.EXTENSIONS = [type + compression] looks a bit dangerous since you are overwriting the list, but it should always be one of the available options anyway. Probably it would not be bad to have an assert before this to make sure that the chosen extension was really supported (more for the benefit of someone reading the code).


gui.comboBox(
box, self, "compression",
callback=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of callback=None for both of these combo boxes, you could call a function that fixes self.filename.
E.g. I save a tab file with gzip compression to test.tab.gz, then when changing to csv, the stored filename could be updated to test.csv.gz and similarly for changes to compression.
Currently, after changing the type and clicking Save again, the file is stored with the previous type since that is what the stored filename reflects...

@lanzagar lanzagar added this to the 3.16 milestone Aug 3, 2018
Copy link
Contributor

@lanzagar lanzagar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there, but I still see 2 things that could be improved.

writer = FileFormat.get_reader(self.type_ext)
try:
writer.EXTENSIONS = [
writer.EXTENSIONS[writer.EXTENSIONS.index(self.type_ext + self.compress_ext)]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still have the problem mentioned in #3147 (comment)

Instead of the try/except, you should just check

ext = self.type_ext + self.compress_ext
if ext not in writer.EXTENSIONS:
    self.Error.not_supported_extension()
    return None
writer.EXTENSIONS = [ext]
return writer

BTW, just a minor suggestion, but a better name for not_supported_extension would be unsupported_extension.

FILE_TYPES = [
("Tab-delimited (.tab)", ".tab", False),
("Comma-seperated values (.csv)", ".csv", False),
("Pickle (.pkl)", ".pkl", True),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, these are hard coded constants, which could become incorrect after changes elsewhere.
How did you know where to write False and where True? If you checked the attribute SUPPORT_SPARSE_DATA in reader classes it is better to import and reference that here directly...
Maybe even construct the whole triple directly from the class:

FILE_TYPES = [
    ("{} ({})".format(w.DESCRIPTION, w.EXTENSIONS[0]),
     w.EXTENSIONS[0],
     w.SUPPORT_SPARSE_DATA)
    for w in (TabReader, CSVReader, PickleReader)
]

Please test the above code and fix if necessary. It was written here as a suggestion without trying it out.

Also, I think the description for PickledReader could be changed to "Pickled Orange data".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants