Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended export plugin support #3400

Merged
merged 29 commits into from
Oct 16, 2019
Merged

Extended export plugin support #3400

merged 29 commits into from
Oct 16, 2019

Conversation

austinmm
Copy link
Contributor

The original export plugin only exported as JSON so I added the ability to export in CSV or XML format. To do this I created a new option -f or --format which specifies the export format, the default is still JSON. For example...
$ beet export -f csv -i "title,album" ACDC
I also updated the plugin documentation to reflect the changes I've made to the plugin. Lastly, I also created a unit test for the export plugin; since one hadn't been created yet.

@sampsyo
Copy link
Member

sampsyo commented Oct 12, 2019

Cool; thanks for getting this started!

Here's one question for context: while I can totally see the utility of a CSV export, I'm struggling to think of a use case for an XML export. Can you elaborate a little more on what you want to use CSV and XML data for? I think that would help calibrate our approach to these features.

@@ -40,6 +42,9 @@ class ExportPlugin(BeetsPlugin):

def __init__(self):
super(ExportPlugin, self).__init__()
# Used when testing export plugin
self.run_results = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm; I don't think I see where this is actually used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to use that class field as a way of checking the output during the testing of the export plugin but I ended up going another route so I will go ahead and remove it.

@@ -40,6 +42,9 @@ class ExportPlugin(BeetsPlugin):

def __init__(self):
super(ExportPlugin, self).__init__()
# Used when testing export plugin
self.run_results = None
self.export_format = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't quite see why this is a class field instead of just being a local variable in the run method. Can we please revert to the old way?

Copy link
Contributor Author

@austinmm austinmm Oct 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will revert it.

'indent': 4,
'separators': ('>'),
'sort_keys': True
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of these config options are actually used in the CSV or XML emitters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just trying to follow the same format, but you are right they are not being used. I have updated them to now be format specific to json, csv and xml and updated the csv and xml outputs to use them...

'json': {
                # json module formatting options
                'formatting': {
                    'ensure_ascii': False,
                    'indent': 4,
                    'separators': (',', ': '),
                    'sort_keys': True
                }
            },
            'csv': {
                # csv module formatting options
                'formatting': {
                    'delimiter': ',', # column seperator
                    'dialect': 'excel', # the name of the dialect to use
                    'quotechar':'|'
                }
            },
            'xml': {
                # xml module formatting options
                'formatting': {
                    'encoding': 'UTF-8', # the output encoding
                    'xml_declaration':'True', # controls if an XML declaration should be added to the file
                    'default_namespace': 'None', # sets the default XML namespace (for “xmlns”)
                    'method': 'xml', # either "xml", "html" or "text" (default is "xml")
                    'short_empty_elements': 'True' # controls the formatting of elements that contain no content.
                }

@@ -78,17 +101,21 @@ def commands(self):
u'-o', u'--output',
help=u'path for the output file. If not given, will print the data'
)
cmd.parser.add_option(
u'-f', u'--format', default='json',
help=u'specify the format of the exported data. Your options are json (deafult), csv, and xml'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be shorter: perhaps, "the output format: json (default), csv, or xml".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

file_mode = 'a' if opts.append else 'w'
file_format = opts.format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please preserve the effect of default_format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you prefer I get rid of the default=json in...

cmd.parser.add_option(
            u'-f', u'--format', default='json',
            help=u"the output format: json (default), csv, or xml"
        )

and instead have something like...

file_format = opts.format if opts.format else self.config['default_format'].get(str)

"""Saves in a json file"""
def __init__(self, file_path, file_mode=u'w', encoding=u'utf-8'):
super(JsonFormat, self).__init__(file_path, file_mode, encoding)
self.export = self.export_to_file if self.path else self.export_to_terminal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This technique where a method is defined conditionally is somewhat surprising. I now see how it works after some puzzling about it, but it could use an explanatory comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll add some comments explaining its purpose


def export_to_file(self, data, **kwargs):
with codecs.open(self.path, self.mode, self.encoding) as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be really useful to factor out the file-opening part of the logic so it doesn't need to be re-implemented for every export format. In general, the output format and output destination (file or stdout) seem orthogonal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll explore that idea. Sounds like a better way


def export(self, data, **kwargs):
json.dump(data, sys.stdout, cls=ExportEncoder, **kwargs)
if data and type(data) is list and len(data) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by the list type check here—when will the passed object not be a list? And what happens when it's not—so self.header remains empty?

Also, this condition could be simplified to:

if data and isinstance(data, list):

because data is only "true" if it is non-empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just trying to be extra cautious but I completely agree that it is a little overkill, this should work just fine instead; if data and len(data) > 0:


$ beet export -f csv beatles
$ beet export -f json beatles
$ beet export -f xml beatles
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do without the full command-line examples here. Can we just list the options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

ensure_ascii: False
indent: 4
separators: ['>']
sort_keys: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I don't think these options are actually used for the new formats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated them to work correctly now with the added formats CSV and XML

@austinmm
Copy link
Contributor Author

I am still working on all the changes but almost done. Thank you for all the feedback as it was very helpful and constructive. Also to address your question about the XML format. The reason I added the ability to export in the XML format was that this is the preferred export format used by Apple in Apple Music and simply figured that if Apple does it then there must exist a good reason. For CSV exporting, I imagined that some people, including myself, might want to use Excel tools to better analyze their music library.

@austinmm
Copy link
Contributor Author

I have just pushed the changes that were discussed above. Please let me know if you recommend any other changes. :)

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good so far!

Could you please look into the CI results above? We use a style checker to enforce PEP-8 conventions.

One other thing that the style checker won't catch: can you please try to use complete sentences in comments, complete with a capital letter and a period? It would be nice to be consistent with the way (most) other comments in beets are written.

'formatting': {
'delimiter': ',', # column seperator
'dialect': 'excel', # the name of the dialect to use
'quotechar': '|'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you probably want quotechar to default to "? In fact, it doesn't seem like the most useful thing to want to override… maybe we can get away without this option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll remove it

# csv module formatting options
'formatting': {
'delimiter': ',', # column seperator
'dialect': 'excel', # the name of the dialect to use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option seems useful enough! But maybe it's worth linking (in the documentation) to the place in the Python docs where dialects are described?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes I forgot to do that, thank you for reminding me


def export(self, data, **kwargs):
if data and len(data) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if data and len(data) > 0:
if data:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

ensure_ascii: False
indent: 4
separators: ['>']
sort_keys: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these examples are now out of sync with the implementation. It would also be good to describe the options in English above.

…d up code to better follow PEP-8 conventions and just work more efficiently all around.
@austinmm
Copy link
Contributor Author

I have made the changes you suggested alongside some other just general code cleanup. I am not sure where I might be disobeying the PEP-8 conventions so I apologize if I am anywhere. I am still new to CIs and so would love any recommendations you might have so I can pass the two CI test.

@sampsyo
Copy link
Member

sampsyo commented Oct 13, 2019

For the CI results, take a look at GitHub's "checks" interface on this page. It has "Details" links to the logs from Travis and CircleCI. For example, here's the log from the style checker:
https://travis-ci.org/beetbox/beets/jobs/597112966

You can also consider installing flake8 and running it locally.

Also, about the docs: while the text seems good, could you please move these from inline comments in the example YAML to the prose text above? There's already a section there with a bulleted list describing the options, before the page gets to the YAML example. That would be a nice place to add similar descriptions of the new options.

@austinmm
Copy link
Contributor Author

Both the test_export.py and export.py pass the flake8 test so I'm not sure why there is still an issue with the CLI. How do I run the test_export.py on my computer because if I just run python test/test_export.py it doesn't compile correctly and neither do any of the other python tests files?

@sampsyo
Copy link
Member

sampsyo commented Oct 14, 2019

We have a wiki page about testing that might help! You might want to try using Tox or check out the section about test dependencies.

You can also see the results of the tests directly on Travis:
https://travis-ci.org/beetbox/beets/jobs/597334455

@austinmm
Copy link
Contributor Author

I was able to get all the tests to pass :). Please let me know if you want me to update anything else, but I believe it is production-ready.

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall! I left a few more comments inline.

I also took a look at the exported XML format. It seems to look like this:

<library>
  <key>tracks</key>
  <dict>
    <key>0</key>
    <dict>
      <Track ID>0</Track ID>
      <artist>...</artist>
      ...
    </dict>
    <key>1</key>
    ...
  </dict>
</library>

But I guess I would have expected there to be a tag called <tracks> that contained individual <track> elements with the attributes of each. That would be more XML-like—I'm not sure I see the utility of the extra level of "dict" indirection here.

Also, "Track ID" is not a valid XML tag name—they can't contain spaces.

Comment on lines 71 to 72
# Controls if XML declaration should be added to the file.
# 'xml_declaration': True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is no longer used and can be deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

Comment on lines 112 to 113
file_format = opts.format if opts.format else \
self.config['default_format'].get(str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should suffice: opts.format or self.config['default_format'].get(str)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

Comment on lines 152 to 155
""" self.out_stream =
sys.stdout if path doesn't exit
codecs.open(..) else
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this is out of date? Or it can be rephrased as a natural-language comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the new comment...
# creates a file object to write/append or sets to stdout

track_details = ET.SubElement(track_dict, key)
track_details.text = value

# tree = ET.ElementTree(element=library)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of date?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah


- **separators**: A ``[item_separator, dict_separator]`` tuple.
- **dialect**: A dialect, in the context of reading and writing CSVs, is a construct that allows you to create, store, and re-use various formatting parameters for your data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of duplicating the text from the Python CSV documentation, can we instead link to the explanation there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will link to an external description but I would imagine it is still good to have some description in the document itself. I just shortened it a little. Let me know if you want me to get rid of the description altogether.


- **sort_keys**: Sorts the keys in JSON dictionaries.
- **XML Formatting**
- **encoding**: Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm… how does this affect the usage of the plugin? Because we're writing to a file, it seems like the kind of Python string that gets generated is irrelevant. Perhaps there's an argument to be made that this should not be configurable in that case?

Copy link
Contributor Author

@austinmm austinmm Oct 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is not a config option now.

- **XML Formatting**
- **encoding**: Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated).

- **xml_declaration**: Controls if an XML declaration should be added to the file. Use False for never, True for always, None for only if not US-ASCII or UTF-8 or Unicode (default is None).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I'm not sure if I know what an "XML declaration" is. Is this something that people will actually want to control?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its been removed


- **xml_declaration**: Controls if an XML declaration should be added to the file. Use False for never, True for always, None for only if not US-ASCII or UTF-8 or Unicode (default is None).

- **method**: Can be either "xml", "html" or "text" (default is "xml")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth explaining what these options actually do? I'm not sure how setting method to text will affect the output, for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user changes the config to use text or HTML then the XML structure built will reformat to be displayed as text or as HTML. It kind of adds two other export formats unintentionally...
XML:

<library><key>tracks</key><dict><key>0</key><dict><Track ID>0</Track ID><title>Catch Me</title><album>BRÅVES</album></dict><key>1</key><dict><Track ID>1</Track ID><title>A Toast</title><album>BRÅVES</album></dict><key>2</key><dict><Track ID>2</Track ID><title>Joan of Arc</title><album>BRÅVES</album></dict></dict></library>

Text:

tracks00Catch MeBRÅVES11A ToastBRÅVES22Joan of ArcBRÅVES

HTML:

<library><key>tracks</key><dict><key>0</key><dict><Track ID>0</Track ID><title>Catch Me</title><album>BRÅVES</album></dict><key>1</key><dict><Track ID>1</Track ID><title>A Toast</title><album>BRÅVES</album></dict><key>2</key><dict><Track ID>2</Track ID><title>Joan of Arc</title><album>BRÅVES</album></dict></dict></library>

Copy link
Member

@sampsyo sampsyo Oct 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm… correct me if I'm wrong, but isn't the HTML example here identical to the XML example? Is there a difference I'm missing?

Also, TBH, the text format doesn't look very useful—it's just all the values strung together, without spacing…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the HTML is the same but I don't think that is always the case. I agree the text is not very useful. Would you like me to remove this option?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so! Unless you can think of a compelling reason why someone would want to use it, let's take it out.

@austinmm
Copy link
Contributor Author

I believe I have successfully implemented all your recommended changes. Let me know if I need to make any other changes. Also, here is the updated XML format...

<library>
    <tracks>
        <track>
            <title>Catch Me</title>
            <album>BRÅVES</album>
            ....
        </track>
        ....
    </tracks>
</library>

@sampsyo
Copy link
Member

sampsyo commented Oct 15, 2019

Cool! Thanks for continuing to make progress!

I have only two more things in mind:

  • We'll need a changelog entry about the new feature.
  • I'm still uncertain about the utility of XML export formats. It seems to me that "HTML" export is exactly the same as the "XML" format of XML, isn't it? And the text format doesn't seem all that useful to me…

@austinmm
Copy link
Contributor Author

How do we create a changelog entry?

@sampsyo
Copy link
Member

sampsyo commented Oct 15, 2019

There's a file in docs/changelog.rst that serves that purpose. 😃

@austinmm
Copy link
Contributor Author

Let me know if I didn't create the changelog entry correctly or if you notice anything else that needs fixing :)

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for your continued effort on this. I made some tiny comments which I will apply now before merging.

- **CSV Formatting**
- **delimiter**: Used as the separating character between fields. The default value is a comma (,).

- **dialect**: A dialect is a construct that allows you to create, store, and re-use various formatting parameters for your data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- **dialect**: A dialect is a construct that allows you to create, store, and re-use various formatting parameters for your data.
- **dialect**: The kind of CSV file to produce. The default is `excel`.

Keeping this focused on what matters to users of this plugin, not to Python programmers.

@@ -62,4 +77,8 @@ The default options look like this::
ensure_ascii: False
indent: 4
separators: [',' , ': ']
sort_keys: true
sort_keys: True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lower case is OK here—it’s YAML, not Python.

Suggested change
sort_keys: True
sort_keys: true

csv:
formatting:
delimiter: ','
dialect: 'excel'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dialect: 'excel'
dialect: excel

No quotes are usually necessary in YAML.

@sampsyo
Copy link
Member

sampsyo commented Oct 16, 2019

Arg, turns out GitHub won’t let me commit my own suggestions on my phone. :( But I will wrap this up when I’m back at a normal computer.

@austinmm
Copy link
Contributor Author

Your changes look great! Thank you for all the help you provided me with this. This was my first time contributing to an open-source project so all your assistance was very much appreciated.

@sampsyo
Copy link
Member

sampsyo commented Oct 16, 2019

Wow! In that case, thank you for your diligent efforts, congratulations, and welcome to the world of open source. 😃 Nicely done!

@sampsyo sampsyo merged commit a1d1265 into beetbox:master Oct 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants