Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

print lyrics as HTML #2628

Merged
merged 22 commits into from
Jul 17, 2017
Merged

print lyrics as HTML #2628

merged 22 commits into from
Jul 17, 2017

Conversation

anarcat
Copy link
Contributor

@anarcat anarcat commented Jul 14, 2017

Here's another crazy idea: how about printing lyrics in a usable
format? Here, lyrics -p on the whole library just dumps a
concatenation of all lyrics, which is not very useful.

I was thinking of improving the output a little, maybe by adding HTML
tags or some form of markup. this pull request does exactly that,
although rather crudely: the resulting output is probably not valid
HTML (it's just snippets) and embeds the lyrics in <pre> tags (as
opposed to trying to format them correctly).

the output is, of course, not an ebook in any way, although it could
be used to construct an ebook or other things.

comments? other ideas? suggestions?

i figured it was better to implement this directly in beets instead of
hacking at the sqlite database on the side...

the idea here is to format the lyrics output a little better so that
it can (for example) be shown as a web page or an ebook.

the new skip option allows for faster generation of the output in the
(most common) case where not all lyrics are available.
@anarcat
Copy link
Contributor Author

anarcat commented Jul 14, 2017

this is a little crazy, i must admit, but it works. i can import the resulting output in Sigil and create a table of contents, then save that as an ePUB and it's read by my e-reader.

beets is still going through my library to get those lyrics, but a first dump of the database with the letters A and B done gave me a 840 pages ePUB.

so yeah, a little crazy... but i think it's worth exploring further.

@anarcat anarcat changed the title print lyrics as HTML WIP: print lyrics as HTML Jul 14, 2017
ReStructuredText has the advantage over HTML that it can be rendered
easily to multiple formats (HTML, ePUB, PDF) and it supports indexes.

the output needs to be fed into a file and integrated into an existing
Sphinx document, of course.
@anarcat
Copy link
Contributor Author

anarcat commented Jul 14, 2017

i'm now at the optimization stage. with a full library, generating the ePUB takes a considerable amount of time, which is a blocker for me right now. i am considering making the patch actually write all the necessary files in a given output directory, which would allow me to write per-artist RST files, which would optimize the workflow a lot.

does that sound reasonable?

@SusannaMaria
Copy link
Contributor

Nice enhancement. My collection is not very cultivated in terms of lyrics but I tried your code, which looks nice. Some handwork has to be done, maybe in later versions we can add some automation using sphinx in beet directly. Would like to see if we can give the filename as option and doing the rst generation without any fetching of lyrics.

Tested on Windows10 x64 with Python3

grafik

this makes the ePUB easier to parse by e-readers, because they do not
need to load one giant HTML file, but one per author. it also makes
sphinx rendering more efficient and interactive
this makes the code more readable and reduces the number of syscalls
to write files
@anarcat
Copy link
Contributor Author

anarcat commented Jul 15, 2017

Some handwork has to be done, maybe in later versions we can add some automation using sphinx in beet directly.

That may be a little over the top, in my opinion. It would add additional dependencies into beets and i'm not sure the maintainers here would like that. Furthermore, who knows what format the end user will want? HTML? PDF? ePUB? ePUB3? Let's just let each tool do its job. We can tell the user explicitly how to do theirs. :)

Would like to see if we can give the filename as option [...]

Ask and you shall receive. ;) The latest commits pushed actually write RST files directly, in a given output directory, instead of outputting a long single RST document. that was necessary because the size of the file (and the resulting ePUB) was getting unwidely and hard to deal with by all parts of the toolchain. sphinx now takes much faster to parse the resulting documents (with progress information) and ePUB readers are much more responsive, as they need to open only smaller files now.

[...] and doing the rst generation without any fetching of lyrics.

you should already be able to do that with the newly introduced --skip / -s option.

the only question that remains for me is whether the conf.py and index.rst files belong to the documentation or the program: should we automatically generate them or tell the user how to create the files? I tend towards the former, but right now the documentation is where those files are. i would think it better to create the files once, if they don't already exist, and tell the user how to use them interactively.

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! This is a funky and cool feature. I like the modern default Sphinx theme quite a lot, and from the above screenshot it works great for lyrics. Nice work!

Here are a few miscellaneous suggestions. I don’t have a strong feeling about automatically generating the Sphinx config—producing an index file seems very easy to justify (it’s “just ReST”), and while the config file is more tool-specific, it could be convenient to have too.

@@ -84,10 +84,57 @@ lyrics will be added to the beets database and, if ``import.write`` is on,
embedded into files' metadata.

The ``-p`` option to the ``lyrics`` command makes it print lyrics out to the
console so you can view the fetched (or previously-stored) lyrics.
console so you can view the fetched (or previously-stored) lyrics. The
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This discussion might deserve its own, separate section heading in the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done too.

u'-s', u'--skip', dest='skip_fetched',
action='store_true', default=False,
help=u'skip already fetched lyrics',
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, the option name “skip” seems a little counterintuitive to me. (And “skip already fetched lyrics” is the opposite of what it does, right? It skips lyrics that have not been fetched, if I understand correctly.) Maybe another name like “local” or “print-only” or something would be clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's true, the description is completely backwards. i was trying to make the opposite option of --force, so --skip made sense. not sure what would be a better name... -l and --local seem better, but don't fit well with --force...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done: renamed to --local.

if item.lyrics:
if opts.printlyr:
ui.print_(item.lyrics)
if opts.writerst:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we please consider moving the ReST stuff to its own function? Having it separated from the top-level workflow will make things easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd be happy to. would it be a different func hook? or a different plugin? or just a different function? i'm not sure which level you're asking for here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, just a separate function (or method, really) will do nicely.

Copy link
Contributor Author

@anarcat anarcat Jul 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can see moving it to a different function, but the problem is that we'd need to iterate over all the items another time - we wouldn't write the lyrics as soon as we download them, whereas we do for the --print function.

we need a complete loop access because of the way we open that rst file: one per artist means we need to close the file outside of the loop when we're done.

unless i missed something obvious here of course. been staring at that loop too long already. ;)

update: actually, is there a way to do those things asynchronously in beets? like download in the background and deal with the results with a hook or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it to a different function. required moving more state into the class, but i think it works out okay. we still write as we go, i hope that's what you had in mind too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! This looks good from here.

@anarcat
Copy link
Contributor Author

anarcat commented Jul 15, 2017

@sampsyo i'd love to hear your comments on whether the conf.py and index.rst files belong to the documentation or to the code. they could be auto-generated as well, which would streamline the process neatly.

@sampsyo
Copy link
Member

sampsyo commented Jul 15, 2017

Great! Here are my brief thoughts from the overall review comment:

I don’t have a strong feeling about automatically generating the Sphinx config—producing an index file seems very easy to justify (it’s “just ReST”), and while the config file is more tool-specific, it could be convenient to have too.

@anarcat
Copy link
Contributor Author

anarcat commented Jul 15, 2017

ah yes, sorry i missed that. will add conf.py and index.rst generation as well then.

skip was a misnomer: we actually skip "unfetched" lyrics. this means
it's somewhat of a double-negative and really confusing.

--local is clearer, although less in opposition with --force
@anarcat anarcat force-pushed the lyrics-html branch 2 times, most recently from 8b0e9a8 to 5f478b7 Compare July 15, 2017 19:06
this simplifies and clarifies the code, although we need to call the writerst function twice to wrap up at the end of the loop
we write the artists files in a subdirectory, to avoid infinite
recursions or flooding the current directory needlessly.

this way, the user has a good base structure and can just chain the
command into sphinx to continue building the next format, after
possible tweaks.
@anarcat
Copy link
Contributor Author

anarcat commented Jul 15, 2017

@sampsyo alright, i believe all your concerns have been addressed.

@SusannaMaria the command now also generates the necessary conf.py and index.rst base structures so you should be able to just run the sphinx-build commands, which are shown when the rst generation completes, to generate your format of choice.

i think this is pretty much ready to sail now.

@anarcat anarcat changed the title WIP: print lyrics as HTML print lyrics as HTML Jul 15, 2017
if self.rst is not None:
slug = re.sub(r'\W+', '-', unidecode(self.artist).lower())
path = os.path.join(directory, 'artists', slug + u'.rst')
with open(path, 'w') as output:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got an error "TypeError: write() argument must be str, not bytes"
this fixed it:
with open(path, 'wb') as output:

don't know if others face the same problem, win10x64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh. typical python crap. let me guess: you're using Python 3?

isn't the encode('utf-8') idiom okay anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed a commit to add the b, not sure what's idiomatic anymore...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are fast! Guess its okay to use utf-8.

anarcat and others added 3 commits July 15, 2017 16:21
when we encode explicitly, we return bytes, so open files as binary
Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I made a few changes of my own to the docs and added a changelog entry (and a metavariable for the CLI option). I have just a few more scattered suggestions here, and then this should be good to go.

if item.lyrics:
if opts.printlyr:
ui.print_(item.lyrics)
if opts.writerst:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! This looks good from here.

@@ -659,26 +671,133 @@ def commands(self):
help=u'print lyrics to console',
)
cmd.parser.add_option(
u'-r', u'--write-rst', dest='writerst',
action='store', default='.', metavar='directory',
help=u'write lyrics to given directory as RST files',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is totally bikeshedding, but reStructuredText is usually abbreviated reST or ReST rather than RST.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine with changing this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed. i changed all occurences of rst (except the .rst file extension), i hope this is what you meant. it has the nice side-effect of making the self.rst variable renamed to the appropriate self.rest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; perfect!

indexfile = os.path.join(directory, 'index.rst')
if not os.path.exists(indexfile):
with open(indexfile, 'wb') as output:
output.write(u'''Lyrics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For legibility, would you mind moving these long string literals to constants at the top of the file? I find that usually makes it easier to follow the code around the strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

conffile = os.path.join(directory, 'conf.py')
if not os.path.exists(conffile):
with open(conffile, 'wb') as output:
output.write(u'''# -*- coding: utf-8 -*-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and the similar stanza above) actually breaks on Python 3 (and would break on Python 2 if this string were non-ASCII). When a file is opened in binary mode (with b in the flags to open), then you can only write raw bytes to the file, not Unicode strings like u'''...'''. Here's the error, FWIW:

Traceback (most recent call last):
  File "beet", line 23, in <module>
    beets.ui.main()
  File "/Users/asampson/Documents/code/beets/beets/ui/__init__.py", line 1256, in main
    _raw_main(args)
  File "/Users/asampson/Documents/code/beets/beets/ui/__init__.py", line 1243, in _raw_main
    subcommand.func(lib, suboptions, subargs)
  File "/Users/asampson/Documents/code/beets/beetsplug/lyrics.py", line 694, in func
    self.writerst_indexes(opts.writerst)
  File "/Users/asampson/Documents/code/beets/beetsplug/lyrics.py", line 779, in writerst_indexes
    ''')
TypeError: a bytes-like object is required, not 'str'

The right resolution is probably to do an .encode('utf8') on the string before writing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or not use the b flag in this specific case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i removed the b flag and this works both in 2 and 3 now.

not sure if you want to remove the encode() call in the rest of the patch... but it would be tricky because we do need to write unicode there...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, since those files are ASCII, this will work great on both Python 2 and 3. 🎉

slug = re.sub(r'\W+', '-', unidecode(self.artist).lower())
path = os.path.join(directory, 'artists', slug + u'.rst')
with open(path, 'wb') as output:
output.write(self.rst.encode('utf-8'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Regarding the comment below: this works fine because you're now writing encoded bytes to a binary file handle.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right.

"""Write the item to an RST file

this will keep state (in the `rst` variable) in order to avoid
writing continuously to the same files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another incredibly tiny thing, but can I convince you to use full sentences (capital letters and periods) in docstrings and comments? That will help match the rest of the beets code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine with this too, feel free to correct, otherwise i may take a day or two to catchup here too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, i believe.

the unicode strings are not binary - rely on Python to do the right
thing here instead of encoding a string we know is already properly
encoded
@anarcat
Copy link
Contributor Author

anarcat commented Jul 17, 2017

how's that? all good? :)

this would cause problems with songs that had trailing spaces with the
index directive
@sampsyo sampsyo merged commit 93966ed into beetbox:master Jul 17, 2017
sampsyo added a commit that referenced this pull request Jul 17, 2017
@sampsyo
Copy link
Member

sampsyo commented Jul 17, 2017

Awesome! Nice work on this. ✨ I've merged everything up. Woohoo!

@anarcat
Copy link
Contributor Author

anarcat commented Jul 17, 2017

whoohoo! awesome! :)

@anarcat anarcat deleted the lyrics-html branch July 17, 2017 15:14
@anarcat
Copy link
Contributor Author

anarcat commented Jul 17, 2017

i noticed a few problems remaining with the code:

  1. there is a duplication of code between the "slugification code" that i introduced to generate filenames and the already existing "slugify" function
  2. a newline is missing after the lyrics block, which generates a warning for every song (!!) when the resulting rst is built
    3, the last commit, which aimed at fixing some :index: directive that were not rendered correctly, doesn't seem to be working here.

i'll work on a new PR to fix those issues, sorry for the trouble.

i must say i'm not sure what to do about slug thing... the slugify function provided by beets does more than just create a slug: it removes stuff in parens, for example. it also uses NKFD normalization which doesn't actually work so well for asian characters, if i understand correctly. i would argue for adopting a simpler approach, but that involves changing that piece of code which introduces all sorts of other issues so i'm hesitant in doing so...

@sampsyo
Copy link
Member

sampsyo commented Jul 18, 2017

Thanks for catching those! About slugify in particular: it looks like that is probably tuned specifically for getting Google results, which might result in slightly different decisions from a similar function for filename generation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants