Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String queries should be case-insensitive even for non-ASCII characters #3160

Closed
netvl opened this issue Feb 22, 2019 · 8 comments · Fixed by #3226
Closed

String queries should be case-insensitive even for non-ASCII characters #3160

netvl opened this issue Feb 22, 2019 · 8 comments · Fixed by #3226
Labels
feature features we would like to implement

Comments

@netvl
Copy link

netvl commented Feb 22, 2019

Problem

When searching for metadata written in English or, in other words, in latin script, beet ls performs case-insensitive search, which is very nice. With non-latin scripts (I've checked with cyrillic, since it is the only one in which I have tracks and which has the case concept), however, case-insensitive search does not work.

Running this command:

$ beet ls сплин

outputs nothing, while this one:

$ beet ls Сплин

shows all the tagged files correctly.

I would expect this to work for any kind of scripts, probably based on unicode case transformations.

Setup

  • OS: Archlinux
  • Python version: 3.7.2
  • beets version: 1.4.7
  • Turning off plugins made problem go away (yes/no): no

My configuration (output of beet config) is:

lyrics:
    bing_lang_from: []
    auto: yes
    bing_client_secret: REDACTED
    bing_lang_to:
    google_API_key: REDACTED
    google_engine_ID: REDACTED
    genius_api_key: REDACTED
    fallback:
    force: no
    local: no
    sources:
    - google
    - lyricwiki
    - musixmatch
    - genius
directory: ~/Music
library: ~/.local/share/beets/music.db
sort_item: artist+ year+ album+ disc+ track+
original_date: yes

ui:
    color: yes

import:
    move: no
    log: ~/.local/share/beets/import.log
    incremental: yes

paths:
    default: $albumartist/$year - $album%aunique{albumartist year album}/%if{$multidisc,Disc $disc/}$track - $title
    comp: Compilations/$album%aunique{}/$track - $title
    singleton: Singletons/$artist - $title
item_fields:
    multidisc: 'True if disctotal > 1 else False

        '

plugins: chroma fromfilename edit embedart fetchart lyrics zero bpd convert info export alternatives types inline web
chroma:
    auto: no
embedart:
    compare_threshold: 50
    maxwidth: 0
    auto: yes
    ifempty: no
    remove_art_file: no
fetchart:
    store_source: yes
    auto: yes
    minwidth: 0
    maxwidth: 0
    enforce_ratio: no
    cautious: no
    cover_names:
    - cover
    - front
    - art
    - album
    - folder
    sources:
    - filesystem
    - coverart
    - itunes
    - amazon
    - albumart
    google_key: REDACTED
    google_engine: 001442825323518660753:hrh5ch1gjzm
    fanarttv_key: REDACTED
zero:
    fields: month day
    auto: yes
    keep_fields: []
    update_database: no
bpd:
    host: 127.0.0.1
    port: 6600
    password: REDACTED
    volume: 100
convert:
    never_convert_lossy_files: yes
    format: opus
    dest:
    pretend: no
    threads: 12
    formats:
        aac:
            command: ffmpeg -i $source -y -vn -acodec aac -aq 1 $dest
            extension: m4a
        alac:
            command: ffmpeg -i $source -y -vn -acodec alac $dest
            extension: m4a
        flac: ffmpeg -i $source -y -vn -acodec flac $dest
        mp3: ffmpeg -i $source -y -vn -aq 2 $dest
        opus: ffmpeg -i $source -y -vn -acodec libopus -ab 96k $dest
        ogg: ffmpeg -i $source -y -vn -acodec libvorbis -aq 3 $dest
        wma: ffmpeg -i $source -y -vn -acodec wmav2 -vn $dest
    max_bitrate: 500
    auto: no
    tmpdir:
    quiet: no
    embed: yes

    paths: {}
    no_convert: ''
    copy_album_art: no
    album_art_maxwidth: 0
types:
    phone_export: bool
alternatives:
    phone:
        directory: ~/tmp/phone-music
        formats: opus mp3 ogg
        query: phone_export:true
export:
    default_format: json
    json:
        formatting:
            ensure_ascii: no
            indent: 4
            separators: [',', ': ']
            sort_keys: yes
edit:
    albumfields: album albumartist
    itemfields: track title artist album
    ignore_fields: id path
pathfields: {}
album_fields: {}
web:
    host: 127.0.0.1
    port: 8337
    cors: ''
    cors_supports_credentials: no
    reverse_proxy: no
    include_paths: no
@sampsyo sampsyo added the feature features we would like to implement label Feb 22, 2019
@sampsyo sampsyo changed the title Case insensitive search does not work for non-latin scripts String queries should be case-insensitive even for non-ASCII characters Feb 22, 2019
@sampsyo
Copy link
Member

sampsyo commented Feb 22, 2019

Sounds good! See also #2626 for some prior discussion.

@netvl
Copy link
Author

netvl commented Feb 23, 2019

I think the the solutions has two parts: first, the user of beets should have the ICU extension for SQLite on their machine (at least for Linux systems, where system SQLite libraries are usually used). This seems to be quite obscure, unfortunately - it seems that there is no ICU extension anywhere in Archlinux packages, including AUR, which suggests than it is even worse in other distros (maybe sans Gentoo and similar distributions). All instructions on the net suggest building the extension manually.

Second, beets should load this extension when it queries the database. From what I understand, this should be an explicit action on the client side of the database. After it is done, the LIKE queries should handle character case automatically.

I wonder, could this (loading an SQLite extension) be done through a plugin? If yes, then I think adding a plugin with an optional explicit path to the extension shared object (possibly using some system directories, if it is not specified), which loads this extension, could be a nice solution.

@netvl
Copy link
Author

netvl commented Feb 23, 2019

Building sqlite with ICU module enabled does indeed help, and makes it work automatically without any changes necessary from the beets side. This kind of solves this issue specifically for me, since I can build sqlite with necessary options. But I wonder - across all platforms, how often maintainers build sqlite with ICU extension built-in?

@jackwilsdon
Copy link
Member

jackwilsdon commented Feb 23, 2019

That's great to hear @netvl, thanks for doing some digging! It's awesome to hear that beets doesn't need any changes to support this, and I've been playing around with using a dynamic library for it too.

It does seem that we can load the ICU extension into beets' SQLite database at startup using a plugin, although we do need to reach into the beets internals a bit;

from __future__ import division, absolute_import, print_function

from beets.plugins import BeetsPlugin

class LoadIcuPlugin(BeetsPlugin):
    def __init__(self):
        super(LoadIcuPlugin, self).__init__()
        self.register_listener('library_opened', self.library_opened)

    def library_opened(self, lib):
        lib._connection().load_extension('libicu.so')

It seems that we could also call db.enable_load_extension(True) inside dbcore and then just use the Transaction#query method to run SELECT load_extension("libicu.so"), which doesn't require calling any private methods (like _connection()).

I think this could be made into a simple LoadExtPlugin which takes a list of plugins to load from your beets configuration.

@sampsyo
Copy link
Member

sampsyo commented Feb 24, 2019

That's cool! We could even fold this into beets core if we could make it optional. Is it easy to load the plugin if it's available, and silently do nothing if it's not?

@jackwilsdon
Copy link
Member

The main issue is that you need a path to the plugin, I guess we could make it a core configuration option to provide the path? I'm not sure if there's a "standardised" path for SQLite plugins.

@sampsyo
Copy link
Member

sampsyo commented Feb 24, 2019

Ah, of course, that makes sense! In that case, a beets plugin for loading these could be the right path.

@Moonbase59
Copy link

Moonbase59 commented Oct 1, 2019

Even with ICU enabled, there are still edge cases that won’t work, I find many with French accented letters: The song "À quoi ça sert l'amour (avec Théo Sarapo)" by "Édith Piaf" cannot be found using these:

beet ls edith piaf
beet ls Edith piaf
beet ls a quoi ca

but will be found using these:

beet ls édith piaf
beet ls Édith piaf
beet ls a quoi ça
beet ls A QUOI ÇA

It appears that an accented À is handled differently from an accented É, ditto for ç.
This is not a beets problem but a problem with SQLite’s handling of ICU, LIKE and level 1 collation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature features we would like to implement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants