Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

web: web page search box doesn't work for regex searches #3867

Closed
GrahamCobb opened this issue Mar 6, 2021 · 8 comments · Fixed by #3869
Closed

web: web page search box doesn't work for regex searches #3867

GrahamCobb opened this issue Mar 6, 2021 · 8 comments · Fixed by #3869
Labels
bug bugs that are confirmed and actionable

Comments

@GrahamCobb
Copy link
Contributor

GrahamCobb commented Mar 6, 2021

Problem

This is not a problem in the web API itself, but in the web pages which provide the simple web user interface.

Bringing up the web interface and entering a query such as somefield::. never returns any results.
The problem is that the web page ends up double URI encoding the search before passing it to GET /item/query.

I have a fix (in static/beets.js) which I can submit once the current PR is done.

However, I have no idea how to create a test for this as it would mean starting the webserver, submitting an HTTP request and checking the resulting (complex) HTML. Does anyone have any example of doing that in the beets pytest environment? I know very little python and nothing about pytest but I may be able to steal a similar test if one exists!

EDIT: Actually, it is the last step - parsing and checking the resulting HTML which is hard (the rest is what the tests already do - but they are dealing with JSON responses, not HTML responses). Does anyone have any tools or examples of checking HTML responses? Or do I just do some simple string searches and hope nothing changes too much to change the page in the future?

@sampsyo
Copy link
Member

sampsyo commented Mar 6, 2021

Good catch; that seems like a good thing to fix!

My honest feeling here is that this is probably not worth writing a test for. I think it would be pretty annoying to test a combination of JavaScript and Python, and especially so to test something actually running in a browser. Tests are nice to have, certainly, but in some circumstances they are just not worth the hassle. 😃

@sampsyo sampsyo added the bug bugs that are confirmed and actionable label Mar 6, 2021
@GrahamCobb
Copy link
Contributor Author

Thanks for the advice Adrian. I will be including a couple of tests for the underlying (JSON) web API operations. These currently work but it doesn't do any harm to have sometests to make sure they don't get broken in the future.

Unless I can think of an easy (and reliable) way to add a test for the JS code I will leave it out. Tests which are flaky, or break due to harmless other changes, tend to make things harder instead of easier!

@GrahamCobb
Copy link
Contributor Author

GrahamCobb commented Mar 7, 2021

This has got a bit more complex. There are two related but different bugs, both in web/__init__.py class QueryConverter:

  1. Regex's which include \ (e.g. \d or match a square bracket \[) fall foul of special code which converts \ to the os fileseparator.
  2. Any query which includes / fails because the class joins / separated parts of the URL together (dropping the /), but it does it after URL decode so there is no way to ever preserve a /.

I have a simple fix which just does not do replacement 1 if the query also includes : (e.g. field:value or field::value). This works to make regexs using \ work (which is critical if we are going to make regex useful). But it doesn't fix using / or using \ outside field matches.

To be honest, I am not convinced of the value of doing replacement 1 at all. It seems to be intended to allow path specifications to always use \, both for portability between OS and to avoid interacting with replacement 2. However, queries trying to match paths (even without any slashes) don't seem to work anyway (even with include_paths set) - with or without my change. Is there some circumstance where being able to use a filespec in a /item/query operation (as opposed to a /item/path operation) is useful?

A possibly related question is that explicitly trying to query on paths doesn't seem to work very well anywhere in beets! At the command line, beet ls path::Edition displays all my tracks which have Edition in the path. And beet ls path::Edition/01 displays the files which have a directory ending in Edition and a filename beginning 01, as expected. However neither beet ls path:Edition nor beet ls path:Edition/01 (i.e. without regex) display anything. Should they?

Anyway, I am not trying to open up a larger can of worms. But I am trying to work out whether to include my limited fix to stop replacement 1 in queries including : (or limit it only to regex queries (::)) or remove replacement 1 (and replacement 2?) altogether. My fix seems to allow the cases I care about (using useful regex matches for non-path fields), and avoids potential wider damage, so I will propose that unless someone cares enough to suggest something different.

@sampsyo
Copy link
Member

sampsyo commented Mar 7, 2021

Wow, that is pretty nasty! I think you're right about the reason why replacement 1 exists, and the original idea to allow queries like foo:bar/baz:qux to encode two query parts was probably an overly RESTful mistake. It might be useful to dig into the git history to see if we can uncover some history about why that was added—presumably somebody had a justification for why they wanted to be able to do path queries and couldn't without this change?

However neither beet ls path:Edition nor beet ls path:Edition/01 (i.e. without regex) display anything. Should they?

No—path queries get handled specially so they query entire directory names instead of substring matches. So path:Edition/01 finds everything in your library that's inside a directory ./Edition/01/. Here's where that happens in the source:

class PathQuery(dbcore.FieldQuery):

And here's the somewhat high-level documentation:
https://beets.readthedocs.io/en/stable/reference/query.html#path-queries

@GrahamCobb
Copy link
Contributor Author

Thanks for the hint. I checked the git log for web/__init__.py and discovered the background to the code. Basically it is described in issue #3566 and 6a03afc (@nmeum )

My proposed change (to stop doing the \ substitution if a field match is happening - i.e. a : is present) would explicitly break the example in #3566, which is a path: match. So that is not going to be acceptable.

I suspect the real fix is to undo both replacements. My guess is that there are a tiny number of people using either feature and they could easily change to a world without them. But that might mean code changes for existing users so is not desirable.

My new proposal is to change replacement 1 so it does not apply to regex queries (which are much more likely to use \ than normal queries). In other words, do not do it if :: is in the query. Although this is a change, and will break regex queries for paths, I think it is very unlikely anyone is doing regex queries on paths today.

An alternative is to deprecate but maintain the current syntax and introduce a new query format which does not do manipulation on / or \. But that seems a bit pointless as AURA is coming (although I haven't checked whether it supports these sorts of queries).

It would be good to hear from @nmeum if they are still around.

@sampsyo
Copy link
Member

sampsyo commented Mar 8, 2021

Great. Changing things just for queries that contain :: sounds like a good near-term/immediate fix; then, we can consider rethinking this more carefully in the future. (Or just switching over to the new AURA server plugin from #3758 instead…)

GrahamCobb added a commit to GrahamCobb/beets that referenced this issue Mar 8, 2021
As discussed in bug beetbox#3867, backslash replacement in query strings is a bit of a
hack but it is useful (see beetbox#3566 and beetbox#3567 for more discussion). However,
it breaks many regular expressions so this patch stops the replacement if the
query term contains '::', indicating it is a regex match.

This commit fixes beetbox#3867.

Signed-off-by: Graham R. Cobb <g+beets@cobb.uk.net>
GrahamCobb added a commit to GrahamCobb/beets that referenced this issue Mar 8, 2021
bugfix beetbox#3867.

Signed-off-by: Graham R. Cobb <g+beets@cobb.uk.net>
GrahamCobb added a commit to GrahamCobb/beets that referenced this issue Mar 8, 2021
Signed-off-by: Graham R. Cobb <g+beets@cobb.uk.net>
@GrahamCobb GrahamCobb mentioned this issue Mar 8, 2021
3 tasks
@GrahamCobb
Copy link
Contributor Author

When I opened this issue I thought the problem was in the web page javascript but I later determined there was an underlying problem in the web plugin code. I have submitted a PR with a fix for that.

I propose that if that PR is accepted, this bug should be closed, rather than going on further. I will then open a new bug if needed, if there is still a problem in the web page javascript when I return to that. (I am trying to make various improvements in that, which I will raise as a feature request soon). But first there is a separate small but important feature I would like to work on - I will propose that later today or tomorrow. I hope that is OK.

@GrahamCobb
Copy link
Contributor Author

The fix #3869 now passes CI tests even on Windows (no changes to the code, just to the tests to provide better test objects on Windows).

If/when you are happy to include that PR I believe this can be marked fixed. On to the readonly feature, but that may be tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug bugs that are confirmed and actionable
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants