Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for glob/regex filtering on NestedAsset list endpoint #1022

Merged
merged 6 commits into from
Apr 19, 2022

Conversation

mvandenburgh
Copy link
Member

Closes #1002.

Note that Postgres doesn't support querying with globs, but it does support querying with a regex. So, the approach I've taken to support globbing here is to convert the glob pattern to a regex pattern and then use that in the query. Python has a built-in function (fnmatch.translate) that converts a glob expression to a regex, but the regex syntax it uses is incompatible with Postgres's regex syntax. I found this 81-line library that does work with Postgres, so I've used that here.

@mvandenburgh mvandenburgh force-pushed the asset-glob-param branch 2 times, most recently from 95786df to bed134f Compare April 12, 2022 18:36
@jjnesbitt
Copy link
Member

Python has a built-in function (fnmatch.translate) that converts a glob expression to a regex, but the regex syntax it uses is incompatible with Postgres's regex syntax.

What exactly from the translation is incompatible with Postgres? I'd hope that any differences are minor and could be addressed directly. I'd rather not pull in an obscure 3rd party library if it's not necessary.

@mvandenburgh
Copy link
Member Author

Python has a built-in function (fnmatch.translate) that converts a glob expression to a regex, but the regex syntax it uses is incompatible with Postgres's regex syntax.

What exactly from the translation is incompatible with Postgres? I'd hope that any differences are minor and could be addressed directly. I'd rather not pull in an obscure 3rd party library if it's not necessary.

fnmatch.translate('*.json') returns (?s:.*\.json)\Z while Postgres expects something like ^.*\.json$. I'm unsure of the differences between the two, although I've never seen the ?s \Z syntax from the first one before. I do know that support for regex varies across different languages. For example, apparently Python and JavaScript have slight differences in their regex syntaxes that make them incompatible. There's a surprisingly lack of information online about how to translate between the fnmatch format and postgres/Django format :(. I'll take another look

@jjnesbitt
Copy link
Member

fnmatch.translate('*.json') returns (?s:.*\.json)\Z while Postgres expects something like ^.*\.json$. I'm unsure of the differences between the two, although I've never seen the ?s \Z syntax from the first one before. I do know that support for regex varies across different languages. For example, apparently Python and JavaScript have slight differences in their regex syntaxes that make them incompatible. There's a surprisingly lack of information online about how to translate between the fnmatch format and postgres/Django format :(. I'll take another look

It seems to me a lot of this has to do with regex modes and differences with chars used. For example, the ?:s is the python regex mode for include newlines in ., which doesn't really apply to us. Toying around with it for a bit, I got this regex replacement which should solve most of our problems

re.sub(r'\(\?s?m?:([^\)]+)\)\\Z', r'^\1$', glob)

This essentially removes the (?s:) and \\Z part from the string. It also catches the m (multiline) modifier).

dandiapi/api/views/asset.py Outdated Show resolved Hide resolved
dandiapi/api/views/common.py Outdated Show resolved Hide resolved
@waxlamp
Copy link
Member

waxlamp commented Apr 13, 2022

fnmatch.translate('*.json') returns (?s:.*\.json)\Z while Postgres expects something like ^.*\.json$. I'm unsure of the differences between the two, although I've never seen the ?s \Z syntax from the first one before. I do know that support for regex varies across different languages. For example, apparently Python and JavaScript have slight differences in their regex syntaxes that make them incompatible. There's a surprisingly lack of information online about how to translate between the fnmatch format and postgres/Django format :(. I'll take another look

It seems to me a lot of this has to do with regex modes and differences with chars used. For example, the ?:s is the python regex mode for include newlines in ., which doesn't really apply to us. Toying around with it for a bit, I got this regex replacement which should solve most of our problems

re.sub(r'\(\?s?m?:([^\)]+)\)\\Z', r'^\1$', glob)

This essentially removes the (?s:) and \\Z part from the string. It also catches the m (multiline) modifier).

I don't love either including that library (even if it seems to be harmless) or this homebrew regex (even if we could write tests for it). The docs for fnmatch.translate imply that the globbing star is just converted to .*, while regular dots are converted to \.. (Not sure what happens to **, but we can defer that functionality for now.) Could we just do that?

@jjnesbitt
Copy link
Member

I don't love either including that library (even if it seems to be harmless) or this homebrew regex (even if we could write tests for it). The docs for fnmatch.translate imply that the globbing star is just converted to .*, while regular dots are converted to \.. (Not sure what happens to **, but we can defer that functionality for now.) Could we just do that?

I think it's worth testing out at least. For the record, ** isn't handled by fnmatch, that's only handled by the glob module.

@mvandenburgh
Copy link
Member Author

Agreed that @waxlamp's solution is cleaner. I tested it and it seems to work perfectly, I'll update this to do it that way 👍

Copy link
Member

@jjnesbitt jjnesbitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one request otherwise looks good

dandiapi/api/views/serializers.py Show resolved Hide resolved
@mvandenburgh mvandenburgh added patch Increment the patch version when merged release Create a release when this pr is merged labels Apr 14, 2022
@mvandenburgh mvandenburgh removed patch Increment the patch version when merged release Create a release when this pr is merged labels Apr 14, 2022
@jjnesbitt
Copy link
Member

@mvandenburgh it seems the test_zarr_rest_list_filter test failed? Have you seen that for yourself or is that just a fluke?

@mvandenburgh
Copy link
Member Author

@mvandenburgh it seems the test_zarr_rest_list_filter test failed? Have you seen that for yourself or is that just a fluke?

I assumed it was a fluke because I'm not seeing it when I run tests locally, but it happened again when I reran the job 😕

@mvandenburgh mvandenburgh merged commit a7f19f1 into master Apr 19, 2022
@mvandenburgh mvandenburgh deleted the asset-glob-param branch April 19, 2022 16:07
@dandibot
Copy link
Member

🚀 PR was released in v0.2.13 🚀

@dandibot dandibot added the released This issue/pull request has been released. label Apr 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
released This issue/pull request has been released.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

find files matching a regex
4 participants