Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: fast exact name search by default #66

Merged
merged 3 commits into from
Oct 27, 2020

Conversation

lidel
Copy link
Member

@lidel lidel commented Oct 16, 2020

tldr

This PR changes the default behavior of the name filter

from case-insensitive, partial match
to much faster case-sensitive, exact (full) match

and adds match parameter that enables opt-in into alternative strategy when performance is not an object:

Screenshot_2020-10-26 IPFS Pinning Service API

PREVIEW: https://ipfs.github.io/pinning-services-api-spec/#specUrl=https://raw.githubusercontent.com/ipfs/pinning-services-api-spec/fix/default-to-fast-name-match/ipfs-pinning-service.yaml

Rationale for this change

  • Case-insensitive partial "Fuzzy" search (ipartial) makes a bad default. (performance-wise)
    Database indexes and b-trees do not solve the performance fully.
    Exact match is always faster and less expensive.

  • Fuzzy search may produce bugs
    This PR introduces match query parameter, so developers who need fuzzy search can still opt-in into slower, but more flexible text matching strategy by passing match parameter.

  • We are moving away from slow defaults in IPFS ecosystem. Better to do this now, than later.

    • A good example of tyranny of small decisions is ipfs pin ls, which is abysmally slow when one has >1k pins. Nearly all pins in real life are recursive, and ipfs pin ls --type=recursive executes instantly, but is rarely used due to not being the default.
    • Future ipfs pin remote and ipfs pin local APIs will be inspired by this spec and will have name attribute, however we don't want to re-introduce bad choices into IPFS ecosystem. The default should be the fastest option available, and in this case it is the exact name match.

Implementation notes and migration plan

  • This change does not change the wire format and does not impact ongoing MVP integration in go-ipfs and ipfs-webui (Epic: Pinning service integration ipfs-gui#91, Add support for remote Pinning Services kubo#7559), but we want to include it in this spec to ensure MVP does the right thing.

  • Existing Pinning Services already implement the more difficult fuzzy ipartial strategy. Adding support for match filter and implementing much simpler exact strategy should be a small task, but will save everyone headache in the long run.


@aschmahmann @jacobheun @petar @gammazero @obo20 @andrew @GregTheGreek @priom @jsign @sanderpick @andrewxhill @ipfs/wg-pinning-services

This changes the default behavior of the `name` filter from
"case-insensitive, partial match" to "case-sensitive, exact (full)
match" and adds `match` parameter that enables opt-in into `fuzzy`
strategy only when performance is not an object.

The rationale for this change that fuzzy search is a bad default, does
not scale well and in most cases users and developers expect exact
match. We are moving away from slow defaults in go-ipfs, and since
`ipfs pin local` API will be inspired by this spec, we don't want to
re-introduce bad choices into IPFS ecosystem.

Developers who need fuzzy search, can always opt-in into slower, but
more flexible text matching strategy by passing `match` parameter, but
the default should be the fastest option available,
and in  this case it is the exact match.
@lidel lidel added P0 Critical: Tackled by core team ASAP need/community-input Needs input from the wider community need/maintainers-input Needs input from the current maintainer(s) topic/docs Improvements or additions to documentation topic/pin/status-check Issues related to checking pin status labels Oct 16, 2020
Copy link

@aschmahmann aschmahmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this seems much more sensible from an ecosystem perspective.

default: exact
enum:
- exact # full match, case-sensitive (the implicit default)
- fuzzy # partial match, case-insensitive

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚲 + 🏚️ :
"fuzzy" I like that it's short, but not so happy that it's not quite precise in what it does. I'm fine with fuzzy, but putting out the call for better names before we decide we're just going with this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am totally fine with renaming fuzzy to something better.
Food for thought: partial, relaxed, inexact, loose

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for partial unless you specifically need to indicate it's case-insensitive too (if the latter, would suggest something like full-sensitive and partial-insensitive but that gets a little too bikesheddy, probably)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for partial

Copy link
Member Author

@lidel lidel Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to partial in 91299cc (#66) and added match=exact|iexact|partial|ipartial in
f96383a to fuel discussion

We need to figure out if thats enough, or do we want something more fancy like suggestion from #66 (comment)

default: exact
enum:
- exact # full match, case-sensitive (the implicit default)
- fuzzy # partial match, case-insensitive
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for partial

@achingbrain
Copy link
Member

achingbrain commented Oct 19, 2020

Do we need a match parameter at all? Can we not just look at the passed name and if it contains a * character it's a wildcard match and if not it's exact? Case sensitive/insensitive or search flags could be an option?

@lidel
Copy link
Member Author

lidel commented Oct 20, 2020

@achingbrain
If we replace match with support for * and add optional case=sensitive|insensitive, then services need to implement parser and 4 strategies instead of two (exact-casesensitive, exact-caseinsensitive, partial-casesensitive, partial-caseinsensitive), and API users need to learn that * has special meaning, and we need to spec that it can't be used in names (or spec how to escape it).

I'd argue match=exact|iexact|partial|ipartial added in f96383a is much easier for services and users,
but I'm open for suggestions if there is a simpler way to represent this (when match is missing, exact is used).

andrew added a commit to ipfs-shipyard/rb-pinning-service-api that referenced this pull request Oct 26, 2020
@andrew
Copy link

andrew commented Oct 26, 2020

PR for implementation on the ruby pinning api server over here: ipfs-shipyard/rb-pinning-service-api#5

andrew added a commit to ipfs-shipyard/rb-pinning-service-api that referenced this pull request Oct 26, 2020
andrew added a commit to ipfs-shipyard/rb-pinning-service-api that referenced this pull request Oct 26, 2020
andrew added a commit to ipfs-shipyard/rb-pinning-service-api that referenced this pull request Oct 26, 2020
@obo20
Copy link

obo20 commented Oct 27, 2020

This PR looks great to me. The changes will be helpful in keeping things performant by default while also allowing for some flexibility.

@lidel lidel changed the title feat: fast name search by default feat: fast exact name search by default Oct 27, 2020
@lidel lidel merged commit e4009fe into master Oct 27, 2020
@lidel lidel deleted the fix/default-to-fast-name-match branch October 27, 2020 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/community-input Needs input from the wider community need/maintainers-input Needs input from the current maintainer(s) P0 Critical: Tackled by core team ASAP topic/docs Improvements or additions to documentation topic/pin/status-check Issues related to checking pin status
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants