Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework location bias and query builder in general #580

Merged
merged 10 commits into from
Jun 16, 2021

Conversation

lonvia
Copy link
Collaborator

@lonvia lonvia commented May 17, 2021

This is work in progress to improve the location bias for searches. As the unusually high boost for name queries was interfering with the location scoring, the query part itself needed to be reworked as well.

In detail:

  • Location bias is switched to an exponential decay function with the given distance in kilometers as the 0.8 weight limit. To ensure that well-known places show up in the results nonetheless, use a second weight function based on importance and use the max of the two functions.
  • Introduce a new parameter ' zoom' for the location bias, which describes the radius for which the bias is effective as a function of the map zoom level. The precise relation is 0.25km*2^(18 - zoom). The location_bias_scale now describes how much weight is given to important places. Sensible values are between 0.0 and 1.0.
  • Weight function for importance switched to a simple linear decay function. This essentially means using importance as a direct weight factor with the twist that it must not become completely zero.
  • Reintroduce a subquery against the name.ngram index. This ensures similar results as the previous heavy boost on name.raw.
  • Add a subquery against collector.raw which gets a strong boost when the house number matches. Records with housenumbers usually have no name (the street not being in the name indexes), so that they get a huge malus over name matches. The boost offsets this malus.

The query structure is still not ideal but any further improvements require a change to the index mapping. I would like to defer that until after the next release.

With the modified weight functions we might even be able to get rid of painless. That would be a huge relief.

There is a global test instance (using last weeks planet dump) with the changes running at https://pagan.lonvia.de/ Feedback would be very much appreciated.

@hbruch @leonardehrenfried

@lonvia
Copy link
Collaborator Author

lonvia commented May 25, 2021

Here are a couple of queries that still fail:

Switch to a simple linear decay function for weighing with importance.
It uses the importance as a direct weigh factor. The only modification
here is to ensure that the value range is above zero to avoid
cancelling out the score when multiplying the weight.

Location bias is switched to an exponential decay function with the
given distance in kilometers as the 0.8 weight limit. To ensure that
well-known places show up in the results nonetheless, use a second
weight function based on importance and use the max of the two
functions.
These tests rather check that the syntax is right and are as such
of limited use. Given that they are very difficult to adapt to
changes in the query builder, raher get rid of them.
The new zoom parameter for search indicates the approximate area
the location bias should cover. For convenience, the area is
computed from the usual map zoom levels.

The 'location_bias_scale' parameter has slightly changed its meaning
and now expresses how much og an influence importance should have
when searching with location bias. Sensible values go from 0 to 1
with 0 meaning that importance has no influence at all and 1 that
importance has approximately the same influence.
Get rid of the extensive boost. Instead have a second check against
the ngram index for names. Also introduces a special path for
housenumber matches. Housenumber records cannot match against the
name index, so put more emphasis on the collector score instead.
This enables the use of different analysers.
There is no explicit index for the name in the default language.
It is included in any of the language-specific indexes, so query
them, when default is requested.
@lonvia lonvia force-pushed the location-bias-and-query-structure branch from 0b00fbd to da45392 Compare June 5, 2021 12:57
When requiring that a name or housenumber is present, then alternative
names must be taken into account as well. The current DB schema only
has a raw name index on these alternative names, so use that with a
very low boost.
@lonvia
Copy link
Collaborator Author

lonvia commented Jun 14, 2021

The latest commit fixes an issue where alternative names (alt_name, old_name, etc.) were not searchable anymore.

@kenseii
Copy link

kenseii commented Jun 16, 2021

@lonvia I think with the rework of the query builder searching with post code is no longer working. For example the results of /api?q=289-1606 on the #585 PR(which is based on this PR) returns data totally different to the ones of photon.komoot.io which are quite accurate. Not sure if its a bug or if the support of post code search are going to be dropped.

@lonvia lonvia changed the title [WIP] Rework location bias and query builder in general Rework location bias and query builder in general Jun 16, 2021
@lonvia
Copy link
Collaborator Author

lonvia commented Jun 16, 2021

Postcode search has never really worked because we don't import the artificial postcode from Nominatim, see #310. Current master gives you some random result that would happen to have the postcode (a river in your example) but not really a postcode result. So I don't really see that as a regression.

@lonvia lonvia merged commit 22d8cf0 into komoot:master Jun 16, 2021
@lonvia lonvia deleted the location-bias-and-query-structure branch June 16, 2021 07:47
@lonvia
Copy link
Collaborator Author

lonvia commented Jun 16, 2021

I've merged this now and deployed it on https://photon.komoot.io. We'll see if there is feedback from the wider user community.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants