feat: back-end implementation of ranked link search #210
Merged
JasonChong96 changed the title from "feat: back-end implementation of link search" to "feat: back-end implementation of ranked link search" on Jun 19, 2020
JasonChong96 force-pushed the search-data-collection branch from 8cd43b6 to 0cd76e2 on June 22, 2020 07:52
JasonChong96 force-pushed the search-phase-1 branch 2 times, most recently from 791722a to 1082347 on June 23, 2020 09:50
JasonChong96 force-pushed the search-phase-1 branch from cea5a39 to a2b19d8 on June 24, 2020 07:28
JasonChong96 force-pushed the search-phase-1 branch from db763e4 to 8261d1c on June 25, 2020 04:59
JasonChong96 force-pushed the search-phase-1 branch from 3f232b1 to ac87c21 on June 25, 2020 07:20
liangyuanruo requested changes on Jun 26, 2020
changes as commented - largely lgtm otherwise!
liangyuanruo approved these changes on Jun 29, 2020
to address conflicts before merging
LoneRifle pushed a commit that referenced this pull request on Jun 30, 2020
* feat: endpoint for url search
* fix: remove redundant log
* fix: inappropriate error message
* feat: rate limiting on search endpoint
* refactor: use table name from orm
* feat: hide link clicks from search response
* feat: support different search orders
* fix: update comments
* fix: imports
* fix: search order validation
* feat: add unit test for search controller
* fix: test request using wrong params
* feat: search ignores inactive links
* refactor: move stripping of clicks to service layer
* feat: additional tests for new methods
* feat: add more tests for textsearch
* refactor: remove redundant coalesce
* fix: error in sql statement for recency sort
* docs: add comment explaining ts_rank_cd normalization
* refactor: extract helper methods from search
* fix: packagelock
* fix: use more reasonable default limit
* fix: typo in documentation
* refactor: capitalize sql keywords
* fix: count including inactive urls and not using index
* feat: rate limit use real ip and logs when limit is reached
* fix: formatting
Problem
"Members of the public (MOPs) view go.gov.sg as a central hub for accessing government resources. By providing a link search feature, we will be able to better direct MOPs to access the resources that they require."
Solution
This PR continues the implementation by providing the back-end API for fetching links using plain text queries.
Features:
* `api/search/urls` has been added for ranked plain text search with support for pagination.
* As with `api/user/url`, the response contains the total count of matching urls and the urls within the requested range.
* Text ranking uses `ts_rank_cd`, which takes into account how far apart query terms are found in the urls. The further apart they are, the lower the ranking.
* Search ignores `INACTIVE` urls, and they are not included in the partial inverted index used for search.
* Text ranks are normalized by `1 / log(doc length + 1)`. This is due to the assumption that if there are fewer words that do not match the query, then the terms are more important in the entry, making it more likely to be relevant to the user's query.

The relevance ranking algorithm used is as follows:
(text ranking by PostgreSQL) * log(1 + clickCount)
Notes: The 1 is added to click counts so that a clickCount of 0 does not cause an error, since log(0) is undefined. The assumption is that more popular links are more likely to be relevant to users' queries.
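The behaviour described in the features and ranking formula above can be sketched in memory as follows. This is only an illustration under stated assumptions, not the code in this PR: `UrlRecord`, `UrlSearchResult`, `relevance` and `searchUrls` are hypothetical names, `textRank` stands in for the value PostgreSQL's `ts_rank_cd` would return, and the natural logarithm is assumed because the description does not state a base.

```typescript
// Hypothetical record shape; the real model's field names may differ.
interface UrlRecord {
  shortUrl: string
  longUrl: string
  state: 'ACTIVE' | 'INACTIVE'
  clicks: number
  // Stand-in for the text rank PostgreSQL would compute; 0 means no match.
  textRank: number
}

// What a search response exposes: click counts are stripped.
interface UrlSearchResult {
  shortUrl: string
  longUrl: string
}

// (text ranking by PostgreSQL) * log(1 + clickCount), natural log assumed.
function relevance(textRank: number, clicks: number): number {
  return textRank * Math.log(1 + clicks)
}

// Returns the total number of matches plus the requested page of results.
function searchUrls(
  records: UrlRecord[],
  limit: number,
  offset: number,
): { count: number; urls: UrlSearchResult[] } {
  const ranked = records
    // INACTIVE urls and non-matching urls never appear in results.
    .filter((r) => r.state === 'ACTIVE' && r.textRank > 0)
    // Highest relevance first.
    .sort(
      (a, b) =>
        relevance(b.textRank, b.clicks) - relevance(a.textRank, a.clicks),
    )
  const urls = ranked
    .slice(offset, offset + limit)
    .map((r) => ({ shortUrl: r.shortUrl, longUrl: r.longUrl }))
  return { count: ranked.length, urls }
}
```

Two consequences of the formula are visible in this sketch: the base of the logarithm only rescales every score by the same constant, so it does not change the ordering, and a link with zero clicks scores 0 whatever its text rank.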
Additional notes
A request to the endpoint requires two separate database queries. This mimics the behavior of Sequelize's `findAndCountAll`, which we use for `api/user/url` to support pagination.
Deploy Notes
Dependencies
* `express-rate-limit`: rate limiter middleware for api endpoints
* `@types/express-rate-limit`: type definitions for `express-rate-limit`
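For readers unfamiliar with what a rate limiter does per client, here is a minimal hand-rolled fixed-window counter keyed by IP address. It is not the implementation of `express-rate-limit` and not the configuration used in this PR; `FixedWindowLimiter` and its parameters are hypothetical names chosen for illustration.

```typescript
// Illustrative fixed-window limiter: at most `max` requests per key
// within each window of `windowMs` milliseconds.
class FixedWindowLimiter {
  private readonly windowMs: number
  private readonly max: number
  private readonly hits = new Map<string, { count: number; windowStart: number }>()

  constructor(windowMs: number, max: number) {
    this.windowMs = windowMs
    this.max = max
  }

  // Returns true if the request identified by `key` (e.g. the client's
  // real IP) is allowed at time `now`, false once the limit is exceeded.
  allow(key: string, now: number = Date.now()): boolean {
    let entry = this.hits.get(key)
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // Start a fresh window for this key.
      entry = { count: 0, windowStart: now }
      this.hits.set(key, entry)
    }
    entry.count += 1
    return entry.count <= this.max
  }
}
```

A middleware built on something like this would typically reject the request and log the event when `allow` returns false, which matches the "logs when limit is reached" commit in spirit only.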
TODO:
Full documentation of this feature will be done on the wiki asap
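Until that documentation exists, the two-query pattern mentioned under Additional notes can be sketched as a pair of parameterized statements: one counts every match, the other fetches the requested page. This is a sketch under assumptions, not the SQL in this PR: the `urls` table, its `search_vector`, `state`, `clicks`, `"shortUrl"` and `"longUrl"` columns, and the `buildSearchQueries` helper are all hypothetical, and normalization flag `1` is assumed to correspond to the document-length normalization described above.

```typescript
interface Query {
  sql: string
  params: Array<string | number>
}

// Builds the count query and the page query for one search request.
// Table and column names are hypothetical placeholders.
function buildSearchQueries(
  searchText: string,
  limit: number,
  offset: number,
): { count: Query; rows: Query } {
  // Shared FROM/WHERE so both queries agree on what counts as a match.
  const matching = `
    FROM urls, plainto_tsquery('english', $1) AS query
    WHERE urls.state = 'ACTIVE' AND urls.search_vector @@ query`

  return {
    count: {
      sql: `SELECT COUNT(*) ${matching}`,
      params: [searchText],
    },
    rows: {
      sql: `SELECT urls."shortUrl", urls."longUrl" ${matching}
    ORDER BY ts_rank_cd(urls.search_vector, query, 1) * LN(1 + urls.clicks) DESC
    LIMIT $2 OFFSET $3`,
      params: [searchText, limit, offset],
    },
  }
}
```

Keeping the match condition in one shared fragment is a design choice of this sketch: it makes it harder for the count and the page to disagree, for example by one of them including inactive urls.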