Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a search engine into Plume #324

Merged
merged 11 commits into from
Dec 2, 2018
Merged

Add a search engine into Plume #324

merged 11 commits into from
Dec 2, 2018

Conversation

trinity-1686a
Copy link
Contributor

@trinity-1686a trinity-1686a commented Nov 27, 2018

Add support for search through tantivy
fix #149

This is mostly done, current issues are:

  • query parser might be a bit intolerant, we should probably implement our own
  • closing Plume leave the database blocked, currently you have to run plm search unlock -p to unlock it.
  • frontend is done by me (which means it's probably not beautiful), and is not translated
  • advanced search is working, but yet undocumented (i.e. filtering by date could be done if you know enough tantivy, but I don't expect common user to)
  • As it has grown more than I anticipated, I should probably add some more tests

After upgrading to this, you will have to run plm search init -p <path to plume wokingdir> to initialize search database, otherwise Plume will panic at launch

Add a Tantivy based search engine to the model
Implement most required functions for it
Implement indexaction on insert, update and delete
Modify func args to get the indexer where required
Add subcommand to initialise, refill and unlock search db
Add default fields for search
Add new routes and templates for search and result
Implement FromFormValue for Page to reuse it on search result pagination
Add optional query parameters to paginate template's macro
Update to newer rocket_csrf, don't get csrf token on GET forms
@trinity-1686a trinity-1686a added C: Feature This is a new feature S: Ready for review This PR is ready to be reviewed A: Backend Code running on the server labels Nov 27, 2018
@codecov
Copy link

codecov bot commented Nov 27, 2018

Codecov Report

Merging #324 into master will increase coverage by 5.55%.
The diff coverage is 65.36%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #324      +/-   ##
==========================================
+ Coverage   22.42%   27.97%   +5.55%     
==========================================
  Files          55       61       +6     
  Lines        4883     6419    +1536     
==========================================
+ Hits         1095     1796     +701     
- Misses       3788     4623     +835
Impacted Files Coverage Δ
src/routes/user.rs 0% <0%> (ø) ⬆️
src/routes/mod.rs 0% <0%> (ø) ⬆️
src/routes/posts.rs 0% <0%> (ø) ⬆️
plume-cli/src/search.rs 0% <0%> (ø)
src/api/posts.rs 0% <0%> (ø) ⬆️
src/main.rs 0% <0%> (ø) ⬆️
src/inbox.rs 0% <0%> (ø) ⬆️
src/routes/likes.rs 0% <0%> (ø) ⬆️
src/routes/comments.rs 0% <0%> (ø) ⬆️
src/routes/reshares.rs 0% <0%> (ø) ⬆️
... and 35 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74c398d...6d83d5a. Read the comment docs.

Handle process termination
Add tests to search
Add an advanced search form to /search, in template and route
Modify Tantivy schema, add new tokenizer for some properties
Create new String query parser
Create Tantivy query AST from our own
@elegaanz elegaanz self-requested a review November 29, 2018 20:26
@trinity-1686a
Copy link
Contributor Author

trinity-1686a commented Nov 29, 2018

Small todo list of things that are not strictly needed atm, but should still be done:

  • split plume-models/src/search.rs into multiple files, under a new directory : it's "only" 670 lines, but they are very dense, and although all is related to search, having a sort of database connection, a query parser, and a tokenizer in the same file seems too much
  • comment the code, especially the parser
  • maybe make some part less generics but more logical. Currently if searching for author user1@plume1.org or user2@plume2.org, a post from user2@plume1.org would match.
  • add some tests at least for the parser
  • use ructe for the template, and maybe change how js change form on submit (currently going backward in history and re-submit has some issues)

@fulmicoton
Copy link

query parser might be a bit intolerant, we should probably implement our own

@fdb-hiroshima We have a dangling PR to add a lenient mode to search.

i.e. filtering by date could be done if you know enough tantivy, but I don't expect common user to)

@fdb-hiroshima Let me know if you have any question about tantivy.

@trinity-1686a
Copy link
Contributor Author

trinity-1686a commented Nov 30, 2018

@fulmicoton Both "issues" are fixed (you can see they are crossed out), I've implemented a parser which fit with our exact needs (we know more about what each field represent than Tantivy can).

Concerning dates, I don't have any question, it's just how we give date to tantivy is not evident to the end user, but tantivy can't change that. We store date as the result of num_days_from_ce(), basically the number of days since some point. This is because Tantivy's RangeQuery iterate over the terms within the range, so I tried to have the smallest ranges possible, with as few useless/empty elements as possible (iterating over 30 days vs 2,592,000 seconds, one seems faster than the other).

I really like your crate, it's very complete yet relatively easy to use. The only issue I had with it was about BooleanQuery, I had a hard time understanding how to build theme. I was looking for some kind of function, and took a lot of time to figure out it implement From. Maybe it could just simply be mentioned in the documentation of BooleanQuery::new_multiterms_query, "to create a BooleanQuery from Box<Query> use From<Vec<(Occur, Box)>> instead" or something like that

@fulmicoton
Copy link

Opened tantivy-search/tantivy#446 to make creation of BooleanQuery easier.

Your insight about range is correct. Eventually tantivy will be a bit smarter than what it does right now, but too be honest this is pretty low priority.

Looking forward to seeing Plume grow.

Split search.rs into multiple submodules
Add comments and tests for Query
Make user@domain be treated as one could assume
Copy link
Member

@elegaanz elegaanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to work well! Thank you! (and thank you @fulmicoton for tantivy as well)

plume-models/src/search/query.rs Outdated Show resolved Hide resolved
plume-models/src/search/query.rs Outdated Show resolved Hide resolved
templates/search/index.html.tera Show resolved Hide resolved
Fix panic when searching for blog without precising instance
Make gen_parser! more idiomatic
Change js a bit to fix issue with history
@trinity-1686a
Copy link
Contributor Author

@BaptisteGelez in what order do we merge search ructe and the dependency upgrade?

@elegaanz
Copy link
Member

elegaanz commented Dec 2, 2018

@fdb-hiroshima I think we could merge the search first, then rebase ructe on master and convert the search page to Ructe, and finally the dependencies? I don't know if there is a better way to avoid conflicts, but it seems to be the more logical way to do it for me…

@trinity-1686a
Copy link
Contributor Author

I'll do so

@trinity-1686a trinity-1686a merged commit 449641d into master Dec 2, 2018
@trinity-1686a
Copy link
Contributor Author

trinity-1686a commented Dec 2, 2018

There is one thing I forgot. There is nowhere a link to /search
And there is also no instruction to run plm before

@elegaanz
Copy link
Member

elegaanz commented Dec 2, 2018

You're right, but I'll a a link the header when converting the template to ructe. And updating the docs can be done in another PR.

@trinity-1686a
Copy link
Contributor Author

if you want you can make a small form with an <input name="q">, it is enough to make a simple query, and would probably fit better with the ergonomic people expect (being able to do a basic search from anywhere, and an advanced search if the result were inconclusive)

@trinity-1686a trinity-1686a deleted the search branch December 3, 2018 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: Backend Code running on the server C: Feature This is a new feature S: Ready for review This PR is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Search
3 participants