Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add full text search for queries based on the Postgres tsvector type. #2041

Merged
merged 1 commit into from
Jan 11, 2018

Conversation

jezdez
Copy link
Member

@jezdez jezdez commented Oct 17, 2017

This needs some more tests with a prod like database and probably another
look for someone with more experience with custom alembic migrations.

This is related to #1400 of course and uses SQLAlchemy-searchable.

@jezdez jezdez force-pushed the fulltext-search branch 2 times, most recently from 2d70074 to 6164efa Compare October 17, 2017 21:47
def downgrade():
conn = op.get_bind()

ss.drop_trigger(conn, 'article', 'search_vector')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

article?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes!

@@ -7,6 +7,8 @@
def create_tables():
"""Create the database tables."""
from redash.models import db
# make sure the additional search mappers are configured
db.configure_mappers()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the search mappers?

Copy link
Member Author

@jezdez jezdez Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is misleading, this basically makes sure that all mappers that have been registered with the SQLAlchemy ORM have been properly configured. Since SQLAlchemy-Searchable installs database triggers that have to act on when ORM model instances are created (e.g. to track updates), this call is needed.

Here's what the SA-Searchable docs say:

It is very important here that you either access your searchable class or call configure_mappers before the creation of tables. SA-Searchable adds DDL listeners on the configuration phase of models.

Since this code happens in the phase in which tables are created, we can't really access the searchable class, but have to call configure_mappers. I'll update the comment to reflect this.

redash/models.py Outdated
where |= Query.id == term

where &= Query.is_archived == False
def search_queries(cls, term, group_ids, include_drafts=False, limit=20):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why append _queries? It's a method on the Query class isn't it obvious that it searches for queries? And in the handlers code you still call .search...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, yeah, this is just a leftover from figuring out how to use the search method that the SearchQueryMixin adds to the query_class of the Query model. Will fix.

redash/models.py Outdated
from sqlalchemy.orm.exc import NoResultFound # noqa: F401
from sqlalchemy.types import TypeDecorator
from functools import reduce
from sqlalchemy_searchable import SearchQueryMixin, make_searchable, search
from sqlalchemy_utils.types import TSVectorType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I mind much about adding sqlalchemy_utils as a dependency, but doesn't sqlalchemy_searchable provide something for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the SQLAlchemy-Utils package is from the same author as SQLAlchemy-Searchable and contains a whole bunch of things.

The code behind TSVectorType is trivial relatively speaking, so I guess we could also just ship our own. The only risk I see with this is that we could trade one dependency less for having to keep an eye on changes in SQLAlchemy-Searchable that require changes in the TSVectorType class as well.

On the other hand, although quite out of scope of this PR, is the question if there are things in SQLAlchemy-Utils that may be useful for Redash in general, given its swiss-army-knife-ness. I'm fine with whatever way you want to go with this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 ok, let's use it. No need to find more places to use it, we can gradually introduce it where applicable.

nullable=True))

ss.sync_trigger(conn, 'queries', 'search_vector',
['name', 'description', 'query'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this also update all existing queries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looking at the docs it says: "Updates all rows for given search vector by running a column=column update query for given table."

I've also manually added a bunch of queries against Redash's own Postgres database in the master branch, then switch to this branch, did the Alembic upgrade and then was able to use the search feature right away.

conn = op.get_bind()

ss.drop_trigger(conn, 'queries', 'search_vector')
op.drop_index('ix_queries_search_vector', table_name='queries')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a byproduct of ss.sync_trigger?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, although weirdly it raised a ProgrammingError exception when I tried roll it back, will look into this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an oversight when setting up SA-Searchable. The migration that I fixed in eb6966f now reflects adding an index for the search vector column. Another call to db.configure_mappers() was needed to have the mapper processor of SA-Searchable to correctly add the index.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Another call to db.configure_mappers() was needed

It seems that it's at the top level of redash.models. So I guess the one in redash.cli.database is redundant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True indeed, good catch, removed it in 36aeef7.

@jezdez jezdez force-pushed the fulltext-search branch 2 times, most recently from 36aeef7 to 294cff0 Compare October 26, 2017 16:16
@jezdez
Copy link
Member Author

jezdez commented Oct 26, 2017

@arikfr Just added a bunch of tests for the more advanced search queries as described on https://sqlalchemy-searchable.readthedocs.io/en/latest/search_query_parser.html.

@arikfr
Copy link
Member

arikfr commented Nov 22, 2017

@jezdez this is finalyl ready to be merged, but I think I saw some fixes you did to this code in Mozilla's fork. Are they present here already or this need an update?

Thanks.

@jezdez
Copy link
Member Author

jezdez commented Nov 22, 2017

@arikfr They aren't in this PR yet, I'll get them in today and let you know. Thanks for reaching out again!

@arikfr
Copy link
Member

arikfr commented Nov 22, 2017

Super, thanks!

@jezdez jezdez force-pushed the fulltext-search branch 2 times, most recently from f80e088 to 388ec5e Compare January 8, 2018 20:46
@jezdez
Copy link
Member Author

jezdez commented Jan 8, 2018

@arikfr Okay, I've updated the PR with the recent things in our fork and updated the design to match the query-list and my-queries page as well. Does this work?

screen-shot-2018-01-08-22-48-58

@jezdez
Copy link
Member Author

jezdez commented Jan 8, 2018

@arikfr BTW, apologies for taking so long :(

@jezdez
Copy link
Member Author

jezdez commented Jan 10, 2018

@arikfr Any idea how we can retrigger the code climate check? It seems to be stuck.

jezdez added a commit to mozilla/redash that referenced this pull request Jan 10, 2018
@arikfr arikfr merged commit 691dc6a into getredash:master Jan 11, 2018
@arikfr
Copy link
Member

arikfr commented Jan 11, 2018

Thanks ! 👍

dairyo pushed a commit to KiiCorp/redash that referenced this pull request Mar 1, 2019
Add full text search for queries based on the Postgres tsvector type.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants