Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid casting a tsvector column to text and back again #247

Closed
wants to merge 1 commit into from

Conversation

savef
Copy link

@savef savef commented May 19, 2015

When you have a tsvector column but want to search against both it and other text columns (without having to deal with triggers) the :tsvector_column option is no use as it causes everything in :against to be ignored. Also since a column's type is available I don't see why the need to specify something as a tsvector column. This change allows you to simply pass column names to :against, and if any of those are tsvectors then they will be used directly, otherwise they will be converted to tsvectors as before.

My data set is over 900 records and as well as the document I search 4 additional small text fields, also there is an postgres gin index being used. On a query that matched just 4 results the time improved from 40ms to 3ms. And on a query that matched 397 results the time improved from over 3000ms to 60ms.

I think this essentially solves the problem that the :tsvector_column option tries to solve, but automatically. It takes the columns in :against and if they are a tsvector it does not cast them (to text, and back to tsvector). I think the only slight difference is that with this commit the column will still be coalesced with "".

Here's what the relevant part of an example query looks like:
setweight(to_tsvector('english', coalesce("documents"."title"::text, '')), 'A') || setweight(to_tsvector('english', coalesce("documents"."author"::text, '')), 'B') || setweight(coalesce("documents"."content_tsv", ''), 'C')


Thanks for reading, if this is something you'll consider merging here are a couple of followup questions.

If you agree that this renders the :tsvector_column option obsolete is it worth removing that feature as part of this PR?

Do the changes here apply to either trigram or dmetaphone searching? I don't use those but would look at improving them too if the changes are relevant.

When you have a tsvector column but want to search against both it and other text columns (without having to deal with triggers) the :tsvector_column option is no use as it causes everything in :against to be ignored. Also since a column's type is available I don't see why the need to specify something as a tsvector column. This change allows you to simply pass column names to :against, and if any of those are tsvectors then they will be used directly, otherwise they will be converted to tsvectors as before.

My data set is over 900 records and as well as the document I search 4 additional small text fields, also there is an postgres gin index being used. On a query that matched just 4 results the speed improved from 40ms to 3ms. And on a query that matched 397 results the speed improved from over 3000ms to 60ms.
@savef
Copy link
Author

savef commented Jun 16, 2015

@nertzy Hey, would it be possible for me to get any feedback on this? Thanks!

@savef
Copy link
Author

savef commented Aug 24, 2015

I want this to point to a different branch now, I'll open a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant