Escaping colons for Solr - fields erroneously detected #84
Comments
Hi @AlexAddisonLN, thanks for entering this. I'm not sure I understand your case. Could you please expand a bit?
Where a query has a colon in it, Solr is detecting it as a field and returning an error. The ratings file entry would look like this:
and the error looks like this:
In our case, searches containing colons most likely occur because the query is a document title from our data. It's not clear to me that we should remove punctuation altogether, as we may at some point have to deal with citations, which can contain a wide range of punctuation. While we can do some work to escape punctuation in our rated queries, I was wondering whether it would be appropriate to do this escaping in RRE instead, perhaps controlled by a command-line flag or a line in the pom file.
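For illustration, here is a minimal sketch of the kind of escaping being discussed. It is written in Python purely for brevity and is not RRE code; the function name `escape_query_text`, the character set, and the sample title are assumptions. The character list follows the special characters of the Lucene classic query parser, each of which is escaped with a backslash.

```python
# Characters the Lucene classic query parser treats as syntax
# (assumed list for this sketch; adjust to the engine in use).
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_query_text(text: str) -> str:
    """Prefix every special character in `text` with a backslash."""
    return "".join(
        "\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch
        for ch in text
    )


# Hypothetical document title used as a query string.
print(escape_query_text("Star Wars: A New Hope"))
# prints: Star Wars\: A New Hope
```

With the colon escaped, the parser should no longer read `Wars` as a field name. SolrJ ships a comparable helper, `ClientUtils.escapeQueryChars`, which could be a starting point for a Java implementation.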
Great, so I agree with you: we need to escape such content. Thanks again.
In that case, do you want me to work on it and submit a pull request when I think it's ready? I've been warned that it may not be possible or practical to fix. I've also been told that the same issue applies to Elasticsearch, although I don't have any personal experience of that. I'm also concerned that addressing it may be a breaking change unless we put it behind a flag of some form.
I'd be in favor of supporting query escaping in RRE via configuration, but with this functionality turned off by default. Lucene supports special characters for specifying various clauses, and I think it'll cause headaches for users with such queries if we escape by default.
I see - escaping every string that is injected into the template is not necessarily correct, as the user has freedom to put any string for any purpose in there. For simple purposes, like ours, escaping it is the right thing to do, but someone out there is guaranteed to have all of their stuff broken by this if it's turned on by default, and to make matters worse, it will be much harder for them to debug, because it's the complex cases that would break. I can see three ways that we could present this change:
I think my favourite would be a global flag to enable placeholder escaping, either in the pom.xml or the ratings file.
We had some more discussion on this and think a flag for Engine-query mode vs User-query mode would be a good way to define it. Engine-query mode (the default) would not alter the query in any way. User-query mode would escape punctuation. This is pretty much the same as Alex's proposal, but it gives the use case a formal definition.
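To make the two modes concrete, here is a rough sketch of how such a flag might behave. Everything in it is hypothetical: the names `QueryMode` and `fill_template`, the `$query` placeholder, and the template string are assumptions for illustration, not the RRE API. The point it tries to show is that only the injected value is escaped, never the template itself.

```python
from enum import Enum


class QueryMode(Enum):
    ENGINE = "engine"  # default: placeholder values are injected untouched
    USER = "user"      # placeholder values are escaped before injection


# Assumed list of Lucene classic query parser special characters.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape(text: str) -> str:
    """Backslash-escape every special character in `text`."""
    return "".join("\\" + c if c in SPECIAL_CHARS else c for c in text)


def fill_template(template: str, value: str,
                  mode: QueryMode = QueryMode.ENGINE) -> str:
    """Substitute `value` for the (assumed) `$query` placeholder."""
    if mode is QueryMode.USER:
        value = escape(value)
    return template.replace("$query", value)


template = "title:($query)"  # hypothetical query template
print(fill_template(template, "Hamlet: Act 1"))
# prints: title:(Hamlet: Act 1)
print(fill_template(template, "Hamlet: Act 1", QueryMode.USER))
# prints: title:(Hamlet\: Act 1)
```

In Engine-query mode the output is exactly what the user supplied, so existing ratings files that rely on Lucene syntax keep working. In User-query mode the field colon written in the template survives, while the colon inside the user's text is escaped.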
We keep running into failures where Solr thinks we're specifying a field. While we could escape our data to avoid this, it's not trivial and the likely case is that this is happening accidentally for other users as well.
Would it be desirable to escape colons for Solr by default?
Would it be more desirable to add a flag to specify that colons should be escaped for Solr?
Happy to put the work in, but not sure what would be the most desirable behaviour.