Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make facets work with Solr 7 #342

Merged
merged 13 commits into from
Oct 16, 2018
Merged

Make facets work with Solr 7 #342

merged 13 commits into from
Oct 16, 2018

Conversation

whikloj
Copy link
Member

@whikloj whikloj commented Jul 9, 2018

JIRA Ticket:

What does this Pull Request do?

We (U of Manitoba) want to upgrade the supporting software for Islandora, specifically Solr. We are trying with Solr 7 and found that date faceting has been removed. Instead we must use Range faceting only.

This PR adds in support for range facets. The framework was there so I just tried to fill in the bits and pieces.

What's new?

By default I pass facet.date parameters also as facet.range as I could not see how to tell whether a facet was a date or range type in the query_processor. It did not seem to have a noticeable effect.

Inside I just adjusted the facet results, to account for the differences but re-use most of the code.

I am assuming we will be supporting all versions of Solr from 3 to 100 so this probably needs a lot of testing, but I am also not convinced most people will upgrade their Solr so maybe the primary focus is to ensure the current functionality is maintained.

How should this be tested?

This is tough, I could not just point FedoraGSearch at Solr 7 as it now throws a stacktrace.

dk.defxws.fedoragsearch.server.errors.GenericSearchException: Mon Jul 09 20:17:08 UTC 2018 updateIndex NumberFormatException numFound=nseHeader
{  "responseHeader":{    "status":0,    "QTime":0,    "params":{      "q":"*:*",      "rows":"0"}},  "response":{"numFound":21,"start":0,"docs":[]  }}; nested exception is: 
	java.lang.NumberFormatException: For input string: "nseHeader"
	at dk.defxws.fgssolrremote.OperationsImpl.updateIndex(OperationsImpl.java:118)
	at dk.defxws.fedoragsearch.server.GenericOperationsImpl.updateIndex(GenericOperationsImpl.java:556)
	at dk.defxws.fedoragsearch.server.RESTImpl.updateIndex(RESTImpl.java:313)
	at dk.defxws.fedoragsearch.server.RESTImpl.doGet(RESTImpl.java:149)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:620)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.fcrepo.server.security.servletfilters.FilterSetup.doFilter(FilterSetup.java:247)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.fcrepo.server.security.servletfilters.FilterSetup.doFilter(FilterSetup.java:247)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.fcrepo.server.security.servletfilters.FilterSetup.doFilter(FilterSetup.java:247)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.fcrepo.server.security.servletfilters.FilterSetup.doFilter(FilterSetup.java:247)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1070)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: For input string: "nseHeader"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.parseInt(Integer.java:615)
	at dk.defxws.fgssolrremote.OperationsImpl.updateIndex(OperationsImpl.java:116)
	... 37 more

So I had to use my own indexer to get stuff into Solr, I didn't spend a lot of time debugging GSearch though so I might have missed a simple fix.

Anyways, my major concern is that this doesn't harm everyone else as I imagine the number of people upgrading their Solr is in the minority.

Additional Notes:

Any additional information that you think would be helpful when reviewing this PR.

Example:

  • Does this change require documentation to be updated? Probably, though from the user's side nothing has changed
  • Does this change add any new dependencies? no
  • Does this change require any other modifications to be made to the repository (ie. Regeneration activity, etc.)? no
  • Could this change impact execution of existing code? no

Interested parties

@Islandora/7-x-1-x-committers @DiegoPino @giancarlobi

Handle new Solr date type field

Add both date and range parameters
Copy link

@DiegoPino DiegoPino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments for discussion

@@ -209,15 +209,19 @@ class IslandoraSolrQueryProcessor {
$gap = $value['solr_field_settings']['range_facet_gap'];
// Add date facet.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment here needs an update

@@ -232,6 +236,9 @@ class IslandoraSolrQueryProcessor {
$params_date_facets["facet.date.start"] = 'NOW/YEAR-20YEARS';

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question here: seems like facet.range was introduced in Solr 3.3+, so is there really a need to maintain deprecated on solr 6.6 facet.date piece? can we just get rid of it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I just assume that someone somewhere is relying on a very old piece.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, cool we can discuss this on our next committers call. Won't block this pull but good to have consensus. Probably more needed to be really Solr 7 compliant in the future

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost... wondering if we should instead move to a factory, to facilitate the use of different implementations for different versions of Solr... the version of Solr can be requested of Solr relatively easily, hitting:
http://whatever:port/solr/admin/info/system?wt=json, giving something like (added in indent=true to the query parameters for formatting):

{
  "responseHeader":{
    "status":0,
    "QTime":5},
  "mode":"std",
  "lucene":{
    "solr-spec-version":"4.10.4",
    "solr-impl-version":"4.10.4 1662817 - mike - 2015-02-27 16:43:07",
    "lucene-spec-version":"4.10.4",
    "lucene-impl-version":"4.10.4 1662817 - mike - 2015-02-27 16:38:43"},
[... a bunch of environment stuff ...]}

Would we then want to make IslandoraSolrQueryProcessor class just proxy out to whatever target implementation?

... Not sure... maybe going too far with abstraction fun, especially given this module supposedly just "supports" Solr 4.2+.

@@ -240,7 +247,7 @@ class IslandoraSolrQueryProcessor {
$default_sort = (variable_get('islandora_solr_facet_max_limit', '20') <= 0 ? 'index' : 'count');

$facet_sort_array = array();
foreach (array_merge($facet_array, $facet_dates) as $key => $value) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like something is missing here? i see foreach happens on facet_array now but i don´t see any changed logic that supports that change further up. I mean https://github.com/whikloj/islandora_solr_search/blob/5031f4ff93de89464cd3ec0e69831ee2a0dc9a04/includes/query_processor.inc#L201 is still there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See I added this back in PR-300 and I realized here that $facet_dates are already in $facet_array and this merge actually doesn't do anything.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense, but looks like another ticket to me. Not saying it is not good to fix since you are here, but good to document when/why this change happened somewhere. So $facet_dates is empty? Not needed at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with new ticket to resolve this https://jira.duraspace.org/browse/ISLANDORA-2271

@@ -697,7 +697,8 @@ class IslandoraSolrFacets {
public static function init($islandora_solr_query) {
self::$islandoraSolrQuery = $islandora_solr_query;
self::$facet_fields = isset($islandora_solr_query->islandoraSolrResult) ? $islandora_solr_query->islandoraSolrResult['facet_counts']['facet_fields'] : array();
self::$facet_dates = isset($islandora_solr_query->islandoraSolrResult) ? $islandora_solr_query->islandoraSolrResult['facet_counts']['facet_dates'] : array();
// XXX: isset() checking, as newer Solrs (7) won't return a value.
self::$facet_dates = isset($islandora_solr_query->islandoraSolrResult['facet_counts']['facet_dates']) ? $islandora_solr_query->islandoraSolrResult['facet_counts']['facet_dates'] : array();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous review, maybe we can just use ranges

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer as above, but if people want me to try removing the date stuff I can certainly do that. Not sure if that should be part of this or separate though?

@@ -634,6 +634,7 @@ function islandora_solr_is_date_field($solr_field) {
$date_types = array(
'org.apache.solr.schema.DateField',
'org.apache.solr.schema.TrieDateField',
'org.apache.solr.schema.DatePointField',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Solr 5.x there are also other changes in field types relevant to dates and ranges. One in particular for Multi valued dates i care for is
https://lucene.apache.org/solr/6_4_0/solr-core/org/apache/solr/schema/DateRangeField.html

Copy link

@DiegoPino DiegoPino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jared. Looks good, concerns addressed except the thing with that array merge where i would love to see a ticket or some note here saying why it is being removed. Will test with Solr 4.10 and 5.5 later tomorrow and then approve (if i approve too fast i'm pretty sure someone will just merge, has happened to me a lot this week and i really want to test with date data). Sorry for the delay but Solr is corner stone of islandora. Thanks a lot

Copy link

@adam-vessey adam-vessey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm... not sure mapping ranges directly as dates is quite appropriate... there could be other data types for which range faceting might be useful; otherwise, a couple of little comments.

@@ -697,7 +697,8 @@ class IslandoraSolrFacets {
public static function init($islandora_solr_query) {
self::$islandoraSolrQuery = $islandora_solr_query;
self::$facet_fields = isset($islandora_solr_query->islandoraSolrResult) ? $islandora_solr_query->islandoraSolrResult['facet_counts']['facet_fields'] : array();
self::$facet_dates = isset($islandora_solr_query->islandoraSolrResult) ? $islandora_solr_query->islandoraSolrResult['facet_counts']['facet_dates'] : array();
// XXX: isset() checking, as newer Solrs (7) won't return a value.
self::$facet_dates = isset($islandora_solr_query->islandoraSolrResult['facet_counts']['facet_dates']) ? $islandora_solr_query->islandoraSolrResult['facet_counts']['facet_dates'] : array();
// Not in place yet.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is no longer relevant?

*/
public function processFacetRanges() {

$this->processFacetDates();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure this is appropriate, as range fields can cover things that are not dates... integers or other numerical data... but here, we do some pretty explicit date formatting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not disagreeing with what you are saying here. Just that some range formatting seems better than none. Once we have someone using this with Solr 6+ it will be easier to see the holes. One suggestion would be to make a field a facet_date if its a facet_range but on a islandora_solr_is_date_field() field.

@@ -232,6 +236,9 @@ class IslandoraSolrQueryProcessor {
$params_date_facets["facet.date.start"] = 'NOW/YEAR-20YEARS';
$params_date_facets["facet.date.end"] = 'NOW';
$params_date_facets["facet.date.gap"] = '+1YEAR';
$params_date_facets["facet.range.start"] = 'NOW/YEAR-20YEARS';
$params_date_facets["facet.range.end"] = 'NOW';
$params_date_facets["facet.range.gap"] = '+1YEAR';

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure these defaults are appropriate... do we have to do some form of detection, to determine if date-related parameters are appropriate? Also, brings up the question of what would be a reasonable default, otherwise, for numerical data and the like.

$params_date_facets["facet.date"][] = $field;
$params_date_facets["facet.range"][] = $field;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calculation of both where both are supported seems like an unnecessary strain on Solr.

@@ -232,6 +236,9 @@ class IslandoraSolrQueryProcessor {
$params_date_facets["facet.date.start"] = 'NOW/YEAR-20YEARS';

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost... wondering if we should instead move to a factory, to facilitate the use of different implementations for different versions of Solr... the version of Solr can be requested of Solr relatively easily, hitting:
http://whatever:port/solr/admin/info/system?wt=json, giving something like (added in indent=true to the query parameters for formatting):

{
  "responseHeader":{
    "status":0,
    "QTime":5},
  "mode":"std",
  "lucene":{
    "solr-spec-version":"4.10.4",
    "solr-impl-version":"4.10.4 1662817 - mike - 2015-02-27 16:43:07",
    "lucene-spec-version":"4.10.4",
    "lucene-impl-version":"4.10.4 1662817 - mike - 2015-02-27 16:38:43"},
[... a bunch of environment stuff ...]}

Would we then want to make IslandoraSolrQueryProcessor class just proxy out to whatever target implementation?

... Not sure... maybe going too far with abstraction fun, especially given this module supposedly just "supports" Solr 4.2+.

@whikloj
Copy link
Member Author

whikloj commented Aug 13, 2018

@adam-vessey you bring up some good points. I was not considering using a numerical field as a range, so that would cause a problem. I think I can handle that by checking the field type to see if it is a date type before applying the date formatting. As for the idea of a factory, I do like that idea in general I'm just not sure if 1) I'm the best person to try and implement it and 2) how much deviation there is between the various versions of Solr (especially the subset that Islandorians use).

@whikloj
Copy link
Member Author

whikloj commented Oct 5, 2018

I've expanded on this a bit and hopefully cleaned up some concerns.

I have reverted using the processFacetRanges() and left it empty and instead push facet_range types that are also date fields straight into the processFacetDates() function. So when someone wants to make an integer facet range, they can write the code for that.

Alternatively if this seems too trouble some there is a much larger and definitely tougher PR (#345) which I created. I am trying to write a Solarium backend for it, but due to the way we let people mess around inside the IslandoraSolrQueryProcessor it is problematic.

Copy link

@adam-vessey adam-vessey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good... might've went and did a thing (more specifically, one commit; would want to firm up some other things before going further with it), but... appeared to do the trick... Both Solr 4.10.4 and 7.5...

Could easily live with private for these... (could always widen the scope later, should it be needed)... What do we think?

@@ -38,6 +38,8 @@ class IslandoraSolrQueryProcessor {
'',
);

private $solrVersion;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should possibly be protected instead of private... facilitate extensibility?

* @return bool
* Whether we know your Solr version and its below 6.
*/
private function solrHasDateFacets() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same deal, private vs protected.

* @var bool
* Whether it is.
*/
private $date_range = FALSE;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same deal, private vs protected.

@whikloj
Copy link
Member Author

whikloj commented Oct 9, 2018

@adam-vessey the specific commit does not seem to be accessible by me. Though by the first link that appears to be around getting GSearch to work with newer versions of Solr, yes?

@whikloj
Copy link
Member Author

whikloj commented Oct 9, 2018

As for protected versus private, I have no problem with going to protected. If we want to go to public then I'd want to add a getter/setter to make it easier if we need to alter the insides in future.

@adam-vessey
Copy link

@whikloj: Whoops on the commit... accidentally added a 0 in making the link (instead of the closing parenthesis... then added the closing parenthesis, I'm assuming?) Fixed, but also here: discoverygarden/gsearch@6e88084

Yeah, that's pretty much it... newer Java (for the Base64 stuff), newer Solr, and a bit of an improvement to allow for relative HREFs in XSLTs (which still needs to be put through its paces).

@bondjimbond
Copy link

@adam-vessey You approved this last week, still unmerged... is it good to go?

@adam-vessey adam-vessey merged commit aa9d0f7 into Islandora:7.x Oct 16, 2018
@whikloj whikloj deleted the 7.x-ISLANDORA-2201 branch October 16, 2018 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants