documents: add more facets for documents search #2953

Merged: 1 commit, merged Aug 24, 2022

@vgranata (Contributor) commented Jun 9, 2022

  • Added more facets for documents search
  • Implemented post filters
  • This PR requires re-indexing the documents

Co-Authored-by: Valeria Granata valeria@chaw.com

Why are you opening this PR?

Closes #2763

Dependencies

My PR depends on the following rero-ils-ui's PR(s):

How to test

Please test:

  • for the OR facets (listed in the US), it is possible to select multiple values of the same facet.
    Example: several values of Document type can be selected at once.
  • the results count and the facet counts change according to the selection made.
  • a list of chosen filters is displayed. A filter can be removed by clicking on the X next to its name.
  • when there are no results, a banner and a count of 0 results are displayed.

Note: an AND is applied between values of different facets.
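
The AND/OR rule above can be checked with a small in-memory sketch. It is plain Python, not the Elasticsearch code of this PR, and the helper `doc_matches` and the field names `document_type` and `language` are hypothetical: selected values within one facet are OR-ed, and the facets themselves are AND-ed.

```python
def doc_matches(doc: dict, selected: dict) -> bool:
    """Return True if ``doc`` satisfies every facet that has a selection.

    ``selected`` maps a facet field to the values ticked by the user
    (hypothetical field names, for illustration only).
    """
    for field, values in selected.items():
        doc_values = doc.get(field, [])
        # OR within a facet: one matching value is enough
        if not any(value in doc_values for value in values):
            # AND between facets: failing a single facet rejects the document
            return False
    return True


docs = [
    {"document_type": ["book"], "language": ["fre"]},
    {"document_type": ["ebook"], "language": ["ger"]},
    {"document_type": ["journal"], "language": ["fre"]},
]
# Two values of "Document type" (OR) combined with one language (AND)
selection = {"document_type": ["book", "ebook"], "language": ["fre"]}
print([d["document_type"][0] for d in docs if doc_matches(d, selection)])
# ['book']
```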

@github-actions github-actions bot added dev: fixtures Fixtures data used for ils.test and ilsdev.test f: data migration Data migration from a legacy system or a previous version f: search labels Jun 9, 2022
@vgranata force-pushed the grv-2763-improve-facets branch from f7eccfd to 43af519 on June 12, 2022 19:18
@vgranata force-pushed the grv-2763-improve-facets branch 2 times, most recently from 1b85862 to 98dd50a on June 21, 2022 12:33
@vgranata vgranata marked this pull request as ready for review June 21, 2022 14:12
@vgranata vgranata requested review from jma, zannkukai and Garfield-fr and removed request for zannkukai June 21, 2022 14:13
@jma (Contributor) left a comment

As this task is a bit complex, it would be great to add comments. You can also put more details in your commit message. This commit adds AND/OR facets, which are quite complex. Adding tests to check AND/OR combinations would add value: this can be done during the PO tests to save time.

@@ -1956,30 +1958,67 @@ def _(x):
)
)
),
subject=dict(
subject_fiction=dict(
Contributor

The facets fiction and no_fiction seem identical; the facet filter is missing.

Contributor Author

Filters are dynamically added using the function _facet_filter.

"""Create a filter DSL expression."""
filters = []
filters_group = {}
for name, filter_factory in definitions.items():
Contributor

Some comments would be welcome.

Contributor

As @jma said, it is very hard to understand how this method works. I would suggest adding plenty of comments, perhaps with some examples.
Also, please add a docstring describing the method parameters.
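
Along the lines requested above, here is a commented sketch of what a documented `_create_filter_dsl`-style helper could look like. It is a simplification under stated assumptions, not the PR's code: selections come from a plain `dict` instead of the request, the `urlkwargs` bookkeeping is left out, and `create_filter_dsl`, `terms_factory` and the field paths are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical filter factories: each turns the selected values of one facet
# into an Elasticsearch query clause (a plain dict here).
Definitions = Dict[str, Callable[[List[str]], dict]]


def create_filter_dsl(
    args: Dict[str, List[str]], definitions: Definitions
) -> Tuple[List[dict], Dict[str, dict]]:
    """Build filter clauses from the selected facet values.

    :param args: facet name -> list of selected values (stands in for the
        multi-valued request arguments).
    :param definitions: facet name -> factory building a query clause
        from the selected values.
    :returns: ``(filters, filters_group)``: ``filters`` lists every clause,
        ``filters_group`` keeps the same clauses keyed by facet name so that
        each facet aggregation can later leave out its own selection.
    """
    filters = []
    filters_group = {}
    for name, filter_factory in definitions.items():
        values = args.get(name, [])
        if not values:
            continue  # facet not selected by the user
        clause = filter_factory(values)
        filters.append(clause)
        filters_group[name] = clause
    return filters, filters_group


def terms_factory(field: str) -> Callable[[List[str]], dict]:
    """Return a factory producing an OR ``terms`` clause on ``field``."""
    return lambda values: {"terms": {field: values}}


definitions = {
    "document_type": terms_factory("type.main_type"),
    "language": terms_factory("language.value"),
}
filters, groups = create_filter_dsl({"language": ["fre", "ger"]}, definitions)
print(groups)
# {'language': {'terms': {'language.value': ['fre', 'ger']}}}
```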

q = Q('bool', should=filter_)
s = s.query(q)

if facet_name == 'subject_fiction':
Contributor

I dislike this approach of hard-coding a facet name here. Would it be possible to define the facet filter here instead:
https://github.com/rero/rero-ils/pull/2953/files#diff-6be28a31924cc25a9c440ab46eedfa749b3b94ec7a6c5bb3021f76530ac95e4aL1959

Contributor

I think you could explore the "filter aggregation": https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html

Something like:

"aggs": {
    "subject_fiction": {
      "filter": { "terms": { "genreForm.identifiedBy.value": ["A027757308", "A021097366"] } },
      "aggs": {
        "subject": { "terms": { "field": "facet_subjects" } }
      }
    },
    "subject_no_fiction": {
      "filter": { "bool": { "must_not": { "terms": { "genreForm.identifiedBy.value": ["A027757308", "A021097366"] } } } },
      "aggs": {
        "subject": { "terms": { "field": "facet_subjects" } }
      }
    }
  }

Not tested, but I think it is a way to build what you expect. Additionally, some post-process serialization would be necessary to move "subject" up to the main aggregation level.
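
The post-process step mentioned above could look roughly like this: a hypothetical `lift_nested_facets` helper that replaces each filter-aggregation wrapper by the inner terms aggregation, so the client finds `buckets` directly at the main aggregation level. The response below is a made-up example of the shape a filter aggregation returns (`doc_count` plus the named sub-aggregation).

```python
def lift_nested_facets(aggregations: dict, inner_name: str) -> dict:
    """Move nested facet results up to the main aggregation level.

    A filter aggregation returns ``{"doc_count": n, inner_name: {...}}``;
    the client expects the ``buckets`` directly under the facet name.
    Aggregations without such nesting are copied unchanged.
    """
    flat = {}
    for name, agg in aggregations.items():
        inner = agg.get(inner_name)
        if isinstance(inner, dict) and "buckets" in inner:
            # Replace the filter wrapper by the inner terms aggregation
            flat[name] = inner
        else:
            flat[name] = agg
    return flat


# Made-up response fragment, for illustration only
response_aggs = {
    "subject_fiction": {
        "doc_count": 12,
        "subject": {"buckets": [{"key": "Romans", "doc_count": 7}]},
    },
    "language": {"buckets": [{"key": "fre", "doc_count": 30}]},
}
flat = lift_nested_facets(response_aggs, "subject")
print(flat["subject_fiction"])
# {'buckets': [{'key': 'Romans', 'doc_count': 7}]}
```

Dropping the wrapper also drops its `doc_count`; if the UI needs it, it could be copied onto the lifted dict.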

@vgranata (Contributor Author), Jun 28, 2022

Because of the OR filters defined in post_filters in the config.py file, each facet needs to be filtered to reflect the selection made by the user on other facets.
These filters are dynamically created by the function _facet_filter in the file facets.py and a nested aggregation is created in the function default_facets_factory before applying the filter.

The facets subject_fiction and subject_no_fiction have their own filter, independent of the selection made on the other facets. These filters are added to the dynamically generated filters in the function _facet_filter.

This problem will be addressed in another PR.
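
A minimal sketch of the mechanism described above, assuming each user selection is already available as a query clause keyed by facet name (hypothetical `build_facet_aggs`, plain dicts instead of elasticsearch-dsl objects, hypothetical field paths). Since Elasticsearch computes aggregations before applying `post_filter`, each facet's terms aggregation is nested in a `filter` aggregation that combines the selections of all the other facets; a facet never filters its own values, which keeps the OR selection possible.

```python
def build_facet_aggs(facets: dict, filters_group: dict) -> dict:
    """Build one aggregation per facet, filtered by the other facets' selections.

    :param facets: facet name -> field aggregated with a ``terms`` aggregation.
    :param filters_group: facet name -> query clause of the user's selection.
    :returns: an ``aggs`` dict where each terms aggregation is nested in a
        ``filter`` aggregation that ANDs the selections of all other facets.
    """
    aggs = {}
    for name, field in facets.items():
        # Exclude the facet's own selection so that its sibling values keep
        # their counts and can still be added to the OR selection.
        others = [clause for other, clause in filters_group.items() if other != name]
        aggs[name] = {
            # A bool query without clauses matches every document.
            "filter": {"bool": {"filter": others}},
            "aggs": {name: {"terms": {"field": field}}},
        }
    return aggs


facets = {"document_type": "type.main_type", "language": "language.value"}
filters_group = {"language": {"terms": {"language.value": ["fre"]}}}
aggs = build_facet_aggs(facets, filters_group)
print(aggs["document_type"]["filter"])
# {'bool': {'filter': [{'terms': {'language.value': ['fre']}}]}}
```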

@@ -565,8 +564,7 @@
"type": "object",
"properties": {
"value": {
"type": "text",
"index": false
"type": "keyword"
Contributor

This requires a document reindexing. Please mention this in your commit message!

@@ -115,18 +115,28 @@ def post_process_serialize_search(self, results, pid_fetcher):

# Aggregations process
if viewcode == global_view_code:
aggregations = results.get('aggregations', {})
Contributor

A short comment would be welcome.

for v in values:
urlkwargs.add(name, v)

return (filters, filters_group, urlkwargs)
Contributor

Cosmetic: the parentheses aren't required ;-)



def _post_filter(search, urlkwargs, definitions):
"""Ingest post filter in query."""
Contributor

Please add a docstring describing the params.
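
As a hedged illustration of what such a docstring could cover: `_post_filter(search, urlkwargs, definitions)` works on an elasticsearch-dsl `Search` object, whereas this sketch (hypothetical `build_search_body`) only assembles the equivalent request body as a dict. The key point is that `post_filter` restricts the returned hits but not the aggregations, which are computed on the main query alone.

```python
def build_search_body(query: dict, filters: list, aggs: dict) -> dict:
    """Assemble a search request body applying facet filters as a post_filter.

    :param query: the main query (e.g. the user's full-text search).
    :param filters: facet clauses, combined with AND in a bool filter.
    :param aggs: facet aggregations; they are computed on ``query`` only,
        because ``post_filter`` is applied after the aggregations.
    :returns: the request body as a dict.
    """
    body = {"query": query, "aggs": aggs}
    if filters:
        # Only the hits are restricted; facet counts are left untouched.
        body["post_filter"] = {"bool": {"filter": filters}}
    return body


selection = {"terms": {"language.value": ["fre"]}}  # hypothetical field path
body = build_search_body({"match_all": {}}, [selection], {})
print(body["post_filter"])
# {'bool': {'filter': [{'terms': {'language.value': ['fre']}}]}}
```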

Comment on lines 1992 to 1847
_('online'): online_or_terms_filter({
'electronicLocator.type': ['versionOfResource', 'resource'],
'holdings_type': ['electronic']
}),
Contributor

Could you indent this?
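
The body of `online_or_terms_filter` is not part of this excerpt. A filter of this kind could be sketched as below, assuming it should match a document when any of the listed fields holds any of the listed values; `or_terms_filter` is a hypothetical name and the result is a plain query dict.

```python
def or_terms_filter(field_values: dict) -> dict:
    """Match documents having ANY of the listed values in ANY of the fields.

    Each field becomes a ``terms`` clause; the clauses are OR-ed with a
    bool ``should`` and ``minimum_should_match: 1``.
    """
    return {
        "bool": {
            "should": [
                {"terms": {field: values}} for field, values in field_values.items()
            ],
            "minimum_should_match": 1,
        }
    }


online = or_terms_filter({
    "electronicLocator.type": ["versionOfResource", "resource"],
    "holdings_type": ["electronic"],
})
print(online["bool"]["minimum_should_match"])
# 1
```

With this OR, a document counts as online as soon as either field carries the information.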

@vgranata force-pushed the grv-2763-improve-facets branch 2 times, most recently from f4fd1e6 to 9756ceb on June 29, 2022 07:13
@PascalRepond (Contributor) commented Jun 30, 2022

PO tests

General

  • MD/08.08: Performance issues when expanding a facet, especially 'library', 'subject' and 'author'.
    • REP/19.08: tested ok!
  • WEP/09.08: Sometimes the selected filters are not translated in the public view, for example with this search
    • REP/19.08: Non-blocking and hard to fix! We should open a new issue.
  • NPR/08.08: Facet "Contained in" to be removed
    • the facet should be based on the field partOf (on the title of the $ref document). Currently it is based on seriesStatement for performance reasons, which does not really satisfy users' needs.
    • REP/19.08: tested OK. Open a new issue for adding the partOf facet (after completion of https://github.com/rero/rero-ils/pull/3049/files it will be easier to do)
  • NPR/08.08: Facets to be renamed:
    • Audience > Intended audience
    • Genre > Genre, form
  • REP: In the professional view, there is a "1" by default that can't be disabled, even with no active filters!
    • NPR/08.08: tested ok
  • NPR: sometimes, the active library filter is displayed with only the pid, not the library name
    • NPR/08.08: tested ok
  • BER/30.06: The active filters are displayed at the top of the page: they currently take a lot of space and push the result set to the bottom.
    Is it mandatory to display the active filters? The user can already see the selected filters in the facets.
    If we decide to keep the active filters on top, is it possible to put them on the same line?
    > NPR: I would rather keep the same filter display, but without pushing down the search results column.
    > REP: I agree with BER: I would use the full container width and display the active filters as spans at the top. PO decision
    • NPR/08.08: tested ok
  • Publication date UI:
    • REP: "Show" -> should be "Apply"
      • NPR/08.08: tested ok
    • REP: "Show" and "Cancel" should be btn-outline-primary
      GRV: should the button to remove the filter be called "Cancel" or "Clear filter"?
      • NPR/08.08: tested ok
    • REP (nice-to-have): If a value is empty, the filter doesn't work. An empty field could be taken as -9999 (from) or 9999 (to)
      • NPR/08.08: tested ok
    • NPR/30.06 (nice-to-have): when entering a year in facet publication year, it would be nice to press enter and apply the filter.
      • GRV: would you like these changes on the Publication date UI just for RERO-ILS or would you like them for SONAR too ?
      • NPR/08.08: tested ok
  • Library facet:
    • JD 19/08: if there are no results, all the names appear with count = 0; please hide lines that have a count of 0.
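
REP's suggestion for empty publication date bounds could be implemented along these lines. This is only a sketch: `year_range_filter` and the field name `publication_year` are hypothetical, and `None` stands for an empty input field.

```python
from typing import Optional

# Defaults proposed in the PO tests for an empty "from" / "to" field
MIN_YEAR = -9999
MAX_YEAR = 9999


def year_range_filter(field: str, year_from: Optional[int], year_to: Optional[int]) -> dict:
    """Return a range clause on ``field``; a missing bound falls back to the default."""
    lower = MIN_YEAR if year_from is None else year_from
    upper = MAX_YEAR if year_to is None else year_to
    return {"range": {field: {"gte": lower, "lte": upper}}}


print(year_range_filter("publication_year", 1990, None))
# {'range': {'publication_year': {'gte': 1990, 'lte': 9999}}}
```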

Online facet

  • CLT/BER/REP: on the public and professional interfaces, the online facet results do not include online ebooks: 8 results for online vs. 2200 for all
    • NPR/08.08: tested ok
  • REP: the "Online" label should be "Online resources"
    • REP: PO decision: For me, this facet is incomplete. There should also be a way to show only physical resources (most needed in libraries)!
      • NPR/08.08: tested ok
  • NPR/09.08: (Pro UI) Ideally, the 2 filters "Online resources"/"Physical resources" should be hidden like other facets before launching the request
  • MD (PO approved) 8/8: "Only online" should be "Online resources" and "Only library" should be "Physical resources".
    • REP 19.08: In line with the requested specs... However, I think this wording makes the facet very unclear. The initial facet looks as if it should show neither online nor physical resources (as both are unselected).
    • Moreover, when selecting both buttons, it shows only documents that are online AND physical! I think this is very confusing.
    • add a header "Only show" to the facet OR rename the filter to "Only online resources"/"Only physical resources"

Organisation public view

  • NPR/08.08: the facet "Contained in" (OR) does not behave the same in
    • the public global view and the admin view (4 results)
    • the organisation view (2 results)
      • here it seems to consider only the last "contained in" value for the filtering
    • This is also the case for the "Language" facet and probably all OR facets, when used in combination with the "Library" facet.
  • NPR/30.06: some important display problems in the public view of an organisation: active filters are not displayed, and some facets are not working (library, publication year)
    • NPR/08.08: tested ok
    • MD: In the public view, on an institutional view, filtering on publication year returns a 400 validation error. The display is also wrong.
    • NPR/08.08: tested ok
    • MD: In the public view, on an institutional view, the filter stays even if I click on it to remove it (facet ok).
    • NPR/08.08: tested ok

Other issues (not to be solved with the facets)

@rero rero deleted a comment from pronguen Jul 1, 2022
@vgranata force-pushed the grv-2763-improve-facets branch 4 times, most recently from 54bcc7b to eb16686 on July 27, 2022 19:23
@vgranata force-pushed the grv-2763-improve-facets branch from eb16686 to 52b916c on August 18, 2022 07:43
@github-actions github-actions bot added the f: data About data model, importation, transformation, exportation of data, specific for bibliographic data label Aug 18, 2022
@vgranata force-pushed the grv-2763-improve-facets branch 2 times, most recently from d531ace to c22d5de on August 18, 2022 14:26
* this commit requires documents re-indexing

Co-Authored-by: Valeria Granata <valeria@chaw.com>
@vgranata force-pushed the grv-2763-improve-facets branch from c22d5de to 37bc84e on August 23, 2022 10:02
@vgranata vgranata merged commit a479a30 into rero:staging Aug 24, 2022
Labels
dev: fixtures Fixtures data used for ils.test and ilsdev.test f: data migration Data migration from a legacy system or a previous version f: data About data model, importation, transformation, exportation of data, specific for bibliographic data f: search
Development

Successfully merging this pull request may close these issues.

Improve facets
5 participants