Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for range-based ID searching #819

Merged
merged 44 commits into from
Dec 4, 2024
Merged

Add support for range-based ID searching #819

merged 44 commits into from
Dec 4, 2024

Conversation

ItIsJordan
Copy link
Collaborator

@ItIsJordan ItIsJordan commented Aug 16, 2024

Adds support for range-based searching in the OpenSearch index using HEPData ID and Inspire IDs. Updates the search help page to display new search syntax.

NOTE: inspire_id is changed from "text", to "integer". A re-index is required.

Closes #791.

Adds functions to HEPDataQueryParser to support range-based queries: Determine if a query is a valid query, and to parse a valid query.
Adds a range-based publication_recid search functionality to the OpenSearch api search. Also modifies supporting merge_results function to support single-dict.
Adds new tests for HEPDataQueryParser.is_range_query and .parse_range_query
Adds an accidentally removed if statement that causes empty queries to not load data table results.
Add search term shorthand publication_recid truncated to recid in OpenSearch
Adds support for potential different range terms, defined in hepdata/config.py. Updates OpenSearch API search to use this.
Adds default descending order to range query search in opensearch api
Update tests to use correct range search syntax
Add query parser test case for publication_recid in the test _query_parser test.
Alters inspire_id mapping from text to integer to allow range-based searching. Also updates some testing and required configuration addition for range searching. (keeping it all in one commit)
Updates help text in search_help to display inspire ID range search guidance
Adds more tolerance to extra whitespace after the colon in range queries, and between the "TO" and range numbers.

Also updates parsing query to remove whitespace to accommodate this tolerance.
Adds more cases to test_query_parser_is_range_query which include valid and invalid extra whitespace cases.
Adds a comment to HEPDataQueryParser
Adds a specific test for range-searching on recid/inspire_id entries in the OpenSearch API
Updates test_search_range_ids to include a number of expected failure cases.
@coveralls
Copy link

coveralls commented Aug 16, 2024

Coverage Status

coverage: 83.611% (+0.06%) from 83.554%
when pulling 16d3e8c on search-range
into 50a2498 on main.

Modifies the range query parser functionality to fix a bug which disallowed range queries simultaneously with other queries. Improves errors.

Renamed is_range_query to get_range_queries as it now returns range query data only.
Adds improved testing for range query functionality. Adds a specific test for range queries alongside alternate search types.
Updates commenting in range query code
Adds commenting and improves clarity of get_range_query
Improves the test for test_query_parser_get_range_queries, adds some new cases and improves code clarity in the test.
@ItIsJordan
Copy link
Collaborator Author

I have made improvements and bugfixes to this branch to solve cases where using range searches disabled searching by other means, and to improve test robustness in the search functionality

Proper implementation has been figured out. Removes range parsing methods as raw range strings can be passed through to the search object.

This in turn allows proper use of logic in query strings with range queries.
Updates search testing to test current range search verification and updates range search testing cases. Removes tests for removed range parsing functions.
Add inspire_id and publication_recid as possible sorting fields by adding them to the possible_sorting_fields in display_results_options.html
Remove display of recid and inspire id search options from search "sort by" widget
Fixes a bug where range queries with publication_recid were not sorted by default. Now sets CFG_SEARCH_RANGE_TERMS in config to use the correct OpenSearch field keyword (recid). Also updates the sort_fields_mapping in the OpenSearch api to use correct keyword.
Add new recid and inspire_id keywords to search function docstring for the sort_field in the opensearch api.
Update verify_range_query_term docstring to clarify that query used is a parsed query.
Updates test_verify_range_query_term to account for using a parsed range query (with recid and not publication_recid). Updates
Renames test_composite_range_queries to test_range_queries and adds some extra testing for single-range keyword searching. Checks against correct expected sort order.
Adds the keyword: "inspire_id" as a range searchable keyword term in the config. This is used to verify the query keyword in a range search.
Adds "publication_recid" to the sort_fields_mapping functon to allow use as a sort field.
Fixes intended keyword searching of publication_recid for publication only searches and recid for pub/table searches.
Updates range search query testing to match updated changes and add some new changes to the parsing
Update range test cases to include fixes for default range search ordering and correct use of recid vs publication_recid
Updates the range query parser test to match current functionality/changes
Adds more test cases to the test_range_queries test function, to include more boundary testing and differentiation between recid and publication_recid.
Copy link
Member

@GraemeWatt GraemeWatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks! Merging now.

@GraemeWatt GraemeWatt merged commit b6e9df9 into main Dec 4, 2024
7 checks passed
@GraemeWatt GraemeWatt deleted the search-range branch December 4, 2024 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

search: allow searching in a range of INSPIRE IDs or publication record IDs
3 participants