Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use SolrProcessor.get_pub_year to get edition years #6038

Conversation

agmckee
Copy link
Contributor

@agmckee agmckee commented Jan 6, 2022

get_pub_year already supports ISO dates, as well as simple year strings, so it is an easy fix for the issue described in #6021. I don't believe this can have any negative side effects, it will simply extend support to ISO 8601 formatted dates without affecting dates currently indexed normally as get_pub_year falls back to re_year.search if re_iso_date doesn't match.

Closes #6021

fix

Technical

This is a fix as currently the document indexed by SOLR for works with only ISO dates doesn't have the "first_published_year" attribute, causing the UI issue described in #6021.

Testing

Add a work and edition with an ISO 8601 format date and look at the SOLR index for that work, you should notice the lack of "publish_year" and "first_publish_year" attributes. This affects worksearch, and the UI as described in #6021. After checking out the changes in this PR please reindex docker-compose exec web make reindex-solr and you should now see the aforementioned attributes for the respective work in SOLR.

Screenshot

Example of record with ISO 8601 formatted publish date.

example-record

SearchResultsWork macro displaying the year correctly.

example-searchresultswork-macro

Stakeholders

@mekarpeles @cdrini

@agmckee
Copy link
Contributor Author

agmckee commented Jan 6, 2022

Tests initially failed because (a) the tests currently have an ISO 8601 formatted date in them that is noted as not being covered by current functionality and (b) first version of code didn't handle the None values that sometimes arise, this is now rectified.
The pre-commit check appears to be failing for lots of PRs at present and is, I believe, unrelated.

@jimman2003
Copy link
Contributor

Yeah, another (unrelated) part of the code wasn't updated to the new syntax, hence the test fail

@jimchamp jimchamp added the Priority: 2 Important, as time permits. [managed] label Jan 24, 2022
@mekarpeles mekarpeles added Priority: 1 Do this week, receiving emails, time sensitive, . [managed] and removed Priority: 2 Important, as time permits. [managed] labels Feb 14, 2022
agmckee and others added 2 commits March 8, 2022 16:57
…lready supports ISO dates.

Update test as ISO date now supported. Filter out None values from pub_years set.
@cdrini cdrini force-pushed the support-iso-8601-date-for-first-published-year-when-indexing branch from 544b638 to b7214e7 Compare March 8, 2022 22:20
Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome thank you @agmckee ! And thank you @jimman2003 for helping review :)

I updated this to be a bit more lenient and also take years at the front since we often have them in the form eg 2004 May 19. The tests look not unreliable here, so not testing manually. Lgtm!

@cdrini cdrini merged commit 47b9461 into internetarchive:master Mar 8, 2022
@cdrini
Copy link
Collaborator

cdrini commented Mar 8, 2022

This has been merged into master, and any newly edited works will have their dates corrected! Once we do the next full reindex (#5827), all records will be updated and run with the new code here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: 1 Do this week, receiving emails, time sensitive, . [managed]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Date format isn't correctly read off edition.
5 participants