Clean up records with future publication dates #4568
Comments
I would like to work on this issue.
Can I start?
Okay, but it would be better to take one issue at a time: file a PR for the issue you already have, and then start working on this one.
OK, thank you.
@Yashs911 Can you please help me solve this issue?
@Bhavna777 Actually, I don't know the root cause, so I don't know where we should start. Based on internetarchive/openlibrary-librarians#1 and some other issues linked to this one, I suggest that we hide publication years >= 2021 for the time being.
Added to the librarians repo for manual correction: internetarchive/openlibrary-librarians#53
@seabelis Actually, this issue is not limited to https://openlibrary.org/search?q=mark&mode=everything&sort=new; many books on OL have the wrong publication year, so I was wondering whether it would be possible to hide publication years > 2021.
But that will create problems in upcoming years.
By 2021 I meant that we can use a current-year function.
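To illustrate the current-year idea, here is a minimal Python sketch that builds the Solr range query from today's date instead of a hard-coded year. The function names are hypothetical and not part of the Open Library codebase; the only things taken from this thread are the `first_publish_year` field and the `search.json` endpoint. It treats any year after the current one as a future date, which is an assumption about where the cutoff should sit.

```python
from datetime import date
from urllib.parse import urlencode

# Search endpoint quoted in this issue.
SEARCH_URL = "https://openlibrary.org/search.json"


def future_year_query(today=None):
    """Build a Solr range query for first_publish_year after the current year.

    `today` can be passed in for testing; it defaults to the real date.
    """
    year = (today or date.today()).year
    return f"first_publish_year:[{year + 1} TO *]"


def future_year_search_url(today=None):
    """Return the search.json URL for the query above (hypothetical helper)."""
    return f"{SEARCH_URL}?{urlencode({'q': future_year_query(today)})}"
```

Because the cutoff is derived from the date, the query does not need updating each year, which addresses the concern about a fixed 2021 threshold.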
I'm not the person to decide, but I'd prefer deleting the incorrect data to hiding it.
@scottbarnes can you confirm whether this can be closed now re: 9999?
I'm re-purposing this issue to clean up works that have future dates: https://openlibrary.org/query.json?type=/type/edition&publish_date~=9999*&limit=1000 or https://openlibrary.org/search?q=first_publish_year%3A%5B2025+TO+*%5D&mode=everything&sort=new
Proposal
It may be helpful to keep a record of the items we've modified this way, in case we later want to go back and, for example, reimport them or otherwise modify them further. That way it will be easy to identify the ones from which we've removed
@hornc notes that he is planning on removing all the
There are about 5,868 editions with publish year 9999, and another 15,707 with publish years after 2025 other than 9999. Flipping through them, it's unclear why exactly they have these weird dates and whether they should be deleted 😕 I think fixing the 9999 set is a good first stab. Would you be able to keep a list of the editions your script edits and upload it to the issue? We might want to investigate these editions further later, and having a way to find them would be useful!
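As a rough sketch of how a cleanup script could keep that list, the Python snippet below appends one CSV row per edited edition. The function names, the column layout, and the file path are all assumptions made for illustration; the thread does not say how the actual script records its edits.

```python
import csv
from datetime import datetime, timezone


def record_edit(path, edition_key, old_publish_date):
    """Append one row: edition key, the date that was removed, and a UTC timestamp.

    Hypothetical helper; the column layout is an assumption, not an agreed format.
    """
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(
            [edition_key, old_publish_date, datetime.now(timezone.utc).isoformat()]
        )


def read_edits(path):
    """Read the log back as a list of rows, e.g. before uploading it to the issue."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))
```

Keeping the old value next to the key means the log can later be used both to find the edited editions and to see what was removed from each.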
One cause of the https://openlibrary.org/books/OL45340001M/%CA%BBAlimi_aman_jo_Islami_manshur?m=history and I'll see if there is a way to easily add the correct dates as I go, and look at patching the MARC import hole. See PR: #8448
@mekarpeles I believe all the
A lot of the remaining future dates are simply spam: e.g. https://openlibrary.org/search?q=first_publish_year%3A%5B2025+TO+*%5D+Customer+Service+number&mode=everything&sort=new and And there are other variations
Evidence / Screenshot (if possible)
Many works have the wrong year of publication (e.g. 9999, 2049, 2040).
See: https://openlibrary.org/search?q=publish_year%3A%5B2025+TO+*%5D
Relevant url?
https://openlibrary.org/search?q=mark&mode=everything&sort=new
https://openlibrary.org/works/OL21132031W/Classical_Music_Picture_Book?edition=
https://openlibrary.org/works/OL21486637W/Making_Sense_of_Politics?edition=
Details
Proposal
Use first_publish_year:[2025 TO *] in Solr, e.g. https://openlibrary.org/search.json?q=first_publish_year%3A%5B2025+TO+*%5D, to find future dates.
Stakeholders