Data Dump keys sometimes have wrong format #6348

Closed
cdrini opened this issue Mar 30, 2022 · 1 comment · Fixed by #6349
Labels
Lead: @cdrini (Issues overseen by Drini; Staff: Team Lead & Solr, Library Explorer, i18n) [managed]
Module: Data dumps
Priority: 1 (Do this week, receiving emails, time sensitive) [managed]
Type: Bug (Something isn't working) [managed]

Comments

Collaborator

cdrini commented Mar 30, 2022

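The format of some keys in the data dump is wrong. E.g. instead of /authors/OL123A, a record sometimes lists /a/OL123A. In the evidence below, the key column still reads /authors/OL3965446A, but the "key" field inside the JSON reads /a/OL3965446A.
This looks like a regression introduced between the 2021-10-13 data dump and the 2021-11-30 data dump.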

Evidence / Screenshot (if possible)

$ curl -L 'https://archive.org/download/ol_dump_2022-03-02/ol_dump_authors_2022-03-02.txt.gz' | zcat | grep -F 'OL3965446A' | head -n1
/type/author    /authors/OL3965446A     1       2008-04-30T20:50:18.033121      {"name": "I. Meyerson", "last_modified": {"type": "/type/datetime", "value": "2008-04-30 20:50:18.033121"}, "key": "/a/OL3965446A", "type": {"key": "/type/author"}, "id": 16613668, "revision": 1}

$ curl -L 'https://archive.org/download/ol_dump_2022-03-24/ol_dump_authors_2022-03-24.txt.gz' | zcat | grep -F 'OL3965446A' | head -n1
/type/author    /authors/OL3965446A     1       2008-04-30T20:50:18.033121      {"name": "I. Meyerson", "last_modified": {"type": "/type/datetime", "value": "2008-04-30 20:50:18.033121"}, "key": "/a/OL3965446A", "type": {"key": "/type/author"}, "id": 16613668, "revision": 1}

$ curl -L 'https://archive.org/download/ol_dump_2021-11-30/ol_dump_authors_2021-11-30.txt.gz' | zcat | grep -F 'OL3965446A' | head -n1
/type/author    /authors/OL3965446A     1       2008-04-30T20:50:18.033121      {"name": "I. Meyerson", "last_modified": {"type": "/type/datetime", "value": "2008-04-30 20:50:18.033121"}, "key": "/a/OL3965446A", "type": {"key": "/type/author"}, "id": 16613668, "revision": 1}

$ curl -L 'https://archive.org/download/ol_dump_2021-10-13/ol_dump_authors_2021-10-13.txt.gz' | zcat | grep -F 'OL3965446A' | head -n1
/type/author    /authors/OL3965446A     1       2008-04-30T20:50:18.033121      {"name": "I. Meyerson", "last_modified": {"type": "/type/datetime", "value": "2008-04-30T20:50:18.033121"}, "key": "/authors/OL3965446A", "type": {"key": "/type/author"}, "revision": 1}
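
The mismatch shown above can be checked mechanically. Below is a minimal sketch, not the project's own tooling: it assumes each dump row has five tab-separated columns (type, key, revision, last_modified, JSON), as the output above suggests. The function name `find_key_mismatches` is hypothetical, and the two sample rows are abridged from the records in this issue.

```python
import json


def find_key_mismatches(lines):
    """Yield (column_key, json_key) for rows whose two keys disagree."""
    for line in lines:
        # Assumed layout: type, key, revision, last_modified, JSON (tab-separated).
        parts = line.rstrip("\n").split("\t", 4)
        if len(parts) != 5:
            continue  # skip rows that do not match the assumed layout
        column_key = parts[1]
        json_key = json.loads(parts[4]).get("key")
        if json_key != column_key:
            yield column_key, json_key


# Abridged versions of the affected and unaffected records from this issue.
sample = [
    '/type/author\t/authors/OL3965446A\t1\t2008-04-30T20:50:18.033121\t'
    '{"name": "I. Meyerson", "key": "/a/OL3965446A", "type": {"key": "/type/author"}, "revision": 1}',
    '/type/author\t/authors/OL3965446A\t1\t2008-04-30T20:50:18.033121\t'
    '{"name": "I. Meyerson", "key": "/authors/OL3965446A", "type": {"key": "/type/author"}, "revision": 1}',
]
mismatches = list(find_key_mismatches(sample))
print(mismatches)  # [('/authors/OL3965446A', '/a/OL3965446A')]
```

To scan a downloaded dump file instead of the sample, the same function could be fed `gzip.open(path, "rt", encoding="utf-8")` from the standard library.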

@cdrini cdrini added Type: Bug Something isn't working. [managed] Priority: 1 Do this week, receiving emails, time sensitive, . [managed] Module: Data dumps Lead: @cdrini Issues overseen by Drini (Staff: Team Lead & Solr, Library Explorer, i18n) [managed] labels Mar 30, 2022
@cdrini cdrini added this to the Active Sprint milestone Mar 30, 2022
@cdrini cdrini self-assigned this Mar 30, 2022
Collaborator Author

cdrini commented Mar 30, 2022

P1 because this blocks #6119 and because it likely breaks the workflow of anyone using the data dump.
