Reindex documents into a new Solr on OJF #2222

Closed
3 tasks done
cdrini opened this issue Jul 19, 2019 · 8 comments
Assignees
Labels
Affects: Server Issues with the server (olweb) or its plugins. [managed] Lead: @cdrini Issues overseen by Drini (Staff: Team Lead & Solr, Library Explorer, i18n) [managed] Module: Solr Issues related to the configuration or use of the Solr subsystem. [managed] Priority: 2 Important, as time permits. [managed] State: Work In Progress This issue is being actively worked on. [managed] Type: Subtask of Epic A subtask that is part of the work breakdown of an epic issue (see comments). [managed]

Comments

@cdrini
Collaborator

cdrini commented Jul 19, 2019

Subtask of #1067

  • Create a Solr environment on server.openjournal.foundation
  • Index latest openlibrary dump into OJF Solr
  • [-] Replay missing edits from Infobase onto OJF Solr until up-to-date (partially done)
    • Skipping: should be done once on ol-solr0
  • Test OJF Solr by linking dev.openlibrary.org to OJF Solr
@cdrini cdrini added the Type: Subtask of Epic A subtask that is part of the work breakdown of an epic issue (see comments). [managed] label Jul 19, 2019
@cdrini cdrini self-assigned this Jul 19, 2019
@viragumathe5

Can I work on this issue?

@supercontracts
Collaborator

@cdrini could guide you if you need any help with the subtasks :)

@viragumathe5

So @cdrini, can I collaborate with you?

@cdrini
Collaborator Author

cdrini commented Sep 9, 2019

Hey @viragumathe5 ! Unfortunately this task is already underway (adding the WIP label!). There's already a pretty long backlog of related changes enqueued (#1843, #2246), so I can't think of a way to add you to this task :(

What type of things are you interested in? I'm sure we can find a good issue for you to work on :)

@viragumathe5

No problem at all, I was just offering to collaborate if required.

I would like to contribute in any way, such as documentation, the codebase, etc.
I'm unable to do design work, though :)
I feel lucky to work for the Internet Archive.
Thank you

@cdrini cdrini added the State: Work In Progress This issue is being actively worked on. [managed] label Sep 9, 2019
@cdrini
Collaborator Author

cdrini commented Sep 9, 2019

If you'd like a small task, these would be good:

If you want something larger, this would be good:

@xayhewalo xayhewalo added Affects: Server Issues with the server (olweb) or its plugins. [managed] Module: Solr Issues related to the configuration or use of the Solr subsystem. [managed] Priority: 2 Important, as time permits. [managed] labels Nov 22, 2019
@mekarpeles mekarpeles added the Lead: @cdrini Issues overseen by Drini (Staff: Team Lead & Solr, Library Explorer, i18n) [managed] label Dec 18, 2019
@cdrini
Collaborator Author

cdrini commented Feb 14, 2020

Reindex complete; here are the numbers (using 2020-01-31 dump; and querying 02-14 solr for "before" values)

| Type | # in postgres | # in old solr | # in new solr | psql diff | solr diff |
| --- | ---: | ---: | ---: | ---: | ---: |
| Works | 18891263 | 16934104 | 18891032 | -231 | 1956928 |
| Orphans | 3117594 | 2093485 | 3115125 | -2469 | 1021640 |
| Authors | 7247819 | 6982935 | 7247631 | -188 | 264696 |
| Subjects | 0 | 1514064 | 1514068 | 1514068 | 4 |

@cdrini
Collaborator Author

cdrini commented Mar 4, 2020

Reindex complete; here are the numbers (using 2020-02-29 dump; and querying 03-03 solr for "before" values)

| Type | # in postgres | # in old solr | # in new solr | psql diff | solr diff |
| --- | ---: | ---: | ---: | ---: | ---: |
| Works | 18895253 | 16937045 | 18895021 | -232 | 1957976 |
| Orphans | 3116995 | 2093378 | 3114527 | -2468 | 1021149 |
| Authors | 7248307 | 6983408 | 7248115 | -192 | 264707 |
| Subjects | 0 | 1514064 | 1514068 | 1514068 | 4 |

-> 3.2M records will be made visible 🎉 Next step: #1067
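
For anyone wanting to sanity-check the 3.2M figure, here is a minimal sketch that recomputes the two diff columns from the 2020-02-29 counts and sums the solr diff. It assumes both diffs are measured as "new solr minus the other column", which matches the Works, Authors, and Subjects rows. The names `counts`, `diffs`, and `newly_visible` are made up for illustration and are not part of any Open Library script.

```python
# Counts copied from the 2020-02-29 table: (postgres, old solr, new solr).
counts = {
    "Works": (18895253, 16937045, 18895021),
    "Orphans": (3116995, 2093378, 3114527),
    "Authors": (7248307, 6983408, 7248115),
    "Subjects": (0, 1514064, 1514068),
}


def diffs(postgres, old_solr, new_solr):
    """Return (psql diff, solr diff), both relative to the new Solr count.

    Assumption: diff = new solr count minus the compared count.
    """
    return new_solr - postgres, new_solr - old_solr


rows = {name: diffs(*values) for name, values in counts.items()}

# Records present in the new Solr but missing from the old one.
newly_visible = sum(solr_diff for _, solr_diff in rows.values())

for name, (psql_diff, solr_diff) in rows.items():
    print(f"{name:<9}{psql_diff:>10}{solr_diff:>10}")
print(f"Newly visible records: {newly_visible:,}")
```

The total comes out to 3,243,836, which rounds to the 3.2M quoted above.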

@cdrini cdrini added this to the Active Sprint milestone Mar 4, 2020
@cdrini cdrini closed this as completed Mar 4, 2020
5 participants