add Open Syllabus Project totals to solr #8395
On first pass, this looks like it's in the right direction! Great work @RayBB
@mekarpeles great! Let me know what the next steps are: whether I should send you or Drini the db file, or whether you will just produce it yourselves using the script in the folder. The original issue mentioned linking back to OSP. I don't have access to that data but would be happy to do that in the next PR if it's still desired.
Niiice! Excited to see this going into the solr 😊
olid_int = olid.replace("/works/", "").replace("OL", "").replace("W", "")
try:
    # Connect to the SQLite database
    conn = sqlite3.connect(db_file)
Note: Ray raises some potential perf concerns:
- Is this going to slow down our indexing/full reindexing?
- Is opening a new sqlite connection every time we call this fn a good idea? Or should we keep a long-lived connection open?
- This is ~1,000,000 rows of data. Since it's effectively just a map of int -> int, storing the whole thing in memory is also potentially an option and would use ~8 MB (4 bytes for each int). Is this something we want?
I'm not an expert on sqlite; maybe we just time this function to see how long it takes? But I think we should be fine. If we notice a perf issue we can easily turn this function off until we fix it!
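The options raised above (a new connection per call, one long-lived connection, or the whole map in memory) can be compared with a quick timing sketch. The table name `osp_totals`, its columns, the demo data, and every helper name here are assumptions for illustration, not the schema or code this PR actually uses.

```python
import os
import sqlite3
import tempfile
import timeit


def olid_to_int(olid: str) -> int:
    # Same string handling as the reviewed line: "/works/OL123W" -> 123
    return int(olid.replace("/works/", "").replace("OL", "").replace("W", ""))


def lookup(conn: sqlite3.Connection, olid: str):
    # Hypothetical schema: osp_totals(olid INTEGER PRIMARY KEY, total INTEGER)
    row = conn.execute(
        "SELECT total FROM osp_totals WHERE olid = ?", (olid_to_int(olid),)
    ).fetchone()
    return row[0] if row else None


def lookup_new_connection(db_path: str, olid: str):
    # Option 1: open and close a connection on every call
    conn = sqlite3.connect(db_path)
    try:
        return lookup(conn, olid)
    finally:
        conn.close()


# Build a small throwaway database with made-up totals
db_path = os.path.join(tempfile.mkdtemp(), "osp_totals_demo.db")
setup_conn = sqlite3.connect(db_path)
setup_conn.execute("CREATE TABLE osp_totals (olid INTEGER PRIMARY KEY, total INTEGER)")
setup_conn.executemany(
    "INSERT INTO osp_totals VALUES (?, ?)", [(i, i % 50) for i in range(1, 10001)]
)
setup_conn.commit()
setup_conn.close()

# Option 2: one long-lived connection reused across calls
shared_conn = sqlite3.connect(db_path)

# Option 3: load the whole int -> int map into a dict once
totals_in_memory = dict(shared_conn.execute("SELECT olid, total FROM osp_totals"))

key = "/works/OL123W"
n = 1000
print("new connection per call:", timeit.timeit(lambda: lookup_new_connection(db_path, key), number=n))
print("long-lived connection:  ", timeit.timeit(lambda: lookup(shared_conn, key), number=n))
print("in-memory dict:         ", timeit.timeit(lambda: totals_in_memory.get(olid_to_int(key)), number=n))
```

One caveat on the in-memory option: the ~8 MB figure assumes packed 4-byte integers, and a plain Python `dict` of a million int pairs would use considerably more than that.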
Force-pushed from b577602 to 0e871a9
Co-authored-by: Drini Cami <cdrini@gmail.com>
Codecov Report
All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##           master    #8395      +/-   ##
==========================================
- Coverage   16.64%   16.57%   -0.08%
==========================================
  Files          88       89       +1
  Lines        4692     4712      +20
  Branches      836      841       +5
==========================================
  Hits          781      781
- Misses       3394     3411      +17
- Partials      517      520       +3
We don't want solr updater to crash if it's missing, and this fixes the test
Force-pushed from fa3c17f to da59c50
The OSP_DUMP_FILE environment variable was only accessible in a different stage of the Jenkins pipeline. The name `osp_totals.db` is pretty unique, so it shouldn't be too hard to find in the future.
@cdrini just took a look and all your changes make sense to me. 🚀
Sweet! Will be released at the next full reindex: #8234
python scripts/solr_updater.py $OL_CONFIG \
    --state-file /solr-updater-data/$STATE_FILE \
    --ol-url "$OL_URL" \
    --osp-dump "$OSP_DUMP_LOCATION" \
@cdrini this is failing due to `osp_dump` being a positional argument. A similar thing is happening in the `reindex-solr` Makefile target.
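As a generic illustration of this kind of failure, here is a minimal sketch using plain `argparse`. It is not the actual argument handling in `scripts/solr_updater.py`, which this thread does not show; the parser, the `build_parser` and `try_parse` helpers, and the sample paths are all hypothetical.

```python
import argparse


def build_parser(positional: bool) -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real CLI
    parser = argparse.ArgumentParser(prog="solr_updater_sketch")
    parser.add_argument("ol_config")
    if positional:
        # Declared positionally: the "--osp-dump" flag is not recognized
        parser.add_argument("osp_dump", nargs="?", default=None)
    else:
        # Declared as an option: "--osp-dump PATH" parses as intended
        parser.add_argument("--osp-dump", default=None)
    return parser


def try_parse(parser: argparse.ArgumentParser, argv: list):
    # argparse reports usage errors by raising SystemExit
    try:
        return parser.parse_args(argv)
    except SystemExit:
        return None


argv = ["conf.yml", "--osp-dump", "/data/osp_totals.db"]
as_positional = try_parse(build_parser(positional=True), argv)
as_option = try_parse(build_parser(positional=False), argv)
```

With the positional declaration the call is rejected, so `as_positional` is `None`; with the option declaration `as_option.osp_dump` holds the path.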
Ah darnit! Good catch
Closes #5969
Starts importing Open Syllabus Project totals to solr
Technical
Following the steps here.
Testing
- Set `input_directory` in `scripts/open_syllabus_project_parser.py` to match the unzipped folder
- Run `PYTHONPATH=$(PWD) python3 scripts/open_syllabus_project_parser.py` from root
- `osp_count`
Screenshot
N/A
Next Steps
Mek will use this data to help our educational selection :)
Stakeholders
@mekarpeles