database update script #648

Merged
merged 18 commits into from
May 27, 2020

Conversation

jmensch1
Contributor

@jmensch1 jmensch1 commented May 24, 2020

Fixes #625

This enhances our ingestion process with an update script. Instead of resetting the database and downloading all of the Socrata data every night, we can just update the DB with the new stuff from Socrata. The update should take a few minutes, and the server will continue to function normally while the DB is updating.

I was going to run this as a cron job every night, but it turns out Heroku automatically restarts our app every 24 hours, so I tied the update script to the restart. Whenever the server starts, it will check to see if it's been at least 24 hours since the last update, and if so, it will run the update in the background.

The same check will be run locally, so our local DBs will stay up to date with Socrata as well.
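A minimal sketch of the startup check described above, not the code in this PR. The names `update_is_due` and `maybe_update_in_background` are hypothetical, as is the use of a daemon thread for the background run; the caller is assumed to pass in something like `db.info.last_updated` and `db.requests.update`.

```python
import threading
from datetime import datetime, timedelta, timezone

# 24-hour threshold taken from the description above.
UPDATE_INTERVAL = timedelta(hours=24)


def update_is_due(last_updated, now=None):
    """Return True if there is no previous update or it is at least 24 hours old."""
    if last_updated is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - last_updated >= UPDATE_INTERVAL


def maybe_update_in_background(get_last_updated, run_update):
    """Start run_update on a daemon thread if an update is due.

    Both arguments are callables supplied by the caller (hypothetical wiring).
    Returns the started thread, or None when no update is needed.
    """
    if not update_is_due(get_last_updated()):
        return None
    thread = threading.Thread(target=run_update, daemon=True)
    thread.start()
    return thread
```

Because the update runs on its own thread, server startup is not blocked while the update is in progress.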

I changed the structure of the DB a little bit, so unfortunately everyone's going to need to repopulate their local DBs. But I included a temporary migration script that will do that automatically when you run docker-compose for the first time after merging this code. The migration only includes 2020, which I figure is enough for dev purposes and doesn't take too long.

UPDATE 5/26:

I refactored the ingestion code into a proper database module that handles DB operations with a clean API, for example:

import db
db.reset()
db.info.years()                            # []
db.requests.add_years([2018, 2019, 2020])
db.info.years()                            # [2018, 2019, 2020]
db.info.last_updated()                     # last updated timestamp
db.info.rows()                             # prints row counts by year
db.info.tables()                           # prints size of all tables and views
db.requests.update()                       # updates all years in db

This makes it possible to add years individually rather than in a multi-hour script. It also provides useful info about the current contents of the DB.

There are also two new diagnostic endpoints that will be helpful in monitoring our production environment:

/database -- a report on the current contents of the database 
/system -- a report on the system, including memory usage

Finally, there are a bunch of little cleanup things -- removing old endpoints, deleting unused functions and files, etc.

  • Up to date with dev branch
  • Branch name follows guidelines
  • All PR Status checks are successful
  • Peer reviewed and approved

Any questions? See the getting started guide

@jmensch1 jmensch1 requested review from adamkendis and sellnat77 and removed request for adamkendis May 24, 2020 01:10
@jmensch1 jmensch1 added this to the 311-Data - Beta milestone May 24, 2020
@jmensch1 jmensch1 added the "Do Not Merge" label (do not merge this PR, it is likely blocked by something else) May 26, 2020
@jmensch1 jmensch1 removed the "Do Not Merge" label May 26, 2020
Member

@adamkendis adamkendis left a comment

This all looks great. Diagnostic endpoints are 👍

resource.getrlimit(getattr(resource, kind))
))

return {kind: report(kind) for kind in [
Member

I don't even know what most of these resources are but this seems like a really useful util, particularly if we decide to test out different backend hosting options.
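For anyone else unsure what these resources are, here is a rough sketch of a limits report built on the standard library `resource` module (Unix only). The list of limits and the helper names are assumptions; the list used in the PR is cut off in the excerpt above.

```python
import resource

# A hand-picked subset of limits (an assumption, not the PR's list):
#   RLIMIT_AS      maximum address space (virtual memory) in bytes
#   RLIMIT_CPU     maximum CPU time in seconds
#   RLIMIT_DATA    maximum size of the process heap in bytes
#   RLIMIT_NOFILE  maximum number of open file descriptors
#   RLIMIT_STACK   maximum stack size in bytes
KINDS = ['RLIMIT_AS', 'RLIMIT_CPU', 'RLIMIT_DATA', 'RLIMIT_NOFILE', 'RLIMIT_STACK']


def describe(value):
    """Replace the RLIM_INFINITY sentinel with a readable string."""
    return 'unlimited' if value == resource.RLIM_INFINITY else value


def resource_limits(kinds=KINDS):
    """Return {limit name: {'soft': ..., 'hard': ...}} for the current process."""
    report = {}
    for kind in kinds:
        if not hasattr(resource, kind):
            continue  # not every limit exists on every platform
        soft, hard = resource.getrlimit(getattr(resource, kind))
        report[kind] = {'soft': describe(soft), 'hard': describe(hard)}
    return report
```

The soft limit is the value currently enforced; the hard limit is the ceiling the process may raise its soft limit to.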

Comment on lines +41 to +55
# fix North Westwood
res = exec_sql(f"""
UPDATE stage
SET nc = 127
WHERE nc = 0 AND ncname = 'NORTH WESTWOOD NC'
""")
log(f'\tFixed nc code for North Westwood NC: {res.rowcount} rows')

# fix Historic Cultural North
res = exec_sql(f"""
UPDATE stage
SET nc = 128
WHERE nc = 0 AND ncname = 'HISTORIC CULTURAL NORTH NC'
""")
log(f'\tFixed nc code for Historic Cultural North NC: {res.rowcount} rows')
Member

Nice, I wasn't aware of this.
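One way the two fixes could be expressed as data plus a single parameterized statement, shown here as a self-contained sketch. It uses an in-memory `sqlite3` database purely so it runs on its own, not the project's `exec_sql` helper, and `NC_FIXES` and `fix_nc_codes` are hypothetical names. The council names and codes come from the snippet above; the sample rows are made up.

```python
import sqlite3

# Council name -> corrected nc code, taken from the two UPDATE statements above.
NC_FIXES = {
    'NORTH WESTWOOD NC': 127,
    'HISTORIC CULTURAL NORTH NC': 128,
}


def fix_nc_codes(conn, fixes=NC_FIXES):
    """Set the nc code on rows where nc = 0; return rows changed per council."""
    counts = {}
    for ncname, nc in fixes.items():
        cur = conn.execute(
            'UPDATE stage SET nc = ? WHERE nc = 0 AND ncname = ?',
            (nc, ncname),
        )
        counts[ncname] = cur.rowcount
    return counts


# Made-up sample data to exercise the function.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE stage (nc INTEGER, ncname TEXT)')
conn.executemany('INSERT INTO stage VALUES (?, ?)', [
    (0, 'NORTH WESTWOOD NC'),
    (0, 'HISTORIC CULTURAL NORTH NC'),
    (0, 'NORTH WESTWOOD NC'),
    (52, 'SOME OTHER NC'),  # already has a code, must stay untouched
])
counts = fix_nc_codes(conn)
```

Adding another council fix then means adding one entry to the mapping rather than another copy of the query.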

@adamkendis adamkendis merged commit b6936c2 into dev May 27, 2020
@adamkendis adamkendis deleted the 625-BACK-UpdateScript branch May 27, 2020 00:10
Development

Successfully merging this pull request may close these issues.

Nightly update script
3 participants