BEFORE OCT 12 – pic.datamade.us board report is out of sync with Legistar #347

Closed
hancush opened this issue Sep 15, 2018 · 8 comments

Comments

@hancush
Collaborator

hancush commented Sep 15, 2018

Board report 2018-0140 was slated for a meeting during the summer but was postponed. While the link we have on file in the bill documents table resolves to the correct file in Legistar, the URL generated by the full_text_document_url template tag still points to the July version on pic.datamade.us.

@hancush hancush added the bug label Sep 15, 2018
@hancush
Collaborator Author

hancush commented Sep 15, 2018

FWIW, we have the correct version on file in our metro-pdf-merger S3 bucket.

@hancush
Collaborator Author

hancush commented Sep 15, 2018

Also, to be clear, the correct version of the report comes back when you download the report from the board report and event pages. The only place we show the wrong report is the PDF pane on the board report page.

@reginafcompton
Contributor

reginafcompton commented Sep 17, 2018

Problem

In the OCD and Councilmatic databases, the URL for a Board Report points to the latest version of the PDF (generated here).

When Councilmatic needs to render a PDF, it visits https://pic.datamade.us/lametro/document/, where the property image cache does some work:

What's the issue? We use the Board Report's URL as the key, and this URL remains stable, even when the document that Legistar serves changes. The PIC would not know about such changes, since it already cached an earlier version.
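
To illustrate the failure mode, here is a minimal sketch of a cache keyed on a stable URL. It is not the property-image-cache code: the `UrlKeyedCache` class, the `example.com` URL, and the in-memory `upstream` dict standing in for Legistar are all assumptions made for the example.

```python
from typing import Callable, Dict


class UrlKeyedCache:
    """Toy cache that stores one document per URL (hypothetical, for illustration)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        # Only contact the upstream server on a cache miss.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]

    def invalidate(self, url: str) -> None:
        # Dropping a missing key is a no-op, so this is safe for never-cached URLs.
        self._store.pop(url, None)


# Stand-in for Legistar: the URL is stable, but the file behind it changes.
REPORT_URL = "https://example.com/report.pdf"
upstream = {REPORT_URL: b"July version"}
cache = UrlKeyedCache(lambda url: upstream[url])

first = cache.get(REPORT_URL)             # cache miss: fetches the July file
upstream[REPORT_URL] = b"September version"
stale = cache.get(REPORT_URL)             # cache hit: still the July bytes

cache.invalidate(REPORT_URL)
fresh = cache.get(REPORT_URL)             # refetched after invalidation
```

Because the key never changes, the cache has no signal that the upstream file changed; something outside the cache has to invalidate the entry.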

Solutions

@hancush
Collaborator Author

hancush commented Sep 17, 2018

@reginafcompton would it be too ham-fisted or difficult to connect the services, so that the cached image gets updated when the bill changes?

@reginafcompton
Contributor

reginafcompton commented Sep 17, 2018

We need to ensure that the property-image-cache has the most up-to-date PDFs of board reports. An effective strategy for doing this: delete the old PDFs from the S3 bucket whenever a bill gets updated. The document route will then create a new entry in AWS the next time someone visits a board report page on the Councilmatic site.

After consulting with @evz, a good solution entails devising a new management command that does the following:

  1. executes after import_data
  2. queries the Councilmatic database for newly updated bills – we can query the raw_billdocuments table. (n.b. This also contains new data, but that should not be an issue, since the delete function in S3 simply "does not remove any objects" if the bucket does not contain the specified key.)
  3. deletes the entries for those bills in the S3 bucket – possible with a single HTTP request
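
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the script that was eventually merged: the `cache_key` scheme (a SHA-1 of the document URL) is hypothetical, and `delete_batch` is an injected callable so the example runs without AWS credentials. With boto3, it could wrap `client.delete_objects(Bucket=..., Delete=payload)`, which accepts up to 1000 keys per request.

```python
import hashlib
from typing import Callable, Dict, Iterable, Iterator, List

MAX_KEYS_PER_REQUEST = 1000  # S3 DeleteObjects accepts at most 1000 keys per call


def cache_key(document_url: str) -> str:
    # Hypothetical key scheme; the real cache may derive its keys differently.
    return hashlib.sha1(document_url.encode("utf-8")).hexdigest()


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def refresh_cache(
    document_urls: Iterable[str],
    delete_batch: Callable[[Dict], None],
) -> int:
    """Delete the cached copy of every updated document; return the key count."""
    # De-duplicate so each key is sent once; sort for deterministic batches.
    keys = sorted({cache_key(url) for url in document_urls})
    for batch in batched(keys, MAX_KEYS_PER_REQUEST):
        # Payload shape matches the `Delete` argument of boto3's delete_objects.
        delete_batch({"Objects": [{"Key": key} for key in batch], "Quiet": True})
    return len(keys)


# Usage with a stand-in for the S3 call: record payloads instead of sending them.
sent: List[Dict] = []
urls = ["https://example.com/%d.pdf" % i for i in range(2500)]
deleted = refresh_cache(urls + urls[:10], sent.append)
```

In a Django management command, `document_urls` would come from the query against `raw_billdocuments` in step 2. Keys for documents that were never cached are harmless, since S3 does not remove anything for a key the bucket does not contain.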

@shrayshray
Collaborator

The logic we discussed for consistent treatment of reports and PDF rendering is to:

  1. Check whether "Not Viewable via InSite" is True or False. If True, stop/do not display. If False,
  2. Check report type (this step becomes necessary once the archive of pre-2015 board documents and Board Boxes is added to Legistar). If "Board Box", display report and PDF. If any other type,
  3. Check whether the report is on a published agenda. If True, display report and PDF. If False, stop/do not display.
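
As a sketch, the three checks above reduce to a single predicate. The `BoardReport` dataclass and its field names are hypothetical stand-ins for whatever the Councilmatic models expose; only the decision order comes from the list.

```python
from dataclasses import dataclass


@dataclass
class BoardReport:
    # Hypothetical fields mirroring the three checks in the list above.
    not_viewable_via_insite: bool
    report_type: str
    on_published_agenda: bool


def should_display(report: BoardReport) -> bool:
    """Return True when the report and its PDF should be shown."""
    # Step 1: a restricted report is never shown.
    if report.not_viewable_via_insite:
        return False
    # Step 2: Board Boxes are shown regardless of agenda status.
    if report.report_type == "Board Box":
        return True
    # Step 3: everything else is shown only once it is on a published agenda.
    return report.on_published_agenda


restricted_box = BoardReport(True, "Board Box", True)
board_box = BoardReport(False, "Board Box", False)
unpublished = BoardReport(False, "Informational Report", False)
published = BoardReport(False, "Informational Report", True)
```

Here `should_display` returns `False` for `restricted_box` and `unpublished`, and `True` for `board_box` and `published`. The "Informational Report" type is only an example value.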

@shrayshray shrayshray added this to the September 2018 issues milestone Sep 19, 2018
@reginafcompton reginafcompton changed the title from "pic.datamade.us board report is out of sync with Legistar" to "BEFORE OCT 12 – pic.datamade.us board report is out of sync with Legistar" Sep 25, 2018
@reginafcompton
Contributor

@shrayshray - Councilmatic now has a script that will refresh the document cache every time a bill or event changes. We'll review this script, merge it, and add it to the data import pipeline early next week.

For the PDF rendering, I'll add the logic you note, though it seems more closely related to this issue: #345. So, I'll keep track of any relevant updates there.

@reginafcompton
Contributor

@shrayshray - I've added the script for refreshing the document cache to the Metro data pipeline! Closing this issue.
