This repository has been archived by the owner on Apr 14, 2023. It is now read-only.

Remove newsroom.smgov.net from tracked websites #57

Open
thekaveman opened this issue Sep 27, 2017 · 7 comments

Comments

@thekaveman
Contributor

We launched www.santamonica.gov on September 22, which includes the newsroom functionality. On that date, we began redirecting newsroom URLs to the corresponding URLs on the newer site.

In the short-term, we can disable realtime reporting for newsroom.smgov.net.

In the long-term, we can completely remove newsroom.smgov.net. Since our longest reporting period is 90 days, the timeframe here is sometime after December 21, 2017.

thekaveman added a commit that referenced this issue Sep 27, 2017
@thekaveman
Contributor Author

See also #56

@thekaveman
Contributor Author

thekaveman commented Oct 21, 2017

@allejo: related to what I mentioned on the closure of #56. When I made ed16d7d and removed an old site, the aggregate WebJob started failing.

Removing the site from the _websites collection removes the key from the Jekyll-generated reports/variables.json file. Subsequent runs of the other WebJobs won't generate new data for the removed site. This is all expected 👍

However, data generated prior to removing the site is never cleaned up. When aggregate runs following the removal, it uses the contents of the data directory (a subdirectory for each agency) and the keys in reports/variables.json; since there is a mismatch, we get the error.

Two options I can think of: either use the keys from reports/variables.json exclusively, or have a separate cleanup WebJob that continuously deletes subdirectories of data that don't exist as keys in reports/variables.json. (I kind of like the former approach better than the latter.) Your thoughts?
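The former approach could be sketched roughly as below. This is a hypothetical illustration only; the file layout (a `data/<agency>/` subdirectory per site, and `reports/variables.json` mapping each tracked agency key to its config) is an assumption based on this discussion, not the actual aggregate WebJob code:

```python
import json
from pathlib import Path

def aggregate_sites(repo_root):
    """Aggregate report data keyed off the Jekyll-generated
    reports/variables.json, instead of listing the data/ directory.

    Directories left behind by removed sites are simply never
    visited, so no key mismatch can occur. Layout is assumed:
    data/<agency>/ holds per-site JSON report files.
    """
    root = Path(repo_root)
    variables = json.loads(
        (root / "reports" / "variables.json").read_text()
    )

    aggregated = {}
    for agency in variables:  # only sites still tracked
        agency_dir = root / "data" / agency
        if not agency_dir.is_dir():
            continue  # site tracked but no data generated yet
        aggregated[agency] = sorted(
            p.name for p in agency_dir.glob("*.json")
        )
    return aggregated
```

Because iteration is driven entirely by the keys in variables.json, a stale `data/newsroom.smgov.net/` directory would be ignored rather than causing an error.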

@allejo
Collaborator

allejo commented Oct 21, 2017

Ahhh that would make a lot of sense... Yea, I'm in favor of using reports/variables.json exclusively in the aggregate WebJob.

As for cleaning up old data, we could have a manual WebJob that deletes any stale data, which we could run every so often? Or we could tie that WebJob/script to run on deployment as well.
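A cleanup script like that might look something like the following sketch. Again, the layout (`data/<agency>/` subdirectories and `reports/variables.json` as the authoritative key list) is an assumption from this thread, and the function name is hypothetical:

```python
import json
import shutil
from pathlib import Path

def clean_stale_data(repo_root, dry_run=True):
    """Delete data/<agency> directories whose key no longer
    appears in reports/variables.json.

    Defaults to a dry run so a deployment hook can log what it
    would remove before destructive runs are enabled.
    """
    root = Path(repo_root)
    tracked = set(json.loads(
        (root / "reports" / "variables.json").read_text()
    ))
    removed = []
    for agency_dir in (root / "data").iterdir():
        if agency_dir.is_dir() and agency_dir.name not in tracked:
            if not dry_run:
                shutil.rmtree(agency_dir)
            removed.append(agency_dir.name)
    return sorted(removed)
```

Run on deployment, this would keep the data directory in sync with the tracked-sites list instead of letting removed sites' data sit around forever.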

@thekaveman
Contributor Author

thekaveman commented Oct 21, 2017

Oh I like the idea of doing a clean on deployment! That plus moving aggregate to key off the reports/variables.json file should solve our current issue with removing sites and prevent stagnant data from sitting around forever.

@allejo
Collaborator

allejo commented Oct 21, 2017

Should the change go into the feature/aggregate-script-46 branch so that can be revived/merged? Or do it in both branches (rewrite + master).

@thekaveman
Contributor Author

Let's revive that thing and get it merged! I think I was supposed to review your changes, right?

@allejo
Collaborator

allejo commented Oct 21, 2017

Yea, and I just need to confirm that the generated data is the same as with the current script.

allejo added a commit that referenced this issue Oct 21, 2017
Relying on the data in the filesystem is only reliable when working with
a clean slate. However, deleting websites will leave old data behind, so
instead of checking the filesystem, use Jekyll generated files for their
actual purpose: being the authoritative source of sites & reports.

Fixes #57