Handle missing histories #300
Comments
Ideally, this would be handled at the data level, similar to the workaround for minutes in the scraper.
To replicate this behavior in the scraper, we need to be able to query the Legistar API for events where a particular matter appears on the agenda (i.e., in an associated event item). Some reading, specifically this, has led me to believe something like this should work:

Matter 6276 appears on this agenda, so I know there should be at least one result: http://webapi.legistar.com/v1/metro/events/1603/eventitems

The requests are going through ok, but the responses are coming back empty. I emailed Metro to see if they have any insight! If we can't do this querying, I don't know if it's practical to do this in the scraper. Will revisit when it's not 5 p.m.
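For illustration, here is a minimal sketch of the lookup described above, done client-side. It assumes the event items endpoint returns a JSON list of objects carrying an `EventItemMatterId` field; that field name has not been verified against Metro's data here. The helper names and the injected `fetch` callable are hypothetical, and no request is made unless a real fetcher is passed in.

```python
from typing import Callable, Iterable

BASE_URL = "http://webapi.legistar.com/v1/metro"


def event_items_url(event_id: int) -> str:
    """Build the event items URL for one event (URL pattern from the thread)."""
    return f"{BASE_URL}/events/{event_id}/eventitems"


def events_with_matter(
    event_ids: Iterable[int],
    matter_id: int,
    fetch: Callable[[str], list],
) -> list:
    """Return the IDs of events whose agenda items reference matter_id.

    `fetch` is a hypothetical callable that takes a URL and returns the
    decoded JSON list of event items, e.g. lambda u: requests.get(u).json().
    """
    matches = []
    for event_id in event_ids:
        items = fetch(event_items_url(event_id))
        # Assumed field name on each event item: EventItemMatterId.
        if any(item.get("EventItemMatterId") == matter_id for item in items):
            matches.append(event_id)
    return matches
```

Note that this issues one request per event, so it only shows the shape of the lookup; a server-side filter would still be needed for it to be practical in the scraper.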
Another thing is: How often are bills added to Legistar without a history? If we add an artificial history, do we want to clear it when an actual history is added? Or should it be added to the extras dict? (That's probably a better idea.)

If this isn't practical at the scraper level (i.e., we can't query the Legistar API the way we need to), it might be something we can create during the post save hook for bills, when we'll have full access to the database via the ORM.
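A rough sketch of the extras idea, under stated assumptions: the placeholder date lives under a made-up key in the bill's extras dict, it is derived from the latest agenda date, and it is dropped as soon as a real history exists. The key name and function are hypothetical, and plain dicts and lists stand in for the real models.

```python
from datetime import date
from typing import Optional

# Hypothetical key name; the thread only says "the extras dict".
PLACEHOLDER_KEY = "spoofed_last_action_date"


def sync_placeholder(
    extras: dict,
    action_dates: list,
    agenda_dates: list,
) -> Optional[date]:
    """Update extras in place and return the bill's last action date."""
    if action_dates:
        # A real history exists, so drop any earlier placeholder.
        extras.pop(PLACEHOLDER_KEY, None)
        return max(action_dates)
    if agenda_dates:
        # No history yet: fall back to the latest agenda appearance.
        latest = max(agenda_dates)
        extras[PLACEHOLDER_KEY] = latest.isoformat()
        return latest
    # Nothing to go on; make sure no stale placeholder lingers.
    extras.pop(PLACEHOLDER_KEY, None)
    return None
```

Keeping the placeholder in extras rather than in the history itself means there is never an artificial action to clean up later.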
We're going to proceed on this assuming that changes won't be made to the Legistar API that allow us to query it the way we'd need to, to calculate this value in the scraper.

Apart from perhaps occurring at the wrong level of the code base, the big issue with our current approach is that it runs a heavy query every time a bill's last action date is needed, either in the UI or when updating or rebuilding the search index. Caching this value would lead to faster page load time and indexing operations. I propose replacing the

ubuntu@ip-10-0-0-80:~$ grep "bill:" /tmp/lametro.log | grep 'noop$'
bill: 0 new 0 updated 2920 noop
bill: 0 new 0 updated 1 noop
bill: 0 new 0 updated 1 noop
bill: 0 new 0 updated 2 noop
bill: 0 new 0 updated 4 noop
bill: 0 new 0 updated 4 noop
bill: 0 new 0 updated 3 noop
bill: 0 new 0 updated 3 noop
bill: 0 new 0 updated 3 noop
bill: 0 new 0 updated 1 noop
bill: 0 new 0 updated 2 noop
bill: 1 new 0 updated 5 noop
bill: 0 new 0 updated 6 noop
bill: 0 new 0 updated 7 noop
bill: 0 new 0 updated 7 noop
bill: 0 new 0 updated 8 noop
bill: 0 new 0 updated 5 noop
bill: 0 new 0 updated 5 noop
bill: 0 new 0 updated 5 noop
bill: 0 new 0 updated 3 noop
bill: 0 new 0 updated 2 noop
bill: 0 new 0 updated 2 noop
bill: 0 new 0 updated 2 noop
ubuntu@ip-10-0-0-80:~$ grep "bill:" /tmp/lametro.log.1 | grep 'noop$'
bill: 0 new 5 updated 2908 noop
bill: 0 new 0 updated 1 noop
bill: 0 new 0 updated 1 noop
bill: 0 new 0 updated 1 noop
bill: 0 new 0 updated 2 noop
bill: 0 new 0 updated 3 noop
bill: 0 new 0 updated 4 noop
bill: 0 new 0 updated 6 noop
bill: 0 new 0 updated 5 noop
bill: 0 new 0 updated 7 noop
bill: 0 new 0 updated 6 noop
bill: 0 new 0 updated 4 noop
bill: 0 new 0 updated 3 noop
bill: 0 new 0 updated 3 noop
bill: 0 new 0 updated 3 noop
bill: 0 new 0 updated 3 noop
bill: 0 new 0 updated 3 noop
bill: 6 new 0 updated 6 noop
bill: 0 new 0 updated 11 noop
bill: 1 new 0 updated 12 noop
bill: 0 new 0 updated 14 noop
bill: 0 new 0 updated 12 noop
bill: 0 new 0 updated 7 noop
bill: 0 new 0 updated 7 noop
bill: 0 new 0 updated 5 noop
bill: 0 new 0 updated 4 noop
bill: 0 new 0 updated 4 noop
bill: 0 new 0 updated 2 noop
bill: 0 new 0 updated 2 noop
bill: 0 new 0 updated 1 noop
bill: 0 new 0 updated 3 noop
bill: 0 new 0 updated 3 noop
bill: 0 new 5 updated 2915 noop

On the plus side, re-calculating this attribute every time a bill is saved ensures that bills for which we'd previously spoofed an action date via agendas would be updated appropriately when a history item is added. Thoughts, @fgregg?
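The scrape summaries quoted above can also be tallied with a few lines of Python, to see at a glance how many bills each log reported as new, updated, or noop. This is only a sketch that assumes the `bill: N new N updated N noop` line format shown in the logs; the function name is hypothetical.

```python
import re

# Matches summary lines such as "bill: 0 new 5 updated 2908 noop".
SUMMARY = re.compile(r"bill: (\d+) new (\d+) updated (\d+) noop$")


def tally_bill_summaries(lines):
    """Sum the new/updated/noop counts over all matching log lines."""
    totals = {"runs": 0, "new": 0, "updated": 0, "noop": 0}
    for line in lines:
        match = SUMMARY.search(line.strip())
        if match is None:
            continue  # not a bill summary line
        new, updated, noop = (int(group) for group in match.groups())
        totals["runs"] += 1
        totals["new"] += new
        totals["updated"] += updated
        totals["noop"] += noop
    return totals
```

For a real log, something like `tally_bill_summaries(open("/tmp/lametro.log"))` would do the same filtering as the `grep` pipeline before summing.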
Update: Hm, looks like the signal approach won't work after all. With this in mind, it seems like we need a few things:
A signals-based approach gets us the first and third things, but not the second. Setting the attribute as bills are accessed, like we do with packets, might also seem attractive, but it doesn't get us periodic updates.

So it's starting to seem like we need to set this outside of the import cycle, e.g., in a management command, or skip caching and calculate it on the fly. A management command could be ok, but since we aren't running scrapes and downstream ETL in concert, there's still the potential for incomplete data. Any thoughts, @fgregg?

Related to Metro-Records/la-metro-councilmatic#553, Metro-Records/la-metro-councilmatic#555.
Addressed this in opencivicdata/pupa#329 and datamade/django-councilmatic#265.
Some LA Metro Board Reports have missing histories, but we still need to show their last action in the councilmatic application.
Currently, this is handled by some complicated view-like code in the councilmatic app, but it should be handled, if practicable, in the data layer.