Procedure for handling "cannot resolve" Sentry errors #24
Comments
+1 on reducing errors, it makes it very hard to use both sentry and the semaphor channel |
i would like to propose that we devise a way of creating a digest of things we cannot resolve, and logging it in one place, i.e., opencivicdata/scrapers-us-municipal#241, rather than logging each and every one of these instances as a sentry error. |
I like the idea of a digest. It correlates with what @fgregg proposes in point 2 here. That is, we'd capture these unresolved bill errors and then scrape Metro for just those bills. We'd have a log of what that special scrape does, as opposed to Sentry errors. |
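One way the digest idea could look in practice is a logging handler that collects the "cannot resolve" warnings during a scrape and reports them once at the end. This is only a minimal sketch: the handler class, the logger name, and the bill identifiers are all hypothetical, and the message text beyond the phrase "cannot resolve pseudo id" is an assumption.

```python
import logging
from collections import Counter


class UnresolvedDigestHandler(logging.Handler):
    """Hypothetical handler: collect 'cannot resolve' warnings into one digest."""

    MARKER = "cannot resolve pseudo id"

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.counts = Counter()

    def emit(self, record):
        # Only keep the warnings this thread is about; ignore everything else.
        message = record.getMessage()
        if self.MARKER in message:
            self.counts[message] += 1

    def digest(self):
        """Return one summary string covering every collected warning."""
        lines = [f"{n}x {msg}" for msg, n in sorted(self.counts.items())]
        return "\n".join(lines)


# Demo logger name and bill identifiers are made up for illustration.
logger = logging.getLogger("pupa_digest_demo")
logger.setLevel(logging.WARNING)
logger.propagate = False
handler = UnresolvedDigestHandler()
logger.addHandler(handler)

logger.warning("cannot resolve pseudo id to Bill: %s", "2018-0001")
logger.warning("cannot resolve pseudo id to Bill: %s", "2018-0001")
logger.warning("cannot resolve pseudo id to Bill: %s", "2018-0002")
logger.error("some unrelated scraper error")

print(handler.digest())
```

The digest string could then be posted to a single place (for example, one comment on a tracking issue) instead of producing one Sentry event per bill.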
I think that we can move to a digest, or even reduce the level of logging, once we have an understanding of all the reasons why unresolved bills (and other things) appear. I don't think we are there yet. |
@fgregg I definitely agree that we need to get to the bottom of this problem, but I'm not sure that needs to happen at the expense of one of our primary channels of communication. 10+ often redundant error notifications on every scrape is a lot, especially when the scrapes are happening at an increased frequency on Fridays. Coupled with pretty crappy search functionality in Semaphor, it becomes way too easy to lose track of conversations. Is there a way we can reconcile the log level with our communication needs? What about a separate channel for pupa errors? |
If we know that something is not a problem, is ignoring those events in Sentry sufficient? If not, why not? |
I too like the idea of separate channels, but I don't think it's just a matter of distinguishing between conversation and logging, since the pupa-cannot-resolve-errors stand to obscure other meaningful Councilmatic errors (e.g., from Miami, or I'd rather see a separate channel for the Pupa errors entirely, and then preserve the Councilmatic channel as it has been in the past. I also think we can ignore Pupa errors once (1) we made a note of the error in a relevant Github issue (see above), or (2) we can absolutely identify the error as not a problem. |
It may be that we have just not stemmed the tide of this class of error just yet, but I muted at least 15 cannot resolve errors Friday and it felt like at least that many more came in the next scrape to take their place. These felt urgent to resolve, because I knew the errors would just recur 20 minutes later and further clog the channel. I would estimate I spent about an hour on this quasi-urgent task and related context switching. I'm sure @reginafcompton lost some time on it, as well. In summary, I do not feel that muting alone addresses the problem, because it is time consuming and – so far – less effective than I would like at keeping the notifications at bay. Perhaps the number of errors will be reduced when we've spent the time to mute them all; but it seems like by that point, not being notified at all would be the same solution, except it wouldn't cost us the hours. To your point about redundancy, I would strongly prefer that alerts not be redundant. It becomes too easy to ignore them, and potentially miss a meaningful one. Moreover, we don't learn anything from redundant alerts, apart from that the error is still happening, which we can already assume, because we know it's often not self-resolving, and we haven't made a change to fix it. |
For the flooding issue, it seems like we can address that by changing the frequency of reporting to Semaphor. In my opinion @evz should not move the civicpro scrapers to a separate repo, since different people have responsibility for addressing those. We already have a councilmatic channel, where councilmatic errors should be located. |
I updated the semaphor rule so that a "warning or error" level issue will only be reported once per 24 hours. critical errors will still be reported up to every 5 minutes. |
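For reference, the effect of that rule can be sketched as a small throttle check. This is not how Sentry or Semaphor implement it; the function and the level names used as dictionary keys are assumptions for illustration, and only the two intervals come from the rule itself.

```python
from datetime import datetime, timedelta

# Intervals taken from the rule described above; the keys are assumed level names.
MIN_INTERVAL = {
    "warning": timedelta(hours=24),
    "error": timedelta(hours=24),
    "critical": timedelta(minutes=5),
}


def should_notify(level, last_sent, now):
    """Return True if enough time has passed to notify again for this issue."""
    if last_sent is None:
        # Never reported before, so always notify.
        return True
    return now - last_sent >= MIN_INTERVAL[level]


now = datetime(2018, 7, 23, 9, 0)
print(should_notify("warning", now - timedelta(hours=2), now))
print(should_notify("critical", now - timedelta(minutes=10), now))
```

With these inputs, the warning reported two hours ago is suppressed, while the critical error reported ten minutes ago is sent again.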
Right @fgregg - I meant "obscure other meaningful SCRAPER errors", not Councilmatic errors. I think that Semaphor update will make a difference. We also need to undo the change to LOGGING from Friday. #25 I can do that this morning. I am not sure, however, if we have an agreed upon step-by-step for dealing with these Pupa warnings. Does what I summarized above make sense? I think if we really want to understand the nature of these errors, then we'll need to think more about my suggested (2b). |
I checked today's batch of "cannot resolve" errors against Legistar: none of them were present in the API. I propose that we make a consolidated list of these bills (we can take a look at the scraper logs to get past errors) and send it to Metro. We need their help to determine if these bills: (1) are private and will remain private (in which case no action from us is needed); (2) are private and will become public; (3) are something else. Then, we can make a plan for resolution. I can pull together a list today and send it to Metro. |
Do we understand why we only saw them alerted today?
…On Wed, Jul 25, 2018 at 9:24 AM Regina Compton ***@***.***> wrote:
I checked today's batch of "cannot resolve" errors against Legistar: none
of them were present in the API.
I propose that we make a consolidated list of these bills (we can take a
look at the scraper logs to get past errors) and send it to Metro. We need
their help to determine if these bills:
(1) are private and will remain private (in which case no action from us
is needed);
(2) are private and will become public;
(3) are something else....
Then, we can make a plan for resolution.
I can pull together a list today and send it to Metro.
|
I am not sure I understand your question @fgregg - can you say more? |
Is this the first time we got this alert from Sentry? If so, why? We scrape the events every night, so shouldn't we have seen these before? |
that's actually an interesting question, @fgregg – looking at the frequency of occurrence charts in sentry (check em out!), it looks like these recur, but not every night. (it's possible the reason for this is totally obvious and i'm just not in the scraper headspace.) in any case, the ones from today aren't new. |
i have a suspicion that this is worth figuring out. |
"Why did we not see these alerts more often?"
Forest turned up the volume on Pupa logging on June 22; I turned down the scraper volume from July 20-24. Sentry thus had 29 days to alert us about unresolved bills. However, according to our Semaphor chat, we periodically and a little haphazardly ignored (several, but not all) alerts for a period of time (e.g., for a week, until Monday, etc.) on July 5, 6, 13, 19, 20. This would explain why these bills do not have consistent daily alerts, for example:
A couple inconsistencies – I see that some bills do not have alerts until later in June...why is that? For many bills, we did not get alerts on July 3 or 4 - were they ignored? (@hancush do you recall?) |
Coming to terms with the Pupa errors
Shelly gave us terrific information about some of these unresolved bills. (I gave her a large sample to look into.) Given this information and what we learned in this issue, we can distinguish four types of bills that raise the "Cannot resolve error":
Actionable steps
I am most concerned about classes (3) and (4), since these have caused issues in the past. On one hand, we've confronted this problem by aggressively scraping all bills on Fridays. However, this strategy slows the bill import time (from a maximum of 30 minutes to 45 minutes), since it takes about 22 minutes for the scraper to grab all bills. Alternative, more efficient strategies include:
In the short term, I prefer the first option (a windowed scrape of bills from the last year), since it's an easy adjustment. Ideally, I would like our scrapers to have access to private bills. Why? Then Pupa errors will carry greater meaning, whereas now, we just get a flood of errors on certain Fridays and think, "oh well, these must be private bills that will soon become public....la-te-dah." |
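The windowed scrape mentioned above could be sketched roughly like this. It is an illustration only: the `bills_in_window` helper and the `last_action_date` key are hypothetical, and which date field actually defines "the last year" is an assumption.

```python
from datetime import date, timedelta


def bills_in_window(bills, today, days=365):
    """Keep only bills whose (assumed) last action date is within the window."""
    start = today - timedelta(days=days)
    return [bill for bill in bills if bill["last_action_date"] >= start]


# Made-up bills for illustration.
bills = [
    {"identifier": "2018-0100", "last_action_date": date(2018, 6, 1)},
    {"identifier": "2016-0042", "last_action_date": date(2016, 3, 15)},
]
recent = bills_in_window(bills, today=date(2018, 7, 27))
print([bill["identifier"] for bill in recent])
```

In a real scraper the filter would more likely be pushed into the API query rather than applied after fetching everything, since the point of the window is to reduce scrape time.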
Metro tested switching bills from private to public using a few techniques. I outlined the results of those tests here. Specifically, I learned two meaningful pieces of information:
(1) Publishing an agenda does not change the timestamp of the "Not viewable" bills to which the agenda refers. The bills become public, but their timestamp stays the same.
(2) Manually unchecking the "Not viewable" box for a bill does change the timestamp.
Next steps
With this knowledge, we have a few options, though one seems better than the others.
I think our best option is to write some code that scrapes bills related to newly published agendas, something like:
This logic could reside in the LAMetro bills scraper, though we could make some changes further upstream (assuming that this problem affects NYC and Chicago?). |
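The idea of scraping bills related to newly published agendas could be sketched as below. This is not the snippet from the original comment, just a minimal illustration under assumptions: the `Event` class, its fields, and `bills_to_rescrape` are hypothetical stand-ins, and no real Legistar or Pupa calls are made.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Event:
    # Hypothetical, simplified stand-in for a scraped event.
    name: str
    agenda_published: Optional[datetime]
    bill_identifiers: List[str] = field(default_factory=list)


def bills_to_rescrape(events, last_scrape):
    """Identifiers of bills on agendas published since the last scrape."""
    identifiers = set()
    for event in events:
        published = event.agenda_published
        if published is not None and published > last_scrape:
            identifiers.update(event.bill_identifiers)
    return sorted(identifiers)


# Made-up events for illustration.
events = [
    Event("Board Meeting", datetime(2018, 7, 26, 8, 0), ["2018-0100", "2018-0101"]),
    Event("Committee Meeting", datetime(2018, 7, 20, 8, 0), ["2018-0050"]),
    Event("Future Meeting", None, ["2018-0200"]),
]
last_scrape = datetime(2018, 7, 25, 0, 0)
print(bills_to_rescrape(events, last_scrape))
```

Keying on the agenda's publication time rather than the bill's own timestamp matters here, because publishing an agenda makes the bills public without touching their timestamps.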
So getting access to private bills is off the table? |
could we check the edit: oh, haha, bills don't have agendas..... NEVER MIND ME. |
@fgregg - Omar is looking into it. Let's wait for his reply before acting on anything. |
From Metro: 'Unfortunately, we don’t know of a way to give the scraper access to the “Not Viewable on Insite" reports. Omar has asked Granicus about this in the past, and received back either “we’ll look into it” or no response at all.' |
Recently, we increased the level of logging to Sentry to help DataMade quickly identify data problems, before the client does.
What should we do with "cannot resolve pseudo id to Bill" warnings?
Potential step-by-step:
(1) check if bills are in Legistar;
(2a) if they are, then add them to this issue: opencivicdata/scrapers-us-municipal#241. Ignore the Sentry error (since the error has been recorded in a GitHub issue).
(2b) if they are not, then they might be private, but will become public. So, keep an eye on it? contact Metro? ignore the Sentry error? I am not sure.....
I am also not sure if we still need this level of logging, given that we’ll be aggressively scraping all bills every Friday.
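The step-by-step above could be sketched as a small triage function. The function name, the bucket names, and the identifiers are hypothetical, and the Legistar lookup is replaced by a plain set so the example stays self-contained.

```python
def triage(unresolved, in_legistar):
    """Split unresolved bill identifiers following steps (1), (2a) and (2b)."""
    plan = {
        "record_in_issue_then_ignore": [],  # (2a): present in Legistar
        "possibly_private_follow_up": [],   # (2b): absent from Legistar
    }
    for identifier in sorted(unresolved):
        # Step (1): check whether the bill is in Legistar.
        if identifier in in_legistar:
            plan["record_in_issue_then_ignore"].append(identifier)
        else:
            plan["possibly_private_follow_up"].append(identifier)
    return plan


# Made-up identifiers for illustration.
unresolved = {"2018-0001", "2018-0002", "2018-0003"}
in_legistar = {"2018-0002"}
print(triage(unresolved, in_legistar))
```

What to do with the second bucket is exactly the open question in (2b); the sketch only separates the two cases.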