Pervasive lag issue with label/milestone changes in issues and PRs #78

jberkus · 2018-03-01T01:16:28Z

Lukasz,

If you look here:
https://k8s.devstats.cncf.io/d/IIUa5kezk/open-issues-prs-by-milestone?orgId=1&from=1509407831268&to=1511830631269&var-sig_name=All&var-sig=all&var-milestone_name=v1.9&var-milestone=v1_9&var-repo_name=kubernetes%2Fkubernetes&var-repo=kubernetes_kubernetes&var-full_name=Kubernetes

... it says that as of Nov 27, we had 90ish open issues against 1.9. However, if you look at the burndown report, we actually had 28 issues open on that date.

What's the reason for the extremely different counts?

lukaszgryglicki · 2018-03-01T05:17:28Z

Looking into that.
Will let you know what I find.

lukaszgryglicki · 2018-03-01T06:10:20Z

At first glance all seems OK.
There were 96 issues open at that time.
We're selecting SIG=All, so SIG is not taken into account.
Issues open at 2017-11-27 (repo=k/k, milestone=v1.9) were:

./runq x.sql {{to}} 2017-11-27
/---------+------+--------------------+--------------------+---------\
|issue_id |number|opened_at           |closed_at           |milestone|
+---------+------+--------------------+--------------------+---------+
|257517249|52444 |2017-09-13T20:51:41Z|                    |v1.9     |
|276181000|56242 |2017-11-22T19:26:59Z|2017-11-27T15:20:04Z|v1.9     |
|258657089|52683 |2017-09-19T00:06:23Z|                    |v1.9     |
|242791185|48893 |2017-07-13T18:32:20Z|                    |v1.9     |
|240004664|48396 |2017-07-02T11:28:01Z|                    |v1.9     |
|142012943|23233 |2016-03-19T01:27:59Z|                    |v1.9     |
|260790217|53084 |2017-09-26T22:30:35Z|2018-02-21T18:22:57Z|v1.9     |
|249544284|50495 |2017-08-11T05:50:42Z|                    |v1.9     |
|274977067|55967 |2017-11-17T19:48:34Z|2018-02-26T12:05:05Z|v1.9     |
|224607650|44975 |2017-04-26T21:40:07Z|                    |v1.9     |
|233429070|46934 |2017-06-04T11:57:01Z|2018-02-27T21:04:49Z|v1.9     |
|274039875|55768 |2017-11-15T05:33:05Z|                    |v1.9     |
|276180396|56241 |2017-11-22T19:24:28Z|2017-11-27T15:19:42Z|v1.9     |
|143491681|23479 |2016-03-25T12:42:12Z|                    |v1.9     |
|233673871|46983 |2017-06-05T18:41:13Z|                    |v1.9     |
|258951594|52745 |2017-09-19T20:22:04Z|                    |v1.9     |
|263574915|53548 |2017-10-06T21:20:01Z|2017-12-01T00:29:33Z|v1.9     |
|271678686|55194 |2017-11-07T01:23:14Z|2017-12-08T08:02:50Z|v1.9     |
|261503927|53236 |2017-09-29T01:05:05Z|2017-12-15T01:46:35Z|v1.9     |
|142002272|23225 |2016-03-18T23:33:23Z|                    |v1.9     |
|254753537|51825 |2017-09-01T21:16:18Z|                    |v1.9     |
|276173709|56239 |2017-11-22T18:57:52Z|2017-11-27T14:25:25Z|v1.9     |
|216652325|43607 |2017-03-24T05:02:08Z|                    |v1.9     |
|254492560|51746 |2017-08-31T23:04:25Z|                    |v1.9     |
|261236330|53188 |2017-09-28T08:37:00Z|                    |v1.9     |
|249961544|50599 |2017-08-14T08:17:01Z|2018-02-12T13:15:21Z|v1.9     |
|267271142|54318 |2017-10-20T18:55:07Z|                    |v1.9     |
|260853556|53109 |2017-09-27T05:45:02Z|                    |v1.9     |
|255366506|51965 |2017-09-05T18:26:03Z|2018-02-09T05:34:41Z|v1.9     |
|255714945|52039 |2017-09-06T19:22:00Z|                    |v1.9     |
|188847316|36666 |2016-11-11T20:47:13Z|                    |v1.9     |
|263211016|53497 |2017-10-05T17:48:56Z|                    |v1.9     |
|275491269|56091 |2017-11-20T20:38:46Z|2017-11-28T09:51:58Z|v1.9     |
|137393454|22212 |2016-02-29T22:10:23Z|                    |v1.9     |
|135339370|21657 |2016-02-22T07:23:55Z|                    |v1.9     |
|215912236|43486 |2017-03-21T23:45:56Z|                    |v1.9     |
|81132222 |8830  |2015-05-26T20:53:41Z|                    |v1.9     |
|247541182|50046 |2017-08-02T22:27:57Z|                    |v1.9     |
|250551665|50752 |2017-08-16T08:38:21Z|                    |v1.9     |
|243478249|49038 |2017-07-17T18:04:07Z|                    |v1.9     |
|266227745|54088 |2017-10-17T18:17:32Z|2018-02-26T17:43:10Z|v1.9     |
|276182336|56244 |2017-11-22T19:32:06Z|2017-12-14T01:09:00Z|v1.9     |
|275371939|56061 |2017-11-20T14:22:04Z|2017-11-27T08:05:51Z|v1.9     |
|254169942|51665 |2017-08-31T00:01:34Z|                    |v1.9     |
|256581518|52258 |2017-09-11T04:55:02Z|2017-12-14T05:25:53Z|v1.9     |
|236263512|47604 |2017-06-15T17:37:37Z|                    |v1.9     |
|246472365|49820 |2017-07-28T22:20:24Z|2017-12-08T02:27:38Z|v1.9     |
|89361997 |10045 |2015-06-18T18:20:39Z|                    |v1.9     |
|245021139|49480 |2017-07-24T09:28:06Z|2017-12-15T17:22:02Z|v1.9     |
|251517777|50986 |2017-08-20T22:03:06Z|2017-12-16T17:33:42Z|v1.9     |
|256085955|52123 |2017-09-07T22:12:47Z|                    |v1.9     |
|254492984|51747 |2017-08-31T23:07:09Z|                    |v1.9     |
|248283779|50215 |2017-08-07T00:50:53Z|2018-02-17T01:31:06Z|v1.9     |
|270182138|54904 |2017-11-01T03:22:32Z|2017-12-16T17:33:39Z|v1.9     |
|260361499|53006 |2017-09-25T17:51:13Z|                    |v1.9     |
|251925100|51099 |2017-08-22T11:28:55Z|2018-01-07T18:21:50Z|v1.9     |
|38003437 |489   |2014-07-16T17:08:14Z|                    |v1.9     |
|268441314|54574 |2017-10-25T15:24:34Z|2017-12-04T02:50:00Z|v1.9     |
|258641145|52678 |2017-09-18T22:31:42Z|2017-12-15T18:27:47Z|v1.9     |
|246135911|49734 |2017-07-27T18:54:20Z|                    |v1.9     |
|275776261|56155 |2017-11-21T16:28:18Z|2017-12-01T03:25:43Z|v1.9     |
|276707674|56357 |2017-11-24T22:49:37Z|2018-02-13T17:10:46Z|v1.9     |
|120494919|18233 |2015-12-04T21:59:00Z|2017-12-17T14:25:59Z|v1.9     |
|224611269|44976 |2017-04-26T21:55:49Z|2018-01-05T19:07:43Z|v1.9     |
|264036039|53615 |2017-10-09T21:46:11Z|                    |v1.9     |
|262875129|53451 |2017-10-04T17:52:05Z|                    |v1.9     |
|276181978|56243 |2017-11-22T19:30:50Z|2017-11-27T15:20:12Z|v1.9     |
|253886104|51594 |2017-08-30T05:55:42Z|2018-01-09T21:06:53Z|v1.9     |
|230543993|46255 |2017-05-22T23:11:29Z|                    |v1.9     |
|234274279|47131 |2017-06-07T16:49:20Z|2018-02-12T17:08:57Z|v1.9     |
|226416047|45385 |2017-05-04T21:37:36Z|2018-01-08T14:53:46Z|v1.9     |
|268341485|54551 |2017-10-25T10:13:48Z|2017-12-07T12:44:14Z|v1.9     |
|258902409|52735 |2017-09-19T17:31:12Z|                    |v1.9     |
|274574803|55892 |2017-11-16T16:17:09Z|2017-11-29T02:24:49Z|v1.9     |
|276656324|56348 |2017-11-24T16:07:23Z|                    |v1.9     |
|254420606|51726 |2017-08-31T18:04:53Z|                    |v1.9     |
|254487869|51745 |2017-08-31T22:36:42Z|                    |v1.9     |
|262507676|53395 |2017-10-03T17:04:35Z|2018-02-26T06:00:50Z|v1.9     |
|257343867|52412 |2017-09-13T11:16:14Z|2018-02-23T05:40:37Z|v1.9     |
|246459494|49817 |2017-07-28T21:06:56Z|2018-02-22T19:29:08Z|v1.9     |
|260418089|53020 |2017-09-25T21:17:33Z|                    |v1.9     |
|259375860|52827 |2017-09-21T05:02:02Z|                    |v1.9     |
|251782459|51049 |2017-08-21T21:59:02Z|                    |v1.9     |
|37915597 |473   |2014-07-15T19:08:45Z|2018-01-18T15:36:28Z|v1.9     |
|261237806|53189 |2017-09-28T08:42:04Z|                    |v1.9     |
|261504871|53237 |2017-09-29T01:12:42Z|2017-12-22T13:16:25Z|v1.9     |
|275025437|55978 |2017-11-17T23:20:50Z|2017-11-27T23:11:27Z|v1.9     |
|238030823|47943 |2017-06-23T03:25:34Z|2018-01-12T16:02:13Z|v1.9     |
|206339935|41161 |2017-02-08T22:07:11Z|2018-01-29T14:47:09Z|v1.9     |
|276697465|56355 |2017-11-24T20:53:55Z|2017-11-28T00:53:09Z|v1.9     |
|243130299|48968 |2017-07-14T22:49:30Z|                    |v1.9     |
|217737938|43783 |2017-03-29T01:20:33Z|2018-01-26T02:29:31Z|v1.9     |
|276170518|56235 |2017-11-22T18:45:33Z|2017-11-28T00:04:25Z|v1.9     |
|219746769|44118 |2017-04-05T23:42:57Z|                    |v1.9     |
|276243348|56262 |2017-11-23T01:02:31Z|2017-11-28T21:08:11Z|v1.9     |
|261426202|53221 |2017-09-28T19:01:16Z|                    |v1.9     |
\---------+------+--------------------+--------------------+---------/

I need to check them manually to see what is happening.

lukaszgryglicki · 2018-03-01T06:16:52Z

First of them: kubernetes/kubernetes#52444
It had a final milestone set here: kubernetes/kubernetes#52444 (comment)
But I don't see this issue's milestone now:

So the question is:
Is this issue in the v1.9 milestone or not?
I can see the milestone being added, but the issue's final state is "no milestone".
This is confusing.
My dashboard checks the final state (open/closed), SIG label and milestone for a given day.
Database says that this issue had milestone v1.9 then.

lukaszgryglicki · 2018-03-01T06:20:26Z

This is the final issues list (links).
I'll check them and write up a summary. For now I've triple-checked my SQLs and I think all is fine :/
|52444|
|56242|
|52683|
|48893|
|48396|
|23233|
|53084|
|50495|
|55967|
|44975|
|46934|
|55768|
|56241|
|23479|
|46983|
|52745|
|53548|
|55194|
|53236|
|23225|
|51825|
|56239|
|43607|
|51746|
|53188|
|50599|
|54318|
|53109|
|51965|
|52039|
|36666|
|53497|
|56091|
|22212|
|21657|
|43486|
|8830 |
|50046|
|50752|
|49038|
|54088|
|56244|
|56061|
|51665|
|52258|
|47604|
|49820|
|10045|
|49480|
|50986|
|52123|
|51747|
|50215|
|54904|
|53006|
|51099|
|489 |
|54574|
|52678|
|49734|
|56155|
|56357|
|18233|
|44976|
|53615|
|53451|
|56243|
|51594|
|46255|
|47131|
|45385|
|54551|
|52735|
|55892|
|56348|
|51726|
|51745|
|53395|
|52412|
|49817|
|53020|
|52827|
|51049|
|473 |
|53189|
|53237|
|55978|
|47943|
|41161|
|56355|
|48968|
|43783|
|56235|
|44118|
|56262|
|53221|

lukaszgryglicki · 2018-03-01T06:22:59Z

The second one has the v1.9 milestone and is closed now, but it was closed at 2017-11-27T15:20:04Z, which is after the 2017-11-27 "date to" cutoff. So at "date to" it was open and had milestone v1.9, which means this issue is counted correctly.
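
The open-at-date rule used in this check can be written as a small predicate. This is a minimal sketch, not the DevStats SQL: `parse_ts` and `is_open_at` are hypothetical helper names, and it assumes that a "date to" of 2017-11-27 means midnight UTC at the start of that day.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_ts(ts: str) -> datetime:
    """Parse a timestamp in the format used in the table, e.g. 2017-11-27T15:20:04Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_open_at(opened_at: str, closed_at: Optional[str], date_to: datetime) -> bool:
    """Hypothetical helper: an issue counts as open at date_to if it was opened
    on or before date_to and was either never closed or closed after date_to."""
    if parse_ts(opened_at) > date_to:
        return False
    return closed_at is None or parse_ts(closed_at) > date_to


# Assumption: "date to" 2017-11-27 is read as midnight UTC.
date_to = datetime(2017, 11, 27, tzinfo=timezone.utc)

# Issue 56242 from the table: closed later on 2017-11-27.
print(is_open_at("2017-11-22T19:26:59Z", "2017-11-27T15:20:04Z", date_to))
# Issue 52444 from the table: no closed_at recorded.
print(is_open_at("2017-09-13T20:51:41Z", None, date_to))
```

Note that this predicate only looks at open/close times; it says nothing about whether the milestone was still set at "date to", which is the part under investigation here.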

lukaszgryglicki · 2018-03-01T06:25:46Z

But the third one had v1.9 milestone that was later removed by bot: k8s-merge-robot removed this from the v1.9 milestone on Oct 9, 2017 before date to: 2017-11-27.

This may be the bug.
My code detects the last milestone set before "date to", but I see that it does not detect whether that milestone was removed afterwards!

lukaszgryglicki · 2018-03-01T06:32:15Z

This one is very interesting.
It was on v1.9.
Then bot removed v1.9 (which I'm not detecting)
And after the date to it received v1.10.
So the final milestone before 2017-11-27 was v1.9, but bot removed it.
So I need to add detection of removed milestones. But the first issue still had a final v1.9 applied: the GitHub UI shows no milestone removal, yet the issue has no milestone.
I need to analyse all events for this issue (on the GHA database).

lukaszgryglicki · 2018-03-01T06:52:07Z

The other (potential) issue can be:

Some final SIG label was applied (I'm taking the last SIG label before date to)
But after that last SIG label was applied, it could have been removed (still before date to)
Not a problem in this bug (we're talking about SIG: All in this case), but it can potentially alter SIG values.

Detecting removed labels is handled here
Seems like I should do something similar here, for SIG and milestone.
This is quite complex and will take some time. I'll post my results here.
In any case, a full data regeneration will be needed.

lukaszgryglicki · 2018-03-01T08:00:36Z

In the first case I don't see any milestone removal on the GitHub UI, but the milestone was indeed removed. The database contains the full history:

gha=# select e.created_at, i.milestone_id from gha_issues i, gha_events e where i.event_id = e.id and i.id = 257517249 order by e.created_at;
     created_at      | milestone_id 
---------------------+--------------
 2017-09-13 20:51:44 |             
 2017-09-13 21:44:09 |             
 2017-09-14 01:27:22 |             
 2017-09-14 12:35:33 |      2545392
 2017-09-18 18:27:15 |      2545392
 2017-09-18 18:30:52 |      2422217
 2017-10-05 22:42:03 |      2422217
 2017-10-07 08:14:47 |      2422217
 2017-10-08 08:25:02 |      2422217
 2017-10-09 18:49:32 |      2422217
 2017-10-11 08:24:05 |      2422217
 2017-10-12 17:29:14 |      2422217
 2017-10-14 08:21:59 |      2422217
 2017-10-15 08:24:21 |      2422217
 2017-10-16 08:25:36 |      2422217
 2017-10-18 00:09:01 |      2422217
 2017-10-18 19:33:45 |      2422217
 2017-10-20 08:26:12 |      2422217
 2017-10-22 08:23:17 |      2422217
 2017-10-23 08:26:04 |      2422217
 2017-10-24 08:27:20 |      2422217
 2017-10-25 08:29:44 |      2422217
 2017-10-27 08:24:29 |      2422217
 2017-10-30 08:31:40 |      2422217
 2017-11-01 08:21:30 |      2422217
 2017-11-02 08:23:19 |      2422217
 2017-11-04 08:19:49 |      2422217
 2017-11-06 08:21:14 |      2422217
 2017-11-07 08:24:28 |      2422217
 2017-11-08 14:23:48 |      2422217
 2017-11-08 14:24:32 |      2422217
 2017-11-08 14:28:46 |      2422217
 2017-11-08 15:12:58 |             
 2017-11-08 15:23:26 |             
 2017-11-09 00:43:22 |             
 2018-02-07 19:35:41 |             
(36 rows)
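
Read against a "date to" of 2017-11-27, this history says the issue had no milestone on that date: the last snapshot before it (2017-11-09) already has an empty `milestone_id`. A minimal sketch of that lookup, using a subset of the rows above; `milestone_at` is a hypothetical helper, not DevStats code.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# (created_at, milestone_id) rows copied from the query output (subset).
# None stands for an empty milestone_id.
HISTORY: List[Tuple[str, Optional[int]]] = [
    ("2017-09-13 20:51:44", None),
    ("2017-09-14 12:35:33", 2545392),
    ("2017-09-18 18:30:52", 2422217),
    ("2017-11-08 14:28:46", 2422217),
    ("2017-11-08 15:12:58", None),
    ("2017-11-09 00:43:22", None),
    ("2018-02-07 19:35:41", None),
]


def milestone_at(history: List[Tuple[str, Optional[int]]], date_to: datetime) -> Optional[int]:
    """Return the milestone_id of the last snapshot at or before date_to,
    or None if there is no such snapshot or it carries no milestone."""
    result = None
    # Sort on the timestamp only, so None and int milestone ids are never compared.
    for created_at, milestone_id in sorted(history, key=lambda row: row[0]):
        if datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S") > date_to:
            break
        result = milestone_id
    return result


print(milestone_at(HISTORY, datetime(2017, 11, 27)))
print(milestone_at(HISTORY, datetime(2017, 10, 1)))
```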

lukaszgryglicki · 2018-03-01T08:15:34Z

And this is the case when Issue had SIG label, but it was removed before date to.
And finally it has no SIG label, so it shouldn't be counted as any SIG (only in SIG: All which skips SIG labels processing).
There are about 14/~2800 such issues now (issues that once had a SIG label but no longer have one).
So I will add detecting removed SIG labels too.

lukaszgryglicki · 2018-03-01T08:54:02Z

Not good. I see that when the k8s robot removes the milestone, the event is recorded with the milestone still present. The next event comes one month later, after "date to", and that event is the issue being closed.
Investigating more.
SIG removal is already handled, but I have major problems with milestones...
I'm really worried that there may be no event recorded without the milestone at removal time, and that only the next event (which can happen even a year later) shows no milestone.

lukaszgryglicki · 2018-03-01T09:13:13Z

I need to go really deep - I'll download and save this event's JSON and see what data I can get from GitHub, because now I can see on the GitHub UI that "k8s-merge-robot removed milestone v1.9" but GHA database event is recorded with that milestone present, and the next event happens one month later (and that one has no milestone).

lukaszgryglicki · 2018-03-01T09:25:02Z

JSON does have milestone in "remove milestone" event.
Dead end.
The only hope seems to be "milestone/removed" label.
If it is applied at the "date to" time, we should ignore the milestone.
Or possibly if applied after last milestone was set but before date to.

I'll try this approach now (in addition to standard milestone detect, which detects removed milestone but on the NEXT event, not removing event itself).

lukaszgryglicki · 2018-03-01T09:45:55Z

lukaszgryglicki · 2018-03-01T09:59:23Z

The problem is that for every event that modifies the milestone, we only have the current milestone, not the new one.

So when somebody changes the milestone from v1.9 to v1.10, we only have the v1.9 milestone info, and on the next GitHub event we have v1.10. But the next event can happen at any time, or there can be no next event at all.
When somebody removes the milestone, we only know about this on the next event too.

This is probably why it now shows 36.. struggling more.

lukaszgryglicki · 2018-03-01T10:05:50Z

I will try the really crazy approach with finding milestones by always using next event on the same issue (if present).
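
A minimal sketch of this next-event idea, assuming each event on an issue carries a snapshot of the milestone taken before any change made alongside it. `effective_milestones` and the sample events are hypothetical, not the DevStats implementation.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, Optional[str]]  # (created_at, milestone recorded on the event)


def effective_milestones(events: List[Event]) -> List[Event]:
    """For each event, take the milestone from the next event on the same issue;
    if there is no next event, fall back to the event's own milestone."""
    ordered = sorted(events, key=lambda e: e[0])
    result = []
    for i, (created_at, milestone) in enumerate(ordered):
        if i + 1 < len(ordered):
            milestone = ordered[i + 1][1]
        result.append((created_at, milestone))
    return result


# Hypothetical events for a single issue: the bot comments (payload still shows v1.9)
# and then removes the milestone; the removal only shows up on the next event.
events = [
    ("2017-10-09T18:00:00Z", "v1.9"),  # comment made just before the removal
    ("2017-11-20T09:00:00Z", None),    # next event, weeks later
]
for row in effective_milestones(events):
    print(row)
```

This is an approximation: the next event may be far in the future, so a milestone change made long after the current event still gets attributed to it.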

lukaszgryglicki · 2018-03-01T10:33:15Z

Seems like this trick may work.
I will need something similar to PRs... not only issues.

lukaszgryglicki · 2018-03-01T10:56:09Z

I think this is OK now, see on the test server
I'll update prod too.

lukaszgryglicki · 2018-03-01T11:11:52Z

Prod also updated, I think this is very close to what we need, but due to special trick that tries to get milestone from next event (with current fallback) it is not ideal.

Trick with next event here.
Handling of milestone removal (which also uses milestone/removed label - saves life) here and then here.
Handling of SIG label removal here.

Let me know what you think @jberkus

jberkus · 2018-03-01T17:43:04Z

Damn. Ok, that's pretty problematic. Have you filed a bug with Github?

jberkus · 2018-03-01T17:48:08Z

So, as I understand it, issues which were taken out of the milestone won't be removed from the count until another event happens to that issue? And the same with PRs, correct?

In the future milestone automation will de-facto remove this issue, but it would be nice if github fixed it.

jberkus · 2018-03-01T17:57:28Z

Question: do other, manual changes to labels generate events? Or only comments/open/close?

lukaszgryglicki · 2018-03-01T18:30:39Z

I've used trick, that uses next event. If there is no event, I fallback to current event.
The problem is that when changing/removing milestones we have:

old milestone
new milestone (possibly null when removed).

The problem is that we only have "old milestone" while we should have two fields (both nullable):
milestone, new milestone.
Or, if that's not possible, then only the new milestone, not the old one.
Initially I thought that this is a bug in gha2db/devstats - but no, I've examined the exact JSON's and we really have no info about new milestone, and we can only take it from the next event.

Anyway, my tricks make it work quite well at the moment, IMHO.

jberkus · 2018-03-01T18:38:33Z

Right, what I'm saying is that the removal of the milestone doesn't, by itself, generate an event, correct?

So my question is: does the addition or removal of labels generate an event on its own?

BTW, checking issue burndown records, this means that issue counts are about 10-15% higher than history, and trail a day or two behind, which we'll want to note in the eventual documentation.

lukaszgryglicki · 2018-03-01T18:49:41Z

There is no separate event for:

Milestone add/remove/change. We only have the IssueComment event, which contains the old milestone; the next event related to this issue (of any type) will contain the new milestone.
Adding, removing or changing labels doesn't generate an event either.

This makes me wonder what if:

I only add a label. There will be no GH event, so I will actually "see" that new label on the next event.
I'm commenting on some issue and changing labels. There will be IssueComment event - but with old label set? new label set? (issue labels are kept in a separate table).

I'll check this tomorrow and report here.

lukaszgryglicki · 2018-03-01T19:09:47Z

I'll answer this myself (and will check tomorrow whether it is true).

There are no GH events for changing labels and milestones
All possible event types are:

gha=# select distinct type from gha_events;
             type              
-------------------------------
 PullRequestReviewCommentEvent
 MemberEvent
 PushEvent
 ReleaseEvent
 CreateEvent
 GollumEvent
 TeamAddEvent
 DeleteEvent
 PublicEvent
 ForkEvent
 PullRequestEvent
 IssuesEvent
 WatchEvent
 IssueCommentEvent
 CommitCommentEvent
(15 rows)

The problem with the "old" milestone arises because the bot comments first (which creates an IssueCommentEvent carrying the old milestone) and only then changes the milestone.
If the bot changed the milestone first and then commented, we would have the correct milestone information in this IssueCommentEvent.
The same goes for labels: the label set we get depends on the order. If we comment first and then modify labels, we get the old label set; if we change labels and then comment, we get the correct label set.
If we only change labels/milestone without commenting, we get the new, correct label set/milestone on the next GH event referring to this issue.
The only events that refer to an issue are IssueCommentEvent (commenting on the issue) and IssuesEvent (state change: open, close).
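
The ordering effect can be shown with a toy model. This is only an illustration under the assumptions stated in this comment (a comment emits an event with a snapshot of the current milestone, a milestone change emits nothing); the `Issue` class is hypothetical.

```python
from typing import List, Optional, Tuple


class Issue:
    """Toy model: only comments emit events, and each event carries a
    snapshot of the milestone at that moment."""

    def __init__(self, milestone: Optional[str]) -> None:
        self.milestone = milestone
        self.events: List[Tuple[str, Optional[str]]] = []

    def comment(self) -> None:
        # Stands in for IssueCommentEvent: records the milestone as it is right now.
        self.events.append(("IssueCommentEvent", self.milestone))

    def set_milestone(self, milestone: Optional[str]) -> None:
        # No event is emitted for a milestone change.
        self.milestone = milestone


# Comment first, then change: the recorded snapshot is stale.
a = Issue("v1.9")
a.comment()
a.set_milestone(None)

# Change first, then comment: the recorded snapshot is current.
b = Issue("v1.9")
b.set_milestone(None)
b.comment()

print(a.events[-1], a.milestone)
print(b.events[-1], b.milestone)
```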

Seems like we have a problem here.
Any ideas?

I'll confirm this 100% tomorrow.
@jberkus @dankohn ?

lukaszgryglicki · 2018-03-05T10:38:32Z

@jberkus what do you think about this:

I think the nice "workaround" would also be:

k8s-*-bot create additional comment after changing/updating milestone/label, something like "Note: milestone updated to abc", or "Note: label xyz removed"
prow creating similar comment after changing milestone/label.
any other automatic tool (if there is any) do the same.
That way we're in sync immediately.

lukaszgryglicki · 2018-03-14T11:42:28Z

Changed from bug to enhancement.
This is not a bug; we just don't have that data in the GitHub archives, as already explained.

lukaszgryglicki · 2018-03-21T20:32:15Z

@jberkus any updates on this on the K8s side?

@dankohn @jberkus what do you think about spending a few days researching a new data source: the GitHub API (in addition to the already existing GHA & git)?

I think I can write yet another data source that will periodically query GitHub using API - just to get Issues/PRs current label state (that would eliminate need for another GHA event happening after somebody added the label from the GitHub UI).
This would certainly work for current issues/PRs. I don't know if it is possible to query past state using the GitHub API; I think it is not, but I can double-check.
This separate data source would have to run in a separate process, because it can block when we're out of GitHub API points.
We're already querying GitHub API to get new releases tags (annotations/releases) but this is using very few GitHub API points. This runs every hour and uses just a few points out of 5000 available.
So new "labels" state API calls should always happen after the annotations part, because it can potentially run out of API points.
If we go this way, we may want to think again about GitHub OAuth token that is used by DevStats. Currently it uses my private GitHub OAuth token.

@dankohn can I investigate this task?

dankohn · 2018-03-21T20:39:32Z

Sure, but it seems easier to modify the mungebot to change labels in a way that records events in a way we can deal with.

-- Dan Kohn <dan@linuxfoundation.org> Executive Director, Cloud Native Computing Foundation https://www.cncf.io +1-415-233-1000 https://www.dankohn.com

lukaszgryglicki · 2018-03-21T20:43:51Z

Yes, I've already suggested that.
But I think this won't be that easy to change k8s process to help Devstats.
Devstats is the tool to help K8s not the opposite :p
OK, I'll do the research then and see what I can do without touching the current k8s workflow.

dankohn · 2018-03-21T20:47:23Z

mungebot is open source. They will accept pull requests if it doesn't slow anyone there down. Please research that as well.

lukaszgryglicki · 2018-03-21T20:53:08Z

OK will check this too.
Actually it needs some discussion, because I think such a change in mungebot would be quite easy to implement, but it needs acceptance from k8s people first.
Anyway, I'll postpone this a bit, because I've just received an email that I should add another project to DevStats.
So any feedback welcomed here, especially from @jberkus who originally detected this issue.

jberkus · 2018-03-24T02:53:32Z

@lukaszgryglicki there's two issues with using the API:

Kubernetes is constantly running out of API "tokens", so anything that requires a lot of additional API calls is just out.
I checked API data, and in the API it's also true that issues/PRs that have only had labels or milestones changed do not show up as "updated" in the API either. So we'd be in a position of polling all the issues/PRs in some way, which is a LOT of API calls.

Frankly, I think the best next step is to talk to someone at Github.

dankohn · 2018-03-24T03:05:00Z

We have a good relationship with GitHub and can ask for more API tokens. But could we please investigate first whether a small change to Mungegithub would provide all the data DevStats needs to avoid using the API? The API will always be more brittle than GitHub Archives data. Lukasz, can you state again the state that an issue can get into which is unknowable? I'd like to understand if we could just write a munge plugin that looks for that state and corrects it.

jberkus · 2018-03-24T03:26:41Z

@dankohn it's not the technical difficulty, which is negligible.

It's that any method which involves increasing github notification traffic just to support devstats is a total nonstarter.

dankohn · 2018-03-24T10:58:31Z

I agree, but I'm trying to understand if the problem occurs in regular workflow or is a corner case.

lukaszgryglicki · 2018-03-25T13:51:51Z

I almost have the working solution.
I'm using the API to get the state of all open issues (only since the last hour, to ask for the smallest possible set of issues).
It works.
I mean if I add the label to the issue without commenting and after this use GitHub API to get labels list for this issue - I can see the label just added.

And this is quite a fast and straightforward process. I've added the 'ghapi2db' tool to support it; I only need to add comments to it.
When I detect that an issue has a different milestone or labels (GHA versus API), I create an artificial event with the new state.
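
The compare-and-synthesize step could look roughly like this. It is a sketch only, not the ghapi2db code: `IssueState`, `artificial_event`, the `ArtificialEvent` type name, the issue number and the `sig/node` label are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class IssueState:
    milestone: Optional[str]
    labels: FrozenSet[str]


def artificial_event(number: int, gha: IssueState, api: IssueState) -> Optional[dict]:
    """Return a synthetic event carrying the API state when it differs from the
    last state known from GHA; return None when both agree."""
    if gha == api:
        return None
    return {
        "type": "ArtificialEvent",  # hypothetical type name
        "issue_number": number,
        "milestone": api.milestone,
        "labels": sorted(api.labels),
    }


# Last state seen in GHA versus the state the API reports now (made-up values).
gha = IssueState("v1.9", frozenset({"sig/node"}))
api = IssueState(None, frozenset({"sig/node", "milestone/removed"}))

print(artificial_event(1, gha, api))
print(artificial_event(1, gha, gha))
```

Emitting an event only on a difference keeps the stored history small: issues whose API state matches the archive produce nothing.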

I can give you a working solution tomorrow, without touching mungegithub at all. It will need about 200 API points/hour, which is a lot less than 5000.

mungegithub often adds or removes labels as a last operation, just after creating comment, so this situation happens often IMHO.

lukaszgryglicki · 2018-03-25T14:14:10Z

Actually I've just connected ghapi2db to our standard workflow (on the test server).

lukaszgryglicki · 2018-03-26T04:48:13Z

Seems like all is working OK, so data quality will increase all the time, starting from yesterday.
There is no way to ask the GitHub API about issue state from the past, so the correct values will start from yesterday.
Not closing yet, but this should fix the lag issues.

jberkus · 2018-03-26T23:48:05Z

Wow, great work, @lukaszgryglicki

lukaszgryglicki · 2018-03-27T05:23:06Z

And we don't need to touch mungebot.
BTW @dankohn I missed your question about whether this occurs in a regular case or a corner case.
This is rather a regular case, because most label work is done by the bot, and the bot usually reacts to devs' comments to add/remove/modify labels/milestone.
So this is a regular case.

jberkus · 2018-03-27T17:01:33Z

Now, I do think it's worth talking about having prow write a log of its actions that devstats can access. That would give us the data WITHOUT adding to the github notification burden.

lukaszgryglicki · 2018-03-27T17:04:57Z

No problem for me anymore. I already have a tool that gets the info it needs.
But if prow creates such a log, I can write another tool to get this data and make the ghapi2db tool unnecessary.
But as I said, I already have the data, and I'm far from the API limits when getting it, so it's no longer a problem for me.

lukaszgryglicki · 2018-03-29T13:13:00Z

Final version is on the test server, here: https://k8s.cncftest.io/d/22/open-issues-prs-by-milestone?orgId=1
It fixed three more things:

It excludes 10 sandbox repos, as requested here: https://github.com/cncf/devstats/issues/87 (this new data wasn't moved to prod yet, also as requested here: https://github.com/cncf/devstats/issues/87#issuecomment-376209110)
I found that I was not detecting issues/PRs that were closed but reopened later.
It uses advanced Postgres features (window functions, partitioning and WITH statements) to make efficient joins on the same table (to find the most up-to-date issue state at a given point in time). It used a subselect before, so it was orders of magnitude slower; see the SQL here: https://github.com/cncf/devstats/blob/master/metrics/kubernetes/open_prs_sigs_milestones.sql
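
The idea behind the last two points (take each issue's most recent snapshot at a given time, and read open/closed from that snapshot so a later reopen is respected) can be sketched in plain Python. This is not a translation of the linked SQL and uses no window functions; `open_issues_at` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# (issue_number, created_at, state, milestone).
# ISO-8601 UTC strings in one format compare chronologically as plain strings.
Snapshot = Tuple[int, str, str, Optional[str]]


def open_issues_at(snapshots: Iterable[Snapshot], date_to: str, milestone: str) -> Set[int]:
    """Return the issues whose latest snapshot at or before date_to is open
    and carries the given milestone."""
    latest: Dict[int, Snapshot] = {}
    for snap in snapshots:
        number, created_at = snap[0], snap[1]
        if created_at > date_to:
            continue
        if number not in latest or created_at > latest[number][1]:
            latest[number] = snap
    return {
        number
        for number, (_, _, state, ms) in latest.items()
        if state == "open" and ms == milestone
    }


# Made-up data: issue 1 is closed and later reopened; issue 2 stays closed.
snapshots = [
    (1, "2017-11-01T10:00:00Z", "open", "v1.9"),
    (1, "2017-11-10T10:00:00Z", "closed", "v1.9"),
    (1, "2017-11-20T10:00:00Z", "open", "v1.9"),
    (2, "2017-11-05T10:00:00Z", "open", "v1.9"),
    (2, "2017-11-15T10:00:00Z", "closed", "v1.9"),
]
print(open_issues_at(snapshots, "2017-11-27T00:00:00Z", "v1.9"))
print(open_issues_at(snapshots, "2017-11-12T00:00:00Z", "v1.9"))
```

A check based only on a single `closed_at` value would miss the reopen of issue 1; reading the state from the latest snapshot handles it.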

The same problem with detecting closed & reopened issues/PRs (and performance issues too) also happens for:

PRs labels (which also waits for the final labels set to display, see: https://github.com/cncf/devstats/issues/86#issuecomment-376348127)
PR workload
PR workload table

Now I'll work on the remaining dashboards - all on the test now.
I'll update prod when I have green light for it.
I've also updated InfluxDB to v1.5.1 in the meantime and had a horror day yesterday fixing the issues this caused (18 hours of trial and error).

@jberkus @dankohn

lukaszgryglicki · 2018-03-30T14:18:54Z

All problems described above are now fixed and gone.
The only thing remaining is excluding the sandbox repos.
I'm currently doing this on the test but not on the prod.

jberkus · 2018-04-02T22:04:38Z

This looks good to me, I've thrown it in #devstats to see if I can get more eyeballs on it. OK if you want to wait for a day just so more people can look for obvious glitches.

lukaszgryglicki · 2018-04-03T10:50:49Z

Currently we don't have any data newer than 2018-04-02 14:00 UTC, due to GitHub archives outage: https://github.com/cncf/devstats/issues/91

lukaszgryglicki · 2018-04-03T16:18:13Z

Outage fixed on the GHA side, DevStats has all the data again.

lukaszgryglicki · 2018-04-05T11:01:46Z

No longer blocked, now just need to confirm that it works ok.

lukaszgryglicki · 2018-04-14T05:44:47Z

I'm closing this, please reopen if you find lag/bug.

jberkus added priority/high bug labels Mar 1, 2018

lukaszgryglicki added the wip label Mar 1, 2018

lukaszgryglicki self-assigned this Mar 1, 2018

jberkus changed the title ~~Count issues with release by milestone~~ Pervasive lag issue with label/milestone changes in issues and PRs Mar 3, 2018

lukaszgryglicki added enhancement and removed bug labels Mar 14, 2018

lukaszgryglicki removed the blocked label Apr 5, 2018

lukaszgryglicki closed this as completed Apr 14, 2018
