Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group reports only by bug hash when uniqueing #1121

Merged
merged 1 commit into from
Nov 14, 2017

Conversation

csordasmarton
Copy link
Contributor

No description provided.

@csordasmarton csordasmarton added enhancement 🌟 database 🗄️ Issues related to the database schema. labels Nov 14, 2017
@csordasmarton csordasmarton added this to the release 6.2 milestone Nov 14, 2017
@csordasmarton
Copy link
Contributor Author

I did some measurement on these queries:

-- getRunReportCounts
-- Unique, TIME: 1487ms
SELECT runs.id AS runs_id, runs.name AS runs_name, count(DISTINCT reports.bug_id) AS count_1 
FROM reports
LEFT OUTER JOIN files ON reports.file_id = files.id
LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
LEFT OUTER JOIN runs ON reports.run_id = runs.id
GROUP BY runs.id
ORDER BY runs.name

-- Non-Unique, TIME: 318ms
SELECT runs.id AS runs_id, runs.name AS runs_name, count(*) AS count_1 
FROM reports
LEFT OUTER JOIN files ON reports.file_id = files.id
LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
LEFT OUTER JOIN runs ON reports.run_id = runs.id
GROUP BY runs.id
ORDER BY runs.name

--getRunResultCount
-- Unique, TIME: 256ms
SELECT count(*)
FROM (
  SELECT reports.bug_id AS reports_bug_id 
  FROM reports
  LEFT OUTER JOIN files ON reports.file_id = files.id
  LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
  GROUP BY reports.bug_id
) as anon

-- Non-unique, TIME: 134ms
SELECT count(*)
FROM (
  SELECT reports.id AS id 
  FROM reports
  LEFT OUTER JOIN files ON reports.file_id = files.id
  LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
) as anon

-- getRunResults
-- Unique: TIME: 477ms
SELECT reports.id AS reports_id, reports.bug_id AS reports_bug_id, reports.checker_message AS reports_checker_message,
  reports.checker_id AS reports_checker_id, reports.severity AS reports_severity, review_statuses.bug_hash AS review_statuses_bug_hash,
  review_statuses.status AS review_statuses_status, review_statuses.author AS review_statuses_author, review_statuses.message AS review_statuses_message,
  review_statuses.date AS review_statuses_date, files.filename AS files_filename, files.filepath AS files_filepath 
FROM reports
LEFT OUTER JOIN files ON reports.file_id = files.id
LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
LEFT OUTER JOIN (
  SELECT anon_2.id AS id 
  FROM (
    SELECT max(reports.id) AS id, max(reports.severity) AS severity 
    FROM reports
    LEFT OUTER JOIN files ON reports.file_id = files.id
    LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id 
    WHERE review_statuses.status IN ('unreviewed') OR review_statuses.status IS NULL
    GROUP BY reports.bug_id
  ) AS anon_2
  ORDER BY anon_2.severity DESC
  LIMIT 500 OFFSET 0
) AS anon_1 ON anon_1.id = reports.id
WHERE anon_1.id IS NOT NULL


-- getCheckerCounts
-- Unique, TIME: 1523ms
SELECT anon_1.checker_id AS anon_1_checker_id, max(anon_1.severity) AS max_1, count(anon_1.bug_id) AS count_1 
FROM (
  SELECT max(reports.checker_id) AS checker_id, max(reports.severity) AS severity, reports.bug_id AS bug_id 
  FROM reports
  LEFT OUTER JOIN files ON reports.file_id = files.id
  LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
  GROUP BY reports.bug_id
) AS anon_1
GROUP BY anon_1.checker_id
ORDER BY anon_1.checker_id

-- Non-unique, TIME: 263ms
SELECT reports.checker_id AS reports_checker_id, reports.severity AS reports_severity, count(reports.id) AS count_1 
FROM reports
LEFT OUTER JOIN files ON reports.file_id = files.id
LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
GROUP BY reports.checker_id, reports.severity
ORDER BY reports.checker_id

-- getSeverityCounts
-- Unique, TIME: 287ms
SELECT anon_1.severity AS anon_1_severity, count(anon_1.bug_id) AS count_1 
FROM (
  SELECT max(reports.severity) AS severity, reports.bug_id AS bug_id 
  FROM reports
  LEFT OUTER JOIN files ON reports.file_id = files.id
  LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
  GROUP BY reports.bug_id
) AS anon_1
GROUP BY anon_1.severity

-- Non-unique, TIME: 215ms
SELECT reports.severity AS reports_severity, count(reports.id) AS count_1 
FROM reports
LEFT OUTER JOIN files ON reports.file_id = files.id
LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
GROUP BY reports.severity

-- getCheckerMsgCounts
-- Unique, TIME: 3954ms, ROWS: 20115
SELECT anon_1.checker_message AS anon_1_checker_message, count(anon_1.bug_id) AS count_1 
FROM (
  SELECT max(reports.checker_message) AS checker_message, reports.bug_id AS bug_id 
  FROM reports
  LEFT OUTER JOIN files ON reports.file_id = files.id
  LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
  GROUP BY reports.bug_id
) AS anon_1
GROUP BY anon_1.checker_message
ORDER BY anon_1.checker_message

-- Non-unique, TIME: 2451ms
SELECT reports.checker_message AS reports_checker_message, count(reports.id) AS count_1 
FROM reports
LEFT OUTER JOIN files ON reports.file_id = files.id
LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
GROUP BY reports.checker_message
ORDER BY reports.checker_message

-- getReviewStatusCounts
-- Unique, TIME: 338ms
SELECT max(anon_1.bug_id) AS max_1, anon_1.status AS anon_1_status, count(anon_1.bug_id) AS count_1 
FROM (
  SELECT reports.bug_id AS bug_id, max(review_statuses.status) AS status 
  FROM reports
  LEFT OUTER JOIN files ON reports.file_id = files.id
  LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
  GROUP BY reports.bug_id
) AS anon_1
GROUP BY anon_1.status

-- Non-unique, TIME: 313ms
SELECT max(reports.bug_id) AS max_1, review_statuses.status AS review_statuses_status, count(reports.id) AS count_1 
FROM reports
LEFT OUTER JOIN files ON reports.file_id = files.id
LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
GROUP BY review_statuses.status

-- getFileCounts
-- Unique, TIME: 772ms
SELECT files.filename AS files_filename, anon_1.report_count AS anon_1_report_count 
FROM files JOIN (
  SELECT anon_2.file_id AS file_id, count(1) AS report_count 
  FROM (
    SELECT reports.bug_id, max(reports.file_id) AS file_id 
    FROM reports
    LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
    GROUP BY reports.bug_id
  ) AS anon_2
  GROUP BY anon_2.file_id
) AS anon_1 ON anon_1.file_id = files.id
ORDER BY files.filename

-- Non-unique, TIME: 1441ms, ROWS: 10033
SELECT files.filepath AS files_filepath, anon_1.report_count AS anon_1_report_count 
FROM files JOIN (
  SELECT anon_2.file_id AS file_id, count(1) AS report_count 
  FROM (
    SELECT reports.bug_id AS bug_id, reports.file_id AS file_id 
    FROM reports
    LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
  ) AS anon_2
  GROUP BY anon_2.file_id
) AS anon_1 ON anon_1.file_id = files.id
ORDER BY files.filepath

-- getRunHistoryTagCounts
-- Unique, TIME: 1451ms
SELECT anon_1.run_id AS anon_1_run_id, max(runs.name) AS run_name, max(run_histories.time) AS max_1, max(run_histories.version_tag) AS max_2, sum(anon_2.report_count) AS sum_1 
FROM (
  SELECT run_histories.run_id AS run_id, max(run_histories.id) AS run_history_id 
  FROM run_histories 
  WHERE run_histories.version_tag IS NOT NULL
  GROUP BY run_histories.run_id
) AS anon_1
LEFT OUTER JOIN run_histories ON run_histories.id = anon_1.run_history_id
LEFT OUTER JOIN runs ON runs.id = anon_1.run_id
LEFT OUTER JOIN (
  SELECT reports.run_id AS run_id, count(DISTINCT reports.bug_id) AS report_count 
  FROM reports
  LEFT OUTER JOIN files ON reports.file_id = files.id
  LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
  GROUP BY reports.run_id
) AS anon_2 ON anon_2.run_id = run_histories.run_id 
WHERE run_histories.version_tag IS NOT NULL
GROUP BY anon_1.run_id
ORDER BY run_name

-- Non-unique, TIME: 278
SELECT anon_1.run_id AS anon_1_run_id, max(runs.name) AS run_name, max(run_histories.time) AS max_1, max(run_histories.version_tag) AS max_2, sum(anon_2.report_count) AS sum_1 
FROM (
  SELECT run_histories.run_id AS run_id, max(run_histories.id) AS run_history_id 
  FROM run_histories 
  WHERE run_histories.version_tag IS NOT NULL
  GROUP BY run_histories.run_id
) AS anon_1
LEFT OUTER JOIN run_histories ON run_histories.id = anon_1.run_history_id
LEFT OUTER JOIN runs ON runs.id = anon_1.run_id
LEFT OUTER JOIN (
  SELECT reports.run_id AS run_id, count(*) AS report_count 
  FROM reports
  LEFT OUTER JOIN files ON reports.file_id = files.id
  LEFT OUTER JOIN review_statuses ON review_statuses.bug_hash = reports.bug_id
  GROUP BY reports.run_id
) AS anon_2 ON anon_2.run_id = run_histories.run_id 
WHERE run_histories.version_tag IS NOT NULL
GROUP BY anon_1.run_id
ORDER BY run_name

@@ -303,6 +303,6 @@ def test_uniqueing_compared_to_test_config(self):
unique_bugs = set()
# Uniqueing is done based on file name, line number, and hash.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update comment here.


review_data = create_review_data(review_status)
if cmp_data:
q = q.filter(Report.bug_id.in_(diff_hashes))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use the file name when calculating the diff between two runs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't use the file name in diff. We use only bug hash.

@csordasmarton csordasmarton force-pushed the group_by_bughash branch 2 times, most recently from a829b86 to e4455e3 Compare November 14, 2017 14:24

return filter_report_filter(q, filter_expression, run_ids, cmp_data,
diff_hashes)
def group_by_unique_fields(q):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is just me but I find the name of this confusing. Some documentation might help. So bug_id is the only unique field in report table??

- Increase performance of the filter queries.
- Uniqueing the filters and reports only by bughash.
@bruntib bruntib merged commit 28d4acb into Ericsson:master Nov 14, 2017
@csordasmarton csordasmarton deleted the group_by_bughash branch November 16, 2017 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database 🗄️ Issues related to the database schema. enhancement 🌟
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants