Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Проблемы детерминированных запросов Clickbench #6674

Open
iddqdex opened this issue Jul 15, 2024 · 0 comments
Assignees

Comments

@iddqdex
Copy link
Collaborator

iddqdex commented Jul 15, 2024

По нескольким запросам ответы отличаются от канонизированных.
1.

/*28*/ SELECT key, AVG(length(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM $data WHERE Referer <> '' GROUP BY Url::CutWWW(Url::GetHost(Referer)) as key HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;

По моим представлениям значение поля l изменилось с переходом с String на Utf8. Канонизировать как есть? Сделать два варианта канонических результатов?

/*36*/ SELECT URL, COUNT(*) AS PageViews FROM $data WHERE CounterID = 62 AND EventDate >= Date('2013-07-01') AND EventDate <= Date('2013-07-31') AND DontCountHits = 0 AND IsRefresh = 0 AND URL <> '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10;
/*37*/ SELECT Title, COUNT(*) AS PageViews FROM $data WHERE CounterID = 62 AND EventDate >= Date('2013-07-01') AND EventDate <= Date('2013-07-31') AND DontCountHits = 0 AND IsRefresh = 0 AND Title <> '' GROUP BY Title ORDER BY PageViews DESC LIMIT 10;
/*38*/ SELECT URL, COUNT(*) AS PageViews FROM $data WHERE CounterID = 62 AND EventDate >= Date('2013-07-01') AND EventDate <= Date('2013-07-31') AND IsRefresh = 0 AND IsLink <> 0 AND IsDownload = 0 GROUP BY URL ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
/*39*/ SELECT TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst, COUNT(*) AS PageViews FROM $data WHERE CounterID = 62 AND EventDate >= Date('2013-07-01') AND EventDate <= Date('2013-07-31') AND IsRefresh = 0 GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src, URL AS Dst ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
/*40*/ SELECT URLHash, EventDate, COUNT(*) AS PageViews FROM $data WHERE CounterID = 62 AND EventDate >= Date('2013-07-01') AND EventDate <= Date('2013-07-31') AND IsRefresh = 0 AND TraficSourceID IN (-1, 6) AND RefererHash = 3594120000172545465 GROUP BY URLHash, EventDate ORDER BY PageViews DESC LIMIT 10 OFFSET 100;

У этих запросов много одинаковых значений по полю PageViews из-за чего недетерменированно, какие именно попадут в окно и в каком порядке. Предлагаю добавить полей в сортировки и переканонизировать:

/*36*/ SELECT URL, COUNT(*) AS PageViews FROM $data WHERE CounterID = 62 AND EventDate >= Date('2013-07-01') AND EventDate <= Date('2013-07-31') AND DontCountHits = 0 AND IsRefresh = 0 AND URL <> '' GROUP BY URL ORDER BY PageViews, URL DESC LIMIT 10;
/*37*/ SELECT Title, COUNT(*) AS PageViews FROM $data WHERE CounterID = 62 AND EventDate >= Date('2013-07-01') AND EventDate <= Date('2013-07-31') AND DontCountHits = 0 AND IsRefresh = 0 AND Title <> '' GROUP BY Title ORDER BY PageViews, Title DESC LIMIT 10;
/*38*/ SELECT URL, COUNT(*) AS PageViews FROM $data WHERE CounterID = 62 AND EventDate >= Date('2013-07-01') AND EventDate <= Date('2013-07-31') AND IsRefresh = 0 AND IsLink <> 0 AND IsDownload = 0 GROUP BY URL ORDER BY PageViews, URL DESC LIMIT 10 OFFSET 1000;
/*39*/ SELECT TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst, COUNT(*) AS PageViews FROM $data WHERE CounterID = 62 AND EventDate >= Date('2013-07-01') AND EventDate <= Date('2013-07-31') AND IsRefresh = 0 GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src, URL AS Dst ORDER BY PageViews, TraficSourceID DESC LIMIT 10 OFFSET 1000;
/*40*/ SELECT URLHash, EventDate, COUNT(*) AS PageViews FROM $data WHERE CounterID = 62 AND EventDate >= Date('2013-07-01') AND EventDate <= Date('2013-07-31') AND IsRefresh = 0 AND TraficSourceID IN (-1, 6) AND RefererHash = 3594120000172545465 GROUP BY URLHash, EventDate ORDER BY PageViews, URLHash DESC LIMIT 10 OFFSET 100;
/*42*/ SELECT Minute, COUNT(*) AS PageViews FROM $data WHERE CounterID = 62 AND EventDate >= Date('2013-07-14') AND EventDate <= Date('2013-07-15') AND IsRefresh = 0 AND DontCountHits = 0 GROUP BY DateTime::ToSeconds(EventTime)/60 As Minute ORDER BY Minute LIMIT 10 OFFSET 1000;

В канонических результатах Minute представлено в виде даты, хотя по документации должно быть числом (и приходит число и соответствует). Предлагаю поменять канонический результат.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants