Superset 4.1 with Pinot not aggregating timestamps correctly #30219
Comments
Hey @anthony-df-nguyen! I'm here to help you with bugs, questions, and becoming a contributor. Let's squash those bugs together! 😊

I found a similar closed issue that might be relevant to your problem: refactor(pinot) regression on area chart (DB engine error). This issue was resolved by updating the …

The root cause of the problem you're experiencing is that the `epoch_ms_to_dttm` method is defined as follows, keeping the output at `'1:MILLISECONDS'` granularity:

```python
@classmethod
def epoch_ms_to_dttm(cls) -> str:
    # Converts an epoch-ms column to epoch ms with 1 ms granularity,
    # so no time bucketing happens at this step.
    return (
        "DATETIMECONVERT({col}, '1:MILLISECONDS:EPOCH', "
        + "'1:MILLISECONDS:EPOCH', '1:MILLISECONDS')"
    )
```

Workarounds:

These changes should help ensure that the timestamps are converted from milliseconds to a more appropriate format for aggregation, thus resolving the issue [2].
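To see how that template could end up inside the Line Chart's time-grain wrapper, here is a minimal Python sketch. `TIME_GRAIN_TEMPLATES` and `build_time_expression` are hypothetical stand-ins, not Superset's actual API; the DAY and QUARTER wrappers are modeled on the shape of the queries quoted in this issue. The point it illustrates is that, with a `'1:MILLISECONDS'` output granularity, all bucketing is left to the outer `DATE_TRUNC`.

```python
# Hypothetical sketch: composing an epoch-ms conversion template with a
# time-grain template. Names below are illustrative, not Superset's API.

# Template returned by epoch_ms_to_dttm() in the comment above.
EPOCH_MS_TO_DTTM = (
    "DATETIMECONVERT({col}, '1:MILLISECONDS:EPOCH', "
    "'1:MILLISECONDS:EPOCH', '1:MILLISECONDS')"
)

# Assumed time-grain wrappers, modeled on the queries shown in this issue.
TIME_GRAIN_TEMPLATES = {
    "DAY": "CAST(DATE_TRUNC('DAY', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
    "QUARTER": "CAST(DATE_TRUNC('QUARTER', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
}


def build_time_expression(column: str, grain: str) -> str:
    """Convert the epoch-ms column first, then wrap it in the time grain."""
    converted = EPOCH_MS_TO_DTTM.format(col=column)
    return TIME_GRAIN_TEMPLATES[grain].format(col=converted)


if __name__ == "__main__":
    print(build_time_expression("incident_date_epoch", "DAY"))
```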
Hello @anthony-df-nguyen, this should be fixed as of #31341. I believe it'll be included in the next patch release of 4.0 and 4.1.
Ping @michael-s-molina @sadpandajoe, as you're the release managers for 4.0 and 4.1, respectively.
Bug description
Issue
When using Pinot (1.2.0) with Superset 4.1, epoch_ms formatted columns do not appear to aggregate correctly without a lot of workarounds. This worked for us back when we were on Superset 2.0.
On Superset 2.0, when we used the old Time Series Line Chart with an epoch_ms column as the time column, Superset generated the following query, which worked:

```sql
SELECT DATETIMECONVERT(incident_date_epoch, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:DAYS'),
       count(DISTINCT report_id) AS count_1
FROM "default".incident_reports
GROUP BY DATETIMECONVERT(incident_date_epoch, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:DAYS')
ORDER BY count(DISTINCT report_id) DESC
LIMIT 10000;
```
Example of previously working Time Series Line Chart
![image](https://private-user-images.githubusercontent.com/12958581/366135240-2e02e75a-79ac-4ae1-bdbd-c3c14e30bbee.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMjYwMjUsIm5iZiI6MTczOTMyNTcyNSwicGF0aCI6Ii8xMjk1ODU4MS8zNjYxMzUyNDAtMmUwMmU3NWEtNzlhYy00YWUxLWJkYmQtYzNjMTRlMzBiYmVlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDAyMDIwNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTIyOTE5MjNhMDllZDYxMjY2MDcwM2IxMmY1MzAwZjM1NWUzODJiMDRhYTE2ZWU5ZmY3MDc1YThlY2MxMzlhMWMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.0gZETAEeg9DtvoWT3pDEaTO7Ov8HFGt-hFErSWp8bNg)
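In the Superset 2.0 query, the day bucketing happens inside `DATETIMECONVERT` itself through the `'1:DAYS'` output granularity, whereas the newer queries in this issue pass `'1:MILLISECONDS'`, which leaves values unbucketed. A minimal Python sketch of that bucketing, assuming epoch-ms input and output and UTC day boundaries; `datetimeconvert_epoch_ms` is a hypothetical helper that approximates the behavior and is not Pinot's implementation.

```python
# Rough emulation of DATETIMECONVERT(col, '1:MILLISECONDS:EPOCH',
# '1:MILLISECONDS:EPOCH', '<size>:<UNIT>'), assuming UTC bucketing.
MS_PER_UNIT = {
    "MILLISECONDS": 1,
    "SECONDS": 1_000,
    "MINUTES": 60_000,
    "HOURS": 3_600_000,
    "DAYS": 86_400_000,
}


def datetimeconvert_epoch_ms(value_ms: int, granularity: str) -> int:
    """Floor an epoch-ms value to the start of its granularity bucket."""
    size, unit = granularity.split(":")
    bucket_ms = int(size) * MS_PER_UNIT[unit]
    # Python's % is non-negative for a positive divisor, so this floors.
    return value_ms - value_ms % bucket_ms


if __name__ == "__main__":
    sample = 1_700_000_123_456  # 2023-11-14T22:15:23.456Z
    print(datetimeconvert_epoch_ms(sample, "1:DAYS"))          # 1699920000000
    print(datetimeconvert_epoch_ms(sample, "1:MILLISECONDS"))  # unchanged
```

With `'1:MILLISECONDS'` the function returns its input unchanged, which is why grouping in the newer queries depends entirely on the surrounding `DATE_TRUNC`.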
Now, on Superset 4.0, we use the newer Line Chart, but the same time column generates the following query, which throws an error:

```sql
SELECT CAST(DATE_TRUNC('QUARTER', CAST(DATETIMECONVERT(incident_date_epoch, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MILLISECONDS') AS TIMESTAMP)) AS TIMESTAMP) AS incident_date_epoch,
       COUNT(DISTINCT report_id) AS "COUNT_DISTINCT(report_id)"
FROM "default".incident_reports
GROUP BY CAST(DATE_TRUNC('QUARTER', CAST(DATETIMECONVERT(incident_date_epoch, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MILLISECONDS') AS TIMESTAMP)) AS TIMESTAMP)
ORDER BY COUNT(DISTINCT report_id) DESC
LIMIT 10000;
```
Error
![image](https://private-user-images.githubusercontent.com/12958581/366135187-adb4d082-0e9d-4a92-8f96-d9a3c5a5d966.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMjYwMjUsIm5iZiI6MTczOTMyNTcyNSwicGF0aCI6Ii8xMjk1ODU4MS8zNjYxMzUxODctYWRiNGQwODItMGU5ZC00YTkyLThmOTYtZDlhM2M1YTVkOTY2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDAyMDIwNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTNiOGQ0YzMzNGEyYmM3YWY2Njk1MWZkZjQ0ZmQzZDcxMTdkMzFiMThhMmU2YWE4N2Q3NzdlMjNjMzFlMjE4NGMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.oY-Ik3GmP18uuO6xHQgknjzxeNM9cEd3X8G0Pxu8rcI)
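Setting the parser error aside, the time expression in this query is meant to map each epoch-ms value to the start of its calendar quarter. As a reference for checking results once a query does run, here is a minimal Python sketch of that computation. It assumes timestamps are interpreted in UTC (the timezone Pinot uses is not stated in this issue), and `quarter_start_utc` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def quarter_start_utc(epoch_ms: int) -> datetime:
    """Return the UTC start of the calendar quarter containing epoch_ms."""
    # timedelta arithmetic avoids float rounding from epoch_ms / 1000.
    dt = EPOCH + timedelta(milliseconds=epoch_ms)
    first_month = 3 * ((dt.month - 1) // 3) + 1  # 1, 4, 7 or 10
    return datetime(dt.year, first_month, 1, tzinfo=timezone.utc)


if __name__ == "__main__":
    print(quarter_start_utc(1_700_000_123_456))  # 2023-10-01 00:00:00+00:00
```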
Workarounds
I was able to work around this in two ways, neither of which is ideal.
Workaround 1
I can create a calculated column in Superset for each epoch_ms field I'm using, configured as follows:
![image](https://private-user-images.githubusercontent.com/12958581/366137128-d54f0bee-c227-45a4-8509-b06e1ca937f4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMjYwMjUsIm5iZiI6MTczOTMyNTcyNSwicGF0aCI6Ii8xMjk1ODU4MS8zNjYxMzcxMjgtZDU0ZjBiZWUtYzIyNy00NWE0LTg1MDktYjA2ZTFjYTkzN2Y0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDAyMDIwNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWVhNGI3MTFkYTM2OWY2ZDViNWY4MzZhZjc1NWU4NDc1NWM1ZTA3Njk4NmQ1MmY5MjVjOTJjNzk2MjU2N2I5ZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.jgvIeW9JeuJ6KtZ-UzMMxetaVsv3RHOnryg3xdiwZRk)
Data Type set to `DATETIME`, Date Time Format set to `%Y-%m-%d %H:%M:%S.%f`, and the SQL expression `ToDateTime(my_epoch_ms_column, 'yyyy-MM-dd')`. With this column, the Line Chart finally plots successfully:
![image](https://private-user-images.githubusercontent.com/12958581/366137233-1f220ce7-f94b-4b2f-bcec-158119833db3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMjYwMjUsIm5iZiI6MTczOTMyNTcyNSwicGF0aCI6Ii8xMjk1ODU4MS8zNjYxMzcyMzMtMWYyMjBjZTctZjk0Yi00YjJmLWJjZWMtMTU4MTE5ODMzZGIzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDAyMDIwNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTg1Y2IxYjBjNTMwNDA2Y2IxYzc0NWY5NmNlODQwOWRlMDQ2ZDI5YTBiNGY5YTY1MjE5NzQwZDVmYTdhN2MxZjQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.mAdV9zvSOAJff4G4EoWPjFIDDyyFhj1wLZenee5d-xg)
This produces the following SQL query:
```sql
SELECT CAST(DATE_TRUNC('DAY', CAST(TODATETIME(incident_date_epoch, 'yyyy-MM-dd') AS TIMESTAMP)) AS TIMESTAMP) AS parsed_incident_date_epoch,
       COUNT(DISTINCT report_id) AS "COUNT_DISTINCT(report_id)"
FROM "default".incident_reports
GROUP BY CAST(DATE_TRUNC('DAY', CAST(TODATETIME(incident_date_epoch, 'yyyy-MM-dd') AS TIMESTAMP)) AS TIMESTAMP)
ORDER BY COUNT(DISTINCT report_id) DESC
LIMIT 10000;
```
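In Workaround 1, `ToDateTime(incident_date_epoch, 'yyyy-MM-dd')` formats each epoch-ms value as a day-level date string, which the query then casts back to `TIMESTAMP` before `DATE_TRUNC`. A minimal Python sketch of that formatting step, assuming the output is in UTC (an assumption about Pinot's default, not confirmed here); `to_date_time_day` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_date_time_day(epoch_ms: int) -> str:
    """Approximate ToDateTime(col, 'yyyy-MM-dd'), assuming UTC output."""
    return (EPOCH + timedelta(milliseconds=epoch_ms)).strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(to_date_time_day(1_700_000_123_456))  # 2023-11-14
```

One consequence of this design: with a `'yyyy-MM-dd'` pattern the column is already truncated to the day, so any finer time grain (hour, minute) in the chart would collapse to midnight.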
Workaround 2
Simply renaming the time column in the chart itself appears to finally get the chart to plot, although the date format still looks ugly.
This produces the following SQL query:
```sql
SELECT CAST(DATE_TRUNC('DAY', CAST(DATETIMECONVERT(incident_date_epoch, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MILLISECONDS') AS TIMESTAMP)) AS TIMESTAMP) AS "Incident Date Epoch",
       COUNT(DISTINCT report_id) AS "COUNT_DISTINCT(report_id)"
FROM "default".incident_reports
GROUP BY CAST(DATE_TRUNC('DAY', CAST(DATETIMECONVERT(incident_date_epoch, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MILLISECONDS') AS TIMESTAMP)) AS TIMESTAMP)
ORDER BY COUNT(DISTINCT report_id) DESC
LIMIT 10000;
```
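Comparing the queries, the visible difference in Workaround 2 is the alias: the failing queries (including the `'DAY'` query in the log under Additional context) alias the time expression as `incident_date_epoch`, the same name as the raw column used inside it, while the renamed chart uses `"Incident Date Epoch"`. Workaround 1's alias, `parsed_incident_date_epoch`, also differs from the column. This suggests, though the issue does not confirm it, that Pinot's GROUP BY validation trips when the alias shadows the source column. The hypothetical helper below flags that pattern in generated SQL; it is a regex heuristic, not a SQL parser.

```python
import re


def alias_shadows_column(sql: str, column: str) -> bool:
    """Heuristic: True if `sql` contains `AS column` or `AS "column"`.

    Case-insensitive; the alias must be followed by whitespace, a comma,
    a closing parenthesis, or the end of the string.
    """
    pattern = rf'\bAS\s+"?{re.escape(column)}"?(?=[\s,)]|$)'
    return re.search(pattern, sql, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    failing = (
        "SELECT CAST(DATE_TRUNC('DAY', CAST(DATETIMECONVERT(incident_date_epoch, "
        "'1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MILLISECONDS') "
        "AS TIMESTAMP)) AS TIMESTAMP) AS incident_date_epoch, "
        'COUNT(DISTINCT report_id) AS "COUNT_DISTINCT(report_id)"'
    )
    renamed = failing.replace("AS incident_date_epoch,", 'AS "Incident Date Epoch",')
    print(alias_shadows_column(failing, "incident_date_epoch"))  # True
    print(alias_shadows_column(renamed, "incident_date_epoch"))  # False
```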
Other Threads
I've seen these other issues, which don't appear to have a clear resolution for the problem I'm describing.
How to reproduce the bug
epoch_ms
Screenshots/recordings
No response
Superset version
master / latest-dev
Python version
3.10
Node version
I don't know
Browser
Chrome
Additional context
These are the logs I see in Superset when trying the Line Chart described in this issue:
```
Traceback (most recent call last):
  File "/app/superset/connectors/sqla/models.py", line 1724, in query
    df = self.database.get_df(
  File "/app/superset/models/core.py", line 677, in get_df
    self.db_engine_spec.execute(cursor, sql_, self)
  File "/app/superset/db_engine_specs/base.py", line 1828, in execute
    raise cls.get_dbapi_mapped_exception(ex) from ex
  File "/app/superset/db_engine_specs/base.py", line 1824, in execute
    cursor.execute(query)
  File "/usr/local/lib/python3.10/site-packages/pinotdb/db.py", line 44, in g
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pinotdb/db.py", line 312, in execute
    raise exceptions.DatabaseError(msg)
pinotdb.exceptions.DatabaseError: {'errorCode': 150,
 'message': 'SQLParsingError:\n'
            'org.apache.pinot.sql.parsers.SqlCompilationException: '
            "'as(cast(datetrunc('DAY', "
            "cast(datetimeconvert(incident_date_epoch, '1:MILLISECONDS:EPOCH', "
            "'1:MILLISECONDS:EPOCH', '1:MILLISECONDS'), 'TIMESTAMP')), "
            "'TIMESTAMP'), incident_date_epoch)' should appear in GROUP BY "
            'clause.\n'
            '\tat '
            'org.apache.pinot.sql.parsers.CalciteSqlParser.validateGroupByClause(CalciteSqlParser.java:191)\n'
            '\tat '
            'org.apache.pinot.sql.parsers.CalciteSqlParser.validate(CalciteSqlParser.java:174)\n'
            '\tat '
            'org.apache.pinot.sql.parsers.CalciteSqlParser.queryRewrite(CalciteSqlParser.java:550)\n'
            '\tat '
            'org.apache.pinot.sql.parsers.CalciteSqlParser.compileSqlNodeToPinotQuery(CalciteSqlParser.java:484)'}
```
Checklist