-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve order in to_dataframe
with BQ Storage from queries containing ORDER BY
#7793
Conversation
if query is None: | ||
return False | ||
|
||
return re.search(r"ORDER\s+BY", query, re.IGNORECASE) is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a comment indicating we know there's a class of false positives (ordered window functions)?
if query is None: | ||
return False | ||
|
||
return re.search(r"ORDER\s+BY", query, re.IGNORECASE) is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please compile the pattern as a module-scope global, and use "raw" strings e.g.:
_CONTAINS_ORDER_BY = re.compile(r"ORDER\s+BY", re.IGNORECASE)
...
def _contains_order_by(query):
"""Do we need to preserve the order of the query results?"""
return query and _CONTAINS_ORDER_BY.search(query)
…ng ORDER BY This fixes an issue where due to reading from multiple stream in parallel, the order of rows is not preserved. Normally this isn't an issue, but it is when the rows are query results from an ORDER BY query.
3b0c7ae
to
babdd66
Compare
This fixes an issue where due to reading from multiple stream in
parallel, the order of rows is not preserved. Normally this isn't an
issue, but it is when the rows are query results from an
ORDER BY
query.Note: The detection of
ORDER BY
is not perfect. It will have false positives (such asORDER BY
in a string or sub-query), but hopefully no false negatives. Since a false positive only means we read from a single stream, behavior is correct but perhaps slower than expected.Closes #7759