Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sqllab] fix: return pandas records in execute_sql_statements #9102

Merged
merged 2 commits into from
Feb 14, 2020

Conversation

nytai
Copy link
Member

@nytai nytai commented Feb 8, 2020

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

SQLLAB_BACKEND_PERSISTENCE feature stores the results in the results backend which is used to rehydrate tab states. Data is incorrectly being returned in serialized binary form due to the assumption that if we're storing results the query is being executed asynchronously. This pr fixes that assumption by also checking if we're looking to return results (in which case we also use the df_to_records and JSON serialization approach).

This is obviously not the most performant way of handling this as were doing pyarrow + msgpack to store the results in the results backend, as well as pandas + json to return the results, however it's the easiest way to get SQLLAB_BACKEND_PERSISTENCE + RESULTS_BACKEND_USE_MSGPACK working together without a significant rewrite/refactor.

TEST PLAN

  • tested locally that sqllab works with SQLLAB_BACKEND_PERSISTENCE feature enabled
  • tested locally that sqllab works with SQLLAB_BACKEND_PERSISTENCE feature disabled

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@robdiciuccio @betodealmeida

@nytai nytai force-pushed the tai/fix-sqllab-backend-msgpack branch from d52832b to 15a4dce Compare February 8, 2020 02:37
@nytai nytai changed the title don't use msgpack if using SQLLAB_BACKEND_PERSISTENCE and returning r… skip msgpack if using SQLLAB_BACKEND_PERSISTENCE and returning results Feb 8, 2020
@robdiciuccio
Copy link
Member

Ideally, we'll always want to serialize to msgpack (vs json) when persisting to the results backend due to the performance gains. Instead of skipping msgpack serialization, can the logic used to fetch stored results and rehydrate the tab state perform the proper deserialization?

I'm still not clear where the issue lies with synchronous queries returning bad data. Is msgpack modifying the payload dict passed into _serialize_payload?

Previously, RESULTS_BACKEND was only used for async queries. What is the expected behavior of server-persisted tab state if no results backend is configured? Should we update the documentation to explain this new use case?

@nytai nytai force-pushed the tai/fix-sqllab-backend-msgpack branch from 15a4dce to 460b46d Compare February 11, 2020 21:49
@nytai nytai changed the title skip msgpack if using SQLLAB_BACKEND_PERSISTENCE and returning results [sqllab] fix: return pandas records in execute_sql_statements Feb 11, 2020
@codecov-io
Copy link

Codecov Report

Merging #9102 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #9102   +/-   ##
======================================
  Coverage    59.1%   59.1%           
======================================
  Files         372     372           
  Lines       11922   11922           
  Branches     2919    2919           
======================================
  Hits         7046    7046           
  Misses       4694    4694           
  Partials      182     182

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a005e45...460b46d. Read the comment docs.

@nytai nytai force-pushed the tai/fix-sqllab-backend-msgpack branch from 460b46d to f3dff5a Compare February 11, 2020 22:12
@nytai nytai force-pushed the tai/fix-sqllab-backend-msgpack branch from f3dff5a to 435ac95 Compare February 11, 2020 22:18
@nytai
Copy link
Member Author

nytai commented Feb 11, 2020

@robdiciuccio Looks like it's the pyarrow data that's getting sent to the frontend, thus it ends up as '[bytes]' for that key. As for if the feature flag is on and no results backend is configured the following is displayed to the user:

Screen Shot 2020-02-11 at 2 23 20 PM

@mistercrunch mistercrunch merged commit f95a867 into apache:master Feb 14, 2020
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.36.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/S 🚢 0.36.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants