Adding support for SQL as execution engine #306

thyneb19 · 2021-03-14T07:02:39Z

Overview

This PR merges the sql-engine branch to bring the updated Lux SQL functionality into the main release. Users will be able to connect LuxSQLTable objects to their database tables and views, and use Lux's recommendation system without having to pull all of their database data locally.

Changes

This PR updates the Lux SQLExecutor and adds a new LuxSQLTable object. The LuxSQLTable object is meant to help users distinguish between Lux's functionality when connecting to a DataFrame and when connecting to a SQL database. This PR makes the following major changes:

Updates SQLExecutor.py to query the data necessary for all of Lux's supported charts
Adds sqltable.py, which contains the LuxSQLTable class. This class inherits the Lux recommendation system utilities from the LuxDataFrame object
Adjusts frame.py to ensure metadata and recommendation maintenance works when using the SQLExecutor
Adds scripts to test the functionality of the SQLExecutor
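
The idea of querying only the data a chart needs, rather than pulling the whole table locally, can be sketched roughly as follows. This is a minimal illustration and not the code in SQLExecutor.py: it uses Python's built-in sqlite3 module as a stand-in for Postgres, and the sample rows, the `bar_chart_data` helper, and its parameters are all hypothetical.

```python
import sqlite3

# Stand-in for a remote database; the PR itself targets Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (origin TEXT, horsepower REAL)")
conn.executemany(
    "INSERT INTO car VALUES (?, ?)",
    [("USA", 130.0), ("USA", 150.0), ("Japan", 90.0), ("Europe", 110.0)],
)


def bar_chart_data(connection, table, dimension, measure):
    """Return one (category, average) row per bar, computed inside the database.

    Hypothetical helper: identifiers are interpolated directly into the query,
    so it assumes trusted table and column names.
    """
    query = (
        f'SELECT "{dimension}", AVG("{measure}") '
        f'FROM "{table}" GROUP BY "{dimension}" ORDER BY "{dimension}"'
    )
    return connection.execute(query).fetchall()


# Only the aggregated rows cross the connection, not the whole table.
bars = bar_chart_data(conn, "car", "origin", "horsepower")
print(bars)
```

The point of the sketch is the shape of the query: the grouping and averaging happen in the database, so the client only receives one row per bar.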

Example Output

Script to reproduce:
After setting up Postgres, open a psql shell via psql postgres, then set up the database with:

CREATE USER postgres WITH PASSWORD 'lux';
ALTER USER postgres WITH SUPERUSER;
DROP schema public cascade;
CREATE schema public;
CREATE DATABASE postgres;

Then run python upload_car_data.py or other data upload scripts inside lux/data

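Users will now be able to connect Lux to their database tables like so:

import lux
from sqlalchemy import create_engine

# Connect to the local Postgres instance configured above
engine = create_engine("postgresql://postgres:lux@localhost:5432")

tbl = lux.LuxSQLTable()
lux.config.set_SQL_connection(engine)
# Point the LuxSQLTable at the "car" table
tbl.set_SQL_table("car")

# Display the table in a notebook cell
tbl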

* Merging Recent SQL Executor changes
* Fix to Validator
  Uses unique value metadata to verify if a value is valid
* Fix Bug with Widget Rendering
  frame.py was trying to import luxWidget instead of luxwidget
* Added Number of Observations to MetaData, Fixed Interestingness issue with SQL Executor
  Some interestingness functions required the number of observations in the data and visualization, so I added these values to the metadata to make the scoring work when using the SQL executor
  Added tests for SQL executor
* Re-added Licensing Headers
* Adding Recent frame.py changes
* Adjusted SQL Executor Tests
  Removed lines that changed Year column type to datetime
* Update Frame with new Action Registering
* Resolving Conflicts in frame.py
* Commenting out local SQL Executor tests
  SQL Executor tests interfering with travis build, commenting out for now
* Update correlation.py
* Update frame.py
* Fixing Code Format
* Cleaning up Pandas Executor imports
* Fix Validation Bug
  Issue where validator was relying on metadata which was not yet generated, moved metadata calculation before validation step in frame.py
* Changed metadata variable name
  Renamed num_obs to length, removed ordinal variable from Executor mapping function
* Added script to generate Postgresql database
  Updated travis.yml file to create postgresql database in test instance. Added script to populate test database with data.
* Update upload_car_data.py
  Updated database credentials
* Updated script name in travis.yml
* Removed unnecessary import from travis.yml
* Added psycopg2 to requirements.txt
* Creating Postgres test database in travis
* Fixed directory issue

Co-authored-by: 19thyneb <thyne.boonmark@gmail.com>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

Rather than referencing the _length parameter throughout the code, update and use the LuxSQLTable len() function. Added a _setup_done parameter to the LuxSQLTable that tracks whether the initial setup of the table (retrieving and populating attributes) is completed. This determines which len() function to use, since the parent len() is required while populating the columns of the LuxSQLTable.
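
A rough sketch of the `_setup_done` pattern described in this commit message is shown below. It is only an illustration under assumptions: `LocalFrame` and `SketchSQLTable` are hypothetical stand-ins (the real LuxSQLTable builds on the pandas-backed LuxDataFrame), and sqlite3 replaces Postgres so the example is self-contained.

```python
import sqlite3


class LocalFrame:
    """Hypothetical stand-in for the pandas-backed parent class."""

    def __init__(self, columns=None):
        self._columns = list(columns or [])

    def __len__(self):
        # Parent behaviour: length of whatever is held locally.
        return len(self._columns)


class SketchSQLTable(LocalFrame):
    """Hypothetical stand-in for LuxSQLTable; not the class added in this PR."""

    def __init__(self, connection, table_name):
        self._setup_done = False
        self._connection = connection
        self._table_name = table_name
        super().__init__()
        # Setup phase: any len() call made here falls back to the parent.
        cursor = connection.execute(f'SELECT * FROM "{table_name}" LIMIT 0')
        self._columns = [description[0] for description in cursor.description]
        self._setup_done = True

    def __len__(self):
        if not self._setup_done:
            return super().__len__()
        # After setup, report the row count of the database table.
        query = f'SELECT COUNT(*) FROM "{self._table_name}"'
        return self._connection.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO car VALUES (?, ?)", [("a", 1970), ("b", 1971), ("c", 1972)]
)
tbl = SketchSQLTable(conn, "car")
print(len(tbl))
```

Keeping the switch inside `__len__` means callers can use `len(tbl)` everywhere instead of reading a private length field.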

dorisjlee · 2021-04-11T00:22:12Z

tests/test_compiler.py

@@ -17,6 +17,7 @@
 import pandas as pd
 from lux.vis.Vis import Vis
 from lux.vis.VisList import VisList
+import psycopg2


Can we make the SQL tests into a different file so that the separation between SQL and Pandas tests is more clear? That way, it would make it easier to add a flag that only runs PandasExecutor tests (e.g., when developing locally).

On a related note, the line connection = psycopg2.connect("host=localhost dbname=postgres user=postgres password=lux") is repeated many times across the tests. We could probably put this inside conftest.py as a @pytest.fixture (variables that are reused across tests in the same session).
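
A minimal sketch of the fixture suggested here might look like the following. It assumes psycopg2 is installed and the test database from the PR description is running; the fixture name `sql_connection` is hypothetical and not part of the PR.

```python
# conftest.py (sketch)
import pytest

# Connection string quoted from the existing tests.
POSTGRES_DSN = "host=localhost dbname=postgres user=postgres password=lux"


@pytest.fixture(scope="session")
def sql_connection():
    """Open one Postgres connection and share it across the test session."""
    # Skip dependent tests instead of failing when psycopg2 is not installed.
    psycopg2 = pytest.importorskip("psycopg2")
    connection = psycopg2.connect(POSTGRES_DSN)
    yield connection
    connection.close()
```

A test in a dedicated SQL test file would then take `sql_connection` as an argument, and that file could be excluded during local Pandas-only runs with pytest's `--ignore` option.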

dorisjlee · 2021-04-11T00:28:02Z

tests/test_type.py

@@ -21,6 +21,7 @@

 # Suite of test that checks if data_type inferred correctly by Lux
 def test_check_cars():
+    lux.config.set_SQL_connection("")


Why is this empty?

dorisjlee · 2021-04-11T04:21:50Z

Thanks @thyneb19 @dj-khandelwal @NiStannum @sophiahhuang for working hard on the SQL features! We're really looking forward to this new addition to the upcoming release!

19thyneb and others added 30 commits October 15, 2020 11:26

Merging Recent SQL Executor changes

308d99c

Fix to Validator

daa9a0d

Uses unique value metadata to verify if a value is valid

Fix Bug with Widget Rendering

804b0dc

frame.py was trying to import luxWidget instead of luxwidget

Re-added Licensing Headers

8763df9

Adding Recent frame.py changes

c2b0b46

Adjusted SQL Executor Tests

1b08461

Removed lines that changed Year column type to datetime

Update Frame with new Action Registering

38c5e7e

Resolving Conflicts in frame.py

14d2f90

Merge branch 'sql-engine' into Database-Executor

78d8e10

Commenting out local SQL Executor tests

d783b4c

SQL Executor tests interfering with travis build, commenting out for now

Merge branch 'Database-Executor' of https://github.com/thyneb19/lux into Database-Executor

c03e001

Update correlation.py

8f0e643

Update frame.py

d365d52

Fixing Code Format

7da2992

Cleaning up Pandas Executor imports

f1b7c8b

Fix Validation Bug

d97f0e4

Issue where validator was relying on metadata which was not yet generated, moved metadata calculation before validation step in frame.py
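
The ordering issue can be illustrated with a small sketch: the validator looks values up in unique-value metadata, so that metadata has to exist before validation runs. The function names and sample rows below are hypothetical and do not mirror the code in frame.py.

```python
def compute_unique_values(rows):
    """Collect the set of distinct values seen in each column."""
    unique_values = {}
    for row in rows:
        for column, value in row.items():
            unique_values.setdefault(column, set()).add(value)
    return unique_values


def validate_filter(unique_values, column, value):
    """Check a filter value against the precomputed unique-value metadata."""
    if column not in unique_values:
        raise KeyError(f"Unknown column: {column!r}")
    return value in unique_values[column]


rows = [
    {"origin": "USA", "cylinders": 8},
    {"origin": "Japan", "cylinders": 4},
]

# Metadata is computed first, so the validator never sees an empty lookup.
metadata = compute_unique_values(rows)
valid = validate_filter(metadata, "origin", "Japan")
invalid = validate_filter(metadata, "origin", "France")
print(valid, invalid)
```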

Changed metadata variable name

582b370

Renamed num_obs to length, removed ordinal variable from Executor mapping function

Merge remote-tracking branch 'upstream/sql-engine' into Database-Executor

554c71f

Added script to generate Postgresql database

d65687a

Updated travis.yml file to create postgresql database in test instance. Added script to populate test database with data.

Update upload_car_data.py

7243b2f

Updated database credentials

Updated script name in travis.yml

2add76f

Removed unnecessary import from travis.yml

cf74beb

Added psycopg2 to requirements.txt

14d52b8

Creating Postgres test database in travis

379517d

Fixed directory issue

a72f236

Updated SQL Executor Tests

e947fd6

Added tests for basic SQL Executor functionality.

Added sql_executor example notebook, minor bug fix

1234009

Added an example notebook to showcase how to use the sql-engine. Fixed variable reference in interestingness.py that was causing issues.

thyneb19 added 16 commits March 25, 2021 17:12

Merge remote-tracking branch 'upstream/master' into Database-Executor

d296c9b

Removed unnecessary __repr__() function

8d6cf4b

Updated LuxSQLTable repr

e350ab4

Rename _repr_html_() to _ipython_display_()

Revert "Updated LuxSQLTable repr"

5d1a2f4

This reverts commit e350ab4.

Revert "Revert "Updated LuxSQLTable repr""

48c1b57

This reverts commit 5d1a2f4.

Revert "Update LuxSQLTable __len__() and metadata computation"

b5998c7

This reverts commit 7c7dcd3.

Merge pull request #327 from thyneb19/Database-Executor

75c5cae

Merge in Master branch changes, add len() functionality to the LuxSQLTable

Revert "Revert "Update LuxSQLTable __len__() and metadata computation""

6f597c2

This reverts commit b5998c7.

Cleaned up datatype and SQLExecutor checks

7999ad6

Updated _is_datetime_number() in the PandasExecutor to use the is_integer_dtype() function to check if a series is of int dtype. Cleaned up SQLExecutor checks in frame.py
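
For reference, a dtype check built on pandas' `is_integer_dtype()` could be sketched like this. The helper name `looks_like_datetime_number` and the fixed `"%Y"` format are assumptions made for the example; this is not the body of `_is_datetime_number()` in the PandasExecutor.

```python
import pandas as pd


def looks_like_datetime_number(series, date_format="%Y"):
    """Return True if an integer series parses as dates in the given format.

    Hypothetical helper; the format-based parse is an assumption of this sketch.
    """
    # is_integer_dtype accepts a Series or a dtype and is False for floats.
    if not pd.api.types.is_integer_dtype(series):
        return False
    try:
        pd.to_datetime(series.astype(str), format=date_format)
    except (ValueError, TypeError):
        return False
    return True


years = pd.Series([1970, 1980, 1990])
floats = pd.Series([1970.0, 1980.0])
print(looks_like_datetime_number(years), looks_like_datetime_number(floats))
```

Checking the dtype through `is_integer_dtype()` avoids comparing against a single dtype such as `int64`, which would miss other integer widths.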

Merge branch 'sql-engine' into Database-Executor

e92dbd6

Merge remote-tracking branch 'upstream/master' into Database-Executor

5399097

Merge pull request #347 from thyneb19/Database-Executor

2d24a3b

Merging in Master branch changes to sql-engine branch

Black Reformatting

6922d3b

Merge pull request #348 from thyneb19/Database-Executor

bf3cb0f

Black Reformatting

dorisjlee requested changes Apr 11, 2021

dorisjlee and others added 11 commits April 10, 2021 17:28

minor changes to requirements and cleanup

40b85b1

Merge branch 'master' into sql-engine

2298f13

Removed psycopg2 from Lux requirements

801f3cd

Aiming to simplify the initial Lux installation. Will include a notice in the SQL documentation letting users know that they will have to install the library themselves if they want to use the LuxSQLTable functionality.
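
One common way to handle an optional dependency like this is to import it lazily and raise a descriptive error when it is missing. The commit message only promises a documentation notice, so the runtime check below is an assumption for illustration, and `require_optional` is a hypothetical helper.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency, or raise an error that says how to get it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'. "
            f"Install it separately, for example with `pip install {module_name}`."
        ) from error


# Hypothetical call site: only SQL code paths would trigger the import.
# psycopg2 = require_optional("psycopg2", "LuxSQLTable")
```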

Merge branch 'sql-engine' into Database-Executor

284f5ba

Merge pull request #352 from thyneb19/Database-Executor

1e02ad6

Removed psycopg2 from Lux requirements

Merge remote-tracking branch 'upstream/master' into Database-Executor

68c7747

Revert "Merge remote-tracking branch 'upstream/master' into Database-Executor"

f11e772

This reverts commit 68c7747, reversing changes made to 801f3cd.

Merge branch 'Database-Executor' of https://github.com/thyneb19/lux into Database-Executor

a309361

Merge pull request #353 from thyneb19/Database-Executor

1632005

Merging in master branch changes

add back merged overridden changes

c94b5a6

merge conflict fixed

50f8562

dorisjlee merged commit e97dcd3 into master Apr 11, 2021
