Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for SQL as execution engine #306

Merged
merged 226 commits into from
Apr 11, 2021
Merged

Adding support for SQL as execution engine #306

merged 226 commits into from
Apr 11, 2021

Conversation

thyneb19
Copy link
Contributor

@thyneb19 thyneb19 commented Mar 14, 2021

Overview

Merging in the sql-engine branch to bring the updated Lux SQL functionality to the main release. Users will be able to connect LuxSQLTable objects to their database tables and views, and leverage Lux' recommendation system without having to pull all of their database data locally.

Changes

This PR will update the Lux SQLExecutor as well as add a new LuxSQLTable object. The LuxSQLTable object is meant to help users differentiate between Lux' functionality when connecting to a Dataframe versus a SQL database. This PR makes the following major changes:

  • Updates SQLExecutor.py to query data necessary for all of Lux' supported charts
  • Adds sqltable.py which contains the LuxSQLTable class. This class inherits the Lux recommendation system utilities from the LuxDataFrame object
  • Adjusts frame.py to ensure metadata and recommendation maintenance works when using the SQLExecutor
  • Adds scripts to test the functionality of the SQLExecutor

Example Output

Script to reproduce:
After setting up Postgres, go into command line via psql postgres, then setup via:

CREATE USER postgres WITH PASSWORD 'lux';
ALTER USER postgres WITH SUPERUSER;
DROP schema public cascade;
CREATE schema public;
CREATE DATABASE postgres;

Then run python upload_car_data.py or other data upload scripts inside lux/data

from sqlalchemy import create_engine
engine = create_engine("postgresql://postgres:lux@localhost:5432")

tbl = lux.LuxSQLTable()
lux.config.set_SQL_connection(engine)
tbl.set_SQL_table("car")

tbl

Users will now be able to connect Lux to their database tables like so:

sql_executor

19thyneb and others added 30 commits October 15, 2020 11:26
Uses unique value metadata to verify if a value is valid
frame.py was trying to import luxWidget instead of luxwidget
… with SQL Executor

Some interestingness functions required the number of observations in the data and visualization, so I added these values to the metadata to make the scoring work when using the SQL executor

Added tests for SQL executor
Removed lines that changed Year column type to datetime
SQL Executor tests interfering with travis build, commenting out for now
Issue where validator was relying on metadata which was not yet generated, moved metadata calculation before validation step in frame.py
Renamed num_obs to length, removed ordinal variable from Executor mapping function
* Merging Recent SQL Executor changes

* Fix to Validator

Uses unique value metadata to verify if a value is valid

* Fix Bug with Widget Rendering

frame.py was trying to import luxWidget instead of luxwidget

* Added Number of Observations to MetaData, Fixed Interestingness issue with SQL Executor

Some interestingness functions required the number of observations in the data and visualization, so I added these values to the metadata to make the scoring work when using the SQL executor

Added tests for SQL executor

* Re-added Licensing Headers

* Adding Recent frame.py changes

* Adjusted SQL Executor Tests

Removed lines that changed Year column type to datetime

* Update Frame with new Action Registering

* Resolving Conflicts in frame.py

* Commenting out local SQL Executor tests

SQL Executor tests interfering with travis build, commenting out for now

* Update correlation.py

* Update frame.py

* Fixing Code Format

* Cleaning up Pandas Executor imports

* Fix Validation Bug

Issue where validator was relying on metadata which was not yet generated, moved metadata calculation before validation step in frame.py

* Changed metadata variable name

Renamed num_obs to length, removed ordinal variable from Executor mapping function

Co-authored-by: 19thyneb <thyne.boonmark@gmail.com>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Updated travis.yml file to create postgresql database in test instance.

Added script to populate test database with data.
Updated database credentials
* Merging Recent SQL Executor changes

* Fix to Validator

Uses unique value metadata to verify if a value is valid

* Fix Bug with Widget Rendering

frame.py was trying to import luxWidget instead of luxwidget

* Added Number of Observations to MetaData, Fixed Interestingness issue with SQL Executor

Some interestingness functions required the number of observations in the data and visualization, so I added these values to the metadata to make the scoring work when using the SQL executor

Added tests for SQL executor

* Re-added Licensing Headers

* Adding Recent frame.py changes

* Adjusted SQL Executor Tests

Removed lines that changed Year column type to datetime

* Update Frame with new Action Registering

* Resolving Conflicts in frame.py

* Commenting out local SQL Executor tests

SQL Executor tests interfering with travis build, commenting out for now

* Update correlation.py

* Update frame.py

* Fixing Code Format

* Cleaning up Pandas Executor imports

* Fix Validation Bug

Issue where validator was relying on metadata which was not yet generated, moved metadata calculation before validation step in frame.py

* Changed metadata variable name

Renamed num_obs to length, removed ordinal variable from Executor mapping function

* Added script to generate Postgresql database

Updated travis.yml file to create postgresql database in test instance.

Added script to populate test database with data.

* Update upload_car_data.py

Updated database credentials

* Updated script name in travis.yml

* Removed unnecessary import from travis.yml

* Added psycopg2 to requirements.txt

* Creating Postgres test database in travis

* Fixed directory issue

Co-authored-by: 19thyneb <thyne.boonmark@gmail.com>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Added tests for basic SQL Executor functionality.
Added an example notebook to showcase how to use the sql-engine.

Fixed variable reference in interestingness.py that was causing issues.
Rather than referencing the _length parameter throughout the code, update and use the LuxSQLTable len() function.

Added _setup_done parameter to the LuxSQLTable. This will check if the initial setup of the table, retrieving and populating attributes, is completed. This will inform which len() function to use, as the parent len() is required while populating the columns of the LuxSQLTable.
Rename _repr_html_() to _ipython_display_()
Merge in Master branch changes, add len() functionality to the LuxSQLTable
Updated _is_datetime_number() in the PandasExecutor to use the is_integer_dtype() function to check if a series is of int dtype.

Cleaned up SQLExecutor checks in frame.py
* Revert "Revert "Update LuxSQLTable __len__() and metadata computation""

This reverts commit b5998c7.

* Cleaned up datatype and SQLExecutor checks

Updated _is_datetime_number() in the PandasExecutor to use the is_integer_dtype() function to check if a series is of int dtype.

Cleaned up SQLExecutor checks in frame.py
Merging in Master branch changes to sql-engine branch
@@ -17,6 +17,7 @@
import pandas as pd
from lux.vis.Vis import Vis
from lux.vis.VisList import VisList
import psycopg2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the SQL tests into a different file so that the separation between SQL and Pandas tests is more clear? That way, it would make it easier to add a flag that only runs PandasExecutor tests (e.g., when developing locally).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a related note, the line connection = psycopg2.connect("host=localhost dbname=postgres user=postgres password=lux") is repeated many times across the tests. We could probably put this inside conftest.py as a @pytest.fixture (variables that are reused across tests in the same session).

@@ -21,6 +21,7 @@

# Suite of test that checks if data_type inferred correctly by Lux
def test_check_cars():
lux.config.set_SQL_connection("")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this empty?

@dorisjlee dorisjlee merged commit e97dcd3 into master Apr 11, 2021
@dorisjlee
Copy link
Member

Thanks @thyneb19 @dj-khandelwal @NiStannum @sophiahhuang for working hard on the SQL features! We're really looking forward to this new addition to the upcoming release!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants