Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sip-68): Add DatasourceDAO class to manage querying different datasources easier #20030

Merged
merged 4 commits into from
May 13, 2022

Conversation

hughhhh
Copy link
Member

@hughhhh hughhhh commented May 11, 2022

SUMMARY

Building a data access object for Datasource to make it easier to query a source based upon id and type. With sip-68 work we'll be allowing users to power charts with different data object such as (queries, savedqueries, sl_datasets, and sl_tables) this work will make it easier to do quick look up of these objects with out having to write and ORM query.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov
Copy link

codecov bot commented May 11, 2022

Codecov Report

Merging #20030 (c3b209b) into master (3a379af) will decrease coverage by 0.13%.
The diff coverage is 72.10%.

❗ Current head c3b209b differs from pull request most recent head afa8760. Consider uploading reports for the commit afa8760 to get more accurate results

@@            Coverage Diff             @@
##           master   #20030      +/-   ##
==========================================
- Coverage   66.33%   66.19%   -0.14%     
==========================================
  Files        1713     1713              
  Lines       64083    64148      +65     
  Branches     6734     6744      +10     
==========================================
- Hits        42509    42463      -46     
- Misses      19863    19972     +109     
- Partials     1711     1713       +2     
Flag Coverage Δ
hive ?
mysql 82.01% <78.04%> (-0.02%) ⬇️
postgres 82.07% <78.04%> (-0.02%) ⬇️
presto ?
python 82.16% <78.04%> (-0.37%) ⬇️
sqlite 81.81% <78.04%> (-0.02%) ⬇️
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../plugin-chart-echarts/src/Funnel/transformProps.ts 71.73% <ø> (ø)
...hart-echarts/src/MixedTimeseries/transformProps.ts 0.00% <ø> (ø)
...ins/plugin-chart-echarts/src/Pie/transformProps.ts 55.07% <ø> (ø)
...s/plugin-chart-echarts/src/Radar/transformProps.ts 0.00% <ø> (ø)
...gin-chart-echarts/src/Timeseries/transformProps.ts 57.89% <ø> (ø)
...lugin-chart-echarts/src/Timeseries/transformers.ts 51.63% <ø> (ø)
...plugin-chart-echarts/src/Treemap/transformProps.ts 51.51% <ø> (ø)
...ins/preset-chart-xy/src/components/Line/Encoder.ts 100.00% <ø> (ø)
...rc/SqlLab/components/ScheduleQueryButton/index.tsx 20.75% <0.00%> (ø)
superset-frontend/src/components/Select/styles.tsx 81.13% <ø> (ø)
... and 64 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3a379af...afa8760. Read the comment docs.



def test_get_datasource_sl_table(app_context: None, session: Session) -> None:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason these are just pass?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently working on these test now and will push them up soon

@hughhhh hughhhh force-pushed the datasourcedao branch 4 times, most recently from 0ebe324 to 0ff252d Compare May 11, 2022 20:24
Copy link
Member

@pkdotson pkdotson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, Hugh! I would just you to remove Any from the Datasource type and add a custom exception instead of reusing DatasetNotFoundError. I love the unit tests in this PR!

from superset.tables.models import Table
from superset.utils.core import DatasourceType

Datasource = Union[Dataset, SqlaTable, Table, Query, SavedQuery, Any]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Datasource = Union[Dataset, SqlaTable, Table, Query, SavedQuery, Any]
Datasource = Union[Dataset, SqlaTable, Table, Query, SavedQuery]

We don't want Datasource to be of type Any — on the contrary, it's really nice that the type clearly defines what constitutes a datasource.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betodealmeida this is a hack to allow the types to play nice with mypy, the good thing is it still does some type checking. I was going to look into this more, but literally spent most of the day looking into it and had no luck. I can revisit this if you think it's a mandatory thing

Copy link
Member

@betodealmeida betodealmeida May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if you need to add Any to make mypy pass it's because something is wrong. Adding Any never fixes anything, it only makes the type checker happy. In this case it defeats the purpose of type checking, because all objects are of type Any.

Was this fixed by using Type?

cls, datasource_type: DatasourceType, datasource_id: int, session: Session
) -> Datasource:
if datasource_type not in cls.sources:
raise DatasetNotFoundError()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should create a custom exception here (DatasourceTypeNotSupportedError, for example).


@classmethod
def get_datasource(
cls, datasource_type: DatasourceType, datasource_id: int, session: Session
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, but it's nice to standardize: in your other methods the session is the first argument, in this one it's the last. I'd move it to first argument here, for consistency.

Comment on lines 81 to 82
schema: str,
database_name: str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar small nit here: in general schema comes after database in our functions/methods (since they are hierarchical), I would swap the order for consistency.

permissions: Set[str],
schema_perms: Set[str],
) -> List[Datasource]:
# TODO(bogdan): add unit test
Copy link
Member

@betodealmeida betodealmeida May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't leave TODOs for other people! :)

Suggested change
# TODO(bogdan): add unit test
# TODO(hughhhh): add unit test

Or remove it altogether.

) -> List[Datasource]:
# TODO(bogdan): add unit test
datasource_class = DatasourceDAO.sources[DatasourceType[database.type]]
if isinstance(datasource_class, SqlaTable):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, if you invert the if you can remove an indentation level:

if not isinstance(datasource_class, SqlaTable):
    return []

return (
    session.query(...)
    ...
)

) -> Optional[Datasource]:
"""Returns datasource with columns and metrics."""
datasource_class = DatasourceDAO.sources[DatasourceType[datasource_type]]
if isinstance(datasource_class, SqlaTable):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, if you invert the if the code is a bit easier to read:

if not isinstance(datasource_class, SqlaTable):
    return None

return (...)

schema: Optional[str] = None,
) -> List[Datasource]:
datasource_class = DatasourceDAO.sources[DatasourceType[database.type]]
if isinstance(datasource_class, SqlaTable):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here.

Copy link
Member

@eschutho eschutho May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only sqlatable on these? Is the plan to add in the rest when we need them?

from superset.utils.core import DatasourceType


def create_test_data(session: Session) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also rewrite this as a fixture:

@pytest.fixture
def session_with_data(session: Session) -> Iterator[Session]:
    ...  # code of `create_test_data` here
    yield session

And then in your tests:

def test_get_datasource_sqlatable(app_context: None, session_with_data: Session) -> None:
    ...

But there's no need, a function like this also works fine.

from superset.tables.models import Table
from superset.utils.core import DatasourceType

Datasource = Union[Dataset, SqlaTable, Table, Query, SavedQuery, Any]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think we should keep the Datasource type and DatasourceType enum in the same file so that if one changes, we can be sure to change the other?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea i'll move the Datasource reference next to the enum

if isinstance(source_class, SqlaTable):
qry = source_class.default_query(qry)
datasources.extend(qry.all())
return datasources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just limit this to just SqlaTable since no charts or dashboards will be able to be saved with any other datasources

)

if not datasource:
raise DatasetNotFoundError()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing here re what Beto said above. This could be a missing Table or Query.

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments!

@hughhhh hughhhh merged commit 21c5b26 into apache:master May 13, 2022
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.0.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants