Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Table performance improvements #3509

Merged
merged 3 commits into from
Sep 25, 2017

Conversation

Mogball
Copy link
Contributor

@Mogball Mogball commented Sep 21, 2017

Reduces the number of database requests issued during "Add Table". Testing on a local environment with the example datasets gave about a 4.5x speed up.

  • Added more performant "check if table exists" and "get dialect" functions
  • Removed queries and calls to session.commit() from the inside of loops

@coveralls
Copy link

coveralls commented Sep 21, 2017

Coverage Status

Coverage increased (+0.02%) to 69.558% when pulling 022ca79 on Mogball:mogball/bugfix/add_table into 9af34ba on apache:master.


def get_dialect(self):
sqla_url = url.make_url(self.sqlalchemy_uri_decrypted)
entrypoint = sqla_url._get_entrypoint()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not return self.get_sqla_engine().dialect ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bit more overhead to it, plus creating an engine just to get the dialect and then ignoring the instance seems wasteful.

These few lines are what create_engine does to determine dialect anyway.

Copy link
Member

@mistercrunch mistercrunch Sep 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate calling private functions in external libs though... there's no guarantees as to whether the api may change in minor version changes.

To mitigate the cost we could change the use of get_sqla_engine by using a cached sqla_engine @property. I think it's only a matter of adding a decorator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check out the @memoize decorator in superset.utils

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also for the sake of moving forward I don't mind if you just call get_sqla_engine here and save the sqla_engine memoized property for another PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I was looking for a way to cache the sqla_engine because that function does get called frequently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, turns out we can just call sqla_url.get_dialect()

@coveralls
Copy link

coveralls commented Sep 22, 2017

Coverage Status

Coverage increased (+0.02%) to 69.558% when pulling 6909de6 on Mogball:mogball/bugfix/add_table into 9af34ba on apache:master.

@coveralls
Copy link

coveralls commented Sep 22, 2017

Coverage Status

Coverage increased (+0.02%) to 69.558% when pulling 5451ebe on Mogball:mogball/bugfix/add_table into 9af34ba on apache:master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 69.558% when pulling 5451ebe on Mogball:mogball/bugfix/add_table into 9af34ba on apache:master.

@mistercrunch mistercrunch merged commit f3146ef into apache:master Sep 25, 2017
timifasubaa pushed a commit to timifasubaa/incubator-superset that referenced this pull request Oct 3, 2017
* Improved performance of 'Add table' function

* got rid of pvt function call

* changes metric obj to key on metric_name
michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
* Improved performance of 'Add table' function

* got rid of pvt function call

* changes metric obj to key on metric_name
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.20.1 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.20.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants