feat: script to benchmark DB migrations #13561

Merged
merged 1 commit into from
Apr 15, 2021

betodealmeida
Member

@betodealmeida betodealmeida commented Mar 10, 2021

SUMMARY

Add a script, benchmark_migration.py, that takes a migration script and:

  1. Identifies the models mutated by the migration.
  2. Runs the migration and times it.
  3. Downgrades and ensures that each model has at least 10 entities, adding new entities with fake data where needed.
  4. Repeats the process for 100, 1000, etc., up to a limit (default is 1000, configurable through the --limit flag).
  5. Prints the timings for the current data and for each power of 10.
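
The steps above can be sketched as a small timing loop. This is a minimal illustration, not the code in benchmark_migration.py: `upgrade`, `downgrade`, and `add_fake_rows` are hypothetical callables standing in for the migration commands and the fake-data helper, and `entity_thresholds` and `benchmark` are assumed names. Cleanup of the generated rows is left out.

```python
import time
from typing import Callable, Dict, List


def entity_thresholds(limit: int) -> List[int]:
    """Return the powers of ten from 10 up to and including `limit`."""
    thresholds = []
    current = 10
    while current <= limit:
        thresholds.append(current)
        current *= 10
    return thresholds


def benchmark(
    upgrade: Callable[[], None],
    downgrade: Callable[[], None],
    add_fake_rows: Callable[[int], None],
    limit: int = 1000,
) -> Dict[str, float]:
    """Time `upgrade` on the existing data, then once per threshold.

    All three callables are hypothetical placeholders supplied by the caller.
    """
    results: Dict[str, float] = {}

    # The database is assumed to start at the migration's target revision,
    # so downgrade once before the first timed run.
    downgrade()
    start = time.perf_counter()
    upgrade()
    results["Current"] = time.perf_counter() - start

    for threshold in entity_thresholds(limit):
        downgrade()
        # Top every model up to at least `threshold` rows before timing.
        add_fake_rows(threshold)
        start = time.perf_counter()
        upgrade()
        results[f"{threshold}+"] = time.perf_counter() - start

    return results
```

Passing the operations in as callables keeps the loop independent of any particular database or migration tool, so it can be exercised with stubs.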

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TEST PLAN

$ python scripts/benchmark_migration.py superset/migrations/versions/c501b7c653a3_add_missing_uuid_column.py --limit 1000
Loaded your LOCAL configuration at [/Users/beto/Projects/incubator-superset/superset_config.py]
logging was configured successfully
INFO:superset.utils.logging_configurator:logging was configured successfully
/Users/beto/Projects/incubator-superset/venv/lib/python3.8/site-packages/flask_caching/__init__.py:191: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
  warnings.warn(
Importing migration script: superset/migrations/versions/c501b7c653a3_add_missing_uuid_column.py
Migration goes from 070c043f2fdb to c501b7c653a3
Current version of the DB is c501b7c653a3

Identifying models used in the migration:
- Database (1 rows in table dbs)
- Slice (113 rows in table slices)
- DruidCluster (0 rows in table clusters)
- Dashboard (11 rows in table dashboards)
- SqlaTable (27 rows in table tables)
- SliceEmailSchedule (0 rows in table slice_email_schedules)
- DruidDatasource (0 rows in table datasources)
- DashboardEmailSchedule (0 rows in table dashboard_email_schedules)
- TableColumn (773 rows in table table_columns)
- SqlMetric (35 rows in table sql_metrics)
- DruidColumn (0 rows in table columns)
- DruidMetric (0 rows in table metrics)

Downgrade DB to 070c043f2fdb and start benchmark? [y/N]: y
INFO  [alembic.runtime.migration] Context impl MySQLImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running downgrade c501b7c653a3 -> 070c043f2fdb, add missing uuid column
Benchmarking migration
INFO  [alembic.runtime.migration] Context impl MySQLImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 070c043f2fdb -> c501b7c653a3, add missing uuid column

Updating dasboard position json with slice uuid.. Done.

Migration on current DB took: 0.30 seconds
Running with at least 10 entities of each model
- Adding 9 entities to the Database model
- Adding 10 entities to the DruidCluster model
- Adding 10 entities to the SliceEmailSchedule model
- Adding 10 entities to the DruidDatasource model
- Adding 10 entities to the DashboardEmailSchedule model
- Adding 10 entities to the DruidColumn model
- Adding 10 entities to the DruidMetric model
INFO  [alembic.runtime.migration] Context impl MySQLImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Migration for 10+ entities took: 0.20 seconds
INFO  [alembic.runtime.migration] Context impl MySQLImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running downgrade c501b7c653a3 -> 070c043f2fdb, add missing uuid column
Running with at least 100 entities of each model
- Adding 90 entities to the Database model
- Adding 90 entities to the DruidCluster model
- Adding 89 entities to the Dashboard model
- Adding 73 entities to the SqlaTable model
- Adding 90 entities to the SliceEmailSchedule model
- Adding 90 entities to the DruidDatasource model
- Adding 90 entities to the DashboardEmailSchedule model
- Adding 65 entities to the SqlMetric model
- Adding 90 entities to the DruidColumn model
- Adding 90 entities to the DruidMetric model
INFO  [alembic.runtime.migration] Context impl MySQLImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 070c043f2fdb -> c501b7c653a3, add missing uuid column

Updating dasboard position json with slice uuid.. Done.

Migration for 100+ entities took: 0.28 seconds
INFO  [alembic.runtime.migration] Context impl MySQLImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running downgrade c501b7c653a3 -> 070c043f2fdb, add missing uuid column
Running with at least 1000 entities of each model
- Adding 900 entities to the Database model
- Adding 887 entities to the Slice model
- Adding 900 entities to the DruidCluster model
- Adding 900 entities to the Dashboard model
- Adding 900 entities to the SqlaTable model
- Adding 900 entities to the SliceEmailSchedule model
- Adding 900 entities to the DruidDatasource model
- Adding 900 entities to the DashboardEmailSchedule model
- Adding 227 entities to the TableColumn model
- Adding 900 entities to the SqlMetric model
- Adding 900 entities to the DruidColumn model
- Adding 900 entities to the DruidMetric model
INFO  [alembic.runtime.migration] Context impl MySQLImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 070c043f2fdb -> c501b7c653a3, add missing uuid column

Updating dasboard position json with slice uuid.. Done.

Migration for 1000+ entities took: 1.40 seconds
INFO  [alembic.runtime.migration] Context impl MySQLImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running downgrade c501b7c653a3 -> 070c043f2fdb, add missing uuid column
Cleaning up DB

Results:

Current: 0.30 s
10+: 0.20 s
100+: 0.28 s
1000+: 1.40 s
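
The "Adding N entities" lines in the log above are consistent with topping each table up to the target, that is, adding max(0, target - current rows). A minimal sketch, assuming that rule and using row counts copied from the log; `top_up_plan` is a hypothetical helper, not a function from the script.

```python
from typing import Dict, List, Tuple


def top_up_plan(
    row_counts: Dict[str, int], targets: List[int]
) -> List[Tuple[int, str, int]]:
    """Return (target, model, rows_to_add) for every top-up that is needed."""
    counts = dict(row_counts)  # work on a copy; the caller's dict is untouched
    plan = []
    for target in targets:
        for model, rows in counts.items():
            missing = max(0, target - rows)
            if missing:
                plan.append((target, model, missing))
                # Rows added in one round are still present in the next one.
                counts[model] = rows + missing
    return plan


if __name__ == "__main__":
    # Initial row counts taken from the "Identifying models" section of the log.
    initial = {"Database": 1, "Slice": 113, "Dashboard": 11, "TableColumn": 773}
    for target, model, missing in top_up_plan(initial, [10, 100, 1000]):
        print(f"{target}+: adding {missing} entities to the {model} model")
```

With these inputs the plan contains 9 rows for Database at the 10 threshold, 89 for Dashboard at 100, and 887 for Slice at 1000, matching the log.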

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@betodealmeida betodealmeida changed the title from "Ch6269" to "feat: script to benchmark DB migrations" Mar 16, 2021
@betodealmeida betodealmeida marked this pull request as ready for review March 16, 2021 00:32
@betodealmeida betodealmeida force-pushed the ch6269 branch 3 times, most recently from 9cf1207 to fd75074 Compare March 16, 2021 17:58
@codecov
codecov bot commented Mar 16, 2021

Codecov Report

Merging #13561 (bb95559) into master (21f973f) will increase coverage by 0.14%.
The diff coverage is 36.17%.

❗ Current head bb95559 differs from pull request most recent head f4d1242. Consider uploading reports for the commit f4d1242 to get more accurate results

@@            Coverage Diff             @@
##           master   #13561      +/-   ##
==========================================
+ Coverage   79.70%   79.85%   +0.14%     
==========================================
  Files         945      943       -2     
  Lines       47975    47836     -139     
  Branches     6082     6019      -63     
==========================================
- Hits        38240    38199      -41     
+ Misses       9614     9518      -96     
+ Partials      121      119       -2     
Flag Coverage Δ
cypress 56.36% <100.00%> (-0.04%) ⬇️
hive 80.34% <21.91%> (-0.15%) ⬇️
mysql 80.61% <21.91%> (-0.15%) ⬇️
postgres 80.64% <21.91%> (-0.15%) ⬇️
presto 80.36% <21.91%> (-0.14%) ⬇️
python 81.21% <21.91%> (-0.15%) ⬇️
sqlite 80.24% <21.91%> (-0.15%) ⬇️

Impacted Files Coverage Δ
superset-frontend/src/views/CRUD/types.ts 100.00% <ø> (ø)
superset-frontend/src/views/CRUD/utils.tsx 60.60% <ø> (ø)
superset/utils/mock_data.py 25.00% <20.83%> (ø)
.../src/views/CRUD/data/savedquery/SavedQueryList.tsx 74.43% <80.00%> (+0.70%) ⬆️
superset-frontend/src/dashboard/actions/hydrate.js 85.84% <100.00%> (ø)
...rset-frontend/src/dashboard/util/findPermission.ts 100.00% <100.00%> (ø)
superset/examples/big_data.py 35.00% <100.00%> (ø)
superset-frontend/src/filters/utils.ts 95.23% <0.00%> (-4.77%) ⬇️
...dashboard/components/SliceHeaderControls/index.jsx 78.35% <0.00%> (-1.04%) ⬇️
...erset-frontend/src/dashboard/components/Header.jsx
... and 8 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 21f973f...f4d1242. Read the comment docs.

Member

@ktmud ktmud left a comment

Thanks for addressing DB migration performance with this creative approach! I probably have more questions than feedback on this one...

Member

@robdiciuccio robdiciuccio left a comment

  • Will this destroy data currently in the local DB for the affected tables?
  • Suggest boosting the test case row counts to something higher, perhaps 100, 10K, and 100K, to better simulate diverse use cases.

@betodealmeida betodealmeida force-pushed the ch6269 branch 4 times, most recently from 96d9efd to b65f988 Compare April 14, 2021 23:23
@betodealmeida betodealmeida force-pushed the ch6269 branch 2 times, most recently from 6a46eab to ad7c267 Compare April 15, 2021 00:42
@betodealmeida betodealmeida merged commit c1cb361 into apache:master Apr 15, 2021
QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 29, 2021
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.2.0 labels Mar 12, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels preset-io size/L 🚢 1.2.0
6 participants