Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat and perf: Add support for Multiple Programmatic Descriptions and Table level Badges #1903

Conversation

VoLKyyyOG
Copy link

@VoLKyyyOG VoLKyyyOG commented Jun 15, 2022

Summary of Changes

Hey team, thanks for reviewing this PR! This is a newer PR with the correct sign-off.

The motivation originates from #1828 which makes the original code quite convoluted and creates duplicated nodes/records. However, with #1877 being an epic speed-up to the original Neo4JCSV process (hopefully the PR is merged soon), the original naive method of duplicating TableMetadata for multiple programmatic descriptions does not work.

The changes here aim to mitigate this issue and allow multiple programmatic descriptions to be added and also benefit from the significantly reduced runtime using APOC.

Change Log:

  • Added support Table level Badges
  • Added support for multiple programmatic descriptions to reduce duplicate calls to TableMetadata
  • Added a sample script under example/scripts/sample_base_postgres_metadata_extractor for examples
  • Added some extra comments that I felt were useful when reading through the code (they were quite sparse)
  • In future, it may be worth deprecating the is_view flag and use the new badges argument instead (as its generalised). We've removed the flag from our prod instance without issues.

Example Changes:
(all sensitive info has been removed)

  • Table Badges

image

  • Programmatic Descriptions

image

Tests

The code has been tested and deployed in our prod environment and I am more than happy to have a quick knowledge share on my changes / tests. This consists of over 40 thousand tables and 500,000 thousand columns.

However testing from the core repo side would be appreciated as our Amundsen is greatly modified and as such, our environment will not capture all test cases.

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

Signed-off-by: Akira Takihara Wang <akiraw@squareup.com>
@VoLKyyyOG VoLKyyyOG requested a review from a team as a code owner June 15, 2022 01:48
@boring-cyborg boring-cyborg bot added area:databuilder From databuilder folder category:models labels Jun 15, 2022
@VoLKyyyOG
Copy link
Author

@chonyy fixed it!

@VoLKyyyOG
Copy link
Author

VoLKyyyOG commented Jun 15, 2022

Hmm looks like some issue with None being allowed through. Will resolve sometime soon.

Signed-off-by: Akira Takihara Wang <akiraw@squareup.com>
@stale
Copy link

stale bot commented Jun 29, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added stale and removed stale labels Jun 29, 2022
@stale
Copy link

stale bot commented Jul 14, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Jul 14, 2022
@feng-tao feng-tao added the keep fresh Disables stalebot from closing an issue label Jul 14, 2022
@feng-tao
Copy link
Member

@kristenarmes @dkunitsk @allisonsuarez could you all take a look?

@stale stale bot removed stale labels Jul 14, 2022
This reverts commit 98e2580.
Signed-off-by: Akira Takihara Wang <akiraw@squareup.com>
Signed-off-by: Akira Takihara Wang <akiraw@squareup.com>
Signed-off-by: Akira Takihara Wang <akiraw@squareup.com>
Signed-off-by: Akira Takihara Wang <akiraw@squareup.com>
This reverts commit 98e2580.

Signed-off-by: Akira Takihara Wang <akiraw@squareup.com>
Signed-off-by: Akira Takihara Wang <akiraw@squareup.com>
@VoLKyyyOG VoLKyyyOG closed this Jul 27, 2022
@VoLKyyyOG VoLKyyyOG deleted the feature/add_table_badges_and_programmatic_desc branch July 27, 2022 00:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:databuilder From databuilder folder keep fresh Disables stalebot from closing an issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants