fix(ingest): use correct native data type in all SQLAlchemy sources by compiling data type using dialect #10898
Walkthrough
The updates across various modules significantly enhance the metadata ingestion system by introducing an
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant Inspector
participant SchemaRetriever
User->>SchemaRetriever: Request schema fields
SchemaRetriever->>Inspector: Inspect column types
Inspector->>SchemaRetriever: Return inspected data
SchemaRetriever-->>User: Return schema fields
@@ -641,7 +642,7 @@ def _get_direct_raw_col_upstreams(
 # Parse the column name out of the node name.
 # Sqlglot calls .sql(), so we have to do the inverse.
-normalized_col = sqlglot.parse_one(node.name).this.name
+normalized_col = sqlglot.parse_one(node.name, dialect=dialect).this.name
can we add a test case that would have failed before but works with this change?
I tried to reproduce the issue we talked about in the context of the SAP HANA view parsing, and I think it wasn't caused by the parse_one method: the identifiers are qualified using the optimize method and, more importantly, with the identify parameter set to true, so the name of the node should always be correct, including the capitalization.
The call to the sql method in the to_node method is also made without a dialect (https://github.com/tobymao/sqlglot/blob/5df3f5292488df6a8e21abf3b49086c823797e78/sqlglot/lineage.py#L234 and https://github.com/tobymao/sqlglot/blob/5df3f5292488df6a8e21abf3b49086c823797e78/sqlglot/lineage.py#L285), so the capitalization should not be changed by the sql method either. Not sure if this is what you meant when you mentioned the sql method in Slack, but I think you are right: everything is correct.
I have also run some performance tests (using timeit) on the dialect instance that parse_one would create implicitly, and there is basically no difference with and without the dialect instance.
edit: I have removed that part from the PR. :-)
FYI also seeing some errors in the tests e.g. Might make sense to make a helper method like
b5fbc10 to 9a7f61a
I don't think we should fall back to the repr function in case of an error. The data types returned by the reflection methods of SQLAlchemy are "produced" by the corresponding dialect, and in general it should be possible for the dialect itself to compile them (otherwise the dialect would not produce these data types, right?). The NullType is a special case: it explicitly cannot be compiled and will result in a CompileError: "NullType will result in a CompileError if the compiler is asked to render the type itself [...]" (see here: https://docs.sqlalchemy.org/en/20/core/type_api.html#sqlalchemy.types.NullType). I have added a utility function to sqlalchemy_type_converter.py which returns the __visit_name__ of the NullType (which is "null") when a NullType is supplied. I think this is better than having "NullType()" as the native data type.
9a7f61a to 17b655d
17b655d to f5eaa01
f5eaa01 to 4b1f9ba
if isinstance(column_type, types.NullType):
    return types.NullType.__visit_name__

return column_type.compile(dialect=inspector.dialect)
I still think we should have a try catch around this - purely to ensure that we don't fail broadly if an underlying dialect throws an exception in .compile
I have added a try/except which uses the __visit_name__ as a fallback and, in case the data type is not visitable (which it should be, but who knows...), the repr of the data type. I would not expect to ever need this fallback, but to make sure that we are not failing because of the native data type, it's probably better to have it.
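The fallback order discussed in this thread (NullType special case, compile with the dialect, then `__visit_name__`, then `repr`) can be sketched roughly as follows. This is an illustration only, not the code merged in the PR: `native_type_name` and the `Fake*` classes are hypothetical stand-ins so that the snippet runs without SQLAlchemy installed, and the NullType check is approximated by comparing `__visit_name__` to "null" instead of using `isinstance`.

```python
from typing import Any


def native_type_name(column_type: Any, dialect: Any) -> str:
    """Return a display string for a column type without raising.

    Hypothetical helper mirroring the order discussed in the review:
    null special case, dialect compile, visit name, repr.
    """
    visit_name = getattr(column_type, "__visit_name__", None)
    # Stand-in for isinstance(column_type, types.NullType).
    if visit_name == "null":
        return visit_name
    try:
        return str(column_type.compile(dialect=dialect))
    except Exception:
        # Fall back to the visit name, then to repr as a last resort.
        if isinstance(visit_name, str):
            return visit_name
        return repr(column_type)


class FakeDialect:
    """Stand-in for inspector.dialect (assumption, not a real dialect)."""


class FakeVarchar:
    __visit_name__ = "VARCHAR"

    def __init__(self, length: int) -> None:
        self.length = length

    def compile(self, dialect: Any) -> str:
        return f"VARCHAR({self.length})"

    def __repr__(self) -> str:
        return f"VARCHAR(length={self.length})"


class FakeNull:
    __visit_name__ = "null"

    def compile(self, dialect: Any) -> str:
        raise RuntimeError("cannot compile a null type")


class FakeBroken:
    __visit_name__ = "custom_type"

    def compile(self, dialect: Any) -> str:
        raise RuntimeError("dialect failed to compile this type")


dialect = FakeDialect()
results = [
    native_type_name(FakeVarchar(50), dialect),
    native_type_name(FakeNull(), dialect),
    native_type_name(FakeBroken(), dialect),
]
print(results)
```

The broad `except Exception` is deliberate in this sketch: the stated goal is that a misbehaving dialect should never abort ingestion just because the native data type string could not be rendered.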
401014f to 9eec1d1
9eec1d1 to 1942fbb
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (21)
- metadata-ingestion/src/datahub/ingestion/source/sql/athena.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/hive.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py (7 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/trino.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py (2 hunks)
- metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py (4 hunks)
- metadata-ingestion/tests/integration/hana/docker-compose.yml (1 hunks)
- metadata-ingestion/tests/integration/hana/hana_mces_golden.json (10 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json (53 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json (9 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json (10 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json (9 hunks)
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json (7 hunks)
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json (7 hunks)
- metadata-ingestion/tests/integration/oracle/test_oracle.py (1 hunks)
- metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json (11 hunks)
- metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json (11 hunks)
- metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json (26 hunks)
- metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json (26 hunks)
- metadata-ingestion/tests/integration/trino/trino_mces_golden.json (18 hunks)
Files skipped from review due to trivial changes (3)
- metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json
- metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json
- metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json
Additional comments not posted (106)
metadata-ingestion/tests/integration/hana/docker-compose.yml (1)
9-9: Verify the necessity of all the removed port mappings. Reducing the port mappings can limit the functionality and accessibility of services interacting with the `testhana` container. Ensure that the removed ports are not required for any critical interactions.
metadata-ingestion/tests/integration/oracle/test_oracle.py (1)
27-27: Verify the necessity and correctness of the lambda function. The lambda function added to the `process` method enforces the return of the string 'NUMBER'. Ensure that this change is necessary for the tests and correctly implemented.
metadata-ingestion/src/datahub/ingestion/source/sql/hive.py (2)
172-172: Verify the correct integration and utilization of the `inspector` parameter. The new `inspector` parameter is introduced to the `get_schema_fields_for_column` function. Ensure that it is correctly integrated and utilized throughout the function.
178-181: Verify the correct handling of the `inspector` parameter by the superclass method. The call to the superclass method is updated to include the new `inspector` parameter. Ensure that the superclass method correctly handles this parameter and that the change does not introduce any issues.
metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py (4)
8-9: LGTM! New imports are necessary. The new imports for `Inspector` and `Visitable` are required for the added functionality.
181-181: LGTM! Method signature update is necessary. The addition of the `inspector` parameter enhances the function's capability to handle column types more robustly.
222-225: LGTM! Appropriate usage of the `inspector` parameter. The `inspector` parameter is used correctly to get the native data type in the fallback description.
251-267: LGTM! Method signature update and new logic are necessary. The addition of the `inspector` parameter and the handling of `NullType` improve the method's robustness. The try/except block ensures graceful handling of compilation errors.
metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json (9)
150-150: LGTM! Standardized representation of `INTEGER`. The `nativeDataType` has been updated from `"INTEGER()"` to `"INTEGER"`, aligning it with a more conventional format.
162-162: LGTM! Simplified representation of `VARCHAR(50)`. The `nativeDataType` has been updated from `"VARCHAR(length=50)"` to `"VARCHAR(50)"`, enhancing clarity and consistency.
174-174: LGTM! Simplified representation of `VARCHAR(50)`. The `nativeDataType` has been updated from `"VARCHAR(length=50)"` to `"VARCHAR(50)"`, enhancing clarity and consistency.
186-186: LGTM! Simplified representation of `VARCHAR(50)`. The `nativeDataType` has been updated from `"VARCHAR(length=50)"` to `"VARCHAR(50)"`, enhancing clarity and consistency.
198-198: LGTM! Simplified representation of `VARCHAR(50)`. The `nativeDataType` has been updated from `"VARCHAR(length=50)"` to `"VARCHAR(50)"`, enhancing clarity and consistency.
210-210: LGTM! Standardized representation of `FLOAT`. The `nativeDataType` has been updated from `"FLOAT()"` to `"FLOAT"`, aligning it with a more conventional format.
326-326: LGTM! Standardized representation of `INTEGER`. The `nativeDataType` has been updated from `"INTEGER()"` to `"INTEGER"`, aligning it with a more conventional format.
338-338: LGTM! Simplified representation of `VARCHAR(50)`. The `nativeDataType` has been updated from `"VARCHAR(length=50)"` to `"VARCHAR(50)"`, enhancing clarity and consistency.
350-350: LGTM! Standardized representation of `INTEGER`. The `nativeDataType` has been updated from `"INTEGER()"` to `"INTEGER"`, aligning it with a more conventional format.
metadata-ingestion/src/datahub/ingestion/source/sql/trino.py (1)
390-399: LGTM! Method signature update and internal call update are necessary. The addition of the `inspector` parameter enhances the method's capability to handle column schemas more robustly. The internal call to the superclass method ensures that all necessary arguments are passed correctly.
metadata-ingestion/src/datahub/ingestion/source/sql/athena.py (1)
502-502: Ensure that all function calls to `get_schema_fields_for_column` match the new signature. The code changes are approved. However, ensure that all function calls to `get_schema_fields_for_column` match the new signature.
Verification successful
Ensure that all function calls to `get_schema_fields_for_column` match the new signature. The code changes are approved. The verification confirms that the new function signature, including the `inspector` parameter, is consistently used across the codebase.
- metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
- metadata-ingestion/src/datahub/ingestion/source/sql/hive.py
- metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py
- metadata-ingestion/src/datahub/ingestion/source/sql/trino.py
- metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_column` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'get_schema_fields_for_column'
Length of output: 83
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_column` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --glob '*.py' -A 5 $'get_schema_fields_for_column'
Length of output: 5961
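A text search such as the `rg` runs above shows the call sites but still leaves the comparison against the new signature to the reader. As a rough alternative, the same check can be sketched with Python's `ast` module. Everything below is an assumption made for illustration: the helper name `calls_missing_inspector`, the sample `SOURCE` string, and the heuristic that a call "passes the inspector" when it has either an `inspector=` keyword or a positional variable literally named `inspector`.

```python
import ast

# Hypothetical sample code to scan; not taken from the DataHub repository.
SOURCE = '''
fields = self.get_schema_fields_for_column(dataset_name, column, inspector, pk_constraints)
other = self.get_schema_fields_for_column(dataset_name, column)
named = source.get_schema_fields_for_column(dataset_name=name, column=col, inspector=insp)
'''


def calls_missing_inspector(
    source: str, func_name: str = "get_schema_fields_for_column"
) -> list:
    """Return sorted line numbers of calls to func_name that pass no `inspector`."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both obj.method(...) and plain function(...) calls.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != func_name:
            continue
        positional = [arg.id for arg in node.args if isinstance(arg, ast.Name)]
        keywords = [kw.arg for kw in node.keywords]
        if "inspector" not in positional and "inspector" not in keywords:
            missing.append(node.lineno)
    return sorted(missing)


print(calls_missing_inspector(SOURCE))
```

The name-based heuristic would miss a call that passes the inspector positionally under a different variable name, so a result from this sketch is a list of candidates to look at, not proof of a bug.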
metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json (9)
455-455: LGTM! The change aligns with standard SQL syntax. The `nativeDataType` has been correctly modified from `VARCHAR(length=500)` to `VARCHAR(500)`.
467-467: LGTM! The change aligns with standard SQL syntax. The `nativeDataType` has been correctly modified from `VARCHAR(length=200)` to `VARCHAR(200)`.
479-479: LGTM! The change aligns with standard SQL syntax. The `nativeDataType` has been correctly modified from `BIGINT()` to `BIGINT`.
491-491: LGTM! The change aligns with standard SQL syntax. The `nativeDataType` has been correctly modified from `TEXT()` to `TEXT`.
503-503: LGTM! The change aligns with standard SQL syntax. The `nativeDataType` has been correctly modified from `TEXT()` to `TEXT`.
515-515: LGTM! The change aligns with standard SQL syntax. The `nativeDataType` has been correctly modified from `TIMESTAMP()` to `TIMESTAMP WITHOUT TIME ZONE`.
527-527: LGTM! The change aligns with standard SQL syntax. The `nativeDataType` has been correctly modified from `VARCHAR(length=255)` to `VARCHAR(255)`.
539-539: LGTM! The change aligns with standard SQL syntax. The `nativeDataType` has been correctly modified from `VARCHAR(length=255)` to `VARCHAR(255)`.
551-551: LGTM! The change aligns with standard SQL syntax. The `nativeDataType` has been correctly modified from `JSON(astext_type=Text())` to `JSON`.
metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py (2)
472-477: Ensure that all method calls to `_process_projections` match the new signature. The code changes are approved. However, ensure that all method calls to `_process_projections` match the new signature.
Verification successful
All method calls to `_process_projections` match the new signature. The code changes are approved.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all method calls to `_process_projections` match the new signature.
# Test: Search for the method usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'_process_projections'
Length of output: 75
Script:
#!/bin/bash
# Description: Verify all method calls to `_process_projections` match the new signature.
# Test: Search for the method usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'_process_projections'
Length of output: 1279
681-681: Ensure that all method calls to `_process_models` match the new signature. The code changes are approved. However, ensure that all method calls to `_process_models` match the new signature.
Verification successful
All method calls to `_process_models` match the new signature. The code changes are correctly implemented and verified.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all method calls to `_process_models` match the new signature.
# Test: Search for the method usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'_process_models'
Length of output: 70
Script:
#!/bin/bash
# Description: Verify all method calls to `_process_models` match the new signature.
# Test: Search for the method usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'_process_models'
Length of output: 1164
metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json (2)
261-261: Verify the impact of changing `nativeDataType` from `"NUMBER(asdecimal=False)"` to `"NUMBER"`. Ensure that this change aligns with the expected data type representation across the system and does not introduce any inconsistencies or errors.
Also applies to: 383-383, 508-508, 757-757, 879-879, 1004-1004
Line range hint 1-1008: Verify the impact of removing multiple JSON objects. Ensure that the removal of these entities does not affect the data ingestion process and that they are no longer relevant or have been replaced by a different mechanism.
metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py (4)
524-524: LGTM! But verify the usage of the `inspector` parameter in the function body. The addition of the `inspector` parameter to the function signature is approved. Ensure that the `inspector` parameter is utilized correctly within the function body.
524-524: LGTM! But verify the usage of the `inspector` parameter in the function body. The addition of the `inspector` parameter to the function signature is approved. Ensure that the `inspector` parameter is utilized correctly within the function body.
757-759: LGTM! But verify the usage of the `inspector` parameter in the function body. The addition of the `inspector` parameter to the function signature is approved. Ensure that the `inspector` parameter is utilized correctly within the function body.
882-882: LGTM! But verify the usage of the `inspector` parameter in the function body. The addition of the `inspector` parameter to the function signature is approved. Ensure that the `inspector` parameter is utilized correctly within the function body.
metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json (5)
261-261: Verify the impact of simplifying `nativeDataType`. The change from `"NUMBER(asdecimal=False)"` to `"NUMBER"` improves readability and consistency. Ensure that this change does not affect downstream processing or interpretation of numeric data types.
383-383: Verify the impact of simplifying `nativeDataType`. The change from `"NUMBER(asdecimal=False)"` to `"NUMBER"` improves readability and consistency. Ensure that this change does not affect downstream processing or interpretation of numeric data types.
508-508: Verify the impact of simplifying `nativeDataType`. The change from `"NUMBER(asdecimal=False)"` to `"NUMBER"` improves readability and consistency. Ensure that this change does not affect downstream processing or interpretation of numeric data types.
757-757: Verify the impact of simplifying `nativeDataType`. The change from `"NUMBER(asdecimal=False)"` to `"NUMBER"` improves readability and consistency. Ensure that this change does not affect downstream processing or interpretation of numeric data types.
879-879: Verify the impact of simplifying `nativeDataType`. The change from `"NUMBER(asdecimal=False)"` to `"NUMBER"` improves readability and consistency. Ensure that this change does not affect downstream processing or interpretation of numeric data types.
metadata-ingestion/tests/integration/trino/trino_mces_golden.json (6)
259-259: Approved: Simplified data type representation. The change from `INTEGER()` to `INTEGER` aligns with standard SQL data type definitions and simplifies the representation.
271-271: Approved: Simplified data type representation. The change from `VARCHAR(length=50)` to `VARCHAR(50)` aligns with standard SQL data type definitions and simplifies the representation.
283-283: Approved: Simplified data type representation. The change from `VARCHAR(length=50)` to `VARCHAR(50)` aligns with standard SQL data type definitions and simplifies the representation.
295-295: Approved: Simplified data type representation. The change from `VARCHAR(length=50)` to `VARCHAR(50)` aligns with standard SQL data type definitions and simplifies the representation.
307-307: Approved: Simplified data type representation. The change from `JSON()` to `JSON` aligns with standard SQL data type definitions and simplifies the representation.
531-531: Approved: Simplified data type representation. The change from `DATE()` to `DATE` aligns with standard SQL data type definitions and simplifies the representation.
metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py (6)
123-125: Approved: Necessary import for native data type handling. The import statement for `get_native_data_type_for_sqlalchemy_type` is necessary for the changes made to handle native data types using SQLAlchemy's inspector.
794-794: Approved: Enhanced schema field retrieval. The addition of the `inspector` parameter to the `get_schema_fields` method call allows for improved schema field retrieval and type handling using SQLAlchemy's inspector.
975-975: Approved: Enhanced schema field retrieval. The addition of the `inspector` parameter to the `get_schema_fields` method signature enhances the method's ability to retrieve and handle schema fields using SQLAlchemy's inspector.
988-988: Approved: Enhanced schema field retrieval. The addition of the `inspector` parameter to the `get_schema_fields_for_column` method call allows for improved schema field retrieval and type handling using SQLAlchemy's inspector.
1000-1000: Approved: Enhanced schema field retrieval. The addition of the `inspector` parameter to the `get_schema_fields_for_column` method signature enhances the method's ability to retrieve and handle schema fields using SQLAlchemy's inspector.
1014-1019: Approved: Improved native data type handling. The updated logic for determining the `nativeDataType` of a column by using the `get_native_data_type_for_sqlalchemy_type` function ensures that the native data type is derived correctly based on the SQLAlchemy type system, enhancing type safety and correctness.
metadata-ingestion/tests/integration/hana/hana_mces_golden.json (25)
8-15: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `customProperties` enhances readability and usability.
29-30: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `status` enhances readability and usability.
45-47: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `dataPlatformInstance` enhances readability and usability.
61-64: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `subTypes` enhances readability and usability.
79-80: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `browsePathsV2` enhances readability and usability.
95-102: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `customProperties` enhances readability and usability.
117-118: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `status` enhances readability and usability.
133-135: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `dataPlatformInstance` enhances readability and usability.
149-151: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `subTypes` enhances readability and usability.
167-168: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `container` enhances readability and usability.
183-189: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `browsePathsV2` enhances readability and usability.
204-205: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `container` enhances readability and usability.
Line range hint 238-259: Good transition to structured JSON objects and enhanced profiling capabilities. The change from a string-based representation to a structured JSON object for `SchemaMetadata` enhances readability and usability. The detailed statistical data for each field improves the profiling capabilities of the datasets.
341-343: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `subTypes` enhances readability and usability.
359-361: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `domains` enhances readability and usability.
377-387: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `browsePathsV2` enhances readability and usability.
402-403: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `container` enhances readability and usability.
Line range hint 436-457: Good transition to structured JSON objects and enhanced profiling capabilities. The change from a string-based representation to a structured JSON object for `SchemaMetadata` enhances readability and usability. The detailed statistical data for each field improves the profiling capabilities of the datasets.
539-541: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `subTypes` enhances readability and usability.
557-559: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `domains` enhances readability and usability.
575-585: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `browsePathsV2` enhances readability and usability.
600-601: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `container` enhances readability and usability.
Line range hint 634-655: Good transition to structured JSON objects and enhanced profiling capabilities. The change from a string-based representation to a structured JSON object for `SchemaMetadata` enhances readability and usability. The detailed statistical data for each field improves the profiling capabilities of the datasets.
725-727: Good transition to structured JSON objects. The change from a string-based representation to a structured JSON object for `subTypes` enhances readability and usability.
743-745: Good
metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json (12)
150-150: Simplified data type representation. The change from `INTEGER()` to `INTEGER` removes unnecessary parentheses, simplifying the data type representation.
162-162: Simplified data type representation. The change from `DATE()` to `DATE` removes unnecessary parentheses, simplifying the data type representation.
174-174: More concise VARCHAR length definition. The change from `VARCHAR(length=14)` to `VARCHAR(14)` uses a more concise form for defining the length of the VARCHAR type.
186-186: More concise VARCHAR length definition. The change from `VARCHAR(length=16)` to `VARCHAR(16)` uses a more concise form for defining the length of the VARCHAR type.
198-198: Enhanced consistency in ENUM formatting. The change from `ENUM('M', 'F')` to `ENUM('M','F')` removes spaces, enhancing consistency in formatting.
221-221: Simplified data type representation. The change from `DATE()` to `DATE` removes unnecessary parentheses, simplifying the data type representation.
337-337: Simplified data type representation. The change from `INTEGER()` to `INTEGER` removes unnecessary parentheses, simplifying the data type representation.
349-349: Simplified data type representation. The change from `INTEGER()` to `INTEGER` removes unnecessary parentheses, simplifying the data type representation.
361-361: Simplified data type representation. The change from `DATE()` to `DATE` removes unnecessary parentheses, simplifying the data type representation.
373-373: Simplified data type representation. The change from `DATE()` to `DATE` removes unnecessary parentheses, simplifying the data type representation.
2459-2459: Enhanced consistency in SET formatting. The change from `SET('a', 'b', 'c', 'd')` to `SET('a','b','c','d')` removes spaces, enhancing consistency in formatting.
2575-2575: More concise VARCHAR length definition. The change from `VARCHAR(length=50)` to `VARCHAR(50)` uses a more concise form for defining the length of the VARCHAR type.
metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json (6)
234-234: Verify the updated `transient_lastddltime` value. Ensure that the new timestamp `1722106707` is correct and consistent with the expected format and context.
270-270: Verify the updated `nativeDataType` value. Ensure that the new data type `INTEGER` is correct and consistent with the expected format and context.
474-474: Verify the updated `transient_lastddltime` value. Ensure that the new timestamp `1722106711` is correct and consistent with the expected format and context.
508-508: Verify the updated `nativeDataType` value. Ensure that the new data type `VARCHAR` is correct and consistent with the expected format and context.
756-756: Verify the updated `transient_lastddltime` value. Ensure that the new timestamp `1722106709` is correct and consistent with the expected format and context.
790-790: Verify the updated `nativeDataType` value. Ensure that the new data type `VARCHAR` is correct and consistent with the expected format and context.
metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json (10)
247-247
: Verify the consistency oftransient_lastddltime
values.Ensure that the updated timestamp value for
transient_lastddltime
is consistent with other entries and follows the correct format.
283-283
: LGTM! Ensure consistency across all entries.The change from
INTEGER()
toINTEGER
standardizes the data type representation. Verify that similar changes are applied consistently across other entries.
508-508
: Verify the consistency oftransient_lastddltime
values.Ensure that the updated timestamp value for
transient_lastddltime
is consistent with other entries and follows the correct format.
542-542
: LGTM! Ensure consistency across all entries.The change from
VARCHAR()
toVARCHAR
standardizes the data type representation. Verify that similar changes are applied consistently across other entries.
767-767
: Verify the consistency oftransient_lastddltime
values.Ensure that the updated timestamp value for
transient_lastddltime
is consistent with other entries and follows the correct format.
801-801
: LGTM! Ensure consistency across all entries.The change from
VARCHAR()
toVARCHAR
standardizes the data type representation. Verify that similar changes are applied consistently across other entries.
994-994
: Verify the consistency oftransient_lastddltime
values.Ensure that the updated timestamp value for
transient_lastddltime
is consistent with other entries and follows the correct format.
1028-1028
: LGTM! Ensure consistency across all entries.The change from
VARCHAR()
toVARCHAR
standardizes the data type representation. Verify that similar changes are applied consistently across other entries.
1265-1265
: Verify the consistency oftransient_lastddltime
values.Ensure that the updated timestamp value for
transient_lastddltime
is consistent with other entries and follows the correct format.
1299-1299
: LGTM! Ensure consistency across all entries.The change from
VARCHAR()
toVARCHAR
standardizes the data type representation. Verify that similar changes are applied consistently across other entries.
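The consistency check requested in these comments could be automated. The following is a minimal sketch, assuming a golden file is a JSON document in which `nativeDataType` keys may appear at any nesting depth. The helper name `find_repr_style_types`, the regular expression, and the sample data are illustrative assumptions, not part of the PR.

```python
import json
import re
from typing import Any, Iterator

# Matches Python-repr residue such as "INTEGER()" or "VARCHAR(length=50)".
# Compiled SQL types such as "VARCHAR(50)" or "NVARCHAR(max)" do not match.
REPR_STYLE = re.compile(r"\(\s*\)|\(\s*\w+\s*=")


def find_repr_style_types(node: Any) -> Iterator[str]:
    """Yield every nativeDataType value that still looks like a Python repr."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "nativeDataType" and isinstance(value, str):
                if REPR_STYLE.search(value):
                    yield value
            else:
                yield from find_repr_style_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_repr_style_types(item)


# Hypothetical sample standing in for the contents of a golden file.
sample = json.loads("""
[
  {"fields": [{"fieldPath": "id", "nativeDataType": "INTEGER"},
              {"fieldPath": "name", "nativeDataType": "VARCHAR()"}]},
  {"fields": [{"fieldPath": "note", "nativeDataType": "VARCHAR(length=50)"},
              {"fieldPath": "tag", "nativeDataType": "VARCHAR(50)"}]}
]
""")
leftovers = list(find_repr_style_types(sample))
print(leftovers)
```

For a real file, `sample` would instead come from `json.load()` on the golden file; an empty `leftovers` list would suggest that no repr-style values remain.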
@@ -2,15 +2,11 @@ | |||
version: '3.4' | |||
services: | |||
testhana: | |||
image: "store/saplabs/hanaexpress:2.00.054.00.20210603.1" | |||
image: "saplabs/hanaexpress:latest" |
Consider pinning the Docker image to a specific version.
Using the latest version of the Docker image can introduce variability and potential instability due to untested changes. Pinning to a specific version ensures stability and reproducibility.
- image: "saplabs/hanaexpress:latest"
+ image: "saplabs/hanaexpress:2.00.054.00.20210603.1"
1942fbb to dd3f27b
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (21)
- metadata-ingestion/src/datahub/ingestion/source/sql/athena.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/hive.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py (7 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/trino.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py (2 hunks)
- metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py (4 hunks)
- metadata-ingestion/tests/integration/hana/docker-compose.yml (1 hunks)
- metadata-ingestion/tests/integration/hana/hana_mces_golden.json (10 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json (53 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json (9 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json (10 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json (9 hunks)
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json (7 hunks)
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json (7 hunks)
- metadata-ingestion/tests/integration/oracle/test_oracle.py (1 hunks)
- metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json (11 hunks)
- metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json (11 hunks)
- metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json (26 hunks)
- metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json (26 hunks)
- metadata-ingestion/tests/integration/trino/trino_mces_golden.json (18 hunks)
Files skipped from review due to trivial changes (6)
- metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json
- metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json
- metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json
- metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json
- metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json
- metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json
Files skipped from review as they are similar to previous changes (14)
- metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
- metadata-ingestion/src/datahub/ingestion/source/sql/hive.py
- metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py
- metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py
- metadata-ingestion/src/datahub/ingestion/source/sql/trino.py
- metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py
- metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py
- metadata-ingestion/tests/integration/hana/docker-compose.yml
- metadata-ingestion/tests/integration/hana/hana_mces_golden.json
- metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json
- metadata-ingestion/tests/integration/oracle/test_oracle.py
- metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json
Additional comments not posted (11)
metadata-ingestion/tests/integration/trino/trino_mces_golden.json (11)
259-259: LGTM! Simplified `nativeDataType` representation.
The change from `INTEGER()` to `INTEGER` aligns with standard SQL data type representations and improves clarity.
271-271: LGTM! Simplified `nativeDataType` representation.
The change from `VARCHAR(length=50)` to `VARCHAR(50)` aligns with standard SQL data type representations and improves clarity.
283-283: LGTM! Simplified `nativeDataType` representation.
The change from `VARCHAR(length=50)` to `VARCHAR(50)` aligns with standard SQL data type representations and improves clarity.
295-295: LGTM! Simplified `nativeDataType` representation.
The change from `VARCHAR(length=50)` to `VARCHAR(50)` aligns with standard SQL data type representations and improves clarity.
307-307: LGTM! Simplified `nativeDataType` representation.
The change from `JSON()` to `JSON` aligns with standard SQL data type representations and improves clarity.
507-507: LGTM! Simplified `nativeDataType` representation.
The change from `INTEGER()` to `INTEGER` aligns with standard SQL data type representations and improves clarity.
519-519: LGTM! Simplified `nativeDataType` representation.
The change from `INTEGER()` to `INTEGER` aligns with standard SQL data type representations and improves clarity.
531-531: LGTM! Simplified `nativeDataType` representation.
The change from `DATE()` to `DATE` aligns with standard SQL data type representations and improves clarity.
543-543: LGTM! Simplified `nativeDataType` representation.
The change from `DATE()` to `DATE` aligns with standard SQL data type representations and improves clarity.
726-726: LGTM! Simplified `nativeDataType` representation.
The change from `INTEGER()` to `INTEGER` aligns with standard SQL data type representations and improves clarity.
738-738: LGTM! Simplified `nativeDataType` representation.
The change from `VARCHAR(length=50)` to `VARCHAR(50)` aligns with standard SQL data type representations and improves clarity.
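The difference between the old and new values in these comments matches the difference between the Python `repr()` of a SQLAlchemy type object and its dialect-compiled SQL string. The sketch below illustrates that with SQLAlchemy's generic `DefaultDialect`; it is an assumption-laden illustration of the idea named in the PR title, not the PR's actual code, and the Trino source would use its own dialect rather than the default one.

```python
from sqlalchemy import types
from sqlalchemy.engine.default import DefaultDialect

# Generic dialect used only for illustration; a real source would pass the
# dialect of its own engine or inspector.
dialect = DefaultDialect()

column_types = [types.INTEGER(), types.VARCHAR(length=50), types.DATE()]

# repr() describes the Python object, including constructor keywords.
repr_style = [repr(t) for t in column_types]

# compile() renders the type as SQL for the given dialect.
compiled = [t.compile(dialect=dialect) for t in column_types]

for before, after in zip(repr_style, compiled):
    print(f"{before} -> {after}")
```

Compiling against a dialect also lets dialect-specific rendering rules apply, which a plain `repr()` cannot do.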
dd3f27b to 1a07c6d
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (22)
- metadata-ingestion/src/datahub/ingestion/source/sql/athena.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/hive.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py (7 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/trino.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py (2 hunks)
- metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py (4 hunks)
- metadata-ingestion/tests/integration/hana/docker-compose.yml (1 hunks)
- metadata-ingestion/tests/integration/hana/hana_mces_golden.json (10 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json (53 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json (9 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json (10 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json (9 hunks)
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json (7 hunks)
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json (7 hunks)
- metadata-ingestion/tests/integration/oracle/test_oracle.py (1 hunks)
- metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json (11 hunks)
- metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json (11 hunks)
- metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json (26 hunks)
- metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json (26 hunks)
- metadata-ingestion/tests/integration/trino/trino_mces_golden.json (18 hunks)
- metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py (7 hunks)
Files skipped from review due to trivial changes (3)
- metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json
- metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json
- metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json
Files skipped from review as they are similar to previous changes (17)
- metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
- metadata-ingestion/src/datahub/ingestion/source/sql/hive.py
- metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py
- metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py
- metadata-ingestion/src/datahub/ingestion/source/sql/trino.py
- metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py
- metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py
- metadata-ingestion/tests/integration/hana/docker-compose.yml
- metadata-ingestion/tests/integration/hana/hana_mces_golden.json
- metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json
- metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json
- metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json
- metadata-ingestion/tests/integration/oracle/test_oracle.py
- metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json
- metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json
- metadata-ingestion/tests/integration/trino/trino_mces_golden.json
Additional comments not posted (10)
metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py (5)
48-50: LGTM! But verify the function usage in the codebase.
The addition of the `inspector` parameter and its instantiation as a `MagicMock` with a `DefaultDialect` is correct.
However, ensure that all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
Verification successful
All function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
The addition of the `inspector` parameter and its instantiation as a `MagicMock` with a `DefaultDialect` is correctly reflected in the function calls throughout the codebase.
- metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py
- metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
- metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'get_schema_fields_for_sqlalchemy_column'
Length of output: 94
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'get_schema_fields_for_sqlalchemy_column'
Length of output: 6596
86-87: LGTM! But verify the function usage in the codebase.
The addition of the `inspector` parameter and its instantiation as a `MagicMock` with a `DefaultDialect` is correct.
However, ensure that all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
Verification successful
Verified: All calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
The addition of the `inspector` parameter and its instantiation as a `MagicMock` with a `DefaultDialect` is correct and consistent throughout the codebase.
- metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py
- metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py
- metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'get_schema_fields_for_sqlalchemy_column'
Length of output: 94
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'get_schema_fields_for_sqlalchemy_column'
Length of output: 6596
66-68: LGTM! But verify the function usage in the codebase.
The addition of the `inspector` parameter and its instantiation as a `MagicMock` with a `DefaultDialect` is correct.
However, ensure that all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
Verification successful
Function usage verified!
All calls to `get_schema_fields_for_sqlalchemy_column` match the new signature with the `inspector` parameter.
- metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py
- metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'get_schema_fields_for_sqlalchemy_column'
Length of output: 94
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 'get_schema_fields_for_sqlalchemy_column'
Length of output: 6596
111-113: LGTM! But verify the function usage in the codebase.
The addition of the `inspector` parameter and its instantiation as a `MagicMock` with a `DefaultDialect` is correct.
However, ensure that all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
Verification successful
Verified: All function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
The addition of the `inspector` parameter is correctly reflected across the codebase.
- metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py
- metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'get_schema_fields_for_sqlalchemy_column'
Length of output: 94
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg -t py -A 5 'get_schema_fields_for_sqlalchemy_column'
Length of output: 6592
22-24: LGTM! But verify the function usage in the codebase.
The addition of the `inspector` parameter and its instantiation as a `MagicMock` with a `DefaultDialect` is correct.
However, ensure that all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
Verification successful
Function usage verified!
The function `get_schema_fields_for_sqlalchemy_column` is correctly called with the `inspector` parameter in the codebase.
- Confirmed occurrences in test cases and other parts of the codebase.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'get_schema_fields_for_sqlalchemy_column'
Length of output: 94
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
# Find all Python files and search for the function usage. Expect: Only occurrences of the new signature.
fd -e py -x rg 'get_schema_fields_for_sqlalchemy_column' {}
Length of output: 620
Script:
#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.
# Find all Python files and search for the function usage with more context lines. Expect: Only occurrences of the new signature.
fd -e py -x rg -A 10 -B 2 'get_schema_fields_for_sqlalchemy_column' {}
Length of output: 4341
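The test setup these comments describe, a `MagicMock` inspector carrying a `DefaultDialect`, can be sketched as follows. The helper `native_type_name` and its `repr()` fallback are hypothetical stand-ins used only to show why the mock needs a real dialect attribute; they are not the signature or body of `get_schema_fields_for_sqlalchemy_column`.

```python
from unittest.mock import MagicMock

from sqlalchemy import types
from sqlalchemy.engine.default import DefaultDialect


def native_type_name(inspector, column_type) -> str:
    """Hypothetical helper: render a type for the inspector's dialect.

    Falls back to repr() if the type cannot be compiled (an assumption made
    for this sketch, not confirmed behavior of the PR).
    """
    try:
        return column_type.compile(dialect=inspector.dialect)
    except Exception:
        return repr(column_type)


# The mock stands in for a SQLAlchemy Inspector; only its `dialect`
# attribute must be a real object, because type compilation uses it.
inspector = MagicMock()
inspector.dialect = DefaultDialect()

rendered = native_type_name(inspector, types.VARCHAR(length=50))
print(rendered)
```

Leaving `inspector.dialect` as an auto-generated mock attribute would give the compiler nothing real to work with, which is presumably why the tests assign a concrete dialect.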
metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json (5)
261-261: Simplification of nativeDataType is correct.
The `nativeDataType` field is correctly simplified from `"NUMBER(asdecimal=False)"` to `"NUMBER"`.
383-383: Simplification of nativeDataType is correct.
The `nativeDataType` field is correctly simplified from `"NUMBER(asdecimal=False)"` to `"NUMBER"`.
508-508: Simplification of nativeDataType is correct.
The `nativeDataType` field is correctly simplified from `"NUMBER(asdecimal=False)"` to `"NUMBER"`.
757-757: Simplification of nativeDataType is correct.
The `nativeDataType` field is correctly simplified from `"NUMBER(asdecimal=False)"` to `"NUMBER"`.
879-879: Simplification of nativeDataType is correct.
The `nativeDataType` field is correctly simplified from `"NUMBER(asdecimal=False)"` to `"NUMBER"`.
…s by compiling data type using dialect
1a07c6d to 0e254e3
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (26)
- metadata-ingestion/src/datahub/ingestion/source/sql/athena.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/hive.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py (7 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/trino.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py (2 hunks)
- metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py (4 hunks)
- metadata-ingestion/tests/integration/hana/docker-compose.yml (1 hunks)
- metadata-ingestion/tests/integration/hana/hana_mces_golden.json (10 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json (49 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json (9 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json (10 hunks)
- metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json (9 hunks)
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json (7 hunks)
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json (7 hunks)
- metadata-ingestion/tests/integration/oracle/test_oracle.py (1 hunks)
- metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json (11 hunks)
- metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json (11 hunks)
- metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_no_db_to_file.json (24 hunks)
- metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_no_db_with_filter.json (14 hunks)
- metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_to_file.json (14 hunks)
- metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_with_lower_case_urn.json (14 hunks)
- metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json (26 hunks)
- metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json (26 hunks)
- metadata-ingestion/tests/integration/trino/trino_mces_golden.json (18 hunks)
- metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py (7 hunks)
Files skipped from review due to trivial changes (3)
- metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json
- metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json
- metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json
Files skipped from review as they are similar to previous changes (18)
- metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
- metadata-ingestion/src/datahub/ingestion/source/sql/hive.py
- metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py
- metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py
- metadata-ingestion/src/datahub/ingestion/source/sql/trino.py
- metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py
- metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py
- metadata-ingestion/tests/integration/hana/docker-compose.yml
- metadata-ingestion/tests/integration/hana/hana_mces_golden.json
- metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json
- metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json
- metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json
- metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json
- metadata-ingestion/tests/integration/oracle/test_oracle.py
- metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json
- metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json
- metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py
Additional comments not posted (44)
metadata-ingestion/tests/integration/trino/trino_mces_golden.json (11)
259-259: Simplified data type representation.
The `nativeDataType` for the `id` field has been correctly updated to "INTEGER".
271-271: Simplified data type representation.
The `nativeDataType` for the `name` field has been correctly updated to "VARCHAR(50)".
283-283: Simplified data type representation.
The `nativeDataType` for the `author` field has been correctly updated to "VARCHAR(50)".
295-295: Simplified data type representation.
The `nativeDataType` for the `publisher` field has been correctly updated to "VARCHAR(50)".
307-307: Simplified data type representation.
The `nativeDataType` for the `tags` field has been correctly updated to "JSON".
507-507: Simplified data type representation.
The `nativeDataType` for the `book_id` field has been correctly updated to "INTEGER".
519-519: Simplified data type representation.
The `nativeDataType` for the `member_id` field has been correctly updated to "INTEGER".
531-531: Simplified data type representation.
The `nativeDataType` for the `issue_date` field has been correctly updated to "DATE".
543-543: Simplified data type representation.
The `nativeDataType` for the `return_date` field has been correctly updated to "DATE".
726-726: Simplified data type representation.
The `nativeDataType` for the `id` field has been correctly updated to "INTEGER".
738-738: Simplified data type representation.
The `nativeDataType` for the `name` field has been correctly updated to "VARCHAR(50)".
metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_no_db_with_filter.json (9)
115-119: Verify the correctness of job metadata fields.
Ensure that the new values for `job_id`, `job_name`, `description`, `date_created`, and `date_modified` are accurate and consistent with the rest of the metadata.
1307-1307: Approved: Data type representation change.
The `nativeDataType` for the `ID` field has been updated to `INTEGER`, which aligns with SQL Server best practices.
1319-1319: Approved: Data type representation change.
The `nativeDataType` for the `ProductName` field has been updated to `NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS`, which aligns with SQL Server best practices.
1549-1549: Approved: Data type representation change.
The `nativeDataType` for the `ID` field has been updated to `INTEGER`, which aligns with SQL Server best practices.
1561-1561: Approved: Data type representation change.
The `nativeDataType` for the `ItemName` field has been updated to `NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS`, which aligns with SQL Server best practices.
1681-1681: Approved: Data type representation change.
The `nativeDataType` for the `ID` field has been updated to `INTEGER`, which aligns with SQL Server best practices.
1694-1694: Approved: Data type representation change.
The `nativeDataType` for the `LastName` field has been updated to `VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS`, which aligns with SQL Server best practices.
1706-1706: Approved: Data type representation change.
The `nativeDataType` for the `FirstName` field has been updated to `VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS`, which aligns with SQL Server best practices.
1718-1718: Approved: Data type representation change.
The `nativeDataType` for the `Age` field has been updated to `INTEGER`, which aligns with SQL Server best practices.
metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_to_file.json (8)
115-115: Update confirmed: `job_id` field.
The `job_id` field has been updated to `"c6fb6778-14f1-4516-bb41-e5eaa97a687b"`, reflecting a new job execution context.
118-119: Update confirmed: `date_created` and `date_modified` fields.
The `date_created` field has been updated to `"2024-07-27 23:58:29.780000"` and the `date_modified` field has been updated to `"2024-07-27 23:58:29.943000"`, reflecting new timestamps for job execution.
1307-1307: Update confirmed: `nativeDataType` for `ID` field.
The `nativeDataType` for the `ID` field has been updated to `"INTEGER"`, aligning with SQL Server's expected data type specifications.
1319-1319: Update confirmed: `nativeDataType` for `ProductName` field.
The `nativeDataType` for the `ProductName` field has been updated to `"NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS"`, aligning with SQL Server's expected data type specifications.
1549-1549: Update confirmed: `nativeDataType` for `ID` field in `Items` table.
The `nativeDataType` for the `ID` field in the `Items` table has been updated to `"INTEGER"`, aligning with SQL Server's expected data type specifications.
1561-1561: Update confirmed: `nativeDataType` for `ItemName` field.
The `nativeDataType` for the `ItemName` field has been updated to `"NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS"`, aligning with SQL Server's expected data type specifications.
1681-1681: Update confirmed: `nativeDataType` for `ID` field in `Persons` table.
The `nativeDataType` for the `ID` field in the `Persons` table has been updated to `"INTEGER"`, aligning with SQL Server's expected data type specifications.
1694-1694: Update confirmed: `nativeDataType` for `LastName` and `FirstName` fields.
The `nativeDataType` for the `LastName` and `FirstName` fields has been updated to `"VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS"`, aligning with SQL Server's expected data type specifications.
Also applies to: 1706-1706
metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_with_lower_case_urn.json (8)
115-115: Change approved: Updated `job_id`.
The `job_id` has been updated to a new UUID. This change is straightforward and does not introduce any issues.
118-118: Change approved: Updated `date_created`.
The `date_created` timestamp has been updated to a more recent date. This change is straightforward and does not introduce any issues.
119-119: Change approved: Updated `date_modified`.
The `date_modified` timestamp has been updated to a more recent date. This change is straightforward and does not introduce any issues.
1307-1307: Change approved: Updated `nativeDataType` for `ID` field.
The `nativeDataType` for the `ID` field has been updated from `INTEGER()` to `INTEGER`. This change improves clarity and compliance with SQL Server data type conventions.
1319-1319: Change approved: Updated `nativeDataType` for `ProductName` field.
The `nativeDataType` for the `ProductName` field has been updated from `NVARCHAR()` to `NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS`. This change improves clarity and compliance with SQL Server data type conventions.
1561-1561: Change approved: Updated `nativeDataType` for `ItemName` field.
The `nativeDataType` for the `ItemName` field has been updated from `NVARCHAR()` to `NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS`. This change improves clarity and compliance with SQL Server data type conventions.
1694-1694: Change approved: Updated `nativeDataType` for `LastName` and `FirstName` fields.
The `nativeDataType` for the `LastName` and `FirstName` fields has been updated from `VARCHAR(length=255, collation='SQL_Latin1_General_CP1_CI_AS')` to `VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS`. This change improves clarity and compliance with SQL Server data type conventions.
Also applies to: 1706-1706
1838-1838: Change approved: Updated `nativeDataType` for `SomeId` and `TempID` fields.
The `nativeDataType` for the `SomeId` field has been updated from `UNIQUEIDENTIFIER()` to `UNIQUEIDENTIFIER`, and for the `TempID` field from `INTEGER()` to `INTEGER`. This change improves clarity and compliance with SQL Server data type conventions.
Also applies to: 1850-1850
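The SQL Server values reported in these comments are consistent with compiling the type against the MSSQL dialect instead of a generic one. The following is a minimal sketch, assuming SQLAlchemy's bundled `mssql` dialect and a type built by hand rather than reflected from a live database; variable names are illustrative.

```python
from sqlalchemy.dialects import mssql

# Instantiating the dialect class does not open a database connection.
dialect = mssql.dialect()

# Hand-built stand-in for a reflected column type with a collation.
name_type = mssql.VARCHAR(length=255, collation="SQL_Latin1_General_CP1_CI_AS")

repr_name = repr(name_type)                          # Python-object view
compiled_name = name_type.compile(dialect=dialect)   # SQL Server rendering
compiled_id = mssql.UNIQUEIDENTIFIER().compile(dialect=dialect)

print(repr_name)
print(compiled_name)
print(compiled_id)
```

The collation moves from a constructor keyword in the `repr()` form to a `COLLATE` clause in the compiled form, matching the change the review describes for the `LastName` and `FirstName` fields.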
metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_no_db_to_file.json (8)
115-115: LGTM!
The `job_id` has been updated to a new UUID, indicating a new job instance or process.
118-118: LGTM!
The `date_created` has been updated to a more recent date, indicating a refresh of the metadata.
119-119: LGTM!
The `date_modified` has been updated to a more recent date, indicating a refresh of the metadata.
1307-1307: LGTM!
The `nativeDataType` for the `ID` field has been simplified from `INTEGER()` to `INTEGER`, aligning with standard SQL syntax.
1319-1319: LGTM!
The `nativeDataType` for the `ProductName` field has been updated to `NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS`, aligning with standard SQL syntax and specifying collation.
1549-1549: LGTM!
The `nativeDataType` for the `ID` field has been simplified from `INTEGER()` to `INTEGER`, aligning with standard SQL syntax.
1561-1561: LGTM!
The `nativeDataType` for the `ItemName` field has been updated to `NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS`, aligning with standard SQL syntax and specifying collation.
1681-1681: LGTM!
The `nativeDataType` for the `ID` field has been simplified from `INTEGER()` to `INTEGER`, aligning with standard SQL syntax.
metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py
Outdated
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py (4 hunks)
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py
@hsheth2 As discussed yesterday - here is the (draft) PR for you to check the issue with the native data types in the SQLAlchemy sources. 😊
Checklist
Summary by CodeRabbit
New Features
- … `Inspector` parameter.
Bug Fixes
- … `nativeDataType` fields across various integrations by removing unnecessary parentheses and attributes.
Documentation
Chores