[MAINTENANCE] Instrument test_yaml_config() #2981

Merged
45 commits
079179a
Add usage stats messages to test_yaml_config
anthonyburdi Jun 18, 2021
c0b8fc2
Add "data_context.test_yaml_config" event to schema
anthonyburdi Jun 18, 2021
341e1bb
Test example "data_context.test_yaml_config" events
anthonyburdi Jun 18, 2021
2659616
Add ERRONEOUS_CONFIG and CUSTOM_CONFIG messages for test_yaml_config
anthonyburdi Jun 18, 2021
6fcf9cf
Add ERRONEOUS_CONFIG and CUSTOM_CONFIG messages for test_yaml_config
anthonyburdi Jun 18, 2021
f697350
Add to existing tests to ensure usage stats messages are sent for tes…
anthonyburdi Jun 21, 2021
a529e14
Add to existing tests to ensure usage stats messages are sent for tes…
anthonyburdi Jun 21, 2021
a2f55f0
WIP add "diagnostic_info" and standardize supported types in test_yam…
anthonyburdi Jun 22, 2021
eea80f4
Merge branch 'develop' into MAINTENANCE/GEA-1/GEA-7/update_usage_stat…
anthonyburdi Jun 23, 2021
1698d54
Add diagnostic message types
anthonyburdi Jun 24, 2021
e6caaa0
Merge branch 'develop' into MAINTENANCE/GEA-1/GEA-7/update_usage_stat…
anthonyburdi Jun 24, 2021
c563cf3
Use anonymizers in test_yaml_config()
anthonyburdi Jun 25, 2021
1fce61d
Update schemas for new `data_context.test_yaml_config` event
anthonyburdi Jun 25, 2021
48f03da
WIP test updates
anthonyburdi Jun 25, 2021
92ccbfa
Use Anonymizers to determine if a custom class is a subclass of a cor…
anthonyburdi Jun 30, 2021
7ae669d
Handle SimpleSqlalchemyDatasource in datasource_anonymizer
anthonyburdi Jun 30, 2021
53dc535
Handle anonymized_name
anthonyburdi Jun 30, 2021
1d3089a
Handle subclasses of core GE types
anthonyburdi Jun 30, 2021
81e8e47
Handle SimpleSqlalchemyDatasource
anthonyburdi Jun 30, 2021
0a3b913
Custom expectation_suite fixture plugin for tests
anthonyburdi Jun 30, 2021
de2bf15
diagnostic_info is a list, add parent_class on error if class_name is…
anthonyburdi Jun 30, 2021
15dea73
Align event fixtures with anonymizer output
anthonyburdi Jun 30, 2021
45ebf85
WIP add test_yaml_config usage_stats tests not captured elsewhere
anthonyburdi Jun 30, 2021
ad4fdc5
Rename to expectations_store, cleanup
anthonyburdi Jun 30, 2021
cad87bd
test when no class name is provided for custom class
anthonyburdi Jun 30, 2021
f29d85d
test when custom type is not a core ge class or subclass thereof
anthonyburdi Jun 30, 2021
a101209
test for usage_stats when SimpleSqlalchemyDatasource is subclassed
anthonyburdi Jun 30, 2021
e5c1df0
Merge branch 'develop' into MAINTENANCE/GEA-1/GEA-7/update_usage_stat…
anthonyburdi Jun 30, 2021
1197b87
Ignore custom v2 Datasources
anthonyburdi Jun 30, 2021
3f8b8d9
Custom v2 Datasources supported
anthonyburdi Jun 30, 2021
26696e5
Fix test_checkpoint
anthonyburdi Jun 30, 2021
b30310c
Update usage stats messages & schema tests for Anonymizer message schema
anthonyburdi Jul 1, 2021
55d89ed
Fix test_data_context.py
anthonyburdi Jul 1, 2021
6676b2f
Merge branch 'develop' into MAINTENANCE/GEA-1/GEA-7/update_usage_stat…
anthonyburdi Jul 1, 2021
2bd6519
Linting fixtures
anthonyburdi Jul 1, 2021
699217b
test DatasourceAnonymizer
anthonyburdi Jul 1, 2021
d847701
Remove self_check from MyCustomV2ApiDatasource and related changes
anthonyburdi Jul 1, 2021
82261a6
Merge branch 'develop' into MAINTENANCE/GEA-1/GEA-7/update_usage_stat…
anthonyburdi Jul 1, 2021
9929203
changelog
anthonyburdi Jul 1, 2021
41b7f09
add test_anonymizer__is_parent_class_recognized
anthonyburdi Jul 1, 2021
f8e4e27
fix test_test_yaml_config_usage_stats_simple_sqlalchemy_datasource_su…
anthonyburdi Jul 1, 2021
fd9cadf
Merge branch 'develop' into MAINTENANCE/GEA-1/GEA-7/update_usage_stat…
anthonyburdi Jul 6, 2021
522d2af
Add comments per Will's suggestions
anthonyburdi Jul 6, 2021
f5741ea
Linting fixtures
anthonyburdi Jul 6, 2021
b2adada
Merge branch 'develop' into MAINTENANCE/GEA-1/GEA-7/update_usage_stat…
anthonyburdi Jul 6, 2021
2 changes: 1 addition & 1 deletion docs_rtd/changelog.rst
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@ Develop
- Addition of the "bootstrap" mode of parameter estimation (default) to NumericMetricRangeMultiBatchParameterBuilder
- Initial documentation
* [BUGFIX] Modify read_excel() to handle new optional-dependency openpyxl for pandas >= 1.3.0 #2989

* [MAINTENANCE] Instrumented BaseDataContext.test_yaml_config() and updated Anonymizers

0.13.21
-----------------
34 changes: 34 additions & 0 deletions great_expectations/core/usage_statistics/anonymizers/anonymizer.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import logging
from hashlib import md5
from typing import Optional

from great_expectations.util import load_class

@@ -67,3 +68,36 @@ def anonymize_object_info(
anonymized_info_dict["anonymized_class"] = self.anonymize(object_class_name)

return anonymized_info_dict

    def _is_parent_class_recognized(
        self,
        classes_to_check,
        object_=None,
        object_class=None,
        object_config=None,
    ) -> Optional[str]:
        """
        Check if the parent class is a subclass of any core GE class.
        This private method is intended to be used by anonymizers in a public `is_parent_class_recognized()` method. These anonymizers define and provide the core GE classes_to_check.
        Returns:
            The name of the parent class found, or None if no parent class was found
        """
        assert (
            object_ or object_class or object_config
        ), "Must pass either object_ or object_class or object_config."
        try:
            if object_class is None and object_ is not None:
                object_class = object_.__class__
            elif object_class is None and object_config is not None:
                object_class_name = object_config.get("class_name")
                object_module_name = object_config.get("module_name")
                object_class = load_class(object_class_name, object_module_name)

            for class_to_check in classes_to_check:
                if issubclass(object_class, class_to_check):
                    return class_to_check.__name__

            return None

        except AttributeError:
            return None
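Reviewer note: the lookup above returns the name of the first entry in `classes_to_check` that the object's class subclasses. The following is a minimal standalone sketch of that first-match behaviour only, not the merged implementation. `Base`, `Simple`, `UserDefined` and `first_recognized_parent` are hypothetical stand-ins introduced for illustration; config loading via `load_class` is left out.

```python
from typing import Optional, Sequence


class Base:
    """Hypothetical stand-in for a core class such as Checkpoint."""


class Simple(Base):
    """Hypothetical stand-in for a core subclass such as SimpleCheckpoint."""


class UserDefined(Simple):
    """Hypothetical user plugin class that extends a core subclass."""


def first_recognized_parent(
    object_class: type, classes_to_check: Sequence[type]
) -> Optional[str]:
    """Return the name of the first listed class that object_class subclasses."""
    for class_to_check in classes_to_check:
        if issubclass(object_class, class_to_check):
            return class_to_check.__name__
    # Nothing in the list is an ancestor of object_class
    return None


bottom_up = [Simple, Base]  # most specific class first
print(first_recognized_parent(UserDefined, bottom_up))  # Simple
print(first_recognized_parent(Base, bottom_up))  # Base
print(first_recognized_parent(dict, bottom_up))  # None
```

Because the first match wins, listing the most specific classes first is what lets a plugin derived from `Simple` report `Simple` rather than `Base`.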
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
from great_expectations.checkpoint import Checkpoint, SimpleCheckpoint
from great_expectations.core.usage_statistics.anonymizers.anonymizer import Anonymizer


class CheckpointAnonymizer(Anonymizer):
    def __init__(self, salt=None):
        super().__init__(salt=salt)

        # ordered bottom up in terms of inheritance order
        self._ge_classes = [SimpleCheckpoint, Checkpoint]

    def anonymize_checkpoint_info(self, name, config):
        anonymized_info_dict = dict()
        anonymized_info_dict["anonymized_name"] = self.anonymize(name)

        self.anonymize_object_info(
            anonymized_info_dict=anonymized_info_dict,
            ge_classes=self._ge_classes,
            object_config=config,
        )

        return anonymized_info_dict

    def is_parent_class_recognized(self, config):
        return self._is_parent_class_recognized(
            classes_to_check=self._ge_classes,
            object_config=config,
        )
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
from great_expectations.core.usage_statistics.anonymizers.anonymizer import Anonymizer
from great_expectations.datasource.data_connector import (
    ConfiguredAssetFilePathDataConnector,
    ConfiguredAssetFilesystemDataConnector,
    ConfiguredAssetS3DataConnector,
    ConfiguredAssetSqlDataConnector,
    DataConnector,
    FilePathDataConnector,
    InferredAssetFilePathDataConnector,
    InferredAssetFilesystemDataConnector,
    InferredAssetS3DataConnector,
    InferredAssetSqlDataConnector,
    RuntimeDataConnector,
)


class DataConnectorAnonymizer(Anonymizer):
    def __init__(self, salt=None):
        super().__init__(salt=salt)

        # This list should contain all DataConnector types. When new DataConnector types
        # are created, please make sure to add ordered bottom up in terms of inheritance order
        self._ge_classes = [
            InferredAssetS3DataConnector,
            InferredAssetFilesystemDataConnector,
            InferredAssetFilePathDataConnector,
            InferredAssetSqlDataConnector,
            ConfiguredAssetS3DataConnector,
            ConfiguredAssetFilesystemDataConnector,
            ConfiguredAssetFilePathDataConnector,
            ConfiguredAssetSqlDataConnector,
            RuntimeDataConnector,
            FilePathDataConnector,
            DataConnector,
        ]

    def anonymize_data_connector_info(self, name, config):
        anonymized_info_dict = dict()
        anonymized_info_dict["anonymized_name"] = self.anonymize(name)

        self.anonymize_object_info(
            anonymized_info_dict=anonymized_info_dict,
            ge_classes=self._ge_classes,
            object_config=config,
        )

        return anonymized_info_dict

    def is_parent_class_recognized(self, config):
        return self._is_parent_class_recognized(
            classes_to_check=self._ge_classes,
            object_config=config,
        )
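Reviewer note: the comment in `DataConnectorAnonymizer.__init__` asks that new types be added "bottom up in terms of inheritance order". A check along the following lines could guard that convention in a unit test. It is a sketch under assumptions, not part of this PR: `find_ordering_violations` and the three connector classes are hypothetical names used only for illustration.

```python
from typing import List, Sequence, Tuple


class Connector:
    """Hypothetical base class."""


class FilePathConnector(Connector):
    """Hypothetical intermediate class."""


class S3Connector(FilePathConnector):
    """Hypothetical leaf class."""


def find_ordering_violations(classes: Sequence[type]) -> List[Tuple[str, str]]:
    """Return (earlier, later) name pairs where a base class precedes its subclass.

    In a bottom-up list every subclass appears before its ancestors, so any
    later entry that subclasses an earlier entry could never be matched first.
    """
    violations = []
    for index, earlier in enumerate(classes):
        for later in classes[index + 1:]:
            if later is not earlier and issubclass(later, earlier):
                violations.append((earlier.__name__, later.__name__))
    return violations


print(find_ordering_violations([S3Connector, FilePathConnector, Connector]))  # []
print(find_ordering_violations([Connector, S3Connector]))  # [('Connector', 'S3Connector')]
```

An empty result means a first-match lookup over the list reports the most specific recognized class.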
Original file line number Diff line number Diff line change
@@ -1,7 +1,18 @@
from typing import Optional

from great_expectations.core.usage_statistics.anonymizers.anonymizer import Anonymizer
from great_expectations.core.usage_statistics.anonymizers.data_connector_anonymizer import (
    DataConnectorAnonymizer,
)
from great_expectations.core.usage_statistics.anonymizers.execution_engine_anonymizer import (
    ExecutionEngineAnonymizer,
)
from great_expectations.datasource import (
    BaseDatasource,
    Datasource,
    LegacyDatasource,
    PandasDatasource,
    SimpleSqlalchemyDatasource,
    SparkDFDatasource,
    SqlAlchemyDatasource,
)
@@ -12,21 +23,142 @@ def __init__(self, salt=None):
         super().__init__(salt=salt)

         # ordered bottom up in terms of inheritance order
-        self._ge_classes = [
+        self._legacy_ge_classes = [
             PandasDatasource,
             SqlAlchemyDatasource,
             SparkDFDatasource,
             LegacyDatasource,
         ]

        # ordered bottom up in terms of inheritance order
        self._ge_classes = [
            SimpleSqlalchemyDatasource,
            Datasource,
            BaseDatasource,
        ]

        self._execution_engine_anonymizer = ExecutionEngineAnonymizer(salt=salt)
        self._data_connector_anonymizer = DataConnectorAnonymizer(salt=salt)

    def anonymize_datasource_info(self, name, config):
        anonymized_info_dict = dict()
        anonymized_info_dict["anonymized_name"] = self.anonymize(name)

        # Legacy Datasources (<= v0.12 v2 BatchKwargs API)
        if self.is_parent_class_recognized_v2_api(config=config) is not None:
            self.anonymize_object_info(
                anonymized_info_dict=anonymized_info_dict,
                ge_classes=self._legacy_ge_classes,
                object_config=config,
            )
        # Datasources (>= v0.13 v3 BatchRequest API), and custom v2 BatchKwargs API
        elif self.is_parent_class_recognized_v3_api(config=config) is not None:
            self.anonymize_object_info(
                anonymized_info_dict=anonymized_info_dict,
                ge_classes=self._ge_classes,
                object_config=config,
            )
            execution_engine_config = config.get("execution_engine")
            anonymized_info_dict[
                "anonymized_execution_engine"
            ] = self._execution_engine_anonymizer.anonymize_execution_engine_info(
                name=execution_engine_config.get("name", ""),
                config=execution_engine_config,
            )
            data_connector_configs = config.get("data_connectors")
            anonymized_info_dict["anonymized_data_connectors"] = [
                self._data_connector_anonymizer.anonymize_data_connector_info(
                    name=data_connector_name, config=data_connector_config
                )
                for data_connector_name, data_connector_config in data_connector_configs.items()
            ]

        return anonymized_info_dict

    def anonymize_simple_sqlalchemy_datasource(self, name, config):
        """
        SimpleSqlalchemyDatasource requires a separate anonymization scheme.
        """
        anonymized_info_dict = dict()
        anonymized_info_dict["anonymized_name"] = self.anonymize(name)
        if config.get("module_name") is None:
            config["module_name"] = "great_expectations.datasource"
        self.anonymize_object_info(
            anonymized_info_dict=anonymized_info_dict,
            ge_classes=self._ge_classes,
            object_config=config,
        )

        # Only and directly provide parent_class of execution engine
        anonymized_info_dict["anonymized_execution_engine"] = {
            "parent_class": "SqlAlchemyExecutionEngine"
        }

        # Use the `introspection` and `tables` keys to find data_connectors in SimpleSqlalchemyDatasources
        introspection_data_connector_configs = config.get("introspection")
        tables_data_connector_configs = config.get("tables")

        introspection_data_connector_anonymized_configs = []
        if introspection_data_connector_configs is not None:
            for (
                data_connector_name,
                data_connector_config,
            ) in introspection_data_connector_configs.items():
                if data_connector_config.get("class_name") is None:
                    data_connector_config[
                        "class_name"
                    ] = "InferredAssetSqlDataConnector"
                if data_connector_config.get("module_name") is None:
                    data_connector_config[
                        "module_name"
                    ] = "great_expectations.datasource.data_connector"
                introspection_data_connector_anonymized_configs.append(
                    self._data_connector_anonymizer.anonymize_data_connector_info(
                        name=data_connector_name, config=data_connector_config
                    )
                )

        tables_data_connector_anonymized_configs = []
        if tables_data_connector_configs is not None:
            for (
                data_connector_name,
                data_connector_config,
            ) in tables_data_connector_configs.items():
                if data_connector_config.get("class_name") is None:
                    data_connector_config[
                        "class_name"
                    ] = "ConfiguredAssetSqlDataConnector"
                if data_connector_config.get("module_name") is None:
                    data_connector_config[
                        "module_name"
                    ] = "great_expectations.datasource.data_connector"
                tables_data_connector_anonymized_configs.append(
                    self._data_connector_anonymizer.anonymize_data_connector_info(
                        name=data_connector_name, config=data_connector_config
                    )
                )

        anonymized_info_dict["anonymized_data_connectors"] = (
            introspection_data_connector_anonymized_configs
            + tables_data_connector_anonymized_configs
        )

        return anonymized_info_dict

    def is_parent_class_recognized(self, config) -> Optional[str]:
        return self._is_parent_class_recognized(
            classes_to_check=self._ge_classes + self._legacy_ge_classes,
            object_config=config,
        )

    def is_parent_class_recognized_v2_api(self, config) -> Optional[str]:
        return self._is_parent_class_recognized(
            classes_to_check=self._legacy_ge_classes,
            object_config=config,
        )

    def is_parent_class_recognized_v3_api(self, config) -> Optional[str]:
        return self._is_parent_class_recognized(
            classes_to_check=self._ge_classes,
            object_config=config,
        )
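Reviewer note: `anonymize_simple_sqlalchemy_datasource` fills in missing `class_name` and `module_name` values by writing into the caller's `config` dict. If that side effect were ever unwanted, the same defaults could be applied to a copy, roughly as sketched below. This is an illustrative alternative, not what the PR does: `with_data_connector_defaults` and the sample config keys (`whole_table`, `events`, `MyConnector`) are hypothetical, while the default class and module names are the ones used in the diff above.

```python
import copy
from typing import Any, Dict

# Default data connector class per SimpleSqlalchemyDatasource key, as in the diff above
DEFAULT_CLASS_BY_KEY = {
    "introspection": "InferredAssetSqlDataConnector",
    "tables": "ConfiguredAssetSqlDataConnector",
}
DEFAULT_MODULE = "great_expectations.datasource.data_connector"


def with_data_connector_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of config with missing connector class/module names filled in."""
    filled = copy.deepcopy(config)
    for key, default_class in DEFAULT_CLASS_BY_KEY.items():
        # A missing or None section is treated as having no data connectors
        for connector_config in (filled.get(key) or {}).values():
            if connector_config.get("class_name") is None:
                connector_config["class_name"] = default_class
            if connector_config.get("module_name") is None:
                connector_config["module_name"] = DEFAULT_MODULE
    return filled


original = {
    "introspection": {"whole_table": {}},
    "tables": {"events": {"class_name": "MyConnector"}},
}
filled = with_data_connector_defaults(original)
print(filled["introspection"]["whole_table"]["class_name"])  # InferredAssetSqlDataConnector
print(filled["tables"]["events"]["class_name"])  # MyConnector
print(original["introspection"]["whole_table"])  # {}
```

The trade-off is an extra deep copy per call, which should be negligible for configs of this size.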
Original file line number Diff line number Diff line change
@@ -3,6 +3,8 @@
    StoreBackendAnonymizer,
)
from great_expectations.data_context.store import (
    CheckpointStore,
    ConfigurationStore,
    EvaluationParameterStore,
    ExpectationsStore,
    HtmlSiteStore,
@@ -17,10 +19,12 @@ def __init__(self, salt=None):
        super().__init__(salt=salt)
        # ordered bottom up in terms of inheritance order
        self._ge_classes = [
            CheckpointStore,
            ValidationsStore,
            ExpectationsStore,
            EvaluationParameterStore,
            MetricStore,
            ConfigurationStore,
            Store,
            HtmlSiteStore,
        ]
@@ -44,3 +48,8 @@ def anonymize_store_info(self, store_name, store_obj):
        )

        return anonymized_info_dict

    def is_parent_class_recognized(self, store_obj):
        return self._is_parent_class_recognized(
            classes_to_check=self._ge_classes, object_=store_obj
        )