From 6f05c13038a3a1d4397fc8026ed0b3bb6a4d48b2 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 12 Oct 2021 12:51:52 -0400 Subject: [PATCH 01/62] [DOCS] How to configure an InferredAssetDataConnector --- ...configure_an_inferredassetdataconnector.md | 462 ++++++++++++++++++ sidebars.js | 1 + 2 files changed, 463 insertions(+) create mode 100644 docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md new file mode 100644 index 000000000000..debfca411e0b --- /dev/null +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -0,0 +1,462 @@ +--- +title: How to configure an InferredAssetDataConnector +--- +import Prerequisites from '../connecting_to_your_data/components/prerequisites.jsx' + +This guide demonstrates how to configure an InferredAssetDataConnector, and provides several examples you +can use for configuration. + + + +- [Understand the basics of Datasources in 0.13 or later](../../reference/datasources.md) +- Learn how to configure a [Data Context using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md) + + + +Great Expectations provides two types of `DataConnector` classes for connecting to file-system-like data. This includes files on disk, +as well as S3 object stores and similar storage: + +- A ConfiguredAssetDataConnector requires an explicit listing of each DataAsset you want to connect to. This allows more fine-grained control, but also requires more setup. +- An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. + +InferredAssetDataConnector has fewer options, so it's simpler to set up.
It’s a good choice if you want to connect to a single `DataAsset`, or several `DataAssets` that all share the same naming convention. + +If you're not sure which one to use, please check out [How to choose which DataConnector to use](./how_to_choose_which_dataconnector_to_use.md). + +Set up a Datasource +------------------- + +All the examples below assume you’re testing configurations using something like: + +```python +import great_expectations as ge +context = ge.DataContext() + +context.test_yaml_config( + name="my_data_source", + yaml_config=""" +class_name: Datasource +execution_engine: + class_name: PandasExecutionEngine +data_connectors: + my_filesystem_data_connector: + {data_connector configuration goes here} +""", +) +``` + +If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md). + +Choose a DataConnector +---------------------- + +InferredAssetDataConnectors like the `InferredAssetFilesystemDataConnector` and `InferredAssetS3DataConnector` +require a `default_regex` parameter, with a configured regex `pattern` and capture `group_names`. + +Imagine you have the following files in `my_directory/`: + +``` +my_directory/alpha-2020-01-01.csv +my_directory/alpha-2020-01-02.csv +my_directory/alpha-2020-01-03.csv +``` + +There are two ways to load this data into Great Expectations. + +The simplest approach is to treat each file as its own DataAsset.
In that case, the configuration would look like the following: + +```yaml +class_name: Datasource +execution_engine: + class_name: PandasExecutionEngine +data_connectors: + my_filesystem_data_connector: + class_name: InferredAssetFilesystemDataConnector + datasource_name: my_data_source + base_directory: my_directory/ + default_regex: + group_names: + - data_asset_name + pattern: (.*)\.csv +``` + +Notice that the `default_regex` is configured with a single capture group (`(.*)`), which captures the entire filename without the `.csv` extension. That capture group is assigned to `data_asset_name` under `group_names`. +Running `test_yaml_config()` would result in 3 DataAssets: `alpha-2020-01-01`, `alpha-2020-01-02` and `alpha-2020-01-03`. + +However, a closer look at the filenames reveals a pattern common to all 3 files: each has `alpha-` in the name, followed by date information. These are the kinds of patterns that InferredAssetDataConnectors let you take advantage of. + +We could treat the `alpha-*.csv` files as batches within a single `alpha` DataAsset by using a more specific regex `pattern` and adding `group_names` for `year`, `month` and `day`. + +**Note:** We have chosen to make the capture groups for `year`, `month` and `day` more specific by matching only digits (using `\d`) and a fixed number of digits, but a simpler capture group like `(.*)` would also work. + +```yaml +class_name: Datasource +execution_engine: + class_name: PandasExecutionEngine +data_connectors: + my_filesystem_data_connector: + class_name: InferredAssetFilesystemDataConnector + datasource_name: my_data_source + base_directory: my_directory/ + default_regex: + group_names: + - data_asset_name + - year + - month + - day + pattern: (.*)-(\d{4})-(\d{2})-(\d{2})\.csv +``` + +Running `test_yaml_config()` would result in 1 DataAsset, `alpha`, with 3 associated data_references: `alpha-2020-01-01.csv`, `alpha-2020-01-02.csv` and `alpha-2020-01-03.csv`, as also shown in Example 1 below.
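To see how each `default_regex` above splits the same three filenames into DataAssets, here is a minimal plain-Python sketch. It is not Great Expectations' implementation. The helper `group_by_asset` is hypothetical, and the sketch assumes each filename is matched from its start with Python's `re.match`, with the captured groups paired with `group_names` in order.

```python
import re
from collections import defaultdict


def group_by_asset(filenames, pattern, group_names):
    """Hypothetical helper (not the Great Expectations API).

    Groups filenames into DataAssets using one regex. Assumes one of
    `group_names` is "data_asset_name"; the remaining captured groups become
    per-file identifiers. Returns (assets, unmatched).
    """
    regex = re.compile(pattern)
    assets = defaultdict(list)
    unmatched = []
    for name in filenames:
        match = regex.match(name)
        if match is None:
            unmatched.append(name)
            continue
        # Pair each captured group with its configured name, in order.
        groups = dict(zip(group_names, match.groups()))
        asset_name = groups.pop("data_asset_name")
        assets[asset_name].append((name, groups))
    return dict(assets), unmatched


files = ["alpha-2020-01-01.csv", "alpha-2020-01-02.csv", "alpha-2020-01-03.csv"]

# First config: the whole filename (minus ".csv") becomes the asset name.
per_file, _ = group_by_asset(files, r"(.*)\.csv", ["data_asset_name"])
print(sorted(per_file))  # ['alpha-2020-01-01', 'alpha-2020-01-02', 'alpha-2020-01-03']

# Second config: "alpha" is the asset, and year/month/day identify each file.
by_date, _ = group_by_asset(
    files,
    r"(.*)-(\d{4})-(\d{2})-(\d{2})\.csv",
    ["data_asset_name", "year", "month", "day"],
)
print(list(by_date))  # ['alpha']
print(by_date["alpha"][0])  # ('alpha-2020-01-01.csv', {'year': '2020', 'month': '01', 'day': '01'})
```

Files that do not match the pattern end up in the sketch's `unmatched` list. This list plays roughly the same role as the `Unmatched data_references` section of the `test_yaml_config()` output shown in the examples.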
+ +A corresponding configuration for `InferredAssetS3DataConnector` looks similar, but requires `bucket` and `prefix` values instead of `base_directory`. + +```yaml +class_name: Datasource +execution_engine: + class_name: PandasExecutionEngine +data_connectors: + my_s3_data_connector: + class_name: InferredAssetS3DataConnector + datasource_name: my_data_source + bucket: MY_S3_BUCKET + prefix: MY_S3_BUCKET_PREFIX + default_regex: + group_names: + - data_asset_name + - year + - month + - day + pattern: (.*)-(\d{4})-(\d{2})-(\d{2})\.csv +``` + +The following examples show scenarios that InferredAssetDataConnectors can help you analyze. They all use `InferredAssetFilesystemDataConnector` and, for simplicity, only show the DataConnector configuration that goes under `data_connectors`. + + +Example 1: Basic configuration for a single DataAsset +----------------------------------------------------- + +Continuing the example above, imagine you have the following files in the directory `my_directory/`: + +``` +my_directory/alpha-2020-01-01.csv +my_directory/alpha-2020-01-02.csv +my_directory/alpha-2020-01-03.csv +``` + +Then this configuration...
+ +```yaml +class_name: InferredAssetFilesystemDataConnector +base_directory: my_directory/ +default_regex: + group_names: + - data_asset_name + - year + - month + - day + pattern: (.*)-(\d{4})-(\d{2})-(\d{2})\.csv +``` + +...will make available the following data_references: + +```bash +Available data_asset_names (1 of 1): + alpha (3 of 3): [ + 'alpha-2020-01-01.csv', + 'alpha-2020-01-02.csv', + 'alpha-2020-01-03.csv' + ] + +Unmatched data_references (0 of 0): [] +``` + +Once configured, you can get a `Validator` from the Data Context as follows: + +```python +my_validator = context.get_validator( + datasource_name="my_data_source", + data_connector_name="my_filesystem_data_connector", + data_asset_name="alpha", + create_expectation_suite_with_name="my_expectation_suite", +) +``` + +Example 2: Basic configuration with more than one DataAsset +----------------------------------------------------------- + +Here’s a similar example, but this time two data_assets are mixed together in one folder. + +**Note**: For an equivalent configuration using `ConfiguredAssetFilesystemDataConnector`, please see Example 2 +in [How to configure a ConfiguredAssetDataConnector](./how_to_configure_a_configuredassetdataconnector.md). + +``` +test_data/alpha-2020-01-01.csv +test_data/beta-2020-01-01.csv +test_data/alpha-2020-01-02.csv +test_data/beta-2020-01-02.csv +test_data/alpha-2020-01-03.csv +test_data/beta-2020-01-03.csv +``` + +The same configuration as Example 1 (with `base_directory` pointing to `test_data/`)...
+ +```yaml +class_name: InferredAssetFilesystemDataConnector +base_directory: test_data/ +default_regex: + group_names: + - data_asset_name + - year + - month + - day + pattern: (.*)-(\d{4})-(\d{2})-(\d{2})\.csv +``` + +...will now make `alpha` and `beta` both available as DataAssets, with the following data_references: + +```bash +Available data_asset_names (2 of 2): + alpha (3 of 3): [ + 'alpha-2020-01-01.csv', + 'alpha-2020-01-02.csv', + 'alpha-2020-01-03.csv' + ] + + beta (3 of 3): [ + 'beta-2020-01-01.csv', + 'beta-2020-01-02.csv', + 'beta-2020-01-03.csv' + ] + +Unmatched data_references (0 of 0): [] +``` + + +Example 3: Nested directory structure with the data_asset_name on the inside +---------------------------------------------------------------------------- + +Here’s a similar example, this time with a nested directory structure... + +``` +2020/01/01/alpha.csv +2020/01/02/alpha.csv +2020/01/03/alpha.csv +2020/01/04/alpha.csv +2020/01/04/beta.csv +2020/01/05/alpha.csv +2020/01/05/beta.csv +``` + +Then this configuration...
+ +```yaml +class_name: InferredAssetFilesystemDataConnector +base_directory: my_directory/ +default_regex: + group_names: + - year + - month + - day + - data_asset_name + pattern: (\d{4})/(\d{2})/(\d{2})/(.*)\.csv +``` + +...will now make `alpha` and `beta` both available as DataAssets, with the following data_references: + +```bash +Available data_asset_names (2 of 2): + alpha (3 of 5): [ + '2020/01/01/alpha.csv', + '2020/01/02/alpha.csv', + '2020/01/03/alpha.csv' + ] + + beta (2 of 2): [ + '2020/01/04/beta.csv', + '2020/01/05/beta.csv' + ] + +Unmatched data_references (0 of 0): [] +``` + + +Example 4: Nested directory structure with the data_asset_name on the outside +----------------------------------------------------------------------------- + +In the following example, files are placed in a folder structure with the `data_asset_name` defined by the folder name (A, B, C, or D): + +``` +A/A-1.csv +A/A-2.csv +A/A-3.csv +B/B-1.csv +B/B-2.csv +B/B-3.csv +C/C-1.csv +C/C-2.csv +C/C-3.csv +D/D-1.csv +D/D-2.csv +D/D-3.csv +``` + +Then this configuration...
+ +```yaml +class_name: InferredAssetFilesystemDataConnector +base_directory: / + +default_regex: + group_names: + - data_asset_name + - letter + - number + pattern: (\w{1})/(\w{1})-(\d{1})\.csv +``` + +...will make `A`, `B`, `C` and `D` available as DataAssets, each containing 3 data_references (the output below lists only 3 of the 4 DataAssets): + +```bash +Available data_asset_names (3 of 4): + A (3 of 3): ['test_dir_charlie/A/A-1.csv', + 'test_dir_charlie/A/A-2.csv', + 'test_dir_charlie/A/A-3.csv'] + B (3 of 3): ['test_dir_charlie/B/B-1.csv', + 'test_dir_charlie/B/B-2.csv', + 'test_dir_charlie/B/B-3.csv'] + C (3 of 3): ['test_dir_charlie/C/C-1.csv', + 'test_dir_charlie/C/C-2.csv', + 'test_dir_charlie/C/C-3.csv'] + +Unmatched data_references (0 of 0): [] +``` + +Example 5: Redundant information in the naming convention (S3 Bucket) +---------------------------------------------------------------------- + +Here’s another example of a nested directory structure, this time with the data_asset_name defined by the bucket name. + +``` +my_bucket/2021/01/01/log_file-20210101.txt.gz +my_bucket/2021/01/02/log_file-20210102.txt.gz +my_bucket/2021/01/03/log_file-20210103.txt.gz +my_bucket/2021/01/04/log_file-20210104.txt.gz +my_bucket/2021/01/05/log_file-20210105.txt.gz +my_bucket/2021/01/06/log_file-20210106.txt.gz +my_bucket/2021/01/07/log_file-20210107.txt.gz +``` + + +Here’s a configuration that will allow all the log files in the bucket to be associated with a single data_asset, `my_bucket`: + +```yaml +class_name: InferredAssetFilesystemDataConnector +base_directory: / + +default_regex: + group_names: + - data_asset_name + - year + - month + - day + pattern: (\w+)/(\d{4})/(\d{2})/(\d{2})/log_file-.*\.txt\.gz +``` + +All the log files will be mapped to a single data_asset named `my_bucket`.
+ +```bash +Available data_asset_names (1 of 1): + my_bucket (3 of 7): [ + 'my_bucket/2021/01/03/log_file-20210103.txt.gz', + 'my_bucket/2021/01/04/log_file-20210104.txt.gz', + 'my_bucket/2021/01/05/log_file-20210105.txt.gz' + ] + +Unmatched data_references (0 of 0): [] +``` + + +Example 6: Random information in the naming convention +------------------------------------------------------------------------------- + +In the following example, files are placed in folders according to their date of creation, and each file is given a random hash value in its name. + +``` +2021/01/01/log_file-2f1e94b40f310274b485e72050daf591.txt.gz +2021/01/02/log_file-7f5d35d4f90bce5bf1fad680daac48a2.txt.gz +2021/01/03/log_file-99d5ed1123f877c714bbe9a2cfdffc4b.txt.gz +2021/01/04/log_file-885d40a5661bbbea053b2405face042f.txt.gz +2021/01/05/log_file-d8e478f817b608729cfc8fb750ebfc84.txt.gz +2021/01/06/log_file-b1ca8d1079c00fd4e210f7ef31549162.txt.gz +2021/01/07/log_file-d34b4818c52e74b7827504920af19a5c.txt.gz +``` + +Here’s a configuration that will allow all the log files to be associated with a single data_asset, `log_file`: + +```yaml +class_name: InferredAssetFilesystemDataConnector +base_directory: / + +default_regex: + group_names: + - year + - month + - day + - data_asset_name + pattern: (\d{4})/(\d{2})/(\d{2})/(log_file)-.*\.txt\.gz +``` + +...
will give you the following output: + +```bash +Available data_asset_names (1 of 1): + log_file (3 of 7): [ + '2021/01/03/log_file-99d5ed1123f877c714bbe9a2cfdffc4b.txt.gz', + '2021/01/04/log_file-885d40a5661bbbea053b2405face042f.txt.gz', + '2021/01/05/log_file-d8e478f817b608729cfc8fb750ebfc84.txt.gz' + ] + +Unmatched data_references (0 of 0): [] +``` + +Example 7: Redundant information in the naming convention (timestamp of file creation) +-------------------------------------------------------------------------------------- + +In the following example, files are placed in a single folder, and each name includes a timestamp of when the file was created. + +``` +log_file-2021-01-01-035419.163324.txt.gz +log_file-2021-01-02-035513.905752.txt.gz +log_file-2021-01-03-035455.848839.txt.gz +log_file-2021-01-04-035251.47582.txt.gz +log_file-2021-01-05-033034.289789.txt.gz +log_file-2021-01-06-034958.505688.txt.gz +log_file-2021-01-07-033545.600898.txt.gz +``` + +Here’s a configuration that will allow all the log files to be associated with a single data_asset named `log_file`. + +```yaml +class_name: InferredAssetFilesystemDataConnector +base_directory: / + +default_regex: + group_names: + - data_asset_name + - year + - month + - day + pattern: (log_file)-(\d{4})-(\d{2})-(\d{2})-.*\.txt\.gz +``` + +All the log files will be mapped to the data_asset `log_file`.
+ +```bash +Available data_asset_names (1 of 1): + log_file (3 of 7): [ + 'log_file-2021-01-03-035455.848839.txt.gz', + 'log_file-2021-01-04-035251.47582.txt.gz', + 'log_file-2021-01-05-033034.289789.txt.gz' + ] + +Unmatched data_references (0 of 0): [] +``` diff --git a/sidebars.js b/sidebars.js index 5b13b97fb652..1a7f76d88d19 100644 --- a/sidebars.js +++ b/sidebars.js @@ -94,6 +94,7 @@ module.exports = { type: 'category', label: 'Core skills', items: [ + 'guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector', 'guides/connecting_to_your_data/how_to_configure_a_dataconnector_to_introspect_and_partition_a_file_system_or_blob_store', // 'guides/connecting_to_your_data/how_to_configure_a_dataconnector_to_introspect_and_partition_tables_in_sql', 'guides/connecting_to_your_data/how_to_create_a_batch_of_data_from_an_in_memory_spark_or_pandas_dataframe', From 95075005d1387e3ebf7a759408623dd64d837280 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 12 Oct 2021 12:56:41 -0400 Subject: [PATCH 02/62] [DOCS] Change log (#3533) --- docs/changelog.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/changelog.md b/docs/changelog.md index f7c582d6f694..108ddc8c255d 100644 --- a/docs/changelog.md +++ b/docs/changelog.md @@ -5,6 +5,7 @@ title: Changelog ### Develop * [BUGFIX] runtime_parameters: query: Custom query used as subquery in table metrics (#3508) * [BUGFIX] runtime_parameters: batch_data: Spark DF serialization (#3502) +* [DOCS] Choosing and configuring DataConnectors (#3533) * [DOCS] Added details on Anonymous Usage Statistics to the reference documentation.
### 0.13.37 From c0bd2d11e5e7a9edf3bd36c26991a0e72b9b290e Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 12 Oct 2021 14:34:42 -0400 Subject: [PATCH 03/62] [DOCS] How to configure a ConfiguredAssetDataConnector (#3533) --- ...onfigure_a_configuredassetdataconnector.md | 394 ++++++++++++++++++ ...configure_an_inferredassetdataconnector.md | 4 +- sidebars.js | 1 + 3 files changed, 397 insertions(+), 2 deletions(-) create mode 100644 docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md new file mode 100644 index 000000000000..dd43fec80028 --- /dev/null +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -0,0 +1,394 @@ +--- +title: How to configure a ConfiguredAssetDataConnector +--- +import Prerequisites from '../connecting_to_your_data/components/prerequisites.jsx' + +This guide demonstrates how to configure a ConfiguredAssetDataConnector, and provides several examples you can use for configuration. + + + +- [Understand the basics of Datasources in 0.13 or later](../../reference/datasources.md) +- Learn how to configure a [Data Context using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md) + + + +Great Expectations provides two types of `DataConnector` classes for connecting to file-system-like data. This includes files on disk, +as well as S3 object stores and similar storage: + +- A ConfiguredAssetDataConnector requires an explicit listing of each DataAsset you want to connect to. This allows more fine-grained control, but also requires more setup. +- An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure.
+ +If you're not sure which one to use, please check out [How to choose which DataConnector to use](./how_to_choose_which_dataconnector_to_use.md). + +Set up a Datasource +------------------- + +All of the examples below assume you’re testing configuration using something like: + +```python +import great_expectations as ge +context = ge.get_context() +config = """ +class_name: Datasource +execution_engine: + class_name: PandasExecutionEngine +data_connectors: + my_filesystem_data_connector: + {data_connector configuration goes here} +""" +context.test_yaml_config( + name="my_data_source", + yaml_config=config +) +``` + +If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md). + +Choose a DataConnector +---------------------- + +ConfiguredAssetDataConnectors like `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector` require DataAssets to be +explicitly named. Each DataAsset can have its own regex `pattern` and `group_names`, which, if configured, override any +`pattern` or `group_names` under `default_regex`. + +Imagine you have the following files in `my_directory/`: + +``` +my_directory/alpha-1.csv +my_directory/alpha-2.csv +my_directory/alpha-3.csv +``` + +We could create a DataAsset `alpha` that contains 3 data_references (`alpha-1.csv`, `alpha-2.csv`, and `alpha-3.csv`).
+In that case, the configuration would look like the following: + +```yaml +class_name: Datasource +execution_engine: + class_name: PandasExecutionEngine +data_connectors: + my_filesystem_data_connector: + class_name: ConfiguredAssetFilesystemDataConnector + base_directory: my_directory/ + default_regex: + assets: + alpha: + pattern: alpha-(.*)\.csv + group_names: + - index +``` + +Notice that we have specified a pattern that captures the number after `alpha-` in the filename and assigns it to the `group_name` `index`. + +The configuration would also work with a regex capturing the entire filename (i.e., `pattern: (.*)\.csv`). However, capturing the index on its own allows `batch_identifiers` to be used to retrieve a specific Batch of the DataAsset. + +Later, you could retrieve the data in `alpha-2.csv` as its own Batch of `alpha` using `context.get_batch()`, by specifying `{"index": "2"}` as the `batch_identifiers`. + +```python +my_batch = context.get_batch( + datasource_name="my_data_source", + data_connector_name="my_filesystem_data_connector", + data_asset_name="alpha", + batch_identifiers={"index": "2"} +) +``` + +This ability to access specific Batches using `batch_identifiers` is very useful when validating DataAssets that span multiple files. +For more information on `batches` and `batch_identifiers`, please refer to the [Core Concepts document](../../reference/dividing_data_assets_into_batches.md). + +A corresponding configuration for `ConfiguredAssetS3DataConnector` looks similar, but requires `bucket` and `prefix` values instead of `base_directory`. + +```yaml +class_name: ConfiguredAssetS3DataConnector +bucket: MY_S3_BUCKET +prefix: MY_S3_BUCKET_PREFIX +default_regex: +assets: + alpha: + pattern: alpha-(.*)\.csv + group_names: + - index +``` + +The following examples show scenarios that ConfiguredAssetDataConnectors can help you analyze, using `ConfiguredAssetFilesystemDataConnector`.
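As a rough illustration of how the `alpha` asset's `pattern` and `group_names` turn filenames into batch identifiers, and how a filter such as `{"index": "2"}` can narrow them down to a single file, here is a plain-Python sketch. The `ASSETS` dict and the `find_batches` helper are hypothetical stand-ins for the YAML, not the Great Expectations API, and the sketch assumes `re.match` semantics.

```python
import re

# Hypothetical stand-in for the `assets:` section of the YAML configuration.
ASSETS = {
    "alpha": {"pattern": r"alpha-(.*)\.csv", "group_names": ["index"]},
}


def find_batches(asset_name, filenames, batch_identifiers=None):
    """Hypothetical helper: list (filename, identifiers) pairs for one asset.

    Files that don't match the asset's pattern are skipped. If
    `batch_identifiers` is given, only files whose captured values equal
    every requested value are kept.
    """
    spec = ASSETS[asset_name]
    regex = re.compile(spec["pattern"])
    wanted = batch_identifiers or {}
    batches = []
    for name in filenames:
        match = regex.match(name)
        if match is None:
            continue  # would be reported as an unmatched data_reference
        identifiers = dict(zip(spec["group_names"], match.groups()))
        if all(identifiers.get(key) == value for key, value in wanted.items()):
            batches.append((name, identifiers))
    return batches


files = ["alpha-1.csv", "alpha-2.csv", "alpha-3.csv"]
print(len(find_batches("alpha", files)))  # 3
print(find_batches("alpha", files, {"index": "2"}))  # [('alpha-2.csv', {'index': '2'})]
```

In this sketch, captured groups are always strings, so the filter value must be `"2"` rather than `2`. This matches the string used in the `get_batch()` call.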
+ +**Note**: For simplicity, the examples only show the configuration under `data_connectors`. + +Example 1: Basic Configuration for a single DataAsset +----------------------------------------------------- + +Continuing the example above, imagine you have the following files in the directory `test/`: + +``` +test/alpha-1.csv +test/alpha-2.csv +test/alpha-3.csv +``` + +Then this configuration... + +```yaml +class_name: ConfiguredAssetFilesystemDataConnector +base_directory: test/ +default_regex: +assets: + alpha: + pattern: alpha-(.*)\.csv + group_names: + - index +``` + +...will make `alpha` available as a single DataAsset with the following data_references: + +```bash +Available data_asset_names (1 of 1): + alpha (3 of 3): [ + 'alpha-1.csv', + 'alpha-2.csv', + 'alpha-3.csv' + ] +``` + +Once configured, you can get a `Validator` from the Data Context as follows: + +```python +my_validator = context.get_validator( + datasource_name="my_data_source", + data_connector_name="my_filesystem_data_connector", + data_asset_name="alpha", + batch_identifiers={ + "index": "2" + }, + expectation_suite_name="my_expectation_suite" # the suite with this name must exist by the time of this call +) +``` + +But what if the regex does not match any files in the directory? + +Then this configuration... + +```yaml +class_name: ConfiguredAssetFilesystemDataConnector +base_directory: test/ +default_regex: +assets: + alpha: + pattern: beta-(.*)\.csv + group_names: + - index +``` + +...will give you this output: + +```bash +Successfully instantiated ConfiguredAssetFilesystemDataConnector +Available data_asset_names (1 of 1): + alpha (0 of 0): [] + +Unmatched data_references (3 of 3): ['alpha-1.csv', 'alpha-2.csv', 'alpha-3.csv'] +``` + +Notice that `alpha` has 0 data_references, and there are 3 `Unmatched data_references` listed. +This indicates that some part of the configuration is incorrect and needs to be reviewed.
+In our case, changing `pattern` to `alpha-(.*)\.csv` fixes the problem and gives the same output as above. + + +Example 2: Basic configuration with more than one DataAsset +----------------------------------------------------------- + +Here’s a similar example, but this time two data_assets are mixed together in one folder. + +**Note**: For an equivalent configuration using `InferredAssetFilesystemDataConnector`, please see Example 2 in [How to configure an InferredAssetDataConnector](./how_to_configure_an_inferredassetdataconnector.md). + +``` +test_data/alpha-2020-01-01.csv +test_data/beta-2020-01-01.csv +test_data/alpha-2020-01-02.csv +test_data/beta-2020-01-02.csv +test_data/alpha-2020-01-03.csv +test_data/beta-2020-01-03.csv +``` + +Then this configuration... + +```yaml +class_name: ConfiguredAssetFilesystemDataConnector +base_directory: test_data/ +assets: + alpha: + group_names: + - year + - month + - day + pattern: alpha-(\d{4})-(\d{2})-(\d{2})\.csv + beta: + group_names: + - year + - month + - day + pattern: beta-(\d{4})-(\d{2})-(\d{2})\.csv +``` + +...will now make `alpha` and `beta` both available as DataAssets, with the following data_references: + +```bash +Available data_asset_names (2 of 2): + alpha (3 of 3): [ + 'alpha-2020-01-01.csv', + 'alpha-2020-01-02.csv', + 'alpha-2020-01-03.csv' + ] + + beta (3 of 3): [ + 'beta-2020-01-01.csv', + 'beta-2020-01-02.csv', + 'beta-2020-01-03.csv' + ] + +Unmatched data_references (0 of 0): [] +``` + +Example 3: Example with Nested Folders +-------------------------------------------------- + +In the following example, files are placed in folders that match the `data_asset_names` we want: `A`, `B`, `C`, and `D`.
+ +``` +test_dir/A/A-1.csv +test_dir/A/A-2.csv +test_dir/A/A-3.csv +test_dir/B/B-1.txt +test_dir/B/B-2.txt +test_dir/B/B-3.txt +test_dir/C/C-2017.csv +test_dir/C/C-2018.csv +test_dir/C/C-2019.csv +test_dir/D/D-aaa.csv +test_dir/D/D-bbb.csv +test_dir/D/D-ccc.csv +test_dir/D/D-ddd.csv +test_dir/D/D-eee.csv +``` + +Then this configuration... + +```yaml +module_name: great_expectations.datasource.data_connector +class_name: ConfiguredAssetFilesystemDataConnector +base_directory: test_dir/ +assets: + A: + base_directory: A/ + B: + base_directory: B/ + pattern: (.*)-(.*)\.txt + group_names: + - part_1 + - part_2 + C: + glob_directive: "*" + base_directory: C/ + D: + glob_directive: "*" + base_directory: D/ +default_regex: + pattern: (.*)-(.*)\.csv + group_names: + - part_1 + - part_2 +``` + +...will now make `A`, `B`, `C` and `D` available as DataAssets, with the following data_references: + +```bash +Available data_asset_names (4 of 4): + A (3 of 3): [ + 'A-1.csv', + 'A-2.csv', + 'A-3.csv' + ] + B (3 of 3): [ + 'B-1.txt', + 'B-2.txt', + 'B-3.txt' + ] + C (3 of 3): [ + 'C-2017.csv', + 'C-2018.csv', + 'C-2019.csv' + ] + D (5 of 5): [ + 'D-aaa.csv', + 'D-bbb.csv', + 'D-ccc.csv', + 'D-ddd.csv', + 'D-eee.csv' + ] +``` + +Example 4: Example with Explicit data_asset_names and more complex nesting +-------------------------------------------------------------------------- + +In this example, the assets `alpha`, `beta` and `gamma` are explicitly defined in the configuration, and have a more complex nesting pattern.
+ +``` +my_base_directory/alpha/files/go/here/alpha-202001.csv +my_base_directory/alpha/files/go/here/alpha-202002.csv +my_base_directory/alpha/files/go/here/alpha-202003.csv +my_base_directory/beta_here/beta-202001.txt +my_base_directory/beta_here/beta-202002.txt +my_base_directory/beta_here/beta-202003.txt +my_base_directory/beta_here/beta-202004.txt +my_base_directory/gamma-202001.csv +my_base_directory/gamma-202002.csv +my_base_directory/gamma-202003.csv +my_base_directory/gamma-202004.csv +my_base_directory/gamma-202005.csv +``` + +The following configuration... + +```yaml +class_name: ConfiguredAssetFilesystemDataConnector +base_directory: my_base_directory/ +default_regex: + pattern: ^(.+)-(\d{4})(\d{2})\.(csv|txt)$ + group_names: + - data_asset_name + - year_dir + - month_dir +assets: + alpha: + base_directory: alpha/files/go/here/ + glob_directive: "*.csv" + beta: + base_directory: beta_here/ + glob_directive: "*.txt" + gamma: + glob_directive: "*.csv" +``` + +...will make `alpha`, `beta` and `gamma` available as DataAssets, with the following data_references: + +```bash +Available data_asset_names (3 of 3): + alpha (3 of 3): [ + 'alpha-202001.csv', + 'alpha-202002.csv', + 'alpha-202003.csv' + ] + beta (4 of 4): [ + 'beta-202001.txt', + 'beta-202002.txt', + 'beta-202003.txt', + 'beta-202004.txt' + ] + gamma (5 of 5): [ + 'gamma-202001.csv', + 'gamma-202002.csv', + 'gamma-202003.csv', + 'gamma-202004.csv', + 'gamma-202005.csv' + ] +``` diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index debfca411e0b..db32b9d3cd27 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -43,12 +43,12 @@ my_data_source: """) ``` -If you’re not familiar with the
`test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md). +If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md) Choose a DataConnector ---------------------- -InferredAssetDataConnectors like the `InferredAssetFilesystemDataConnector` and `InferredAssetS3DataConnector` +InferredAssetDataConnectors like `InferredAssetFilesystemDataConnector` and `InferredAssetS3DataConnector` require a `default_regex` parameter, with a configured regex `pattern` and capture `group_names`. Imagine you have the following files in `my_directory/`: diff --git a/sidebars.js b/sidebars.js index 1a7f76d88d19..829803ba8214 100644 --- a/sidebars.js +++ b/sidebars.js @@ -95,6 +95,7 @@ module.exports = { label: 'Core skills', items: [ 'guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector', + 'guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector', 'guides/connecting_to_your_data/how_to_configure_a_dataconnector_to_introspect_and_partition_a_file_system_or_blob_store', // 'guides/connecting_to_your_data/how_to_configure_a_dataconnector_to_introspect_and_partition_tables_in_sql', 'guides/connecting_to_your_data/how_to_create_a_batch_of_data_from_an_in_memory_spark_or_pandas_dataframe', From e76cc75e35ec15d09eac0870dd6b1cacc5545eba Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 12 Oct 2021 15:13:27 -0400 Subject: [PATCH 04/62] [DOCS] How to choose which DataConnector to use (#3533) --- ...ow_to_choose_which_dataconnector_to_use.md | 157 ++++++++++++++++++ sidebars.js | 1 + 2 files changed, 158 insertions(+) create mode 100644 
docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md new file mode 100644 index 000000000000..8f71a021b0cc --- /dev/null +++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md @@ -0,0 +1,157 @@ +--- +title: How to choose which DataConnector to use +--- + +Great Expectations provides two types of `DataConnector` classes for connecting to file-system-like data. This includes files on disk, as well as S3 object stores and similar storage: + +- A ConfiguredAssetDataConnector gives you the most fine-grained control, and requires an explicit listing of each DataAsset you want to connect to. Examples of this type of `DataConnector` include `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector`. +- An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. Examples of this type of `DataConnector` include `InferredAssetFilesystemDataConnector` and `InferredAssetS3DataConnector`. + +The following examples use `DataConnector` classes designed to connect to files on disk, namely `InferredAssetFilesystemDataConnector` and `ConfiguredAssetFilesystemDataConnector`. + +------------------------------------------ +When to use an InferredAssetDataConnector +------------------------------------------ + +Suppose you have the following `my_data/` directory in your filesystem, and you want to treat the `A-*.csv` files as batches within the `A` DataAsset, and do the same for `B` and `C`: + +``` +my_data/A/A-1.csv +my_data/A/A-2.csv +my_data/A/A-3.csv +my_data/B/B-1.csv +my_data/B/B-2.csv +my_data/B/B-3.csv +my_data/C/C-1.csv +my_data/C/C-2.csv +my_data/C/C-3.csv +``` + +This config...
+ +```yaml +class_name: Datasource +data_connectors: + my_data_connector: + class_name: InferredAssetFilesystemDataConnector + base_directory: my_data/ + default_regex: + pattern: (.*)/.*-(\d+)\.csv + group_names: + - data_asset_name + - id +``` + +...will make available the following DataAssets and data_references: + +```bash +Available data_asset_names (3 of 3): + A (3 of 3): [ + 'A/A-1.csv', + 'A/A-2.csv', + 'A/A-3.csv' + ] + B (3 of 3): [ + 'B/B-1.csv', + 'B/B-2.csv', + 'B/B-3.csv' + ] + C (3 of 3): [ + 'C/C-1.csv', + 'C/C-2.csv', + 'C/C-3.csv' + ] + +Unmatched data_references (0 of 0): [] +``` + +Note that the `InferredAssetFilesystemDataConnector` **infers** `data_asset_names` **from the regex you provide.** This is the key difference between InferredAssetDataConnector and ConfiguredAssetDataConnector, and it means that one of the `group_names` in the `default_regex` configuration must be `data_asset_name`. + +------------------------------------------ +When to use a ConfiguredAssetDataConnector +------------------------------------------ + +On the other hand, `ConfiguredAssetFilesystemDataConnector` requires an explicit listing of each DataAsset you want to connect to. This tends to be helpful when the naming conventions for your DataAssets are less standardized. + +If you have the following `my_messier_data/` directory in your filesystem, + +``` + my_messier_data/1/A-1.csv + my_messier_data/1/B-1.txt + + my_messier_data/2/A-2.csv + my_messier_data/2/B-2.txt + + my_messier_data/2017/C-1.csv + my_messier_data/2018/C-2.csv + my_messier_data/2019/C-3.csv + + my_messier_data/aaa/D-1.csv + my_messier_data/bbb/D-2.csv + my_messier_data/ccc/D-3.csv +``` + +Then this config...
+ +```yaml +class_name: Datasource +execution_engine: + class_name: PandasExecutionEngine +data_connectors: + my_data_connector: + class_name: ConfiguredAssetFilesystemDataConnector + glob_directive: "*/*" + base_directory: my_messier_data/ + assets: + A: + pattern: (.+A)-(\d+)\.csv + group_names: + - name + - id + B: + pattern: (.+B)-(\d+)\.txt + group_names: + - name + - val + C: + pattern: (.+C)-(\d+)\.csv + group_names: + - name + - id + D: + pattern: (.+D)-(\d+)\.csv + group_names: + - name + - id +``` + +...will make available the following DataAssets and data_references: + +```bash +Available data_asset_names (4 of 4): + A (2 of 2): [ + '1/A-1.csv', + '2/A-2.csv' + ] + B (2 of 2): [ + '1/B-1.txt', + '2/B-2.txt' + ] + C (3 of 3): [ + '2017/C-1.csv', + '2018/C-2.csv', + '2019/C-3.csv' + ] + D (3 of 3): [ + 'aaa/D-1.csv', + 'bbb/D-2.csv', + 'ccc/D-3.csv' + ] +``` + +---------------- +Additional Notes +---------------- + +- Additional examples and configurations for `ConfiguredAssetFilesystemDataConnectors` can be found here: [How to configure a ConfiguredAssetDataConnector](./how_to_configure_a_configuredassetdataconnector.md) +- Additional examples and configurations for `InferredAssetFilesystemDataConnectors` can be found here: [How to configure an InferredAssetDataConnector](./how_to_configure_an_inferredassetdataconnector.md) diff --git a/sidebars.js b/sidebars.js index 829803ba8214..2a94170db378 100644 --- a/sidebars.js +++ b/sidebars.js @@ -94,6 +94,7 @@ module.exports = { type: 'category', label: 'Core skills', items: [ + 'guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use', 'guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector', 'guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector', 'guides/connecting_to_your_data/how_to_configure_a_dataconnector_to_introspect_and_partition_a_file_system_or_blob_store', From 9485973d97ca2b62fb13204efdb186ca2448bfe8 Mon Sep 17 00:00:00 2001 From: 
Nathan Farmer Date: Wed, 13 Oct 2021 10:23:37 -0400 Subject: [PATCH 05/62] [DOCS] How to configure a RuntimeDataConnector (#3533) --- ...how_to_configure_a_runtimedataconnector.md | 83 +++++++++++++++++++ sidebars.js | 1 + 2 files changed, 84 insertions(+) create mode 100644 docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md new file mode 100644 index 000000000000..094e7d15682b --- /dev/null +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md @@ -0,0 +1,83 @@ +--- +title: How to configure a RuntimeDataConnector +--- +import Prerequisites from '../connecting_to_your_data/components/prerequisites.jsx' + +This guide demonstrates how to configure a RuntimeDataConnector and only applies for the V3 (Batch Request) API. A `RuntimeDataConnector` allows you to specify a Batch using a Runtime Batch Request, which is used to create a Validator. A Validator is the key object used to create Expectations and validate datasets. + + + +- [Understand the basics of Datasources in the V3 (Batch Request) API](../../reference/datasources.md) +- Learned how to configure a [Data Context using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md) + + + +A RuntimeDataConnector is a special kind of [Data Connector](../../reference/datasources.md) that enables you to use a RuntimeBatchRequest to provide a [Batch's](../../reference/datasources.md#batches) data directly at runtime. The RuntimeBatchRequest can wrap either an in-memory dataframe, filepath, or SQL query, and must include batch identifiers that uniquely identify the data (e.g. a `run_id` from an AirFlow DAG run). The batch identifiers that must be passed in at runtime are specified in the RuntimeDataConnector's configuration. 
+ +Add a RuntimeDataConnector to a Datasource configuration +--------------------------------------------------------- + +The following example uses `test_yaml_config` and `sanitize_yaml_and_save_datasource` to add a new SQL Datasource to a project's `great_expectations.yml`. If you already have configured Datasources, you can add an additional RuntimeDataConnector configuration directly to your `great_expectations.yml`. + +:::note +Currently, RuntimeDataConnector cannot be used with Datasources of type SimpleSqlalchemyDatasource. +::: + +```python +import great_expectations as ge +from great_expectations.cli.datasource import sanitize_yaml_and_save_datasource + +context = ge.get_context() +config = """ +name: my_sqlite_datasource +class_name: Datasource +execution_engine: + class_name: SqlAlchemyExecutionEngine + connection_string: sqlite:///my_db_file +data_connectors: + my_runtime_data_connector: + class_name: RuntimeDataConnector + batch_identifiers: + - pipeline_stage_name + - airflow_run_id +""" +context.test_yaml_config(yaml_config=config) +sanitize_yaml_and_save_datasource(context, config, overwrite_existing=False) +``` + +At runtime, you would get a Validator from the Data Context as follows: + +```python +from great_expectations.core.batch import RuntimeBatchRequest + +validator = context.get_validator( + batch_request=RuntimeBatchRequest( + datasource_name="my_sqlite_datasource", + data_connector_name="my_runtime_data_connector", + data_asset_name="my_data_asset_name", + runtime_parameters={ + "query": "SELECT * FROM table_partitioned_by_date_column__A" + }, + batch_identifiers={ + "pipeline_stage_name": "core_processing", + "airflow_run_id": 1234567890, + }, + ), + expectation_suite=my_expectation_suite, +) + +# Simplified call to get_validator - RuntimeBatchRequest is inferred under the hood +validator = context.get_validator( + datasource_name="my_sqlite_datasource", + data_connector_name="my_runtime_data_connector", + data_asset_name="my_data_asset_name", + runtime_parameters={ + "query": "SELECT * FROM 
table_partitioned_by_date_column__A" + }, + batch_identifiers={ + "pipeline_stage_name": "core_processing", + "airflow_run_id": 1234567890, + }, + expectation_suite=my_expectation_suite, +) +``` diff --git a/sidebars.js b/sidebars.js index 2a94170db378..d148e30a6b1c 100644 --- a/sidebars.js +++ b/sidebars.js @@ -97,6 +97,7 @@ module.exports = { 'guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use', 'guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector', 'guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector', + 'guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector', 'guides/connecting_to_your_data/how_to_configure_a_dataconnector_to_introspect_and_partition_a_file_system_or_blob_store', // 'guides/connecting_to_your_data/how_to_configure_a_dataconnector_to_introspect_and_partition_tables_in_sql', 'guides/connecting_to_your_data/how_to_create_a_batch_of_data_from_an_in_memory_spark_or_pandas_dataframe', From c3a4f34d69268370845e5a46183daa6f7bd3bd27 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 13 Oct 2021 12:54:33 -0400 Subject: [PATCH 06/62] [DOCS] Add option for RuntimeDataConnector to how-to-choose document (#3533) --- .../how_to_choose_which_dataconnector_to_use.md | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md index 8f71a021b0cc..6279da5166bb 100644 --- a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md +++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md @@ -2,12 +2,18 @@ title: How to choose which DataConnector to use --- -Great Expectations provides two types of `DataConnector` classes for connecting to file-system-like data.
This includes files on disk, but also S3 object stores, etc: +Great Expectations provides three types of `DataConnector` classes. Two classes are for connecting to file-system-like data (this includes files on disk, but also S3 object stores, etc) as well as relational database data: - A ConfiguredAssetDataConnector allows users to have the most fine-tuning, and requires an explicit listing of each DataAsset you want to connect to. Examples of this type of `DataConnector` include `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector`. - An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. Examples of this type of `DataConnector` include `InferredAssetFilesystemDataConnector` and `InferredAssetS3DataConnector`. -The following examples will use `DataConnector` classes designed to connect to files on disk, namely `InferredAssetFilesystemDataConnector` and `ConfiguredAssetFilesystemDataConnector`. +The third type of `DataConnector` class is for providing a batch's data directly at runtime: + +- A `RuntimeDataConnector` enables you to use a `RuntimeBatchRequest` to wrap either an in-memory dataframe, filepath, or SQL query, and must include batch identifiers that uniquely identify the data (e.g. a `run_id` from an AirFlow DAG run). + +If you know, for example, that your Pipeline Runner will already have your batch data in memory at runtime, you can configure a `RuntimeDataConnector` with unique batch identifiers. See [How to configure a RuntimeDataConnector](./how_to_configure_a_runtimedataconnector.md) and [How to create a Batch of data from an in-memory Spark or Pandas dataframe](./how_to_create_a_batch_of_data_from_an_in_memory_spark_or_pandas_dataframe.md) to get started with `RuntimeDataConnector`s.
+ +If you aren't sure which of the other two types of `DataConnector` to use, the following examples demonstrate the difference between them using `DataConnector` classes designed to connect to files on disk: `InferredAssetFilesystemDataConnector` and `ConfiguredAssetFilesystemDataConnector`. ------------------------------------------ When to use an InferredAssetDataConnector @@ -153,5 +159,6 @@ Available data_asset_names (4 of 4): Additional Notes ---------------- -- Additional examples and configurations for `ConfiguredAssetFilesystemDataConnectors` can be found here: [How to configure a ConfiguredAssetDataConnector](./how_to_configure_a_configuredassetdataconnector.md) -- Additional examples and configurations for `InferredAssetFilesystemDataConnectors` can be found here: [How to configure an InferredAssetDataConnector](./how_to_configure_an_inferredassetdataconnector.md) +- Additional examples and configurations for `ConfiguredAssetFilesystemDataConnector`s can be found here: [How to configure a ConfiguredAssetDataConnector](./how_to_configure_a_configuredassetdataconnector.md) +- Additional examples and configurations for `InferredAssetFilesystemDataConnector`s can be found here: [How to configure an InferredAssetDataConnector](./how_to_configure_an_inferredassetdataconnector.md) +- Additional examples and configurations for `RuntimeDataConnector`s can be found here: [How to configure a RuntimeDataConnector](./how_to_configure_a_runtimedataconnector.md) From c43f5698a2b3553399e7023611b36f2f9f5d5cfb Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 19 Oct 2021 11:29:45 -0400 Subject: [PATCH 07/62] [DOCS] Clarify that the data we are connecting to is known as a DataAsset (#3533) --- .../how_to_choose_which_dataconnector_to_use.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md
b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md index 6279da5166bb..de19165e1a12 100644 --- a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md +++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md @@ -2,9 +2,9 @@ title: How to choose which DataConnector to use --- -Great Expectations provides three types of `DataConnector` classes. Two classes are for connecting to file-system-like data (this includes files on disk, but also S3 object stores, etc) as well as relational database data: +Great Expectations provides three types of `DataConnector` classes. Two classes are for connecting to `DataAsset`s stored as file-system-like data (this includes files on disk, but also S3 object stores, etc) as well as relational database data: -- A ConfiguredAssetDataConnector allows users to have the most fine-tuning, and requires an explicit listing of each DataAsset you want to connect to. Examples of this type of `DataConnector` include `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector`. +- A ConfiguredAssetDataConnector allows users to have the most fine-tuning, and requires an explicit listing of each `DataAsset` you want to connect to. Examples of this type of `DataConnector` include `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector`. - An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. Examples of this type of `DataConnector` include `InferredAssetFilesystemDataConnector` and `InferredAssetS3DataConnector`. 
The third type of `DataConnector` class is for providing a batch's data directly at runtime: From 979d581899a0ac3126737393cb14d84620cac274 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 19 Oct 2021 11:59:32 -0400 Subject: [PATCH 08/62] [DOCS] Cleanup working and a typo (#3533) --- .../how_to_configure_a_configuredassetdataconnector.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index dd43fec80028..d88d76d9794f 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -1,5 +1,5 @@ --- -title: How to configure an ConfiguredAssetDataConnector +title: How to configure a ConfiguredAssetDataConnector --- import Prerequisites from '../connecting_to_your_data/components/prerequisites.jsx' @@ -15,7 +15,7 @@ This guide demonstrates how to configure a ConfiguredAssetDataConnector, and pro Great Expectations provides two `DataConnector` classes for connecting to file-system-like data. This includes files on disk, but also S3 object stores, etc: -- A ConfiguredAssetDataConnector requires an explicit listing of each DataAsset you want to connect to. This allows more fine-tuning, but also requires more setup. +- A ConfiguredAssetDataConnector requires an explicit listing of each `DataAsset` you want to connect to. This allows more fine-tuning, but also requires more setup. - An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. If you're not sure which one to use, please check out [How to choose which DataConnector to use](./how_to_choose_which_dataconnector_to_use.md). 
From c9ca799b5bc0b70134aa16f7ecef1c1be0c952bf Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 19 Oct 2021 12:01:01 -0400 Subject: [PATCH 09/62] [DOCS] Rearrange this list into the order of the examples below (#3533) --- .../how_to_choose_which_dataconnector_to_use.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md index de19165e1a12..02f67f7d5571 100644 --- a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md +++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md @@ -4,8 +4,8 @@ title: How to choose which DataConnector to use Great Expectations provides three types of `DataConnector` classes. Two classes are for connecting to `DataAsset`s stored as file-system-like data (this includes files on disk, but also S3 object stores, etc) as well as relational database data: -- A ConfiguredAssetDataConnector allows users to have the most fine-tuning, and requires an explicit listing of each `DataAsset` you want to connect to. Examples of this type of `DataConnector` include `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector`. - An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. Examples of this type of `DataConnector` include `InferredAssetFilesystemDataConnector` and `InferredAssetS3DataConnector`. +- A ConfiguredAssetDataConnector allows users to have the most fine-tuning, and requires an explicit listing of each `DataAsset` you want to connect to. Examples of this type of `DataConnector` include `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector`. 
The third type of `DataConnector` class is for providing a batch's data directly at runtime: From bc05b3c7a0e3856510d376e00ebfdf86816ece15 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 19 Oct 2021 12:05:37 -0400 Subject: [PATCH 10/62] [DOCS] Typo (#3533) --- .../how_to_configure_an_inferredassetdataconnector.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index db32b9d3cd27..52fd655d4ae8 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -16,7 +16,7 @@ can use for configuration. Great Expectations provides two types of `DataConnector` classes for connecting to file-system-like data. This includes files on disk, but also S3 object stores, etc: -- A ConfiguredAssetDataConnector requires an explicit listing of each DataAsset you want to connect to. This allows more fine-tuning, but also requires more setup. +- A ConfiguredAssetDataConnector requires an explicit listing of each `DataAsset` you want to connect to. This allows more fine-tuning, but also requires more setup. - An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. InferredAssetDataConnector has fewer options, so it's simpler to set up. It’s a good choice if you want to connect to a single `DataAsset`, or several `DataAssets` that all share the same naming convention. 
From 44985cb78d0a7a85834e27d62a4b8745db4f3ec5 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 19 Oct 2021 14:08:16 -0400 Subject: [PATCH 11/62] Typo --- .../how_to_configure_a_configuredassetdataconnector.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index d88d76d9794f..180b9ef79d1c 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -47,8 +47,8 @@ If you’re not familiar with the `test_yaml_config` method, please check out: [ Choose a DataConnector ---------------------- -ConfiguredAssetDataConnectors like `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector` require DataAssets to be -explicitly named. Each DataAsset can have their own regex `pattern` and `group_names`, and if configured, will override any +ConfiguredAssetDataConnectors like `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector` require `DataAsset`s to be +explicitly named. Each `DataAsset` can have their own regex `pattern` and `group_names`, and if configured, will override any `pattern` or `group_names` under `default_regex`. 
Imagine you have the following files in `my_directory/`: From 5806b671eec2683d88dcbaf353967c343f17ba1d Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 19 Oct 2021 14:11:35 -0400 Subject: [PATCH 12/62] Clarify that the data is what we are referring to as a DataAsset --- .../how_to_configure_a_configuredassetdataconnector.md | 2 +- .../how_to_configure_an_inferredassetdataconnector.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 180b9ef79d1c..73a9be62a1bf 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -12,7 +12,7 @@ This guide demonstrates how to configure a ConfiguredAssetDataConnector, and pro -Great Expectations provides two `DataConnector` classes for connecting to file-system-like data. This includes files on disk, +Great Expectations provides two `DataConnector` classes for connecting to `DataAsset`s stored as file-system-like data. This includes files on disk, but also S3 object stores, etc: - A ConfiguredAssetDataConnector requires an explicit listing of each `DataAsset` you want to connect to. This allows more fine-tuning, but also requires more setup. diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 52fd655d4ae8..ea9606ccb828 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -13,7 +13,7 @@ can use for configuration. 
-Great Expectations provides two types of `DataConnector` classes for connecting to file-system-like data. This includes files on disk, +Great Expectations provides two types of `DataConnector` classes for connecting to `DataAsset`s stored as file-system-like data. This includes files on disk, but also S3 object stores, etc: - A ConfiguredAssetDataConnector requires an explicit listing of each `DataAsset` you want to connect to. This allows more fine-tuning, but also requires more setup. From 32d2769a62168878c5b4b83e42f5c258bd5f77ef Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 19 Oct 2021 14:20:35 -0400 Subject: [PATCH 13/62] Clean up --- .../how_to_configure_a_runtimedataconnector.md | 4 ++-- .../how_to_configure_an_inferredassetdataconnector.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md index 094e7d15682b..b7cb718765db 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md @@ -3,7 +3,7 @@ title: How to configure a RuntimeDataConnector --- import Prerequisites from '../connecting_to_your_data/components/prerequisites.jsx' -This guide demonstrates how to configure a RuntimeDataConnector and only applies for the V3 (Batch Request) API. A `RuntimeDataConnector` allows you to specify a Batch using a Runtime Batch Request, which is used to create a Validator. A Validator is the key object used to create Expectations and validate datasets. +This guide demonstrates how to configure a RuntimeDataConnector and only applies to the V3 (Batch Request) API. A `RuntimeDataConnector` allows you to specify a Batch using a Runtime Batch Request, which is used to create a Validator. A Validator is the key object used to create Expectations and validate datasets. 
@@ -12,7 +12,7 @@ This guide demonstrates how to configure a RuntimeDataConnector and only applies -A RuntimeDataConnector is a special kind of [Data Connector](../../reference/datasources.md) that enables you to use a RuntimeBatchRequest to provide a [Batch's](../../reference/datasources.md#batches) data directly at runtime. The RuntimeBatchRequest can wrap either an in-memory dataframe, filepath, or SQL query, and must include batch identifiers that uniquely identify the data (e.g. a `run_id` from an AirFlow DAG run). The batch identifiers that must be passed in at runtime are specified in the RuntimeDataConnector's configuration. +A RuntimeDataConnector is a special kind of [Data Connector](../../reference/datasources.md) that enables you to use a RuntimeBatchRequest to provide a [Batch's](../../reference/datasources.md#batches) data directly at runtime. The RuntimeBatchRequest can wrap an in-memory dataframe, a filepath, or a SQL query, and must include batch identifiers that uniquely identify the data (e.g. a `run_id` from an AirFlow DAG run). The batch identifiers that must be passed in at runtime are specified in the RuntimeDataConnector's configuration. Add a RuntimeDataConnector to a Datasource configuration --------------------------------------------------------- diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index ea9606ccb828..28ffbb2997e5 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -59,7 +59,7 @@ my_directory/alpha-2020-01-02.csv my_directory/alpha-2020-01-03.csv ``` -We can imagine 2 approaches to loading the data into GE. +We can imagine two approaches to loading the data into GE. The simplest approach would be to consider each file to be its own DataAsset. 
In that case, the configuration would look like the following: From 5953dd8155a7340ee4703642e543a3aff6d97ec1 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 19 Oct 2021 19:27:00 -0400 Subject: [PATCH 14/62] Basic working integration tests --- ...onfigure_a_configuredassetdataconnector.py | 65 +++++++++++++++++++ ...how_to_configure_a_runtimedataconnector.py | 58 +++++++++++++++++ ...configure_an_inferredassetdataconnector.py | 65 +++++++++++++++++++ tests/integration/test_script_runner.py | 20 ++++++ 4 files changed, 208 insertions(+) create mode 100644 tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py create mode 100644 tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py create mode 100644 tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py new file mode 100644 index 000000000000..7a02ff4c2fc3 --- /dev/null +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -0,0 +1,65 @@ +from ruamel import yaml + +import great_expectations as ge +from great_expectations.core.batch import BatchRequest + +context = ge.get_context() + +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_configured_data_connector_name": { + "class_name": "ConfiguredAssetFilesystemDataConnector", + "base_directory": "my_directory/", + "assets": { + "taxi": { + "pattern": r"yellow_trip_data_sample_(.*)\.csv", + "group_names":
["month"], + } + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. +datasource_config["data_connectors"]["default_configured_data_connector_name"][ + "base_directory" +] = "../data/" + +context.test_yaml_config(yaml.dump(datasource_config)) + +context.add_datasource(**datasource_config) + +# Here is a BatchRequest using a path to a single CSV file +batch_request = BatchRequest( + datasource_name="taxi_datasource", + data_connector_name="default_configured_data_connector_name", + data_asset_name="taxi", +) + +context.create_expectation_suite( + expectation_suite_name="test_suite", overwrite_existing=True +) + +validator = context.get_validator( + batch_request=batch_request, + expectation_suite_name="test_suite", + batch_identifiers={"month": "2019-02"}, +) +print(validator.head()) + +# NOTE: The following code is only for testing and can be ignored by users. +assert isinstance(validator, ge.validator.validator.Validator) +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "taxi" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_configured_data_connector_name" + ] +) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py new file mode 100644 index 000000000000..5298e0b8df8a --- /dev/null +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py @@ -0,0 +1,58 @@ +from ruamel import yaml + +import great_expectations as ge +from great_expectations.core.batch import RuntimeBatchRequest + +context = ge.get_context() + +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": 
"great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_runtime_data_connector_name": { + "class_name": "RuntimeDataConnector", + "batch_identifiers": ["default_identifier_name"], + }, + }, +} + +context.test_yaml_config(yaml.dump(datasource_config)) + +context.add_datasource(**datasource_config) + +batch_request = RuntimeBatchRequest( + datasource_name="taxi_datasource", + data_connector_name="default_runtime_data_connector_name", + data_asset_name="", # This can be anything that identifies this data_asset for you + runtime_parameters={"path": ""}, # Add your path here. + batch_identifiers={"default_identifier_name": ""}, +) + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the BatchRequest above. +batch_request.runtime_parameters["path"] = "./data/yellow_trip_data_sample_2019-01.csv" + +context.create_expectation_suite( + expectation_suite_name="test_suite", overwrite_existing=True +) + +validator = context.get_validator( + batch_request=batch_request, + expectation_suite_name="test_suite", + batch_identifiers={"month": "2019-02"}, +) +print(validator.head()) + +# NOTE: The following code is only for testing and can be ignored by users. 
+assert isinstance(validator, ge.validator.validator.Validator) +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_runtime_data_connector_name" + ] +) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py new file mode 100644 index 000000000000..eb60a8d64986 --- /dev/null +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -0,0 +1,65 @@ +from ruamel import yaml + +import great_expectations as ge +from great_expectations.core.batch import BatchRequest + +context = ge.get_context() + +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_inferred_data_connector_name": { + "class_name": "InferredAssetFilesystemDataConnector", + "base_directory": "my_directory/", + "default_regex": { + "group_names": ["data_asset_name"], + "pattern": r"(.*)\.csv", + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above.
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][ + "base_directory" +] = "../data/" + +context.test_yaml_config(yaml.dump(datasource_config)) + +context.add_datasource(**datasource_config) + +# Here is a BatchRequest naming a single data asset inferred from one CSV file +batch_request = BatchRequest( + datasource_name="taxi_datasource", + data_connector_name="default_inferred_data_connector_name", + data_asset_name="", +) + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your data asset name directly in the BatchRequest above. +batch_request.data_asset_name = "yellow_trip_data_sample_2019-01" + +context.create_expectation_suite( + expectation_suite_name="test_suite", overwrite_existing=True +) + +validator = context.get_validator( + batch_request=batch_request, expectation_suite_name="test_suite" +) +print(validator.head()) + +# NOTE: The following code is only for testing and can be ignored by users. +assert isinstance(validator, ge.validator.validator.Validator) +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "yellow_trip_data_sample_2019-01" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_inferred_data_connector_name" + ] +) diff --git a/tests/integration/test_script_runner.py b/tests/integration/test_script_runner.py index fc92fc0b5629..3222ca238ec3 100755 --- a/tests/integration/test_script_runner.py +++ b/tests/integration/test_script_runner.py @@ -298,6 +298,26 @@ class BackendDependencies(enum.Enum): "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py", "extra_backend_dependencies": BackendDependencies.MSSQL, }, + { + "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py", + "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations", + "data_dir": 
"tests/test_sets/taxi_yellow_trip_data_samples/first_3_files", + "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py", + "extra_backend_dependencies": BackendDependencies.POSTGRESQL, + }, + { + "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py", + "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations", + "data_dir": "tests/test_sets/taxi_yellow_trip_data_samples/first_3_files", + "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py", + "extra_backend_dependencies": BackendDependencies.POSTGRESQL, + }, + { + "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py", + "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations", + "data_dir": "tests/test_sets/taxi_yellow_trip_data_samples/first_3_files", + "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py", + }, # { # "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/database/mysql_yaml_example.py", # "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations", From 98112bee1efe426e4fdfd0beb8cfa1808927f6d6 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 19 Oct 2021 19:30:45 -0400 Subject: [PATCH 15/62] Cleanup --- .../how_to_configure_a_configuredassetdataconnector.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 73a9be62a1bf..1ad45a723c77 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -15,7 +15,7 @@ This guide demonstrates how to 
configure a ConfiguredAssetDataConnector, and pro Great Expectations provides two `DataConnector` classes for connecting to `DataAsset`s stored as file-system-like data. This includes files on disk, but also S3 object stores, etc: -- A ConfiguredAssetDataConnector requires an explicit listing of each `DataAsset` you want to connect to. This allows more fine-tuning, but also requires more setup. +- A ConfiguredAssetDataConnector allows you to specify that you have multiple `DataAsset`s in a `Datasource`, but also requires an explicit listing of each `DataAsset` you want to connect to. This allows more fine-tuning, but also requires more setup. - An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. If you're not sure which one to use, please check out [How to choose which DataConnector to use](./how_to_choose_which_dataconnector_to_use.md). @@ -48,7 +48,7 @@ Choose a DataConnector ---------------------- ConfiguredAssetDataConnectors like `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector` require `DataAsset`s to be -explicitly named. Each `DataAsset` can have their own regex `pattern` and `group_names`, and if configured, will override any +explicitly named. Each `DataAsset` can have their own regex `pattern` and `group_names`, and if configured, will override any `pattern` or `group_names` under `default_regex`. 
Imagine you have the following files in `my_directory/`: From 9d9224062522c90df8de9fca7e7df3b154e5ff03 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 20 Oct 2021 10:02:05 -0400 Subject: [PATCH 16/62] Test datasets --- .../green/2018/tripdata-10.csv | 21 +++++++++++++++++++ .../green/2018/tripdata-11.csv | 21 +++++++++++++++++++ .../green/2018/tripdata-12.csv | 21 +++++++++++++++++++ .../green/2019/tripdata-01.csv | 21 +++++++++++++++++++ .../green/2019/tripdata-02.csv | 21 +++++++++++++++++++ .../green/2019/tripdata-03.csv | 21 +++++++++++++++++++ .../green/tripdata_2018-10.csv | 21 +++++++++++++++++++ .../green/tripdata_2018-11.csv | 21 +++++++++++++++++++ .../green/tripdata_2018-12.csv | 21 +++++++++++++++++++ .../green/tripdata_2019-01.csv | 21 +++++++++++++++++++ .../green/tripdata_2019-02.csv | 21 +++++++++++++++++++ .../green/tripdata_2019-03.csv | 21 +++++++++++++++++++ .../green_tripdata_2018-10.csv | 21 +++++++++++++++++++ .../green_tripdata_2018-11.csv | 21 +++++++++++++++++++ .../green_tripdata_2018-12.csv | 21 +++++++++++++++++++ .../green_tripdata_2019-01.csv | 21 +++++++++++++++++++ .../green_tripdata_2019-02.csv | 21 +++++++++++++++++++ .../green_tripdata_2019-03.csv | 21 +++++++++++++++++++ .../yellow/2018/10/tripdata.csv | 21 +++++++++++++++++++ .../yellow/2018/11/tripdata.csv | 21 +++++++++++++++++++ .../yellow/2018/12/tripdata.csv | 21 +++++++++++++++++++ .../yellow/2019/01/tripdata.csv | 21 +++++++++++++++++++ .../yellow/2019/02/tripdata.csv | 21 +++++++++++++++++++ .../yellow/2019/03/tripdata.csv | 21 +++++++++++++++++++ .../yellow/tripdata_2018-10.csv | 21 +++++++++++++++++++ .../yellow/tripdata_2018-11.csv | 21 +++++++++++++++++++ .../yellow/tripdata_2018-12.csv | 21 +++++++++++++++++++ .../yellow/tripdata_2019-01.csv | 21 +++++++++++++++++++ .../yellow/tripdata_2019-02.csv | 21 +++++++++++++++++++ .../yellow/tripdata_2019-03.csv | 21 +++++++++++++++++++ .../yellow_tripdata_2018-10.csv | 21 +++++++++++++++++++ 
.../yellow_tripdata_2018-11.csv | 21 +++++++++++++++++++ .../yellow_tripdata_2018-12.csv | 21 +++++++++++++++++++ .../yellow_tripdata_2019-01.csv | 21 +++++++++++++++++++ .../yellow_tripdata_2019-02.csv | 21 +++++++++++++++++++ .../yellow_tripdata_2019-03.csv | 21 +++++++++++++++++++ 36 files changed, 756 insertions(+) create mode 100644 tests/test_sets/dataconnector_docs/green/2018/tripdata-10.csv create mode 100644 tests/test_sets/dataconnector_docs/green/2018/tripdata-11.csv create mode 100644 tests/test_sets/dataconnector_docs/green/2018/tripdata-12.csv create mode 100644 tests/test_sets/dataconnector_docs/green/2019/tripdata-01.csv create mode 100644 tests/test_sets/dataconnector_docs/green/2019/tripdata-02.csv create mode 100644 tests/test_sets/dataconnector_docs/green/2019/tripdata-03.csv create mode 100644 tests/test_sets/dataconnector_docs/green/tripdata_2018-10.csv create mode 100644 tests/test_sets/dataconnector_docs/green/tripdata_2018-11.csv create mode 100644 tests/test_sets/dataconnector_docs/green/tripdata_2018-12.csv create mode 100644 tests/test_sets/dataconnector_docs/green/tripdata_2019-01.csv create mode 100644 tests/test_sets/dataconnector_docs/green/tripdata_2019-02.csv create mode 100644 tests/test_sets/dataconnector_docs/green/tripdata_2019-03.csv create mode 100644 tests/test_sets/dataconnector_docs/green_tripdata_2018-10.csv create mode 100644 tests/test_sets/dataconnector_docs/green_tripdata_2018-11.csv create mode 100644 tests/test_sets/dataconnector_docs/green_tripdata_2018-12.csv create mode 100644 tests/test_sets/dataconnector_docs/green_tripdata_2019-01.csv create mode 100644 tests/test_sets/dataconnector_docs/green_tripdata_2019-02.csv create mode 100644 tests/test_sets/dataconnector_docs/green_tripdata_2019-03.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow/2018/10/tripdata.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow/2018/11/tripdata.csv create mode 100644 
tests/test_sets/dataconnector_docs/yellow/2018/12/tripdata.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow/2019/01/tripdata.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow/2019/02/tripdata.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow/2019/03/tripdata.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow/tripdata_2018-10.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow/tripdata_2018-11.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow/tripdata_2018-12.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow/tripdata_2019-01.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow/tripdata_2019-02.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow/tripdata_2019-03.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow_tripdata_2018-10.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow_tripdata_2018-11.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow_tripdata_2018-12.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow_tripdata_2019-01.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow_tripdata_2019-02.csv create mode 100644 tests/test_sets/dataconnector_docs/yellow_tripdata_2019-03.csv diff --git a/tests/test_sets/dataconnector_docs/green/2018/tripdata-10.csv b/tests/test_sets/dataconnector_docs/green/2018/tripdata-10.csv new file mode 100644 index 000000000000..2b518c3e4a83 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green/2018/tripdata-10.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type +448278,2,2018-10-20 13:58:45,2018-10-20 14:05:35,N,1,112,256,1,1.03,6.5,0.0,0.5,0.0,0.0,,0.3,7.3,2,1.0 +520261,1,2018-10-23 17:48:10,2018-10-23 
17:55:35,Y,1,181,181,1,0.9,6.5,1.0,0.5,1.5,0.0,,0.3,9.8,1,1.0 +520094,2,2018-10-23 17:39:25,2018-10-23 18:42:28,N,1,75,133,1,14.6,48.5,1.0,0.5,11.21,5.76,,0.3,67.27,1,1.0 +465246,2,2018-10-21 03:41:56,2018-10-21 03:45:23,N,1,95,95,2,0.86,5.0,0.5,0.5,1.0,0.0,,0.3,7.3,1,1.0 +652784,2,2018-10-29 12:03:25,2018-10-29 12:14:19,N,1,18,31,1,2.24,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,2,1.0 +234842,2,2018-10-11 15:24:48,2018-10-11 15:47:51,N,1,197,95,1,3.42,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1.0 +199443,2,2018-10-09 22:19:35,2018-10-09 22:23:38,N,1,41,42,1,0.63,5.0,0.5,0.5,1.26,0.0,,0.3,7.56,1,1.0 +478271,2,2018-10-21 18:16:09,2018-10-21 18:25:49,N,1,74,263,1,2.25,9.0,0.0,0.5,1.96,0.0,,0.3,11.76,1,1.0 +480009,2,2018-10-21 19:32:30,2018-10-21 19:54:38,N,1,95,260,2,4.31,18.5,0.0,0.5,0.0,0.0,,0.3,19.3,1,1.0 +621419,2,2018-10-27 22:52:50,2018-10-27 22:55:15,N,1,136,136,1,0.01,3.5,0.5,0.5,0.0,0.0,,0.3,4.8,2,1.0 +60768,2,2018-10-03 19:47:13,2018-10-03 19:50:39,N,1,256,255,1,0.52,4.0,1.0,0.5,1.45,0.0,,0.3,7.25,1,1.0 +559361,2,2018-10-25 12:04:52,2018-10-25 12:39:25,N,5,65,76,1,5.8,21.86,0.0,0.5,0.0,0.0,,0.0,22.36,1,2.0 +226070,2,2018-10-11 08:52:56,2018-10-11 09:22:58,N,1,166,163,1,3.73,20.0,0.0,0.5,4.16,0.0,,0.3,24.96,1,1.0 +578687,2,2018-10-26 08:44:43,2018-10-26 09:20:34,N,1,49,114,5,3.58,23.0,0.0,0.5,4.76,0.0,,0.3,30.51,1,1.0 +133625,2,2018-10-06 19:59:11,2018-10-06 20:38:07,N,1,181,100,2,9.41,34.5,0.0,0.5,10.26,5.76,,0.3,51.32,1,1.0 +118040,2,2018-10-06 03:38:45,2018-10-07 03:18:41,N,1,256,256,1,0.65,4.5,0.5,0.5,0.0,0.0,,0.3,5.8,2,1.0 +199254,2,2018-10-09 22:23:05,2018-10-09 22:33:00,N,1,181,97,1,1.53,8.5,0.5,0.5,0.0,0.0,,0.3,9.8,2,1.0 +508650,2,2018-10-23 08:21:15,2018-10-23 08:40:55,N,1,166,142,5,2.41,14.0,0.0,0.5,1.0,0.0,,0.3,15.8,1,1.0 +597703,2,2018-10-26 21:14:29,2018-10-26 21:22:49,N,1,41,239,5,1.8,8.5,0.5,0.5,0.0,0.0,,0.3,9.8,2,1.0 +472658,2,2018-10-21 13:36:13,2018-10-21 13:41:32,N,1,7,179,1,0.97,5.5,0.0,0.5,0.0,0.0,,0.3,6.3,2,1.0 diff --git 
a/tests/test_sets/dataconnector_docs/green/2018/tripdata-11.csv b/tests/test_sets/dataconnector_docs/green/2018/tripdata-11.csv new file mode 100644 index 000000000000..f68bd7715093 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green/2018/tripdata-11.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type +206147,2,2018-11-09 21:58:22,2018-11-09 22:09:19,N,1,97,17,1,1.69,9.0,0.5,0.5,0.0,0.0,,0.3,10.3,2,1.0 +586398,2,2018-11-28 07:16:28,2018-11-28 07:40:30,N,1,81,250,1,4.39,20.5,0.0,0.5,0.0,0.0,,0.3,21.3,1,1.0 +410503,2,2018-11-19 08:36:21,2018-11-19 08:53:22,N,1,116,128,1,4.59,17.5,0.0,0.5,4.58,0.0,,0.3,22.88,1,1.0 +284524,2,2018-11-13 14:57:24,2018-11-13 15:23:48,N,1,53,121,1,5.38,21.5,0.0,0.5,0.0,0.0,,0.3,22.3,1,1.0 +652213,2,2018-11-30 20:18:38,2018-11-30 20:23:04,N,1,75,74,1,1.23,6.0,0.5,0.5,1.46,0.0,,0.3,8.76,1,1.0 +453632,2,2018-11-21 08:46:22,2018-11-21 09:12:23,N,5,17,35,1,3.68,16.39,0.0,0.5,0.0,0.0,,0.0,16.89,1,2.0 +514609,2,2018-11-24 14:03:10,2018-11-24 14:26:29,N,1,149,89,1,4.51,18.5,0.0,0.5,0.0,0.0,,0.3,19.3,1,1.0 +570411,2,2018-11-27 12:59:00,2018-11-27 13:03:56,N,1,25,25,1,0.79,5.5,0.0,0.5,1.26,0.0,,0.3,7.56,1,1.0 +328751,2,2018-11-15 12:26:07,2018-11-15 12:58:31,N,5,52,37,1,6.39,21.25,0.0,0.5,0.0,0.0,,0.0,21.75,1,2.0 +290145,2,2018-11-13 18:54:48,2018-11-13 19:04:14,N,1,74,41,1,1.15,7.5,1.0,0.5,1.86,0.0,,0.3,11.16,1,1.0 +273210,2,2018-11-12 23:48:01,2018-11-12 23:48:03,N,5,166,166,1,0.0,20.0,0.0,0.0,4.0,0.0,,0.0,24.0,1,2.0 +598576,2,2018-11-28 16:55:56,2018-11-28 17:03:05,N,1,41,74,2,0.71,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1.0 +19526,2,2018-11-01 19:20:33,2018-11-01 19:27:39,N,1,25,181,1,0.87,6.5,1.0,0.5,1.66,0.0,,0.3,9.96,1,1.0 +645647,2,2018-11-30 16:50:09,2018-11-30 
16:56:01,N,1,7,7,2,0.69,5.5,1.0,0.5,1.46,0.0,,0.3,10.71,1,1.0 +642343,2,2018-11-30 14:56:19,2018-11-30 15:06:20,N,1,33,97,1,1.08,7.5,0.0,0.5,1.66,0.0,,0.3,9.96,1,1.0 +284366,2,2018-11-13 14:23:08,2018-11-13 14:53:25,N,1,81,20,1,6.94,26.0,0.0,0.5,0.0,0.0,,0.3,26.8,1,1.0 +608380,2,2018-11-29 03:50:30,2018-11-29 03:55:17,N,1,74,42,1,0.98,5.5,0.5,0.5,2.04,0.0,,0.3,8.84,1,1.0 +131427,2,2018-11-06 17:46:21,2018-11-06 17:50:42,N,1,41,42,1,0.85,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1.0 +368687,2,2018-11-17 10:13:10,2018-11-17 10:27:14,N,1,95,82,1,1.65,10.5,0.0,0.5,0.0,0.0,,0.3,11.3,2,1.0 +13155,1,2018-11-01 15:40:11,2018-11-01 15:48:58,N,1,43,236,1,0.8,7.0,0.0,0.5,1.17,0.0,,0.3,8.97,1,1.0 diff --git a/tests/test_sets/dataconnector_docs/green/2018/tripdata-12.csv b/tests/test_sets/dataconnector_docs/green/2018/tripdata-12.csv new file mode 100644 index 000000000000..375b98f8a05a --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green/2018/tripdata-12.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type +644743,2,2018-12-29 22:07:39,2018-12-29 22:23:38,N,1,255,7,1,4.2,15.0,0.5,0.5,3.26,0.0,,0.3,19.56,1,1.0 +241539,1,2018-12-11 13:27:48,2018-12-11 14:01:25,N,1,244,87,1,13.4,40.5,0.0,0.5,8.25,0.0,,0.3,49.55,1,1.0 +519266,2,2018-12-22 23:20:48,2018-12-22 23:42:53,N,1,255,225,1,4.03,16.5,0.5,0.5,3.56,0.0,,0.3,21.36,1,1.0 +419676,2,2018-12-18 20:30:48,2018-12-18 20:42:37,N,1,166,116,1,2.59,11.0,0.5,0.5,0.0,0.0,,0.3,12.3,1,1.0 +110768,2,2018-12-05 21:20:39,2018-12-05 21:45:12,N,1,255,61,1,4.08,18.5,0.5,0.5,5.94,0.0,,0.3,25.74,1,1.0 +591680,2,2018-12-27 11:21:47,2018-12-27 12:21:54,N,1,50,9,1,15.01,56.0,0.0,0.5,0.0,5.76,,0.3,62.56,1,1.0 +532284,2,2018-12-23 16:30:52,2018-12-23 16:39:42,N,1,74,75,1,0.57,7.0,0.0,0.5,0.0,0.0,,0.3,7.8,2,1.0 +149369,2,2018-12-07 
14:18:11,2018-12-07 14:40:06,N,1,179,95,1,6.96,23.0,0.0,0.5,4.76,0.0,,0.3,28.56,1,1.0 +40899,2,2018-12-02 19:07:00,2018-12-02 19:17:53,N,1,97,49,1,1.74,9.0,0.0,0.5,2.45,0.0,,0.3,12.25,1,1.0 +341430,2,2018-12-15 14:19:06,2018-12-15 14:32:47,N,5,11,29,1,4.9,15.06,0.0,0.5,0.0,0.0,,0.0,15.56,1,2.0 +400460,2,2018-12-18 06:09:49,2018-12-18 06:27:36,N,5,14,231,1,7.75,24.39,0.0,0.5,0.0,5.76,,0.0,30.65,1,2.0 +320076,2,2018-12-14 18:00:27,2018-12-14 18:09:47,N,1,7,7,1,0.83,7.0,1.0,0.5,0.0,0.0,,0.3,8.8,2,1.0 +263463,2,2018-12-12 12:06:59,2018-12-12 12:17:05,N,1,260,223,1,3.48,12.5,0.0,0.5,0.0,0.0,,0.3,13.3,2,1.0 +245734,2,2018-12-11 16:20:31,2018-12-11 16:30:51,N,1,75,151,1,1.58,8.5,1.0,0.5,0.0,0.0,,0.3,10.3,2,1.0 +173368,2,2018-12-08 11:34:24,2018-12-08 11:53:37,N,1,181,61,1,3.79,15.5,0.0,0.5,0.0,0.0,,0.3,16.3,1,1.0 +37580,2,2018-12-02 15:23:58,2018-12-02 15:45:49,N,1,82,28,1,3.73,16.5,0.0,0.5,0.0,0.0,,0.3,17.3,1,1.0 +82903,2,2018-12-04 19:07:55,2018-12-04 19:32:12,N,5,242,167,1,4.84,19.86,0.0,0.5,0.0,0.0,,0.0,20.36,1,2.0 +531182,2,2018-12-23 15:43:36,2018-12-23 16:16:42,N,1,82,173,1,2.87,20.0,0.0,0.5,0.0,0.0,,0.3,20.8,2,1.0 +532295,2,2018-12-23 16:09:21,2018-12-23 16:15:12,N,1,181,181,1,0.64,5.5,0.0,0.5,0.0,0.0,,0.3,6.3,2,1.0 +112713,2,2018-12-05 23:07:24,2018-12-05 23:15:40,N,1,129,129,1,1.3,7.5,0.5,0.5,1.76,0.0,,0.3,10.56,1,1.0 diff --git a/tests/test_sets/dataconnector_docs/green/2019/tripdata-01.csv b/tests/test_sets/dataconnector_docs/green/2019/tripdata-01.csv new file mode 100644 index 000000000000..07a92dc26d64 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green/2019/tripdata-01.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +198236,2,2019-01-11 01:12:46,2019-01-11 
01:20:37,N,1,129,129,1,1.39,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1, +33286,1,2019-01-02 23:41:23,2019-01-02 23:49:14,N,1,255,112,1,1.5,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1, +517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1, +368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1, +155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1, +366694,2,2019-01-18 22:58:57,2019-01-18 23:15:37,N,1,75,185,1,8.71,24.5,0.5,0.5,0.0,0.0,,0.3,25.8,1,1, +474972,2,2019-01-24 17:59:00,2019-01-24 18:35:53,N,5,195,90,1,6.46,23.45,0.0,0.5,0.0,0.0,,0.0,23.95,1,2, +69932,1,2019-01-04 18:07:02,2019-01-04 18:34:37,N,1,97,35,2,4.7,20.0,1.0,0.5,0.0,0.0,,0.3,21.8,1,1, +244782,2,2019-01-13 02:16:18,2019-01-13 02:37:30,N,1,260,198,1,3.9,16.5,0.5,0.5,0.0,0.0,,0.3,17.8,2,1, +482363,2,2019-01-25 04:42:54,2019-01-25 04:47:06,N,1,82,56,1,0.74,5.0,0.5,0.5,0.0,0.0,,0.3,6.3,2,1, +573895,2,2019-01-29 11:36:48,2019-01-29 11:43:03,N,1,43,236,1,0.91,6.0,0.0,0.5,2.0,0.0,,0.3,8.8,1,1,0.0 +182986,2,2019-01-10 12:07:54,2019-01-10 12:19:31,N,1,29,155,1,5.49,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, +526993,1,2019-01-26 22:13:18,2019-01-26 22:13:51,N,1,33,33,1,0.1,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1, +490940,2,2019-01-25 13:09:50,2019-01-25 13:15:30,N,1,220,153,1,0.35,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,1,1, +145635,2,2019-01-08 16:01:58,2019-01-08 16:10:16,N,1,77,72,1,1.48,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,1,1, +242083,1,2019-01-12 23:43:28,2019-01-13 00:02:13,N,1,210,216,1,12.3,34.5,0.5,0.5,0.0,0.0,,0.3,35.8,2,1, +328924,2,2019-01-17 10:07:09,2019-01-17 10:25:59,N,1,32,242,1,4.71,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, +132260,1,2019-01-07 21:00:39,2019-01-07 21:18:13,N,1,52,188,1,3.6,14.5,0.5,0.5,0.0,0.0,,0.3,15.8,2,1, +568032,2,2019-01-29 07:01:03,2019-01-29 07:54:28,N,1,225,251,1,16.91,55.0,0.0,0.5,0.0,11.52,,0.3,67.32,1,1,0.0 +92205,2,2019-01-05 20:26:49,2019-01-05 
20:44:48,N,1,29,108,1,3.93,15.5,0.5,0.5,0.0,0.0,,0.3,16.8,1,1, diff --git a/tests/test_sets/dataconnector_docs/green/2019/tripdata-02.csv b/tests/test_sets/dataconnector_docs/green/2019/tripdata-02.csv new file mode 100644 index 000000000000..9a6442e61899 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green/2019/tripdata-02.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +504629,2,2019-02-25 16:05:49,2019-02-25 16:11:15,N,1,179,179,1,0.89,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,2,1,0.0 +475628,1,2019-02-23 23:18:58,2019-02-23 23:31:00,N,1,210,55,1,4.2,15.0,0.5,0.5,0.0,0.0,,0.3,16.3,2,1,0.0 +150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75 +245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0 +151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0 +18916,2,2019-02-01 19:54:28,2019-02-01 19:58:41,N,1,74,262,1,1.04,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,1,1,0.0 +110154,2,2019-02-06 08:33:30,2019-02-06 08:48:52,N,1,66,87,1,2.14,12.0,0.0,0.5,4.66,0.0,,0.3,20.21,1,1,2.75 +335114,2,2019-02-16 18:40:24,2019-02-16 18:46:47,N,1,74,75,1,0.89,6.0,0.0,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 +453776,2,2019-02-22 22:38:18,2019-02-22 22:49:20,N,1,116,41,1,1.86,10.0,0.5,0.5,2.26,0.0,,0.3,13.56,1,1,0.0 +192685,2,2019-02-09 19:56:13,2019-02-09 20:13:56,N,1,167,235,1,2.59,13.0,0.0,0.5,0.0,0.0,,0.3,13.8,1,1,0.0 +574982,2,2019-02-28 22:42:50,2019-02-28 22:47:40,N,1,42,116,1,0.91,5.5,0.5,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 +155855,2,2019-02-08 09:02:21,2019-02-08 10:17:23,N,5,210,161,1,20.74,59.86,0.0,0.5,0.0,0.0,,0.0,60.36,1,2,0.0 +130109,2,2019-02-07 05:18:25,2019-02-07 
05:24:30,N,1,62,188,4,1.54,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,1,1,0.0 +157145,2,2019-02-08 10:20:47,2019-02-08 10:47:36,N,1,74,31,2,7.73,25.5,0.0,0.5,5.26,0.0,,0.3,31.56,1,1,0.0 +411748,2,2019-02-20 22:31:32,2019-02-20 22:37:11,N,1,129,129,4,0.9,5.5,0.5,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 +470538,2,2019-02-23 18:15:38,2019-02-23 18:27:05,N,1,126,213,1,2.21,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,1,1,0.0 +262705,2,2019-02-13 13:07:02,2019-02-13 13:19:50,N,1,186,249,1,1.5,9.5,0.0,0.5,2.56,0.0,,0.3,15.36,1,1,2.5 +163160,2,2019-02-08 15:39:26,2019-02-08 16:06:31,N,1,33,17,1,2.9,17.5,0.0,0.5,0.0,0.0,,0.3,18.3,2,1,0.0 +263174,2,2019-02-13 15:06:57,2019-02-13 15:28:26,N,1,165,123,5,1.66,13.5,0.0,0.5,0.0,0.0,,0.3,14.3,1,1,0.0 +451270,2,2019-02-22 21:00:04,2019-02-22 21:00:24,N,1,82,82,1,0.06,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/green/2019/tripdata-03.csv b/tests/test_sets/dataconnector_docs/green/2019/tripdata-03.csv new file mode 100644 index 000000000000..5104e10f24c5 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green/2019/tripdata-03.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +337596,1,2019-03-18 08:20:31,2019-03-18 09:12:55,N,1,49,35,1,6.1,33.5,0.0,0.5,0.0,0.0,,0.3,34.3,1,1,0.0 +378981,2,2019-03-20 13:03:10,2019-03-20 13:16:30,N,1,152,41,1,1.14,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 +6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0 +76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75 +282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0 +439163,2,2019-03-23 12:07:55,2019-03-23 
12:11:20,N,1,166,166,1,0.75,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1,0.0 +518619,2,2019-03-27 18:33:46,2019-03-27 18:40:59,N,1,7,223,1,0.59,6.0,1.0,0.5,1.95,0.0,,0.3,9.75,1,1,0.0 +385766,2,2019-03-20 18:19:37,2019-03-20 18:28:31,N,1,129,129,1,1.21,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,2,1,0.0 +131793,2,2019-03-07 18:40:23,2019-03-07 18:58:26,N,1,51,32,1,3.35,14.0,1.0,0.5,0.0,0.0,,0.3,15.8,1,1,0.0 +203472,2,2019-03-11 12:46:20,2019-03-11 13:20:08,N,1,119,254,3,6.47,27.5,0.0,0.5,0.0,0.0,,0.3,28.3,1,1,0.0 +399106,2,2019-03-21 13:10:44,2019-03-21 13:21:19,N,1,74,75,1,1.27,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 +425811,2,2019-03-22 19:00:23,2019-03-22 19:02:51,N,1,95,95,1,0.51,4.0,1.0,0.5,2.0,0.0,,0.3,7.8,1,1,0.0 +36345,2,2019-03-02 19:27:41,2019-03-02 19:33:52,N,1,7,179,1,0.86,6.0,0.0,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 +323601,2,2019-03-17 10:19:28,2019-03-17 10:22:59,N,1,97,49,1,0.68,4.5,0.0,0.5,1.06,0.0,,0.3,6.36,1,1,0.0 +246980,2,2019-03-13 17:03:07,2019-03-13 17:12:04,N,1,75,263,2,1.32,8.0,1.0,0.5,2.51,0.0,,0.3,15.06,1,1,2.75 +269737,1,2019-03-14 18:03:35,2019-03-14 18:10:01,N,1,66,65,1,0.6,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1,0.0 +145416,2,2019-03-08 12:21:02,2019-03-08 12:22:57,N,1,75,75,1,0.5,3.5,0.0,0.5,0.0,0.0,,0.3,4.3,2,1,0.0 +142333,2,2019-03-08 09:39:09,2019-03-08 10:03:48,N,5,215,16,1,7.14,21.62,0.0,0.5,0.0,0.0,,0.0,22.12,1,2,0.0 +381567,2,2019-03-20 15:09:47,2019-03-20 15:22:12,N,1,89,188,1,1.75,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 +476898,1,2019-03-25 13:10:35,2019-03-25 13:20:23,N,1,244,243,1,1.1,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/green/tripdata_2018-10.csv b/tests/test_sets/dataconnector_docs/green/tripdata_2018-10.csv new file mode 100644 index 000000000000..2b518c3e4a83 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green/tripdata_2018-10.csv @@ -0,0 +1,21 @@ 
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type +448278,2,2018-10-20 13:58:45,2018-10-20 14:05:35,N,1,112,256,1,1.03,6.5,0.0,0.5,0.0,0.0,,0.3,7.3,2,1.0 +520261,1,2018-10-23 17:48:10,2018-10-23 17:55:35,Y,1,181,181,1,0.9,6.5,1.0,0.5,1.5,0.0,,0.3,9.8,1,1.0 +520094,2,2018-10-23 17:39:25,2018-10-23 18:42:28,N,1,75,133,1,14.6,48.5,1.0,0.5,11.21,5.76,,0.3,67.27,1,1.0 +465246,2,2018-10-21 03:41:56,2018-10-21 03:45:23,N,1,95,95,2,0.86,5.0,0.5,0.5,1.0,0.0,,0.3,7.3,1,1.0 +652784,2,2018-10-29 12:03:25,2018-10-29 12:14:19,N,1,18,31,1,2.24,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,2,1.0 +234842,2,2018-10-11 15:24:48,2018-10-11 15:47:51,N,1,197,95,1,3.42,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1.0 +199443,2,2018-10-09 22:19:35,2018-10-09 22:23:38,N,1,41,42,1,0.63,5.0,0.5,0.5,1.26,0.0,,0.3,7.56,1,1.0 +478271,2,2018-10-21 18:16:09,2018-10-21 18:25:49,N,1,74,263,1,2.25,9.0,0.0,0.5,1.96,0.0,,0.3,11.76,1,1.0 +480009,2,2018-10-21 19:32:30,2018-10-21 19:54:38,N,1,95,260,2,4.31,18.5,0.0,0.5,0.0,0.0,,0.3,19.3,1,1.0 +621419,2,2018-10-27 22:52:50,2018-10-27 22:55:15,N,1,136,136,1,0.01,3.5,0.5,0.5,0.0,0.0,,0.3,4.8,2,1.0 +60768,2,2018-10-03 19:47:13,2018-10-03 19:50:39,N,1,256,255,1,0.52,4.0,1.0,0.5,1.45,0.0,,0.3,7.25,1,1.0 +559361,2,2018-10-25 12:04:52,2018-10-25 12:39:25,N,5,65,76,1,5.8,21.86,0.0,0.5,0.0,0.0,,0.0,22.36,1,2.0 +226070,2,2018-10-11 08:52:56,2018-10-11 09:22:58,N,1,166,163,1,3.73,20.0,0.0,0.5,4.16,0.0,,0.3,24.96,1,1.0 +578687,2,2018-10-26 08:44:43,2018-10-26 09:20:34,N,1,49,114,5,3.58,23.0,0.0,0.5,4.76,0.0,,0.3,30.51,1,1.0 +133625,2,2018-10-06 19:59:11,2018-10-06 20:38:07,N,1,181,100,2,9.41,34.5,0.0,0.5,10.26,5.76,,0.3,51.32,1,1.0 +118040,2,2018-10-06 03:38:45,2018-10-07 03:18:41,N,1,256,256,1,0.65,4.5,0.5,0.5,0.0,0.0,,0.3,5.8,2,1.0 +199254,2,2018-10-09 22:23:05,2018-10-09 
22:33:00,N,1,181,97,1,1.53,8.5,0.5,0.5,0.0,0.0,,0.3,9.8,2,1.0 +508650,2,2018-10-23 08:21:15,2018-10-23 08:40:55,N,1,166,142,5,2.41,14.0,0.0,0.5,1.0,0.0,,0.3,15.8,1,1.0 +597703,2,2018-10-26 21:14:29,2018-10-26 21:22:49,N,1,41,239,5,1.8,8.5,0.5,0.5,0.0,0.0,,0.3,9.8,2,1.0 +472658,2,2018-10-21 13:36:13,2018-10-21 13:41:32,N,1,7,179,1,0.97,5.5,0.0,0.5,0.0,0.0,,0.3,6.3,2,1.0 diff --git a/tests/test_sets/dataconnector_docs/green/tripdata_2018-11.csv b/tests/test_sets/dataconnector_docs/green/tripdata_2018-11.csv new file mode 100644 index 000000000000..f68bd7715093 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green/tripdata_2018-11.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type +206147,2,2018-11-09 21:58:22,2018-11-09 22:09:19,N,1,97,17,1,1.69,9.0,0.5,0.5,0.0,0.0,,0.3,10.3,2,1.0 +586398,2,2018-11-28 07:16:28,2018-11-28 07:40:30,N,1,81,250,1,4.39,20.5,0.0,0.5,0.0,0.0,,0.3,21.3,1,1.0 +410503,2,2018-11-19 08:36:21,2018-11-19 08:53:22,N,1,116,128,1,4.59,17.5,0.0,0.5,4.58,0.0,,0.3,22.88,1,1.0 +284524,2,2018-11-13 14:57:24,2018-11-13 15:23:48,N,1,53,121,1,5.38,21.5,0.0,0.5,0.0,0.0,,0.3,22.3,1,1.0 +652213,2,2018-11-30 20:18:38,2018-11-30 20:23:04,N,1,75,74,1,1.23,6.0,0.5,0.5,1.46,0.0,,0.3,8.76,1,1.0 +453632,2,2018-11-21 08:46:22,2018-11-21 09:12:23,N,5,17,35,1,3.68,16.39,0.0,0.5,0.0,0.0,,0.0,16.89,1,2.0 +514609,2,2018-11-24 14:03:10,2018-11-24 14:26:29,N,1,149,89,1,4.51,18.5,0.0,0.5,0.0,0.0,,0.3,19.3,1,1.0 +570411,2,2018-11-27 12:59:00,2018-11-27 13:03:56,N,1,25,25,1,0.79,5.5,0.0,0.5,1.26,0.0,,0.3,7.56,1,1.0 +328751,2,2018-11-15 12:26:07,2018-11-15 12:58:31,N,5,52,37,1,6.39,21.25,0.0,0.5,0.0,0.0,,0.0,21.75,1,2.0 +290145,2,2018-11-13 18:54:48,2018-11-13 19:04:14,N,1,74,41,1,1.15,7.5,1.0,0.5,1.86,0.0,,0.3,11.16,1,1.0 +273210,2,2018-11-12 
23:48:01,2018-11-12 23:48:03,N,5,166,166,1,0.0,20.0,0.0,0.0,4.0,0.0,,0.0,24.0,1,2.0 +598576,2,2018-11-28 16:55:56,2018-11-28 17:03:05,N,1,41,74,2,0.71,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1.0 +19526,2,2018-11-01 19:20:33,2018-11-01 19:27:39,N,1,25,181,1,0.87,6.5,1.0,0.5,1.66,0.0,,0.3,9.96,1,1.0 +645647,2,2018-11-30 16:50:09,2018-11-30 16:56:01,N,1,7,7,2,0.69,5.5,1.0,0.5,1.46,0.0,,0.3,10.71,1,1.0 +642343,2,2018-11-30 14:56:19,2018-11-30 15:06:20,N,1,33,97,1,1.08,7.5,0.0,0.5,1.66,0.0,,0.3,9.96,1,1.0 +284366,2,2018-11-13 14:23:08,2018-11-13 14:53:25,N,1,81,20,1,6.94,26.0,0.0,0.5,0.0,0.0,,0.3,26.8,1,1.0 +608380,2,2018-11-29 03:50:30,2018-11-29 03:55:17,N,1,74,42,1,0.98,5.5,0.5,0.5,2.04,0.0,,0.3,8.84,1,1.0 +131427,2,2018-11-06 17:46:21,2018-11-06 17:50:42,N,1,41,42,1,0.85,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1.0 +368687,2,2018-11-17 10:13:10,2018-11-17 10:27:14,N,1,95,82,1,1.65,10.5,0.0,0.5,0.0,0.0,,0.3,11.3,2,1.0 +13155,1,2018-11-01 15:40:11,2018-11-01 15:48:58,N,1,43,236,1,0.8,7.0,0.0,0.5,1.17,0.0,,0.3,8.97,1,1.0 diff --git a/tests/test_sets/dataconnector_docs/green/tripdata_2018-12.csv b/tests/test_sets/dataconnector_docs/green/tripdata_2018-12.csv new file mode 100644 index 000000000000..375b98f8a05a --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green/tripdata_2018-12.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type +644743,2,2018-12-29 22:07:39,2018-12-29 22:23:38,N,1,255,7,1,4.2,15.0,0.5,0.5,3.26,0.0,,0.3,19.56,1,1.0 +241539,1,2018-12-11 13:27:48,2018-12-11 14:01:25,N,1,244,87,1,13.4,40.5,0.0,0.5,8.25,0.0,,0.3,49.55,1,1.0 +519266,2,2018-12-22 23:20:48,2018-12-22 23:42:53,N,1,255,225,1,4.03,16.5,0.5,0.5,3.56,0.0,,0.3,21.36,1,1.0 +419676,2,2018-12-18 20:30:48,2018-12-18 20:42:37,N,1,166,116,1,2.59,11.0,0.5,0.5,0.0,0.0,,0.3,12.3,1,1.0 
+110768,2,2018-12-05 21:20:39,2018-12-05 21:45:12,N,1,255,61,1,4.08,18.5,0.5,0.5,5.94,0.0,,0.3,25.74,1,1.0 +591680,2,2018-12-27 11:21:47,2018-12-27 12:21:54,N,1,50,9,1,15.01,56.0,0.0,0.5,0.0,5.76,,0.3,62.56,1,1.0 +532284,2,2018-12-23 16:30:52,2018-12-23 16:39:42,N,1,74,75,1,0.57,7.0,0.0,0.5,0.0,0.0,,0.3,7.8,2,1.0 +149369,2,2018-12-07 14:18:11,2018-12-07 14:40:06,N,1,179,95,1,6.96,23.0,0.0,0.5,4.76,0.0,,0.3,28.56,1,1.0 +40899,2,2018-12-02 19:07:00,2018-12-02 19:17:53,N,1,97,49,1,1.74,9.0,0.0,0.5,2.45,0.0,,0.3,12.25,1,1.0 +341430,2,2018-12-15 14:19:06,2018-12-15 14:32:47,N,5,11,29,1,4.9,15.06,0.0,0.5,0.0,0.0,,0.0,15.56,1,2.0 +400460,2,2018-12-18 06:09:49,2018-12-18 06:27:36,N,5,14,231,1,7.75,24.39,0.0,0.5,0.0,5.76,,0.0,30.65,1,2.0 +320076,2,2018-12-14 18:00:27,2018-12-14 18:09:47,N,1,7,7,1,0.83,7.0,1.0,0.5,0.0,0.0,,0.3,8.8,2,1.0 +263463,2,2018-12-12 12:06:59,2018-12-12 12:17:05,N,1,260,223,1,3.48,12.5,0.0,0.5,0.0,0.0,,0.3,13.3,2,1.0 +245734,2,2018-12-11 16:20:31,2018-12-11 16:30:51,N,1,75,151,1,1.58,8.5,1.0,0.5,0.0,0.0,,0.3,10.3,2,1.0 +173368,2,2018-12-08 11:34:24,2018-12-08 11:53:37,N,1,181,61,1,3.79,15.5,0.0,0.5,0.0,0.0,,0.3,16.3,1,1.0 +37580,2,2018-12-02 15:23:58,2018-12-02 15:45:49,N,1,82,28,1,3.73,16.5,0.0,0.5,0.0,0.0,,0.3,17.3,1,1.0 +82903,2,2018-12-04 19:07:55,2018-12-04 19:32:12,N,5,242,167,1,4.84,19.86,0.0,0.5,0.0,0.0,,0.0,20.36,1,2.0 +531182,2,2018-12-23 15:43:36,2018-12-23 16:16:42,N,1,82,173,1,2.87,20.0,0.0,0.5,0.0,0.0,,0.3,20.8,2,1.0 +532295,2,2018-12-23 16:09:21,2018-12-23 16:15:12,N,1,181,181,1,0.64,5.5,0.0,0.5,0.0,0.0,,0.3,6.3,2,1.0 +112713,2,2018-12-05 23:07:24,2018-12-05 23:15:40,N,1,129,129,1,1.3,7.5,0.5,0.5,1.76,0.0,,0.3,10.56,1,1.0 diff --git a/tests/test_sets/dataconnector_docs/green/tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/green/tripdata_2019-01.csv new file mode 100644 index 000000000000..07a92dc26d64 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green/tripdata_2019-01.csv @@ -0,0 +1,21 @@ 
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +198236,2,2019-01-11 01:12:46,2019-01-11 01:20:37,N,1,129,129,1,1.39,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1, +33286,1,2019-01-02 23:41:23,2019-01-02 23:49:14,N,1,255,112,1,1.5,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1, +517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1, +368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1, +155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1, +366694,2,2019-01-18 22:58:57,2019-01-18 23:15:37,N,1,75,185,1,8.71,24.5,0.5,0.5,0.0,0.0,,0.3,25.8,1,1, +474972,2,2019-01-24 17:59:00,2019-01-24 18:35:53,N,5,195,90,1,6.46,23.45,0.0,0.5,0.0,0.0,,0.0,23.95,1,2, +69932,1,2019-01-04 18:07:02,2019-01-04 18:34:37,N,1,97,35,2,4.7,20.0,1.0,0.5,0.0,0.0,,0.3,21.8,1,1, +244782,2,2019-01-13 02:16:18,2019-01-13 02:37:30,N,1,260,198,1,3.9,16.5,0.5,0.5,0.0,0.0,,0.3,17.8,2,1, +482363,2,2019-01-25 04:42:54,2019-01-25 04:47:06,N,1,82,56,1,0.74,5.0,0.5,0.5,0.0,0.0,,0.3,6.3,2,1, +573895,2,2019-01-29 11:36:48,2019-01-29 11:43:03,N,1,43,236,1,0.91,6.0,0.0,0.5,2.0,0.0,,0.3,8.8,1,1,0.0 +182986,2,2019-01-10 12:07:54,2019-01-10 12:19:31,N,1,29,155,1,5.49,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, +526993,1,2019-01-26 22:13:18,2019-01-26 22:13:51,N,1,33,33,1,0.1,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1, +490940,2,2019-01-25 13:09:50,2019-01-25 13:15:30,N,1,220,153,1,0.35,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,1,1, +145635,2,2019-01-08 16:01:58,2019-01-08 16:10:16,N,1,77,72,1,1.48,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,1,1, +242083,1,2019-01-12 23:43:28,2019-01-13 00:02:13,N,1,210,216,1,12.3,34.5,0.5,0.5,0.0,0.0,,0.3,35.8,2,1, +328924,2,2019-01-17 10:07:09,2019-01-17 
10:25:59,N,1,32,242,1,4.71,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, +132260,1,2019-01-07 21:00:39,2019-01-07 21:18:13,N,1,52,188,1,3.6,14.5,0.5,0.5,0.0,0.0,,0.3,15.8,2,1, +568032,2,2019-01-29 07:01:03,2019-01-29 07:54:28,N,1,225,251,1,16.91,55.0,0.0,0.5,0.0,11.52,,0.3,67.32,1,1,0.0 +92205,2,2019-01-05 20:26:49,2019-01-05 20:44:48,N,1,29,108,1,3.93,15.5,0.5,0.5,0.0,0.0,,0.3,16.8,1,1, diff --git a/tests/test_sets/dataconnector_docs/green/tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/green/tripdata_2019-02.csv new file mode 100644 index 000000000000..9a6442e61899 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green/tripdata_2019-02.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +504629,2,2019-02-25 16:05:49,2019-02-25 16:11:15,N,1,179,179,1,0.89,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,2,1,0.0 +475628,1,2019-02-23 23:18:58,2019-02-23 23:31:00,N,1,210,55,1,4.2,15.0,0.5,0.5,0.0,0.0,,0.3,16.3,2,1,0.0 +150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75 +245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0 +151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0 +18916,2,2019-02-01 19:54:28,2019-02-01 19:58:41,N,1,74,262,1,1.04,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,1,1,0.0 +110154,2,2019-02-06 08:33:30,2019-02-06 08:48:52,N,1,66,87,1,2.14,12.0,0.0,0.5,4.66,0.0,,0.3,20.21,1,1,2.75 +335114,2,2019-02-16 18:40:24,2019-02-16 18:46:47,N,1,74,75,1,0.89,6.0,0.0,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 +453776,2,2019-02-22 22:38:18,2019-02-22 22:49:20,N,1,116,41,1,1.86,10.0,0.5,0.5,2.26,0.0,,0.3,13.56,1,1,0.0 +192685,2,2019-02-09 19:56:13,2019-02-09 
20:13:56,N,1,167,235,1,2.59,13.0,0.0,0.5,0.0,0.0,,0.3,13.8,1,1,0.0 +574982,2,2019-02-28 22:42:50,2019-02-28 22:47:40,N,1,42,116,1,0.91,5.5,0.5,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 +155855,2,2019-02-08 09:02:21,2019-02-08 10:17:23,N,5,210,161,1,20.74,59.86,0.0,0.5,0.0,0.0,,0.0,60.36,1,2,0.0 +130109,2,2019-02-07 05:18:25,2019-02-07 05:24:30,N,1,62,188,4,1.54,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,1,1,0.0 +157145,2,2019-02-08 10:20:47,2019-02-08 10:47:36,N,1,74,31,2,7.73,25.5,0.0,0.5,5.26,0.0,,0.3,31.56,1,1,0.0 +411748,2,2019-02-20 22:31:32,2019-02-20 22:37:11,N,1,129,129,4,0.9,5.5,0.5,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 +470538,2,2019-02-23 18:15:38,2019-02-23 18:27:05,N,1,126,213,1,2.21,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,1,1,0.0 +262705,2,2019-02-13 13:07:02,2019-02-13 13:19:50,N,1,186,249,1,1.5,9.5,0.0,0.5,2.56,0.0,,0.3,15.36,1,1,2.5 +163160,2,2019-02-08 15:39:26,2019-02-08 16:06:31,N,1,33,17,1,2.9,17.5,0.0,0.5,0.0,0.0,,0.3,18.3,2,1,0.0 +263174,2,2019-02-13 15:06:57,2019-02-13 15:28:26,N,1,165,123,5,1.66,13.5,0.0,0.5,0.0,0.0,,0.3,14.3,1,1,0.0 +451270,2,2019-02-22 21:00:04,2019-02-22 21:00:24,N,1,82,82,1,0.06,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/green/tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/green/tripdata_2019-03.csv new file mode 100644 index 000000000000..5104e10f24c5 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green/tripdata_2019-03.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +337596,1,2019-03-18 08:20:31,2019-03-18 09:12:55,N,1,49,35,1,6.1,33.5,0.0,0.5,0.0,0.0,,0.3,34.3,1,1,0.0 +378981,2,2019-03-20 13:03:10,2019-03-20 13:16:30,N,1,152,41,1,1.14,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 +6063,2,2019-03-01 10:48:22,2019-03-01 
10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0 +76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75 +282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0 +439163,2,2019-03-23 12:07:55,2019-03-23 12:11:20,N,1,166,166,1,0.75,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1,0.0 +518619,2,2019-03-27 18:33:46,2019-03-27 18:40:59,N,1,7,223,1,0.59,6.0,1.0,0.5,1.95,0.0,,0.3,9.75,1,1,0.0 +385766,2,2019-03-20 18:19:37,2019-03-20 18:28:31,N,1,129,129,1,1.21,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,2,1,0.0 +131793,2,2019-03-07 18:40:23,2019-03-07 18:58:26,N,1,51,32,1,3.35,14.0,1.0,0.5,0.0,0.0,,0.3,15.8,1,1,0.0 +203472,2,2019-03-11 12:46:20,2019-03-11 13:20:08,N,1,119,254,3,6.47,27.5,0.0,0.5,0.0,0.0,,0.3,28.3,1,1,0.0 +399106,2,2019-03-21 13:10:44,2019-03-21 13:21:19,N,1,74,75,1,1.27,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 +425811,2,2019-03-22 19:00:23,2019-03-22 19:02:51,N,1,95,95,1,0.51,4.0,1.0,0.5,2.0,0.0,,0.3,7.8,1,1,0.0 +36345,2,2019-03-02 19:27:41,2019-03-02 19:33:52,N,1,7,179,1,0.86,6.0,0.0,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 +323601,2,2019-03-17 10:19:28,2019-03-17 10:22:59,N,1,97,49,1,0.68,4.5,0.0,0.5,1.06,0.0,,0.3,6.36,1,1,0.0 +246980,2,2019-03-13 17:03:07,2019-03-13 17:12:04,N,1,75,263,2,1.32,8.0,1.0,0.5,2.51,0.0,,0.3,15.06,1,1,2.75 +269737,1,2019-03-14 18:03:35,2019-03-14 18:10:01,N,1,66,65,1,0.6,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1,0.0 +145416,2,2019-03-08 12:21:02,2019-03-08 12:22:57,N,1,75,75,1,0.5,3.5,0.0,0.5,0.0,0.0,,0.3,4.3,2,1,0.0 +142333,2,2019-03-08 09:39:09,2019-03-08 10:03:48,N,5,215,16,1,7.14,21.62,0.0,0.5,0.0,0.0,,0.0,22.12,1,2,0.0 +381567,2,2019-03-20 15:09:47,2019-03-20 15:22:12,N,1,89,188,1,1.75,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 +476898,1,2019-03-25 13:10:35,2019-03-25 13:20:23,N,1,244,243,1,1.1,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/green_tripdata_2018-10.csv 
b/tests/test_sets/dataconnector_docs/green_tripdata_2018-10.csv new file mode 100644 index 000000000000..2b518c3e4a83 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green_tripdata_2018-10.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type +448278,2,2018-10-20 13:58:45,2018-10-20 14:05:35,N,1,112,256,1,1.03,6.5,0.0,0.5,0.0,0.0,,0.3,7.3,2,1.0 +520261,1,2018-10-23 17:48:10,2018-10-23 17:55:35,Y,1,181,181,1,0.9,6.5,1.0,0.5,1.5,0.0,,0.3,9.8,1,1.0 +520094,2,2018-10-23 17:39:25,2018-10-23 18:42:28,N,1,75,133,1,14.6,48.5,1.0,0.5,11.21,5.76,,0.3,67.27,1,1.0 +465246,2,2018-10-21 03:41:56,2018-10-21 03:45:23,N,1,95,95,2,0.86,5.0,0.5,0.5,1.0,0.0,,0.3,7.3,1,1.0 +652784,2,2018-10-29 12:03:25,2018-10-29 12:14:19,N,1,18,31,1,2.24,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,2,1.0 +234842,2,2018-10-11 15:24:48,2018-10-11 15:47:51,N,1,197,95,1,3.42,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1.0 +199443,2,2018-10-09 22:19:35,2018-10-09 22:23:38,N,1,41,42,1,0.63,5.0,0.5,0.5,1.26,0.0,,0.3,7.56,1,1.0 +478271,2,2018-10-21 18:16:09,2018-10-21 18:25:49,N,1,74,263,1,2.25,9.0,0.0,0.5,1.96,0.0,,0.3,11.76,1,1.0 +480009,2,2018-10-21 19:32:30,2018-10-21 19:54:38,N,1,95,260,2,4.31,18.5,0.0,0.5,0.0,0.0,,0.3,19.3,1,1.0 +621419,2,2018-10-27 22:52:50,2018-10-27 22:55:15,N,1,136,136,1,0.01,3.5,0.5,0.5,0.0,0.0,,0.3,4.8,2,1.0 +60768,2,2018-10-03 19:47:13,2018-10-03 19:50:39,N,1,256,255,1,0.52,4.0,1.0,0.5,1.45,0.0,,0.3,7.25,1,1.0 +559361,2,2018-10-25 12:04:52,2018-10-25 12:39:25,N,5,65,76,1,5.8,21.86,0.0,0.5,0.0,0.0,,0.0,22.36,1,2.0 +226070,2,2018-10-11 08:52:56,2018-10-11 09:22:58,N,1,166,163,1,3.73,20.0,0.0,0.5,4.16,0.0,,0.3,24.96,1,1.0 +578687,2,2018-10-26 08:44:43,2018-10-26 09:20:34,N,1,49,114,5,3.58,23.0,0.0,0.5,4.76,0.0,,0.3,30.51,1,1.0 +133625,2,2018-10-06 19:59:11,2018-10-06 
20:38:07,N,1,181,100,2,9.41,34.5,0.0,0.5,10.26,5.76,,0.3,51.32,1,1.0 +118040,2,2018-10-06 03:38:45,2018-10-07 03:18:41,N,1,256,256,1,0.65,4.5,0.5,0.5,0.0,0.0,,0.3,5.8,2,1.0 +199254,2,2018-10-09 22:23:05,2018-10-09 22:33:00,N,1,181,97,1,1.53,8.5,0.5,0.5,0.0,0.0,,0.3,9.8,2,1.0 +508650,2,2018-10-23 08:21:15,2018-10-23 08:40:55,N,1,166,142,5,2.41,14.0,0.0,0.5,1.0,0.0,,0.3,15.8,1,1.0 +597703,2,2018-10-26 21:14:29,2018-10-26 21:22:49,N,1,41,239,5,1.8,8.5,0.5,0.5,0.0,0.0,,0.3,9.8,2,1.0 +472658,2,2018-10-21 13:36:13,2018-10-21 13:41:32,N,1,7,179,1,0.97,5.5,0.0,0.5,0.0,0.0,,0.3,6.3,2,1.0 diff --git a/tests/test_sets/dataconnector_docs/green_tripdata_2018-11.csv b/tests/test_sets/dataconnector_docs/green_tripdata_2018-11.csv new file mode 100644 index 000000000000..f68bd7715093 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green_tripdata_2018-11.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type +206147,2,2018-11-09 21:58:22,2018-11-09 22:09:19,N,1,97,17,1,1.69,9.0,0.5,0.5,0.0,0.0,,0.3,10.3,2,1.0 +586398,2,2018-11-28 07:16:28,2018-11-28 07:40:30,N,1,81,250,1,4.39,20.5,0.0,0.5,0.0,0.0,,0.3,21.3,1,1.0 +410503,2,2018-11-19 08:36:21,2018-11-19 08:53:22,N,1,116,128,1,4.59,17.5,0.0,0.5,4.58,0.0,,0.3,22.88,1,1.0 +284524,2,2018-11-13 14:57:24,2018-11-13 15:23:48,N,1,53,121,1,5.38,21.5,0.0,0.5,0.0,0.0,,0.3,22.3,1,1.0 +652213,2,2018-11-30 20:18:38,2018-11-30 20:23:04,N,1,75,74,1,1.23,6.0,0.5,0.5,1.46,0.0,,0.3,8.76,1,1.0 +453632,2,2018-11-21 08:46:22,2018-11-21 09:12:23,N,5,17,35,1,3.68,16.39,0.0,0.5,0.0,0.0,,0.0,16.89,1,2.0 +514609,2,2018-11-24 14:03:10,2018-11-24 14:26:29,N,1,149,89,1,4.51,18.5,0.0,0.5,0.0,0.0,,0.3,19.3,1,1.0 +570411,2,2018-11-27 12:59:00,2018-11-27 13:03:56,N,1,25,25,1,0.79,5.5,0.0,0.5,1.26,0.0,,0.3,7.56,1,1.0 +328751,2,2018-11-15 
12:26:07,2018-11-15 12:58:31,N,5,52,37,1,6.39,21.25,0.0,0.5,0.0,0.0,,0.0,21.75,1,2.0 +290145,2,2018-11-13 18:54:48,2018-11-13 19:04:14,N,1,74,41,1,1.15,7.5,1.0,0.5,1.86,0.0,,0.3,11.16,1,1.0 +273210,2,2018-11-12 23:48:01,2018-11-12 23:48:03,N,5,166,166,1,0.0,20.0,0.0,0.0,4.0,0.0,,0.0,24.0,1,2.0 +598576,2,2018-11-28 16:55:56,2018-11-28 17:03:05,N,1,41,74,2,0.71,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1.0 +19526,2,2018-11-01 19:20:33,2018-11-01 19:27:39,N,1,25,181,1,0.87,6.5,1.0,0.5,1.66,0.0,,0.3,9.96,1,1.0 +645647,2,2018-11-30 16:50:09,2018-11-30 16:56:01,N,1,7,7,2,0.69,5.5,1.0,0.5,1.46,0.0,,0.3,10.71,1,1.0 +642343,2,2018-11-30 14:56:19,2018-11-30 15:06:20,N,1,33,97,1,1.08,7.5,0.0,0.5,1.66,0.0,,0.3,9.96,1,1.0 +284366,2,2018-11-13 14:23:08,2018-11-13 14:53:25,N,1,81,20,1,6.94,26.0,0.0,0.5,0.0,0.0,,0.3,26.8,1,1.0 +608380,2,2018-11-29 03:50:30,2018-11-29 03:55:17,N,1,74,42,1,0.98,5.5,0.5,0.5,2.04,0.0,,0.3,8.84,1,1.0 +131427,2,2018-11-06 17:46:21,2018-11-06 17:50:42,N,1,41,42,1,0.85,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1.0 +368687,2,2018-11-17 10:13:10,2018-11-17 10:27:14,N,1,95,82,1,1.65,10.5,0.0,0.5,0.0,0.0,,0.3,11.3,2,1.0 +13155,1,2018-11-01 15:40:11,2018-11-01 15:48:58,N,1,43,236,1,0.8,7.0,0.0,0.5,1.17,0.0,,0.3,8.97,1,1.0 diff --git a/tests/test_sets/dataconnector_docs/green_tripdata_2018-12.csv b/tests/test_sets/dataconnector_docs/green_tripdata_2018-12.csv new file mode 100644 index 000000000000..375b98f8a05a --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green_tripdata_2018-12.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type +644743,2,2018-12-29 22:07:39,2018-12-29 22:23:38,N,1,255,7,1,4.2,15.0,0.5,0.5,3.26,0.0,,0.3,19.56,1,1.0 +241539,1,2018-12-11 13:27:48,2018-12-11 14:01:25,N,1,244,87,1,13.4,40.5,0.0,0.5,8.25,0.0,,0.3,49.55,1,1.0 
+519266,2,2018-12-22 23:20:48,2018-12-22 23:42:53,N,1,255,225,1,4.03,16.5,0.5,0.5,3.56,0.0,,0.3,21.36,1,1.0 +419676,2,2018-12-18 20:30:48,2018-12-18 20:42:37,N,1,166,116,1,2.59,11.0,0.5,0.5,0.0,0.0,,0.3,12.3,1,1.0 +110768,2,2018-12-05 21:20:39,2018-12-05 21:45:12,N,1,255,61,1,4.08,18.5,0.5,0.5,5.94,0.0,,0.3,25.74,1,1.0 +591680,2,2018-12-27 11:21:47,2018-12-27 12:21:54,N,1,50,9,1,15.01,56.0,0.0,0.5,0.0,5.76,,0.3,62.56,1,1.0 +532284,2,2018-12-23 16:30:52,2018-12-23 16:39:42,N,1,74,75,1,0.57,7.0,0.0,0.5,0.0,0.0,,0.3,7.8,2,1.0 +149369,2,2018-12-07 14:18:11,2018-12-07 14:40:06,N,1,179,95,1,6.96,23.0,0.0,0.5,4.76,0.0,,0.3,28.56,1,1.0 +40899,2,2018-12-02 19:07:00,2018-12-02 19:17:53,N,1,97,49,1,1.74,9.0,0.0,0.5,2.45,0.0,,0.3,12.25,1,1.0 +341430,2,2018-12-15 14:19:06,2018-12-15 14:32:47,N,5,11,29,1,4.9,15.06,0.0,0.5,0.0,0.0,,0.0,15.56,1,2.0 +400460,2,2018-12-18 06:09:49,2018-12-18 06:27:36,N,5,14,231,1,7.75,24.39,0.0,0.5,0.0,5.76,,0.0,30.65,1,2.0 +320076,2,2018-12-14 18:00:27,2018-12-14 18:09:47,N,1,7,7,1,0.83,7.0,1.0,0.5,0.0,0.0,,0.3,8.8,2,1.0 +263463,2,2018-12-12 12:06:59,2018-12-12 12:17:05,N,1,260,223,1,3.48,12.5,0.0,0.5,0.0,0.0,,0.3,13.3,2,1.0 +245734,2,2018-12-11 16:20:31,2018-12-11 16:30:51,N,1,75,151,1,1.58,8.5,1.0,0.5,0.0,0.0,,0.3,10.3,2,1.0 +173368,2,2018-12-08 11:34:24,2018-12-08 11:53:37,N,1,181,61,1,3.79,15.5,0.0,0.5,0.0,0.0,,0.3,16.3,1,1.0 +37580,2,2018-12-02 15:23:58,2018-12-02 15:45:49,N,1,82,28,1,3.73,16.5,0.0,0.5,0.0,0.0,,0.3,17.3,1,1.0 +82903,2,2018-12-04 19:07:55,2018-12-04 19:32:12,N,5,242,167,1,4.84,19.86,0.0,0.5,0.0,0.0,,0.0,20.36,1,2.0 +531182,2,2018-12-23 15:43:36,2018-12-23 16:16:42,N,1,82,173,1,2.87,20.0,0.0,0.5,0.0,0.0,,0.3,20.8,2,1.0 +532295,2,2018-12-23 16:09:21,2018-12-23 16:15:12,N,1,181,181,1,0.64,5.5,0.0,0.5,0.0,0.0,,0.3,6.3,2,1.0 +112713,2,2018-12-05 23:07:24,2018-12-05 23:15:40,N,1,129,129,1,1.3,7.5,0.5,0.5,1.76,0.0,,0.3,10.56,1,1.0 diff --git a/tests/test_sets/dataconnector_docs/green_tripdata_2019-01.csv 
b/tests/test_sets/dataconnector_docs/green_tripdata_2019-01.csv new file mode 100644 index 000000000000..07a92dc26d64 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green_tripdata_2019-01.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +198236,2,2019-01-11 01:12:46,2019-01-11 01:20:37,N,1,129,129,1,1.39,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1, +33286,1,2019-01-02 23:41:23,2019-01-02 23:49:14,N,1,255,112,1,1.5,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1, +517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1, +368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1, +155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1, +366694,2,2019-01-18 22:58:57,2019-01-18 23:15:37,N,1,75,185,1,8.71,24.5,0.5,0.5,0.0,0.0,,0.3,25.8,1,1, +474972,2,2019-01-24 17:59:00,2019-01-24 18:35:53,N,5,195,90,1,6.46,23.45,0.0,0.5,0.0,0.0,,0.0,23.95,1,2, +69932,1,2019-01-04 18:07:02,2019-01-04 18:34:37,N,1,97,35,2,4.7,20.0,1.0,0.5,0.0,0.0,,0.3,21.8,1,1, +244782,2,2019-01-13 02:16:18,2019-01-13 02:37:30,N,1,260,198,1,3.9,16.5,0.5,0.5,0.0,0.0,,0.3,17.8,2,1, +482363,2,2019-01-25 04:42:54,2019-01-25 04:47:06,N,1,82,56,1,0.74,5.0,0.5,0.5,0.0,0.0,,0.3,6.3,2,1, +573895,2,2019-01-29 11:36:48,2019-01-29 11:43:03,N,1,43,236,1,0.91,6.0,0.0,0.5,2.0,0.0,,0.3,8.8,1,1,0.0 +182986,2,2019-01-10 12:07:54,2019-01-10 12:19:31,N,1,29,155,1,5.49,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, +526993,1,2019-01-26 22:13:18,2019-01-26 22:13:51,N,1,33,33,1,0.1,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1, +490940,2,2019-01-25 13:09:50,2019-01-25 13:15:30,N,1,220,153,1,0.35,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,1,1, +145635,2,2019-01-08 16:01:58,2019-01-08 
16:10:16,N,1,77,72,1,1.48,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,1,1, +242083,1,2019-01-12 23:43:28,2019-01-13 00:02:13,N,1,210,216,1,12.3,34.5,0.5,0.5,0.0,0.0,,0.3,35.8,2,1, +328924,2,2019-01-17 10:07:09,2019-01-17 10:25:59,N,1,32,242,1,4.71,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, +132260,1,2019-01-07 21:00:39,2019-01-07 21:18:13,N,1,52,188,1,3.6,14.5,0.5,0.5,0.0,0.0,,0.3,15.8,2,1, +568032,2,2019-01-29 07:01:03,2019-01-29 07:54:28,N,1,225,251,1,16.91,55.0,0.0,0.5,0.0,11.52,,0.3,67.32,1,1,0.0 +92205,2,2019-01-05 20:26:49,2019-01-05 20:44:48,N,1,29,108,1,3.93,15.5,0.5,0.5,0.0,0.0,,0.3,16.8,1,1, diff --git a/tests/test_sets/dataconnector_docs/green_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/green_tripdata_2019-02.csv new file mode 100644 index 000000000000..9a6442e61899 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green_tripdata_2019-02.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +504629,2,2019-02-25 16:05:49,2019-02-25 16:11:15,N,1,179,179,1,0.89,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,2,1,0.0 +475628,1,2019-02-23 23:18:58,2019-02-23 23:31:00,N,1,210,55,1,4.2,15.0,0.5,0.5,0.0,0.0,,0.3,16.3,2,1,0.0 +150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75 +245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0 +151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0 +18916,2,2019-02-01 19:54:28,2019-02-01 19:58:41,N,1,74,262,1,1.04,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,1,1,0.0 +110154,2,2019-02-06 08:33:30,2019-02-06 08:48:52,N,1,66,87,1,2.14,12.0,0.0,0.5,4.66,0.0,,0.3,20.21,1,1,2.75 +335114,2,2019-02-16 18:40:24,2019-02-16 
18:46:47,N,1,74,75,1,0.89,6.0,0.0,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 +453776,2,2019-02-22 22:38:18,2019-02-22 22:49:20,N,1,116,41,1,1.86,10.0,0.5,0.5,2.26,0.0,,0.3,13.56,1,1,0.0 +192685,2,2019-02-09 19:56:13,2019-02-09 20:13:56,N,1,167,235,1,2.59,13.0,0.0,0.5,0.0,0.0,,0.3,13.8,1,1,0.0 +574982,2,2019-02-28 22:42:50,2019-02-28 22:47:40,N,1,42,116,1,0.91,5.5,0.5,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 +155855,2,2019-02-08 09:02:21,2019-02-08 10:17:23,N,5,210,161,1,20.74,59.86,0.0,0.5,0.0,0.0,,0.0,60.36,1,2,0.0 +130109,2,2019-02-07 05:18:25,2019-02-07 05:24:30,N,1,62,188,4,1.54,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,1,1,0.0 +157145,2,2019-02-08 10:20:47,2019-02-08 10:47:36,N,1,74,31,2,7.73,25.5,0.0,0.5,5.26,0.0,,0.3,31.56,1,1,0.0 +411748,2,2019-02-20 22:31:32,2019-02-20 22:37:11,N,1,129,129,4,0.9,5.5,0.5,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 +470538,2,2019-02-23 18:15:38,2019-02-23 18:27:05,N,1,126,213,1,2.21,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,1,1,0.0 +262705,2,2019-02-13 13:07:02,2019-02-13 13:19:50,N,1,186,249,1,1.5,9.5,0.0,0.5,2.56,0.0,,0.3,15.36,1,1,2.5 +163160,2,2019-02-08 15:39:26,2019-02-08 16:06:31,N,1,33,17,1,2.9,17.5,0.0,0.5,0.0,0.0,,0.3,18.3,2,1,0.0 +263174,2,2019-02-13 15:06:57,2019-02-13 15:28:26,N,1,165,123,5,1.66,13.5,0.0,0.5,0.0,0.0,,0.3,14.3,1,1,0.0 +451270,2,2019-02-22 21:00:04,2019-02-22 21:00:24,N,1,82,82,1,0.06,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/green_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/green_tripdata_2019-03.csv new file mode 100644 index 000000000000..5104e10f24c5 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/green_tripdata_2019-03.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +337596,1,2019-03-18 08:20:31,2019-03-18 
09:12:55,N,1,49,35,1,6.1,33.5,0.0,0.5,0.0,0.0,,0.3,34.3,1,1,0.0 +378981,2,2019-03-20 13:03:10,2019-03-20 13:16:30,N,1,152,41,1,1.14,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 +6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0 +76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75 +282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0 +439163,2,2019-03-23 12:07:55,2019-03-23 12:11:20,N,1,166,166,1,0.75,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1,0.0 +518619,2,2019-03-27 18:33:46,2019-03-27 18:40:59,N,1,7,223,1,0.59,6.0,1.0,0.5,1.95,0.0,,0.3,9.75,1,1,0.0 +385766,2,2019-03-20 18:19:37,2019-03-20 18:28:31,N,1,129,129,1,1.21,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,2,1,0.0 +131793,2,2019-03-07 18:40:23,2019-03-07 18:58:26,N,1,51,32,1,3.35,14.0,1.0,0.5,0.0,0.0,,0.3,15.8,1,1,0.0 +203472,2,2019-03-11 12:46:20,2019-03-11 13:20:08,N,1,119,254,3,6.47,27.5,0.0,0.5,0.0,0.0,,0.3,28.3,1,1,0.0 +399106,2,2019-03-21 13:10:44,2019-03-21 13:21:19,N,1,74,75,1,1.27,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 +425811,2,2019-03-22 19:00:23,2019-03-22 19:02:51,N,1,95,95,1,0.51,4.0,1.0,0.5,2.0,0.0,,0.3,7.8,1,1,0.0 +36345,2,2019-03-02 19:27:41,2019-03-02 19:33:52,N,1,7,179,1,0.86,6.0,0.0,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 +323601,2,2019-03-17 10:19:28,2019-03-17 10:22:59,N,1,97,49,1,0.68,4.5,0.0,0.5,1.06,0.0,,0.3,6.36,1,1,0.0 +246980,2,2019-03-13 17:03:07,2019-03-13 17:12:04,N,1,75,263,2,1.32,8.0,1.0,0.5,2.51,0.0,,0.3,15.06,1,1,2.75 +269737,1,2019-03-14 18:03:35,2019-03-14 18:10:01,N,1,66,65,1,0.6,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1,0.0 +145416,2,2019-03-08 12:21:02,2019-03-08 12:22:57,N,1,75,75,1,0.5,3.5,0.0,0.5,0.0,0.0,,0.3,4.3,2,1,0.0 +142333,2,2019-03-08 09:39:09,2019-03-08 10:03:48,N,5,215,16,1,7.14,21.62,0.0,0.5,0.0,0.0,,0.0,22.12,1,2,0.0 +381567,2,2019-03-20 15:09:47,2019-03-20 15:22:12,N,1,89,188,1,1.75,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 +476898,1,2019-03-25 
13:10:35,2019-03-25 13:20:23,N,1,244,243,1,1.1,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/yellow/2018/10/tripdata.csv b/tests/test_sets/dataconnector_docs/yellow/2018/10/tripdata.csv new file mode 100644 index 000000000000..0ce6520c5822 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow/2018/10/tripdata.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +6984,2,2018-10-30 10:59:02,2018-10-30 11:04:30,1,0.62,1,N,48,163,1,5.5,0.0,0.5,1.26,0.0,0.3,7.56, +3030,2,2018-10-03 19:43:48,2018-10-03 20:01:51,1,3.93,1,N,137,239,1,15.0,1.0,0.5,2.5,0.0,0.3,19.3, +9267,1,2018-10-16 07:10:28,2018-10-16 07:19:21,1,1.4,1,N,142,236,1,8.0,0.0,0.5,0.88,0.0,0.3,9.68, +4342,1,2018-10-10 10:19:16,2018-10-10 10:33:52,1,2.1,1,N,142,50,2,11.5,0.0,0.5,0.0,0.0,0.3,12.3, +8099,2,2018-10-01 19:18:44,2018-10-01 19:31:48,1,2.51,1,N,140,239,1,11.0,1.0,0.5,3.2,0.0,0.3,16.0, +8972,2,2018-10-19 01:40:24,2018-10-19 01:51:42,2,2.54,1,N,249,164,2,10.5,0.5,0.5,0.0,0.0,0.3,11.8, +240,1,2018-10-11 19:54:10,2018-10-11 20:19:34,1,5.2,1,N,231,232,1,22.0,1.0,0.5,4.76,0.0,0.3,28.56, +7844,2,2018-10-30 08:58:08,2018-10-30 09:05:09,1,1.08,1,N,249,211,1,6.5,0.0,0.5,1.0,0.0,0.3,8.3, +7024,1,2018-10-11 04:56:32,2018-10-11 05:14:29,1,8.0,1,N,263,138,1,25.0,0.5,0.5,8.0,5.76,0.3,40.06, +7601,2,2018-10-27 00:44:06,2018-10-27 00:57:52,1,2.18,1,N,113,100,1,11.0,0.5,0.5,2.46,0.0,0.3,14.76, +7686,2,2018-10-01 17:13:29,2018-10-01 17:16:10,5,0.36,1,N,263,236,1,3.5,1.0,0.5,1.59,0.0,0.3,6.89, +1344,1,2018-10-03 20:31:19,2018-10-03 21:11:37,1,20.2,3,N,236,1,1,73.5,0.5,0.0,18.35,17.5,0.3,110.15, +2539,2,2018-10-24 00:46:47,2018-10-24 01:07:38,1,14.1,1,N,132,210,1,39.0,0.5,0.5,5.0,0.0,0.3,45.3, +2758,1,2018-10-17 22:25:54,2018-10-17 
22:42:38,1,6.2,1,N,237,87,1,20.0,0.5,0.5,2.0,0.0,0.3,23.3, +567,1,2018-10-22 15:57:44,2018-10-22 16:25:52,1,3.1,1,N,264,264,1,18.5,0.0,0.5,3.85,0.0,0.3,23.15, +1994,1,2018-10-28 13:36:15,2018-10-28 13:47:38,1,1.8,1,N,79,232,1,10.0,0.0,0.5,2.15,0.0,0.3,12.95, +3549,2,2018-10-25 21:00:53,2018-10-25 21:17:23,1,2.58,1,N,170,48,1,12.0,0.5,0.5,1.0,0.0,0.3,14.3, +3867,2,2018-10-16 13:26:54,2018-10-16 13:51:57,3,1.8,1,N,230,158,1,16.0,0.0,0.5,3.36,0.0,0.3,20.16, +864,2,2018-10-20 10:53:46,2018-10-20 11:03:28,1,1.22,1,N,262,75,1,8.0,0.0,0.5,2.64,0.0,0.3,11.44, +9457,1,2018-10-01 18:19:51,2018-10-01 18:39:05,1,2.6,1,N,144,186,1,14.0,1.0,0.5,3.15,0.0,0.3,18.95, diff --git a/tests/test_sets/dataconnector_docs/yellow/2018/11/tripdata.csv b/tests/test_sets/dataconnector_docs/yellow/2018/11/tripdata.csv new file mode 100644 index 000000000000..87ff7e28e6fa --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow/2018/11/tripdata.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +7201,2,2018-11-03 20:17:25,2018-11-03 20:25:26,1,1.15,1,N,166,238,1,7.5,0.5,0.5,2.2,0.0,0.3,11.0, +2578,2,2018-11-14 09:03:52,2018-11-14 09:21:39,1,0.9,1,N,230,230,1,11.5,0.0,0.5,1.0,0.0,0.3,13.3, +9216,1,2018-11-16 13:12:09,2018-11-16 13:24:41,1,1.5,1,N,238,143,2,9.5,0.0,0.5,0.0,0.0,0.3,10.3, +5518,1,2018-11-15 20:44:20,2018-11-15 20:58:41,1,1.7,1,N,236,151,1,11.0,0.5,0.5,0.0,0.0,0.3,12.3, +104,2,2018-11-07 00:01:14,2018-11-07 00:05:56,5,0.85,1,N,234,170,1,5.5,0.5,0.5,1.36,0.0,0.3,8.16, +536,2,2018-11-11 12:43:04,2018-11-11 12:59:33,1,2.11,1,N,233,114,1,11.5,0.0,0.5,2.46,0.0,0.3,14.76, +2167,2,2018-11-15 14:14:45,2018-11-15 14:24:39,2,0.8,1,N,142,230,1,7.5,0.0,0.5,2.08,0.0,0.3,10.38, +5875,4,2018-11-07 07:24:55,2018-11-07 
07:30:28,1,0.81,1,N,164,233,2,5.5,0.0,0.5,0.0,0.0,0.3,6.3, +8196,2,2018-11-05 13:46:25,2018-11-05 13:47:12,3,0.08,1,N,236,236,1,2.5,0.0,0.5,0.66,0.0,0.3,3.96, +8175,1,2018-11-13 20:46:11,2018-11-13 20:50:29,1,0.6,1,N,107,137,2,5.0,0.5,0.5,0.0,0.0,0.3,6.3, +6314,2,2018-11-25 19:36:38,2018-11-25 19:41:30,1,1.77,1,N,263,74,2,7.0,0.0,0.5,0.0,0.0,0.3,7.8, +7700,2,2018-11-18 21:33:49,2018-11-18 21:46:58,2,2.76,1,N,163,24,1,12.0,0.5,0.5,3.32,0.0,0.3,16.62, +9062,1,2018-11-03 18:39:31,2018-11-03 18:49:25,1,0.8,1,N,164,230,2,7.5,0.0,0.5,0.0,0.0,0.3,8.3, +6701,1,2018-11-20 06:12:00,2018-11-20 06:19:35,1,2.2,1,N,237,75,2,8.5,0.0,0.5,0.0,0.0,0.3,9.3, +399,2,2018-11-13 21:20:51,2018-11-13 21:34:45,1,1.18,1,N,162,230,1,9.5,0.5,0.5,5.0,0.0,0.3,15.8, +2745,2,2018-11-03 00:07:35,2018-11-03 00:29:31,1,2.55,1,N,68,4,2,15.0,0.5,0.5,0.0,0.0,0.3,16.3, +5363,2,2018-11-23 20:16:12,2018-11-23 20:20:46,1,1.05,1,N,237,162,2,5.5,0.5,0.5,0.0,0.0,0.3,6.8, +383,1,2018-11-10 22:31:50,2018-11-10 22:47:48,1,1.1,1,N,161,141,2,10.5,0.5,0.5,0.0,0.0,0.3,11.8, +1537,2,2018-11-17 01:05:40,2018-11-17 01:18:09,1,1.62,1,N,114,79,1,9.5,0.5,0.5,2.16,0.0,0.3,12.96, +1760,2,2018-11-01 13:52:35,2018-11-01 13:59:37,1,0.5,1,N,230,162,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8, diff --git a/tests/test_sets/dataconnector_docs/yellow/2018/12/tripdata.csv b/tests/test_sets/dataconnector_docs/yellow/2018/12/tripdata.csv new file mode 100644 index 000000000000..50eb34d13f83 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow/2018/12/tripdata.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +6701,1,2018-12-07 22:07:55,2018-12-07 22:35:18,1,3.5,1,N,237,249,1,18.5,0.5,0.5,2.0,0.0,0.3,21.8, +9645,2,2018-12-10 18:21:08,2018-12-10 18:33:12,1,1.38,1,N,114,158,1,9.0,1.0,0.5,2.16,0.0,0.3,12.96, 
+4105,2,2018-12-15 23:27:23,2018-12-15 23:44:02,1,2.62,1,N,232,137,2,13.0,0.5,0.5,0.0,0.0,0.3,14.3, +2743,2,2018-12-12 19:54:39,2018-12-12 19:57:53,1,0.41,1,N,237,237,1,4.0,1.0,0.5,1.16,0.0,0.3,6.96, +4126,1,2018-12-21 18:57:55,2018-12-21 19:09:35,1,1.0,1,N,249,79,1,8.5,1.0,0.5,2.05,0.0,0.3,12.35, +1566,1,2018-12-16 15:38:10,2018-12-16 15:55:05,3,1.4,1,N,236,141,1,11.5,0.0,0.5,1.85,0.0,0.3,14.15, +4857,2,2018-12-06 17:26:50,2018-12-06 17:39:34,2,0.93,1,N,142,239,1,9.0,1.0,0.5,2.16,0.0,0.3,12.96, +304,1,2018-12-27 16:00:34,2018-12-27 16:34:26,2,5.7,1,N,163,209,2,24.0,1.0,0.5,0.0,0.0,0.3,25.8, +8159,2,2018-12-05 18:32:06,2018-12-05 18:41:03,1,1.38,1,N,68,90,1,8.0,1.0,0.5,2.94,0.0,0.3,12.74, +6575,4,2018-12-30 23:53:05,2018-12-30 23:55:40,1,0.39,1,N,256,256,1,4.0,0.5,0.5,1.06,0.0,0.3,6.36, +8327,2,2018-12-14 18:42:59,2018-12-14 18:51:49,1,1.26,1,N,163,236,1,8.0,1.0,0.5,1.5,0.0,0.3,11.3, +5245,1,2018-12-23 16:00:21,2018-12-23 16:12:44,1,1.6,1,N,141,142,1,10.0,0.0,0.5,2.15,0.0,0.3,12.95, +3521,2,2018-12-05 00:12:07,2018-12-05 00:30:25,2,3.15,1,N,164,45,1,14.0,0.5,0.5,3.06,0.0,0.3,18.36, +9442,2,2018-12-14 08:58:16,2018-12-14 09:06:42,1,0.67,1,N,239,238,1,6.0,0.0,0.5,0.8,0.0,0.3,7.6, +922,1,2018-12-09 03:45:39,2018-12-09 03:52:09,1,1.8,1,N,90,170,1,7.5,0.5,0.5,1.0,0.0,0.3,9.8, +807,2,2018-12-26 17:30:45,2018-12-26 17:53:04,1,5.34,1,N,164,261,2,19.0,1.0,0.5,0.0,0.0,0.3,20.8, +6354,2,2018-12-31 09:38:01,2018-12-31 09:46:53,1,2.2,1,N,24,236,1,9.5,0.0,0.5,2.0,0.0,0.3,12.3, +7329,2,2018-12-09 02:54:29,2018-12-09 03:00:14,5,1.19,1,N,164,246,1,6.5,0.5,0.5,1.95,0.0,0.3,9.75, +6227,1,2018-12-01 18:32:30,2018-12-01 19:06:38,1,4.2,1,N,90,263,1,22.0,0.0,0.5,4.55,0.0,0.3,27.35, +9796,1,2018-12-17 18:18:12,2018-12-17 18:32:05,1,1.4,1,N,68,107,1,10.0,1.0,0.5,1.2,0.0,0.3,13.0, diff --git a/tests/test_sets/dataconnector_docs/yellow/2019/01/tripdata.csv b/tests/test_sets/dataconnector_docs/yellow/2019/01/tripdata.csv new file mode 100644 index 000000000000..288e8ac8a023 --- /dev/null +++ 
b/tests/test_sets/dataconnector_docs/yellow/2019/01/tripdata.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +4565,2,2019-01-15 02:04:57,2019-01-15 02:17:53,1,4.92,1,N,255,226,1,16.5,0.5,0.5,3.56,0.0,0.3,21.36, +714,2,2019-01-08 00:38:49,2019-01-08 00:52:30,1,7.07,1,N,132,197,1,21.0,0.5,0.5,4.46,0.0,0.3,26.76, +2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0 +5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95, +4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56, +1505,2,2019-01-29 17:58:08,2019-01-29 18:11:19,1,0.72,1,N,13,13,2,9.0,1.0,0.5,0.0,0.0,0.3,10.8,0.0 +8131,1,2019-01-13 16:22:03,2019-01-13 16:33:06,2,1.4,1,N,161,170,1,8.5,0.0,0.5,3.0,0.0,0.3,12.3, +9601,2,2019-01-23 12:07:55,2019-01-23 12:11:19,5,0.57,1,N,239,239,1,4.0,0.0,0.5,1.0,0.0,0.3,5.8,0.0 +4522,2,2019-01-25 15:42:39,2019-01-25 15:50:54,1,1.26,1,N,170,229,1,7.5,0.0,0.5,1.0,0.0,0.3,9.3,0.0 +9008,1,2019-01-13 19:40:27,2019-01-13 19:50:54,3,2.1,1,N,239,263,1,9.5,0.0,0.5,2.05,0.0,0.3,12.35, +7012,2,2019-01-19 20:44:35,2019-01-19 21:05:13,1,3.16,1,N,142,107,1,14.5,0.5,0.5,3.16,0.0,0.3,18.96, +7316,2,2019-01-07 22:38:10,2019-01-07 22:44:31,2,2.4,1,N,233,263,1,8.5,0.5,0.5,1.47,0.0,0.3,11.27, +6384,2,2019-01-25 07:35:59,2019-01-25 07:57:44,1,5.12,1,N,143,231,1,18.5,0.0,0.5,2.5,0.0,0.3,21.8,0.0 +3652,1,2019-01-26 20:43:57,2019-01-26 21:03:51,1,4.4,1,N,13,230,2,17.0,0.5,0.5,0.0,0.0,0.3,18.3,0.0 +6409,2,2019-01-06 22:27:16,2019-01-06 22:33:58,1,1.52,1,N,161,234,2,7.0,0.5,0.5,0.0,0.0,0.3,8.3, +1130,2,2019-01-15 20:05:59,2019-01-15 20:22:11,1,3.96,1,N,233,41,2,14.5,0.5,0.5,0.0,0.0,0.3,15.8, +5618,2,2019-01-14 14:47:18,2019-01-14 
14:54:44,1,0.5,1,N,140,237,1,6.0,0.0,0.5,1.0,0.0,0.3,7.8, +8515,1,2019-01-16 16:26:17,2019-01-16 16:41:36,2,1.4,1,N,233,163,1,10.5,1.0,0.5,2.45,0.0,0.3,14.75, +5397,1,2019-01-23 08:31:58,2019-01-23 08:55:43,1,2.6,1,N,246,163,1,16.0,0.0,0.5,5.0,0.0,0.3,21.8,0.0 +4648,1,2019-01-30 19:59:30,2019-01-30 20:10:34,1,1.9,1,N,186,79,1,9.0,0.5,0.5,2.05,0.0,0.3,12.35,0.0 diff --git a/tests/test_sets/dataconnector_docs/yellow/2019/02/tripdata.csv b/tests/test_sets/dataconnector_docs/yellow/2019/02/tripdata.csv new file mode 100644 index 000000000000..573017273621 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow/2019/02/tripdata.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +9873,2,2019-02-14 14:06:26,2019-02-14 14:24:36,2,1.24,1,N,68,161,2,11.5,0.0,0.5,0.0,0.0,0.3,14.8,2.5 +9976,1,2019-02-08 14:20:27,2019-02-08 14:25:30,0,0.7,1,N,186,249,2,5.0,2.5,0.5,0.0,0.0,0.3,8.3,2.5 +4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5 +1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5 +6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5 +1633,2,2019-02-09 12:14:12,2019-02-10 11:50:00,3,3.02,1,N,48,236,1,12.5,0.0,0.5,1.58,0.0,0.3,17.38,2.5 +4450,1,2019-02-05 08:56:21,2019-02-05 09:02:06,1,0.8,1,N,262,75,1,5.5,2.5,0.5,1.7,0.0,0.3,10.5,2.5 +8929,2,2019-02-13 11:44:00,2019-02-13 11:57:32,1,1.13,1,N,100,170,2,9.0,0.0,0.5,0.0,0.0,0.3,12.3,2.5 +6968,2,2019-02-27 20:52:49,2019-02-27 21:18:37,1,4.64,1,N,246,125,1,18.5,0.5,0.5,4.46,0.0,0.3,26.76,2.5 +7969,2,2019-02-01 11:12:35,2019-02-01 11:19:39,1,0.54,1,N,236,263,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8,0.0 +7695,1,2019-02-28 00:51:50,2019-02-28 
00:55:37,1,1.3,1,N,263,74,2,6.0,0.5,0.5,0.0,0.0,0.3,7.3,0.0 +9904,1,2019-02-16 17:45:00,2019-02-16 17:51:09,1,1.3,1,N,170,162,1,6.5,2.5,0.5,1.95,0.0,0.3,11.75,2.5 +1580,2,2019-02-07 10:14:39,2019-02-07 10:51:38,1,17.49,2,N,186,132,2,52.0,0.0,0.5,0.0,5.76,0.3,61.06,2.5 +699,2,2019-02-18 11:01:16,2019-02-18 11:11:06,2,2.31,1,N,249,170,1,10.0,0.0,0.5,2.66,0.0,0.3,15.96,2.5 +5181,2,2019-02-28 22:39:19,2019-02-28 22:42:13,1,0.49,1,N,237,162,1,4.0,0.5,0.5,1.56,0.0,0.3,9.36,2.5 +2631,2,2019-02-01 00:55:00,2019-02-01 00:55:20,1,0.0,5,N,265,265,1,78.0,0.0,0.0,10.0,10.5,0.3,98.8,0.0 +4018,2,2019-02-11 16:59:03,2019-02-11 17:05:01,1,1.35,1,N,143,238,1,6.5,1.0,0.5,2.16,0.0,0.3,12.96,2.5 +7582,2,2019-02-08 17:29:00,2019-02-08 17:31:14,1,0.47,1,N,236,236,2,3.5,1.0,0.5,0.0,0.0,0.3,7.8,2.5 +1810,1,2019-02-17 13:10:12,2019-02-17 13:16:03,1,0.5,1,N,161,230,2,5.5,2.5,0.5,0.0,0.0,0.3,8.8,2.5 +5095,2,2019-02-09 14:30:05,2019-02-09 14:42:22,0,1.18,1,N,170,234,1,9.0,0.0,0.5,2.46,0.0,0.3,14.76,2.5 diff --git a/tests/test_sets/dataconnector_docs/yellow/2019/03/tripdata.csv b/tests/test_sets/dataconnector_docs/yellow/2019/03/tripdata.csv new file mode 100644 index 000000000000..3d254ce261c2 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow/2019/03/tripdata.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +7797,2,2019-03-20 15:40:46,2019-03-20 16:15:38,1,5.8,1,N,164,13,1,26.0,0.0,0.5,5.86,0.0,0.3,35.16,2.5 +671,1,2019-03-14 17:21:03,2019-03-14 17:29:45,1,1.5,1,N,239,166,1,8.0,3.5,0.5,2.45,0.0,0.3,14.75,2.5 +7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0 +9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5 +2927,2,2019-03-03 19:54:42,2019-03-03 
20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0 +5030,2,2019-03-13 20:22:28,2019-03-13 20:42:11,2,9.81,1,N,138,244,1,28.0,0.5,0.5,7.01,5.76,0.3,42.07,0.0 +7848,4,2019-03-21 13:22:08,2019-03-21 13:31:36,1,0.86,1,N,229,229,1,7.5,0.0,0.5,2.16,0.0,0.3,12.96,2.5 +167,1,2019-03-15 22:37:06,2019-03-15 22:38:34,1,0.0,1,N,264,145,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0 +7197,2,2019-03-14 20:38:43,2019-03-14 20:45:04,5,1.04,1,N,151,166,1,6.0,0.5,0.5,1.1,0.0,0.3,8.4,0.0 +568,1,2019-03-09 09:29:44,2019-03-09 09:40:36,0,2.7,1,N,24,142,1,10.5,2.5,0.5,2.75,0.0,0.3,16.55,2.5 +2198,2,2019-03-11 18:32:36,2019-03-11 18:48:27,1,2.02,1,N,234,162,2,11.5,1.0,0.5,0.0,0.0,0.3,15.8,2.5 +5249,1,2019-03-15 09:35:22,2019-03-15 09:46:33,1,2.4,1,N,140,161,1,10.0,2.5,0.5,1.33,0.0,0.3,14.63,2.5 +5531,1,2019-03-04 19:44:33,2019-03-04 20:14:34,0,15.8,1,N,132,145,1,43.5,1.0,0.5,11.3,0.0,0.3,56.6,0.0 +4943,1,2019-03-03 15:12:19,2019-03-03 15:16:06,1,1.2,1,N,237,236,1,5.5,2.5,0.5,1.1,0.0,0.3,9.9,2.5 +617,2,2019-03-20 14:48:47,2019-03-20 14:52:43,1,0.59,1,N,142,163,1,4.5,0.0,0.5,1.0,0.0,0.3,8.8,2.5 +7014,2,2019-03-03 23:56:36,2019-03-04 00:22:25,1,7.91,1,N,132,258,2,25.0,0.5,0.5,0.0,0.0,0.3,26.3,0.0 +2138,2,2019-03-23 14:47:31,2019-03-23 14:54:11,2,1.12,1,N,234,161,2,6.5,0.0,0.5,0.0,0.0,0.3,9.8,2.5 +5298,2,2019-03-20 15:47:08,2019-03-20 15:51:15,1,0.54,1,N,161,161,1,4.5,0.0,0.5,1.56,0.0,0.3,9.36,2.5 +9838,1,2019-03-13 20:42:59,2019-03-13 20:50:23,1,1.1,1,N,162,48,1,6.5,3.0,0.5,2.05,0.0,0.3,12.35,2.5 +2565,1,2019-03-13 10:28:36,2019-03-13 10:28:38,1,0.0,1,N,246,246,2,2.5,2.5,0.5,0.0,0.0,0.3,5.8,2.5 diff --git a/tests/test_sets/dataconnector_docs/yellow/tripdata_2018-10.csv b/tests/test_sets/dataconnector_docs/yellow/tripdata_2018-10.csv new file mode 100644 index 000000000000..0ce6520c5822 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow/tripdata_2018-10.csv @@ -0,0 +1,21 @@ 
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +6984,2,2018-10-30 10:59:02,2018-10-30 11:04:30,1,0.62,1,N,48,163,1,5.5,0.0,0.5,1.26,0.0,0.3,7.56, +3030,2,2018-10-03 19:43:48,2018-10-03 20:01:51,1,3.93,1,N,137,239,1,15.0,1.0,0.5,2.5,0.0,0.3,19.3, +9267,1,2018-10-16 07:10:28,2018-10-16 07:19:21,1,1.4,1,N,142,236,1,8.0,0.0,0.5,0.88,0.0,0.3,9.68, +4342,1,2018-10-10 10:19:16,2018-10-10 10:33:52,1,2.1,1,N,142,50,2,11.5,0.0,0.5,0.0,0.0,0.3,12.3, +8099,2,2018-10-01 19:18:44,2018-10-01 19:31:48,1,2.51,1,N,140,239,1,11.0,1.0,0.5,3.2,0.0,0.3,16.0, +8972,2,2018-10-19 01:40:24,2018-10-19 01:51:42,2,2.54,1,N,249,164,2,10.5,0.5,0.5,0.0,0.0,0.3,11.8, +240,1,2018-10-11 19:54:10,2018-10-11 20:19:34,1,5.2,1,N,231,232,1,22.0,1.0,0.5,4.76,0.0,0.3,28.56, +7844,2,2018-10-30 08:58:08,2018-10-30 09:05:09,1,1.08,1,N,249,211,1,6.5,0.0,0.5,1.0,0.0,0.3,8.3, +7024,1,2018-10-11 04:56:32,2018-10-11 05:14:29,1,8.0,1,N,263,138,1,25.0,0.5,0.5,8.0,5.76,0.3,40.06, +7601,2,2018-10-27 00:44:06,2018-10-27 00:57:52,1,2.18,1,N,113,100,1,11.0,0.5,0.5,2.46,0.0,0.3,14.76, +7686,2,2018-10-01 17:13:29,2018-10-01 17:16:10,5,0.36,1,N,263,236,1,3.5,1.0,0.5,1.59,0.0,0.3,6.89, +1344,1,2018-10-03 20:31:19,2018-10-03 21:11:37,1,20.2,3,N,236,1,1,73.5,0.5,0.0,18.35,17.5,0.3,110.15, +2539,2,2018-10-24 00:46:47,2018-10-24 01:07:38,1,14.1,1,N,132,210,1,39.0,0.5,0.5,5.0,0.0,0.3,45.3, +2758,1,2018-10-17 22:25:54,2018-10-17 22:42:38,1,6.2,1,N,237,87,1,20.0,0.5,0.5,2.0,0.0,0.3,23.3, +567,1,2018-10-22 15:57:44,2018-10-22 16:25:52,1,3.1,1,N,264,264,1,18.5,0.0,0.5,3.85,0.0,0.3,23.15, +1994,1,2018-10-28 13:36:15,2018-10-28 13:47:38,1,1.8,1,N,79,232,1,10.0,0.0,0.5,2.15,0.0,0.3,12.95, +3549,2,2018-10-25 21:00:53,2018-10-25 21:17:23,1,2.58,1,N,170,48,1,12.0,0.5,0.5,1.0,0.0,0.3,14.3, +3867,2,2018-10-16 
13:26:54,2018-10-16 13:51:57,3,1.8,1,N,230,158,1,16.0,0.0,0.5,3.36,0.0,0.3,20.16, +864,2,2018-10-20 10:53:46,2018-10-20 11:03:28,1,1.22,1,N,262,75,1,8.0,0.0,0.5,2.64,0.0,0.3,11.44, +9457,1,2018-10-01 18:19:51,2018-10-01 18:39:05,1,2.6,1,N,144,186,1,14.0,1.0,0.5,3.15,0.0,0.3,18.95, diff --git a/tests/test_sets/dataconnector_docs/yellow/tripdata_2018-11.csv b/tests/test_sets/dataconnector_docs/yellow/tripdata_2018-11.csv new file mode 100644 index 000000000000..87ff7e28e6fa --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow/tripdata_2018-11.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +7201,2,2018-11-03 20:17:25,2018-11-03 20:25:26,1,1.15,1,N,166,238,1,7.5,0.5,0.5,2.2,0.0,0.3,11.0, +2578,2,2018-11-14 09:03:52,2018-11-14 09:21:39,1,0.9,1,N,230,230,1,11.5,0.0,0.5,1.0,0.0,0.3,13.3, +9216,1,2018-11-16 13:12:09,2018-11-16 13:24:41,1,1.5,1,N,238,143,2,9.5,0.0,0.5,0.0,0.0,0.3,10.3, +5518,1,2018-11-15 20:44:20,2018-11-15 20:58:41,1,1.7,1,N,236,151,1,11.0,0.5,0.5,0.0,0.0,0.3,12.3, +104,2,2018-11-07 00:01:14,2018-11-07 00:05:56,5,0.85,1,N,234,170,1,5.5,0.5,0.5,1.36,0.0,0.3,8.16, +536,2,2018-11-11 12:43:04,2018-11-11 12:59:33,1,2.11,1,N,233,114,1,11.5,0.0,0.5,2.46,0.0,0.3,14.76, +2167,2,2018-11-15 14:14:45,2018-11-15 14:24:39,2,0.8,1,N,142,230,1,7.5,0.0,0.5,2.08,0.0,0.3,10.38, +5875,4,2018-11-07 07:24:55,2018-11-07 07:30:28,1,0.81,1,N,164,233,2,5.5,0.0,0.5,0.0,0.0,0.3,6.3, +8196,2,2018-11-05 13:46:25,2018-11-05 13:47:12,3,0.08,1,N,236,236,1,2.5,0.0,0.5,0.66,0.0,0.3,3.96, +8175,1,2018-11-13 20:46:11,2018-11-13 20:50:29,1,0.6,1,N,107,137,2,5.0,0.5,0.5,0.0,0.0,0.3,6.3, +6314,2,2018-11-25 19:36:38,2018-11-25 19:41:30,1,1.77,1,N,263,74,2,7.0,0.0,0.5,0.0,0.0,0.3,7.8, +7700,2,2018-11-18 21:33:49,2018-11-18 
21:46:58,2,2.76,1,N,163,24,1,12.0,0.5,0.5,3.32,0.0,0.3,16.62, +9062,1,2018-11-03 18:39:31,2018-11-03 18:49:25,1,0.8,1,N,164,230,2,7.5,0.0,0.5,0.0,0.0,0.3,8.3, +6701,1,2018-11-20 06:12:00,2018-11-20 06:19:35,1,2.2,1,N,237,75,2,8.5,0.0,0.5,0.0,0.0,0.3,9.3, +399,2,2018-11-13 21:20:51,2018-11-13 21:34:45,1,1.18,1,N,162,230,1,9.5,0.5,0.5,5.0,0.0,0.3,15.8, +2745,2,2018-11-03 00:07:35,2018-11-03 00:29:31,1,2.55,1,N,68,4,2,15.0,0.5,0.5,0.0,0.0,0.3,16.3, +5363,2,2018-11-23 20:16:12,2018-11-23 20:20:46,1,1.05,1,N,237,162,2,5.5,0.5,0.5,0.0,0.0,0.3,6.8, +383,1,2018-11-10 22:31:50,2018-11-10 22:47:48,1,1.1,1,N,161,141,2,10.5,0.5,0.5,0.0,0.0,0.3,11.8, +1537,2,2018-11-17 01:05:40,2018-11-17 01:18:09,1,1.62,1,N,114,79,1,9.5,0.5,0.5,2.16,0.0,0.3,12.96, +1760,2,2018-11-01 13:52:35,2018-11-01 13:59:37,1,0.5,1,N,230,162,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8, diff --git a/tests/test_sets/dataconnector_docs/yellow/tripdata_2018-12.csv b/tests/test_sets/dataconnector_docs/yellow/tripdata_2018-12.csv new file mode 100644 index 000000000000..50eb34d13f83 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow/tripdata_2018-12.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +6701,1,2018-12-07 22:07:55,2018-12-07 22:35:18,1,3.5,1,N,237,249,1,18.5,0.5,0.5,2.0,0.0,0.3,21.8, +9645,2,2018-12-10 18:21:08,2018-12-10 18:33:12,1,1.38,1,N,114,158,1,9.0,1.0,0.5,2.16,0.0,0.3,12.96, +4105,2,2018-12-15 23:27:23,2018-12-15 23:44:02,1,2.62,1,N,232,137,2,13.0,0.5,0.5,0.0,0.0,0.3,14.3, +2743,2,2018-12-12 19:54:39,2018-12-12 19:57:53,1,0.41,1,N,237,237,1,4.0,1.0,0.5,1.16,0.0,0.3,6.96, +4126,1,2018-12-21 18:57:55,2018-12-21 19:09:35,1,1.0,1,N,249,79,1,8.5,1.0,0.5,2.05,0.0,0.3,12.35, +1566,1,2018-12-16 15:38:10,2018-12-16 
15:55:05,3,1.4,1,N,236,141,1,11.5,0.0,0.5,1.85,0.0,0.3,14.15, +4857,2,2018-12-06 17:26:50,2018-12-06 17:39:34,2,0.93,1,N,142,239,1,9.0,1.0,0.5,2.16,0.0,0.3,12.96, +304,1,2018-12-27 16:00:34,2018-12-27 16:34:26,2,5.7,1,N,163,209,2,24.0,1.0,0.5,0.0,0.0,0.3,25.8, +8159,2,2018-12-05 18:32:06,2018-12-05 18:41:03,1,1.38,1,N,68,90,1,8.0,1.0,0.5,2.94,0.0,0.3,12.74, +6575,4,2018-12-30 23:53:05,2018-12-30 23:55:40,1,0.39,1,N,256,256,1,4.0,0.5,0.5,1.06,0.0,0.3,6.36, +8327,2,2018-12-14 18:42:59,2018-12-14 18:51:49,1,1.26,1,N,163,236,1,8.0,1.0,0.5,1.5,0.0,0.3,11.3, +5245,1,2018-12-23 16:00:21,2018-12-23 16:12:44,1,1.6,1,N,141,142,1,10.0,0.0,0.5,2.15,0.0,0.3,12.95, +3521,2,2018-12-05 00:12:07,2018-12-05 00:30:25,2,3.15,1,N,164,45,1,14.0,0.5,0.5,3.06,0.0,0.3,18.36, +9442,2,2018-12-14 08:58:16,2018-12-14 09:06:42,1,0.67,1,N,239,238,1,6.0,0.0,0.5,0.8,0.0,0.3,7.6, +922,1,2018-12-09 03:45:39,2018-12-09 03:52:09,1,1.8,1,N,90,170,1,7.5,0.5,0.5,1.0,0.0,0.3,9.8, +807,2,2018-12-26 17:30:45,2018-12-26 17:53:04,1,5.34,1,N,164,261,2,19.0,1.0,0.5,0.0,0.0,0.3,20.8, +6354,2,2018-12-31 09:38:01,2018-12-31 09:46:53,1,2.2,1,N,24,236,1,9.5,0.0,0.5,2.0,0.0,0.3,12.3, +7329,2,2018-12-09 02:54:29,2018-12-09 03:00:14,5,1.19,1,N,164,246,1,6.5,0.5,0.5,1.95,0.0,0.3,9.75, +6227,1,2018-12-01 18:32:30,2018-12-01 19:06:38,1,4.2,1,N,90,263,1,22.0,0.0,0.5,4.55,0.0,0.3,27.35, +9796,1,2018-12-17 18:18:12,2018-12-17 18:32:05,1,1.4,1,N,68,107,1,10.0,1.0,0.5,1.2,0.0,0.3,13.0, diff --git a/tests/test_sets/dataconnector_docs/yellow/tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/yellow/tripdata_2019-01.csv new file mode 100644 index 000000000000..288e8ac8a023 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow/tripdata_2019-01.csv @@ -0,0 +1,21 @@ 
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +4565,2,2019-01-15 02:04:57,2019-01-15 02:17:53,1,4.92,1,N,255,226,1,16.5,0.5,0.5,3.56,0.0,0.3,21.36, +714,2,2019-01-08 00:38:49,2019-01-08 00:52:30,1,7.07,1,N,132,197,1,21.0,0.5,0.5,4.46,0.0,0.3,26.76, +2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0 +5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95, +4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56, +1505,2,2019-01-29 17:58:08,2019-01-29 18:11:19,1,0.72,1,N,13,13,2,9.0,1.0,0.5,0.0,0.0,0.3,10.8,0.0 +8131,1,2019-01-13 16:22:03,2019-01-13 16:33:06,2,1.4,1,N,161,170,1,8.5,0.0,0.5,3.0,0.0,0.3,12.3, +9601,2,2019-01-23 12:07:55,2019-01-23 12:11:19,5,0.57,1,N,239,239,1,4.0,0.0,0.5,1.0,0.0,0.3,5.8,0.0 +4522,2,2019-01-25 15:42:39,2019-01-25 15:50:54,1,1.26,1,N,170,229,1,7.5,0.0,0.5,1.0,0.0,0.3,9.3,0.0 +9008,1,2019-01-13 19:40:27,2019-01-13 19:50:54,3,2.1,1,N,239,263,1,9.5,0.0,0.5,2.05,0.0,0.3,12.35, +7012,2,2019-01-19 20:44:35,2019-01-19 21:05:13,1,3.16,1,N,142,107,1,14.5,0.5,0.5,3.16,0.0,0.3,18.96, +7316,2,2019-01-07 22:38:10,2019-01-07 22:44:31,2,2.4,1,N,233,263,1,8.5,0.5,0.5,1.47,0.0,0.3,11.27, +6384,2,2019-01-25 07:35:59,2019-01-25 07:57:44,1,5.12,1,N,143,231,1,18.5,0.0,0.5,2.5,0.0,0.3,21.8,0.0 +3652,1,2019-01-26 20:43:57,2019-01-26 21:03:51,1,4.4,1,N,13,230,2,17.0,0.5,0.5,0.0,0.0,0.3,18.3,0.0 +6409,2,2019-01-06 22:27:16,2019-01-06 22:33:58,1,1.52,1,N,161,234,2,7.0,0.5,0.5,0.0,0.0,0.3,8.3, +1130,2,2019-01-15 20:05:59,2019-01-15 20:22:11,1,3.96,1,N,233,41,2,14.5,0.5,0.5,0.0,0.0,0.3,15.8, +5618,2,2019-01-14 14:47:18,2019-01-14 14:54:44,1,0.5,1,N,140,237,1,6.0,0.0,0.5,1.0,0.0,0.3,7.8, +8515,1,2019-01-16 
16:26:17,2019-01-16 16:41:36,2,1.4,1,N,233,163,1,10.5,1.0,0.5,2.45,0.0,0.3,14.75, +5397,1,2019-01-23 08:31:58,2019-01-23 08:55:43,1,2.6,1,N,246,163,1,16.0,0.0,0.5,5.0,0.0,0.3,21.8,0.0 +4648,1,2019-01-30 19:59:30,2019-01-30 20:10:34,1,1.9,1,N,186,79,1,9.0,0.5,0.5,2.05,0.0,0.3,12.35,0.0 diff --git a/tests/test_sets/dataconnector_docs/yellow/tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/yellow/tripdata_2019-02.csv new file mode 100644 index 000000000000..573017273621 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow/tripdata_2019-02.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +9873,2,2019-02-14 14:06:26,2019-02-14 14:24:36,2,1.24,1,N,68,161,2,11.5,0.0,0.5,0.0,0.0,0.3,14.8,2.5 +9976,1,2019-02-08 14:20:27,2019-02-08 14:25:30,0,0.7,1,N,186,249,2,5.0,2.5,0.5,0.0,0.0,0.3,8.3,2.5 +4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5 +1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5 +6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5 +1633,2,2019-02-09 12:14:12,2019-02-10 11:50:00,3,3.02,1,N,48,236,1,12.5,0.0,0.5,1.58,0.0,0.3,17.38,2.5 +4450,1,2019-02-05 08:56:21,2019-02-05 09:02:06,1,0.8,1,N,262,75,1,5.5,2.5,0.5,1.7,0.0,0.3,10.5,2.5 +8929,2,2019-02-13 11:44:00,2019-02-13 11:57:32,1,1.13,1,N,100,170,2,9.0,0.0,0.5,0.0,0.0,0.3,12.3,2.5 +6968,2,2019-02-27 20:52:49,2019-02-27 21:18:37,1,4.64,1,N,246,125,1,18.5,0.5,0.5,4.46,0.0,0.3,26.76,2.5 +7969,2,2019-02-01 11:12:35,2019-02-01 11:19:39,1,0.54,1,N,236,263,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8,0.0 +7695,1,2019-02-28 00:51:50,2019-02-28 00:55:37,1,1.3,1,N,263,74,2,6.0,0.5,0.5,0.0,0.0,0.3,7.3,0.0 +9904,1,2019-02-16 
17:45:00,2019-02-16 17:51:09,1,1.3,1,N,170,162,1,6.5,2.5,0.5,1.95,0.0,0.3,11.75,2.5 +1580,2,2019-02-07 10:14:39,2019-02-07 10:51:38,1,17.49,2,N,186,132,2,52.0,0.0,0.5,0.0,5.76,0.3,61.06,2.5 +699,2,2019-02-18 11:01:16,2019-02-18 11:11:06,2,2.31,1,N,249,170,1,10.0,0.0,0.5,2.66,0.0,0.3,15.96,2.5 +5181,2,2019-02-28 22:39:19,2019-02-28 22:42:13,1,0.49,1,N,237,162,1,4.0,0.5,0.5,1.56,0.0,0.3,9.36,2.5 +2631,2,2019-02-01 00:55:00,2019-02-01 00:55:20,1,0.0,5,N,265,265,1,78.0,0.0,0.0,10.0,10.5,0.3,98.8,0.0 +4018,2,2019-02-11 16:59:03,2019-02-11 17:05:01,1,1.35,1,N,143,238,1,6.5,1.0,0.5,2.16,0.0,0.3,12.96,2.5 +7582,2,2019-02-08 17:29:00,2019-02-08 17:31:14,1,0.47,1,N,236,236,2,3.5,1.0,0.5,0.0,0.0,0.3,7.8,2.5 +1810,1,2019-02-17 13:10:12,2019-02-17 13:16:03,1,0.5,1,N,161,230,2,5.5,2.5,0.5,0.0,0.0,0.3,8.8,2.5 +5095,2,2019-02-09 14:30:05,2019-02-09 14:42:22,0,1.18,1,N,170,234,1,9.0,0.0,0.5,2.46,0.0,0.3,14.76,2.5 diff --git a/tests/test_sets/dataconnector_docs/yellow/tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/yellow/tripdata_2019-03.csv new file mode 100644 index 000000000000..3d254ce261c2 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow/tripdata_2019-03.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +7797,2,2019-03-20 15:40:46,2019-03-20 16:15:38,1,5.8,1,N,164,13,1,26.0,0.0,0.5,5.86,0.0,0.3,35.16,2.5 +671,1,2019-03-14 17:21:03,2019-03-14 17:29:45,1,1.5,1,N,239,166,1,8.0,3.5,0.5,2.45,0.0,0.3,14.75,2.5 +7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0 +9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5 +2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0 
+5030,2,2019-03-13 20:22:28,2019-03-13 20:42:11,2,9.81,1,N,138,244,1,28.0,0.5,0.5,7.01,5.76,0.3,42.07,0.0 +7848,4,2019-03-21 13:22:08,2019-03-21 13:31:36,1,0.86,1,N,229,229,1,7.5,0.0,0.5,2.16,0.0,0.3,12.96,2.5 +167,1,2019-03-15 22:37:06,2019-03-15 22:38:34,1,0.0,1,N,264,145,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0 +7197,2,2019-03-14 20:38:43,2019-03-14 20:45:04,5,1.04,1,N,151,166,1,6.0,0.5,0.5,1.1,0.0,0.3,8.4,0.0 +568,1,2019-03-09 09:29:44,2019-03-09 09:40:36,0,2.7,1,N,24,142,1,10.5,2.5,0.5,2.75,0.0,0.3,16.55,2.5 +2198,2,2019-03-11 18:32:36,2019-03-11 18:48:27,1,2.02,1,N,234,162,2,11.5,1.0,0.5,0.0,0.0,0.3,15.8,2.5 +5249,1,2019-03-15 09:35:22,2019-03-15 09:46:33,1,2.4,1,N,140,161,1,10.0,2.5,0.5,1.33,0.0,0.3,14.63,2.5 +5531,1,2019-03-04 19:44:33,2019-03-04 20:14:34,0,15.8,1,N,132,145,1,43.5,1.0,0.5,11.3,0.0,0.3,56.6,0.0 +4943,1,2019-03-03 15:12:19,2019-03-03 15:16:06,1,1.2,1,N,237,236,1,5.5,2.5,0.5,1.1,0.0,0.3,9.9,2.5 +617,2,2019-03-20 14:48:47,2019-03-20 14:52:43,1,0.59,1,N,142,163,1,4.5,0.0,0.5,1.0,0.0,0.3,8.8,2.5 +7014,2,2019-03-03 23:56:36,2019-03-04 00:22:25,1,7.91,1,N,132,258,2,25.0,0.5,0.5,0.0,0.0,0.3,26.3,0.0 +2138,2,2019-03-23 14:47:31,2019-03-23 14:54:11,2,1.12,1,N,234,161,2,6.5,0.0,0.5,0.0,0.0,0.3,9.8,2.5 +5298,2,2019-03-20 15:47:08,2019-03-20 15:51:15,1,0.54,1,N,161,161,1,4.5,0.0,0.5,1.56,0.0,0.3,9.36,2.5 +9838,1,2019-03-13 20:42:59,2019-03-13 20:50:23,1,1.1,1,N,162,48,1,6.5,3.0,0.5,2.05,0.0,0.3,12.35,2.5 +2565,1,2019-03-13 10:28:36,2019-03-13 10:28:38,1,0.0,1,N,246,246,2,2.5,2.5,0.5,0.0,0.0,0.3,5.8,2.5 diff --git a/tests/test_sets/dataconnector_docs/yellow_tripdata_2018-10.csv b/tests/test_sets/dataconnector_docs/yellow_tripdata_2018-10.csv new file mode 100644 index 000000000000..0ce6520c5822 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow_tripdata_2018-10.csv @@ -0,0 +1,21 @@ 
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +6984,2,2018-10-30 10:59:02,2018-10-30 11:04:30,1,0.62,1,N,48,163,1,5.5,0.0,0.5,1.26,0.0,0.3,7.56, +3030,2,2018-10-03 19:43:48,2018-10-03 20:01:51,1,3.93,1,N,137,239,1,15.0,1.0,0.5,2.5,0.0,0.3,19.3, +9267,1,2018-10-16 07:10:28,2018-10-16 07:19:21,1,1.4,1,N,142,236,1,8.0,0.0,0.5,0.88,0.0,0.3,9.68, +4342,1,2018-10-10 10:19:16,2018-10-10 10:33:52,1,2.1,1,N,142,50,2,11.5,0.0,0.5,0.0,0.0,0.3,12.3, +8099,2,2018-10-01 19:18:44,2018-10-01 19:31:48,1,2.51,1,N,140,239,1,11.0,1.0,0.5,3.2,0.0,0.3,16.0, +8972,2,2018-10-19 01:40:24,2018-10-19 01:51:42,2,2.54,1,N,249,164,2,10.5,0.5,0.5,0.0,0.0,0.3,11.8, +240,1,2018-10-11 19:54:10,2018-10-11 20:19:34,1,5.2,1,N,231,232,1,22.0,1.0,0.5,4.76,0.0,0.3,28.56, +7844,2,2018-10-30 08:58:08,2018-10-30 09:05:09,1,1.08,1,N,249,211,1,6.5,0.0,0.5,1.0,0.0,0.3,8.3, +7024,1,2018-10-11 04:56:32,2018-10-11 05:14:29,1,8.0,1,N,263,138,1,25.0,0.5,0.5,8.0,5.76,0.3,40.06, +7601,2,2018-10-27 00:44:06,2018-10-27 00:57:52,1,2.18,1,N,113,100,1,11.0,0.5,0.5,2.46,0.0,0.3,14.76, +7686,2,2018-10-01 17:13:29,2018-10-01 17:16:10,5,0.36,1,N,263,236,1,3.5,1.0,0.5,1.59,0.0,0.3,6.89, +1344,1,2018-10-03 20:31:19,2018-10-03 21:11:37,1,20.2,3,N,236,1,1,73.5,0.5,0.0,18.35,17.5,0.3,110.15, +2539,2,2018-10-24 00:46:47,2018-10-24 01:07:38,1,14.1,1,N,132,210,1,39.0,0.5,0.5,5.0,0.0,0.3,45.3, +2758,1,2018-10-17 22:25:54,2018-10-17 22:42:38,1,6.2,1,N,237,87,1,20.0,0.5,0.5,2.0,0.0,0.3,23.3, +567,1,2018-10-22 15:57:44,2018-10-22 16:25:52,1,3.1,1,N,264,264,1,18.5,0.0,0.5,3.85,0.0,0.3,23.15, +1994,1,2018-10-28 13:36:15,2018-10-28 13:47:38,1,1.8,1,N,79,232,1,10.0,0.0,0.5,2.15,0.0,0.3,12.95, +3549,2,2018-10-25 21:00:53,2018-10-25 21:17:23,1,2.58,1,N,170,48,1,12.0,0.5,0.5,1.0,0.0,0.3,14.3, +3867,2,2018-10-16 
13:26:54,2018-10-16 13:51:57,3,1.8,1,N,230,158,1,16.0,0.0,0.5,3.36,0.0,0.3,20.16, +864,2,2018-10-20 10:53:46,2018-10-20 11:03:28,1,1.22,1,N,262,75,1,8.0,0.0,0.5,2.64,0.0,0.3,11.44, +9457,1,2018-10-01 18:19:51,2018-10-01 18:39:05,1,2.6,1,N,144,186,1,14.0,1.0,0.5,3.15,0.0,0.3,18.95, diff --git a/tests/test_sets/dataconnector_docs/yellow_tripdata_2018-11.csv b/tests/test_sets/dataconnector_docs/yellow_tripdata_2018-11.csv new file mode 100644 index 000000000000..87ff7e28e6fa --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow_tripdata_2018-11.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +7201,2,2018-11-03 20:17:25,2018-11-03 20:25:26,1,1.15,1,N,166,238,1,7.5,0.5,0.5,2.2,0.0,0.3,11.0, +2578,2,2018-11-14 09:03:52,2018-11-14 09:21:39,1,0.9,1,N,230,230,1,11.5,0.0,0.5,1.0,0.0,0.3,13.3, +9216,1,2018-11-16 13:12:09,2018-11-16 13:24:41,1,1.5,1,N,238,143,2,9.5,0.0,0.5,0.0,0.0,0.3,10.3, +5518,1,2018-11-15 20:44:20,2018-11-15 20:58:41,1,1.7,1,N,236,151,1,11.0,0.5,0.5,0.0,0.0,0.3,12.3, +104,2,2018-11-07 00:01:14,2018-11-07 00:05:56,5,0.85,1,N,234,170,1,5.5,0.5,0.5,1.36,0.0,0.3,8.16, +536,2,2018-11-11 12:43:04,2018-11-11 12:59:33,1,2.11,1,N,233,114,1,11.5,0.0,0.5,2.46,0.0,0.3,14.76, +2167,2,2018-11-15 14:14:45,2018-11-15 14:24:39,2,0.8,1,N,142,230,1,7.5,0.0,0.5,2.08,0.0,0.3,10.38, +5875,4,2018-11-07 07:24:55,2018-11-07 07:30:28,1,0.81,1,N,164,233,2,5.5,0.0,0.5,0.0,0.0,0.3,6.3, +8196,2,2018-11-05 13:46:25,2018-11-05 13:47:12,3,0.08,1,N,236,236,1,2.5,0.0,0.5,0.66,0.0,0.3,3.96, +8175,1,2018-11-13 20:46:11,2018-11-13 20:50:29,1,0.6,1,N,107,137,2,5.0,0.5,0.5,0.0,0.0,0.3,6.3, +6314,2,2018-11-25 19:36:38,2018-11-25 19:41:30,1,1.77,1,N,263,74,2,7.0,0.0,0.5,0.0,0.0,0.3,7.8, +7700,2,2018-11-18 21:33:49,2018-11-18 
21:46:58,2,2.76,1,N,163,24,1,12.0,0.5,0.5,3.32,0.0,0.3,16.62, +9062,1,2018-11-03 18:39:31,2018-11-03 18:49:25,1,0.8,1,N,164,230,2,7.5,0.0,0.5,0.0,0.0,0.3,8.3, +6701,1,2018-11-20 06:12:00,2018-11-20 06:19:35,1,2.2,1,N,237,75,2,8.5,0.0,0.5,0.0,0.0,0.3,9.3, +399,2,2018-11-13 21:20:51,2018-11-13 21:34:45,1,1.18,1,N,162,230,1,9.5,0.5,0.5,5.0,0.0,0.3,15.8, +2745,2,2018-11-03 00:07:35,2018-11-03 00:29:31,1,2.55,1,N,68,4,2,15.0,0.5,0.5,0.0,0.0,0.3,16.3, +5363,2,2018-11-23 20:16:12,2018-11-23 20:20:46,1,1.05,1,N,237,162,2,5.5,0.5,0.5,0.0,0.0,0.3,6.8, +383,1,2018-11-10 22:31:50,2018-11-10 22:47:48,1,1.1,1,N,161,141,2,10.5,0.5,0.5,0.0,0.0,0.3,11.8, +1537,2,2018-11-17 01:05:40,2018-11-17 01:18:09,1,1.62,1,N,114,79,1,9.5,0.5,0.5,2.16,0.0,0.3,12.96, +1760,2,2018-11-01 13:52:35,2018-11-01 13:59:37,1,0.5,1,N,230,162,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8, diff --git a/tests/test_sets/dataconnector_docs/yellow_tripdata_2018-12.csv b/tests/test_sets/dataconnector_docs/yellow_tripdata_2018-12.csv new file mode 100644 index 000000000000..50eb34d13f83 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow_tripdata_2018-12.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +6701,1,2018-12-07 22:07:55,2018-12-07 22:35:18,1,3.5,1,N,237,249,1,18.5,0.5,0.5,2.0,0.0,0.3,21.8, +9645,2,2018-12-10 18:21:08,2018-12-10 18:33:12,1,1.38,1,N,114,158,1,9.0,1.0,0.5,2.16,0.0,0.3,12.96, +4105,2,2018-12-15 23:27:23,2018-12-15 23:44:02,1,2.62,1,N,232,137,2,13.0,0.5,0.5,0.0,0.0,0.3,14.3, +2743,2,2018-12-12 19:54:39,2018-12-12 19:57:53,1,0.41,1,N,237,237,1,4.0,1.0,0.5,1.16,0.0,0.3,6.96, +4126,1,2018-12-21 18:57:55,2018-12-21 19:09:35,1,1.0,1,N,249,79,1,8.5,1.0,0.5,2.05,0.0,0.3,12.35, +1566,1,2018-12-16 15:38:10,2018-12-16 
15:55:05,3,1.4,1,N,236,141,1,11.5,0.0,0.5,1.85,0.0,0.3,14.15, +4857,2,2018-12-06 17:26:50,2018-12-06 17:39:34,2,0.93,1,N,142,239,1,9.0,1.0,0.5,2.16,0.0,0.3,12.96, +304,1,2018-12-27 16:00:34,2018-12-27 16:34:26,2,5.7,1,N,163,209,2,24.0,1.0,0.5,0.0,0.0,0.3,25.8, +8159,2,2018-12-05 18:32:06,2018-12-05 18:41:03,1,1.38,1,N,68,90,1,8.0,1.0,0.5,2.94,0.0,0.3,12.74, +6575,4,2018-12-30 23:53:05,2018-12-30 23:55:40,1,0.39,1,N,256,256,1,4.0,0.5,0.5,1.06,0.0,0.3,6.36, +8327,2,2018-12-14 18:42:59,2018-12-14 18:51:49,1,1.26,1,N,163,236,1,8.0,1.0,0.5,1.5,0.0,0.3,11.3, +5245,1,2018-12-23 16:00:21,2018-12-23 16:12:44,1,1.6,1,N,141,142,1,10.0,0.0,0.5,2.15,0.0,0.3,12.95, +3521,2,2018-12-05 00:12:07,2018-12-05 00:30:25,2,3.15,1,N,164,45,1,14.0,0.5,0.5,3.06,0.0,0.3,18.36, +9442,2,2018-12-14 08:58:16,2018-12-14 09:06:42,1,0.67,1,N,239,238,1,6.0,0.0,0.5,0.8,0.0,0.3,7.6, +922,1,2018-12-09 03:45:39,2018-12-09 03:52:09,1,1.8,1,N,90,170,1,7.5,0.5,0.5,1.0,0.0,0.3,9.8, +807,2,2018-12-26 17:30:45,2018-12-26 17:53:04,1,5.34,1,N,164,261,2,19.0,1.0,0.5,0.0,0.0,0.3,20.8, +6354,2,2018-12-31 09:38:01,2018-12-31 09:46:53,1,2.2,1,N,24,236,1,9.5,0.0,0.5,2.0,0.0,0.3,12.3, +7329,2,2018-12-09 02:54:29,2018-12-09 03:00:14,5,1.19,1,N,164,246,1,6.5,0.5,0.5,1.95,0.0,0.3,9.75, +6227,1,2018-12-01 18:32:30,2018-12-01 19:06:38,1,4.2,1,N,90,263,1,22.0,0.0,0.5,4.55,0.0,0.3,27.35, +9796,1,2018-12-17 18:18:12,2018-12-17 18:32:05,1,1.4,1,N,68,107,1,10.0,1.0,0.5,1.2,0.0,0.3,13.0, diff --git a/tests/test_sets/dataconnector_docs/yellow_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/yellow_tripdata_2019-01.csv new file mode 100644 index 000000000000..288e8ac8a023 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow_tripdata_2019-01.csv @@ -0,0 +1,21 @@ 
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +4565,2,2019-01-15 02:04:57,2019-01-15 02:17:53,1,4.92,1,N,255,226,1,16.5,0.5,0.5,3.56,0.0,0.3,21.36, +714,2,2019-01-08 00:38:49,2019-01-08 00:52:30,1,7.07,1,N,132,197,1,21.0,0.5,0.5,4.46,0.0,0.3,26.76, +2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0 +5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95, +4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56, +1505,2,2019-01-29 17:58:08,2019-01-29 18:11:19,1,0.72,1,N,13,13,2,9.0,1.0,0.5,0.0,0.0,0.3,10.8,0.0 +8131,1,2019-01-13 16:22:03,2019-01-13 16:33:06,2,1.4,1,N,161,170,1,8.5,0.0,0.5,3.0,0.0,0.3,12.3, +9601,2,2019-01-23 12:07:55,2019-01-23 12:11:19,5,0.57,1,N,239,239,1,4.0,0.0,0.5,1.0,0.0,0.3,5.8,0.0 +4522,2,2019-01-25 15:42:39,2019-01-25 15:50:54,1,1.26,1,N,170,229,1,7.5,0.0,0.5,1.0,0.0,0.3,9.3,0.0 +9008,1,2019-01-13 19:40:27,2019-01-13 19:50:54,3,2.1,1,N,239,263,1,9.5,0.0,0.5,2.05,0.0,0.3,12.35, +7012,2,2019-01-19 20:44:35,2019-01-19 21:05:13,1,3.16,1,N,142,107,1,14.5,0.5,0.5,3.16,0.0,0.3,18.96, +7316,2,2019-01-07 22:38:10,2019-01-07 22:44:31,2,2.4,1,N,233,263,1,8.5,0.5,0.5,1.47,0.0,0.3,11.27, +6384,2,2019-01-25 07:35:59,2019-01-25 07:57:44,1,5.12,1,N,143,231,1,18.5,0.0,0.5,2.5,0.0,0.3,21.8,0.0 +3652,1,2019-01-26 20:43:57,2019-01-26 21:03:51,1,4.4,1,N,13,230,2,17.0,0.5,0.5,0.0,0.0,0.3,18.3,0.0 +6409,2,2019-01-06 22:27:16,2019-01-06 22:33:58,1,1.52,1,N,161,234,2,7.0,0.5,0.5,0.0,0.0,0.3,8.3, +1130,2,2019-01-15 20:05:59,2019-01-15 20:22:11,1,3.96,1,N,233,41,2,14.5,0.5,0.5,0.0,0.0,0.3,15.8, +5618,2,2019-01-14 14:47:18,2019-01-14 14:54:44,1,0.5,1,N,140,237,1,6.0,0.0,0.5,1.0,0.0,0.3,7.8, +8515,1,2019-01-16 
16:26:17,2019-01-16 16:41:36,2,1.4,1,N,233,163,1,10.5,1.0,0.5,2.45,0.0,0.3,14.75, +5397,1,2019-01-23 08:31:58,2019-01-23 08:55:43,1,2.6,1,N,246,163,1,16.0,0.0,0.5,5.0,0.0,0.3,21.8,0.0 +4648,1,2019-01-30 19:59:30,2019-01-30 20:10:34,1,1.9,1,N,186,79,1,9.0,0.5,0.5,2.05,0.0,0.3,12.35,0.0 diff --git a/tests/test_sets/dataconnector_docs/yellow_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/yellow_tripdata_2019-02.csv new file mode 100644 index 000000000000..573017273621 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow_tripdata_2019-02.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +9873,2,2019-02-14 14:06:26,2019-02-14 14:24:36,2,1.24,1,N,68,161,2,11.5,0.0,0.5,0.0,0.0,0.3,14.8,2.5 +9976,1,2019-02-08 14:20:27,2019-02-08 14:25:30,0,0.7,1,N,186,249,2,5.0,2.5,0.5,0.0,0.0,0.3,8.3,2.5 +4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5 +1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5 +6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5 +1633,2,2019-02-09 12:14:12,2019-02-10 11:50:00,3,3.02,1,N,48,236,1,12.5,0.0,0.5,1.58,0.0,0.3,17.38,2.5 +4450,1,2019-02-05 08:56:21,2019-02-05 09:02:06,1,0.8,1,N,262,75,1,5.5,2.5,0.5,1.7,0.0,0.3,10.5,2.5 +8929,2,2019-02-13 11:44:00,2019-02-13 11:57:32,1,1.13,1,N,100,170,2,9.0,0.0,0.5,0.0,0.0,0.3,12.3,2.5 +6968,2,2019-02-27 20:52:49,2019-02-27 21:18:37,1,4.64,1,N,246,125,1,18.5,0.5,0.5,4.46,0.0,0.3,26.76,2.5 +7969,2,2019-02-01 11:12:35,2019-02-01 11:19:39,1,0.54,1,N,236,263,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8,0.0 +7695,1,2019-02-28 00:51:50,2019-02-28 00:55:37,1,1.3,1,N,263,74,2,6.0,0.5,0.5,0.0,0.0,0.3,7.3,0.0 +9904,1,2019-02-16 
17:45:00,2019-02-16 17:51:09,1,1.3,1,N,170,162,1,6.5,2.5,0.5,1.95,0.0,0.3,11.75,2.5 +1580,2,2019-02-07 10:14:39,2019-02-07 10:51:38,1,17.49,2,N,186,132,2,52.0,0.0,0.5,0.0,5.76,0.3,61.06,2.5 +699,2,2019-02-18 11:01:16,2019-02-18 11:11:06,2,2.31,1,N,249,170,1,10.0,0.0,0.5,2.66,0.0,0.3,15.96,2.5 +5181,2,2019-02-28 22:39:19,2019-02-28 22:42:13,1,0.49,1,N,237,162,1,4.0,0.5,0.5,1.56,0.0,0.3,9.36,2.5 +2631,2,2019-02-01 00:55:00,2019-02-01 00:55:20,1,0.0,5,N,265,265,1,78.0,0.0,0.0,10.0,10.5,0.3,98.8,0.0 +4018,2,2019-02-11 16:59:03,2019-02-11 17:05:01,1,1.35,1,N,143,238,1,6.5,1.0,0.5,2.16,0.0,0.3,12.96,2.5 +7582,2,2019-02-08 17:29:00,2019-02-08 17:31:14,1,0.47,1,N,236,236,2,3.5,1.0,0.5,0.0,0.0,0.3,7.8,2.5 +1810,1,2019-02-17 13:10:12,2019-02-17 13:16:03,1,0.5,1,N,161,230,2,5.5,2.5,0.5,0.0,0.0,0.3,8.8,2.5 +5095,2,2019-02-09 14:30:05,2019-02-09 14:42:22,0,1.18,1,N,170,234,1,9.0,0.0,0.5,2.46,0.0,0.3,14.76,2.5 diff --git a/tests/test_sets/dataconnector_docs/yellow_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/yellow_tripdata_2019-03.csv new file mode 100644 index 000000000000..3d254ce261c2 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/yellow_tripdata_2019-03.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +7797,2,2019-03-20 15:40:46,2019-03-20 16:15:38,1,5.8,1,N,164,13,1,26.0,0.0,0.5,5.86,0.0,0.3,35.16,2.5 +671,1,2019-03-14 17:21:03,2019-03-14 17:29:45,1,1.5,1,N,239,166,1,8.0,3.5,0.5,2.45,0.0,0.3,14.75,2.5 +7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0 +9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5 +2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0 
+5030,2,2019-03-13 20:22:28,2019-03-13 20:42:11,2,9.81,1,N,138,244,1,28.0,0.5,0.5,7.01,5.76,0.3,42.07,0.0 +7848,4,2019-03-21 13:22:08,2019-03-21 13:31:36,1,0.86,1,N,229,229,1,7.5,0.0,0.5,2.16,0.0,0.3,12.96,2.5 +167,1,2019-03-15 22:37:06,2019-03-15 22:38:34,1,0.0,1,N,264,145,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0 +7197,2,2019-03-14 20:38:43,2019-03-14 20:45:04,5,1.04,1,N,151,166,1,6.0,0.5,0.5,1.1,0.0,0.3,8.4,0.0 +568,1,2019-03-09 09:29:44,2019-03-09 09:40:36,0,2.7,1,N,24,142,1,10.5,2.5,0.5,2.75,0.0,0.3,16.55,2.5 +2198,2,2019-03-11 18:32:36,2019-03-11 18:48:27,1,2.02,1,N,234,162,2,11.5,1.0,0.5,0.0,0.0,0.3,15.8,2.5 +5249,1,2019-03-15 09:35:22,2019-03-15 09:46:33,1,2.4,1,N,140,161,1,10.0,2.5,0.5,1.33,0.0,0.3,14.63,2.5 +5531,1,2019-03-04 19:44:33,2019-03-04 20:14:34,0,15.8,1,N,132,145,1,43.5,1.0,0.5,11.3,0.0,0.3,56.6,0.0 +4943,1,2019-03-03 15:12:19,2019-03-03 15:16:06,1,1.2,1,N,237,236,1,5.5,2.5,0.5,1.1,0.0,0.3,9.9,2.5 +617,2,2019-03-20 14:48:47,2019-03-20 14:52:43,1,0.59,1,N,142,163,1,4.5,0.0,0.5,1.0,0.0,0.3,8.8,2.5 +7014,2,2019-03-03 23:56:36,2019-03-04 00:22:25,1,7.91,1,N,132,258,2,25.0,0.5,0.5,0.0,0.0,0.3,26.3,0.0 +2138,2,2019-03-23 14:47:31,2019-03-23 14:54:11,2,1.12,1,N,234,161,2,6.5,0.0,0.5,0.0,0.0,0.3,9.8,2.5 +5298,2,2019-03-20 15:47:08,2019-03-20 15:51:15,1,0.54,1,N,161,161,1,4.5,0.0,0.5,1.56,0.0,0.3,9.36,2.5 +9838,1,2019-03-13 20:42:59,2019-03-13 20:50:23,1,1.1,1,N,162,48,1,6.5,3.0,0.5,2.05,0.0,0.3,12.35,2.5 +2565,1,2019-03-13 10:28:36,2019-03-13 10:28:38,1,0.0,1,N,246,246,2,2.5,2.5,0.5,0.0,0.0,0.3,5.8,2.5 From c0051f647aa2989e21f446bb853f6e270b9efea1 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 20 Oct 2021 11:08:29 -0400 Subject: [PATCH 17/62] Rearrange test set directory structure and get basic tests functioning --- ...onfigure_a_configuredassetdataconnector.py | 4 ++-- ...how_to_configure_a_runtimedataconnector.py | 3 +-- ...configure_an_inferredassetdataconnector.py | 6 +++--- tests/integration/test_script_runner.py | 6 +++--- 
.../green/tripdata_2018-10.csv | 21 ------------------- .../green/tripdata_2018-11.csv | 21 ------------------- .../green/tripdata_2018-12.csv | 21 ------------------- .../green/tripdata_2019-01.csv | 21 ------------------- .../green/tripdata_2019-02.csv | 21 ------------------- .../green/tripdata_2019-03.csv | 21 ------------------- .../green/2018/tripdata-10.csv | 0 .../green/2018/tripdata-11.csv | 0 .../green/2018/tripdata-12.csv | 0 .../green/2019/tripdata-01.csv | 0 .../green/2019/tripdata-02.csv | 0 .../green/2019/tripdata-03.csv | 0 .../yellow/2018/10/tripdata.csv | 0 .../yellow/2018/11/tripdata.csv | 0 .../yellow/2018/12/tripdata.csv | 0 .../yellow/2019/01/tripdata.csv | 0 .../yellow/2019/02/tripdata.csv | 0 .../yellow/2019/03/tripdata.csv | 0 .../yellow_tripdata_2018-10.csv | 0 .../yellow_tripdata_2018-11.csv | 0 .../yellow_tripdata_2018-12.csv | 0 .../yellow_tripdata_2019-01.csv | 0 .../yellow_tripdata_2019-02.csv | 0 .../yellow_tripdata_2019-03.csv | 0 .../green_tripdata_2018-10.csv | 0 .../green_tripdata_2018-11.csv | 0 .../green_tripdata_2018-12.csv | 0 .../green_tripdata_2019-01.csv | 0 .../green_tripdata_2019-02.csv | 0 .../green_tripdata_2019-03.csv | 0 .../yellow_tripdata_2018-10.csv} | 0 .../yellow_tripdata_2018-11.csv} | 0 .../yellow_tripdata_2018-12.csv} | 0 .../yellow_tripdata_2019-01.csv} | 0 .../yellow_tripdata_2019-02.csv} | 0 .../yellow_tripdata_2019-03.csv} | 0 40 files changed, 9 insertions(+), 136 deletions(-) delete mode 100644 tests/test_sets/dataconnector_docs/green/tripdata_2018-10.csv delete mode 100644 tests/test_sets/dataconnector_docs/green/tripdata_2018-11.csv delete mode 100644 tests/test_sets/dataconnector_docs/green/tripdata_2018-12.csv delete mode 100644 tests/test_sets/dataconnector_docs/green/tripdata_2019-01.csv delete mode 100644 tests/test_sets/dataconnector_docs/green/tripdata_2019-02.csv delete mode 100644 tests/test_sets/dataconnector_docs/green/tripdata_2019-03.csv rename tests/test_sets/dataconnector_docs/{ => 
nested_directories}/green/2018/tripdata-10.csv (100%) rename tests/test_sets/dataconnector_docs/{ => nested_directories}/green/2018/tripdata-11.csv (100%) rename tests/test_sets/dataconnector_docs/{ => nested_directories}/green/2018/tripdata-12.csv (100%) rename tests/test_sets/dataconnector_docs/{ => nested_directories}/green/2019/tripdata-01.csv (100%) rename tests/test_sets/dataconnector_docs/{ => nested_directories}/green/2019/tripdata-02.csv (100%) rename tests/test_sets/dataconnector_docs/{ => nested_directories}/green/2019/tripdata-03.csv (100%) rename tests/test_sets/dataconnector_docs/{ => nested_directories}/yellow/2018/10/tripdata.csv (100%) rename tests/test_sets/dataconnector_docs/{ => nested_directories}/yellow/2018/11/tripdata.csv (100%) rename tests/test_sets/dataconnector_docs/{ => nested_directories}/yellow/2018/12/tripdata.csv (100%) rename tests/test_sets/dataconnector_docs/{ => nested_directories}/yellow/2019/01/tripdata.csv (100%) rename tests/test_sets/dataconnector_docs/{ => nested_directories}/yellow/2019/02/tripdata.csv (100%) rename tests/test_sets/dataconnector_docs/{ => nested_directories}/yellow/2019/03/tripdata.csv (100%) rename tests/test_sets/dataconnector_docs/{ => single_directory_one_data_asset}/yellow_tripdata_2018-10.csv (100%) rename tests/test_sets/dataconnector_docs/{ => single_directory_one_data_asset}/yellow_tripdata_2018-11.csv (100%) rename tests/test_sets/dataconnector_docs/{ => single_directory_one_data_asset}/yellow_tripdata_2018-12.csv (100%) rename tests/test_sets/dataconnector_docs/{ => single_directory_one_data_asset}/yellow_tripdata_2019-01.csv (100%) rename tests/test_sets/dataconnector_docs/{ => single_directory_one_data_asset}/yellow_tripdata_2019-02.csv (100%) rename tests/test_sets/dataconnector_docs/{ => single_directory_one_data_asset}/yellow_tripdata_2019-03.csv (100%) rename tests/test_sets/dataconnector_docs/{ => single_directory_two_data_assets}/green_tripdata_2018-10.csv (100%) rename 
tests/test_sets/dataconnector_docs/{ => single_directory_two_data_assets}/green_tripdata_2018-11.csv (100%) rename tests/test_sets/dataconnector_docs/{ => single_directory_two_data_assets}/green_tripdata_2018-12.csv (100%) rename tests/test_sets/dataconnector_docs/{ => single_directory_two_data_assets}/green_tripdata_2019-01.csv (100%) rename tests/test_sets/dataconnector_docs/{ => single_directory_two_data_assets}/green_tripdata_2019-02.csv (100%) rename tests/test_sets/dataconnector_docs/{ => single_directory_two_data_assets}/green_tripdata_2019-03.csv (100%) rename tests/test_sets/dataconnector_docs/{yellow/tripdata_2018-10.csv => single_directory_two_data_assets/yellow_tripdata_2018-10.csv} (100%) rename tests/test_sets/dataconnector_docs/{yellow/tripdata_2018-11.csv => single_directory_two_data_assets/yellow_tripdata_2018-11.csv} (100%) rename tests/test_sets/dataconnector_docs/{yellow/tripdata_2018-12.csv => single_directory_two_data_assets/yellow_tripdata_2018-12.csv} (100%) rename tests/test_sets/dataconnector_docs/{yellow/tripdata_2019-01.csv => single_directory_two_data_assets/yellow_tripdata_2019-01.csv} (100%) rename tests/test_sets/dataconnector_docs/{yellow/tripdata_2019-02.csv => single_directory_two_data_assets/yellow_tripdata_2019-02.csv} (100%) rename tests/test_sets/dataconnector_docs/{yellow/tripdata_2019-03.csv => single_directory_two_data_assets/yellow_tripdata_2019-03.csv} (100%) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index 7a02ff4c2fc3..1cc48e6f280b 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -19,7 +19,7 @@ "base_directory": "my_directory/", "assets": { "taxi": { - "pattern": 
"yellow_trip_data_sample_(.*)\.csv", + "pattern": "yellow_tripdata_(.*)\.csv", "group_names": ["month"], } }, @@ -31,7 +31,7 @@ # In normal usage you'd set your path directly in the yaml above. datasource_config["data_connectors"]["default_configured_data_connector_name"][ "base_directory" -] = "../data/" +] = "../data/single_directory_one_data_asset/" context.test_yaml_config(yaml.dump(datasource_config)) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py index 5298e0b8df8a..a6b9bd2f4c46 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py @@ -35,7 +35,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the BatchRequest above. 
-batch_request.runtime_parameters["path"] = "./data/yellow_trip_data_sample_2019-01.csv" +batch_request.runtime_parameters["path"] = "./data/single_directory_one_data_asset/yellow_tripdata_2019-01.csv" context.create_expectation_suite( expectation_suite_name="test_suite", overwrite_existing=True @@ -44,7 +44,6 @@ validator = context.get_validator( batch_request=batch_request, expectation_suite_name="test_suite", - batch_identifiers={"month": "2019-02"}, ) print(validator.head()) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index eb60a8d64986..14947948f182 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -29,7 +29,7 @@ # In normal usage you'd set your path directly in the yaml above. datasource_config["data_connectors"]["default_inferred_data_connector_name"][ "base_directory" -] = "../data/" +] = "../data/single_directory_one_data_asset/" context.test_yaml_config(yaml.dump(datasource_config)) @@ -44,7 +44,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your data asset name directly in the BatchRequest above. -batch_request.data_asset_name = "yellow_trip_data_sample_2019-01" +batch_request.data_asset_name = "yellow_tripdata_2019-01" context.create_expectation_suite( expectation_suite_name="test_suite", overwrite_existing=True @@ -58,7 +58,7 @@ # NOTE: The following code is only for testing and can be ignored by users. 
assert isinstance(validator, ge.validator.validator.Validator) assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] -assert "yellow_trip_data_sample_2019-01" in set( +assert "yellow_tripdata_2019-01" in set( context.get_available_data_asset_names()["taxi_datasource"][ "default_inferred_data_connector_name" ] diff --git a/tests/integration/test_script_runner.py b/tests/integration/test_script_runner.py index 3222ca238ec3..b1021bc79312 100755 --- a/tests/integration/test_script_runner.py +++ b/tests/integration/test_script_runner.py @@ -301,21 +301,21 @@ class BackendDependencies(enum.Enum): { "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py", "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations", - "data_dir": "tests/test_sets/taxi_yellow_trip_data_samples/first_3_files", + "data_dir": "tests/test_sets/dataconnector_docs", "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py", "extra_backend_dependencies": BackendDependencies.POSTGRESQL, }, { "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py", "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations", - "data_dir": "tests/test_sets/taxi_yellow_trip_data_samples/first_3_files", + "data_dir": "tests/test_sets/dataconnector_docs", "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py", "extra_backend_dependencies": BackendDependencies.POSTGRESQL, }, { "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py", "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations", - "data_dir": "tests/test_sets/taxi_yellow_trip_data_samples/first_3_files", + "data_dir": "tests/test_sets/dataconnector_docs", "util_script": 
"tests/integration/docusaurus/connecting_to_your_data/database/util.py", }, # { diff --git a/tests/test_sets/dataconnector_docs/green/tripdata_2018-10.csv b/tests/test_sets/dataconnector_docs/green/tripdata_2018-10.csv deleted file mode 100644 index 2b518c3e4a83..000000000000 --- a/tests/test_sets/dataconnector_docs/green/tripdata_2018-10.csv +++ /dev/null @@ -1,21 +0,0 @@ -,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type -448278,2,2018-10-20 13:58:45,2018-10-20 14:05:35,N,1,112,256,1,1.03,6.5,0.0,0.5,0.0,0.0,,0.3,7.3,2,1.0 -520261,1,2018-10-23 17:48:10,2018-10-23 17:55:35,Y,1,181,181,1,0.9,6.5,1.0,0.5,1.5,0.0,,0.3,9.8,1,1.0 -520094,2,2018-10-23 17:39:25,2018-10-23 18:42:28,N,1,75,133,1,14.6,48.5,1.0,0.5,11.21,5.76,,0.3,67.27,1,1.0 -465246,2,2018-10-21 03:41:56,2018-10-21 03:45:23,N,1,95,95,2,0.86,5.0,0.5,0.5,1.0,0.0,,0.3,7.3,1,1.0 -652784,2,2018-10-29 12:03:25,2018-10-29 12:14:19,N,1,18,31,1,2.24,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,2,1.0 -234842,2,2018-10-11 15:24:48,2018-10-11 15:47:51,N,1,197,95,1,3.42,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1.0 -199443,2,2018-10-09 22:19:35,2018-10-09 22:23:38,N,1,41,42,1,0.63,5.0,0.5,0.5,1.26,0.0,,0.3,7.56,1,1.0 -478271,2,2018-10-21 18:16:09,2018-10-21 18:25:49,N,1,74,263,1,2.25,9.0,0.0,0.5,1.96,0.0,,0.3,11.76,1,1.0 -480009,2,2018-10-21 19:32:30,2018-10-21 19:54:38,N,1,95,260,2,4.31,18.5,0.0,0.5,0.0,0.0,,0.3,19.3,1,1.0 -621419,2,2018-10-27 22:52:50,2018-10-27 22:55:15,N,1,136,136,1,0.01,3.5,0.5,0.5,0.0,0.0,,0.3,4.8,2,1.0 -60768,2,2018-10-03 19:47:13,2018-10-03 19:50:39,N,1,256,255,1,0.52,4.0,1.0,0.5,1.45,0.0,,0.3,7.25,1,1.0 -559361,2,2018-10-25 12:04:52,2018-10-25 12:39:25,N,5,65,76,1,5.8,21.86,0.0,0.5,0.0,0.0,,0.0,22.36,1,2.0 -226070,2,2018-10-11 08:52:56,2018-10-11 09:22:58,N,1,166,163,1,3.73,20.0,0.0,0.5,4.16,0.0,,0.3,24.96,1,1.0 
-578687,2,2018-10-26 08:44:43,2018-10-26 09:20:34,N,1,49,114,5,3.58,23.0,0.0,0.5,4.76,0.0,,0.3,30.51,1,1.0 -133625,2,2018-10-06 19:59:11,2018-10-06 20:38:07,N,1,181,100,2,9.41,34.5,0.0,0.5,10.26,5.76,,0.3,51.32,1,1.0 -118040,2,2018-10-06 03:38:45,2018-10-07 03:18:41,N,1,256,256,1,0.65,4.5,0.5,0.5,0.0,0.0,,0.3,5.8,2,1.0 -199254,2,2018-10-09 22:23:05,2018-10-09 22:33:00,N,1,181,97,1,1.53,8.5,0.5,0.5,0.0,0.0,,0.3,9.8,2,1.0 -508650,2,2018-10-23 08:21:15,2018-10-23 08:40:55,N,1,166,142,5,2.41,14.0,0.0,0.5,1.0,0.0,,0.3,15.8,1,1.0 -597703,2,2018-10-26 21:14:29,2018-10-26 21:22:49,N,1,41,239,5,1.8,8.5,0.5,0.5,0.0,0.0,,0.3,9.8,2,1.0 -472658,2,2018-10-21 13:36:13,2018-10-21 13:41:32,N,1,7,179,1,0.97,5.5,0.0,0.5,0.0,0.0,,0.3,6.3,2,1.0 diff --git a/tests/test_sets/dataconnector_docs/green/tripdata_2018-11.csv b/tests/test_sets/dataconnector_docs/green/tripdata_2018-11.csv deleted file mode 100644 index f68bd7715093..000000000000 --- a/tests/test_sets/dataconnector_docs/green/tripdata_2018-11.csv +++ /dev/null @@ -1,21 +0,0 @@ -,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type -206147,2,2018-11-09 21:58:22,2018-11-09 22:09:19,N,1,97,17,1,1.69,9.0,0.5,0.5,0.0,0.0,,0.3,10.3,2,1.0 -586398,2,2018-11-28 07:16:28,2018-11-28 07:40:30,N,1,81,250,1,4.39,20.5,0.0,0.5,0.0,0.0,,0.3,21.3,1,1.0 -410503,2,2018-11-19 08:36:21,2018-11-19 08:53:22,N,1,116,128,1,4.59,17.5,0.0,0.5,4.58,0.0,,0.3,22.88,1,1.0 -284524,2,2018-11-13 14:57:24,2018-11-13 15:23:48,N,1,53,121,1,5.38,21.5,0.0,0.5,0.0,0.0,,0.3,22.3,1,1.0 -652213,2,2018-11-30 20:18:38,2018-11-30 20:23:04,N,1,75,74,1,1.23,6.0,0.5,0.5,1.46,0.0,,0.3,8.76,1,1.0 -453632,2,2018-11-21 08:46:22,2018-11-21 09:12:23,N,5,17,35,1,3.68,16.39,0.0,0.5,0.0,0.0,,0.0,16.89,1,2.0 -514609,2,2018-11-24 14:03:10,2018-11-24 
14:26:29,N,1,149,89,1,4.51,18.5,0.0,0.5,0.0,0.0,,0.3,19.3,1,1.0 -570411,2,2018-11-27 12:59:00,2018-11-27 13:03:56,N,1,25,25,1,0.79,5.5,0.0,0.5,1.26,0.0,,0.3,7.56,1,1.0 -328751,2,2018-11-15 12:26:07,2018-11-15 12:58:31,N,5,52,37,1,6.39,21.25,0.0,0.5,0.0,0.0,,0.0,21.75,1,2.0 -290145,2,2018-11-13 18:54:48,2018-11-13 19:04:14,N,1,74,41,1,1.15,7.5,1.0,0.5,1.86,0.0,,0.3,11.16,1,1.0 -273210,2,2018-11-12 23:48:01,2018-11-12 23:48:03,N,5,166,166,1,0.0,20.0,0.0,0.0,4.0,0.0,,0.0,24.0,1,2.0 -598576,2,2018-11-28 16:55:56,2018-11-28 17:03:05,N,1,41,74,2,0.71,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1.0 -19526,2,2018-11-01 19:20:33,2018-11-01 19:27:39,N,1,25,181,1,0.87,6.5,1.0,0.5,1.66,0.0,,0.3,9.96,1,1.0 -645647,2,2018-11-30 16:50:09,2018-11-30 16:56:01,N,1,7,7,2,0.69,5.5,1.0,0.5,1.46,0.0,,0.3,10.71,1,1.0 -642343,2,2018-11-30 14:56:19,2018-11-30 15:06:20,N,1,33,97,1,1.08,7.5,0.0,0.5,1.66,0.0,,0.3,9.96,1,1.0 -284366,2,2018-11-13 14:23:08,2018-11-13 14:53:25,N,1,81,20,1,6.94,26.0,0.0,0.5,0.0,0.0,,0.3,26.8,1,1.0 -608380,2,2018-11-29 03:50:30,2018-11-29 03:55:17,N,1,74,42,1,0.98,5.5,0.5,0.5,2.04,0.0,,0.3,8.84,1,1.0 -131427,2,2018-11-06 17:46:21,2018-11-06 17:50:42,N,1,41,42,1,0.85,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1.0 -368687,2,2018-11-17 10:13:10,2018-11-17 10:27:14,N,1,95,82,1,1.65,10.5,0.0,0.5,0.0,0.0,,0.3,11.3,2,1.0 -13155,1,2018-11-01 15:40:11,2018-11-01 15:48:58,N,1,43,236,1,0.8,7.0,0.0,0.5,1.17,0.0,,0.3,8.97,1,1.0 diff --git a/tests/test_sets/dataconnector_docs/green/tripdata_2018-12.csv b/tests/test_sets/dataconnector_docs/green/tripdata_2018-12.csv deleted file mode 100644 index 375b98f8a05a..000000000000 --- a/tests/test_sets/dataconnector_docs/green/tripdata_2018-12.csv +++ /dev/null @@ -1,21 +0,0 @@ -,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type -644743,2,2018-12-29 
22:07:39,2018-12-29 22:23:38,N,1,255,7,1,4.2,15.0,0.5,0.5,3.26,0.0,,0.3,19.56,1,1.0 -241539,1,2018-12-11 13:27:48,2018-12-11 14:01:25,N,1,244,87,1,13.4,40.5,0.0,0.5,8.25,0.0,,0.3,49.55,1,1.0 -519266,2,2018-12-22 23:20:48,2018-12-22 23:42:53,N,1,255,225,1,4.03,16.5,0.5,0.5,3.56,0.0,,0.3,21.36,1,1.0 -419676,2,2018-12-18 20:30:48,2018-12-18 20:42:37,N,1,166,116,1,2.59,11.0,0.5,0.5,0.0,0.0,,0.3,12.3,1,1.0 -110768,2,2018-12-05 21:20:39,2018-12-05 21:45:12,N,1,255,61,1,4.08,18.5,0.5,0.5,5.94,0.0,,0.3,25.74,1,1.0 -591680,2,2018-12-27 11:21:47,2018-12-27 12:21:54,N,1,50,9,1,15.01,56.0,0.0,0.5,0.0,5.76,,0.3,62.56,1,1.0 -532284,2,2018-12-23 16:30:52,2018-12-23 16:39:42,N,1,74,75,1,0.57,7.0,0.0,0.5,0.0,0.0,,0.3,7.8,2,1.0 -149369,2,2018-12-07 14:18:11,2018-12-07 14:40:06,N,1,179,95,1,6.96,23.0,0.0,0.5,4.76,0.0,,0.3,28.56,1,1.0 -40899,2,2018-12-02 19:07:00,2018-12-02 19:17:53,N,1,97,49,1,1.74,9.0,0.0,0.5,2.45,0.0,,0.3,12.25,1,1.0 -341430,2,2018-12-15 14:19:06,2018-12-15 14:32:47,N,5,11,29,1,4.9,15.06,0.0,0.5,0.0,0.0,,0.0,15.56,1,2.0 -400460,2,2018-12-18 06:09:49,2018-12-18 06:27:36,N,5,14,231,1,7.75,24.39,0.0,0.5,0.0,5.76,,0.0,30.65,1,2.0 -320076,2,2018-12-14 18:00:27,2018-12-14 18:09:47,N,1,7,7,1,0.83,7.0,1.0,0.5,0.0,0.0,,0.3,8.8,2,1.0 -263463,2,2018-12-12 12:06:59,2018-12-12 12:17:05,N,1,260,223,1,3.48,12.5,0.0,0.5,0.0,0.0,,0.3,13.3,2,1.0 -245734,2,2018-12-11 16:20:31,2018-12-11 16:30:51,N,1,75,151,1,1.58,8.5,1.0,0.5,0.0,0.0,,0.3,10.3,2,1.0 -173368,2,2018-12-08 11:34:24,2018-12-08 11:53:37,N,1,181,61,1,3.79,15.5,0.0,0.5,0.0,0.0,,0.3,16.3,1,1.0 -37580,2,2018-12-02 15:23:58,2018-12-02 15:45:49,N,1,82,28,1,3.73,16.5,0.0,0.5,0.0,0.0,,0.3,17.3,1,1.0 -82903,2,2018-12-04 19:07:55,2018-12-04 19:32:12,N,5,242,167,1,4.84,19.86,0.0,0.5,0.0,0.0,,0.0,20.36,1,2.0 -531182,2,2018-12-23 15:43:36,2018-12-23 16:16:42,N,1,82,173,1,2.87,20.0,0.0,0.5,0.0,0.0,,0.3,20.8,2,1.0 -532295,2,2018-12-23 16:09:21,2018-12-23 16:15:12,N,1,181,181,1,0.64,5.5,0.0,0.5,0.0,0.0,,0.3,6.3,2,1.0 -112713,2,2018-12-05 
23:07:24,2018-12-05 23:15:40,N,1,129,129,1,1.3,7.5,0.5,0.5,1.76,0.0,,0.3,10.56,1,1.0 diff --git a/tests/test_sets/dataconnector_docs/green/tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/green/tripdata_2019-01.csv deleted file mode 100644 index 07a92dc26d64..000000000000 --- a/tests/test_sets/dataconnector_docs/green/tripdata_2019-01.csv +++ /dev/null @@ -1,21 +0,0 @@ -,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge -198236,2,2019-01-11 01:12:46,2019-01-11 01:20:37,N,1,129,129,1,1.39,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1, -33286,1,2019-01-02 23:41:23,2019-01-02 23:49:14,N,1,255,112,1,1.5,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1, -517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1, -368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1, -155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1, -366694,2,2019-01-18 22:58:57,2019-01-18 23:15:37,N,1,75,185,1,8.71,24.5,0.5,0.5,0.0,0.0,,0.3,25.8,1,1, -474972,2,2019-01-24 17:59:00,2019-01-24 18:35:53,N,5,195,90,1,6.46,23.45,0.0,0.5,0.0,0.0,,0.0,23.95,1,2, -69932,1,2019-01-04 18:07:02,2019-01-04 18:34:37,N,1,97,35,2,4.7,20.0,1.0,0.5,0.0,0.0,,0.3,21.8,1,1, -244782,2,2019-01-13 02:16:18,2019-01-13 02:37:30,N,1,260,198,1,3.9,16.5,0.5,0.5,0.0,0.0,,0.3,17.8,2,1, -482363,2,2019-01-25 04:42:54,2019-01-25 04:47:06,N,1,82,56,1,0.74,5.0,0.5,0.5,0.0,0.0,,0.3,6.3,2,1, -573895,2,2019-01-29 11:36:48,2019-01-29 11:43:03,N,1,43,236,1,0.91,6.0,0.0,0.5,2.0,0.0,,0.3,8.8,1,1,0.0 -182986,2,2019-01-10 12:07:54,2019-01-10 12:19:31,N,1,29,155,1,5.49,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, -526993,1,2019-01-26 22:13:18,2019-01-26 22:13:51,N,1,33,33,1,0.1,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1, 
-490940,2,2019-01-25 13:09:50,2019-01-25 13:15:30,N,1,220,153,1,0.35,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,1,1, -145635,2,2019-01-08 16:01:58,2019-01-08 16:10:16,N,1,77,72,1,1.48,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,1,1, -242083,1,2019-01-12 23:43:28,2019-01-13 00:02:13,N,1,210,216,1,12.3,34.5,0.5,0.5,0.0,0.0,,0.3,35.8,2,1, -328924,2,2019-01-17 10:07:09,2019-01-17 10:25:59,N,1,32,242,1,4.71,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, -132260,1,2019-01-07 21:00:39,2019-01-07 21:18:13,N,1,52,188,1,3.6,14.5,0.5,0.5,0.0,0.0,,0.3,15.8,2,1, -568032,2,2019-01-29 07:01:03,2019-01-29 07:54:28,N,1,225,251,1,16.91,55.0,0.0,0.5,0.0,11.52,,0.3,67.32,1,1,0.0 -92205,2,2019-01-05 20:26:49,2019-01-05 20:44:48,N,1,29,108,1,3.93,15.5,0.5,0.5,0.0,0.0,,0.3,16.8,1,1, diff --git a/tests/test_sets/dataconnector_docs/green/tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/green/tripdata_2019-02.csv deleted file mode 100644 index 9a6442e61899..000000000000 --- a/tests/test_sets/dataconnector_docs/green/tripdata_2019-02.csv +++ /dev/null @@ -1,21 +0,0 @@ -,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge -504629,2,2019-02-25 16:05:49,2019-02-25 16:11:15,N,1,179,179,1,0.89,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,2,1,0.0 -475628,1,2019-02-23 23:18:58,2019-02-23 23:31:00,N,1,210,55,1,4.2,15.0,0.5,0.5,0.0,0.0,,0.3,16.3,2,1,0.0 -150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75 -245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0 -151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0 -18916,2,2019-02-01 19:54:28,2019-02-01 19:58:41,N,1,74,262,1,1.04,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,1,1,0.0 -110154,2,2019-02-06 08:33:30,2019-02-06 
08:48:52,N,1,66,87,1,2.14,12.0,0.0,0.5,4.66,0.0,,0.3,20.21,1,1,2.75 -335114,2,2019-02-16 18:40:24,2019-02-16 18:46:47,N,1,74,75,1,0.89,6.0,0.0,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 -453776,2,2019-02-22 22:38:18,2019-02-22 22:49:20,N,1,116,41,1,1.86,10.0,0.5,0.5,2.26,0.0,,0.3,13.56,1,1,0.0 -192685,2,2019-02-09 19:56:13,2019-02-09 20:13:56,N,1,167,235,1,2.59,13.0,0.0,0.5,0.0,0.0,,0.3,13.8,1,1,0.0 -574982,2,2019-02-28 22:42:50,2019-02-28 22:47:40,N,1,42,116,1,0.91,5.5,0.5,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 -155855,2,2019-02-08 09:02:21,2019-02-08 10:17:23,N,5,210,161,1,20.74,59.86,0.0,0.5,0.0,0.0,,0.0,60.36,1,2,0.0 -130109,2,2019-02-07 05:18:25,2019-02-07 05:24:30,N,1,62,188,4,1.54,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,1,1,0.0 -157145,2,2019-02-08 10:20:47,2019-02-08 10:47:36,N,1,74,31,2,7.73,25.5,0.0,0.5,5.26,0.0,,0.3,31.56,1,1,0.0 -411748,2,2019-02-20 22:31:32,2019-02-20 22:37:11,N,1,129,129,4,0.9,5.5,0.5,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 -470538,2,2019-02-23 18:15:38,2019-02-23 18:27:05,N,1,126,213,1,2.21,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,1,1,0.0 -262705,2,2019-02-13 13:07:02,2019-02-13 13:19:50,N,1,186,249,1,1.5,9.5,0.0,0.5,2.56,0.0,,0.3,15.36,1,1,2.5 -163160,2,2019-02-08 15:39:26,2019-02-08 16:06:31,N,1,33,17,1,2.9,17.5,0.0,0.5,0.0,0.0,,0.3,18.3,2,1,0.0 -263174,2,2019-02-13 15:06:57,2019-02-13 15:28:26,N,1,165,123,5,1.66,13.5,0.0,0.5,0.0,0.0,,0.3,14.3,1,1,0.0 -451270,2,2019-02-22 21:00:04,2019-02-22 21:00:24,N,1,82,82,1,0.06,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/green/tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/green/tripdata_2019-03.csv deleted file mode 100644 index 5104e10f24c5..000000000000 --- a/tests/test_sets/dataconnector_docs/green/tripdata_2019-03.csv +++ /dev/null @@ -1,21 +0,0 @@ 
-,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge -337596,1,2019-03-18 08:20:31,2019-03-18 09:12:55,N,1,49,35,1,6.1,33.5,0.0,0.5,0.0,0.0,,0.3,34.3,1,1,0.0 -378981,2,2019-03-20 13:03:10,2019-03-20 13:16:30,N,1,152,41,1,1.14,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 -6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0 -76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75 -282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0 -439163,2,2019-03-23 12:07:55,2019-03-23 12:11:20,N,1,166,166,1,0.75,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1,0.0 -518619,2,2019-03-27 18:33:46,2019-03-27 18:40:59,N,1,7,223,1,0.59,6.0,1.0,0.5,1.95,0.0,,0.3,9.75,1,1,0.0 -385766,2,2019-03-20 18:19:37,2019-03-20 18:28:31,N,1,129,129,1,1.21,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,2,1,0.0 -131793,2,2019-03-07 18:40:23,2019-03-07 18:58:26,N,1,51,32,1,3.35,14.0,1.0,0.5,0.0,0.0,,0.3,15.8,1,1,0.0 -203472,2,2019-03-11 12:46:20,2019-03-11 13:20:08,N,1,119,254,3,6.47,27.5,0.0,0.5,0.0,0.0,,0.3,28.3,1,1,0.0 -399106,2,2019-03-21 13:10:44,2019-03-21 13:21:19,N,1,74,75,1,1.27,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 -425811,2,2019-03-22 19:00:23,2019-03-22 19:02:51,N,1,95,95,1,0.51,4.0,1.0,0.5,2.0,0.0,,0.3,7.8,1,1,0.0 -36345,2,2019-03-02 19:27:41,2019-03-02 19:33:52,N,1,7,179,1,0.86,6.0,0.0,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 -323601,2,2019-03-17 10:19:28,2019-03-17 10:22:59,N,1,97,49,1,0.68,4.5,0.0,0.5,1.06,0.0,,0.3,6.36,1,1,0.0 -246980,2,2019-03-13 17:03:07,2019-03-13 17:12:04,N,1,75,263,2,1.32,8.0,1.0,0.5,2.51,0.0,,0.3,15.06,1,1,2.75 -269737,1,2019-03-14 18:03:35,2019-03-14 18:10:01,N,1,66,65,1,0.6,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1,0.0 -145416,2,2019-03-08 
12:21:02,2019-03-08 12:22:57,N,1,75,75,1,0.5,3.5,0.0,0.5,0.0,0.0,,0.3,4.3,2,1,0.0 -142333,2,2019-03-08 09:39:09,2019-03-08 10:03:48,N,5,215,16,1,7.14,21.62,0.0,0.5,0.0,0.0,,0.0,22.12,1,2,0.0 -381567,2,2019-03-20 15:09:47,2019-03-20 15:22:12,N,1,89,188,1,1.75,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 -476898,1,2019-03-25 13:10:35,2019-03-25 13:20:23,N,1,244,243,1,1.1,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/green/2018/tripdata-10.csv b/tests/test_sets/dataconnector_docs/nested_directories/green/2018/tripdata-10.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/green/2018/tripdata-10.csv rename to tests/test_sets/dataconnector_docs/nested_directories/green/2018/tripdata-10.csv diff --git a/tests/test_sets/dataconnector_docs/green/2018/tripdata-11.csv b/tests/test_sets/dataconnector_docs/nested_directories/green/2018/tripdata-11.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/green/2018/tripdata-11.csv rename to tests/test_sets/dataconnector_docs/nested_directories/green/2018/tripdata-11.csv diff --git a/tests/test_sets/dataconnector_docs/green/2018/tripdata-12.csv b/tests/test_sets/dataconnector_docs/nested_directories/green/2018/tripdata-12.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/green/2018/tripdata-12.csv rename to tests/test_sets/dataconnector_docs/nested_directories/green/2018/tripdata-12.csv diff --git a/tests/test_sets/dataconnector_docs/green/2019/tripdata-01.csv b/tests/test_sets/dataconnector_docs/nested_directories/green/2019/tripdata-01.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/green/2019/tripdata-01.csv rename to tests/test_sets/dataconnector_docs/nested_directories/green/2019/tripdata-01.csv diff --git a/tests/test_sets/dataconnector_docs/green/2019/tripdata-02.csv b/tests/test_sets/dataconnector_docs/nested_directories/green/2019/tripdata-02.csv similarity index 100% rename from 
tests/test_sets/dataconnector_docs/green/2019/tripdata-02.csv rename to tests/test_sets/dataconnector_docs/nested_directories/green/2019/tripdata-02.csv diff --git a/tests/test_sets/dataconnector_docs/green/2019/tripdata-03.csv b/tests/test_sets/dataconnector_docs/nested_directories/green/2019/tripdata-03.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/green/2019/tripdata-03.csv rename to tests/test_sets/dataconnector_docs/nested_directories/green/2019/tripdata-03.csv diff --git a/tests/test_sets/dataconnector_docs/yellow/2018/10/tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories/yellow/2018/10/tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow/2018/10/tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories/yellow/2018/10/tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/yellow/2018/11/tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories/yellow/2018/11/tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow/2018/11/tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories/yellow/2018/11/tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/yellow/2018/12/tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories/yellow/2018/12/tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow/2018/12/tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories/yellow/2018/12/tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/yellow/2019/01/tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories/yellow/2019/01/tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow/2019/01/tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories/yellow/2019/01/tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/yellow/2019/02/tripdata.csv 
b/tests/test_sets/dataconnector_docs/nested_directories/yellow/2019/02/tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow/2019/02/tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories/yellow/2019/02/tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/yellow/2019/03/tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories/yellow/2019/03/tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow/2019/03/tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories/yellow/2019/03/tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/yellow_tripdata_2018-10.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-10.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow_tripdata_2018-10.csv rename to tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-10.csv diff --git a/tests/test_sets/dataconnector_docs/yellow_tripdata_2018-11.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-11.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow_tripdata_2018-11.csv rename to tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-11.csv diff --git a/tests/test_sets/dataconnector_docs/yellow_tripdata_2018-12.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-12.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow_tripdata_2018-12.csv rename to tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-12.csv diff --git a/tests/test_sets/dataconnector_docs/yellow_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-01.csv similarity index 100% rename from 
tests/test_sets/dataconnector_docs/yellow_tripdata_2019-01.csv rename to tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-01.csv diff --git a/tests/test_sets/dataconnector_docs/yellow_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-02.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow_tripdata_2019-02.csv rename to tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-02.csv diff --git a/tests/test_sets/dataconnector_docs/yellow_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-03.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow_tripdata_2019-03.csv rename to tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-03.csv diff --git a/tests/test_sets/dataconnector_docs/green_tripdata_2018-10.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-10.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/green_tripdata_2018-10.csv rename to tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-10.csv diff --git a/tests/test_sets/dataconnector_docs/green_tripdata_2018-11.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-11.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/green_tripdata_2018-11.csv rename to tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-11.csv diff --git a/tests/test_sets/dataconnector_docs/green_tripdata_2018-12.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-12.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/green_tripdata_2018-12.csv rename to 
tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-12.csv diff --git a/tests/test_sets/dataconnector_docs/green_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-01.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/green_tripdata_2019-01.csv rename to tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-01.csv diff --git a/tests/test_sets/dataconnector_docs/green_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-02.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/green_tripdata_2019-02.csv rename to tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-02.csv diff --git a/tests/test_sets/dataconnector_docs/green_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-03.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/green_tripdata_2019-03.csv rename to tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-03.csv diff --git a/tests/test_sets/dataconnector_docs/yellow/tripdata_2018-10.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-10.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow/tripdata_2018-10.csv rename to tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-10.csv diff --git a/tests/test_sets/dataconnector_docs/yellow/tripdata_2018-11.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-11.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow/tripdata_2018-11.csv rename to tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-11.csv diff --git 
a/tests/test_sets/dataconnector_docs/yellow/tripdata_2018-12.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-12.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow/tripdata_2018-12.csv rename to tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-12.csv diff --git a/tests/test_sets/dataconnector_docs/yellow/tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-01.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow/tripdata_2019-01.csv rename to tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-01.csv diff --git a/tests/test_sets/dataconnector_docs/yellow/tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-02.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow/tripdata_2019-02.csv rename to tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-02.csv diff --git a/tests/test_sets/dataconnector_docs/yellow/tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-03.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/yellow/tripdata_2019-03.csv rename to tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-03.csv From bdf1b050c6fa22d08d02f7a2d4fe3c3d7e578eee Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 20 Oct 2021 11:29:20 -0400 Subject: [PATCH 18/62] YAML config example alongside Python --- ...onfigure_a_configuredassetdataconnector.py | 28 ++++++++++++++++++- ...how_to_configure_a_runtimedataconnector.py | 18 ++++++++++++ ...configure_an_inferredassetdataconnector.py | 27 +++++++++++++++++- 3 files changed, 71 insertions(+), 2 deletions(-) diff --git 
a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index 1cc48e6f280b..b0787586de7f 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -5,6 +5,32 @@ context = ge.get_context() +# YAML +datasource_yaml = f""" +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_configured_data_connector_name: + class_name: ConfiguredAssetFilesystemDataConnector + base_directory: my_directory/ + assets: + taxi: + pattern: (.*)\.csv + group_names: + - month +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. +datasource_yaml = datasource_yaml.replace("my_directory/", "../data/single_directory_one_data_asset/") + +context.test_yaml_config(datasource_yaml) + +# Python datasource_config = { "name": "taxi_datasource", "class_name": "Datasource", @@ -28,7 +54,7 @@ } # Please note this override is only to provide good UX for docs and tests. -# In normal usage you'd set your path directly in the yaml above. +# In normal usage you'd set your path directly in the code above. 
datasource_config["data_connectors"]["default_configured_data_connector_name"][ "base_directory" ] = "../data/single_directory_one_data_asset/" diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py index a6b9bd2f4c46..f19923a99016 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py @@ -5,6 +5,24 @@ context = ge.get_context() +# YAML +datasource_yaml = f""" +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_runtime_data_connector_name: + class_name: RuntimeDataConnector + batch_identifiers: + - default_identifier_name +""" + +context.test_yaml_config(datasource_yaml) + +# Python datasource_config = { "name": "taxi_datasource", "class_name": "Datasource", diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index 14947948f182..499c5bf7129e 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -5,6 +5,31 @@ context = ge.get_context() +# YAML +datasource_yaml = f""" +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_inferred_data_connector_name: + class_name: 
InferredAssetFilesystemDataConnector + base_directory: my_directory/ + default_regex: + group_names: + - data_asset_name + pattern: (.*)\.csv +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. +datasource_yaml = datasource_yaml.replace("my_directory/", "../data/single_directory_one_data_asset/") + +context.test_yaml_config(datasource_yaml) + +# Python datasource_config = { "name": "taxi_datasource", "class_name": "Datasource", @@ -26,7 +51,7 @@ } # Please note this override is only to provide good UX for docs and tests. -# In normal usage you'd set your path directly in the yaml above. +# In normal usage you'd set your path directly in the code above. datasource_config["data_connectors"]["default_inferred_data_connector_name"][ "base_directory" ] = "../data/single_directory_one_data_asset/" From 2dde0ecc8ba15ca8d5473efc86f07d67c986d73a Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 20 Oct 2021 12:06:17 -0400 Subject: [PATCH 19/62] Assert that yaml and python configs are equivalent --- .../how_to_configure_a_configuredassetdataconnector.py | 6 ++++-- .../how_to_configure_a_runtimedataconnector.py | 6 ++++-- .../how_to_configure_an_inferredassetdataconnector.py | 6 ++++-- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index b0787586de7f..c25fdc2949bd 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -28,7 +28,7 @@ # In normal usage you'd set your path directly in the yaml above. 
datasource_yaml = datasource_yaml.replace("my_directory/", "../data/single_directory_one_data_asset/") -context.test_yaml_config(datasource_yaml) +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") # Python datasource_config = { @@ -59,7 +59,9 @@ "base_directory" ] = "../data/single_directory_one_data_asset/" -context.test_yaml_config(yaml.dump(datasource_config)) +test_python = context.test_yaml_config(yaml.dump(datasource_config), return_mode="report_object") + +assert test_yaml == test_python context.add_datasource(**datasource_config) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py index f19923a99016..568692d204ab 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py @@ -20,7 +20,7 @@ - default_identifier_name """ -context.test_yaml_config(datasource_yaml) +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") # Python datasource_config = { @@ -39,7 +39,9 @@ }, } -context.test_yaml_config(yaml.dump(datasource_config)) +test_python = context.test_yaml_config(yaml.dump(datasource_config), return_mode="report_object") + +assert test_yaml == test_python context.add_datasource(**datasource_config) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index 499c5bf7129e..c30485a0b17f 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -27,7 +27,7 @@ # In normal 
usage you'd set your path directly in the yaml above. datasource_yaml = datasource_yaml.replace("my_directory/", "../data/single_directory_one_data_asset/") -context.test_yaml_config(datasource_yaml) +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") # Python datasource_config = { @@ -56,7 +56,9 @@ "base_directory" ] = "../data/single_directory_one_data_asset/" -context.test_yaml_config(yaml.dump(datasource_config)) +test_python = context.test_yaml_config(yaml.dump(datasource_config), return_mode="report_object") + +assert test_yaml == test_python context.add_datasource(**datasource_config) From 440844647ee046c67b7e93947ceaca71325fcb6b Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 20 Oct 2021 12:43:04 -0400 Subject: [PATCH 20/62] Steps 1 and 2 with tabs --- ...onfigure_a_configuredassetdataconnector.md | 81 ++++++++++++++--- ...configure_an_inferredassetdataconnector.md | 89 ++++++++++++++++--- ...onfigure_a_configuredassetdataconnector.py | 2 +- ...how_to_configure_a_runtimedataconnector.py | 2 +- ...configure_an_inferredassetdataconnector.py | 2 +- 5 files changed, 148 insertions(+), 28 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 1ad45a723c77..916622357e08 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -2,6 +2,8 @@ title: How to configure a ConfiguredAssetDataConnector --- import Prerequisites from '../connecting_to_your_data/components/prerequisites.jsx' +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; This guide demonstrates how to configure a ConfiguredAssetDataConnector, and provides several examples you can use for configuration. 
@@ -20,28 +22,85 @@ but also S3 object stores, etc: If you're not sure which one to use, please check out [How to choose which DataConnector to use](./how_to_choose_which_dataconnector_to_use.md). -Set up a Datasource -------------------- +## Steps + +### 1. Instantiate your project's DataContext + +Import these necessary packages and modules: + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L3-L4 +``` + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L1-L4 +``` + + + + +### 2. Set up a Datasource All of the examples below assume you’re testing configuration using something like: + + + ```python -import great_expectations as ge -context = ge.get_context() -config = f""" +datasource_yaml = """ +name: taxi_datasource class_name: Datasource execution_engine: class_name: PandasExecutionEngine data_connectors: - my_filesystem_data_connector: - {data_connector configuration goes here} + default_configured_data_connector_name: + """ -context.test_yaml_config( - name="my_pandas_datasource", - yaml_config=config -) +context.test_yaml_config(yaml_config=datasource_yaml) +``` + + + + +```python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_configured_data_connector_name": { + + }, + }, +} +context.test_yaml_config(yaml.dump(datasource_config)) ``` + + + + + If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md) Choose a DataConnector diff --git
a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 28ffbb2997e5..a6343ba3da04 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -2,6 +2,8 @@ title: How to configure an InferredAssetDataConnector --- import Prerequisites from '../connecting_to_your_data/components/prerequisites.jsx' +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; This guide demonstrates how to configure an InferredAssetDataConnector, and provides several examples you can use for configuration. @@ -23,26 +25,85 @@ InferredAssetDataConnector has fewer options, so it's simpler to set up. It’s If you're not sure which one to use, please check out [How to choose which DataConnector to use](./how_to_choose_which_dataconnector_to_use.md). -Set up a Datasource -------------------- +## Steps + +### 1. Instantiate your project's DataContext + +Import these necessary packages and modules: + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L3-L4 +``` + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L1-L4 +``` + + + + +### 2. 
Set up a Datasource All the examples below assume you’re testing configurations using something like: + + + ```python -import great_expectations as ge -context = ge.DataContext() - -context.test_yaml_config(""" -my_data_source: - class_name: Datasource - execution_engine: - class_name: PandasExecutionEngine - data_connectors: - my_filesystem_data_connector: - {data_connector configuration goes here} -""") +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +execution_engine: + class_name: PandasExecutionEngine +data_connectors: + default_inferred_data_connector_name: + +""" +context.test_yaml_config(yaml_config=datasource_yaml) ``` + + + +```python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_inferred_data_connector_name": { + + }, + }, +} +context.test_yaml_config(yaml.dump(datasource_config)) +``` + + + + + + If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md) Choose a DataConnector diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index c25fdc2949bd..26034d6232ae 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -6,7 +6,7 @@ context = ge.get_context() # YAML -datasource_yaml = f""" +datasource_yaml = """ name: taxi_datasource class_name: Datasource module_name: 
great_expectations.datasource diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py index 568692d204ab..e465c24e37ce 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py @@ -6,7 +6,7 @@ context = ge.get_context() # YAML -datasource_yaml = f""" +datasource_yaml = """ name: taxi_datasource class_name: Datasource module_name: 
great_expectations.datasource diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py index 568692d204ab..e465c24e37ce 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py @@ -6,7 +6,7 @@ context = ge.get_context() # YAML -datasource_yaml = f""" +datasource_yaml = """ name: taxi_datasource class_name: Datasource module_name: great_expectations.datasource diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index c30485a0b17f..e7e3a538b1db 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -6,7 +6,7 @@ context = ge.get_context() # YAML -datasource_yaml = f""" +datasource_yaml = """ name: taxi_datasource class_name: Datasource module_name: great_expectations.datasource From 53616a2a9d309500a1d635cacf39f52a263ecd45 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 20 Oct 2021 13:01:59 -0400 Subject: [PATCH 21/62] Remove test sets that are no longer needed --- .../yellow_tripdata_2018-10.csv | 21 ------------------- .../yellow_tripdata_2018-11.csv | 21 ------------------- .../yellow_tripdata_2018-12.csv | 21 ------------------- .../green_tripdata_2018-10.csv | 21 ------------------- .../green_tripdata_2018-11.csv | 21 ------------------- .../green_tripdata_2018-12.csv | 21 ------------------- .../yellow_tripdata_2018-10.csv | 21 ------------------- .../yellow_tripdata_2018-11.csv | 21 ------------------- 
.../yellow_tripdata_2018-12.csv | 21 ------------------- 9 files changed, 189 deletions(-) delete mode 100644 tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-10.csv delete mode 100644 tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-11.csv delete mode 100644 tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-12.csv delete mode 100644 tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-10.csv delete mode 100644 tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-11.csv delete mode 100644 tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-12.csv delete mode 100644 tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-10.csv delete mode 100644 tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-11.csv delete mode 100644 tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-12.csv diff --git a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-10.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-10.csv deleted file mode 100644 index 0ce6520c5822..000000000000 --- a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-10.csv +++ /dev/null @@ -1,21 +0,0 @@ -,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge -6984,2,2018-10-30 10:59:02,2018-10-30 11:04:30,1,0.62,1,N,48,163,1,5.5,0.0,0.5,1.26,0.0,0.3,7.56, -3030,2,2018-10-03 19:43:48,2018-10-03 20:01:51,1,3.93,1,N,137,239,1,15.0,1.0,0.5,2.5,0.0,0.3,19.3, -9267,1,2018-10-16 
07:10:28,2018-10-16 07:19:21,1,1.4,1,N,142,236,1,8.0,0.0,0.5,0.88,0.0,0.3,9.68, -4342,1,2018-10-10 10:19:16,2018-10-10 10:33:52,1,2.1,1,N,142,50,2,11.5,0.0,0.5,0.0,0.0,0.3,12.3, -8099,2,2018-10-01 19:18:44,2018-10-01 19:31:48,1,2.51,1,N,140,239,1,11.0,1.0,0.5,3.2,0.0,0.3,16.0, -8972,2,2018-10-19 01:40:24,2018-10-19 01:51:42,2,2.54,1,N,249,164,2,10.5,0.5,0.5,0.0,0.0,0.3,11.8, -240,1,2018-10-11 19:54:10,2018-10-11 20:19:34,1,5.2,1,N,231,232,1,22.0,1.0,0.5,4.76,0.0,0.3,28.56, -7844,2,2018-10-30 08:58:08,2018-10-30 09:05:09,1,1.08,1,N,249,211,1,6.5,0.0,0.5,1.0,0.0,0.3,8.3, -7024,1,2018-10-11 04:56:32,2018-10-11 05:14:29,1,8.0,1,N,263,138,1,25.0,0.5,0.5,8.0,5.76,0.3,40.06, -7601,2,2018-10-27 00:44:06,2018-10-27 00:57:52,1,2.18,1,N,113,100,1,11.0,0.5,0.5,2.46,0.0,0.3,14.76, -7686,2,2018-10-01 17:13:29,2018-10-01 17:16:10,5,0.36,1,N,263,236,1,3.5,1.0,0.5,1.59,0.0,0.3,6.89, -1344,1,2018-10-03 20:31:19,2018-10-03 21:11:37,1,20.2,3,N,236,1,1,73.5,0.5,0.0,18.35,17.5,0.3,110.15, -2539,2,2018-10-24 00:46:47,2018-10-24 01:07:38,1,14.1,1,N,132,210,1,39.0,0.5,0.5,5.0,0.0,0.3,45.3, -2758,1,2018-10-17 22:25:54,2018-10-17 22:42:38,1,6.2,1,N,237,87,1,20.0,0.5,0.5,2.0,0.0,0.3,23.3, -567,1,2018-10-22 15:57:44,2018-10-22 16:25:52,1,3.1,1,N,264,264,1,18.5,0.0,0.5,3.85,0.0,0.3,23.15, -1994,1,2018-10-28 13:36:15,2018-10-28 13:47:38,1,1.8,1,N,79,232,1,10.0,0.0,0.5,2.15,0.0,0.3,12.95, -3549,2,2018-10-25 21:00:53,2018-10-25 21:17:23,1,2.58,1,N,170,48,1,12.0,0.5,0.5,1.0,0.0,0.3,14.3, -3867,2,2018-10-16 13:26:54,2018-10-16 13:51:57,3,1.8,1,N,230,158,1,16.0,0.0,0.5,3.36,0.0,0.3,20.16, -864,2,2018-10-20 10:53:46,2018-10-20 11:03:28,1,1.22,1,N,262,75,1,8.0,0.0,0.5,2.64,0.0,0.3,11.44, -9457,1,2018-10-01 18:19:51,2018-10-01 18:39:05,1,2.6,1,N,144,186,1,14.0,1.0,0.5,3.15,0.0,0.3,18.95, diff --git a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-11.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-11.csv deleted file 
mode 100644 index 87ff7e28e6fa..000000000000 --- a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-11.csv +++ /dev/null @@ -1,21 +0,0 @@ -,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge -7201,2,2018-11-03 20:17:25,2018-11-03 20:25:26,1,1.15,1,N,166,238,1,7.5,0.5,0.5,2.2,0.0,0.3,11.0, -2578,2,2018-11-14 09:03:52,2018-11-14 09:21:39,1,0.9,1,N,230,230,1,11.5,0.0,0.5,1.0,0.0,0.3,13.3, -9216,1,2018-11-16 13:12:09,2018-11-16 13:24:41,1,1.5,1,N,238,143,2,9.5,0.0,0.5,0.0,0.0,0.3,10.3, -5518,1,2018-11-15 20:44:20,2018-11-15 20:58:41,1,1.7,1,N,236,151,1,11.0,0.5,0.5,0.0,0.0,0.3,12.3, -104,2,2018-11-07 00:01:14,2018-11-07 00:05:56,5,0.85,1,N,234,170,1,5.5,0.5,0.5,1.36,0.0,0.3,8.16, -536,2,2018-11-11 12:43:04,2018-11-11 12:59:33,1,2.11,1,N,233,114,1,11.5,0.0,0.5,2.46,0.0,0.3,14.76, -2167,2,2018-11-15 14:14:45,2018-11-15 14:24:39,2,0.8,1,N,142,230,1,7.5,0.0,0.5,2.08,0.0,0.3,10.38, -5875,4,2018-11-07 07:24:55,2018-11-07 07:30:28,1,0.81,1,N,164,233,2,5.5,0.0,0.5,0.0,0.0,0.3,6.3, -8196,2,2018-11-05 13:46:25,2018-11-05 13:47:12,3,0.08,1,N,236,236,1,2.5,0.0,0.5,0.66,0.0,0.3,3.96, -8175,1,2018-11-13 20:46:11,2018-11-13 20:50:29,1,0.6,1,N,107,137,2,5.0,0.5,0.5,0.0,0.0,0.3,6.3, -6314,2,2018-11-25 19:36:38,2018-11-25 19:41:30,1,1.77,1,N,263,74,2,7.0,0.0,0.5,0.0,0.0,0.3,7.8, -7700,2,2018-11-18 21:33:49,2018-11-18 21:46:58,2,2.76,1,N,163,24,1,12.0,0.5,0.5,3.32,0.0,0.3,16.62, -9062,1,2018-11-03 18:39:31,2018-11-03 18:49:25,1,0.8,1,N,164,230,2,7.5,0.0,0.5,0.0,0.0,0.3,8.3, -6701,1,2018-11-20 06:12:00,2018-11-20 06:19:35,1,2.2,1,N,237,75,2,8.5,0.0,0.5,0.0,0.0,0.3,9.3, -399,2,2018-11-13 21:20:51,2018-11-13 21:34:45,1,1.18,1,N,162,230,1,9.5,0.5,0.5,5.0,0.0,0.3,15.8, -2745,2,2018-11-03 00:07:35,2018-11-03 
00:29:31,1,2.55,1,N,68,4,2,15.0,0.5,0.5,0.0,0.0,0.3,16.3, -5363,2,2018-11-23 20:16:12,2018-11-23 20:20:46,1,1.05,1,N,237,162,2,5.5,0.5,0.5,0.0,0.0,0.3,6.8, -383,1,2018-11-10 22:31:50,2018-11-10 22:47:48,1,1.1,1,N,161,141,2,10.5,0.5,0.5,0.0,0.0,0.3,11.8, -1537,2,2018-11-17 01:05:40,2018-11-17 01:18:09,1,1.62,1,N,114,79,1,9.5,0.5,0.5,2.16,0.0,0.3,12.96, -1760,2,2018-11-01 13:52:35,2018-11-01 13:59:37,1,0.5,1,N,230,162,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8, diff --git a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-12.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-12.csv deleted file mode 100644 index 50eb34d13f83..000000000000 --- a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2018-12.csv +++ /dev/null @@ -1,21 +0,0 @@ -,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge -6701,1,2018-12-07 22:07:55,2018-12-07 22:35:18,1,3.5,1,N,237,249,1,18.5,0.5,0.5,2.0,0.0,0.3,21.8, -9645,2,2018-12-10 18:21:08,2018-12-10 18:33:12,1,1.38,1,N,114,158,1,9.0,1.0,0.5,2.16,0.0,0.3,12.96, -4105,2,2018-12-15 23:27:23,2018-12-15 23:44:02,1,2.62,1,N,232,137,2,13.0,0.5,0.5,0.0,0.0,0.3,14.3, -2743,2,2018-12-12 19:54:39,2018-12-12 19:57:53,1,0.41,1,N,237,237,1,4.0,1.0,0.5,1.16,0.0,0.3,6.96, -4126,1,2018-12-21 18:57:55,2018-12-21 19:09:35,1,1.0,1,N,249,79,1,8.5,1.0,0.5,2.05,0.0,0.3,12.35, -1566,1,2018-12-16 15:38:10,2018-12-16 15:55:05,3,1.4,1,N,236,141,1,11.5,0.0,0.5,1.85,0.0,0.3,14.15, -4857,2,2018-12-06 17:26:50,2018-12-06 17:39:34,2,0.93,1,N,142,239,1,9.0,1.0,0.5,2.16,0.0,0.3,12.96, -304,1,2018-12-27 16:00:34,2018-12-27 16:34:26,2,5.7,1,N,163,209,2,24.0,1.0,0.5,0.0,0.0,0.3,25.8, -8159,2,2018-12-05 18:32:06,2018-12-05 
18:41:03,1,1.38,1,N,68,90,1,8.0,1.0,0.5,2.94,0.0,0.3,12.74, -6575,4,2018-12-30 23:53:05,2018-12-30 23:55:40,1,0.39,1,N,256,256,1,4.0,0.5,0.5,1.06,0.0,0.3,6.36, -8327,2,2018-12-14 18:42:59,2018-12-14 18:51:49,1,1.26,1,N,163,236,1,8.0,1.0,0.5,1.5,0.0,0.3,11.3, -5245,1,2018-12-23 16:00:21,2018-12-23 16:12:44,1,1.6,1,N,141,142,1,10.0,0.0,0.5,2.15,0.0,0.3,12.95, -3521,2,2018-12-05 00:12:07,2018-12-05 00:30:25,2,3.15,1,N,164,45,1,14.0,0.5,0.5,3.06,0.0,0.3,18.36, -9442,2,2018-12-14 08:58:16,2018-12-14 09:06:42,1,0.67,1,N,239,238,1,6.0,0.0,0.5,0.8,0.0,0.3,7.6, -922,1,2018-12-09 03:45:39,2018-12-09 03:52:09,1,1.8,1,N,90,170,1,7.5,0.5,0.5,1.0,0.0,0.3,9.8, -807,2,2018-12-26 17:30:45,2018-12-26 17:53:04,1,5.34,1,N,164,261,2,19.0,1.0,0.5,0.0,0.0,0.3,20.8, -6354,2,2018-12-31 09:38:01,2018-12-31 09:46:53,1,2.2,1,N,24,236,1,9.5,0.0,0.5,2.0,0.0,0.3,12.3, -7329,2,2018-12-09 02:54:29,2018-12-09 03:00:14,5,1.19,1,N,164,246,1,6.5,0.5,0.5,1.95,0.0,0.3,9.75, -6227,1,2018-12-01 18:32:30,2018-12-01 19:06:38,1,4.2,1,N,90,263,1,22.0,0.0,0.5,4.55,0.0,0.3,27.35, -9796,1,2018-12-17 18:18:12,2018-12-17 18:32:05,1,1.4,1,N,68,107,1,10.0,1.0,0.5,1.2,0.0,0.3,13.0, diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-10.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-10.csv deleted file mode 100644 index 2b518c3e4a83..000000000000 --- a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-10.csv +++ /dev/null @@ -1,21 +0,0 @@ -,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type -448278,2,2018-10-20 13:58:45,2018-10-20 14:05:35,N,1,112,256,1,1.03,6.5,0.0,0.5,0.0,0.0,,0.3,7.3,2,1.0 -520261,1,2018-10-23 17:48:10,2018-10-23 
17:55:35,Y,1,181,181,1,0.9,6.5,1.0,0.5,1.5,0.0,,0.3,9.8,1,1.0 -520094,2,2018-10-23 17:39:25,2018-10-23 18:42:28,N,1,75,133,1,14.6,48.5,1.0,0.5,11.21,5.76,,0.3,67.27,1,1.0 -465246,2,2018-10-21 03:41:56,2018-10-21 03:45:23,N,1,95,95,2,0.86,5.0,0.5,0.5,1.0,0.0,,0.3,7.3,1,1.0 -652784,2,2018-10-29 12:03:25,2018-10-29 12:14:19,N,1,18,31,1,2.24,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,2,1.0 -234842,2,2018-10-11 15:24:48,2018-10-11 15:47:51,N,1,197,95,1,3.42,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1.0 -199443,2,2018-10-09 22:19:35,2018-10-09 22:23:38,N,1,41,42,1,0.63,5.0,0.5,0.5,1.26,0.0,,0.3,7.56,1,1.0 -478271,2,2018-10-21 18:16:09,2018-10-21 18:25:49,N,1,74,263,1,2.25,9.0,0.0,0.5,1.96,0.0,,0.3,11.76,1,1.0 -480009,2,2018-10-21 19:32:30,2018-10-21 19:54:38,N,1,95,260,2,4.31,18.5,0.0,0.5,0.0,0.0,,0.3,19.3,1,1.0 -621419,2,2018-10-27 22:52:50,2018-10-27 22:55:15,N,1,136,136,1,0.01,3.5,0.5,0.5,0.0,0.0,,0.3,4.8,2,1.0 -60768,2,2018-10-03 19:47:13,2018-10-03 19:50:39,N,1,256,255,1,0.52,4.0,1.0,0.5,1.45,0.0,,0.3,7.25,1,1.0 -559361,2,2018-10-25 12:04:52,2018-10-25 12:39:25,N,5,65,76,1,5.8,21.86,0.0,0.5,0.0,0.0,,0.0,22.36,1,2.0 -226070,2,2018-10-11 08:52:56,2018-10-11 09:22:58,N,1,166,163,1,3.73,20.0,0.0,0.5,4.16,0.0,,0.3,24.96,1,1.0 -578687,2,2018-10-26 08:44:43,2018-10-26 09:20:34,N,1,49,114,5,3.58,23.0,0.0,0.5,4.76,0.0,,0.3,30.51,1,1.0 -133625,2,2018-10-06 19:59:11,2018-10-06 20:38:07,N,1,181,100,2,9.41,34.5,0.0,0.5,10.26,5.76,,0.3,51.32,1,1.0 -118040,2,2018-10-06 03:38:45,2018-10-07 03:18:41,N,1,256,256,1,0.65,4.5,0.5,0.5,0.0,0.0,,0.3,5.8,2,1.0 -199254,2,2018-10-09 22:23:05,2018-10-09 22:33:00,N,1,181,97,1,1.53,8.5,0.5,0.5,0.0,0.0,,0.3,9.8,2,1.0 -508650,2,2018-10-23 08:21:15,2018-10-23 08:40:55,N,1,166,142,5,2.41,14.0,0.0,0.5,1.0,0.0,,0.3,15.8,1,1.0 -597703,2,2018-10-26 21:14:29,2018-10-26 21:22:49,N,1,41,239,5,1.8,8.5,0.5,0.5,0.0,0.0,,0.3,9.8,2,1.0 -472658,2,2018-10-21 13:36:13,2018-10-21 13:41:32,N,1,7,179,1,0.97,5.5,0.0,0.5,0.0,0.0,,0.3,6.3,2,1.0 diff --git 
a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-11.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-11.csv deleted file mode 100644 index f68bd7715093..000000000000 --- a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-11.csv +++ /dev/null @@ -1,21 +0,0 @@ -,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type -206147,2,2018-11-09 21:58:22,2018-11-09 22:09:19,N,1,97,17,1,1.69,9.0,0.5,0.5,0.0,0.0,,0.3,10.3,2,1.0 -586398,2,2018-11-28 07:16:28,2018-11-28 07:40:30,N,1,81,250,1,4.39,20.5,0.0,0.5,0.0,0.0,,0.3,21.3,1,1.0 -410503,2,2018-11-19 08:36:21,2018-11-19 08:53:22,N,1,116,128,1,4.59,17.5,0.0,0.5,4.58,0.0,,0.3,22.88,1,1.0 -284524,2,2018-11-13 14:57:24,2018-11-13 15:23:48,N,1,53,121,1,5.38,21.5,0.0,0.5,0.0,0.0,,0.3,22.3,1,1.0 -652213,2,2018-11-30 20:18:38,2018-11-30 20:23:04,N,1,75,74,1,1.23,6.0,0.5,0.5,1.46,0.0,,0.3,8.76,1,1.0 -453632,2,2018-11-21 08:46:22,2018-11-21 09:12:23,N,5,17,35,1,3.68,16.39,0.0,0.5,0.0,0.0,,0.0,16.89,1,2.0 -514609,2,2018-11-24 14:03:10,2018-11-24 14:26:29,N,1,149,89,1,4.51,18.5,0.0,0.5,0.0,0.0,,0.3,19.3,1,1.0 -570411,2,2018-11-27 12:59:00,2018-11-27 13:03:56,N,1,25,25,1,0.79,5.5,0.0,0.5,1.26,0.0,,0.3,7.56,1,1.0 -328751,2,2018-11-15 12:26:07,2018-11-15 12:58:31,N,5,52,37,1,6.39,21.25,0.0,0.5,0.0,0.0,,0.0,21.75,1,2.0 -290145,2,2018-11-13 18:54:48,2018-11-13 19:04:14,N,1,74,41,1,1.15,7.5,1.0,0.5,1.86,0.0,,0.3,11.16,1,1.0 -273210,2,2018-11-12 23:48:01,2018-11-12 23:48:03,N,5,166,166,1,0.0,20.0,0.0,0.0,4.0,0.0,,0.0,24.0,1,2.0 -598576,2,2018-11-28 16:55:56,2018-11-28 17:03:05,N,1,41,74,2,0.71,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1.0 -19526,2,2018-11-01 19:20:33,2018-11-01 
19:27:39,N,1,25,181,1,0.87,6.5,1.0,0.5,1.66,0.0,,0.3,9.96,1,1.0 -645647,2,2018-11-30 16:50:09,2018-11-30 16:56:01,N,1,7,7,2,0.69,5.5,1.0,0.5,1.46,0.0,,0.3,10.71,1,1.0 -642343,2,2018-11-30 14:56:19,2018-11-30 15:06:20,N,1,33,97,1,1.08,7.5,0.0,0.5,1.66,0.0,,0.3,9.96,1,1.0 -284366,2,2018-11-13 14:23:08,2018-11-13 14:53:25,N,1,81,20,1,6.94,26.0,0.0,0.5,0.0,0.0,,0.3,26.8,1,1.0 -608380,2,2018-11-29 03:50:30,2018-11-29 03:55:17,N,1,74,42,1,0.98,5.5,0.5,0.5,2.04,0.0,,0.3,8.84,1,1.0 -131427,2,2018-11-06 17:46:21,2018-11-06 17:50:42,N,1,41,42,1,0.85,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1.0 -368687,2,2018-11-17 10:13:10,2018-11-17 10:27:14,N,1,95,82,1,1.65,10.5,0.0,0.5,0.0,0.0,,0.3,11.3,2,1.0 -13155,1,2018-11-01 15:40:11,2018-11-01 15:48:58,N,1,43,236,1,0.8,7.0,0.0,0.5,1.17,0.0,,0.3,8.97,1,1.0 diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-12.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-12.csv deleted file mode 100644 index 375b98f8a05a..000000000000 --- a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2018-12.csv +++ /dev/null @@ -1,21 +0,0 @@ -,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type -644743,2,2018-12-29 22:07:39,2018-12-29 22:23:38,N,1,255,7,1,4.2,15.0,0.5,0.5,3.26,0.0,,0.3,19.56,1,1.0 -241539,1,2018-12-11 13:27:48,2018-12-11 14:01:25,N,1,244,87,1,13.4,40.5,0.0,0.5,8.25,0.0,,0.3,49.55,1,1.0 -519266,2,2018-12-22 23:20:48,2018-12-22 23:42:53,N,1,255,225,1,4.03,16.5,0.5,0.5,3.56,0.0,,0.3,21.36,1,1.0 -419676,2,2018-12-18 20:30:48,2018-12-18 20:42:37,N,1,166,116,1,2.59,11.0,0.5,0.5,0.0,0.0,,0.3,12.3,1,1.0 -110768,2,2018-12-05 21:20:39,2018-12-05 21:45:12,N,1,255,61,1,4.08,18.5,0.5,0.5,5.94,0.0,,0.3,25.74,1,1.0 -591680,2,2018-12-27 
11:21:47,2018-12-27 12:21:54,N,1,50,9,1,15.01,56.0,0.0,0.5,0.0,5.76,,0.3,62.56,1,1.0 -532284,2,2018-12-23 16:30:52,2018-12-23 16:39:42,N,1,74,75,1,0.57,7.0,0.0,0.5,0.0,0.0,,0.3,7.8,2,1.0 -149369,2,2018-12-07 14:18:11,2018-12-07 14:40:06,N,1,179,95,1,6.96,23.0,0.0,0.5,4.76,0.0,,0.3,28.56,1,1.0 -40899,2,2018-12-02 19:07:00,2018-12-02 19:17:53,N,1,97,49,1,1.74,9.0,0.0,0.5,2.45,0.0,,0.3,12.25,1,1.0 -341430,2,2018-12-15 14:19:06,2018-12-15 14:32:47,N,5,11,29,1,4.9,15.06,0.0,0.5,0.0,0.0,,0.0,15.56,1,2.0 -400460,2,2018-12-18 06:09:49,2018-12-18 06:27:36,N,5,14,231,1,7.75,24.39,0.0,0.5,0.0,5.76,,0.0,30.65,1,2.0 -320076,2,2018-12-14 18:00:27,2018-12-14 18:09:47,N,1,7,7,1,0.83,7.0,1.0,0.5,0.0,0.0,,0.3,8.8,2,1.0 -263463,2,2018-12-12 12:06:59,2018-12-12 12:17:05,N,1,260,223,1,3.48,12.5,0.0,0.5,0.0,0.0,,0.3,13.3,2,1.0 -245734,2,2018-12-11 16:20:31,2018-12-11 16:30:51,N,1,75,151,1,1.58,8.5,1.0,0.5,0.0,0.0,,0.3,10.3,2,1.0 -173368,2,2018-12-08 11:34:24,2018-12-08 11:53:37,N,1,181,61,1,3.79,15.5,0.0,0.5,0.0,0.0,,0.3,16.3,1,1.0 -37580,2,2018-12-02 15:23:58,2018-12-02 15:45:49,N,1,82,28,1,3.73,16.5,0.0,0.5,0.0,0.0,,0.3,17.3,1,1.0 -82903,2,2018-12-04 19:07:55,2018-12-04 19:32:12,N,5,242,167,1,4.84,19.86,0.0,0.5,0.0,0.0,,0.0,20.36,1,2.0 -531182,2,2018-12-23 15:43:36,2018-12-23 16:16:42,N,1,82,173,1,2.87,20.0,0.0,0.5,0.0,0.0,,0.3,20.8,2,1.0 -532295,2,2018-12-23 16:09:21,2018-12-23 16:15:12,N,1,181,181,1,0.64,5.5,0.0,0.5,0.0,0.0,,0.3,6.3,2,1.0 -112713,2,2018-12-05 23:07:24,2018-12-05 23:15:40,N,1,129,129,1,1.3,7.5,0.5,0.5,1.76,0.0,,0.3,10.56,1,1.0 diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-10.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-10.csv deleted file mode 100644 index 0ce6520c5822..000000000000 --- a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-10.csv +++ /dev/null @@ -1,21 +0,0 @@ 
-,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge -6984,2,2018-10-30 10:59:02,2018-10-30 11:04:30,1,0.62,1,N,48,163,1,5.5,0.0,0.5,1.26,0.0,0.3,7.56, -3030,2,2018-10-03 19:43:48,2018-10-03 20:01:51,1,3.93,1,N,137,239,1,15.0,1.0,0.5,2.5,0.0,0.3,19.3, -9267,1,2018-10-16 07:10:28,2018-10-16 07:19:21,1,1.4,1,N,142,236,1,8.0,0.0,0.5,0.88,0.0,0.3,9.68, -4342,1,2018-10-10 10:19:16,2018-10-10 10:33:52,1,2.1,1,N,142,50,2,11.5,0.0,0.5,0.0,0.0,0.3,12.3, -8099,2,2018-10-01 19:18:44,2018-10-01 19:31:48,1,2.51,1,N,140,239,1,11.0,1.0,0.5,3.2,0.0,0.3,16.0, -8972,2,2018-10-19 01:40:24,2018-10-19 01:51:42,2,2.54,1,N,249,164,2,10.5,0.5,0.5,0.0,0.0,0.3,11.8, -240,1,2018-10-11 19:54:10,2018-10-11 20:19:34,1,5.2,1,N,231,232,1,22.0,1.0,0.5,4.76,0.0,0.3,28.56, -7844,2,2018-10-30 08:58:08,2018-10-30 09:05:09,1,1.08,1,N,249,211,1,6.5,0.0,0.5,1.0,0.0,0.3,8.3, -7024,1,2018-10-11 04:56:32,2018-10-11 05:14:29,1,8.0,1,N,263,138,1,25.0,0.5,0.5,8.0,5.76,0.3,40.06, -7601,2,2018-10-27 00:44:06,2018-10-27 00:57:52,1,2.18,1,N,113,100,1,11.0,0.5,0.5,2.46,0.0,0.3,14.76, -7686,2,2018-10-01 17:13:29,2018-10-01 17:16:10,5,0.36,1,N,263,236,1,3.5,1.0,0.5,1.59,0.0,0.3,6.89, -1344,1,2018-10-03 20:31:19,2018-10-03 21:11:37,1,20.2,3,N,236,1,1,73.5,0.5,0.0,18.35,17.5,0.3,110.15, -2539,2,2018-10-24 00:46:47,2018-10-24 01:07:38,1,14.1,1,N,132,210,1,39.0,0.5,0.5,5.0,0.0,0.3,45.3, -2758,1,2018-10-17 22:25:54,2018-10-17 22:42:38,1,6.2,1,N,237,87,1,20.0,0.5,0.5,2.0,0.0,0.3,23.3, -567,1,2018-10-22 15:57:44,2018-10-22 16:25:52,1,3.1,1,N,264,264,1,18.5,0.0,0.5,3.85,0.0,0.3,23.15, -1994,1,2018-10-28 13:36:15,2018-10-28 13:47:38,1,1.8,1,N,79,232,1,10.0,0.0,0.5,2.15,0.0,0.3,12.95, -3549,2,2018-10-25 21:00:53,2018-10-25 21:17:23,1,2.58,1,N,170,48,1,12.0,0.5,0.5,1.0,0.0,0.3,14.3, -3867,2,2018-10-16 
13:26:54,2018-10-16 13:51:57,3,1.8,1,N,230,158,1,16.0,0.0,0.5,3.36,0.0,0.3,20.16, -864,2,2018-10-20 10:53:46,2018-10-20 11:03:28,1,1.22,1,N,262,75,1,8.0,0.0,0.5,2.64,0.0,0.3,11.44, -9457,1,2018-10-01 18:19:51,2018-10-01 18:39:05,1,2.6,1,N,144,186,1,14.0,1.0,0.5,3.15,0.0,0.3,18.95, diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-11.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-11.csv deleted file mode 100644 index 87ff7e28e6fa..000000000000 --- a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-11.csv +++ /dev/null @@ -1,21 +0,0 @@ -,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge -7201,2,2018-11-03 20:17:25,2018-11-03 20:25:26,1,1.15,1,N,166,238,1,7.5,0.5,0.5,2.2,0.0,0.3,11.0, -2578,2,2018-11-14 09:03:52,2018-11-14 09:21:39,1,0.9,1,N,230,230,1,11.5,0.0,0.5,1.0,0.0,0.3,13.3, -9216,1,2018-11-16 13:12:09,2018-11-16 13:24:41,1,1.5,1,N,238,143,2,9.5,0.0,0.5,0.0,0.0,0.3,10.3, -5518,1,2018-11-15 20:44:20,2018-11-15 20:58:41,1,1.7,1,N,236,151,1,11.0,0.5,0.5,0.0,0.0,0.3,12.3, -104,2,2018-11-07 00:01:14,2018-11-07 00:05:56,5,0.85,1,N,234,170,1,5.5,0.5,0.5,1.36,0.0,0.3,8.16, -536,2,2018-11-11 12:43:04,2018-11-11 12:59:33,1,2.11,1,N,233,114,1,11.5,0.0,0.5,2.46,0.0,0.3,14.76, -2167,2,2018-11-15 14:14:45,2018-11-15 14:24:39,2,0.8,1,N,142,230,1,7.5,0.0,0.5,2.08,0.0,0.3,10.38, -5875,4,2018-11-07 07:24:55,2018-11-07 07:30:28,1,0.81,1,N,164,233,2,5.5,0.0,0.5,0.0,0.0,0.3,6.3, -8196,2,2018-11-05 13:46:25,2018-11-05 13:47:12,3,0.08,1,N,236,236,1,2.5,0.0,0.5,0.66,0.0,0.3,3.96, -8175,1,2018-11-13 20:46:11,2018-11-13 20:50:29,1,0.6,1,N,107,137,2,5.0,0.5,0.5,0.0,0.0,0.3,6.3, -6314,2,2018-11-25 19:36:38,2018-11-25 
19:41:30,1,1.77,1,N,263,74,2,7.0,0.0,0.5,0.0,0.0,0.3,7.8, -7700,2,2018-11-18 21:33:49,2018-11-18 21:46:58,2,2.76,1,N,163,24,1,12.0,0.5,0.5,3.32,0.0,0.3,16.62, -9062,1,2018-11-03 18:39:31,2018-11-03 18:49:25,1,0.8,1,N,164,230,2,7.5,0.0,0.5,0.0,0.0,0.3,8.3, -6701,1,2018-11-20 06:12:00,2018-11-20 06:19:35,1,2.2,1,N,237,75,2,8.5,0.0,0.5,0.0,0.0,0.3,9.3, -399,2,2018-11-13 21:20:51,2018-11-13 21:34:45,1,1.18,1,N,162,230,1,9.5,0.5,0.5,5.0,0.0,0.3,15.8, -2745,2,2018-11-03 00:07:35,2018-11-03 00:29:31,1,2.55,1,N,68,4,2,15.0,0.5,0.5,0.0,0.0,0.3,16.3, -5363,2,2018-11-23 20:16:12,2018-11-23 20:20:46,1,1.05,1,N,237,162,2,5.5,0.5,0.5,0.0,0.0,0.3,6.8, -383,1,2018-11-10 22:31:50,2018-11-10 22:47:48,1,1.1,1,N,161,141,2,10.5,0.5,0.5,0.0,0.0,0.3,11.8, -1537,2,2018-11-17 01:05:40,2018-11-17 01:18:09,1,1.62,1,N,114,79,1,9.5,0.5,0.5,2.16,0.0,0.3,12.96, -1760,2,2018-11-01 13:52:35,2018-11-01 13:59:37,1,0.5,1,N,230,162,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8, diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-12.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-12.csv deleted file mode 100644 index 50eb34d13f83..000000000000 --- a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2018-12.csv +++ /dev/null @@ -1,21 +0,0 @@ -,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge -6701,1,2018-12-07 22:07:55,2018-12-07 22:35:18,1,3.5,1,N,237,249,1,18.5,0.5,0.5,2.0,0.0,0.3,21.8, -9645,2,2018-12-10 18:21:08,2018-12-10 18:33:12,1,1.38,1,N,114,158,1,9.0,1.0,0.5,2.16,0.0,0.3,12.96, -4105,2,2018-12-15 23:27:23,2018-12-15 23:44:02,1,2.62,1,N,232,137,2,13.0,0.5,0.5,0.0,0.0,0.3,14.3, -2743,2,2018-12-12 19:54:39,2018-12-12 
19:57:53,1,0.41,1,N,237,237,1,4.0,1.0,0.5,1.16,0.0,0.3,6.96, -4126,1,2018-12-21 18:57:55,2018-12-21 19:09:35,1,1.0,1,N,249,79,1,8.5,1.0,0.5,2.05,0.0,0.3,12.35, -1566,1,2018-12-16 15:38:10,2018-12-16 15:55:05,3,1.4,1,N,236,141,1,11.5,0.0,0.5,1.85,0.0,0.3,14.15, -4857,2,2018-12-06 17:26:50,2018-12-06 17:39:34,2,0.93,1,N,142,239,1,9.0,1.0,0.5,2.16,0.0,0.3,12.96, -304,1,2018-12-27 16:00:34,2018-12-27 16:34:26,2,5.7,1,N,163,209,2,24.0,1.0,0.5,0.0,0.0,0.3,25.8, -8159,2,2018-12-05 18:32:06,2018-12-05 18:41:03,1,1.38,1,N,68,90,1,8.0,1.0,0.5,2.94,0.0,0.3,12.74, -6575,4,2018-12-30 23:53:05,2018-12-30 23:55:40,1,0.39,1,N,256,256,1,4.0,0.5,0.5,1.06,0.0,0.3,6.36, -8327,2,2018-12-14 18:42:59,2018-12-14 18:51:49,1,1.26,1,N,163,236,1,8.0,1.0,0.5,1.5,0.0,0.3,11.3, -5245,1,2018-12-23 16:00:21,2018-12-23 16:12:44,1,1.6,1,N,141,142,1,10.0,0.0,0.5,2.15,0.0,0.3,12.95, -3521,2,2018-12-05 00:12:07,2018-12-05 00:30:25,2,3.15,1,N,164,45,1,14.0,0.5,0.5,3.06,0.0,0.3,18.36, -9442,2,2018-12-14 08:58:16,2018-12-14 09:06:42,1,0.67,1,N,239,238,1,6.0,0.0,0.5,0.8,0.0,0.3,7.6, -922,1,2018-12-09 03:45:39,2018-12-09 03:52:09,1,1.8,1,N,90,170,1,7.5,0.5,0.5,1.0,0.0,0.3,9.8, -807,2,2018-12-26 17:30:45,2018-12-26 17:53:04,1,5.34,1,N,164,261,2,19.0,1.0,0.5,0.0,0.0,0.3,20.8, -6354,2,2018-12-31 09:38:01,2018-12-31 09:46:53,1,2.2,1,N,24,236,1,9.5,0.0,0.5,2.0,0.0,0.3,12.3, -7329,2,2018-12-09 02:54:29,2018-12-09 03:00:14,5,1.19,1,N,164,246,1,6.5,0.5,0.5,1.95,0.0,0.3,9.75, -6227,1,2018-12-01 18:32:30,2018-12-01 19:06:38,1,4.2,1,N,90,263,1,22.0,0.0,0.5,4.55,0.0,0.3,27.35, -9796,1,2018-12-17 18:18:12,2018-12-17 18:32:05,1,1.4,1,N,68,107,1,10.0,1.0,0.5,1.2,0.0,0.3,13.0, From aaed2ef2a226caeb31c6ee4f76083645ac51d01e Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 20 Oct 2021 13:02:26 -0400 Subject: [PATCH 22/62] WIP step 3 --- ...onfigure_a_configuredassetdataconnector.md | 53 ++++++++++--------- ...configure_an_inferredassetdataconnector.md | 51 +++++++++--------- 2 files changed, 54 insertions(+), 50 
deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 916622357e08..1761f2df7347 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -99,12 +99,9 @@ context.test_yaml_config(yaml.dump(datasource_config)) - - If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md) -Choose a DataConnector ----------------------- +### 3. Choose a DataConnector ConfiguredAssetDataConnectors like `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector` require `DataAsset`s to be explicitly named. Each `DataAsset` can have their own regex `pattern` and `group_names`, and if configured, will override any @@ -113,36 +110,40 @@ explicitly named. Each `DataAsset` can have their own regex `pattern` and `group Imagine you have the following files in `my_directory/`: ``` -my_directory/alpha-1.csv -my_directory/alpha-2.csv -my_directory/alpha-3.csv +my_directory/yellow_tripdata_2019-01.csv +my_directory/yellow_tripdata_2019-02.csv +my_directory/yellow_tripdata_2019-03.csv ``` -We could create a DataAsset `alpha` that contains 3 data_references (`alpha-1.csv`, `alpha-2.csv`, and `alpha-3.csv`). +We could create a DataAsset `yellow_tripdata` that contains 3 data_references (`yellow_tripdata_2019-01.csv`, `yellow_tripdata_2019-02.csv`, and `yellow_tripdata_2019-03.csv`). 
In that case, the configuration would look like the following: -```yaml - my_data_source: - class_name: Datasource - execution_engine: - class_name: PandasExecutionEngine - data_connectors: - my_filesystem_data_connector: - class_name: ConfiguredAssetFilesystemDataConnector - base_directory: my_directory/ - default_regex: - assets: - alpha: - pattern: alpha-(.*)\.csv - group_names: - - index + + + +```python file=./../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L9-L25 +``` + + + + +```python file=./../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L34-L54 ``` -Notice that we have specified a pattern that captures the number after `alpha-` in the filename and assigns it to the `group_name` `index`. + + + +Notice that we have specified a pattern that captures the year-month combination after `yellow_tripdata_` in the filename and assigns it to the `group_name` `month`. -The configuration would also work with a regex capturing the entire filename (ie `pattern: (.*)\\.csv`). However, capturing the index on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the Data Asset. +The configuration would also work with a regex capturing the entire filename (ie `pattern: (.*)\\.csv`). However, capturing the month on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the `DataAsset`. -Later on we could retrieve the data in `alpha-2.csv` of `alpha` as its own batch using `context.get_batch()` by specifying `{"index": "2"}` as the `batch_identifier`. +Later on we could retrieve the data in `yellow_tripdata_2019-02.csv` of `yellow_tripdata` as its own batch using `context.get_batch()` by specifying `{"month": "2019-02"}` as the `batch_identifier`. 
```python my_batch = context.get_batch( diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index a6343ba3da04..2d9e595bd3a7 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -102,12 +102,9 @@ context.test_yaml_config(yaml.dump(datasource_config)) - - If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md) -Choose a DataConnector ----------------------- +### 3. Choose a DataConnector InferredAssetDataConnectors like `InferredAssetFilesystemDataConnector` and `InferredAssetS3DataConnector` require a `default_regex` parameter, with a configured regex `pattern` and capture `group_names`. @@ -115,38 +112,44 @@ require a `default_regex` parameter, with a configured regex `pattern` and captu Imagine you have the following files in `my_directory/`: ``` -my_directory/alpha-2020-01-01.csv -my_directory/alpha-2020-01-02.csv -my_directory/alpha-2020-01-03.csv +my_directory/yellow_tripdata_2019-01.csv +my_directory/yellow_tripdata_2019-02.csv +my_directory/yellow_tripdata_2019-03.csv ``` We can imagine two approaches to loading the data into GE. The simplest approach would be to consider each file to be its own DataAsset. 
In that case, the configuration would look like the following: -```yaml -class_name: Datasource -execution_engine: - class_name: PandasExecutionEngine -data_connectors: - my_filesystem_data_connector: - class_name: InferredAssetFilesystemDataConnector - datasource_name: my_data_source - base_directory: my_directory/ - default_regex: - group_names: - - data_asset_name - pattern: (.*)\.csv + + + +```python file=./../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L9-L24 ``` + + + +```python file=./../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L33-L51 +``` + + + + Notice that the `default_regex` is configured to have one capture group (`(.*)`) which captures the entire filename. That capture group is assigned to `data_asset_name` under `group_names`. -Running `test_yaml_config()` would result in 3 DataAssets : `alpha-2020-01-01`, `alpha-2020-01-02` and `alpha-2020-01-03`. +Running `test_yaml_config()` would result in 3 DataAssets : `yellow_tripdata_2019-01`, `yellow_tripdata_2019-02` and `yellow_tripdata_2019-03`. -However, a closer look at the filenames reveals a pattern that is common to the 3 files. Each have `alpha-` in the name, and have date information afterwards. These are the types of patterns that InferredAssetDataConnectors allow you to take advantage of. +However, a closer look at the filenames reveals a pattern that is common to the 3 files. Each have `yellow_tripdata_` in the name, and have date information afterwards. These are the types of patterns that InferredAssetDataConnectors allow you to take advantage of. -We could treat `alpha-*.csv` files as batches within the `alpha` DataAsset with a more specific regex `pattern` and adding `group_names` for `year`, `month` and `day`. 
+We could treat `yellow_tripdata_*.csv` files as batches within the `yellow_tripdata` `DataAsset` with a more specific regex `pattern` and adding `group_names` for `year` and `month`. -**Note: ** We have chosen to be more specific in the capture groups for the `year` `month` and `day` by specifying the integer value (using `\d`) and the number of digits, but a simpler capture group like `(.*)` would also work. +**Note: ** We have chosen to be more specific in the capture groups for the `year` and `month` by specifying the integer value (using `\d`) and the number of digits, but a simpler capture group like `(.*)` would also work. ```yaml class_name: Datasource From d05d4403a0a4981d1734fd829dfa30d1ba10bd3d Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 20 Oct 2021 13:03:39 -0400 Subject: [PATCH 23/62] Linting --- .../how_to_configure_a_configuredassetdataconnector.py | 8 ++++++-- .../how_to_configure_a_runtimedataconnector.py | 8 ++++++-- .../how_to_configure_an_inferredassetdataconnector.py | 8 ++++++-- 3 files changed, 18 insertions(+), 6 deletions(-) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index 26034d6232ae..ee4b6691fec3 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -26,7 +26,9 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. 
-datasource_yaml = datasource_yaml.replace("my_directory/", "../data/single_directory_one_data_asset/") +datasource_yaml = datasource_yaml.replace( + "my_directory/", "../data/single_directory_one_data_asset/" +) test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") @@ -59,7 +61,9 @@ "base_directory" ] = "../data/single_directory_one_data_asset/" -test_python = context.test_yaml_config(yaml.dump(datasource_config), return_mode="report_object") +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) assert test_yaml == test_python diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py index e465c24e37ce..45ded2f4a309 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py @@ -39,7 +39,9 @@ }, } -test_python = context.test_yaml_config(yaml.dump(datasource_config), return_mode="report_object") +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) assert test_yaml == test_python @@ -55,7 +57,9 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the BatchRequest above. 
-batch_request.runtime_parameters["path"] = "./data/single_directory_one_data_asset/yellow_tripdata_2019-01.csv" +batch_request.runtime_parameters[ + "path" +] = "./data/single_directory_one_data_asset/yellow_tripdata_2019-01.csv" context.create_expectation_suite( expectation_suite_name="test_suite", overwrite_existing=True diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index e7e3a538b1db..8b96d8a49985 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -25,7 +25,9 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. -datasource_yaml = datasource_yaml.replace("my_directory/", "../data/single_directory_one_data_asset/") +datasource_yaml = datasource_yaml.replace( + "my_directory/", "../data/single_directory_one_data_asset/" +) test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") @@ -56,7 +58,9 @@ "base_directory" ] = "../data/single_directory_one_data_asset/" -test_python = context.test_yaml_config(yaml.dump(datasource_config), return_mode="report_object") +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) assert test_yaml == test_python From c4ab4c69044686dce77025a2bd761623cae28667 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 20 Oct 2021 14:33:20 -0400 Subject: [PATCH 24/62] WIP step 3 --- ...onfigure_a_configuredassetdataconnector.md | 16 ++-- ...configure_an_inferredassetdataconnector.md | 41 +++++---- ...onfigure_a_configuredassetdataconnector.py | 9 +- ...configure_an_inferredassetdataconnector.py | 86 +++++++++++++++++-- 4 files 
changed, 109 insertions(+), 43 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 1761f2df7347..d1879d6c216a 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -127,13 +127,13 @@ In that case, the configuration would look like the following: ]}> -```python file=./../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L9-L25 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L9-L25 ``` -```python file=./../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L34-L54 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L34-L54 ``` @@ -143,18 +143,12 @@ Notice that we have specified a pattern that captures the year-month combination The configuration would also work with a regex capturing the entire filename (ie `pattern: (.*)\\.csv`). However, capturing the month on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the `DataAsset`. -Later on we could retrieve the data in `yellow_tripdata_2019-02.csv` of `yellow_tripdata` as its own batch using `context.get_batch()` by specifying `{"month": "2019-02"}` as the `batch_identifier`. +Later on we could retrieve the data in `yellow_tripdata_2019-02.csv` of `yellow_tripdata` as its own batch using `context.get_validator()` by specifying `{"month": "2019-02"}` as the `batch_identifier`. 
-```python -my_batch = context.get_batch( - datasource_name="my_data_source", - data_connector_name="my_filesystem_data_connector", - data_asset_name="alpha", - batch_identifiers={"index": "2"} -) +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L72-L87 ``` -This ability to access specific Batches using `batch_identifiers` is very useful when validating DataAssets that span multiple files. +This ability to access specific Batches using `batch_identifiers` is very useful when validating `DataAsset`s that span multiple files. For more information on `batches` and `batch_identifiers`, please refer to the [Core Concepts document](../../reference/dividing_data_assets_into_batches.md). A corresponding configuration for `ConfiguredAssetS3DataConnector` would look similar but would require `bucket` and `prefix` values instead of `base_directory`. diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 2d9e595bd3a7..30329e95dae9 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -130,13 +130,13 @@ The simplest approach would be to consider each file to be its own DataAsset. 
In ]}> -```python file=./../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L9-L24 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L9-L24 ``` -```python file=./../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L33-L51 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L33-L51 ``` @@ -151,25 +151,28 @@ We could treat `yellow_tripdata_*.csv` files as batches within the `yellow_tripd **Note: ** We have chosen to be more specific in the capture groups for the `year` and `month` by specifying the integer value (using `\d`) and the number of digits, but a simpler capture group like `(.*)` would also work. -```yaml -class_name: Datasource -execution_engine: - class_name: PandasExecutionEngine -data_connectors: - my_filesystem_data_connector: - class_name: InferredAssetFilesystemDataConnector - datasource_name: my_data_source - base_directory: my_directory/ - default_regex: - group_names: - - data_asset_name - - year - - month - - day - pattern: (.*)-(\d{4})-(\d{2})-(\d{2})\.csv + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L77-L94 ``` -Running `test_yaml_config()` would result in 1 DataAsset `alpha` with 3 associated data_references: `alpha-2020-01-01.csv`, `alpha-2020-01-02.csv` and `alpha-2020-01-03.csv`, seen also in Example 1 below. 
+ + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L105-L123 +``` + + + + +Running `test_yaml_config()` would result in 1 DataAsset `yellow_tripdata` with 3 associated data_references: `yellow_tripdata_2019-01.csv`, `yellow_tripdata_2019-02.csv` and `yellow_tripdata_2019-03.csv`, seen also in Example 1 below. A corresponding configuration for `InferredAssetS3DataConnector` would look similar but would require `bucket` and `prefix` values instead of `base_directory`. diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index ee4b6691fec3..6a0006ffa897 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -18,7 +18,7 @@ class_name: ConfiguredAssetFilesystemDataConnector base_directory: my_directory/ assets: - taxi: + yellow_tripdata: pattern: (.*)\.csv group_names: - month @@ -46,7 +46,7 @@ "class_name": "ConfiguredAssetFilesystemDataConnector", "base_directory": "my_directory/", "assets": { - "taxi": { + "yellow_tripdata": { "pattern": "yellow_tripdata_(.*)\.csv", "group_names": ["month"], } @@ -69,11 +69,10 @@ context.add_datasource(**datasource_config) -# Here is a BatchRequest using a path to a single CSV file batch_request = BatchRequest( datasource_name="taxi_datasource", data_connector_name="default_configured_data_connector_name", - data_asset_name="taxi", + data_asset_name="yellow_tripdata", ) context.create_expectation_suite( @@ -90,7 +89,7 @@ # NOTE: The following code is only for testing and can be ignored by users. 
assert isinstance(validator, ge.validator.validator.Validator) assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] -assert "taxi" in set( +assert "yellow_tripdata" in set( context.get_available_data_asset_names()["taxi_datasource"][ "default_configured_data_connector_name" ] diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index 8b96d8a49985..8e578740e1e3 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -66,7 +66,83 @@ context.add_datasource(**datasource_config) -# Here is a BatchRequest using a path to a single CSV file +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "yellow_tripdata_2019-01" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_inferred_data_connector_name" + ] +) + +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_inferred_data_connector_name: + class_name: InferredAssetFilesystemDataConnector + base_directory: my_directory/ + default_regex: + group_names: + - data_asset_name + - year + - month + pattern: (.*)_(\d{4})-(\d{2})\.csv +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. 
+datasource_yaml = datasource_yaml.replace( + "my_directory/", "../data/single_directory_one_data_asset/" +) + +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_inferred_data_connector_name": { + "class_name": "InferredAssetFilesystemDataConnector", + "base_directory": "my_directory/", + "default_regex": { + "group_names": ["data_asset_name", "year", "month"], + "pattern": "(.*)_(\d{4})-(\d{2})\.csv", + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above. +datasource_config["data_connectors"]["default_inferred_data_connector_name"][ + "base_directory" +] = "../data/single_directory_one_data_asset/" + +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) + +assert test_yaml == test_python + +context.add_datasource(**datasource_config) + +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "yellow_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_inferred_data_connector_name" + ] +) + batch_request = BatchRequest( datasource_name="taxi_datasource", data_connector_name="default_inferred_data_connector_name", @@ -75,7 +151,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your data asset name directly in the BatchRequest above. 
-batch_request.data_asset_name = "yellow_tripdata_2019-01" +batch_request.data_asset_name = "yellow_tripdata" context.create_expectation_suite( expectation_suite_name="test_suite", overwrite_existing=True @@ -88,9 +164,3 @@ # NOTE: The following code is only for testing and can be ignored by users. assert isinstance(validator, ge.validator.validator.Validator) -assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] -assert "yellow_tripdata_2019-01" in set( - context.get_available_data_asset_names()["taxi_datasource"][ - "default_inferred_data_connector_name" - ] -) From 95d7f9098a37c4a13a085b9f524d750c916f351a Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Thu, 21 Oct 2021 12:49:27 -0400 Subject: [PATCH 25/62] Add S3 examples --- ...onfigure_a_configuredassetdataconnector.md | 6 +- ...configure_an_inferredassetdataconnector.md | 42 ++++---- ...configure_an_inferredassetdataconnector.py | 102 ++++++++++++++---- 3 files changed, 105 insertions(+), 45 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index d1879d6c216a..16a1d9c7dfe8 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -110,9 +110,9 @@ explicitly named. Each `DataAsset` can have their own regex `pattern` and `group Imagine you have the following files in `my_directory/`: ``` -my_directory/yellow_tripdata_2019-01.csv -my_directory/yellow_tripdata_2019-02.csv -my_directory/yellow_tripdata_2019-03.csv +/yellow_tripdata_2019-01.csv +/yellow_tripdata_2019-02.csv +/yellow_tripdata_2019-03.csv ``` We could create a DataAsset `yellow_tripdata` that contains 3 data_references (`yellow_tripdata_2019-01.csv`, `yellow_tripdata_2019-02.csv`, and `yellow_tripdata_2019-03.csv`). 
diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 30329e95dae9..95db61abfd42 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -112,9 +112,9 @@ require a `default_regex` parameter, with a configured regex `pattern` and captu Imagine you have the following files in `my_directory/`: ``` -my_directory/yellow_tripdata_2019-01.csv -my_directory/yellow_tripdata_2019-02.csv -my_directory/yellow_tripdata_2019-03.csv +/yellow_tripdata_2019-01.csv +/yellow_tripdata_2019-02.csv +/yellow_tripdata_2019-03.csv ``` We can imagine two approaches to loading the data into GE. @@ -176,25 +176,27 @@ Running `test_yaml_config()` would result in 1 DataAsset `yellow_tripdata` with A corresponding configuration for `InferredAssetS3DataConnector` would look similar but would require `bucket` and `prefix` values instead of `base_directory`. 
-```yaml -class_name: Datasource -execution_engine: - class_name: PandasExecutionEngine -data_connectors: - my_filesystem_data_connector: - class_name: InferredAssetS3DataConnector - datasource_name: my_data_source - bucket: MY_S3_BUCKET - prefix: MY_S3_BUCKET_PREFIX - default_regex: - group_names: - - data_asset_name - - year - - month - - day - pattern: (.*)-(\d{4})-(\d{2})-(\d{2})\.csv + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L147-L165 ``` + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L180-L199 +``` + + + + The following examples will show scenarios that InferredAssetDataConnectors can help you analyze, using `InferredAssetFilesystemDataConnector` as an example and only show the configuration under `data_connectors` for simplicity. diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index 8e578740e1e3..6f1450e8cdc9 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -16,7 +16,7 @@ data_connectors: default_inferred_data_connector_name: class_name: InferredAssetFilesystemDataConnector - base_directory: my_directory/ + base_directory: default_regex: group_names: - data_asset_name @@ -26,7 +26,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. 
datasource_yaml = datasource_yaml.replace( - "my_directory/", "../data/single_directory_one_data_asset/" + "", "../data/single_directory_one_data_asset/" ) test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") @@ -43,7 +43,7 @@ "data_connectors": { "default_inferred_data_connector_name": { "class_name": "InferredAssetFilesystemDataConnector", - "base_directory": "my_directory/", + "base_directory": "", "default_regex": { "group_names": ["data_asset_name"], "pattern": "(.*)\.csv", @@ -84,7 +84,7 @@ data_connectors: default_inferred_data_connector_name: class_name: InferredAssetFilesystemDataConnector - base_directory: my_directory/ + base_directory: default_regex: group_names: - data_asset_name @@ -96,7 +96,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. datasource_yaml = datasource_yaml.replace( - "my_directory/", "../data/single_directory_one_data_asset/" + "", "../data/single_directory_one_data_asset/" ) test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") @@ -113,7 +113,7 @@ "data_connectors": { "default_inferred_data_connector_name": { "class_name": "InferredAssetFilesystemDataConnector", - "base_directory": "my_directory/", + "base_directory": "", "default_regex": { "group_names": ["data_asset_name", "year", "month"], "pattern": "(.*)_(\d{4})-(\d{2})\.csv", @@ -143,24 +143,82 @@ ] ) -batch_request = BatchRequest( - datasource_name="taxi_datasource", - data_connector_name="default_inferred_data_connector_name", - data_asset_name="", -) +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_inferred_data_connector_name: + class_name: InferredAssetS3DataConnector + bucket: + prefix: + default_regex: + 
+ group_names: + - data_asset_name + - year + - month + pattern: (.*)_(\d{4})-(\d{2})\.csv +""" # Please note this override is only to provide good UX for docs and tests. -# In normal usage you'd set your data asset name directly in the BatchRequest above. -batch_request.data_asset_name = "yellow_tripdata" - -context.create_expectation_suite( - expectation_suite_name="test_suite", overwrite_existing=True +# In normal usage you'd set your path directly in the yaml above. +datasource_yaml = datasource_yaml.replace( + "", "superconductive-public" ) - -validator = context.get_validator( - batch_request=batch_request, expectation_suite_name="test_suite" +datasource_yaml = datasource_yaml.replace( + "", "data/taxi_yellow_trip_data_samples/" ) -print(validator.head()) -# NOTE: The following code is only for testing and can be ignored by users. -assert isinstance(validator, ge.validator.validator.Validator) +# TODO: Uncomment once S3 testing in Azure Pipelines is re-enabled +# test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_inferred_data_connector_name": { + "class_name": "InferredAssetS3DataConnector", + "bucket": "", + "prefix": "", + "default_regex": { + "group_names": ["data_asset_name", "year", "month"], + "pattern": "(.*)_(\d{4})-(\d{2})\.csv", + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][ + "bucket" +] = "superconductive-public" +datasource_config["data_connectors"]["default_inferred_data_connector_name"][ + "prefix" +] = "data/taxi_yellow_trip_data_samples/" + +# TODO: Uncomment once S3 testing in Azure Pipelines is re-enabled +# test_python = context.test_yaml_config( +# yaml.dump(datasource_config), return_mode="report_object" +# ) +# +# assert test_yaml == test_python +# +# context.add_datasource(**datasource_config) +# +# assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +# assert "yellow_tripdata" in set( +# context.get_available_data_asset_names()["taxi_datasource"][ +# "default_inferred_data_connector_name" +# ] +# ) From 0e589642317f9679eb49730a65704bfe618a260f Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Thu, 21 Oct 2021 12:56:56 -0400 Subject: [PATCH 26/62] Add S3 examples --- ...onfigure_a_configuredassetdataconnector.md | 29 ++++--- ...onfigure_a_configuredassetdataconnector.py | 78 +++++++++++++++++++ 2 files changed, 97 insertions(+), 10 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 16a1d9c7dfe8..956a9efb8a83 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -153,18 +153,27 @@ For more information on `batches` and `batch_identifiers`, please refer to the [ A corresponding configuration for `ConfiguredAssetS3DataConnector` would look similar but would require `bucket` and `prefix` values instead of `base_directory`. 
-```yaml -class_name: ConfiguredAssetS3DataConnector -bucket: MY_S3_BUCKET -prefix: MY_S3_BUCKET_PREFIX -default_regex: -assets: - alpha: - pattern: alpha-(.*)\.csv - group_names: - - index + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L99-L115 +``` + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L130-L149 ``` + + + The following examples will show scenarios that ConfiguredAssetDataConnectors can help you analyze, using `ConfiguredAssetFilesystemDataConnector`. **Note**: The examples will only show the configuration for `data_connectors` for simplicity. diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index 6a0006ffa897..dc686f01adc0 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -94,3 +94,81 @@ "default_configured_data_connector_name" ] ) + +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_inferred_data_connector_name: + class_name: InferredAssetS3DataConnector + bucket: + prefix: + default_regex: + group_names: + - month + pattern: yellow_tripdata_(.*)\.csv +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. 
+datasource_yaml = datasource_yaml.replace( + "", "superconductive-public" +) +datasource_yaml = datasource_yaml.replace( + "", "data/taxi_yellow_trip_data_samples/" +) + +# TODO: Uncomment once S3 testing in Azure Pipelines is re-enabled +# test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_inferred_data_connector_name": { + "class_name": "InferredAssetS3DataConnector", + "bucket": "", + "prefix": "", + "default_regex": { + "group_names": ["month"], + "pattern": "yellow_tripdata_(.*)\.csv", + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][ + "bucket" +] = "superconductive-public" +datasource_config["data_connectors"]["default_inferred_data_connector_name"][ + "prefix" +] = "data/taxi_yellow_trip_data_samples/" + +# TODO: Uncomment once S3 testing in Azure Pipelines is re-enabled +# test_python = context.test_yaml_config( +# yaml.dump(datasource_config), return_mode="report_object" +# ) +# +# assert test_yaml == test_python +# +# context.add_datasource(**datasource_config) +# +# assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +# assert "yellow_tripdata" in set( +# context.get_available_data_asset_names()["taxi_datasource"][ +# "default_inferred_data_connector_name" +# ] +# ) \ No newline at end of file From 82cde752ca5b85f67d8249a3eefe8eb055282fef Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Thu, 21 Oct 2021 15:10:03 -0400 Subject: [PATCH 27/62] Linting --- .../how_to_configure_an_inferredassetdataconnector.md | 2 +- .../how_to_configure_a_configuredassetdataconnector.py | 6 ++---- .../how_to_configure_an_inferredassetdataconnector.py | 4 +--- 3 files changed, 4 insertions(+), 8 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 95db61abfd42..5d68715ce367 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -191,7 +191,7 @@ A corresponding configuration for `InferredAssetS3DataConnector` would look simi -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L180-L199 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L178-L197 ``` diff --git 
a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index dc686f01adc0..c773207553e4 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -116,9 +116,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. -datasource_yaml = datasource_yaml.replace( - "", "superconductive-public" -) +datasource_yaml = datasource_yaml.replace("", "superconductive-public") datasource_yaml = datasource_yaml.replace( "", "data/taxi_yellow_trip_data_samples/" ) @@ -171,4 +169,4 @@ # context.get_available_data_asset_names()["taxi_datasource"][ # "default_inferred_data_connector_name" # ] -# ) \ No newline at end of file +# ) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index 6f1450e8cdc9..cbab3daf15dd 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -166,9 +166,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. 
-datasource_yaml = datasource_yaml.replace( - "", "superconductive-public" -) +datasource_yaml = datasource_yaml.replace("", "superconductive-public") datasource_yaml = datasource_yaml.replace( "", "data/taxi_yellow_trip_data_samples/" ) From 3fe6ac46928a9442c21cbc60312590523f214251 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Thu, 21 Oct 2021 15:11:15 -0400 Subject: [PATCH 28/62] Misalignment of line numbers --- .../how_to_configure_a_configuredassetdataconnector.md | 2 +- .../how_to_configure_an_inferredassetdataconnector.py | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 956a9efb8a83..bf79ee80da12 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -168,7 +168,7 @@ A corresponding configuration for `ConfiguredAssetS3DataConnector` would look si -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L130-L149 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L128-L147 ``` diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index cbab3daf15dd..cda2582ce991 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -220,3 +220,4 @@ # "default_inferred_data_connector_name" # ] # ) + From 9e5a52acc6f79cd50f7958960668d0a47ddcbbc9 Mon Sep 17 
00:00:00 2001 From: Nathan Farmer Date: Thu, 21 Oct 2021 16:04:36 -0400 Subject: [PATCH 29/62] Example 1 --- ...onfigure_a_configuredassetdataconnector.md | 95 +++++------ ...configure_an_inferredassetdataconnector.md | 60 ++++--- ...onfigure_a_configuredassetdataconnector.py | 149 +++++++++++++++++- ...configure_an_inferredassetdataconnector.py | 80 ++++++++++ 4 files changed, 307 insertions(+), 77 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index bf79ee80da12..6a8fb4d89a62 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -176,80 +176,85 @@ A corresponding configuration for `ConfiguredAssetS3DataConnector` would look si The following examples will show scenarios that ConfiguredAssetDataConnectors can help you analyze, using `ConfiguredAssetFilesystemDataConnector`. -**Note**: The examples will only show the configuration for `data_connectors` for simplicity. +### Example 1: Basic Configuration for a single DataAsset -Example 1: Basic Configuration for a single DataAsset ------------------------------------------------------ - -Continuing the example above, imagine you have the following files in the directory `my_directory/`: +Continuing the example above, imagine you have the following files in the directory ``: ``` -test/alpha-1.csv -test/alpha-2.csv -test/alpha-3.csv +/yellow_tripdata_2019-01.csv +/yellow_tripdata_2019-02.csv +/yellow_tripdata_2019-03.csv ``` Then this configuration... 
-```yaml -class_name: ConfiguredAssetFilesystemDataConnector -base_directory: test/ -default_regex: -assets: - alpha: - pattern: alpha-(.*)\.csv - group_names: - - index + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L175-L191 ``` -...will make available `alpha` as a single DataAsset with the following data_references: + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L202-L222 +``` + + + + +...will make available `yellow_tripdata` as a single DataAsset with the following data_references: ```bash Available data_asset_names (1 of 1): - alpha (3 of 3): [ - 'alpha-1.csv', - 'alpha-2.csv', - 'alpha-3.csv' - ] + yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv'] + +Unmatched data_references (0 of 0):[] ``` Once configured, you can get a `Validator` from the `Data Context` as follows: -```python -my_validator = context.get_validator( - datasource_name="my_data_source", - data_connector_name="my_filesystem_data_connector", - data_asset_name="alpha", - batch_identifiers={ - "index": "2" - }, - expectation_suite_name="my_expectation_suite" # the suite with this name must exist by the time of this call -) +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L238-L248 ``` But what if the regex does not match any files in the directory? Then this configuration...
-```yaml -class_name: ConfiguredAssetFilesystemDataConnector -base_directory: test/ -default_regex: -assets: - alpha: - pattern: beta-(.*)\.csv - group_names: - - index + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L260-L276 ``` + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L287-L307 +``` + + + + ...will give you this output ```bash -Successfully instantiated ConfiguredAssetFilesystemDataConnector Available data_asset_names (1 of 1): - alpha (0 of 0): [] + yellow_tripdata (0 of 0): [] -Unmatched data_references (3 of 3): ['alpha-1.csv', 'alpha-2.csv', 'alpha-3.csv'] +Unmatched data_references (3 of 3):['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv'] ``` Notice that `alpha` has 0 data_references, and there are 3 `Unmatched data_references` listed. diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 5d68715ce367..a6a05a7b0e0e 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -197,56 +197,54 @@ A corresponding configuration for `InferredAssetS3DataConnector` would look simi -The following examples will show scenarios that InferredAssetDataConnectors can help you analyze, using `InferredAssetFilesystemDataConnector` as an example and only show the configuration under `data_connectors` for simplicity. +The following examples will show scenarios that InferredAssetDataConnectors can help you analyze, using `InferredAssetFilesystemDataConnector`. 
-Example 1: Basic configuration for a single DataAsset ------------------------------------------------------ +### Example 1: Basic configuration for a single DataAsset -Continuing the example above, imagine you have the following files in the directory `my_directory/`: +Continuing the example above, imagine you have the following files in the directory ``: ``` -my_directory/alpha-2020-01-01.csv -my_directory/alpha-2020-01-02.csv -my_directory/alpha-2020-01-03.csv +/yellow_tripdata_2019-01.csv +/yellow_tripdata_2019-02.csv +/yellow_tripdata_2019-03.csv ``` Then this configuration... -```yaml -class_name: InferredAssetFilesystemDataConnector -base_directory: my_directory/ -default_regex: - group_names: - - data_asset_name - - year - - month - - day - pattern: (.*)-(\d{4})-(\d{2})-(\d{2})\.csv + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L225-L242 ``` -...will make available the following data_references: + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L253-L271 +``` + + + + +...will make available `yellow_tripdata` as a single DataAsset with the following data_references: ```bash Available data_asset_names (1 of 1): - alpha (3 of 3): [ - 'alpha-2020-01-01.csv', - 'alpha-2020-01-02.csv', - 'alpha-2020-01-03.csv' - ] + yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv'] -Unmatched data_references (0 of 0): [] +Unmatched data_references (0 of 0):[] ``` Once configured, you can get `Validators` from the `Data Context` as follows: -```python -my_validator = my_context.get_validator( - execution_engine_name="my_execution_engine", - data_connector_name="my_data_connector", - data_asset_name="alpha", - create_expectation_suite_with_name="my_expectation_suite", -) +```python
file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L294-L303 ``` Example 2: Basic configuration with more than one DataAsset diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index c773207553e4..b7fa99a6b463 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -19,7 +19,7 @@ base_directory: my_directory/ assets: yellow_tripdata: - pattern: (.*)\.csv + pattern: yellow_tripdata_(.*)\.csv group_names: - month """ @@ -170,3 +170,150 @@ # "default_inferred_data_connector_name" # ] # ) + +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_configured_data_connector_name: + class_name: ConfiguredAssetFilesystemDataConnector + base_directory: my_directory/ + assets: + yellow_tripdata: + pattern: (.*)\.csv + group_names: + - month +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. 
+datasource_yaml = datasource_yaml.replace( + "my_directory/", "../data/single_directory_one_data_asset/" +) + +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_configured_data_connector_name": { + "class_name": "ConfiguredAssetFilesystemDataConnector", + "base_directory": "my_directory/", + "assets": { + "yellow_tripdata": { + "pattern": "yellow_tripdata_(.*)\.csv", + "group_names": ["month"], + } + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above. +datasource_config["data_connectors"]["default_configured_data_connector_name"][ + "base_directory" +] = "../data/single_directory_one_data_asset/" + +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) + +assert test_yaml == test_python + +context.add_datasource(**datasource_config) + +batch_request = BatchRequest( + datasource_name="taxi_datasource", + data_connector_name="default_configured_data_connector_name", + data_asset_name="yellow_tripdata", +) + +validator = context.get_validator( + batch_request=batch_request, + expectation_suite_name="test_suite", + batch_identifiers={"month": "2019-02"}, +) + +# NOTE: The following code is only for testing and can be ignored by users. 
+assert isinstance(validator, ge.validator.validator.Validator) +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "yellow_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_configured_data_connector_name" + ] +) + +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_configured_data_connector_name: + class_name: ConfiguredAssetFilesystemDataConnector + base_directory: my_directory/ + assets: + yellow_tripdata: + pattern: green_tripdata_(.*)\.csv + group_names: + - month +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. +datasource_yaml = datasource_yaml.replace( + "my_directory/", "../data/single_directory_one_data_asset/" +) + +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_configured_data_connector_name": { + "class_name": "ConfiguredAssetFilesystemDataConnector", + "base_directory": "my_directory/", + "assets": { + "yellow_tripdata": { + "pattern": "green_tripdata_(.*)\.csv", + "group_names": ["month"], + } + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above. 
+datasource_config["data_connectors"]["default_configured_data_connector_name"][ + "base_directory" +] = "../data/single_directory_one_data_asset/" + +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) + +assert test_yaml == test_python \ No newline at end of file diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index cda2582ce991..533557ca3e02 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -221,3 +221,83 @@ # ] # ) +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_inferred_data_connector_name: + class_name: InferredAssetFilesystemDataConnector + base_directory: + default_regex: + group_names: + - data_asset_name + - year + - month + pattern: (.*)_(\d{4})-(\d{2})\.csv +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. 
+datasource_yaml = datasource_yaml.replace( + "", "../data/single_directory_one_data_asset/" +) + +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_inferred_data_connector_name": { + "class_name": "InferredAssetFilesystemDataConnector", + "base_directory": "", + "default_regex": { + "group_names": ["data_asset_name", "year", "month"], + "pattern": "(.*)_(\d{4})-(\d{2})\.csv", + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above. +datasource_config["data_connectors"]["default_inferred_data_connector_name"][ + "base_directory" +] = "../data/single_directory_one_data_asset/" + +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) + +assert test_yaml == test_python + +context.add_datasource(**datasource_config) + +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "yellow_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_inferred_data_connector_name" + ] +) + +batch_request = BatchRequest( + datasource_name="taxi_datasource", + data_connector_name="default_inferred_data_connector_name", + data_asset_name="yellow_tripdata", +) + +validator = context.get_validator( + batch_request=batch_request, + create_expectation_suite_with_name="", +) From 174c0287a95e5184de15f98df4af77fd6ca746d5 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Fri, 22 Oct 2021 09:05:28 -0400 Subject: [PATCH 30/62] Linting --- .../how_to_configure_a_configuredassetdataconnector.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) 
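The `default_regex` in the inferred-connector test added above (pattern `(.*)_(\d{4})-(\d{2})\.csv` with `group_names` `data_asset_name`, `year`, `month`) is what lets a single configuration expose both `yellow_tripdata` and `green_tripdata`. What follows is a minimal sketch using only Python's standard `re` module, not Great Expectations code. It is meant to show how a capture group named `data_asset_name` can partition file names into inferred assets. The helper `infer_assets` and the sample file list are hypothetical. The sketch uses `re.fullmatch`, which may be stricter than the library's own matching.

```python
import re
from collections import defaultdict

# Pattern and group names copied from the default_regex in the test config above.
PATTERN = re.compile(r"(.*)_(\d{4})-(\d{2})\.csv")
GROUP_NAMES = ["data_asset_name", "year", "month"]


def infer_assets(filenames):
    """Hypothetical helper: group file names by the captured data_asset_name.

    Returns (assets, unmatched). assets maps each inferred asset name to a
    list of (filename, batch_identifiers) pairs.
    """
    assets = defaultdict(list)
    unmatched = []
    for name in sorted(filenames):
        match = PATTERN.fullmatch(name)
        if match is None:
            # Anything the regex cannot parse becomes an unmatched data reference.
            unmatched.append(name)
            continue
        identifiers = dict(zip(GROUP_NAMES, match.groups()))
        # The data_asset_name group picks the asset; the remaining groups
        # act as batch identifiers within that asset.
        asset_name = identifiers.pop("data_asset_name")
        assets[asset_name].append((name, identifiers))
    return dict(assets), unmatched


files = [
    "yellow_tripdata_2019-01.csv",
    "green_tripdata_2019-01.csv",
    "yellow_tripdata_2019-02.csv",
    "notes.txt",
]
assets, unmatched = infer_assets(files)
for asset_name, refs in sorted(assets.items()):
    print(asset_name, [filename for filename, _ in refs])
print("unmatched:", unmatched)
```

Because `(.*)` is greedy but must be followed by `_` and four digits, it captures everything before the final `_YYYY-MM` suffix. That is why `yellow_tripdata` and `green_tripdata` become separate assets.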
diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index b7fa99a6b463..f91c5e7ae0c3 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -316,4 +316,4 @@ yaml.dump(datasource_config), return_mode="report_object" ) -assert test_yaml == test_python \ No newline at end of file +assert test_yaml == test_python From 0586292c1aad8521726cff88b9d1019c6bcced84 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Fri, 22 Oct 2021 09:17:53 -0400 Subject: [PATCH 31/62] WIP example 2 --- ...onfigure_a_configuredassetdataconnector.md | 21 +++++++++---------- ...configure_an_inferredassetdataconnector.md | 17 +++++++-------- 2 files changed, 18 insertions(+), 20 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 6a8fb4d89a62..a844435dba32 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -257,25 +257,24 @@ Available data_asset_names (1 of 1): Unmatched data_references (3 of 3):['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv'] ``` -Notice that `alpha` has 0 data_references, and there are 3 `Unmatched data_references` listed. +Notice that `yellow_tripdata` has 0 data_references, and there are 3 `Unmatched data_references` listed. This would indicate that some part of the configuration is incorrect and would need to be reviewed. 
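Here is a minimal sketch of why the misconfigured asset ends up with zero data references. It uses only Python's standard `re` module and is not Great Expectations code. The helper `match_configured_asset` is hypothetical, and matching with `re.fullmatch` is an assumption that may differ from the library. The sketch applies one asset's `pattern` to every file in the directory. Files that match contribute batch identifiers through `group_names`, and the rest are reported as unmatched.

```python
import re


def match_configured_asset(pattern, group_names, filenames):
    """Hypothetical helper: split file names into matched references
    (with their batch identifiers) and unmatched references for one asset."""
    regex = re.compile(pattern)
    matched, unmatched = {}, []
    for name in sorted(filenames):
        match = regex.fullmatch(name)
        if match:
            matched[name] = dict(zip(group_names, match.groups()))
        else:
            unmatched.append(name)
    return matched, unmatched


files = [
    "yellow_tripdata_2019-01.csv",
    "yellow_tripdata_2019-02.csv",
    "yellow_tripdata_2019-03.csv",
]

# The misconfigured pattern from the test file: no file name starts with "green_".
bad_matched, bad_unmatched = match_configured_asset(
    r"green_tripdata_(.*)\.csv", ["month"], files
)
print(len(bad_matched), len(bad_unmatched))  # 0 3

# The corrected pattern captures the date portion of every file name.
good_matched, good_unmatched = match_configured_asset(
    r"yellow_tripdata_(.*)\.csv", ["month"], files
)
print(good_matched["yellow_tripdata_2019-02.csv"])  # {'month': '2019-02'}
print(good_unmatched)  # []
```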
-In our case, changing `pattern` to : `alpha-(.*)\\.csv` will fix our problem and give the same output to above. +In our case, changing `pattern` to `yellow_tripdata_(.*)\.csv` fixes the problem and produces the same output as above. -Example 2: Basic configuration with more than one DataAsset ------------------------------------------------------------ +### Example 2: Basic configuration with more than one DataAsset -Here’s a similar example, but this time two data_assets are mixed together in one folder. +Here’s a similar example, but this time two Data Assets are mixed together in one folder. **Note**: For an equivalent configuration using `InferredAssetFilesystemDataConnector`, please see Example 2 in [How to configure an InferredAssetDataConnector](./how_to_configure_an_inferredassetdataconnector). ``` -test_data/alpha-2020-01-01.csv -test_data/beta-2020-01-01.csv -test_data/alpha-2020-01-02.csv -test_data/beta-2020-01-02.csv -test_data/alpha-2020-01-03.csv -test_data/beta-2020-01-03.csv +/yellow_tripdata_2019-01.csv +/green_tripdata_2019-01.csv +/yellow_tripdata_2019-02.csv +/green_tripdata_2019-02.csv +/yellow_tripdata_2019-03.csv +/green_tripdata_2019-03.csv ``` Then this configuration...
diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index a6a05a7b0e0e..9d25d24108ff 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -247,21 +247,20 @@ Once configured, you can get `Validators` from the `Data Context` as follows: ```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L294-L303 ``` -Example 2: Basic configuration with more than one DataAsset ------------------------------------------------------------ +### Example 2: Basic configuration with more than one DataAsset -Here’s a similar example, but this time two data_assets are mixed together in one folder. +Here’s a similar example, but this time two Data Assets are mixed together in one folder. **Note**: For an equivalent configuration using `ConfiguredAssetFilesystemDataConnector`, please see Example 2 in [How to configure a ConfiguredAssetDataConnector](./how_to_configure_a_configuredassetdataconnector). ``` -test_data/alpha-2020-01-01.csv -test_data/beta-2020-01-01.csv -test_data/alpha-2020-01-02.csv -test_data/beta-2020-01-02.csv -test_data/alpha-2020-01-03.csv -test_data/beta-2020-01-03.csv +/yellow_tripdata_2019-01.csv +/green_tripdata_2019-01.csv +/yellow_tripdata_2019-02.csv +/green_tripdata_2019-02.csv +/yellow_tripdata_2019-03.csv +/green_tripdata_2019-03.csv ``` The same configuration as Example 1...
From 8d0046a71bee01da4743a2fec8431e8e1372b602 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Fri, 22 Oct 2021 10:45:44 -0400 Subject: [PATCH 32/62] Working example 2 --- ...onfigure_a_configuredassetdataconnector.md | 57 ++++---- ...configure_an_inferredassetdataconnector.md | 49 ++++--- ...onfigure_a_configuredassetdataconnector.py | 122 +++++++++++++++--- ...configure_an_inferredassetdataconnector.py | 30 ++--- 4 files changed, 170 insertions(+), 88 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index a844435dba32..a82bcc0814ab 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -279,47 +279,38 @@ Here’s a similar example, but this time two Data Assets are mixed together in Then this configuration... 
-```yaml -class_name: ConfiguredAssetFilesystemDataConnector -base_directory: test_data/ -assets: - alpha: - group_names: - - name - - year - - month - - day - pattern: alpha-(\d{4})-(\d{2})-(\d{2})\.csv - beta: - group_names: - - name - - year - - month - - day - pattern: beta-(\d{4})-(\d{2})-(\d{2})\.csv + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L329-L351 ``` -...will now make `alpha` and `beta` both available a DataAssets, with the following data_references: + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L362-L386 +``` + + + + +...will now make `yellow_tripdata` and `green_tripdata` both available as Data Assets, with the following data_references: ```bash Available data_asset_names (2 of 2): - alpha (3 of 3): [ - 'alpha-2020-01-01.csv', - 'alpha-2020-01-02.csv', - 'alpha-2020-01-03.csv' - ] - - beta (3 of 3): [ - 'beta-2020-01-01.csv', - 'beta-2020-01-02.csv', - 'beta-2020-01-03.csv' - ] + green_tripdata (3 of 3): ['green_tripdata_2019-01.csv', 'green_tripdata_2019-02.csv', 'green_tripdata_2019-03.csv'] + yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv'] -Unmatched data_references (0 of 0): [] +Unmatched data_references (0 of 0):[] ``` -Example 3: Example with Nested Folders --------------------------------------------------- +### Example 3: Example with Nested Folders In the following example, files are placed in folders that match the `data_asset_names` we want: `A`, `B`, `C`, and `D`.
diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 9d25d24108ff..7ee458e4e7e9 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -265,40 +265,39 @@ in [How to configure a ConfiguredAssetDataConnector](./how_to_configure_a_config The same configuration as Example 1... -```yaml -class_name: InferredAssetFilesystemDataConnector -base_directory: test_data/ -default_regex: - group_names: - - data_asset_name - - year - - month - - day -pattern: (.*)-(\d{4})-(\d{2})-(\d{2})\.csv + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L225-L242 ``` -...will now make `alpha` and `beta` both available a DataAssets, with the following data_references: + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L253-L271 +``` + + + + +...will now make `yellow_tripdata` and `green_tripdata` both available as Data Assets, with the following data_references: ```bash Available data_asset_names (2 of 2): - alpha (3 of 3): [ - 'alpha-2020-01-01.csv', - 'alpha-2020-01-02.csv', - 'alpha-2020-01-03.csv' - ] - - beta (3 of 3): [ - 'beta-2020-01-01.csv', - 'beta-2020-01-02.csv', - 'beta-2020-01-03.csv' - ] + green_tripdata (3 of 3): ['green_tripdata_2019-01.csv', 'green_tripdata_2019-02.csv', 'green_tripdata_2019-03.csv'] + yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv'] -Unmatched data_references (0 of 0): [] +Unmatched data_references (0 of 0):[] ``` -Example 3: Nested directory structure with the data_asset_name on the inside 
----------------------------------------------------------------------------- +### Example 3: Nested directory structure with the data_asset_name on the inside Here’s a similar example, with a nested directory structure... diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index f91c5e7ae0c3..98ac53d0b41f 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -16,7 +16,7 @@ data_connectors: default_configured_data_connector_name: class_name: ConfiguredAssetFilesystemDataConnector - base_directory: my_directory/ + base_directory: / assets: yellow_tripdata: pattern: yellow_tripdata_(.*)\.csv @@ -27,7 +27,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. datasource_yaml = datasource_yaml.replace( - "my_directory/", "../data/single_directory_one_data_asset/" + "/", "../data/single_directory_one_data_asset/" ) test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") @@ -44,7 +44,7 @@ "data_connectors": { "default_configured_data_connector_name": { "class_name": "ConfiguredAssetFilesystemDataConnector", - "base_directory": "my_directory/", + "base_directory": "/", "assets": { "yellow_tripdata": { "pattern": "yellow_tripdata_(.*)\.csv", @@ -106,8 +106,8 @@ data_connectors: default_inferred_data_connector_name: class_name: InferredAssetS3DataConnector - bucket: - prefix: + bucket: / + prefix: / default_regex: group_names: - month @@ -116,9 +116,9 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. 
-datasource_yaml = datasource_yaml.replace("", "superconductive-public") +datasource_yaml = datasource_yaml.replace("/", "superconductive-public") datasource_yaml = datasource_yaml.replace( - "", "data/taxi_yellow_trip_data_samples/" + "/", "data/taxi_yellow_trip_data_samples/" ) # TODO: Uncomment once S3 testing in Azure Pipelines is re-enabled @@ -136,8 +136,8 @@ "data_connectors": { "default_inferred_data_connector_name": { "class_name": "InferredAssetFilesystemDataConnector", - "bucket": "", - "prefix": "", + "bucket": "/", + "prefix": "/", "default_regex": { "group_names": ["month"], "pattern": "yellow_tripdata_(.*)\.csv", @@ -182,7 +182,7 @@ data_connectors: default_configured_data_connector_name: class_name: ConfiguredAssetFilesystemDataConnector - base_directory: my_directory/ + base_directory: / assets: yellow_tripdata: pattern: (.*)\.csv @@ -193,7 +193,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. datasource_yaml = datasource_yaml.replace( - "my_directory/", "../data/single_directory_one_data_asset/" + "/", "../data/single_directory_one_data_asset/" ) test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") @@ -210,7 +210,7 @@ "data_connectors": { "default_configured_data_connector_name": { "class_name": "ConfiguredAssetFilesystemDataConnector", - "base_directory": "my_directory/", + "base_directory": "/", "assets": { "yellow_tripdata": { "pattern": "yellow_tripdata_(.*)\.csv", @@ -267,7 +267,7 @@ data_connectors: default_configured_data_connector_name: class_name: ConfiguredAssetFilesystemDataConnector - base_directory: my_directory/ + base_directory: / assets: yellow_tripdata: pattern: green_tripdata_(.*)\.csv @@ -278,7 +278,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. 
datasource_yaml = datasource_yaml.replace( - "my_directory/", "../data/single_directory_one_data_asset/" + "/", "../data/single_directory_one_data_asset/" ) test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") @@ -295,7 +295,7 @@ "data_connectors": { "default_configured_data_connector_name": { "class_name": "ConfiguredAssetFilesystemDataConnector", - "base_directory": "my_directory/", + "base_directory": "/", "assets": { "yellow_tripdata": { "pattern": "green_tripdata_(.*)\.csv", @@ -316,4 +316,96 @@ yaml.dump(datasource_config), return_mode="report_object" ) +# NOTE: The following code is only for testing and can be ignored by users. assert test_yaml == test_python +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "yellow_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_configured_data_connector_name" + ] +) + +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_configured_data_connector_name: + class_name: ConfiguredAssetFilesystemDataConnector + base_directory: / + assets: + yellow_tripdata: + pattern: yellow_tripdata_(\d{4})-(\d{2})\.csv + group_names: + - year + - month + green_tripdata: + pattern: green_tripdata_(\d{4})-(\d{2})\.csv + group_names: + - year + - month +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. 
+datasource_yaml = datasource_yaml.replace( + "/", "../data/single_directory_two_data_assets/" +) + +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_configured_data_connector_name": { + "class_name": "ConfiguredAssetFilesystemDataConnector", + "base_directory": "/", + "assets": { + "yellow_tripdata": { + "pattern": "yellow_tripdata_(\d{4})-(\d{2})\.csv", + "group_names": ["year", "month"], + }, + "green_tripdata": { + "pattern": "green_tripdata_(\d{4})-(\d{2})\.csv", + "group_names": ["year", "month"], + }, + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above. +datasource_config["data_connectors"]["default_configured_data_connector_name"][ + "base_directory" +] = "../data/single_directory_two_data_assets/" + +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) + +# NOTE: The following code is only for testing and can be ignored by users. 
+# TODO: Uncomment the line below once ISSUE #3589 (https://github.com/great-expectations/great_expectations/issues/3589) is resolved +# assert test_yaml == test_python +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "yellow_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_configured_data_connector_name" + ] +) +assert "green_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_configured_data_connector_name" + ] +) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index 533557ca3e02..5eabf7116b34 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -16,7 +16,7 @@ data_connectors: default_inferred_data_connector_name: class_name: InferredAssetFilesystemDataConnector - base_directory: + base_directory: / default_regex: group_names: - data_asset_name @@ -26,7 +26,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. 
datasource_yaml = datasource_yaml.replace( - "", "../data/single_directory_one_data_asset/" + "/", "../data/single_directory_one_data_asset/" ) test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") @@ -43,7 +43,7 @@ "data_connectors": { "default_inferred_data_connector_name": { "class_name": "InferredAssetFilesystemDataConnector", - "base_directory": "", + "base_directory": "/", "default_regex": { "group_names": ["data_asset_name"], "pattern": "(.*)\.csv", @@ -84,7 +84,7 @@ data_connectors: default_inferred_data_connector_name: class_name: InferredAssetFilesystemDataConnector - base_directory: + base_directory: / default_regex: group_names: - data_asset_name @@ -96,7 +96,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. datasource_yaml = datasource_yaml.replace( - "", "../data/single_directory_one_data_asset/" + "/", "../data/single_directory_one_data_asset/" ) test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") @@ -113,7 +113,7 @@ "data_connectors": { "default_inferred_data_connector_name": { "class_name": "InferredAssetFilesystemDataConnector", - "base_directory": "", + "base_directory": "/", "default_regex": { "group_names": ["data_asset_name", "year", "month"], "pattern": "(.*)_(\d{4})-(\d{2})\.csv", @@ -154,8 +154,8 @@ data_connectors: default_inferred_data_connector_name: class_name: InferredAssetS3DataConnector - bucket: - prefix: + bucket: / + prefix: / default_regex: group_names: - data_asset_name @@ -166,9 +166,9 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. 
-datasource_yaml = datasource_yaml.replace("", "superconductive-public") +datasource_yaml = datasource_yaml.replace("/", "superconductive-public") datasource_yaml = datasource_yaml.replace( - "", "data/taxi_yellow_trip_data_samples/" + "/", "data/taxi_yellow_trip_data_samples/" ) # TODO: Uncomment once S3 testing in Azure Pipelines is re-enabled @@ -186,8 +186,8 @@ "data_connectors": { "default_inferred_data_connector_name": { "class_name": "InferredAssetFilesystemDataConnector", - "bucket": "", - "prefix": "", + "bucket": "/", + "prefix": "/", "default_regex": { "group_names": ["data_asset_name", "year", "month"], "pattern": "(.*)_(\d{4})-(\d{2})\.csv", @@ -232,7 +232,7 @@ data_connectors: default_inferred_data_connector_name: class_name: InferredAssetFilesystemDataConnector - base_directory: + base_directory: / default_regex: group_names: - data_asset_name @@ -244,7 +244,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. 
datasource_yaml = datasource_yaml.replace( - "", "../data/single_directory_one_data_asset/" + "/", "../data/single_directory_one_data_asset/" ) test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") @@ -261,7 +261,7 @@ "data_connectors": { "default_inferred_data_connector_name": { "class_name": "InferredAssetFilesystemDataConnector", - "base_directory": "", + "base_directory": "/", "default_regex": { "group_names": ["data_asset_name", "year", "month"], "pattern": "(.*)_(\d{4})-(\d{2})\.csv", From ea5eb838f77f4a2896cc3bef60d17631f4f9ee49 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Fri, 22 Oct 2021 11:39:36 -0400 Subject: [PATCH 33/62] WIP example 3 --- ...onfigure_a_configuredassetdataconnector.md | 2 +- ...configure_an_inferredassetdataconnector.md | 68 +++++++++------- ...configure_an_inferredassetdataconnector.py | 77 +++++++++++++++++++ .../10/green_tripdata.csv} | 0 .../10/yellow_tripdata.csv} | 0 .../11/green_tripdata.csv} | 0 .../11/yellow_tripdata.csv} | 0 .../12/green_tripdata.csv} | 0 .../12/yellow_tripdata.csv} | 0 .../01/green_tripdata.csv} | 0 .../01/yellow_tripdata.csv} | 0 .../02/green_tripdata.csv} | 0 .../02/yellow_tripdata.csv} | 0 .../03/green_tripdata.csv} | 0 .../03/yellow_tripdata.csv} | 0 15 files changed, 119 insertions(+), 28 deletions(-) rename tests/test_sets/dataconnector_docs/nested_directories/{green/2018/tripdata-10.csv => 2018/10/green_tripdata.csv} (100%) rename tests/test_sets/dataconnector_docs/nested_directories/{yellow/2018/10/tripdata.csv => 2018/10/yellow_tripdata.csv} (100%) rename tests/test_sets/dataconnector_docs/nested_directories/{green/2018/tripdata-11.csv => 2018/11/green_tripdata.csv} (100%) rename tests/test_sets/dataconnector_docs/nested_directories/{yellow/2018/11/tripdata.csv => 2018/11/yellow_tripdata.csv} (100%) rename tests/test_sets/dataconnector_docs/nested_directories/{green/2018/tripdata-12.csv => 2018/12/green_tripdata.csv} (100%) rename 
tests/test_sets/dataconnector_docs/nested_directories/{yellow/2018/12/tripdata.csv => 2018/12/yellow_tripdata.csv} (100%) rename tests/test_sets/dataconnector_docs/nested_directories/{green/2019/tripdata-01.csv => 2019/01/green_tripdata.csv} (100%) rename tests/test_sets/dataconnector_docs/nested_directories/{yellow/2019/01/tripdata.csv => 2019/01/yellow_tripdata.csv} (100%) rename tests/test_sets/dataconnector_docs/nested_directories/{green/2019/tripdata-02.csv => 2019/02/green_tripdata.csv} (100%) rename tests/test_sets/dataconnector_docs/nested_directories/{yellow/2019/02/tripdata.csv => 2019/02/yellow_tripdata.csv} (100%) rename tests/test_sets/dataconnector_docs/nested_directories/{green/2019/tripdata-03.csv => 2019/03/green_tripdata.csv} (100%) rename tests/test_sets/dataconnector_docs/nested_directories/{yellow/2019/03/tripdata.csv => 2019/03/yellow_tripdata.csv} (100%) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index a82bcc0814ab..a009f420bfd8 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -307,7 +307,7 @@ Available data_asset_names (2 of 2): green_tripdata (3 of 3): ['green_tripdata_2019-01.csv', 'green_tripdata_2019-02.csv', 'green_tripdata_2019-03.csv'] yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv'] -Unmatched data_references (0 of 0):[] +Unmatched data_references (0 of 0): [] ``` ### Example 3: Example with Nested Folders diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 7ee458e4e7e9..aac05b0762d4 100644 --- 
a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -293,7 +293,7 @@ Available data_asset_names (2 of 2): green_tripdata (3 of 3): ['green_tripdata_2019-01.csv', 'green_tripdata_2019-02.csv', 'green_tripdata_2019-03.csv'] yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv'] -Unmatched data_references (0 of 0):[] +Unmatched data_references (0 of 0): [] ``` @@ -302,48 +302,62 @@ Unmatched data_references (0 of 0):[] Here’s a similar example, with a nested directory structure... ``` -2020/01/01/alpha.csv -2020/01/02/alpha.csv -2020/01/03/alpha.csv -2020/01/04/alpha.csv -2020/01/04/beta.csv -2020/01/05/alpha.csv -2020/01/05/beta.csv +2018/10/yellow_tripdata.csv +2018/10/green_tripdata.csv +2018/11/yellow_tripdata.csv +2018/11/green_tripdata.csv +2018/12/yellow_tripdata.csv +2018/12/green_tripdata.csv +2019/01/yellow_tripdata.csv +2019/01/green_tripdata.csv +2019/02/yellow_tripdata.csv +2019/02/green_tripdata.csv +2019/03/yellow_tripdata.csv +2019/03/green_tripdata.csv ``` Then this configuration... 
-```yaml -class_name: InferredAssetFilesystemDataConnector -base_directory: my_directory/ -default_regex: - group_names: - - year - - month - - day - - data_asset_name - pattern: (\d{4})/(\d{2})/(\d{2})/(.*)\.csv + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L306-L323 +``` + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L334-L352 ``` -...will now make `alpha` and `beta` both available a DataAssets, with the following data_references: + + + +...will now make `yellow_tripdata` and `green_tripdata` both available as Data Assets, with the following data_references: ```bash Available data_asset_names (2 of 2): - alpha (3 of 5): [ - 'alpha-2020-01-01.csv', - 'alpha-2020-01-02.csv', - 'alpha-2020-01-03.csv' + yellow_tripdata (3 of 6): [ + 'yellow_tripdata-2018-10.csv', + 'yellow_tripdata-2018-11.csv', + 'yellow_tripdata-2018-12.csv' ] - beta (2 of 2): [ - 'beta-2020-01-04.csv', - 'beta-2020-01-05.csv', + green_tripdata (3 of 6): [ + 'green_tripdata-2018-10.csv', + 'green_tripdata-2018-11.csv', + 'green_tripdata-2018-12.csv' ] Unmatched data_references (0 of 0): [] ``` - Example 4: Nested directory structure with the data_asset_name on the outside ----------------------------------------------------------------------------- diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index 5eabf7116b34..5b5f7246d76b 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -301,3 +301,80 @@ batch_request=batch_request, create_expectation_suite_with_name="", ) + +# YAML 
+datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_inferred_data_connector_name: + class_name: InferredAssetFilesystemDataConnector + base_directory: / + default_regex: + group_names: + - year + - month + - data_asset_name + pattern: (\d{4})/(\d{2})/(.*)\.csv +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. +datasource_yaml = datasource_yaml.replace( + "/", "../data/nested_directories/" +) + +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_inferred_data_connector_name": { + "class_name": "InferredAssetFilesystemDataConnector", + "base_directory": "/", + "default_regex": { + "group_names": ["year", "month", "data_asset_name"], + "pattern": "(\d{4})/(\d{2})/(.*)\.csv", + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above. +datasource_config["data_connectors"]["default_inferred_data_connector_name"][ + "base_directory" +] = "../data/nested_directories/" + +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) + +# NOTE: The following code is only for testing and can be ignored by users. 
+assert test_yaml == test_python + +context.add_datasource(**datasource_config) + +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +# TODO: Uncomment after resolution of ISSUE # () +# assert "yellow_tripdata" in set( +# context.get_available_data_asset_names()["taxi_datasource"][ +# "default_inferred_data_connector_name" +# ] +# ) +# assert "green_tripdata" in set( +# context.get_available_data_asset_names()["taxi_datasource"][ +# "default_inferred_data_connector_name" +# ] +# ) diff --git a/tests/test_sets/dataconnector_docs/nested_directories/green/2018/tripdata-10.csv b/tests/test_sets/dataconnector_docs/nested_directories/2018/10/green_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/green/2018/tripdata-10.csv rename to tests/test_sets/dataconnector_docs/nested_directories/2018/10/green_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/yellow/2018/10/tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories/2018/10/yellow_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/yellow/2018/10/tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories/2018/10/yellow_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/green/2018/tripdata-11.csv b/tests/test_sets/dataconnector_docs/nested_directories/2018/11/green_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/green/2018/tripdata-11.csv rename to tests/test_sets/dataconnector_docs/nested_directories/2018/11/green_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/yellow/2018/11/tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories/2018/11/yellow_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/yellow/2018/11/tripdata.csv rename to 
tests/test_sets/dataconnector_docs/nested_directories/2018/11/yellow_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/green/2018/tripdata-12.csv b/tests/test_sets/dataconnector_docs/nested_directories/2018/12/green_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/green/2018/tripdata-12.csv rename to tests/test_sets/dataconnector_docs/nested_directories/2018/12/green_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/yellow/2018/12/tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories/2018/12/yellow_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/yellow/2018/12/tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories/2018/12/yellow_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/green/2019/tripdata-01.csv b/tests/test_sets/dataconnector_docs/nested_directories/2019/01/green_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/green/2019/tripdata-01.csv rename to tests/test_sets/dataconnector_docs/nested_directories/2019/01/green_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/yellow/2019/01/tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories/2019/01/yellow_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/yellow/2019/01/tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories/2019/01/yellow_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/green/2019/tripdata-02.csv b/tests/test_sets/dataconnector_docs/nested_directories/2019/02/green_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/green/2019/tripdata-02.csv rename to 
tests/test_sets/dataconnector_docs/nested_directories/2019/02/green_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/yellow/2019/02/tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories/2019/02/yellow_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/yellow/2019/02/tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories/2019/02/yellow_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/green/2019/tripdata-03.csv b/tests/test_sets/dataconnector_docs/nested_directories/2019/03/green_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/green/2019/tripdata-03.csv rename to tests/test_sets/dataconnector_docs/nested_directories/2019/03/green_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/yellow/2019/03/tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories/2019/03/yellow_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/yellow/2019/03/tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories/2019/03/yellow_tripdata.csv From 80d26b82e4e91876e053ad14a8f7b4d2c58e9d95 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Mon, 25 Oct 2021 09:49:06 -0400 Subject: [PATCH 34/62] Working example 3 --- ...onfigure_a_configuredassetdataconnector.md | 95 +++++++------------ ...configure_an_inferredassetdataconnector.md | 15 +-- ...onfigure_a_configuredassetdataconnector.py | 88 +++++++++++++++++ ...configure_an_inferredassetdataconnector.py | 28 +++--- .../green_tripdata/2019-01.csv} | 0 .../green_tripdata/2019-02.csv} | 0 .../green_tripdata/2019-03.csv} | 0 .../yellow_tripdata_2019-01.csv} | 0 .../yellow_tripdata_2019-02.csv} | 0 .../yellow_tripdata_2019-03.csv} | 0 .../2018/10/green_tripdata.csv | 0 .../2018/10/yellow_tripdata.csv | 0 .../2018/11/green_tripdata.csv | 0 
.../2018/11/yellow_tripdata.csv | 0 .../2018/12/green_tripdata.csv | 0 .../2018/12/yellow_tripdata.csv | 0 .../2019/01/green_tripdata.csv | 21 ++++ .../2019/01/yellow_tripdata.csv | 21 ++++ .../2019/02/green_tripdata.csv | 21 ++++ .../2019/02/yellow_tripdata.csv | 21 ++++ .../2019/03/green_tripdata.csv | 21 ++++ .../2019/03/yellow_tripdata.csv | 21 ++++ 22 files changed, 264 insertions(+), 88 deletions(-) rename tests/test_sets/dataconnector_docs/{nested_directories/2019/01/green_tripdata.csv => nested_directories_data_asset/green_tripdata/2019-01.csv} (100%) rename tests/test_sets/dataconnector_docs/{nested_directories/2019/02/green_tripdata.csv => nested_directories_data_asset/green_tripdata/2019-02.csv} (100%) rename tests/test_sets/dataconnector_docs/{nested_directories/2019/03/green_tripdata.csv => nested_directories_data_asset/green_tripdata/2019-03.csv} (100%) rename tests/test_sets/dataconnector_docs/{nested_directories/2019/01/yellow_tripdata.csv => nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-01.csv} (100%) rename tests/test_sets/dataconnector_docs/{nested_directories/2019/02/yellow_tripdata.csv => nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-02.csv} (100%) rename tests/test_sets/dataconnector_docs/{nested_directories/2019/03/yellow_tripdata.csv => nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-03.csv} (100%) rename tests/test_sets/dataconnector_docs/{nested_directories => nested_directories_time}/2018/10/green_tripdata.csv (100%) rename tests/test_sets/dataconnector_docs/{nested_directories => nested_directories_time}/2018/10/yellow_tripdata.csv (100%) rename tests/test_sets/dataconnector_docs/{nested_directories => nested_directories_time}/2018/11/green_tripdata.csv (100%) rename tests/test_sets/dataconnector_docs/{nested_directories => nested_directories_time}/2018/11/yellow_tripdata.csv (100%) rename tests/test_sets/dataconnector_docs/{nested_directories => 
nested_directories_time}/2018/12/green_tripdata.csv (100%) rename tests/test_sets/dataconnector_docs/{nested_directories => nested_directories_time}/2018/12/yellow_tripdata.csv (100%) create mode 100644 tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/green_tripdata.csv create mode 100644 tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/yellow_tripdata.csv create mode 100644 tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/green_tripdata.csv create mode 100644 tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/yellow_tripdata.csv create mode 100644 tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/green_tripdata.csv create mode 100644 tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/yellow_tripdata.csv diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index a009f420bfd8..f8c006d2bb0f 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -312,77 +312,46 @@ Unmatched data_references (0 of 0): [] ### Example 3: Example with Nested Folders -In the following example, files are placed folders that match the `data_asset_names` we want: `A`, `B`, `C`, and `D`. +In the following example, files are placed in folders that match the `data_asset_names` we want (`yellow_tripdata` and `green_tripdata`), but the filenames follow different formats.
``` -test_dir/A/A-1.csv -test_dir/A/A-2.csv -test_dir/A/A-3.csv -test_dir/B/B-1.txt -test_dir/B/B-2.txt -test_dir/B/B-3.txt -test_dir/C/C-2017.csv -test_dir/C/C-2018.csv -test_dir/C/C-2019.csv -test_dir/D/D-aaa.csv -test_dir/D/D-bbb.csv -test_dir/D/D-ccc.csv -test_dir/D/D-ddd.csv -test_dir/D/D-eee.csv +yellow_tripdata/yellow_tripdata_2019-01.csv +yellow_tripdata/yellow_tripdata_2019-02.csv +yellow_tripdata/yellow_tripdata_2019-03.csv +green_tripdata/2019-01.csv +green_tripdata/2019-02.csv +green_tripdata/2019-03.csv ``` -```yaml -module_name: great_expectations.datasource.data_connector -class_name: ConfiguredAssetFilesystemDataConnector -base_directory: test_dir/ -assets: - A: - base_directory: A/ - B: - base_directory: B/ - pattern: (.*)-(.*)\.txt - group_names: - - part_1 - - part_2 - C: - glob_directive: "*" - base_directory: C/ - D: - glob_directive: "*" - base_directory: D/ -default_regex: - pattern: (.*)-(.*)\.csv - group_names: - - part_1 - - part_2 + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L414-L438 +``` + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L449-L475 ``` -...will now make `A`, `B`, `C` and `D` available a DataAssets, with the following data_references: + + + +...will now make `yellow_tripdata` and `green_tripdata` available as DataAssets, with the following data_references: ```bash -Available data_asset_names (4 of 4): - A (3 of 3): [ - 'A-1.csv', - 'A-2.csv', - 'A-3.csv', - ] - B (3 of 3): [ - 'B-1', - 'B-2', - 'B-3', - ] - C (3 of 3): [ - 'C-2017', - 'C-2018', - 'C-2019', - ] - D (5 of 5): [ - 'D-aaa.csv', - 'D-bbb.csv', - 'D-ccc.csv', - 'D-ddd.csv', - 'D-eee.csv', - ] +Available data_asset_names (2 of 2): + green_tripdata (3 of 3): ['2019-01.csv', '2019-02.csv', '2019-03.csv'] + yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv',
'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv'] + +Unmatched data_references (0 of 0): [] ``` Example 4: Example with Explicit data_asset_names and more complex nesting diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index aac05b0762d4..dfb620d93692 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -343,19 +343,10 @@ Then this configuration... ```bash Available data_asset_names (2 of 2): - yellow_tripdata (3 of 6): [ - 'yellow_tripdata-2018-10.csv', - 'yellow_tripdata-2018-11.csv', - 'yellow_tripdata-2018-12.csv' - ] - - green_tripdata (3 of 6): [ - 'green_tripdata-2018-10.csv', - 'green_tripdata-2018-11.csv', - 'green_tripdata-2018-12.csv' - ] + green_tripdata (3 of 6): ['2018/10/green_tripdata.csv', '2018/11/green_tripdata.csv', '2018/12/green_tripdata.csv'] + yellow_tripdata (3 of 6): ['2018/10/yellow_tripdata.csv', '2018/11/yellow_tripdata.csv', '2018/12/yellow_tripdata.csv'] -Unmatched data_references (0 of 0): [] +Unmatched data_references (0 of 0):[] ``` Example 4: Nested directory structure with the data_asset_name on the outside diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index 98ac53d0b41f..6febe2c6b186 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -409,3 +409,91 @@ "default_configured_data_connector_name" ] ) + +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name:
great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_configured_data_connector_name: + class_name: ConfiguredAssetFilesystemDataConnector + base_directory: / + assets: + yellow_tripdata: + base_directory: yellow_tripdata/ + pattern: yellow_tripdata_(\d{4})-(\d{2})\.csv + group_names: + - year + - month + green_tripdata: + base_directory: green_tripdata/ + pattern: (\d{4})-(\d{2})\.csv + group_names: + - year + - month +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. +datasource_yaml = datasource_yaml.replace( + "/", "../data/nested_directories_data_asset/" +) + +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_configured_data_connector_name": { + "class_name": "ConfiguredAssetFilesystemDataConnector", + "base_directory": "/", + "assets": { + "yellow_tripdata": { + "base_directory": "yellow_tripdata/", + "pattern": "yellow_tripdata_(\d{4})-(\d{2})\.csv", + "group_names": ["year", "month"], + }, + "green_tripdata": { + "base_directory": "green_tripdata/", + "pattern": "(\d{4})-(\d{2})\.csv", + "group_names": ["year", "month"], + }, + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above. 
+datasource_config["data_connectors"]["default_configured_data_connector_name"][ + "base_directory" +] = "../data/nested_directories_data_asset/" + +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) + +# NOTE: The following code is only for testing and can be ignored by users. +assert test_yaml == test_python +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "yellow_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_configured_data_connector_name" + ] +) +assert "green_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_configured_data_connector_name" + ] +) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index 5b5f7246d76b..0cca1bbaa35a 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -314,6 +314,7 @@ default_inferred_data_connector_name: class_name: InferredAssetFilesystemDataConnector base_directory: / + glob_directive: "*/*/*.csv" default_regex: group_names: - year @@ -325,7 +326,7 @@ # Please note this override is only to provide good UX for docs and tests. # In normal usage you'd set your path directly in the yaml above. 
datasource_yaml = datasource_yaml.replace( - "/", "../data/nested_directories/" + "/", "../data/nested_directories_time/" ) test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") @@ -343,6 +344,7 @@ "default_inferred_data_connector_name": { "class_name": "InferredAssetFilesystemDataConnector", "base_directory": "/", + "glob_directive": "*/*/*.csv", "default_regex": { "group_names": ["year", "month", "data_asset_name"], "pattern": "(\d{4})/(\d{2})/(.*)\.csv", @@ -355,7 +357,7 @@ # In normal usage you'd set your path directly in the code above. datasource_config["data_connectors"]["default_inferred_data_connector_name"][ "base_directory" -] = "../data/nested_directories/" +] = "../data/nested_directories_time/" test_python = context.test_yaml_config( yaml.dump(datasource_config), return_mode="report_object" @@ -367,14 +369,14 @@ context.add_datasource(**datasource_config) assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] -# TODO: Uncomment after resolution of ISSUE # () -# assert "yellow_tripdata" in set( -# context.get_available_data_asset_names()["taxi_datasource"][ -# "default_inferred_data_connector_name" -# ] -# ) -# assert "green_tripdata" in set( -# context.get_available_data_asset_names()["taxi_datasource"][ -# "default_inferred_data_connector_name" -# ] -# ) +# TODO: Uncomment the lines below once ISSUE # () is resolved +assert "yellow_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_inferred_data_connector_name" + ] +) +assert "green_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_inferred_data_connector_name" + ] +) diff --git a/tests/test_sets/dataconnector_docs/nested_directories/2019/01/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-01.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/2019/01/green_tripdata.csv 
rename to tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-01.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/2019/02/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-02.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/2019/02/green_tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-02.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/2019/03/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-03.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/2019/03/green_tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-03.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/2019/01/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-01.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/2019/01/yellow_tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-01.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/2019/02/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-02.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/2019/02/yellow_tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-02.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/2019/03/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-03.csv 
similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/2019/03/yellow_tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-03.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/2018/10/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/green_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/2018/10/green_tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/green_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/2018/10/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/yellow_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/2018/10/yellow_tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/yellow_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/2018/11/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/green_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/2018/11/green_tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/green_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/2018/11/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/yellow_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/2018/11/yellow_tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/yellow_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/2018/12/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/green_tripdata.csv 
similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/2018/12/green_tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/green_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories/2018/12/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/yellow_tripdata.csv similarity index 100% rename from tests/test_sets/dataconnector_docs/nested_directories/2018/12/yellow_tripdata.csv rename to tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/yellow_tripdata.csv diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/green_tripdata.csv new file mode 100644 index 000000000000..07a92dc26d64 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/green_tripdata.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +198236,2,2019-01-11 01:12:46,2019-01-11 01:20:37,N,1,129,129,1,1.39,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1, +33286,1,2019-01-02 23:41:23,2019-01-02 23:49:14,N,1,255,112,1,1.5,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1, +517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1, +368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1, +155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1, +366694,2,2019-01-18 22:58:57,2019-01-18 23:15:37,N,1,75,185,1,8.71,24.5,0.5,0.5,0.0,0.0,,0.3,25.8,1,1, +474972,2,2019-01-24 17:59:00,2019-01-24 18:35:53,N,5,195,90,1,6.46,23.45,0.0,0.5,0.0,0.0,,0.0,23.95,1,2, +69932,1,2019-01-04 
18:07:02,2019-01-04 18:34:37,N,1,97,35,2,4.7,20.0,1.0,0.5,0.0,0.0,,0.3,21.8,1,1, +244782,2,2019-01-13 02:16:18,2019-01-13 02:37:30,N,1,260,198,1,3.9,16.5,0.5,0.5,0.0,0.0,,0.3,17.8,2,1, +482363,2,2019-01-25 04:42:54,2019-01-25 04:47:06,N,1,82,56,1,0.74,5.0,0.5,0.5,0.0,0.0,,0.3,6.3,2,1, +573895,2,2019-01-29 11:36:48,2019-01-29 11:43:03,N,1,43,236,1,0.91,6.0,0.0,0.5,2.0,0.0,,0.3,8.8,1,1,0.0 +182986,2,2019-01-10 12:07:54,2019-01-10 12:19:31,N,1,29,155,1,5.49,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, +526993,1,2019-01-26 22:13:18,2019-01-26 22:13:51,N,1,33,33,1,0.1,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1, +490940,2,2019-01-25 13:09:50,2019-01-25 13:15:30,N,1,220,153,1,0.35,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,1,1, +145635,2,2019-01-08 16:01:58,2019-01-08 16:10:16,N,1,77,72,1,1.48,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,1,1, +242083,1,2019-01-12 23:43:28,2019-01-13 00:02:13,N,1,210,216,1,12.3,34.5,0.5,0.5,0.0,0.0,,0.3,35.8,2,1, +328924,2,2019-01-17 10:07:09,2019-01-17 10:25:59,N,1,32,242,1,4.71,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, +132260,1,2019-01-07 21:00:39,2019-01-07 21:18:13,N,1,52,188,1,3.6,14.5,0.5,0.5,0.0,0.0,,0.3,15.8,2,1, +568032,2,2019-01-29 07:01:03,2019-01-29 07:54:28,N,1,225,251,1,16.91,55.0,0.0,0.5,0.0,11.52,,0.3,67.32,1,1,0.0 +92205,2,2019-01-05 20:26:49,2019-01-05 20:44:48,N,1,29,108,1,3.93,15.5,0.5,0.5,0.0,0.0,,0.3,16.8,1,1, diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/yellow_tripdata.csv new file mode 100644 index 000000000000..288e8ac8a023 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/yellow_tripdata.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +4565,2,2019-01-15 02:04:57,2019-01-15 
02:17:53,1,4.92,1,N,255,226,1,16.5,0.5,0.5,3.56,0.0,0.3,21.36, +714,2,2019-01-08 00:38:49,2019-01-08 00:52:30,1,7.07,1,N,132,197,1,21.0,0.5,0.5,4.46,0.0,0.3,26.76, +2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0 +5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95, +4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56, +1505,2,2019-01-29 17:58:08,2019-01-29 18:11:19,1,0.72,1,N,13,13,2,9.0,1.0,0.5,0.0,0.0,0.3,10.8,0.0 +8131,1,2019-01-13 16:22:03,2019-01-13 16:33:06,2,1.4,1,N,161,170,1,8.5,0.0,0.5,3.0,0.0,0.3,12.3, +9601,2,2019-01-23 12:07:55,2019-01-23 12:11:19,5,0.57,1,N,239,239,1,4.0,0.0,0.5,1.0,0.0,0.3,5.8,0.0 +4522,2,2019-01-25 15:42:39,2019-01-25 15:50:54,1,1.26,1,N,170,229,1,7.5,0.0,0.5,1.0,0.0,0.3,9.3,0.0 +9008,1,2019-01-13 19:40:27,2019-01-13 19:50:54,3,2.1,1,N,239,263,1,9.5,0.0,0.5,2.05,0.0,0.3,12.35, +7012,2,2019-01-19 20:44:35,2019-01-19 21:05:13,1,3.16,1,N,142,107,1,14.5,0.5,0.5,3.16,0.0,0.3,18.96, +7316,2,2019-01-07 22:38:10,2019-01-07 22:44:31,2,2.4,1,N,233,263,1,8.5,0.5,0.5,1.47,0.0,0.3,11.27, +6384,2,2019-01-25 07:35:59,2019-01-25 07:57:44,1,5.12,1,N,143,231,1,18.5,0.0,0.5,2.5,0.0,0.3,21.8,0.0 +3652,1,2019-01-26 20:43:57,2019-01-26 21:03:51,1,4.4,1,N,13,230,2,17.0,0.5,0.5,0.0,0.0,0.3,18.3,0.0 +6409,2,2019-01-06 22:27:16,2019-01-06 22:33:58,1,1.52,1,N,161,234,2,7.0,0.5,0.5,0.0,0.0,0.3,8.3, +1130,2,2019-01-15 20:05:59,2019-01-15 20:22:11,1,3.96,1,N,233,41,2,14.5,0.5,0.5,0.0,0.0,0.3,15.8, +5618,2,2019-01-14 14:47:18,2019-01-14 14:54:44,1,0.5,1,N,140,237,1,6.0,0.0,0.5,1.0,0.0,0.3,7.8, +8515,1,2019-01-16 16:26:17,2019-01-16 16:41:36,2,1.4,1,N,233,163,1,10.5,1.0,0.5,2.45,0.0,0.3,14.75, +5397,1,2019-01-23 08:31:58,2019-01-23 08:55:43,1,2.6,1,N,246,163,1,16.0,0.0,0.5,5.0,0.0,0.3,21.8,0.0 +4648,1,2019-01-30 19:59:30,2019-01-30 20:10:34,1,1.9,1,N,186,79,1,9.0,0.5,0.5,2.05,0.0,0.3,12.35,0.0 diff --git 
a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/green_tripdata.csv new file mode 100644 index 000000000000..9a6442e61899 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/green_tripdata.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +504629,2,2019-02-25 16:05:49,2019-02-25 16:11:15,N,1,179,179,1,0.89,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,2,1,0.0 +475628,1,2019-02-23 23:18:58,2019-02-23 23:31:00,N,1,210,55,1,4.2,15.0,0.5,0.5,0.0,0.0,,0.3,16.3,2,1,0.0 +150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75 +245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0 +151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0 +18916,2,2019-02-01 19:54:28,2019-02-01 19:58:41,N,1,74,262,1,1.04,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,1,1,0.0 +110154,2,2019-02-06 08:33:30,2019-02-06 08:48:52,N,1,66,87,1,2.14,12.0,0.0,0.5,4.66,0.0,,0.3,20.21,1,1,2.75 +335114,2,2019-02-16 18:40:24,2019-02-16 18:46:47,N,1,74,75,1,0.89,6.0,0.0,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 +453776,2,2019-02-22 22:38:18,2019-02-22 22:49:20,N,1,116,41,1,1.86,10.0,0.5,0.5,2.26,0.0,,0.3,13.56,1,1,0.0 +192685,2,2019-02-09 19:56:13,2019-02-09 20:13:56,N,1,167,235,1,2.59,13.0,0.0,0.5,0.0,0.0,,0.3,13.8,1,1,0.0 +574982,2,2019-02-28 22:42:50,2019-02-28 22:47:40,N,1,42,116,1,0.91,5.5,0.5,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 +155855,2,2019-02-08 09:02:21,2019-02-08 10:17:23,N,5,210,161,1,20.74,59.86,0.0,0.5,0.0,0.0,,0.0,60.36,1,2,0.0 +130109,2,2019-02-07 05:18:25,2019-02-07 
05:24:30,N,1,62,188,4,1.54,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,1,1,0.0 +157145,2,2019-02-08 10:20:47,2019-02-08 10:47:36,N,1,74,31,2,7.73,25.5,0.0,0.5,5.26,0.0,,0.3,31.56,1,1,0.0 +411748,2,2019-02-20 22:31:32,2019-02-20 22:37:11,N,1,129,129,4,0.9,5.5,0.5,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 +470538,2,2019-02-23 18:15:38,2019-02-23 18:27:05,N,1,126,213,1,2.21,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,1,1,0.0 +262705,2,2019-02-13 13:07:02,2019-02-13 13:19:50,N,1,186,249,1,1.5,9.5,0.0,0.5,2.56,0.0,,0.3,15.36,1,1,2.5 +163160,2,2019-02-08 15:39:26,2019-02-08 16:06:31,N,1,33,17,1,2.9,17.5,0.0,0.5,0.0,0.0,,0.3,18.3,2,1,0.0 +263174,2,2019-02-13 15:06:57,2019-02-13 15:28:26,N,1,165,123,5,1.66,13.5,0.0,0.5,0.0,0.0,,0.3,14.3,1,1,0.0 +451270,2,2019-02-22 21:00:04,2019-02-22 21:00:24,N,1,82,82,1,0.06,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/yellow_tripdata.csv new file mode 100644 index 000000000000..573017273621 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/yellow_tripdata.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +9873,2,2019-02-14 14:06:26,2019-02-14 14:24:36,2,1.24,1,N,68,161,2,11.5,0.0,0.5,0.0,0.0,0.3,14.8,2.5 +9976,1,2019-02-08 14:20:27,2019-02-08 14:25:30,0,0.7,1,N,186,249,2,5.0,2.5,0.5,0.0,0.0,0.3,8.3,2.5 +4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5 +1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5 +6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5 +1633,2,2019-02-09 12:14:12,2019-02-10 
11:50:00,3,3.02,1,N,48,236,1,12.5,0.0,0.5,1.58,0.0,0.3,17.38,2.5 +4450,1,2019-02-05 08:56:21,2019-02-05 09:02:06,1,0.8,1,N,262,75,1,5.5,2.5,0.5,1.7,0.0,0.3,10.5,2.5 +8929,2,2019-02-13 11:44:00,2019-02-13 11:57:32,1,1.13,1,N,100,170,2,9.0,0.0,0.5,0.0,0.0,0.3,12.3,2.5 +6968,2,2019-02-27 20:52:49,2019-02-27 21:18:37,1,4.64,1,N,246,125,1,18.5,0.5,0.5,4.46,0.0,0.3,26.76,2.5 +7969,2,2019-02-01 11:12:35,2019-02-01 11:19:39,1,0.54,1,N,236,263,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8,0.0 +7695,1,2019-02-28 00:51:50,2019-02-28 00:55:37,1,1.3,1,N,263,74,2,6.0,0.5,0.5,0.0,0.0,0.3,7.3,0.0 +9904,1,2019-02-16 17:45:00,2019-02-16 17:51:09,1,1.3,1,N,170,162,1,6.5,2.5,0.5,1.95,0.0,0.3,11.75,2.5 +1580,2,2019-02-07 10:14:39,2019-02-07 10:51:38,1,17.49,2,N,186,132,2,52.0,0.0,0.5,0.0,5.76,0.3,61.06,2.5 +699,2,2019-02-18 11:01:16,2019-02-18 11:11:06,2,2.31,1,N,249,170,1,10.0,0.0,0.5,2.66,0.0,0.3,15.96,2.5 +5181,2,2019-02-28 22:39:19,2019-02-28 22:42:13,1,0.49,1,N,237,162,1,4.0,0.5,0.5,1.56,0.0,0.3,9.36,2.5 +2631,2,2019-02-01 00:55:00,2019-02-01 00:55:20,1,0.0,5,N,265,265,1,78.0,0.0,0.0,10.0,10.5,0.3,98.8,0.0 +4018,2,2019-02-11 16:59:03,2019-02-11 17:05:01,1,1.35,1,N,143,238,1,6.5,1.0,0.5,2.16,0.0,0.3,12.96,2.5 +7582,2,2019-02-08 17:29:00,2019-02-08 17:31:14,1,0.47,1,N,236,236,2,3.5,1.0,0.5,0.0,0.0,0.3,7.8,2.5 +1810,1,2019-02-17 13:10:12,2019-02-17 13:16:03,1,0.5,1,N,161,230,2,5.5,2.5,0.5,0.0,0.0,0.3,8.8,2.5 +5095,2,2019-02-09 14:30:05,2019-02-09 14:42:22,0,1.18,1,N,170,234,1,9.0,0.0,0.5,2.46,0.0,0.3,14.76,2.5 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/green_tripdata.csv new file mode 100644 index 000000000000..5104e10f24c5 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/green_tripdata.csv @@ -0,0 +1,21 @@ 
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +337596,1,2019-03-18 08:20:31,2019-03-18 09:12:55,N,1,49,35,1,6.1,33.5,0.0,0.5,0.0,0.0,,0.3,34.3,1,1,0.0 +378981,2,2019-03-20 13:03:10,2019-03-20 13:16:30,N,1,152,41,1,1.14,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 +6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0 +76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75 +282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0 +439163,2,2019-03-23 12:07:55,2019-03-23 12:11:20,N,1,166,166,1,0.75,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1,0.0 +518619,2,2019-03-27 18:33:46,2019-03-27 18:40:59,N,1,7,223,1,0.59,6.0,1.0,0.5,1.95,0.0,,0.3,9.75,1,1,0.0 +385766,2,2019-03-20 18:19:37,2019-03-20 18:28:31,N,1,129,129,1,1.21,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,2,1,0.0 +131793,2,2019-03-07 18:40:23,2019-03-07 18:58:26,N,1,51,32,1,3.35,14.0,1.0,0.5,0.0,0.0,,0.3,15.8,1,1,0.0 +203472,2,2019-03-11 12:46:20,2019-03-11 13:20:08,N,1,119,254,3,6.47,27.5,0.0,0.5,0.0,0.0,,0.3,28.3,1,1,0.0 +399106,2,2019-03-21 13:10:44,2019-03-21 13:21:19,N,1,74,75,1,1.27,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 +425811,2,2019-03-22 19:00:23,2019-03-22 19:02:51,N,1,95,95,1,0.51,4.0,1.0,0.5,2.0,0.0,,0.3,7.8,1,1,0.0 +36345,2,2019-03-02 19:27:41,2019-03-02 19:33:52,N,1,7,179,1,0.86,6.0,0.0,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 +323601,2,2019-03-17 10:19:28,2019-03-17 10:22:59,N,1,97,49,1,0.68,4.5,0.0,0.5,1.06,0.0,,0.3,6.36,1,1,0.0 +246980,2,2019-03-13 17:03:07,2019-03-13 17:12:04,N,1,75,263,2,1.32,8.0,1.0,0.5,2.51,0.0,,0.3,15.06,1,1,2.75 +269737,1,2019-03-14 18:03:35,2019-03-14 18:10:01,N,1,66,65,1,0.6,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1,0.0 +145416,2,2019-03-08 
12:21:02,2019-03-08 12:22:57,N,1,75,75,1,0.5,3.5,0.0,0.5,0.0,0.0,,0.3,4.3,2,1,0.0 +142333,2,2019-03-08 09:39:09,2019-03-08 10:03:48,N,5,215,16,1,7.14,21.62,0.0,0.5,0.0,0.0,,0.0,22.12,1,2,0.0 +381567,2,2019-03-20 15:09:47,2019-03-20 15:22:12,N,1,89,188,1,1.75,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 +476898,1,2019-03-25 13:10:35,2019-03-25 13:20:23,N,1,244,243,1,1.1,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/yellow_tripdata.csv new file mode 100644 index 000000000000..3d254ce261c2 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/yellow_tripdata.csv @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +7797,2,2019-03-20 15:40:46,2019-03-20 16:15:38,1,5.8,1,N,164,13,1,26.0,0.0,0.5,5.86,0.0,0.3,35.16,2.5 +671,1,2019-03-14 17:21:03,2019-03-14 17:29:45,1,1.5,1,N,239,166,1,8.0,3.5,0.5,2.45,0.0,0.3,14.75,2.5 +7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0 +9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5 +2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0 +5030,2,2019-03-13 20:22:28,2019-03-13 20:42:11,2,9.81,1,N,138,244,1,28.0,0.5,0.5,7.01,5.76,0.3,42.07,0.0 +7848,4,2019-03-21 13:22:08,2019-03-21 13:31:36,1,0.86,1,N,229,229,1,7.5,0.0,0.5,2.16,0.0,0.3,12.96,2.5 +167,1,2019-03-15 22:37:06,2019-03-15 22:38:34,1,0.0,1,N,264,145,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0 +7197,2,2019-03-14 20:38:43,2019-03-14 20:45:04,5,1.04,1,N,151,166,1,6.0,0.5,0.5,1.1,0.0,0.3,8.4,0.0 +568,1,2019-03-09 09:29:44,2019-03-09 
09:40:36,0,2.7,1,N,24,142,1,10.5,2.5,0.5,2.75,0.0,0.3,16.55,2.5 +2198,2,2019-03-11 18:32:36,2019-03-11 18:48:27,1,2.02,1,N,234,162,2,11.5,1.0,0.5,0.0,0.0,0.3,15.8,2.5 +5249,1,2019-03-15 09:35:22,2019-03-15 09:46:33,1,2.4,1,N,140,161,1,10.0,2.5,0.5,1.33,0.0,0.3,14.63,2.5 +5531,1,2019-03-04 19:44:33,2019-03-04 20:14:34,0,15.8,1,N,132,145,1,43.5,1.0,0.5,11.3,0.0,0.3,56.6,0.0 +4943,1,2019-03-03 15:12:19,2019-03-03 15:16:06,1,1.2,1,N,237,236,1,5.5,2.5,0.5,1.1,0.0,0.3,9.9,2.5 +617,2,2019-03-20 14:48:47,2019-03-20 14:52:43,1,0.59,1,N,142,163,1,4.5,0.0,0.5,1.0,0.0,0.3,8.8,2.5 +7014,2,2019-03-03 23:56:36,2019-03-04 00:22:25,1,7.91,1,N,132,258,2,25.0,0.5,0.5,0.0,0.0,0.3,26.3,0.0 +2138,2,2019-03-23 14:47:31,2019-03-23 14:54:11,2,1.12,1,N,234,161,2,6.5,0.0,0.5,0.0,0.0,0.3,9.8,2.5 +5298,2,2019-03-20 15:47:08,2019-03-20 15:51:15,1,0.54,1,N,161,161,1,4.5,0.0,0.5,1.56,0.0,0.3,9.36,2.5 +9838,1,2019-03-13 20:42:59,2019-03-13 20:50:23,1,1.1,1,N,162,48,1,6.5,3.0,0.5,2.05,0.0,0.3,12.35,2.5 +2565,1,2019-03-13 10:28:36,2019-03-13 10:28:38,1,0.0,1,N,246,246,2,2.5,2.5,0.5,0.0,0.0,0.3,5.8,2.5 From 2ba5dedb936d2a23b997b7a9e366109e722825a2 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Mon, 25 Oct 2021 10:12:58 -0400 Subject: [PATCH 35/62] WIP example 4 --- ...configure_an_inferredassetdataconnector.md | 67 +++++++--------- ...configure_an_inferredassetdataconnector.py | 80 ++++++++++++++++++- 2 files changed, 110 insertions(+), 37 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index dfb620d93692..b0a9f3ac5fa4 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -349,55 +349,50 @@ Available data_asset_names (2 of 2): Unmatched data_references (0 of 0):[] ``` -Example 4: Nested directory structure 
with the data_asset_name on the outside ------------------------------------------------------------------------------ +### Example 4: Nested directory structure with the data_asset_name on the outside -In the following example, files are placed in a folder structure with the `data_asset_name` defined by the folder name (A, B, C, or D) +In the following example, files are placed in a folder structure with the `data_asset_name` defined by the folder name (yellow_tripdata or green_tripdata). ``` -A/A-1.csv -A/A-2.csv -A/A-3.csv -B/B-1.csv -B/B-2.csv -B/B-3.csv -C/C-1.csv -C/C-2.csv -C/C-3.csv -D/D-1.csv -D/D-2.csv -D/D-3.csv +yellow_tripdata/yellow_tripdata_2019-01.csv +yellow_tripdata/yellow_tripdata_2019-02.csv +yellow_tripdata/yellow_tripdata_2019-03.csv +green_tripdata/2019-01.csv +green_tripdata/2019-02.csv +green_tripdata/2019-03.csv ``` Then this configuration... -```yaml -class_name: InferredAssetFilesystemDataConnector -base_directory: / + + -default_regex: - group_names: - - data_asset_name - - letter - - number - pattern: (\w{1})/(\w{1})-(\d{1})\.csv +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L384-L403 +``` + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L414-L433 ``` -...will now make `A` and `B` and `C` into data_assets, with each containing 3 data_references + + + +...will now make `yellow_tripdata` and `green_tripdata` into Data Assets, with each containing 3 data_references. ```bash -Available data_asset_names (3 of 4): - A (3 of 3): ['test_dir_charlie/A/A-1.csv', - 'test_dir_charlie/A/A-2.csv', - 'test_dir_charlie/A/A-3.csv'] - B (3 of 3): ['test_dir_charlie/B/B-1.csv', - 'test_dir_charlie/B/B-2.csv', - 'test_dir_charlie/B/B-3.csv'] - C (3 of 3): ['test_dir_charlie/C/C-1.csv', - 'test_dir_charlie/C/C-2.csv', - 'test_dir_charlie/C/C-3.csv'] +Available data_asset_names (2 of 2): +
green_tripdata (3 of 3): ['green_tripdata/2019-01.csv', 'green_tripdata/2019-02.csv', 'green_tripdata/2019-03.csv'] + yellow_tripdata (3 of 3): ['yellow_tripdata/yellow_tripdata_2019-01.csv', 'yellow_tripdata/yellow_tripdata_2019-02.csv', 'yellow_tripdata/yellow_tripdata_2019-03.csv'] -Unmatched data_references (0 of 0): [] +Unmatched data_references (0 of 0):[] ``` Example 5: Redundant information in the naming convention (S3 Bucket) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index 0cca1bbaa35a..a72c9ca4f343 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -369,7 +369,85 @@ context.add_datasource(**datasource_config) assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] -# TODO: Uncomment the lines below once ISSUE # () is resolved +assert "yellow_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_inferred_data_connector_name" + ] +) +assert "green_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_inferred_data_connector_name" + ] +) + +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_inferred_data_connector_name: + class_name: InferredAssetFilesystemDataConnector + base_directory: / + glob_directive: "*/*.csv" + default_regex: + group_names: + - data_asset_name + - optional_data_asset_name + - year + - month + pattern: (.*)/(.*)(\d{4})-(\d{2})\.csv +""" + +# Please note this override is only to provide 
good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. +datasource_yaml = datasource_yaml.replace( + "/", "../data/nested_directories_data_asset/" +) + +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_inferred_data_connector_name": { + "class_name": "InferredAssetFilesystemDataConnector", + "base_directory": "/", + "glob_directive": "*/*.csv", + "default_regex": { + "group_names": ["data_asset_name", "optional_data_asset_name", "year", "month"], + "pattern": "(.*)/(.*)(\d{4})-(\d{2})\.csv", + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above. +datasource_config["data_connectors"]["default_inferred_data_connector_name"][ + "base_directory" +] = "../data/nested_directories_data_asset/" + +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) + +# NOTE: The following code is only for testing and can be ignored by users. 
+assert test_yaml == test_python + +context.add_datasource(**datasource_config) + +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] assert "yellow_tripdata" in set( context.get_available_data_asset_names()["taxi_datasource"][ "default_inferred_data_connector_name" From 4faee0fc0c322a8a57beaf2da6bee6d452844048 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Mon, 25 Oct 2021 10:38:03 -0400 Subject: [PATCH 36/62] Working example 4 --- ...onfigure_a_configuredassetdataconnector.md | 102 +++++++----------- ...configure_an_inferredassetdataconnector.md | 2 +- ...onfigure_a_configuredassetdataconnector.py | 90 ++++++++++++++++ ...configure_an_inferredassetdataconnector.py | 7 +- .../green_tripdata/green_tripdata_2019-01.csv | 21 ++++ .../green_tripdata/green_tripdata_2019-02.csv | 21 ++++ .../green_tripdata/green_tripdata_2019-03.csv | 21 ++++ .../tripdata/yellow_tripdata_2019-01.txt | 21 ++++ .../tripdata/yellow_tripdata_2019-02.txt | 21 ++++ .../tripdata/yellow_tripdata_2019-03.txt | 21 ++++ 10 files changed, 264 insertions(+), 63 deletions(-) create mode 100644 tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-01.csv create mode 100644 tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-02.csv create mode 100644 tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-03.csv create mode 100644 tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-01.txt create mode 100644 tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-02.txt create mode 100644 tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-03.txt diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md 
b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index f8c006d2bb0f..1805580ffe39 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -315,12 +315,12 @@ Unmatched data_references (0 of 0): [] In the following example, files are placed folders that match the `data_asset_names` we want (`yellow_tripdata` and `green_tripdata`), but the filenames follow different formats. ``` -yellow_tripdata/yellow_tripdata_2019-01.csv -yellow_tripdata/yellow_tripdata_2019-02.csv -yellow_tripdata/yellow_tripdata_2019-03.csv -green_tripdata/2019-01.csv -green_tripdata/2019-02.csv -green_tripdata/2019-03.csv +/yellow_tripdata/yellow_tripdata_2019-01.csv +/yellow_tripdata/yellow_tripdata_2019-02.csv +/yellow_tripdata/yellow_tripdata_2019-03.csv +/green_tripdata/2019-01.csv +/green_tripdata/2019-02.csv +/green_tripdata/2019-03.csv ``` /yellow/tripdata/2019-01.txt +/yellow/tripdata/2019-02.txt +/yellow/tripdata/2019-03.txt +/green_tripdata/2019-01.csv +/green_tripdata/2019-02.csv +/green_tripdata/2019-03.csv ``` The following configuration... 
-```yaml -class_name: ConfiguredAssetFilesystemDataConnector -base_directory: my_base_directory/ -default_regex: - pattern: ^(.+)-(\d{4})(\d{2})\.(csv|txt)$ - group_names: - - data_asset_name - - year_dir - - month_dir -assets: - alpha: - base_directory: my_base_directory/alpha/files/go/here/ - glob_directive: "*.csv" - beta: - base_directory: my_base_directory/beta_here/ - glob_directive: "*.txt" - gamma: - glob_directive: "*.csv" -``` - -...will make `alpha`, `beta` and `gamma` available a DataAssets, with the following data_references: + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L502-L526 +``` + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L537-L565 +``` + + + + +...will make `yellow_tripdata` and `green_tripdata` available as Data Assets, with the following data_references: ```bash -Available data_asset_names (3 of 3): - alpha (3 of 3): [ - 'alpha-202001.csv', - 'alpha-202002.csv', - 'alpha-202003.csv' - ] - beta (4 of 4): [ - 'beta-202001.txt', - 'beta-202002.txt', - 'beta-202003.txt', - 'beta-202004.txt' - ] - gamma (5 of 5): [ - 'gamma-202001.csv', - 'gamma-202002.csv', - 'gamma-202003.csv', - 'gamma-202004.csv', - 'gamma-202005.csv', - ] +Available data_asset_names (2 of 2): + green_tripdata (3 of 3): ['green_tripdata_2019-01.', 'green_tripdata_2019-02.', 'green_tripdata_2019-03.'] + yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.', 'yellow_tripdata_2019-02.', 'yellow_tripdata_2019-03.'] + +Unmatched data_references (0 of 0):[] ``` diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index b0a9f3ac5fa4..400abdfc3587 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ 
b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -379,7 +379,7 @@ Then this configuration... -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L414-L433 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L414-L438 ``` diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index 6febe2c6b186..74ffd06845a2 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -497,3 +497,93 @@ "default_configured_data_connector_name" ] ) + +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_configured_data_connector_name: + class_name: ConfiguredAssetFilesystemDataConnector + base_directory: / + default_regex: + pattern: (.*)_(\d{4})-(\d{2})\.(csv|txt)$ + group_names: + - data_asset_name + - year + - month + assets: + yellow_tripdata: + base_directory: yellow/tripdata/ + glob_directive: "*.txt" + green_tripdata: + base_directory: green_tripdata/ + glob_directive: "*.csv" +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. 
+datasource_yaml = datasource_yaml.replace( + "base_directory: /", "base_directory: ../data/nested_directories_complex/" +) + +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_configured_data_connector_name": { + "class_name": "ConfiguredAssetFilesystemDataConnector", + "base_directory": "/", + "default_regex": { + "pattern": "(.*)_(\d{4})-(\d{2})\.(csv|txt)$", + "group_names": ["data_asset_name", "year", "month"] + }, + "assets": { + "yellow_tripdata": { + "base_directory": "yellow/tripdata/", + "glob_directive": "*.txt" + }, + "green_tripdata": { + "base_directory": "green_tripdata/", + "glob_directive": "*.csv" + }, + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above. +datasource_config["data_connectors"]["default_configured_data_connector_name"][ + "base_directory" +] = "../data/nested_directories_complex/" + +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) + +# NOTE: The following code is only for testing and can be ignored by users.
+assert test_yaml == test_python +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "yellow_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_configured_data_connector_name" + ] +) +assert "green_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_configured_data_connector_name" + ] +) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index a72c9ca4f343..cdeb62a30473 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -425,7 +425,12 @@ "base_directory": "/", "glob_directive": "*/*.csv", "default_regex": { - "group_names": ["data_asset_name", "optional_data_asset_name", "year", "month"], + "group_names": [ + "data_asset_name", + "optional_data_asset_name", + "year", + "month", + ], "pattern": "(.*)/(.*)(\d{4})-(\d{2})\.csv", }, }, diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-01.csv new file mode 100644 index 000000000000..07a92dc26d64 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-01.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +198236,2,2019-01-11 01:12:46,2019-01-11 
01:20:37,N,1,129,129,1,1.39,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1, +33286,1,2019-01-02 23:41:23,2019-01-02 23:49:14,N,1,255,112,1,1.5,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1, +517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1, +368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1, +155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1, +366694,2,2019-01-18 22:58:57,2019-01-18 23:15:37,N,1,75,185,1,8.71,24.5,0.5,0.5,0.0,0.0,,0.3,25.8,1,1, +474972,2,2019-01-24 17:59:00,2019-01-24 18:35:53,N,5,195,90,1,6.46,23.45,0.0,0.5,0.0,0.0,,0.0,23.95,1,2, +69932,1,2019-01-04 18:07:02,2019-01-04 18:34:37,N,1,97,35,2,4.7,20.0,1.0,0.5,0.0,0.0,,0.3,21.8,1,1, +244782,2,2019-01-13 02:16:18,2019-01-13 02:37:30,N,1,260,198,1,3.9,16.5,0.5,0.5,0.0,0.0,,0.3,17.8,2,1, +482363,2,2019-01-25 04:42:54,2019-01-25 04:47:06,N,1,82,56,1,0.74,5.0,0.5,0.5,0.0,0.0,,0.3,6.3,2,1, +573895,2,2019-01-29 11:36:48,2019-01-29 11:43:03,N,1,43,236,1,0.91,6.0,0.0,0.5,2.0,0.0,,0.3,8.8,1,1,0.0 +182986,2,2019-01-10 12:07:54,2019-01-10 12:19:31,N,1,29,155,1,5.49,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, +526993,1,2019-01-26 22:13:18,2019-01-26 22:13:51,N,1,33,33,1,0.1,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1, +490940,2,2019-01-25 13:09:50,2019-01-25 13:15:30,N,1,220,153,1,0.35,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,1,1, +145635,2,2019-01-08 16:01:58,2019-01-08 16:10:16,N,1,77,72,1,1.48,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,1,1, +242083,1,2019-01-12 23:43:28,2019-01-13 00:02:13,N,1,210,216,1,12.3,34.5,0.5,0.5,0.0,0.0,,0.3,35.8,2,1, +328924,2,2019-01-17 10:07:09,2019-01-17 10:25:59,N,1,32,242,1,4.71,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, +132260,1,2019-01-07 21:00:39,2019-01-07 21:18:13,N,1,52,188,1,3.6,14.5,0.5,0.5,0.0,0.0,,0.3,15.8,2,1, +568032,2,2019-01-29 07:01:03,2019-01-29 07:54:28,N,1,225,251,1,16.91,55.0,0.0,0.5,0.0,11.52,,0.3,67.32,1,1,0.0 +92205,2,2019-01-05 20:26:49,2019-01-05 
20:44:48,N,1,29,108,1,3.93,15.5,0.5,0.5,0.0,0.0,,0.3,16.8,1,1, diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-02.csv new file mode 100644 index 000000000000..9a6442e61899 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-02.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +504629,2,2019-02-25 16:05:49,2019-02-25 16:11:15,N,1,179,179,1,0.89,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,2,1,0.0 +475628,1,2019-02-23 23:18:58,2019-02-23 23:31:00,N,1,210,55,1,4.2,15.0,0.5,0.5,0.0,0.0,,0.3,16.3,2,1,0.0 +150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75 +245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0 +151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0 +18916,2,2019-02-01 19:54:28,2019-02-01 19:58:41,N,1,74,262,1,1.04,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,1,1,0.0 +110154,2,2019-02-06 08:33:30,2019-02-06 08:48:52,N,1,66,87,1,2.14,12.0,0.0,0.5,4.66,0.0,,0.3,20.21,1,1,2.75 +335114,2,2019-02-16 18:40:24,2019-02-16 18:46:47,N,1,74,75,1,0.89,6.0,0.0,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 +453776,2,2019-02-22 22:38:18,2019-02-22 22:49:20,N,1,116,41,1,1.86,10.0,0.5,0.5,2.26,0.0,,0.3,13.56,1,1,0.0 +192685,2,2019-02-09 19:56:13,2019-02-09 20:13:56,N,1,167,235,1,2.59,13.0,0.0,0.5,0.0,0.0,,0.3,13.8,1,1,0.0 +574982,2,2019-02-28 22:42:50,2019-02-28 22:47:40,N,1,42,116,1,0.91,5.5,0.5,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 +155855,2,2019-02-08 09:02:21,2019-02-08 
10:17:23,N,5,210,161,1,20.74,59.86,0.0,0.5,0.0,0.0,,0.0,60.36,1,2,0.0 +130109,2,2019-02-07 05:18:25,2019-02-07 05:24:30,N,1,62,188,4,1.54,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,1,1,0.0 +157145,2,2019-02-08 10:20:47,2019-02-08 10:47:36,N,1,74,31,2,7.73,25.5,0.0,0.5,5.26,0.0,,0.3,31.56,1,1,0.0 +411748,2,2019-02-20 22:31:32,2019-02-20 22:37:11,N,1,129,129,4,0.9,5.5,0.5,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 +470538,2,2019-02-23 18:15:38,2019-02-23 18:27:05,N,1,126,213,1,2.21,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,1,1,0.0 +262705,2,2019-02-13 13:07:02,2019-02-13 13:19:50,N,1,186,249,1,1.5,9.5,0.0,0.5,2.56,0.0,,0.3,15.36,1,1,2.5 +163160,2,2019-02-08 15:39:26,2019-02-08 16:06:31,N,1,33,17,1,2.9,17.5,0.0,0.5,0.0,0.0,,0.3,18.3,2,1,0.0 +263174,2,2019-02-13 15:06:57,2019-02-13 15:28:26,N,1,165,123,5,1.66,13.5,0.0,0.5,0.0,0.0,,0.3,14.3,1,1,0.0 +451270,2,2019-02-22 21:00:04,2019-02-22 21:00:24,N,1,82,82,1,0.06,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-03.csv new file mode 100644 index 000000000000..5104e10f24c5 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-03.csv @@ -0,0 +1,21 @@ +,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge +337596,1,2019-03-18 08:20:31,2019-03-18 09:12:55,N,1,49,35,1,6.1,33.5,0.0,0.5,0.0,0.0,,0.3,34.3,1,1,0.0 +378981,2,2019-03-20 13:03:10,2019-03-20 13:16:30,N,1,152,41,1,1.14,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 +6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0 +76887,1,2019-03-05 08:07:59,2019-03-05 
08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75 +282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0 +439163,2,2019-03-23 12:07:55,2019-03-23 12:11:20,N,1,166,166,1,0.75,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1,0.0 +518619,2,2019-03-27 18:33:46,2019-03-27 18:40:59,N,1,7,223,1,0.59,6.0,1.0,0.5,1.95,0.0,,0.3,9.75,1,1,0.0 +385766,2,2019-03-20 18:19:37,2019-03-20 18:28:31,N,1,129,129,1,1.21,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,2,1,0.0 +131793,2,2019-03-07 18:40:23,2019-03-07 18:58:26,N,1,51,32,1,3.35,14.0,1.0,0.5,0.0,0.0,,0.3,15.8,1,1,0.0 +203472,2,2019-03-11 12:46:20,2019-03-11 13:20:08,N,1,119,254,3,6.47,27.5,0.0,0.5,0.0,0.0,,0.3,28.3,1,1,0.0 +399106,2,2019-03-21 13:10:44,2019-03-21 13:21:19,N,1,74,75,1,1.27,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 +425811,2,2019-03-22 19:00:23,2019-03-22 19:02:51,N,1,95,95,1,0.51,4.0,1.0,0.5,2.0,0.0,,0.3,7.8,1,1,0.0 +36345,2,2019-03-02 19:27:41,2019-03-02 19:33:52,N,1,7,179,1,0.86,6.0,0.0,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 +323601,2,2019-03-17 10:19:28,2019-03-17 10:22:59,N,1,97,49,1,0.68,4.5,0.0,0.5,1.06,0.0,,0.3,6.36,1,1,0.0 +246980,2,2019-03-13 17:03:07,2019-03-13 17:12:04,N,1,75,263,2,1.32,8.0,1.0,0.5,2.51,0.0,,0.3,15.06,1,1,2.75 +269737,1,2019-03-14 18:03:35,2019-03-14 18:10:01,N,1,66,65,1,0.6,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1,0.0 +145416,2,2019-03-08 12:21:02,2019-03-08 12:22:57,N,1,75,75,1,0.5,3.5,0.0,0.5,0.0,0.0,,0.3,4.3,2,1,0.0 +142333,2,2019-03-08 09:39:09,2019-03-08 10:03:48,N,5,215,16,1,7.14,21.62,0.0,0.5,0.0,0.0,,0.0,22.12,1,2,0.0 +381567,2,2019-03-20 15:09:47,2019-03-20 15:22:12,N,1,89,188,1,1.75,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 +476898,1,2019-03-25 13:10:35,2019-03-25 13:20:23,N,1,244,243,1,1.1,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-01.txt b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-01.txt new file 
mode 100644 index 000000000000..288e8ac8a023 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-01.txt @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +4565,2,2019-01-15 02:04:57,2019-01-15 02:17:53,1,4.92,1,N,255,226,1,16.5,0.5,0.5,3.56,0.0,0.3,21.36, +714,2,2019-01-08 00:38:49,2019-01-08 00:52:30,1,7.07,1,N,132,197,1,21.0,0.5,0.5,4.46,0.0,0.3,26.76, +2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0 +5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95, +4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56, +1505,2,2019-01-29 17:58:08,2019-01-29 18:11:19,1,0.72,1,N,13,13,2,9.0,1.0,0.5,0.0,0.0,0.3,10.8,0.0 +8131,1,2019-01-13 16:22:03,2019-01-13 16:33:06,2,1.4,1,N,161,170,1,8.5,0.0,0.5,3.0,0.0,0.3,12.3, +9601,2,2019-01-23 12:07:55,2019-01-23 12:11:19,5,0.57,1,N,239,239,1,4.0,0.0,0.5,1.0,0.0,0.3,5.8,0.0 +4522,2,2019-01-25 15:42:39,2019-01-25 15:50:54,1,1.26,1,N,170,229,1,7.5,0.0,0.5,1.0,0.0,0.3,9.3,0.0 +9008,1,2019-01-13 19:40:27,2019-01-13 19:50:54,3,2.1,1,N,239,263,1,9.5,0.0,0.5,2.05,0.0,0.3,12.35, +7012,2,2019-01-19 20:44:35,2019-01-19 21:05:13,1,3.16,1,N,142,107,1,14.5,0.5,0.5,3.16,0.0,0.3,18.96, +7316,2,2019-01-07 22:38:10,2019-01-07 22:44:31,2,2.4,1,N,233,263,1,8.5,0.5,0.5,1.47,0.0,0.3,11.27, +6384,2,2019-01-25 07:35:59,2019-01-25 07:57:44,1,5.12,1,N,143,231,1,18.5,0.0,0.5,2.5,0.0,0.3,21.8,0.0 +3652,1,2019-01-26 20:43:57,2019-01-26 21:03:51,1,4.4,1,N,13,230,2,17.0,0.5,0.5,0.0,0.0,0.3,18.3,0.0 +6409,2,2019-01-06 22:27:16,2019-01-06 22:33:58,1,1.52,1,N,161,234,2,7.0,0.5,0.5,0.0,0.0,0.3,8.3, +1130,2,2019-01-15 
20:05:59,2019-01-15 20:22:11,1,3.96,1,N,233,41,2,14.5,0.5,0.5,0.0,0.0,0.3,15.8, +5618,2,2019-01-14 14:47:18,2019-01-14 14:54:44,1,0.5,1,N,140,237,1,6.0,0.0,0.5,1.0,0.0,0.3,7.8, +8515,1,2019-01-16 16:26:17,2019-01-16 16:41:36,2,1.4,1,N,233,163,1,10.5,1.0,0.5,2.45,0.0,0.3,14.75, +5397,1,2019-01-23 08:31:58,2019-01-23 08:55:43,1,2.6,1,N,246,163,1,16.0,0.0,0.5,5.0,0.0,0.3,21.8,0.0 +4648,1,2019-01-30 19:59:30,2019-01-30 20:10:34,1,1.9,1,N,186,79,1,9.0,0.5,0.5,2.05,0.0,0.3,12.35,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-02.txt b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-02.txt new file mode 100644 index 000000000000..573017273621 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-02.txt @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +9873,2,2019-02-14 14:06:26,2019-02-14 14:24:36,2,1.24,1,N,68,161,2,11.5,0.0,0.5,0.0,0.0,0.3,14.8,2.5 +9976,1,2019-02-08 14:20:27,2019-02-08 14:25:30,0,0.7,1,N,186,249,2,5.0,2.5,0.5,0.0,0.0,0.3,8.3,2.5 +4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5 +1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5 +6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5 +1633,2,2019-02-09 12:14:12,2019-02-10 11:50:00,3,3.02,1,N,48,236,1,12.5,0.0,0.5,1.58,0.0,0.3,17.38,2.5 +4450,1,2019-02-05 08:56:21,2019-02-05 09:02:06,1,0.8,1,N,262,75,1,5.5,2.5,0.5,1.7,0.0,0.3,10.5,2.5 +8929,2,2019-02-13 11:44:00,2019-02-13 11:57:32,1,1.13,1,N,100,170,2,9.0,0.0,0.5,0.0,0.0,0.3,12.3,2.5 
+6968,2,2019-02-27 20:52:49,2019-02-27 21:18:37,1,4.64,1,N,246,125,1,18.5,0.5,0.5,4.46,0.0,0.3,26.76,2.5 +7969,2,2019-02-01 11:12:35,2019-02-01 11:19:39,1,0.54,1,N,236,263,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8,0.0 +7695,1,2019-02-28 00:51:50,2019-02-28 00:55:37,1,1.3,1,N,263,74,2,6.0,0.5,0.5,0.0,0.0,0.3,7.3,0.0 +9904,1,2019-02-16 17:45:00,2019-02-16 17:51:09,1,1.3,1,N,170,162,1,6.5,2.5,0.5,1.95,0.0,0.3,11.75,2.5 +1580,2,2019-02-07 10:14:39,2019-02-07 10:51:38,1,17.49,2,N,186,132,2,52.0,0.0,0.5,0.0,5.76,0.3,61.06,2.5 +699,2,2019-02-18 11:01:16,2019-02-18 11:11:06,2,2.31,1,N,249,170,1,10.0,0.0,0.5,2.66,0.0,0.3,15.96,2.5 +5181,2,2019-02-28 22:39:19,2019-02-28 22:42:13,1,0.49,1,N,237,162,1,4.0,0.5,0.5,1.56,0.0,0.3,9.36,2.5 +2631,2,2019-02-01 00:55:00,2019-02-01 00:55:20,1,0.0,5,N,265,265,1,78.0,0.0,0.0,10.0,10.5,0.3,98.8,0.0 +4018,2,2019-02-11 16:59:03,2019-02-11 17:05:01,1,1.35,1,N,143,238,1,6.5,1.0,0.5,2.16,0.0,0.3,12.96,2.5 +7582,2,2019-02-08 17:29:00,2019-02-08 17:31:14,1,0.47,1,N,236,236,2,3.5,1.0,0.5,0.0,0.0,0.3,7.8,2.5 +1810,1,2019-02-17 13:10:12,2019-02-17 13:16:03,1,0.5,1,N,161,230,2,5.5,2.5,0.5,0.0,0.0,0.3,8.8,2.5 +5095,2,2019-02-09 14:30:05,2019-02-09 14:42:22,0,1.18,1,N,170,234,1,9.0,0.0,0.5,2.46,0.0,0.3,14.76,2.5 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-03.txt b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-03.txt new file mode 100644 index 000000000000..3d254ce261c2 --- /dev/null +++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-03.txt @@ -0,0 +1,21 @@ +,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge +7797,2,2019-03-20 15:40:46,2019-03-20 
16:15:38,1,5.8,1,N,164,13,1,26.0,0.0,0.5,5.86,0.0,0.3,35.16,2.5 +671,1,2019-03-14 17:21:03,2019-03-14 17:29:45,1,1.5,1,N,239,166,1,8.0,3.5,0.5,2.45,0.0,0.3,14.75,2.5 +7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0 +9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5 +2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0 +5030,2,2019-03-13 20:22:28,2019-03-13 20:42:11,2,9.81,1,N,138,244,1,28.0,0.5,0.5,7.01,5.76,0.3,42.07,0.0 +7848,4,2019-03-21 13:22:08,2019-03-21 13:31:36,1,0.86,1,N,229,229,1,7.5,0.0,0.5,2.16,0.0,0.3,12.96,2.5 +167,1,2019-03-15 22:37:06,2019-03-15 22:38:34,1,0.0,1,N,264,145,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0 +7197,2,2019-03-14 20:38:43,2019-03-14 20:45:04,5,1.04,1,N,151,166,1,6.0,0.5,0.5,1.1,0.0,0.3,8.4,0.0 +568,1,2019-03-09 09:29:44,2019-03-09 09:40:36,0,2.7,1,N,24,142,1,10.5,2.5,0.5,2.75,0.0,0.3,16.55,2.5 +2198,2,2019-03-11 18:32:36,2019-03-11 18:48:27,1,2.02,1,N,234,162,2,11.5,1.0,0.5,0.0,0.0,0.3,15.8,2.5 +5249,1,2019-03-15 09:35:22,2019-03-15 09:46:33,1,2.4,1,N,140,161,1,10.0,2.5,0.5,1.33,0.0,0.3,14.63,2.5 +5531,1,2019-03-04 19:44:33,2019-03-04 20:14:34,0,15.8,1,N,132,145,1,43.5,1.0,0.5,11.3,0.0,0.3,56.6,0.0 +4943,1,2019-03-03 15:12:19,2019-03-03 15:16:06,1,1.2,1,N,237,236,1,5.5,2.5,0.5,1.1,0.0,0.3,9.9,2.5 +617,2,2019-03-20 14:48:47,2019-03-20 14:52:43,1,0.59,1,N,142,163,1,4.5,0.0,0.5,1.0,0.0,0.3,8.8,2.5 +7014,2,2019-03-03 23:56:36,2019-03-04 00:22:25,1,7.91,1,N,132,258,2,25.0,0.5,0.5,0.0,0.0,0.3,26.3,0.0 +2138,2,2019-03-23 14:47:31,2019-03-23 14:54:11,2,1.12,1,N,234,161,2,6.5,0.0,0.5,0.0,0.0,0.3,9.8,2.5 +5298,2,2019-03-20 15:47:08,2019-03-20 15:51:15,1,0.54,1,N,161,161,1,4.5,0.0,0.5,1.56,0.0,0.3,9.36,2.5 +9838,1,2019-03-13 20:42:59,2019-03-13 20:50:23,1,1.1,1,N,162,48,1,6.5,3.0,0.5,2.05,0.0,0.3,12.35,2.5 +2565,1,2019-03-13 10:28:36,2019-03-13 
10:28:38,1,0.0,1,N,246,246,2,2.5,2.5,0.5,0.0,0.0,0.3,5.8,2.5 From cf72978b68930ebb7ce9bccbdaf6c30407a1b2fd Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Mon, 25 Oct 2021 11:04:16 -0400 Subject: [PATCH 37/62] Working example 5 --- ...onfigure_a_configuredassetdataconnector.md | 12 +- ...configure_an_inferredassetdataconnector.md | 179 +++++------------- ...onfigure_a_configuredassetdataconnector.py | 6 +- ...configure_an_inferredassetdataconnector.py | 82 ++++++++ 4 files changed, 138 insertions(+), 141 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 1805580ffe39..db2f01d70e92 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -359,12 +359,12 @@ Unmatched data_references (0 of 0):[] In this example, the assets `yellow_tripdata` and `green_tripdata` are being explicitly defined in the configuration, and have a more complex nesting pattern. ``` -/yellow/tripdata/2019-01.txt -/yellow/tripdata/2019-02.txt -/yellow/tripdata/2019-03.txt -/green_tripdata/2019-01.csv -/green_tripdata/2019-02.csv -/green_tripdata/2019-03.csv +/yellow/tripdata/yellow_tripdata_2019-01.txt +/yellow/tripdata/yellow_tripdata_2019-02.txt +/yellow/tripdata/yellow_tripdata_2019-03.txt +/green_tripdata/green_tripdata_2019-01.csv +/green_tripdata/green_tripdata_2019-02.csv +/green_tripdata/green_tripdata_2019-03.csv ``` The following configuration... 
diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 400abdfc3587..afbf9512c57a 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -302,18 +302,18 @@ Unmatched data_references (0 of 0): [] Here’s a similar example, with a nested directory structure... ``` -2018/10/yellow_tripdata.csv -2018/10/green_tripdata.csv -2018/11/yellow_tripdata.csv -2018/11/green_tripdata.csv -2018/12/yellow_tripdata.csv -2018/12/green_tripdata.csv -2019/01/yellow_tripdata.csv -2019/01/green_tripdata.csv -2019/02/yellow_tripdata.csv -2019/02/green_tripdata.csv -2019/03/yellow_tripdata.csv -2019/03/green_tripdata.csv +/2018/10/yellow_tripdata.csv +/2018/10/green_tripdata.csv +/2018/11/yellow_tripdata.csv +/2018/11/green_tripdata.csv +/2018/12/yellow_tripdata.csv +/2018/12/green_tripdata.csv +/2019/01/yellow_tripdata.csv +/2019/01/green_tripdata.csv +/2019/02/yellow_tripdata.csv +/2019/02/green_tripdata.csv +/2019/03/yellow_tripdata.csv +/2019/03/green_tripdata.csv ``` Then this configuration... 
@@ -351,15 +351,15 @@ Unmatched data_references (0 of 0):[] ### Example 4: Nested directory structure with the data_asset_name on the outside -In the following example, files are placed in a folder structure with the `data_asset_name` defined by the folder name (yellow_tripdata or green_tripdata) +In the following example, files are placed in a folder structure with the `data_asset_name` defined by the folder name (`yellow_tripdata` or `green_tripdata`) ``` -yellow_tripdata/yellow_tripdata_2019-01.csv -yellow_tripdata/yellow_tripdata_2019-02.csv -yellow_tripdata/yellow_tripdata_2019-03.csv -green_tripdata/2019-01.csv -green_tripdata/2019-02.csv -green_tripdata/2019-03.csv +/yellow_tripdata/yellow_tripdata_2019-01.csv +/yellow_tripdata/yellow_tripdata_2019-02.csv +/yellow_tripdata/yellow_tripdata_2019-03.csv +/green_tripdata/2019-01.csv +/green_tripdata/2019-02.csv +/green_tripdata/2019-03.csv ``` Then this configuration... @@ -395,133 +395,48 @@ Available data_asset_names (2 of 2): Unmatched data_references (0 of 0):[] ``` -Example 5: Redundant information in the naming convention (S3 Bucket) ----------------------------------------------------------------------- +### Example 5: Redundant information in the naming convention -Here’s another example of a nested directory structure with data_asset_name defined in the bucket_name. +In the following example, files are placed in a folder structure with the `data_asset_name` defined by the folder name (`yellow_tripdata` or `green_tripdata`), but then the term `yellow_tripdata` is repeated in some filenames. 
``` -my_bucket/2021/01/01/log_file-20210101.txt.gz, -my_bucket/2021/01/02/log_file-20210102.txt.gz, -my_bucket/2021/01/03/log_file-20210103.txt.gz, -my_bucket/2021/01/04/log_file-20210104.txt.gz, -my_bucket/2021/01/05/log_file-20210105.txt.gz, -my_bucket/2021/01/06/log_file-20210106.txt.gz, -my_bucket/2021/01/07/log_file-20210107.txt.gz, +/yellow_tripdata/yellow_tripdata_2019-01.csv +/yellow_tripdata/yellow_tripdata_2019-02.csv +/yellow_tripdata/yellow_tripdata_2019-03.csv +/green_tripdata/2019-01.csv +/green_tripdata/2019-02.csv +/green_tripdata/2019-03.csv ``` +Then this configuration... -Here’s a configuration that will allow all the log files in the bucket to be associated with a single data_asset, `my_bucket` - -```yaml -class_name: InferredAssetFilesystemDataConnector -base_directory: / - -default_regex: - group_names: - - year - - month - - day - - data_asset_name - pattern: (\w{11})/(\d{4})/(\d{2})/(\d{2})/log_file-.*\.csv -``` - -All the log files will be mapped to a single data_asset named `my_bucket`. - -```bash -Available data_asset_names (1 of 1): - my_bucket (3 of 7): [ - 'my_bucket/2021/01/03/log_file-*.csv', - 'my_bucket/2021/01/04/log_file-*.csv', - 'my_bucket/2021/01/05/log_file-*.csv' - ] - -Unmatched data_references (0 of 0): [] -``` - - -Example 6: Random information in the naming convention -------------------------------------------------------------------------------- - -In the following example, files are placed in folders according to the date of creation, and given a random hash value in their name. 
- -``` -2021/01/01/log_file-2f1e94b40f310274b485e72050daf591.txt.gz -2021/01/02/log_file-7f5d35d4f90bce5bf1fad680daac48a2.txt.gz -2021/01/03/log_file-99d5ed1123f877c714bbe9a2cfdffc4b.txt.gz -2021/01/04/log_file-885d40a5661bbbea053b2405face042f.txt.gz -2021/01/05/log_file-d8e478f817b608729cfc8fb750ebfc84.txt.gz -2021/01/06/log_file-b1ca8d1079c00fd4e210f7ef31549162.txt.gz -2021/01/07/log_file-d34b4818c52e74b7827504920af19a5c.txt.gz -``` - -Here’s a configuration that will allow all the log files to be associated with a single data_asset, `log_file` - -```yaml -class_name: InferredAssetFilesystemDataConnector -base_directory: / - -default_regex: - group_names: - - year - - month - - day - - data_asset_name - pattern: (\d{4})/(\d{2})/(\d{2})/(log_file)-.*\.txt\.gz -``` - -... will give you the following output - -```bash -Available data_asset_names (1 of 1): - log_file (3 of 7): [ - '2021/01/03/log_file-*.txt.gz', - '2021/01/04/log_file-*.txt.gz', - '2021/01/05/log_file-*.txt.gz' - ] + + -Unmatched data_references (0 of 0): [] +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L468-L486 ``` -Example 7: Redundant information in the naming convention (timestamp of file creation) --------------------------------------------------------------------------------------- - -In the following example, files are placed in a single folder, and the name includes a timestamp of when the files were created + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L497-L520 ``` -log_file-2021-01-01-035419.163324.txt.gz -log_file-2021-01-02-035513.905752.txt.gz -log_file-2021-01-03-035455.848839.txt.gz -log_file-2021-01-04-035251.47582.txt.gz -log_file-2021-01-05-033034.289789.txt.gz -log_file-2021-01-06-034958.505688.txt.gz -log_file-2021-01-07-033545.600898.txt.gz -``` - -Here’s a configuration that will allow all the log files to be 
associated with a single data_asset named `log_file`. -```yaml -class_name: InferredAssetFilesystemDataConnector -base_directory: / - -default_regex: - group_names: - - data_asset_name - - year - - month - - day - pattern: (log_file)-(\d{4})-(\d{2})-(\d{2})-.*\.*\.txt\.gz -``` + + -All the log files will be mapped to the data_asset `log_file`. +will not display the redundant information: ```bash -Available data_asset_names (1 of 1): - some_bucket (3 of 7): [ - 'some_bucket/2021/01/03/log_file-*.txt.gz', - 'some_bucket/2021/01/04/log_file-*.txt.gz', - 'some_bucket/2021/01/05/log_file-*.txt.gz' -] +Available data_asset_names (2 of 2): + green_tripdata (3 of 3): ['green_tripdata/*2019-01.csv', 'green_tripdata/*2019-02.csv', 'green_tripdata/*2019-03.csv'] + yellow_tripdata (3 of 3): ['yellow_tripdata/*2019-01.csv', 'yellow_tripdata/*2019-02.csv', 'yellow_tripdata/*2019-03.csv'] -Unmatched data_references (0 of 0): [] +Unmatched data_references (0 of 0):[] ``` diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py index 74ffd06845a2..c32cc923fea2 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py @@ -548,16 +548,16 @@ "base_directory": "/", "default_regex": { "pattern": "(.*)_(\d{4})-(\d{2})\.(csv|txt)$", - "group_names": ["data_asset_name", "year", "month"] + "group_names": ["data_asset_name", "year", "month"], }, "assets": { "yellow_tripdata": { "base_directory": "yellow/tripdata/", - "glob_directive": "*.txt" + "glob_directive": "*.txt", }, "green_tripdata": { "base_directory": "green_tripdata/", - "glob_directive": "*.csv" + "glob_directive": "*.csv", }, }, }, diff --git 
a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index cdeb62a30473..a2b098ef8937 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -463,3 +463,85 @@ "default_inferred_data_connector_name" ] ) + +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_inferred_data_connector_name: + class_name: InferredAssetFilesystemDataConnector + base_directory: / + glob_directive: "*/*.csv" + default_regex: + group_names: + - data_asset_name + - year + - month + pattern: (.*)/.*(\d{4})-(\d{2})\.csv +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. 
+datasource_yaml = datasource_yaml.replace( + "base_directory: /", "base_directory: ../data/nested_directories_data_asset/" +) + +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_inferred_data_connector_name": { + "class_name": "InferredAssetFilesystemDataConnector", + "base_directory": "/", + "glob_directive": "*/*.csv", + "default_regex": { + "group_names": [ + "data_asset_name", + "year", + "month", + ], + "pattern": "(.*)/.*(\d{4})-(\d{2})\.csv", + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above. +datasource_config["data_connectors"]["default_inferred_data_connector_name"][ + "base_directory" +] = "../data/nested_directories_data_asset/" + +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) + +# NOTE: The following code is only for testing and can be ignored by users.
+assert test_yaml == test_python + +context.add_datasource(**datasource_config) + +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "yellow_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_inferred_data_connector_name" + ] +) +assert "green_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_inferred_data_connector_name" + ] +) From cab32ec39898e4d7564fbe2ecd6ed95bff5ba04b Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Mon, 25 Oct 2021 11:27:03 -0400 Subject: [PATCH 38/62] RuntimeDataConnector under test --- ...onfigure_a_configuredassetdataconnector.md | 2 +- ...how_to_configure_a_runtimedataconnector.md | 157 ++++++++++++------ ...configure_an_inferredassetdataconnector.md | 2 +- 3 files changed, 105 insertions(+), 56 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index db2f01d70e92..3e4d87865d0e 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -101,7 +101,7 @@ context.test_yaml_config(yaml.dump(datasource_config)) If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md) -### 3. Choose a DataConnector +### 3. Add a ConfiguredAssetDataConnector to a Datasource configuration ConfiguredAssetDataConnectors like `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector` require `DataAsset`s to be explicitly named. 
Each `DataAsset` can have their own regex `pattern` and `group_names`, and if configured, will override any diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md index b7cb718765db..693728860184 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md @@ -14,70 +14,119 @@ This guide demonstrates how to configure a RuntimeDataConnector and only applies A RuntimeDataConnector is a special kind of [Data Connector](../../reference/datasources.md) that enables you to use a RuntimeBatchRequest to provide a [Batch's](../../reference/datasources.md#batches) data directly at runtime. The RuntimeBatchRequest can wrap an in-memory dataframe, a filepath, or a SQL query, and must include batch identifiers that uniquely identify the data (e.g. a `run_id` from an AirFlow DAG run). The batch identifiers that must be passed in at runtime are specified in the RuntimeDataConnector's configuration. -Add a RuntimeDataConnector to a Datasource configuration ---------------------------------------------------------- +## Steps -The following example uses `test_yaml_config` and `sanitize_yaml_and_save_datasource` to add a new SQL Datasource to a project's `great_expectations.yml`. If you already have configured Datasources, you can add an additional RuntimeDataConnector configuration directly to your `great_expectations.yml`. +### 1. Instantiate your project's DataContext -:::note -Currently, RuntimeDataConnector cannot be used with Datasources of type SimpleSqlalchemyDatasource. 
-::: +Import these necessary packages and modules: -```python -import great_expectations as ge -from great_expectations.cli.datasource import sanitize_yaml_and_save_datasource + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L3-L4 +``` + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L1-L4 +``` + + + + +### 2. Set up a Datasource -context = ge.get_context() -config = f""" -name: my_sqlite_datasource +All of the examples below assume you’re testing configuration using something like: + + + + +```python +datasource_yaml = """ +name: taxi_datasource class_name: Datasource execution_engine: - class_name: SqlAlchemyExecutionEngine - connection_string: sqlite:///my_db_file + class_name: PandasExecutionEngine data_connectors: - my_runtime_data_connector: - class_name: RuntimeDataConnector - batch_identifiers: - - pipeline_stage_name - - airflow_run_id + default_runtime_data_connector_name: + """ -context.test_yaml_config( - yaml_config=config -) -sanitize_yaml_and_save_datasource(context, config, overwrite_existing=False) +context.test_yaml_config(yaml_config=datasource_yaml) ``` -At runtime, you would get a Validator from the Data Context as follows: + + ```python -validator = context.get_validator( - batch_request=RuntimeBatchRequest( - datasource_name="my_sqlite_datasource", - data_connector_name="my_runtime_data_connector", - data_asset_name="my_data_asset_name", - runtime_parameters={ - "query": "SELECT * FROM table_partitioned_by_date_column__A" - }, - batch_identifiers={ - "pipeline_stage_name": "core_processing", - "airflow_run_id": 1234567890, - }, - ), - expectation_suite=my_expectation_suite, -) - - # Simplified call to get_validator - RuntimeBatchRequest is inferred under the hood - validator = context.get_validator( - datasource_name="my_sqlite_datasource",
data_connector_name="my_runtime_data_connector", - data_asset_name="my_data_asset_name", - runtime_parameters={ - "query": "SELECT * FROM table_partitioned_by_date_column__A" - }, - batch_identifiers={ - "pipeline_stage_name": "core_processing", - "airflow_run_id": 1234567890, - }, - expectation_suite=my_expectation_suite, - ) +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_runtime_data_connector_name": { + + }, + }, +} +context.test_yaml_config(yaml.dump(datasource_config)) +``` + + + + +If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md) + +### 3. Add a RuntimeDataConnector to a Datasource configuration + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L9-L21 +``` + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L26-L40 +``` + + + + +Once the RuntimeDataConnector is configured you can add your datasource using: + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L48-L48 +``` + +At runtime, you would get a Validator from the Data Context by first defining a `RuntimeBatchRequest`: + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L50-L57 +``` + +and then passing that request into `context.get_validator`: + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L64-L72 ``` 
diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index afbf9512c57a..21bca941a28d 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -104,7 +104,7 @@ context.test_yaml_config(yaml.dump(datasource_config)) If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md) -### 3. Choose a DataConnector +### 3. Add an InferredAssetDataConnector to a Datasource configuration InferredAssetDataConnectors like `InferredAssetFilesystemDataConnector` and `InferredAssetS3DataConnector` require a `default_regex` parameter, with a configured regex `pattern` and capture `group_names`. 
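As an illustration of how a `default_regex` `pattern` and its `group_names` work together, here is a minimal sketch in plain Python. It uses only the standard `re` module and is not Great Expectations' matching code. The helper `infer_batches` and the sample paths are hypothetical, modeled on the `nested_directories_data_asset` layout and the `(.*)/.*(\d{4})-(\d{2})\.csv` pattern used in this guide's test scripts. Each capture group is paired, in order, with a name from `group_names`, and the value captured for `data_asset_name` decides which Data Asset a file belongs to.

```python
import re
from collections import defaultdict

# default_regex modeled on the InferredAssetFilesystemDataConnector example.
DEFAULT_REGEX = {
    "pattern": r"(.*)/.*(\d{4})-(\d{2})\.csv",
    "group_names": ["data_asset_name", "year", "month"],
}

# Hypothetical data references, relative to base_directory.
PATHS = [
    "yellow_tripdata/yellow_tripdata_2019-01.csv",
    "yellow_tripdata/yellow_tripdata_2019-02.csv",
    "yellow_tripdata/yellow_tripdata_2019-03.csv",
    "green_tripdata/2019-01.csv",
    "green_tripdata/2019-02.csv",
    "green_tripdata/2019-03.csv",
    "notes/readme.txt",
]


def infer_batches(paths, default_regex):
    """Group paths by their captured data_asset_name and collect unmatched paths.

    A simplified illustration, not the Great Expectations implementation.
    """
    regex = re.compile(default_regex["pattern"])
    names = default_regex["group_names"]
    assets = defaultdict(list)
    unmatched = []
    for path in paths:
        match = regex.fullmatch(path)
        if match is None:
            unmatched.append(path)
            continue
        # Pair each capture group with its configured name, in order.
        groups = dict(zip(names, match.groups()))
        asset = groups.pop("data_asset_name")
        assets[asset].append((path, groups))
    return dict(assets), unmatched


assets, unmatched = infer_batches(PATHS, DEFAULT_REGEX)
for name, batches in sorted(assets.items()):
    print(name, [ids for _, ids in batches])
print("unmatched:", unmatched)
```

Paths that the pattern does not match end up in `unmatched`, loosely analogous to the "Unmatched data_references" section that `test_yaml_config` reports. The sketch uses `re.fullmatch` for simplicity; the library's own anchoring rules may differ.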
From 88dcc153abe6e13a4633806a7cd5166433d88fa8 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Mon, 25 Oct 2021 12:00:14 -0400 Subject: [PATCH 39/62] Clean up --- ...onfigure_a_configuredassetdataconnector.md | 48 +++++++++---------- ...how_to_configure_a_runtimedataconnector.md | 16 ++++--- ...configure_an_inferredassetdataconnector.md | 48 +++++++++---------- ...onfigure_a_configuredassetdataconnector.py | 6 +-- ...how_to_configure_a_runtimedataconnector.py | 6 +-- 5 files changed, 61 insertions(+), 63 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 3e4d87865d0e..7f3075ec3cb2 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -14,10 +14,10 @@ This guide demonstrates how to configure a ConfiguredAssetDataConnector, and pro -Great Expectations provides two `DataConnector` classes for connecting to `DataAsset`s stored as file-system-like data. This includes files on disk, +Great Expectations provides two `DataConnector` classes for connecting to Data Assets stored as file-system-like data. This includes files on disk, but also S3 object stores, etc: -- A ConfiguredAssetDataConnector allows you to specify that you have multiple `DataAsset`s in a `Datasource`, but also requires an explicit listing of each `DataAsset` you want to connect to. This allows more fine-tuning, but also requires more setup. +- A ConfiguredAssetDataConnector allows you to specify that you have multiple Data Assets in a `Datasource`, but also requires an explicit listing of each Data Asset you want to connect to. This allows more fine-tuning, but also requires more setup. 
- An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. If you're not sure which one to use, please check out [How to choose which DataConnector to use](./how_to_choose_which_dataconnector_to_use.md). @@ -30,7 +30,7 @@ Import these necessary packages and modules: : """ context.test_yaml_config(yaml_config=datasource_config) @@ -88,8 +88,8 @@ datasource_config = { "class_name": "PandasExecutionEngine", }, "data_connectors": { - "default_configured_data_connector_name": { - + "": { + "" }, }, } @@ -103,8 +103,8 @@ If you’re not familiar with the `test_yaml_config` method, please check out: [ ### 3. Add a ConfiguredAssetDataConnector to a Datasource configuration -ConfiguredAssetDataConnectors like `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector` require `DataAsset`s to be -explicitly named. Each `DataAsset` can have their own regex `pattern` and `group_names`, and if configured, will override any +ConfiguredAssetDataConnectors like `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector` require Data Assets to be +explicitly named. Each Data Asset can have its own regex `pattern` and `group_names`, and if configured, will override any `pattern` or `group_names` under `default_regex`. Imagine you have the following files in `my_directory/`: @@ -115,12 +115,12 @@ Imagine you have the following files in `my_directory/`: /yellow_tripdata_2019-03.csv ``` -We could create a DataAsset `yellow_tripdata` that contains 3 data_references (`yellow_tripdata_2019-01.csv`, `yellow_tripdata_2019-02.csv`, and `yellow_tripdata_2019-03.csv`). +We could create a Data Asset `yellow_tripdata` that contains 3 data_references (`yellow_tripdata_2019-01.csv`, `yellow_tripdata_2019-02.csv`, and `yellow_tripdata_2019-03.csv`).
In that case, the configuration would look like the following: -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L34-L54 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L36-L56 ``` @@ -141,21 +141,21 @@ In that case, the configuration would look like the following: Notice that we have specified a pattern that captures the year-month combination after `yellow_tripdata_` in the filename and assigns it to the `group_name` `month`. -The configuration would also work with a regex capturing the entire filename (ie `pattern: (.*)\\.csv`). However, capturing the month on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the `DataAsset`. +The configuration would also work with a regex capturing the entire filename (ie `pattern: (.*)\\.csv`). However, capturing the month on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the Data Asset. Later on we could retrieve the data in `yellow_tripdata_2019-02.csv` of `yellow_tripdata` as its own batch using `context.get_validator()` by specifying `{"month": "2019-02"}` as the `batch_identifier`. ```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L72-L87 ``` -This ability to access specific Batches using `batch_identifiers` is very useful when validating `DataAsset`s that span multiple files. +This ability to access specific Batches using `batch_identifiers` is very useful when validating Data Assets that span multiple files. For more information on `batches` and `batch_identifiers`, please refer to the [Core Concepts document](../../reference/dividing_data_assets_into_batches.md). 
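The batch selection described above can be sketched the same way. In this hypothetical example, each configured asset carries its own `pattern` and `group_names`, and the requested `batch_identifiers` keep only the data references whose captured values match. The pattern `yellow_tripdata_(\d{4}-\d{2})\.csv` is an assumption that captures the year-month combination as `month`, as the text describes; the guide's real pattern lives in the referenced test script. `select_batch` is not a Great Expectations function.

```python
import re

# Hypothetical per-asset configuration, in the spirit of a
# ConfiguredAssetDataConnector "assets" block.
ASSETS = {
    "yellow_tripdata": {
        "pattern": r"yellow_tripdata_(\d{4}-\d{2})\.csv",
        "group_names": ["month"],
    },
}

DATA_REFERENCES = [
    "yellow_tripdata_2019-01.csv",
    "yellow_tripdata_2019-02.csv",
    "yellow_tripdata_2019-03.csv",
]


def select_batch(asset_name, batch_identifiers, references=DATA_REFERENCES):
    """Return references whose captured groups match every requested identifier."""
    config = ASSETS[asset_name]
    regex = re.compile(config["pattern"])
    selected = []
    for ref in references:
        match = regex.fullmatch(ref)
        if match is None:
            continue  # this file does not belong to the asset
        captured = dict(zip(config["group_names"], match.groups()))
        if all(captured.get(key) == value for key, value in batch_identifiers.items()):
            selected.append(ref)
    return selected


print(select_batch("yellow_tripdata", {"month": "2019-02"}))
```

Passing an empty `batch_identifiers` dict returns every reference in the asset, because no captured value is being constrained.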
A corresponding configuration for `ConfiguredAssetS3DataConnector` would look similar but would require `bucket` and `prefix` values instead of `base_directory`. `: @@ -190,7 +190,7 @@ Then this configuration... -...will make available `yelow_tripdata` as a single DataAsset with the following data_references: +...will make available `yellow_tripdata` as a single Data Asset with the following data_references: ```bash Available data_asset_names (1 of 1): @@ -229,7 +229,7 @@ Then this configuration... -...will now make `yellow_tripdata` and `green_tripdata` available a DataAssets, with the following data_references: +...will now make `yellow_tripdata` and `green_tripdata` available as Data Assets, with the following data_references: ```bash Available data_asset_names (2 of 2): @@ -371,7 +371,7 @@ The following configuration... : """ context.test_yaml_config(yaml_config=datasource_config) @@ -80,8 +82,8 @@ datasource_config = { "class_name": "PandasExecutionEngine", }, "data_connectors": { - "default_runtime_data_connector_name": { - + "": { + "" }, }, } @@ -97,7 +99,7 @@ If you’re not familiar with the `test_yaml_config` method, please check out: [ -Great Expectations provides two types of `DataConnector` classes for connecting to `DataAsset`s stored as file-system-like data. This includes files on disk, +Great Expectations provides two types of `DataConnector` classes for connecting to Data Assets stored as file-system-like data. This includes files on disk, but also S3 object stores, etc: -- A ConfiguredAssetDataConnector requires an explicit listing of each `DataAsset` you want to connect to. This allows more fine-tuning, but also requires more setup. +- A ConfiguredAssetDataConnector requires an explicit listing of each Data Asset you want to connect to. This allows more fine-tuning, but also requires more setup. - An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure.
-InferredAssetDataConnector has fewer options, so it's simpler to set up. It’s a good choice if you want to connect to a single `DataAsset`, or several `DataAssets` that all share the same naming convention. +InferredAssetDataConnector has fewer options, so it's simpler to set up. It’s a good choice if you want to connect to a single Data Asset, or several Data Assets that all share the same naming convention. If you're not sure which one to use, please check out [How to choose which DataConnector to use](./how_to_choose_which_dataconnector_to_use.md). @@ -33,7 +33,7 @@ Import these necessary packages and modules: : """ context.test_yaml_config(yaml_config=datasource_config) @@ -91,8 +91,8 @@ datasource_config = { "class_name": "PandasExecutionEngine", }, "data_connectors": { - "default_inferred_data_connector_name": { - + "": { + "" }, }, } @@ -119,11 +119,11 @@ Imagine you have the following files in `my_directory/`: We can imagine two approaches to loading the data into GE. -The simplest approach would be to consider each file to be its own DataAsset. In that case, the configuration would look like the following: +The simplest approach would be to consider each file to be its own Data Asset. In that case, the configuration would look like the following: -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L33-L51 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L35-L53 ``` Notice that the `default_regex` is configured to have one capture group (`(.*)`) which captures the entire filename. That capture group is assigned to `data_asset_name` under `group_names`. -Running `test_yaml_config()` would result in 3 DataAssets : `yellow_tripdata_2019-01`, `yellow_tripdata_2019-02` and `yellow_tripdata_2019-03`.
+Running `test_yaml_config()` would result in 3 Data Assets: `yellow_tripdata_2019-01`, `yellow_tripdata_2019-02` and `yellow_tripdata_2019-03`. However, a closer look at the filenames reveals a pattern that is common to the 3 files. Each have `yellow_tripdata_` in the name, and have date information afterwards. These are the types of patterns that InferredAssetDataConnectors allow you to take advantage of. -We could treat `yellow_tripdata_*.csv` files as batches within the `yellow_tripdata` `DataAsset` with a more specific regex `pattern` and adding `group_names` for `year` and `month`. +We could treat `yellow_tripdata_*.csv` files as batches within the `yellow_tripdata` Data Asset by using a more specific regex `pattern` and adding `group_names` for `year` and `month`. **Note: ** We have chosen to be more specific in the capture groups for the `year` and `month` by specifying the integer value (using `\d`) and the number of digits, but a simpler capture group like `(.*)` would also work. -Running `test_yaml_config()` would result in 1 DataAsset `yellow_tripdata` with 3 associated data_references: `yellow_tripdata_2019-01.csv`, `yellow_tripdata_2019-02.csv` and `yellow_tripdata_2019-03.csv`, seen also in Example 1 below. +Running `test_yaml_config()` would result in 1 Data Asset `yellow_tripdata` with 3 associated data_references: `yellow_tripdata_2019-01.csv`, `yellow_tripdata_2019-02.csv` and `yellow_tripdata_2019-03.csv`, seen also in Example 1 below. A corresponding configuration for `InferredAssetS3DataConnector` would look similar but would require `bucket` and `prefix` values instead of `base_directory`. `: @@ -214,7 +214,7 @@ Then this configuration...
-...will make available `yelow_tripdata` as a single DataAsset with the following data_references: +...will make available `yellow_tripdata` as a single Data Asset with the following data_references: ```bash Available data_asset_names (1 of 1): @@ -247,7 +247,7 @@ Once configured, you can get `Validators` from the `Data Context` as follows: ```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L294-L303 ``` -### Example 2: Basic configuration with more than one DataAsset +### Example 2: Basic configuration with more than one Data Asset Here’s a similar example, but this time two Data Assets are mixed together in one folder. @@ -267,7 +267,7 @@ The same configuration as Example 1... Date: Mon, 25 Oct 2021 12:43:51 -0400 Subject: [PATCH 40/62] How to choose under test --- ...ow_to_choose_which_dataconnector_to_use.md | 194 ++++++++---------- ...onfigure_a_configuredassetdataconnector.md | 3 +- ...configure_an_inferredassetdataconnector.md | 3 +- ...ow_to_choose_which_dataconnector_to_use.py | 175 ++++++++++++++++ tests/integration/test_script_runner.py | 8 +- 5 files changed, 263 insertions(+), 120 deletions(-) create mode 100644 tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md index 02f67f7d5571..4a9c1c06317e 100644 --- a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md +++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md @@ -1,11 +1,23 @@ --- title: How to choose which DataConnector to use --- +import Prerequisites from '../connecting_to_your_data/components/prerequisites.jsx' +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; -Great Expectations provides three types of `DataConnector`
classes. Two classes are for connecting to `DataAsset`s stored as file-system-like data (this includes files on disk, but also S3 object stores, etc) as well as relational database data: +This guide demonstrates how to choose which `DataConnector`s to configure within your `Datasource`s. + + + +- [Understand the basics of Datasources in the V3 (Batch Request) API](../../reference/datasources.md) +- Learned how to configure a [Data Context using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md) + + + +Great Expectations provides three types of `DataConnector` classes. Two classes are for connecting to Data Assets stored as file-system-like data (this includes files on disk, but also S3 object stores, etc) as well as relational database data: - An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. Examples of this type of `DataConnector` include `InferredAssetFilesystemDataConnector` and `InferredAssetS3DataConnector`. -- A ConfiguredAssetDataConnector allows users to have the most fine-tuning, and requires an explicit listing of each `DataAsset` you want to connect to. Examples of this type of `DataConnector` include `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector`. +- A ConfiguredAssetDataConnector allows users to have the most fine-tuning, and requires an explicit listing of each Data Asset you want to connect to. Examples of this type of `DataConnector` include `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector`. 
The third type of `DataConnector` class is for providing a batch's data directly at runtime: @@ -15,149 +27,103 @@ If you know for example, that your Pipeline Runner will already have your batch If you aren't sure which type of the remaining `DataConnector`s to use, the following examples will use `DataConnector` classes designed to connect to files on disk, namely `InferredAssetFilesystemDataConnector` and `ConfiguredAssetFilesystemDataConnector` to demonstrate the difference between these types of `DataConnectors`. ------------------------------------------- -When to use an InferredAssetDataConnector ------------------------------------------- +### When to use an InferredAssetDataConnector -If you have the following `my_data/` directory in your filesystem, and you want to treat the `A-*.csv` files as batches within the `A` DataAsset, and do the same for `B` and `C`: +If you have the following `/` directory in your filesystem, and you want to treat the `yellow_tripdata_*.csv` files as batches within the `yellow_tripdata` Data Asset, and the files in `green_tripdata/` as batches within the `green_tripdata` Data Asset: ``` -my_data/A/A-1.csv -my_data/A/A-2.csv -my_data/A/A-3.csv -my_data/B/B-1.csv -my_data/B/B-2.csv -my_data/B/B-3.csv -my_data/C/C-1.csv -my_data/C/C-2.csv -my_data/C/C-3.csv +/yellow_tripdata/yellow_tripdata_2019-01.csv +/yellow_tripdata/yellow_tripdata_2019-02.csv +/yellow_tripdata/yellow_tripdata_2019-03.csv +/green_tripdata/2019-01.csv +/green_tripdata/2019-02.csv +/green_tripdata/2019-03.csv ``` This config...
-```yaml -class_name: Datasource -data_connectors: - my_data_connector: - class_name: InferredAssetFilesystemDataConnector - base_directory: my_data/ - default_regex: - pattern: (.*)/.*-(\d+)\.csv - group_names: - - data_asset_name - - id + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py#L8-L26 +``` + + + + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py#L37-L60 ``` -...will make available the following DataAssets and data_references: + + + +...will make available the following Data Assets and data_references: ```bash -Available data_asset_names (3 of 3): - A (3 of 3): [ - 'A/A-1.csv', - 'A/A-2.csv', - 'A/A-3.csv' - ] - B (3 of 3): [ - 'B/B-1.csv', - 'B/B-2.csv', - 'B/B-3.csv' - ] - C (3 of 3): [ - 'C/C-1.csv', - 'C/C-2.csv', - 'C/C-3.csv' - ] - -Unmatched data_references (0 of 0): [] +Available data_asset_names (2 of 2): + green_tripdata (3 of 3): ['green_tripdata/*2019-01.csv', 'green_tripdata/*2019-02.csv', 'green_tripdata/*2019-03.csv'] + yellow_tripdata (3 of 3): ['yellow_tripdata/*2019-01.csv', 'yellow_tripdata/*2019-02.csv', 'yellow_tripdata/*2019-03.csv'] + +Unmatched data_references (0 of 0):[] ``` Note that the `InferredAssetFileSystemDataConnector` **infers** `data_asset_names` **from the regex you provide.** This is the key difference between InferredAssetDataConnector and ConfiguredAssetDataConnector, and also requires that one of the `group_names` in the `default_regex` configuration be `data_asset_name`. ------------------------------------------- -When to use a ConfiguredAssetDataConnector ------------------------------------------- +### When to use a ConfiguredAssetDataConnector -On the other hand, `ConfiguredAssetFilesSystemDataConnector` requires an explicit listing of each DataAsset you want to connect to. 
This tends to be helpful when the naming conventions for your DataAssets are less standardized. +On the other hand, `ConfiguredAssetFilesystemDataConnector` requires an explicit listing of each Data Asset you want to connect to. This tends to be helpful when the naming conventions for your Data Assets are less standardized. -If you have the following `my_messier_data/` directory in your filesystem, +If you have the same `/` directory in your filesystem, ``` - my_messier_data/1/A-1.csv - my_messier_data/1/B-1.txt +/yellow_tripdata/yellow_tripdata_2019-01.csv +/yellow_tripdata/yellow_tripdata_2019-02.csv +/yellow_tripdata/yellow_tripdata_2019-03.csv +/green_tripdata/2019-01.csv +/green_tripdata/2019-02.csv +/green_tripdata/2019-03.csv +``` - my_messier_data/2/A-2.csv - my_messier_data/2/B-2.txt +Then this config... - my_messier_data/2017/C-1.csv - my_messier_data/2018/C-2.csv - my_messier_data/2019/C-3.csv + + - my_messier_data/aaa/D-1.csv - my_messier_data/bbb/D-2.csv - my_messier_data/ccc/D-3.csv +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py#L90-L114 ``` -Then this config...
+ + -```yaml -class_name: Datasource -execution_engine: - class_name: PandasExecutionEngine -data_connectors: - my_data_connector: - class_name: ConfiguredAssetFilesystemDataConnector - glob_directive: "*/*" - base_directory: my_messier_data/ - assets: - A: - pattern: (.+A)-(\d+)\.csv - group_names: - - name - - id - B: - pattern: (.+B)-(\d+)\.txt - group_names: - - name - - val - C: - pattern: (.+C)-(\d+)\.csv - group_names: - - name - - id - D: - pattern: (.+D)-(\d+)\.csv - group_names: - - name - - id +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py#L125-L151 ``` -...will make available the following DataAssets and data_references: + + + +...will make available the following Data Assets and data_references: ```bash -Available data_asset_names (4 of 4): - A (2 of 2): [ - '1/A-1.csv', - '2/A-2.csv' - ] - B (2 of 2): [ - '1/B-1.txt', - '2/B-2.txt' - ] - C (3 of 3): [ - '2017/C-1.csv', - '2018/C-2.csv', - '2019/C-3.csv' - ] - D (3 of 3): [ - 'aaa/D-1.csv', - 'bbb/D-2.csv', - 'ccc/D-3.csv' - ] +Available data_asset_names (2 of 2): + green_tripdata (3 of 3): ['2019-01.csv', '2019-02.csv', '2019-03.csv'] + yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv'] + +Unmatched data_references (0 of 0):[] ``` ----------------- -Additional Notes ----------------- +### Additional Notes - Additional examples and configurations for `ConfiguredAssetFilesystemDataConnector`s can be found here: [How to configure a ConfiguredAssetDataConnector](./how_to_configure_a_configuredassetdataconnector.md) - Additional examples and configurations for `InferredAssetFilesystemDataConnector`s can be found here: [How to configure an InferredAssetDataConnector](./how_to_configure_an_inferredassetdataconnector.md) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md 
b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 7f3075ec3cb2..5dd1881b1561 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -14,8 +14,7 @@ This guide demonstrates how to configure a ConfiguredAssetDataConnector, and pro -Great Expectations provides two `DataConnector` classes for connecting to Data Assets stored as file-system-like data. This includes files on disk, -but also S3 object stores, etc: +Great Expectations provides two `DataConnector` classes for connecting to Data Assets stored as file-system-like data (this includes files on disk, but also S3 object stores, etc) as well as relational database data: - A ConfiguredAssetDataConnector allows you to specify that you have multiple Data Assets in a `Datasource`, but also requires an explicit listing of each Data Asset you want to connect to. This allows more fine-tuning, but also requires more setup. - An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 8b390aa0474e..fa1b02ca6f06 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -15,8 +15,7 @@ can use for configuration. -Great Expectations provides two types of `DataConnector` classes for connecting to Data Assets stored as file-system-like data. 
This includes files on disk, -but also S3 object stores, etc: +Great Expectations provides two types of `DataConnector` classes for connecting to Data Assets stored as file-system-like data (this includes files on disk, but also S3 object stores, etc) as well as relational database data: - A ConfiguredAssetDataConnector requires an explicit listing of each Data Asset you want to connect to. This allows more fine-tuning, but also requires more setup. - An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py new file mode 100644 index 000000000000..6ca0fd40f371 --- /dev/null +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py @@ -0,0 +1,175 @@ +from ruamel import yaml + +import great_expectations as ge + +context = ge.get_context() + +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_inferred_data_connector_name: + class_name: InferredAssetFilesystemDataConnector + base_directory: / + glob_directive: "*/*.csv" + default_regex: + group_names: + - data_asset_name + - year + - month + pattern: (.*)/.*(\d{4})-(\d{2})\.csv +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. 
+datasource_yaml = datasource_yaml.replace( + "/", "../data/nested_directories_data_asset/" +) + +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_inferred_data_connector_name": { + "class_name": "InferredAssetFilesystemDataConnector", + "base_directory": "/", + "glob_directive": "*/*.csv", + "default_regex": { + "group_names": [ + "data_asset_name", + "year", + "month", + ], + "pattern": "(.*)/.*(\d{4})-(\d{2})\.csv", + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above. +datasource_config["data_connectors"]["default_inferred_data_connector_name"][ + "base_directory" +] = "../data/nested_directories_data_asset/" + +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) + +# NOTE: The following code is only for testing and can be ignored by users. 
+assert test_yaml == test_python + +context.add_datasource(**datasource_config) + +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "yellow_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_inferred_data_connector_name" + ] +) +assert "green_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_inferred_data_connector_name" + ] +) + +# YAML +datasource_yaml = """ +name: taxi_datasource +class_name: Datasource +module_name: great_expectations.datasource +execution_engine: + module_name: great_expectations.execution_engine + class_name: PandasExecutionEngine +data_connectors: + default_configured_data_connector_name: + class_name: ConfiguredAssetFilesystemDataConnector + base_directory: / + assets: + yellow_tripdata: + base_directory: yellow_tripdata/ + pattern: yellow_tripdata_(\d{4})-(\d{2})\.csv + group_names: + - year + - month + green_tripdata: + base_directory: green_tripdata/ + pattern: (\d{4})-(\d{2})\.csv + group_names: + - year + - month +""" + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the yaml above. 
+datasource_yaml = datasource_yaml.replace( + "/", "../data/nested_directories_data_asset/" +) + +test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object") + +# Python +datasource_config = { + "name": "taxi_datasource", + "class_name": "Datasource", + "module_name": "great_expectations.datasource", + "execution_engine": { + "module_name": "great_expectations.execution_engine", + "class_name": "PandasExecutionEngine", + }, + "data_connectors": { + "default_configured_data_connector_name": { + "class_name": "ConfiguredAssetFilesystemDataConnector", + "base_directory": "/", + "assets": { + "yellow_tripdata": { + "base_directory": "yellow_tripdata/", + "pattern": "yellow_tripdata_(\d{4})-(\d{2})\.csv", + "group_names": ["year", "month"], + }, + "green_tripdata": { + "base_directory": "green_tripdata/", + "pattern": "(\d{4})-(\d{2})\.csv", + "group_names": ["year", "month"], + }, + }, + }, + }, +} + +# Please note this override is only to provide good UX for docs and tests. +# In normal usage you'd set your path directly in the code above. +datasource_config["data_connectors"]["default_configured_data_connector_name"][ + "base_directory" +] = "../data/nested_directories_data_asset/" + +test_python = context.test_yaml_config( + yaml.dump(datasource_config), return_mode="report_object" +) + +# NOTE: The following code is only for testing and can be ignored by users. 
+assert test_yaml != test_python +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "yellow_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_configured_data_connector_name" + ] +) +assert "green_tripdata" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_configured_data_connector_name" + ] +) diff --git a/tests/integration/test_script_runner.py b/tests/integration/test_script_runner.py index 45eda0176c53..d42375993420 100755 --- a/tests/integration/test_script_runner.py +++ b/tests/integration/test_script_runner.py @@ -298,19 +298,23 @@ class BackendDependencies(enum.Enum): "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py", "extra_backend_dependencies": BackendDependencies.MSSQL, }, + { + "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py", + "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations", + "data_dir": "tests/test_sets/dataconnector_docs", + "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py", + }, { "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py", "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations", "data_dir": "tests/test_sets/dataconnector_docs", "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py", - "extra_backend_dependencies": BackendDependencies.POSTGRESQL, }, { "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py", "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations", "data_dir": "tests/test_sets/dataconnector_docs", "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py", - 
"extra_backend_dependencies": BackendDependencies.POSTGRESQL, }, { "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py", From e997160f9ee81750368e32a7a004f751db1ff56c Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Mon, 25 Oct 2021 12:53:39 -0400 Subject: [PATCH 41/62] Clean up --- .../how_to_choose_which_dataconnector_to_use.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md index 4a9c1c06317e..26395f0f6412 100644 --- a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md +++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md @@ -29,7 +29,7 @@ If you aren't sure which type of the remaining `DataConnector`s to use, the foll ### When to use an InferredAssetDataConnector -If you have the following `/` directory in your filesystem, and you want to treat the `A-*.csv` files as batches within the `A` Data Asset, and do the same for `B` and `C`: +If you have the following `/` directory in your filesystem, and you want to treat the `yellow_tripdata_*.csv` files as batches within the `yellow_tripdata` Data Asset, and do the same for files in the `green_tripdata` directory: ``` /yellow_tripdata/yellow_tripdata_2019-01.csv From 87b4fc60f0bb6430063c3d877d4c6dc397afb725 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Mon, 25 Oct 2021 13:25:26 -0400 Subject: [PATCH 42/62] Link to capture group documentation --- .../how_to_configure_a_configuredassetdataconnector.md | 2 +- .../how_to_configure_an_inferredassetdataconnector.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 
5dd1881b1561..1ffe7467aa34 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -140,7 +140,7 @@ In that case, the configuration would look like the following: Notice that we have specified a pattern that captures the year-month combination after `yellow_tripdata_` in the filename and assigns it to the `group_name` `month`. -The configuration would also work with a regex capturing the entire filename (ie `pattern: (.*)\\.csv`). However, capturing the month on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the Data Asset. +The configuration would also work with a regex capturing the entire filename (ie `pattern: (.*)\\.csv`). However, capturing the month on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the Data Asset. For more information about capture groups, refer to the Python documentation on [regular expressions](https://docs.python.org/3/library/re.html#re.Match.group) Later on we could retrieve the data in `yellow_tripdata_2019-02.csv` of `yellow_tripdata` as its own batch using `context.get_validator()` by specifying `{"month": "2019-02"}` as the `batch_identifier`. diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index fa1b02ca6f06..098ccecb9b0d 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -148,7 +148,7 @@ However, a closer look at the filenames reveals a pattern that is common to the We could treat `yellow_tripdata_*.csv` files as batches within the `yellow_tripdata` Data Asset with a more specific regex `pattern` and adding `group_names` for `year` and `month`. 
-**Note: ** We have chosen to be more specific in the capture groups for the `year` and `month` by specifying the integer value (using `\d`) and the number of digits, but a simpler capture group like `(.*)` would also work. +**Note: ** We have chosen to be more specific in the capture groups for the `year` and `month` by specifying the integer value (using `\d`) and the number of digits, but a simpler capture group like `(.*)` would also work. For more information about capture groups, refer to the Python documentation on [regular expressions](https://docs.python.org/3/library/re.html#re.Match.group). Date: Mon, 25 Oct 2021 13:52:39 -0400 Subject: [PATCH 43/62] Enable final test --- .../how_to_choose_which_dataconnector_to_use.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py index 6ca0fd40f371..1f35ad232ad2 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py @@ -161,7 +161,7 @@ ) # NOTE: The following code is only for testing and can be ignored by users. 
-assert test_yaml != test_python +assert test_yaml == test_python assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] assert "yellow_tripdata" in set( context.get_available_data_asset_names()["taxi_datasource"][ From 58c231dd678c42051a81798130d3ef8ae6d19c48 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Mon, 25 Oct 2021 15:14:38 -0400 Subject: [PATCH 44/62] Example RuntimeBatchRequest with batch_data df --- ...how_to_configure_a_runtimedataconnector.md | 35 ++++++++++++++---- ...how_to_configure_a_runtimedataconnector.py | 37 +++++++++++++++++-- 2 files changed, 61 insertions(+), 11 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md index 73d167b59bf8..46fb6991202b 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md @@ -31,13 +31,13 @@ Import these necessary packages and modules: ]}> -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L3-L4 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L4-L5 ``` -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L1-L4 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L2-L5 ``` @@ -97,6 +97,8 @@ If you’re not familiar with the `test_yaml_config` method, please check out: [ ### 3. 
Add a RuntimeDataConnector to a Datasource configuration +This basic configuration can be used in multiple ways depending on how the `RuntimeBatchRequest` is configured: + -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L9-L21 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L10-L22 ``` -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L26-L40 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L27-L41 ``` @@ -120,15 +122,34 @@ If you’re not familiar with the `test_yaml_config` method, please check out: [ Once the RuntimeDataConnector is configured you can add your datasource using: -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L48-L48 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L49-L49 ``` -At runtime, you would get a Validator from the Data Context by first defining a `RuntimeBatchRequest`: +#### Example 1: RuntimeDataConnector for access to file-system data: + +At runtime, you would get a Validator from the Data Context by first defining a `RuntimeBatchRequest` with the `path` to your data defined in `runtime_parameters`: ```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L50-L57 ``` -and then passing that request into `context.get_validator`: +Next, you would pass that request into `context.get_validator`: ```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L64-L68 ``` + +### Example 2: RuntimeDataConnector that uses an in-memory DataFrame + +At runtime, you would get a Validator from the Data 
Context by first defining a `RuntimeBatchRequest` with the DataFrame passed into `batch_data` in `runtime_parameters`: + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L1-L1 +``` +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L80-L80 +``` +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L83-L92 +``` + +Next, you would pass that request into `context.get_validator`: + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L94-L98 +``` + diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py index 3d27823adfb9..dae9671b17f3 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py @@ -1,3 +1,4 @@ +import pandas as pd from ruamel import yaml import great_expectations as ge @@ -50,9 +51,9 @@ batch_request = RuntimeBatchRequest( datasource_name="taxi_datasource", data_connector_name="default_runtime_data_connector_name", - data_asset_name="", # This can be anything that identifies this data_asset for you - runtime_parameters={"path": ""}, # Add your path here. - batch_identifiers={"default_identifier_name": ""}, + data_asset_name="", # This can be anything that identifies this data_asset for you + runtime_parameters={"path": ""}, # Add your path here. + batch_identifiers={"default_identifier_name": ""}, ) # Please note this override is only to provide good UX for docs and tests. @@ -70,7 +71,35 @@ # NOTE: The following code is only for testing and can be ignored by users. 
assert isinstance(validator, ge.validator.validator.Validator) assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] -assert "" in set( +assert "" in set( + context.get_available_data_asset_names()["taxi_datasource"][ + "default_runtime_data_connector_name" + ] +) + +path = "" +# Please note this override is only to provide good UX for docs and tests. +path = "./data/single_directory_one_data_asset/yellow_tripdata_2019-01.csv" +df = pd.read_csv(path) + +batch_request = RuntimeBatchRequest( + datasource_name="taxi_datasource", + data_connector_name="default_runtime_data_connector_name", + data_asset_name="", # This can be anything that identifies this data_asset for you + runtime_parameters={"batch_data": df}, # Pass your DataFrame here. + batch_identifiers={"default_identifier_name": ""}, +) + +validator = context.get_validator( + batch_request=batch_request, + expectation_suite_name="", +) +print(validator.head()) + +# NOTE: The following code is only for testing and can be ignored by users. 
+assert isinstance(validator, ge.validator.validator.Validator) +assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"] +assert "" in set( context.get_available_data_asset_names()["taxi_datasource"][ "default_runtime_data_connector_name" ] From edcec55af3f2ac9d39b75af59d48dc4bf7c30936 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Mon, 25 Oct 2021 15:22:42 -0400 Subject: [PATCH 45/62] Reduce test set record count from 20 to 5 --- .../green_tripdata/green_tripdata_2019-01.csv | 15 --------------- .../green_tripdata/green_tripdata_2019-02.csv | 15 --------------- .../green_tripdata/green_tripdata_2019-03.csv | 15 --------------- .../yellow/tripdata/yellow_tripdata_2019-01.txt | 15 --------------- .../yellow/tripdata/yellow_tripdata_2019-02.txt | 15 --------------- .../yellow/tripdata/yellow_tripdata_2019-03.txt | 15 --------------- .../green_tripdata/2019-01.csv | 15 --------------- .../green_tripdata/2019-02.csv | 15 --------------- .../green_tripdata/2019-03.csv | 15 --------------- .../yellow_tripdata/yellow_tripdata_2019-01.csv | 15 --------------- .../yellow_tripdata/yellow_tripdata_2019-02.csv | 15 --------------- .../yellow_tripdata/yellow_tripdata_2019-03.csv | 15 --------------- .../2018/10/green_tripdata.csv | 15 --------------- .../2018/10/yellow_tripdata.csv | 15 --------------- .../2018/11/green_tripdata.csv | 15 --------------- .../2018/11/yellow_tripdata.csv | 15 --------------- .../2018/12/green_tripdata.csv | 15 --------------- .../2018/12/yellow_tripdata.csv | 15 --------------- .../2019/01/green_tripdata.csv | 15 --------------- .../2019/01/yellow_tripdata.csv | 15 --------------- .../2019/02/green_tripdata.csv | 15 --------------- .../2019/02/yellow_tripdata.csv | 15 --------------- .../2019/03/green_tripdata.csv | 15 --------------- .../2019/03/yellow_tripdata.csv | 15 --------------- .../yellow_tripdata_2019-01.csv | 15 --------------- .../yellow_tripdata_2019-02.csv | 15 --------------- 
.../yellow_tripdata_2019-03.csv | 15 --------------- .../green_tripdata_2019-01.csv | 15 --------------- .../green_tripdata_2019-02.csv | 15 --------------- .../green_tripdata_2019-03.csv | 15 --------------- .../yellow_tripdata_2019-01.csv | 15 --------------- .../yellow_tripdata_2019-02.csv | 15 --------------- .../yellow_tripdata_2019-03.csv | 15 --------------- 33 files changed, 495 deletions(-) diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-01.csv index 07a92dc26d64..1168d9f9ef35 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-01.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-01.csv @@ -4,18 +4,3 @@ 517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1, 368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1, 155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1, -366694,2,2019-01-18 22:58:57,2019-01-18 23:15:37,N,1,75,185,1,8.71,24.5,0.5,0.5,0.0,0.0,,0.3,25.8,1,1, -474972,2,2019-01-24 17:59:00,2019-01-24 18:35:53,N,5,195,90,1,6.46,23.45,0.0,0.5,0.0,0.0,,0.0,23.95,1,2, -69932,1,2019-01-04 18:07:02,2019-01-04 18:34:37,N,1,97,35,2,4.7,20.0,1.0,0.5,0.0,0.0,,0.3,21.8,1,1, -244782,2,2019-01-13 02:16:18,2019-01-13 02:37:30,N,1,260,198,1,3.9,16.5,0.5,0.5,0.0,0.0,,0.3,17.8,2,1, -482363,2,2019-01-25 04:42:54,2019-01-25 04:47:06,N,1,82,56,1,0.74,5.0,0.5,0.5,0.0,0.0,,0.3,6.3,2,1, -573895,2,2019-01-29 11:36:48,2019-01-29 11:43:03,N,1,43,236,1,0.91,6.0,0.0,0.5,2.0,0.0,,0.3,8.8,1,1,0.0 -182986,2,2019-01-10 12:07:54,2019-01-10 12:19:31,N,1,29,155,1,5.49,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, -526993,1,2019-01-26 22:13:18,2019-01-26 
22:13:51,N,1,33,33,1,0.1,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1, -490940,2,2019-01-25 13:09:50,2019-01-25 13:15:30,N,1,220,153,1,0.35,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,1,1, -145635,2,2019-01-08 16:01:58,2019-01-08 16:10:16,N,1,77,72,1,1.48,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,1,1, -242083,1,2019-01-12 23:43:28,2019-01-13 00:02:13,N,1,210,216,1,12.3,34.5,0.5,0.5,0.0,0.0,,0.3,35.8,2,1, -328924,2,2019-01-17 10:07:09,2019-01-17 10:25:59,N,1,32,242,1,4.71,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, -132260,1,2019-01-07 21:00:39,2019-01-07 21:18:13,N,1,52,188,1,3.6,14.5,0.5,0.5,0.0,0.0,,0.3,15.8,2,1, -568032,2,2019-01-29 07:01:03,2019-01-29 07:54:28,N,1,225,251,1,16.91,55.0,0.0,0.5,0.0,11.52,,0.3,67.32,1,1,0.0 -92205,2,2019-01-05 20:26:49,2019-01-05 20:44:48,N,1,29,108,1,3.93,15.5,0.5,0.5,0.0,0.0,,0.3,16.8,1,1, diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-02.csv index 9a6442e61899..b19b1f827f4f 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-02.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-02.csv @@ -4,18 +4,3 @@ 150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75 245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0 151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0 -18916,2,2019-02-01 19:54:28,2019-02-01 19:58:41,N,1,74,262,1,1.04,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,1,1,0.0 -110154,2,2019-02-06 08:33:30,2019-02-06 08:48:52,N,1,66,87,1,2.14,12.0,0.0,0.5,4.66,0.0,,0.3,20.21,1,1,2.75 -335114,2,2019-02-16 18:40:24,2019-02-16 18:46:47,N,1,74,75,1,0.89,6.0,0.0,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 -453776,2,2019-02-22 22:38:18,2019-02-22 
22:49:20,N,1,116,41,1,1.86,10.0,0.5,0.5,2.26,0.0,,0.3,13.56,1,1,0.0 -192685,2,2019-02-09 19:56:13,2019-02-09 20:13:56,N,1,167,235,1,2.59,13.0,0.0,0.5,0.0,0.0,,0.3,13.8,1,1,0.0 -574982,2,2019-02-28 22:42:50,2019-02-28 22:47:40,N,1,42,116,1,0.91,5.5,0.5,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 -155855,2,2019-02-08 09:02:21,2019-02-08 10:17:23,N,5,210,161,1,20.74,59.86,0.0,0.5,0.0,0.0,,0.0,60.36,1,2,0.0 -130109,2,2019-02-07 05:18:25,2019-02-07 05:24:30,N,1,62,188,4,1.54,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,1,1,0.0 -157145,2,2019-02-08 10:20:47,2019-02-08 10:47:36,N,1,74,31,2,7.73,25.5,0.0,0.5,5.26,0.0,,0.3,31.56,1,1,0.0 -411748,2,2019-02-20 22:31:32,2019-02-20 22:37:11,N,1,129,129,4,0.9,5.5,0.5,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 -470538,2,2019-02-23 18:15:38,2019-02-23 18:27:05,N,1,126,213,1,2.21,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,1,1,0.0 -262705,2,2019-02-13 13:07:02,2019-02-13 13:19:50,N,1,186,249,1,1.5,9.5,0.0,0.5,2.56,0.0,,0.3,15.36,1,1,2.5 -163160,2,2019-02-08 15:39:26,2019-02-08 16:06:31,N,1,33,17,1,2.9,17.5,0.0,0.5,0.0,0.0,,0.3,18.3,2,1,0.0 -263174,2,2019-02-13 15:06:57,2019-02-13 15:28:26,N,1,165,123,5,1.66,13.5,0.0,0.5,0.0,0.0,,0.3,14.3,1,1,0.0 -451270,2,2019-02-22 21:00:04,2019-02-22 21:00:24,N,1,82,82,1,0.06,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-03.csv index 5104e10f24c5..c5f49aef4b67 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-03.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-03.csv @@ -4,18 +4,3 @@ 6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0 76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75 282323,2,2019-03-15 11:31:28,2019-03-15 
12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0 -439163,2,2019-03-23 12:07:55,2019-03-23 12:11:20,N,1,166,166,1,0.75,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1,0.0 -518619,2,2019-03-27 18:33:46,2019-03-27 18:40:59,N,1,7,223,1,0.59,6.0,1.0,0.5,1.95,0.0,,0.3,9.75,1,1,0.0 -385766,2,2019-03-20 18:19:37,2019-03-20 18:28:31,N,1,129,129,1,1.21,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,2,1,0.0 -131793,2,2019-03-07 18:40:23,2019-03-07 18:58:26,N,1,51,32,1,3.35,14.0,1.0,0.5,0.0,0.0,,0.3,15.8,1,1,0.0 -203472,2,2019-03-11 12:46:20,2019-03-11 13:20:08,N,1,119,254,3,6.47,27.5,0.0,0.5,0.0,0.0,,0.3,28.3,1,1,0.0 -399106,2,2019-03-21 13:10:44,2019-03-21 13:21:19,N,1,74,75,1,1.27,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 -425811,2,2019-03-22 19:00:23,2019-03-22 19:02:51,N,1,95,95,1,0.51,4.0,1.0,0.5,2.0,0.0,,0.3,7.8,1,1,0.0 -36345,2,2019-03-02 19:27:41,2019-03-02 19:33:52,N,1,7,179,1,0.86,6.0,0.0,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 -323601,2,2019-03-17 10:19:28,2019-03-17 10:22:59,N,1,97,49,1,0.68,4.5,0.0,0.5,1.06,0.0,,0.3,6.36,1,1,0.0 -246980,2,2019-03-13 17:03:07,2019-03-13 17:12:04,N,1,75,263,2,1.32,8.0,1.0,0.5,2.51,0.0,,0.3,15.06,1,1,2.75 -269737,1,2019-03-14 18:03:35,2019-03-14 18:10:01,N,1,66,65,1,0.6,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1,0.0 -145416,2,2019-03-08 12:21:02,2019-03-08 12:22:57,N,1,75,75,1,0.5,3.5,0.0,0.5,0.0,0.0,,0.3,4.3,2,1,0.0 -142333,2,2019-03-08 09:39:09,2019-03-08 10:03:48,N,5,215,16,1,7.14,21.62,0.0,0.5,0.0,0.0,,0.0,22.12,1,2,0.0 -381567,2,2019-03-20 15:09:47,2019-03-20 15:22:12,N,1,89,188,1,1.75,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 -476898,1,2019-03-25 13:10:35,2019-03-25 13:20:23,N,1,244,243,1,1.1,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-01.txt b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-01.txt index 288e8ac8a023..14f45f9cf3bc 100644 --- 
a/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-01.txt +++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-01.txt @@ -4,18 +4,3 @@ 2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0 5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95, 4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56, -1505,2,2019-01-29 17:58:08,2019-01-29 18:11:19,1,0.72,1,N,13,13,2,9.0,1.0,0.5,0.0,0.0,0.3,10.8,0.0 -8131,1,2019-01-13 16:22:03,2019-01-13 16:33:06,2,1.4,1,N,161,170,1,8.5,0.0,0.5,3.0,0.0,0.3,12.3, -9601,2,2019-01-23 12:07:55,2019-01-23 12:11:19,5,0.57,1,N,239,239,1,4.0,0.0,0.5,1.0,0.0,0.3,5.8,0.0 -4522,2,2019-01-25 15:42:39,2019-01-25 15:50:54,1,1.26,1,N,170,229,1,7.5,0.0,0.5,1.0,0.0,0.3,9.3,0.0 -9008,1,2019-01-13 19:40:27,2019-01-13 19:50:54,3,2.1,1,N,239,263,1,9.5,0.0,0.5,2.05,0.0,0.3,12.35, -7012,2,2019-01-19 20:44:35,2019-01-19 21:05:13,1,3.16,1,N,142,107,1,14.5,0.5,0.5,3.16,0.0,0.3,18.96, -7316,2,2019-01-07 22:38:10,2019-01-07 22:44:31,2,2.4,1,N,233,263,1,8.5,0.5,0.5,1.47,0.0,0.3,11.27, -6384,2,2019-01-25 07:35:59,2019-01-25 07:57:44,1,5.12,1,N,143,231,1,18.5,0.0,0.5,2.5,0.0,0.3,21.8,0.0 -3652,1,2019-01-26 20:43:57,2019-01-26 21:03:51,1,4.4,1,N,13,230,2,17.0,0.5,0.5,0.0,0.0,0.3,18.3,0.0 -6409,2,2019-01-06 22:27:16,2019-01-06 22:33:58,1,1.52,1,N,161,234,2,7.0,0.5,0.5,0.0,0.0,0.3,8.3, -1130,2,2019-01-15 20:05:59,2019-01-15 20:22:11,1,3.96,1,N,233,41,2,14.5,0.5,0.5,0.0,0.0,0.3,15.8, -5618,2,2019-01-14 14:47:18,2019-01-14 14:54:44,1,0.5,1,N,140,237,1,6.0,0.0,0.5,1.0,0.0,0.3,7.8, -8515,1,2019-01-16 16:26:17,2019-01-16 16:41:36,2,1.4,1,N,233,163,1,10.5,1.0,0.5,2.45,0.0,0.3,14.75, -5397,1,2019-01-23 08:31:58,2019-01-23 08:55:43,1,2.6,1,N,246,163,1,16.0,0.0,0.5,5.0,0.0,0.3,21.8,0.0 -4648,1,2019-01-30 19:59:30,2019-01-30 
20:10:34,1,1.9,1,N,186,79,1,9.0,0.5,0.5,2.05,0.0,0.3,12.35,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-02.txt b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-02.txt index 573017273621..dd5bcc804be2 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-02.txt +++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-02.txt @@ -4,18 +4,3 @@ 4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5 1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5 6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5 -1633,2,2019-02-09 12:14:12,2019-02-10 11:50:00,3,3.02,1,N,48,236,1,12.5,0.0,0.5,1.58,0.0,0.3,17.38,2.5 -4450,1,2019-02-05 08:56:21,2019-02-05 09:02:06,1,0.8,1,N,262,75,1,5.5,2.5,0.5,1.7,0.0,0.3,10.5,2.5 -8929,2,2019-02-13 11:44:00,2019-02-13 11:57:32,1,1.13,1,N,100,170,2,9.0,0.0,0.5,0.0,0.0,0.3,12.3,2.5 -6968,2,2019-02-27 20:52:49,2019-02-27 21:18:37,1,4.64,1,N,246,125,1,18.5,0.5,0.5,4.46,0.0,0.3,26.76,2.5 -7969,2,2019-02-01 11:12:35,2019-02-01 11:19:39,1,0.54,1,N,236,263,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8,0.0 -7695,1,2019-02-28 00:51:50,2019-02-28 00:55:37,1,1.3,1,N,263,74,2,6.0,0.5,0.5,0.0,0.0,0.3,7.3,0.0 -9904,1,2019-02-16 17:45:00,2019-02-16 17:51:09,1,1.3,1,N,170,162,1,6.5,2.5,0.5,1.95,0.0,0.3,11.75,2.5 -1580,2,2019-02-07 10:14:39,2019-02-07 10:51:38,1,17.49,2,N,186,132,2,52.0,0.0,0.5,0.0,5.76,0.3,61.06,2.5 -699,2,2019-02-18 11:01:16,2019-02-18 11:11:06,2,2.31,1,N,249,170,1,10.0,0.0,0.5,2.66,0.0,0.3,15.96,2.5 -5181,2,2019-02-28 22:39:19,2019-02-28 22:42:13,1,0.49,1,N,237,162,1,4.0,0.5,0.5,1.56,0.0,0.3,9.36,2.5 -2631,2,2019-02-01 00:55:00,2019-02-01 
00:55:20,1,0.0,5,N,265,265,1,78.0,0.0,0.0,10.0,10.5,0.3,98.8,0.0 -4018,2,2019-02-11 16:59:03,2019-02-11 17:05:01,1,1.35,1,N,143,238,1,6.5,1.0,0.5,2.16,0.0,0.3,12.96,2.5 -7582,2,2019-02-08 17:29:00,2019-02-08 17:31:14,1,0.47,1,N,236,236,2,3.5,1.0,0.5,0.0,0.0,0.3,7.8,2.5 -1810,1,2019-02-17 13:10:12,2019-02-17 13:16:03,1,0.5,1,N,161,230,2,5.5,2.5,0.5,0.0,0.0,0.3,8.8,2.5 -5095,2,2019-02-09 14:30:05,2019-02-09 14:42:22,0,1.18,1,N,170,234,1,9.0,0.0,0.5,2.46,0.0,0.3,14.76,2.5 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-03.txt b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-03.txt index 3d254ce261c2..085f0cc37265 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-03.txt +++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-03.txt @@ -4,18 +4,3 @@ 7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0 9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5 2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0 -5030,2,2019-03-13 20:22:28,2019-03-13 20:42:11,2,9.81,1,N,138,244,1,28.0,0.5,0.5,7.01,5.76,0.3,42.07,0.0 -7848,4,2019-03-21 13:22:08,2019-03-21 13:31:36,1,0.86,1,N,229,229,1,7.5,0.0,0.5,2.16,0.0,0.3,12.96,2.5 -167,1,2019-03-15 22:37:06,2019-03-15 22:38:34,1,0.0,1,N,264,145,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0 -7197,2,2019-03-14 20:38:43,2019-03-14 20:45:04,5,1.04,1,N,151,166,1,6.0,0.5,0.5,1.1,0.0,0.3,8.4,0.0 -568,1,2019-03-09 09:29:44,2019-03-09 09:40:36,0,2.7,1,N,24,142,1,10.5,2.5,0.5,2.75,0.0,0.3,16.55,2.5 -2198,2,2019-03-11 18:32:36,2019-03-11 18:48:27,1,2.02,1,N,234,162,2,11.5,1.0,0.5,0.0,0.0,0.3,15.8,2.5 -5249,1,2019-03-15 09:35:22,2019-03-15 
09:46:33,1,2.4,1,N,140,161,1,10.0,2.5,0.5,1.33,0.0,0.3,14.63,2.5 -5531,1,2019-03-04 19:44:33,2019-03-04 20:14:34,0,15.8,1,N,132,145,1,43.5,1.0,0.5,11.3,0.0,0.3,56.6,0.0 -4943,1,2019-03-03 15:12:19,2019-03-03 15:16:06,1,1.2,1,N,237,236,1,5.5,2.5,0.5,1.1,0.0,0.3,9.9,2.5 -617,2,2019-03-20 14:48:47,2019-03-20 14:52:43,1,0.59,1,N,142,163,1,4.5,0.0,0.5,1.0,0.0,0.3,8.8,2.5 -7014,2,2019-03-03 23:56:36,2019-03-04 00:22:25,1,7.91,1,N,132,258,2,25.0,0.5,0.5,0.0,0.0,0.3,26.3,0.0 -2138,2,2019-03-23 14:47:31,2019-03-23 14:54:11,2,1.12,1,N,234,161,2,6.5,0.0,0.5,0.0,0.0,0.3,9.8,2.5 -5298,2,2019-03-20 15:47:08,2019-03-20 15:51:15,1,0.54,1,N,161,161,1,4.5,0.0,0.5,1.56,0.0,0.3,9.36,2.5 -9838,1,2019-03-13 20:42:59,2019-03-13 20:50:23,1,1.1,1,N,162,48,1,6.5,3.0,0.5,2.05,0.0,0.3,12.35,2.5 -2565,1,2019-03-13 10:28:36,2019-03-13 10:28:38,1,0.0,1,N,246,246,2,2.5,2.5,0.5,0.0,0.0,0.3,5.8,2.5 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-01.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-01.csv index 07a92dc26d64..1168d9f9ef35 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-01.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-01.csv @@ -4,18 +4,3 @@ 517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1, 368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1, 155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1, -366694,2,2019-01-18 22:58:57,2019-01-18 23:15:37,N,1,75,185,1,8.71,24.5,0.5,0.5,0.0,0.0,,0.3,25.8,1,1, -474972,2,2019-01-24 17:59:00,2019-01-24 18:35:53,N,5,195,90,1,6.46,23.45,0.0,0.5,0.0,0.0,,0.0,23.95,1,2, -69932,1,2019-01-04 18:07:02,2019-01-04 18:34:37,N,1,97,35,2,4.7,20.0,1.0,0.5,0.0,0.0,,0.3,21.8,1,1, -244782,2,2019-01-13 02:16:18,2019-01-13 
02:37:30,N,1,260,198,1,3.9,16.5,0.5,0.5,0.0,0.0,,0.3,17.8,2,1, -482363,2,2019-01-25 04:42:54,2019-01-25 04:47:06,N,1,82,56,1,0.74,5.0,0.5,0.5,0.0,0.0,,0.3,6.3,2,1, -573895,2,2019-01-29 11:36:48,2019-01-29 11:43:03,N,1,43,236,1,0.91,6.0,0.0,0.5,2.0,0.0,,0.3,8.8,1,1,0.0 -182986,2,2019-01-10 12:07:54,2019-01-10 12:19:31,N,1,29,155,1,5.49,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, -526993,1,2019-01-26 22:13:18,2019-01-26 22:13:51,N,1,33,33,1,0.1,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1, -490940,2,2019-01-25 13:09:50,2019-01-25 13:15:30,N,1,220,153,1,0.35,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,1,1, -145635,2,2019-01-08 16:01:58,2019-01-08 16:10:16,N,1,77,72,1,1.48,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,1,1, -242083,1,2019-01-12 23:43:28,2019-01-13 00:02:13,N,1,210,216,1,12.3,34.5,0.5,0.5,0.0,0.0,,0.3,35.8,2,1, -328924,2,2019-01-17 10:07:09,2019-01-17 10:25:59,N,1,32,242,1,4.71,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, -132260,1,2019-01-07 21:00:39,2019-01-07 21:18:13,N,1,52,188,1,3.6,14.5,0.5,0.5,0.0,0.0,,0.3,15.8,2,1, -568032,2,2019-01-29 07:01:03,2019-01-29 07:54:28,N,1,225,251,1,16.91,55.0,0.0,0.5,0.0,11.52,,0.3,67.32,1,1,0.0 -92205,2,2019-01-05 20:26:49,2019-01-05 20:44:48,N,1,29,108,1,3.93,15.5,0.5,0.5,0.0,0.0,,0.3,16.8,1,1, diff --git a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-02.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-02.csv index 9a6442e61899..b19b1f827f4f 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-02.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-02.csv @@ -4,18 +4,3 @@ 150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75 245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0 151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0 -18916,2,2019-02-01 
19:54:28,2019-02-01 19:58:41,N,1,74,262,1,1.04,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,1,1,0.0 -110154,2,2019-02-06 08:33:30,2019-02-06 08:48:52,N,1,66,87,1,2.14,12.0,0.0,0.5,4.66,0.0,,0.3,20.21,1,1,2.75 -335114,2,2019-02-16 18:40:24,2019-02-16 18:46:47,N,1,74,75,1,0.89,6.0,0.0,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 -453776,2,2019-02-22 22:38:18,2019-02-22 22:49:20,N,1,116,41,1,1.86,10.0,0.5,0.5,2.26,0.0,,0.3,13.56,1,1,0.0 -192685,2,2019-02-09 19:56:13,2019-02-09 20:13:56,N,1,167,235,1,2.59,13.0,0.0,0.5,0.0,0.0,,0.3,13.8,1,1,0.0 -574982,2,2019-02-28 22:42:50,2019-02-28 22:47:40,N,1,42,116,1,0.91,5.5,0.5,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 -155855,2,2019-02-08 09:02:21,2019-02-08 10:17:23,N,5,210,161,1,20.74,59.86,0.0,0.5,0.0,0.0,,0.0,60.36,1,2,0.0 -130109,2,2019-02-07 05:18:25,2019-02-07 05:24:30,N,1,62,188,4,1.54,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,1,1,0.0 -157145,2,2019-02-08 10:20:47,2019-02-08 10:47:36,N,1,74,31,2,7.73,25.5,0.0,0.5,5.26,0.0,,0.3,31.56,1,1,0.0 -411748,2,2019-02-20 22:31:32,2019-02-20 22:37:11,N,1,129,129,4,0.9,5.5,0.5,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 -470538,2,2019-02-23 18:15:38,2019-02-23 18:27:05,N,1,126,213,1,2.21,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,1,1,0.0 -262705,2,2019-02-13 13:07:02,2019-02-13 13:19:50,N,1,186,249,1,1.5,9.5,0.0,0.5,2.56,0.0,,0.3,15.36,1,1,2.5 -163160,2,2019-02-08 15:39:26,2019-02-08 16:06:31,N,1,33,17,1,2.9,17.5,0.0,0.5,0.0,0.0,,0.3,18.3,2,1,0.0 -263174,2,2019-02-13 15:06:57,2019-02-13 15:28:26,N,1,165,123,5,1.66,13.5,0.0,0.5,0.0,0.0,,0.3,14.3,1,1,0.0 -451270,2,2019-02-22 21:00:04,2019-02-22 21:00:24,N,1,82,82,1,0.06,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-03.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-03.csv index 5104e10f24c5..c5f49aef4b67 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-03.csv +++ 
b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-03.csv @@ -4,18 +4,3 @@ 6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0 76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75 282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0 -439163,2,2019-03-23 12:07:55,2019-03-23 12:11:20,N,1,166,166,1,0.75,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1,0.0 -518619,2,2019-03-27 18:33:46,2019-03-27 18:40:59,N,1,7,223,1,0.59,6.0,1.0,0.5,1.95,0.0,,0.3,9.75,1,1,0.0 -385766,2,2019-03-20 18:19:37,2019-03-20 18:28:31,N,1,129,129,1,1.21,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,2,1,0.0 -131793,2,2019-03-07 18:40:23,2019-03-07 18:58:26,N,1,51,32,1,3.35,14.0,1.0,0.5,0.0,0.0,,0.3,15.8,1,1,0.0 -203472,2,2019-03-11 12:46:20,2019-03-11 13:20:08,N,1,119,254,3,6.47,27.5,0.0,0.5,0.0,0.0,,0.3,28.3,1,1,0.0 -399106,2,2019-03-21 13:10:44,2019-03-21 13:21:19,N,1,74,75,1,1.27,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 -425811,2,2019-03-22 19:00:23,2019-03-22 19:02:51,N,1,95,95,1,0.51,4.0,1.0,0.5,2.0,0.0,,0.3,7.8,1,1,0.0 -36345,2,2019-03-02 19:27:41,2019-03-02 19:33:52,N,1,7,179,1,0.86,6.0,0.0,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 -323601,2,2019-03-17 10:19:28,2019-03-17 10:22:59,N,1,97,49,1,0.68,4.5,0.0,0.5,1.06,0.0,,0.3,6.36,1,1,0.0 -246980,2,2019-03-13 17:03:07,2019-03-13 17:12:04,N,1,75,263,2,1.32,8.0,1.0,0.5,2.51,0.0,,0.3,15.06,1,1,2.75 -269737,1,2019-03-14 18:03:35,2019-03-14 18:10:01,N,1,66,65,1,0.6,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1,0.0 -145416,2,2019-03-08 12:21:02,2019-03-08 12:22:57,N,1,75,75,1,0.5,3.5,0.0,0.5,0.0,0.0,,0.3,4.3,2,1,0.0 -142333,2,2019-03-08 09:39:09,2019-03-08 10:03:48,N,5,215,16,1,7.14,21.62,0.0,0.5,0.0,0.0,,0.0,22.12,1,2,0.0 -381567,2,2019-03-20 15:09:47,2019-03-20 15:22:12,N,1,89,188,1,1.75,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 -476898,1,2019-03-25 13:10:35,2019-03-25 
13:20:23,N,1,244,243,1,1.1,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-01.csv index 288e8ac8a023..14f45f9cf3bc 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-01.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-01.csv @@ -4,18 +4,3 @@ 2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0 5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95, 4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56, -1505,2,2019-01-29 17:58:08,2019-01-29 18:11:19,1,0.72,1,N,13,13,2,9.0,1.0,0.5,0.0,0.0,0.3,10.8,0.0 -8131,1,2019-01-13 16:22:03,2019-01-13 16:33:06,2,1.4,1,N,161,170,1,8.5,0.0,0.5,3.0,0.0,0.3,12.3, -9601,2,2019-01-23 12:07:55,2019-01-23 12:11:19,5,0.57,1,N,239,239,1,4.0,0.0,0.5,1.0,0.0,0.3,5.8,0.0 -4522,2,2019-01-25 15:42:39,2019-01-25 15:50:54,1,1.26,1,N,170,229,1,7.5,0.0,0.5,1.0,0.0,0.3,9.3,0.0 -9008,1,2019-01-13 19:40:27,2019-01-13 19:50:54,3,2.1,1,N,239,263,1,9.5,0.0,0.5,2.05,0.0,0.3,12.35, -7012,2,2019-01-19 20:44:35,2019-01-19 21:05:13,1,3.16,1,N,142,107,1,14.5,0.5,0.5,3.16,0.0,0.3,18.96, -7316,2,2019-01-07 22:38:10,2019-01-07 22:44:31,2,2.4,1,N,233,263,1,8.5,0.5,0.5,1.47,0.0,0.3,11.27, -6384,2,2019-01-25 07:35:59,2019-01-25 07:57:44,1,5.12,1,N,143,231,1,18.5,0.0,0.5,2.5,0.0,0.3,21.8,0.0 -3652,1,2019-01-26 20:43:57,2019-01-26 21:03:51,1,4.4,1,N,13,230,2,17.0,0.5,0.5,0.0,0.0,0.3,18.3,0.0 -6409,2,2019-01-06 22:27:16,2019-01-06 22:33:58,1,1.52,1,N,161,234,2,7.0,0.5,0.5,0.0,0.0,0.3,8.3, -1130,2,2019-01-15 20:05:59,2019-01-15 20:22:11,1,3.96,1,N,233,41,2,14.5,0.5,0.5,0.0,0.0,0.3,15.8, 
-5618,2,2019-01-14 14:47:18,2019-01-14 14:54:44,1,0.5,1,N,140,237,1,6.0,0.0,0.5,1.0,0.0,0.3,7.8, -8515,1,2019-01-16 16:26:17,2019-01-16 16:41:36,2,1.4,1,N,233,163,1,10.5,1.0,0.5,2.45,0.0,0.3,14.75, -5397,1,2019-01-23 08:31:58,2019-01-23 08:55:43,1,2.6,1,N,246,163,1,16.0,0.0,0.5,5.0,0.0,0.3,21.8,0.0 -4648,1,2019-01-30 19:59:30,2019-01-30 20:10:34,1,1.9,1,N,186,79,1,9.0,0.5,0.5,2.05,0.0,0.3,12.35,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-02.csv index 573017273621..dd5bcc804be2 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-02.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-02.csv @@ -4,18 +4,3 @@ 4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5 1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5 6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5 -1633,2,2019-02-09 12:14:12,2019-02-10 11:50:00,3,3.02,1,N,48,236,1,12.5,0.0,0.5,1.58,0.0,0.3,17.38,2.5 -4450,1,2019-02-05 08:56:21,2019-02-05 09:02:06,1,0.8,1,N,262,75,1,5.5,2.5,0.5,1.7,0.0,0.3,10.5,2.5 -8929,2,2019-02-13 11:44:00,2019-02-13 11:57:32,1,1.13,1,N,100,170,2,9.0,0.0,0.5,0.0,0.0,0.3,12.3,2.5 -6968,2,2019-02-27 20:52:49,2019-02-27 21:18:37,1,4.64,1,N,246,125,1,18.5,0.5,0.5,4.46,0.0,0.3,26.76,2.5 -7969,2,2019-02-01 11:12:35,2019-02-01 11:19:39,1,0.54,1,N,236,263,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8,0.0 -7695,1,2019-02-28 00:51:50,2019-02-28 00:55:37,1,1.3,1,N,263,74,2,6.0,0.5,0.5,0.0,0.0,0.3,7.3,0.0 -9904,1,2019-02-16 17:45:00,2019-02-16 17:51:09,1,1.3,1,N,170,162,1,6.5,2.5,0.5,1.95,0.0,0.3,11.75,2.5 -1580,2,2019-02-07 10:14:39,2019-02-07 
10:51:38,1,17.49,2,N,186,132,2,52.0,0.0,0.5,0.0,5.76,0.3,61.06,2.5 -699,2,2019-02-18 11:01:16,2019-02-18 11:11:06,2,2.31,1,N,249,170,1,10.0,0.0,0.5,2.66,0.0,0.3,15.96,2.5 -5181,2,2019-02-28 22:39:19,2019-02-28 22:42:13,1,0.49,1,N,237,162,1,4.0,0.5,0.5,1.56,0.0,0.3,9.36,2.5 -2631,2,2019-02-01 00:55:00,2019-02-01 00:55:20,1,0.0,5,N,265,265,1,78.0,0.0,0.0,10.0,10.5,0.3,98.8,0.0 -4018,2,2019-02-11 16:59:03,2019-02-11 17:05:01,1,1.35,1,N,143,238,1,6.5,1.0,0.5,2.16,0.0,0.3,12.96,2.5 -7582,2,2019-02-08 17:29:00,2019-02-08 17:31:14,1,0.47,1,N,236,236,2,3.5,1.0,0.5,0.0,0.0,0.3,7.8,2.5 -1810,1,2019-02-17 13:10:12,2019-02-17 13:16:03,1,0.5,1,N,161,230,2,5.5,2.5,0.5,0.0,0.0,0.3,8.8,2.5 -5095,2,2019-02-09 14:30:05,2019-02-09 14:42:22,0,1.18,1,N,170,234,1,9.0,0.0,0.5,2.46,0.0,0.3,14.76,2.5 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-03.csv index 3d254ce261c2..085f0cc37265 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-03.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-03.csv @@ -4,18 +4,3 @@ 7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0 9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5 2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0 -5030,2,2019-03-13 20:22:28,2019-03-13 20:42:11,2,9.81,1,N,138,244,1,28.0,0.5,0.5,7.01,5.76,0.3,42.07,0.0 -7848,4,2019-03-21 13:22:08,2019-03-21 13:31:36,1,0.86,1,N,229,229,1,7.5,0.0,0.5,2.16,0.0,0.3,12.96,2.5 -167,1,2019-03-15 22:37:06,2019-03-15 22:38:34,1,0.0,1,N,264,145,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0 -7197,2,2019-03-14 20:38:43,2019-03-14 
20:45:04,5,1.04,1,N,151,166,1,6.0,0.5,0.5,1.1,0.0,0.3,8.4,0.0 -568,1,2019-03-09 09:29:44,2019-03-09 09:40:36,0,2.7,1,N,24,142,1,10.5,2.5,0.5,2.75,0.0,0.3,16.55,2.5 -2198,2,2019-03-11 18:32:36,2019-03-11 18:48:27,1,2.02,1,N,234,162,2,11.5,1.0,0.5,0.0,0.0,0.3,15.8,2.5 -5249,1,2019-03-15 09:35:22,2019-03-15 09:46:33,1,2.4,1,N,140,161,1,10.0,2.5,0.5,1.33,0.0,0.3,14.63,2.5 -5531,1,2019-03-04 19:44:33,2019-03-04 20:14:34,0,15.8,1,N,132,145,1,43.5,1.0,0.5,11.3,0.0,0.3,56.6,0.0 -4943,1,2019-03-03 15:12:19,2019-03-03 15:16:06,1,1.2,1,N,237,236,1,5.5,2.5,0.5,1.1,0.0,0.3,9.9,2.5 -617,2,2019-03-20 14:48:47,2019-03-20 14:52:43,1,0.59,1,N,142,163,1,4.5,0.0,0.5,1.0,0.0,0.3,8.8,2.5 -7014,2,2019-03-03 23:56:36,2019-03-04 00:22:25,1,7.91,1,N,132,258,2,25.0,0.5,0.5,0.0,0.0,0.3,26.3,0.0 -2138,2,2019-03-23 14:47:31,2019-03-23 14:54:11,2,1.12,1,N,234,161,2,6.5,0.0,0.5,0.0,0.0,0.3,9.8,2.5 -5298,2,2019-03-20 15:47:08,2019-03-20 15:51:15,1,0.54,1,N,161,161,1,4.5,0.0,0.5,1.56,0.0,0.3,9.36,2.5 -9838,1,2019-03-13 20:42:59,2019-03-13 20:50:23,1,1.1,1,N,162,48,1,6.5,3.0,0.5,2.05,0.0,0.3,12.35,2.5 -2565,1,2019-03-13 10:28:36,2019-03-13 10:28:38,1,0.0,1,N,246,246,2,2.5,2.5,0.5,0.0,0.0,0.3,5.8,2.5 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/green_tripdata.csv index 2b518c3e4a83..870ec360c1ba 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/green_tripdata.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/green_tripdata.csv @@ -4,18 +4,3 @@ 520094,2,2018-10-23 17:39:25,2018-10-23 18:42:28,N,1,75,133,1,14.6,48.5,1.0,0.5,11.21,5.76,,0.3,67.27,1,1.0 465246,2,2018-10-21 03:41:56,2018-10-21 03:45:23,N,1,95,95,2,0.86,5.0,0.5,0.5,1.0,0.0,,0.3,7.3,1,1.0 652784,2,2018-10-29 12:03:25,2018-10-29 12:14:19,N,1,18,31,1,2.24,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,2,1.0 -234842,2,2018-10-11 15:24:48,2018-10-11 
15:47:51,N,1,197,95,1,3.42,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1.0 -199443,2,2018-10-09 22:19:35,2018-10-09 22:23:38,N,1,41,42,1,0.63,5.0,0.5,0.5,1.26,0.0,,0.3,7.56,1,1.0 -478271,2,2018-10-21 18:16:09,2018-10-21 18:25:49,N,1,74,263,1,2.25,9.0,0.0,0.5,1.96,0.0,,0.3,11.76,1,1.0 -480009,2,2018-10-21 19:32:30,2018-10-21 19:54:38,N,1,95,260,2,4.31,18.5,0.0,0.5,0.0,0.0,,0.3,19.3,1,1.0 -621419,2,2018-10-27 22:52:50,2018-10-27 22:55:15,N,1,136,136,1,0.01,3.5,0.5,0.5,0.0,0.0,,0.3,4.8,2,1.0 -60768,2,2018-10-03 19:47:13,2018-10-03 19:50:39,N,1,256,255,1,0.52,4.0,1.0,0.5,1.45,0.0,,0.3,7.25,1,1.0 -559361,2,2018-10-25 12:04:52,2018-10-25 12:39:25,N,5,65,76,1,5.8,21.86,0.0,0.5,0.0,0.0,,0.0,22.36,1,2.0 -226070,2,2018-10-11 08:52:56,2018-10-11 09:22:58,N,1,166,163,1,3.73,20.0,0.0,0.5,4.16,0.0,,0.3,24.96,1,1.0 -578687,2,2018-10-26 08:44:43,2018-10-26 09:20:34,N,1,49,114,5,3.58,23.0,0.0,0.5,4.76,0.0,,0.3,30.51,1,1.0 -133625,2,2018-10-06 19:59:11,2018-10-06 20:38:07,N,1,181,100,2,9.41,34.5,0.0,0.5,10.26,5.76,,0.3,51.32,1,1.0 -118040,2,2018-10-06 03:38:45,2018-10-07 03:18:41,N,1,256,256,1,0.65,4.5,0.5,0.5,0.0,0.0,,0.3,5.8,2,1.0 -199254,2,2018-10-09 22:23:05,2018-10-09 22:33:00,N,1,181,97,1,1.53,8.5,0.5,0.5,0.0,0.0,,0.3,9.8,2,1.0 -508650,2,2018-10-23 08:21:15,2018-10-23 08:40:55,N,1,166,142,5,2.41,14.0,0.0,0.5,1.0,0.0,,0.3,15.8,1,1.0 -597703,2,2018-10-26 21:14:29,2018-10-26 21:22:49,N,1,41,239,5,1.8,8.5,0.5,0.5,0.0,0.0,,0.3,9.8,2,1.0 -472658,2,2018-10-21 13:36:13,2018-10-21 13:41:32,N,1,7,179,1,0.97,5.5,0.0,0.5,0.0,0.0,,0.3,6.3,2,1.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/yellow_tripdata.csv index 0ce6520c5822..61bb9b8b3f00 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/yellow_tripdata.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/yellow_tripdata.csv @@ -4,18 +4,3 @@ 9267,1,2018-10-16 
07:10:28,2018-10-16 07:19:21,1,1.4,1,N,142,236,1,8.0,0.0,0.5,0.88,0.0,0.3,9.68, 4342,1,2018-10-10 10:19:16,2018-10-10 10:33:52,1,2.1,1,N,142,50,2,11.5,0.0,0.5,0.0,0.0,0.3,12.3, 8099,2,2018-10-01 19:18:44,2018-10-01 19:31:48,1,2.51,1,N,140,239,1,11.0,1.0,0.5,3.2,0.0,0.3,16.0, -8972,2,2018-10-19 01:40:24,2018-10-19 01:51:42,2,2.54,1,N,249,164,2,10.5,0.5,0.5,0.0,0.0,0.3,11.8, -240,1,2018-10-11 19:54:10,2018-10-11 20:19:34,1,5.2,1,N,231,232,1,22.0,1.0,0.5,4.76,0.0,0.3,28.56, -7844,2,2018-10-30 08:58:08,2018-10-30 09:05:09,1,1.08,1,N,249,211,1,6.5,0.0,0.5,1.0,0.0,0.3,8.3, -7024,1,2018-10-11 04:56:32,2018-10-11 05:14:29,1,8.0,1,N,263,138,1,25.0,0.5,0.5,8.0,5.76,0.3,40.06, -7601,2,2018-10-27 00:44:06,2018-10-27 00:57:52,1,2.18,1,N,113,100,1,11.0,0.5,0.5,2.46,0.0,0.3,14.76, -7686,2,2018-10-01 17:13:29,2018-10-01 17:16:10,5,0.36,1,N,263,236,1,3.5,1.0,0.5,1.59,0.0,0.3,6.89, -1344,1,2018-10-03 20:31:19,2018-10-03 21:11:37,1,20.2,3,N,236,1,1,73.5,0.5,0.0,18.35,17.5,0.3,110.15, -2539,2,2018-10-24 00:46:47,2018-10-24 01:07:38,1,14.1,1,N,132,210,1,39.0,0.5,0.5,5.0,0.0,0.3,45.3, -2758,1,2018-10-17 22:25:54,2018-10-17 22:42:38,1,6.2,1,N,237,87,1,20.0,0.5,0.5,2.0,0.0,0.3,23.3, -567,1,2018-10-22 15:57:44,2018-10-22 16:25:52,1,3.1,1,N,264,264,1,18.5,0.0,0.5,3.85,0.0,0.3,23.15, -1994,1,2018-10-28 13:36:15,2018-10-28 13:47:38,1,1.8,1,N,79,232,1,10.0,0.0,0.5,2.15,0.0,0.3,12.95, -3549,2,2018-10-25 21:00:53,2018-10-25 21:17:23,1,2.58,1,N,170,48,1,12.0,0.5,0.5,1.0,0.0,0.3,14.3, -3867,2,2018-10-16 13:26:54,2018-10-16 13:51:57,3,1.8,1,N,230,158,1,16.0,0.0,0.5,3.36,0.0,0.3,20.16, -864,2,2018-10-20 10:53:46,2018-10-20 11:03:28,1,1.22,1,N,262,75,1,8.0,0.0,0.5,2.64,0.0,0.3,11.44, -9457,1,2018-10-01 18:19:51,2018-10-01 18:39:05,1,2.6,1,N,144,186,1,14.0,1.0,0.5,3.15,0.0,0.3,18.95, diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/green_tripdata.csv index f68bd7715093..ae47bbf8b509 
100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/green_tripdata.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/green_tripdata.csv @@ -4,18 +4,3 @@ 410503,2,2018-11-19 08:36:21,2018-11-19 08:53:22,N,1,116,128,1,4.59,17.5,0.0,0.5,4.58,0.0,,0.3,22.88,1,1.0 284524,2,2018-11-13 14:57:24,2018-11-13 15:23:48,N,1,53,121,1,5.38,21.5,0.0,0.5,0.0,0.0,,0.3,22.3,1,1.0 652213,2,2018-11-30 20:18:38,2018-11-30 20:23:04,N,1,75,74,1,1.23,6.0,0.5,0.5,1.46,0.0,,0.3,8.76,1,1.0 -453632,2,2018-11-21 08:46:22,2018-11-21 09:12:23,N,5,17,35,1,3.68,16.39,0.0,0.5,0.0,0.0,,0.0,16.89,1,2.0 -514609,2,2018-11-24 14:03:10,2018-11-24 14:26:29,N,1,149,89,1,4.51,18.5,0.0,0.5,0.0,0.0,,0.3,19.3,1,1.0 -570411,2,2018-11-27 12:59:00,2018-11-27 13:03:56,N,1,25,25,1,0.79,5.5,0.0,0.5,1.26,0.0,,0.3,7.56,1,1.0 -328751,2,2018-11-15 12:26:07,2018-11-15 12:58:31,N,5,52,37,1,6.39,21.25,0.0,0.5,0.0,0.0,,0.0,21.75,1,2.0 -290145,2,2018-11-13 18:54:48,2018-11-13 19:04:14,N,1,74,41,1,1.15,7.5,1.0,0.5,1.86,0.0,,0.3,11.16,1,1.0 -273210,2,2018-11-12 23:48:01,2018-11-12 23:48:03,N,5,166,166,1,0.0,20.0,0.0,0.0,4.0,0.0,,0.0,24.0,1,2.0 -598576,2,2018-11-28 16:55:56,2018-11-28 17:03:05,N,1,41,74,2,0.71,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1.0 -19526,2,2018-11-01 19:20:33,2018-11-01 19:27:39,N,1,25,181,1,0.87,6.5,1.0,0.5,1.66,0.0,,0.3,9.96,1,1.0 -645647,2,2018-11-30 16:50:09,2018-11-30 16:56:01,N,1,7,7,2,0.69,5.5,1.0,0.5,1.46,0.0,,0.3,10.71,1,1.0 -642343,2,2018-11-30 14:56:19,2018-11-30 15:06:20,N,1,33,97,1,1.08,7.5,0.0,0.5,1.66,0.0,,0.3,9.96,1,1.0 -284366,2,2018-11-13 14:23:08,2018-11-13 14:53:25,N,1,81,20,1,6.94,26.0,0.0,0.5,0.0,0.0,,0.3,26.8,1,1.0 -608380,2,2018-11-29 03:50:30,2018-11-29 03:55:17,N,1,74,42,1,0.98,5.5,0.5,0.5,2.04,0.0,,0.3,8.84,1,1.0 -131427,2,2018-11-06 17:46:21,2018-11-06 17:50:42,N,1,41,42,1,0.85,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1.0 -368687,2,2018-11-17 10:13:10,2018-11-17 10:27:14,N,1,95,82,1,1.65,10.5,0.0,0.5,0.0,0.0,,0.3,11.3,2,1.0 -13155,1,2018-11-01 
15:40:11,2018-11-01 15:48:58,N,1,43,236,1,0.8,7.0,0.0,0.5,1.17,0.0,,0.3,8.97,1,1.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/yellow_tripdata.csv index 87ff7e28e6fa..f61adc4debd6 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/yellow_tripdata.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/yellow_tripdata.csv @@ -4,18 +4,3 @@ 9216,1,2018-11-16 13:12:09,2018-11-16 13:24:41,1,1.5,1,N,238,143,2,9.5,0.0,0.5,0.0,0.0,0.3,10.3, 5518,1,2018-11-15 20:44:20,2018-11-15 20:58:41,1,1.7,1,N,236,151,1,11.0,0.5,0.5,0.0,0.0,0.3,12.3, 104,2,2018-11-07 00:01:14,2018-11-07 00:05:56,5,0.85,1,N,234,170,1,5.5,0.5,0.5,1.36,0.0,0.3,8.16, -536,2,2018-11-11 12:43:04,2018-11-11 12:59:33,1,2.11,1,N,233,114,1,11.5,0.0,0.5,2.46,0.0,0.3,14.76, -2167,2,2018-11-15 14:14:45,2018-11-15 14:24:39,2,0.8,1,N,142,230,1,7.5,0.0,0.5,2.08,0.0,0.3,10.38, -5875,4,2018-11-07 07:24:55,2018-11-07 07:30:28,1,0.81,1,N,164,233,2,5.5,0.0,0.5,0.0,0.0,0.3,6.3, -8196,2,2018-11-05 13:46:25,2018-11-05 13:47:12,3,0.08,1,N,236,236,1,2.5,0.0,0.5,0.66,0.0,0.3,3.96, -8175,1,2018-11-13 20:46:11,2018-11-13 20:50:29,1,0.6,1,N,107,137,2,5.0,0.5,0.5,0.0,0.0,0.3,6.3, -6314,2,2018-11-25 19:36:38,2018-11-25 19:41:30,1,1.77,1,N,263,74,2,7.0,0.0,0.5,0.0,0.0,0.3,7.8, -7700,2,2018-11-18 21:33:49,2018-11-18 21:46:58,2,2.76,1,N,163,24,1,12.0,0.5,0.5,3.32,0.0,0.3,16.62, -9062,1,2018-11-03 18:39:31,2018-11-03 18:49:25,1,0.8,1,N,164,230,2,7.5,0.0,0.5,0.0,0.0,0.3,8.3, -6701,1,2018-11-20 06:12:00,2018-11-20 06:19:35,1,2.2,1,N,237,75,2,8.5,0.0,0.5,0.0,0.0,0.3,9.3, -399,2,2018-11-13 21:20:51,2018-11-13 21:34:45,1,1.18,1,N,162,230,1,9.5,0.5,0.5,5.0,0.0,0.3,15.8, -2745,2,2018-11-03 00:07:35,2018-11-03 00:29:31,1,2.55,1,N,68,4,2,15.0,0.5,0.5,0.0,0.0,0.3,16.3, -5363,2,2018-11-23 20:16:12,2018-11-23 20:20:46,1,1.05,1,N,237,162,2,5.5,0.5,0.5,0.0,0.0,0.3,6.8, 
-383,1,2018-11-10 22:31:50,2018-11-10 22:47:48,1,1.1,1,N,161,141,2,10.5,0.5,0.5,0.0,0.0,0.3,11.8, -1537,2,2018-11-17 01:05:40,2018-11-17 01:18:09,1,1.62,1,N,114,79,1,9.5,0.5,0.5,2.16,0.0,0.3,12.96, -1760,2,2018-11-01 13:52:35,2018-11-01 13:59:37,1,0.5,1,N,230,162,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8, diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/green_tripdata.csv index 375b98f8a05a..848fed127917 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/green_tripdata.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/green_tripdata.csv @@ -4,18 +4,3 @@ 519266,2,2018-12-22 23:20:48,2018-12-22 23:42:53,N,1,255,225,1,4.03,16.5,0.5,0.5,3.56,0.0,,0.3,21.36,1,1.0 419676,2,2018-12-18 20:30:48,2018-12-18 20:42:37,N,1,166,116,1,2.59,11.0,0.5,0.5,0.0,0.0,,0.3,12.3,1,1.0 110768,2,2018-12-05 21:20:39,2018-12-05 21:45:12,N,1,255,61,1,4.08,18.5,0.5,0.5,5.94,0.0,,0.3,25.74,1,1.0 -591680,2,2018-12-27 11:21:47,2018-12-27 12:21:54,N,1,50,9,1,15.01,56.0,0.0,0.5,0.0,5.76,,0.3,62.56,1,1.0 -532284,2,2018-12-23 16:30:52,2018-12-23 16:39:42,N,1,74,75,1,0.57,7.0,0.0,0.5,0.0,0.0,,0.3,7.8,2,1.0 -149369,2,2018-12-07 14:18:11,2018-12-07 14:40:06,N,1,179,95,1,6.96,23.0,0.0,0.5,4.76,0.0,,0.3,28.56,1,1.0 -40899,2,2018-12-02 19:07:00,2018-12-02 19:17:53,N,1,97,49,1,1.74,9.0,0.0,0.5,2.45,0.0,,0.3,12.25,1,1.0 -341430,2,2018-12-15 14:19:06,2018-12-15 14:32:47,N,5,11,29,1,4.9,15.06,0.0,0.5,0.0,0.0,,0.0,15.56,1,2.0 -400460,2,2018-12-18 06:09:49,2018-12-18 06:27:36,N,5,14,231,1,7.75,24.39,0.0,0.5,0.0,5.76,,0.0,30.65,1,2.0 -320076,2,2018-12-14 18:00:27,2018-12-14 18:09:47,N,1,7,7,1,0.83,7.0,1.0,0.5,0.0,0.0,,0.3,8.8,2,1.0 -263463,2,2018-12-12 12:06:59,2018-12-12 12:17:05,N,1,260,223,1,3.48,12.5,0.0,0.5,0.0,0.0,,0.3,13.3,2,1.0 -245734,2,2018-12-11 16:20:31,2018-12-11 16:30:51,N,1,75,151,1,1.58,8.5,1.0,0.5,0.0,0.0,,0.3,10.3,2,1.0 
-173368,2,2018-12-08 11:34:24,2018-12-08 11:53:37,N,1,181,61,1,3.79,15.5,0.0,0.5,0.0,0.0,,0.3,16.3,1,1.0 -37580,2,2018-12-02 15:23:58,2018-12-02 15:45:49,N,1,82,28,1,3.73,16.5,0.0,0.5,0.0,0.0,,0.3,17.3,1,1.0 -82903,2,2018-12-04 19:07:55,2018-12-04 19:32:12,N,5,242,167,1,4.84,19.86,0.0,0.5,0.0,0.0,,0.0,20.36,1,2.0 -531182,2,2018-12-23 15:43:36,2018-12-23 16:16:42,N,1,82,173,1,2.87,20.0,0.0,0.5,0.0,0.0,,0.3,20.8,2,1.0 -532295,2,2018-12-23 16:09:21,2018-12-23 16:15:12,N,1,181,181,1,0.64,5.5,0.0,0.5,0.0,0.0,,0.3,6.3,2,1.0 -112713,2,2018-12-05 23:07:24,2018-12-05 23:15:40,N,1,129,129,1,1.3,7.5,0.5,0.5,1.76,0.0,,0.3,10.56,1,1.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/yellow_tripdata.csv index 50eb34d13f83..94f782e0e315 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/yellow_tripdata.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/yellow_tripdata.csv @@ -4,18 +4,3 @@ 4105,2,2018-12-15 23:27:23,2018-12-15 23:44:02,1,2.62,1,N,232,137,2,13.0,0.5,0.5,0.0,0.0,0.3,14.3, 2743,2,2018-12-12 19:54:39,2018-12-12 19:57:53,1,0.41,1,N,237,237,1,4.0,1.0,0.5,1.16,0.0,0.3,6.96, 4126,1,2018-12-21 18:57:55,2018-12-21 19:09:35,1,1.0,1,N,249,79,1,8.5,1.0,0.5,2.05,0.0,0.3,12.35, -1566,1,2018-12-16 15:38:10,2018-12-16 15:55:05,3,1.4,1,N,236,141,1,11.5,0.0,0.5,1.85,0.0,0.3,14.15, -4857,2,2018-12-06 17:26:50,2018-12-06 17:39:34,2,0.93,1,N,142,239,1,9.0,1.0,0.5,2.16,0.0,0.3,12.96, -304,1,2018-12-27 16:00:34,2018-12-27 16:34:26,2,5.7,1,N,163,209,2,24.0,1.0,0.5,0.0,0.0,0.3,25.8, -8159,2,2018-12-05 18:32:06,2018-12-05 18:41:03,1,1.38,1,N,68,90,1,8.0,1.0,0.5,2.94,0.0,0.3,12.74, -6575,4,2018-12-30 23:53:05,2018-12-30 23:55:40,1,0.39,1,N,256,256,1,4.0,0.5,0.5,1.06,0.0,0.3,6.36, -8327,2,2018-12-14 18:42:59,2018-12-14 18:51:49,1,1.26,1,N,163,236,1,8.0,1.0,0.5,1.5,0.0,0.3,11.3, -5245,1,2018-12-23 16:00:21,2018-12-23 
16:12:44,1,1.6,1,N,141,142,1,10.0,0.0,0.5,2.15,0.0,0.3,12.95, -3521,2,2018-12-05 00:12:07,2018-12-05 00:30:25,2,3.15,1,N,164,45,1,14.0,0.5,0.5,3.06,0.0,0.3,18.36, -9442,2,2018-12-14 08:58:16,2018-12-14 09:06:42,1,0.67,1,N,239,238,1,6.0,0.0,0.5,0.8,0.0,0.3,7.6, -922,1,2018-12-09 03:45:39,2018-12-09 03:52:09,1,1.8,1,N,90,170,1,7.5,0.5,0.5,1.0,0.0,0.3,9.8, -807,2,2018-12-26 17:30:45,2018-12-26 17:53:04,1,5.34,1,N,164,261,2,19.0,1.0,0.5,0.0,0.0,0.3,20.8, -6354,2,2018-12-31 09:38:01,2018-12-31 09:46:53,1,2.2,1,N,24,236,1,9.5,0.0,0.5,2.0,0.0,0.3,12.3, -7329,2,2018-12-09 02:54:29,2018-12-09 03:00:14,5,1.19,1,N,164,246,1,6.5,0.5,0.5,1.95,0.0,0.3,9.75, -6227,1,2018-12-01 18:32:30,2018-12-01 19:06:38,1,4.2,1,N,90,263,1,22.0,0.0,0.5,4.55,0.0,0.3,27.35, -9796,1,2018-12-17 18:18:12,2018-12-17 18:32:05,1,1.4,1,N,68,107,1,10.0,1.0,0.5,1.2,0.0,0.3,13.0, diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/green_tripdata.csv index 07a92dc26d64..1168d9f9ef35 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/green_tripdata.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/green_tripdata.csv @@ -4,18 +4,3 @@ 517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1, 368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1, 155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1, -366694,2,2019-01-18 22:58:57,2019-01-18 23:15:37,N,1,75,185,1,8.71,24.5,0.5,0.5,0.0,0.0,,0.3,25.8,1,1, -474972,2,2019-01-24 17:59:00,2019-01-24 18:35:53,N,5,195,90,1,6.46,23.45,0.0,0.5,0.0,0.0,,0.0,23.95,1,2, -69932,1,2019-01-04 18:07:02,2019-01-04 18:34:37,N,1,97,35,2,4.7,20.0,1.0,0.5,0.0,0.0,,0.3,21.8,1,1, -244782,2,2019-01-13 02:16:18,2019-01-13 02:37:30,N,1,260,198,1,3.9,16.5,0.5,0.5,0.0,0.0,,0.3,17.8,2,1, 
-482363,2,2019-01-25 04:42:54,2019-01-25 04:47:06,N,1,82,56,1,0.74,5.0,0.5,0.5,0.0,0.0,,0.3,6.3,2,1, -573895,2,2019-01-29 11:36:48,2019-01-29 11:43:03,N,1,43,236,1,0.91,6.0,0.0,0.5,2.0,0.0,,0.3,8.8,1,1,0.0 -182986,2,2019-01-10 12:07:54,2019-01-10 12:19:31,N,1,29,155,1,5.49,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, -526993,1,2019-01-26 22:13:18,2019-01-26 22:13:51,N,1,33,33,1,0.1,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1, -490940,2,2019-01-25 13:09:50,2019-01-25 13:15:30,N,1,220,153,1,0.35,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,1,1, -145635,2,2019-01-08 16:01:58,2019-01-08 16:10:16,N,1,77,72,1,1.48,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,1,1, -242083,1,2019-01-12 23:43:28,2019-01-13 00:02:13,N,1,210,216,1,12.3,34.5,0.5,0.5,0.0,0.0,,0.3,35.8,2,1, -328924,2,2019-01-17 10:07:09,2019-01-17 10:25:59,N,1,32,242,1,4.71,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, -132260,1,2019-01-07 21:00:39,2019-01-07 21:18:13,N,1,52,188,1,3.6,14.5,0.5,0.5,0.0,0.0,,0.3,15.8,2,1, -568032,2,2019-01-29 07:01:03,2019-01-29 07:54:28,N,1,225,251,1,16.91,55.0,0.0,0.5,0.0,11.52,,0.3,67.32,1,1,0.0 -92205,2,2019-01-05 20:26:49,2019-01-05 20:44:48,N,1,29,108,1,3.93,15.5,0.5,0.5,0.0,0.0,,0.3,16.8,1,1, diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/yellow_tripdata.csv index 288e8ac8a023..14f45f9cf3bc 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/yellow_tripdata.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/yellow_tripdata.csv @@ -4,18 +4,3 @@ 2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0 5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95, 4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56, -1505,2,2019-01-29 17:58:08,2019-01-29 18:11:19,1,0.72,1,N,13,13,2,9.0,1.0,0.5,0.0,0.0,0.3,10.8,0.0 -8131,1,2019-01-13 
16:22:03,2019-01-13 16:33:06,2,1.4,1,N,161,170,1,8.5,0.0,0.5,3.0,0.0,0.3,12.3, -9601,2,2019-01-23 12:07:55,2019-01-23 12:11:19,5,0.57,1,N,239,239,1,4.0,0.0,0.5,1.0,0.0,0.3,5.8,0.0 -4522,2,2019-01-25 15:42:39,2019-01-25 15:50:54,1,1.26,1,N,170,229,1,7.5,0.0,0.5,1.0,0.0,0.3,9.3,0.0 -9008,1,2019-01-13 19:40:27,2019-01-13 19:50:54,3,2.1,1,N,239,263,1,9.5,0.0,0.5,2.05,0.0,0.3,12.35, -7012,2,2019-01-19 20:44:35,2019-01-19 21:05:13,1,3.16,1,N,142,107,1,14.5,0.5,0.5,3.16,0.0,0.3,18.96, -7316,2,2019-01-07 22:38:10,2019-01-07 22:44:31,2,2.4,1,N,233,263,1,8.5,0.5,0.5,1.47,0.0,0.3,11.27, -6384,2,2019-01-25 07:35:59,2019-01-25 07:57:44,1,5.12,1,N,143,231,1,18.5,0.0,0.5,2.5,0.0,0.3,21.8,0.0 -3652,1,2019-01-26 20:43:57,2019-01-26 21:03:51,1,4.4,1,N,13,230,2,17.0,0.5,0.5,0.0,0.0,0.3,18.3,0.0 -6409,2,2019-01-06 22:27:16,2019-01-06 22:33:58,1,1.52,1,N,161,234,2,7.0,0.5,0.5,0.0,0.0,0.3,8.3, -1130,2,2019-01-15 20:05:59,2019-01-15 20:22:11,1,3.96,1,N,233,41,2,14.5,0.5,0.5,0.0,0.0,0.3,15.8, -5618,2,2019-01-14 14:47:18,2019-01-14 14:54:44,1,0.5,1,N,140,237,1,6.0,0.0,0.5,1.0,0.0,0.3,7.8, -8515,1,2019-01-16 16:26:17,2019-01-16 16:41:36,2,1.4,1,N,233,163,1,10.5,1.0,0.5,2.45,0.0,0.3,14.75, -5397,1,2019-01-23 08:31:58,2019-01-23 08:55:43,1,2.6,1,N,246,163,1,16.0,0.0,0.5,5.0,0.0,0.3,21.8,0.0 -4648,1,2019-01-30 19:59:30,2019-01-30 20:10:34,1,1.9,1,N,186,79,1,9.0,0.5,0.5,2.05,0.0,0.3,12.35,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/green_tripdata.csv index 9a6442e61899..b19b1f827f4f 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/green_tripdata.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/green_tripdata.csv @@ -4,18 +4,3 @@ 150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75 245133,2,2019-02-12 14:04:30,2019-02-12 
14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0 151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0 -18916,2,2019-02-01 19:54:28,2019-02-01 19:58:41,N,1,74,262,1,1.04,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,1,1,0.0 -110154,2,2019-02-06 08:33:30,2019-02-06 08:48:52,N,1,66,87,1,2.14,12.0,0.0,0.5,4.66,0.0,,0.3,20.21,1,1,2.75 -335114,2,2019-02-16 18:40:24,2019-02-16 18:46:47,N,1,74,75,1,0.89,6.0,0.0,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 -453776,2,2019-02-22 22:38:18,2019-02-22 22:49:20,N,1,116,41,1,1.86,10.0,0.5,0.5,2.26,0.0,,0.3,13.56,1,1,0.0 -192685,2,2019-02-09 19:56:13,2019-02-09 20:13:56,N,1,167,235,1,2.59,13.0,0.0,0.5,0.0,0.0,,0.3,13.8,1,1,0.0 -574982,2,2019-02-28 22:42:50,2019-02-28 22:47:40,N,1,42,116,1,0.91,5.5,0.5,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 -155855,2,2019-02-08 09:02:21,2019-02-08 10:17:23,N,5,210,161,1,20.74,59.86,0.0,0.5,0.0,0.0,,0.0,60.36,1,2,0.0 -130109,2,2019-02-07 05:18:25,2019-02-07 05:24:30,N,1,62,188,4,1.54,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,1,1,0.0 -157145,2,2019-02-08 10:20:47,2019-02-08 10:47:36,N,1,74,31,2,7.73,25.5,0.0,0.5,5.26,0.0,,0.3,31.56,1,1,0.0 -411748,2,2019-02-20 22:31:32,2019-02-20 22:37:11,N,1,129,129,4,0.9,5.5,0.5,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 -470538,2,2019-02-23 18:15:38,2019-02-23 18:27:05,N,1,126,213,1,2.21,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,1,1,0.0 -262705,2,2019-02-13 13:07:02,2019-02-13 13:19:50,N,1,186,249,1,1.5,9.5,0.0,0.5,2.56,0.0,,0.3,15.36,1,1,2.5 -163160,2,2019-02-08 15:39:26,2019-02-08 16:06:31,N,1,33,17,1,2.9,17.5,0.0,0.5,0.0,0.0,,0.3,18.3,2,1,0.0 -263174,2,2019-02-13 15:06:57,2019-02-13 15:28:26,N,1,165,123,5,1.66,13.5,0.0,0.5,0.0,0.0,,0.3,14.3,1,1,0.0 -451270,2,2019-02-22 21:00:04,2019-02-22 21:00:24,N,1,82,82,1,0.06,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/yellow_tripdata.csv index 573017273621..dd5bcc804be2 
100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/yellow_tripdata.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/yellow_tripdata.csv @@ -4,18 +4,3 @@ 4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5 1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5 6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5 -1633,2,2019-02-09 12:14:12,2019-02-10 11:50:00,3,3.02,1,N,48,236,1,12.5,0.0,0.5,1.58,0.0,0.3,17.38,2.5 -4450,1,2019-02-05 08:56:21,2019-02-05 09:02:06,1,0.8,1,N,262,75,1,5.5,2.5,0.5,1.7,0.0,0.3,10.5,2.5 -8929,2,2019-02-13 11:44:00,2019-02-13 11:57:32,1,1.13,1,N,100,170,2,9.0,0.0,0.5,0.0,0.0,0.3,12.3,2.5 -6968,2,2019-02-27 20:52:49,2019-02-27 21:18:37,1,4.64,1,N,246,125,1,18.5,0.5,0.5,4.46,0.0,0.3,26.76,2.5 -7969,2,2019-02-01 11:12:35,2019-02-01 11:19:39,1,0.54,1,N,236,263,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8,0.0 -7695,1,2019-02-28 00:51:50,2019-02-28 00:55:37,1,1.3,1,N,263,74,2,6.0,0.5,0.5,0.0,0.0,0.3,7.3,0.0 -9904,1,2019-02-16 17:45:00,2019-02-16 17:51:09,1,1.3,1,N,170,162,1,6.5,2.5,0.5,1.95,0.0,0.3,11.75,2.5 -1580,2,2019-02-07 10:14:39,2019-02-07 10:51:38,1,17.49,2,N,186,132,2,52.0,0.0,0.5,0.0,5.76,0.3,61.06,2.5 -699,2,2019-02-18 11:01:16,2019-02-18 11:11:06,2,2.31,1,N,249,170,1,10.0,0.0,0.5,2.66,0.0,0.3,15.96,2.5 -5181,2,2019-02-28 22:39:19,2019-02-28 22:42:13,1,0.49,1,N,237,162,1,4.0,0.5,0.5,1.56,0.0,0.3,9.36,2.5 -2631,2,2019-02-01 00:55:00,2019-02-01 00:55:20,1,0.0,5,N,265,265,1,78.0,0.0,0.0,10.0,10.5,0.3,98.8,0.0 -4018,2,2019-02-11 16:59:03,2019-02-11 17:05:01,1,1.35,1,N,143,238,1,6.5,1.0,0.5,2.16,0.0,0.3,12.96,2.5 -7582,2,2019-02-08 17:29:00,2019-02-08 17:31:14,1,0.47,1,N,236,236,2,3.5,1.0,0.5,0.0,0.0,0.3,7.8,2.5 -1810,1,2019-02-17 13:10:12,2019-02-17 13:16:03,1,0.5,1,N,161,230,2,5.5,2.5,0.5,0.0,0.0,0.3,8.8,2.5 -5095,2,2019-02-09 14:30:05,2019-02-09 
14:42:22,0,1.18,1,N,170,234,1,9.0,0.0,0.5,2.46,0.0,0.3,14.76,2.5 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/green_tripdata.csv index 5104e10f24c5..c5f49aef4b67 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/green_tripdata.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/green_tripdata.csv @@ -4,18 +4,3 @@ 6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0 76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75 282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0 -439163,2,2019-03-23 12:07:55,2019-03-23 12:11:20,N,1,166,166,1,0.75,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1,0.0 -518619,2,2019-03-27 18:33:46,2019-03-27 18:40:59,N,1,7,223,1,0.59,6.0,1.0,0.5,1.95,0.0,,0.3,9.75,1,1,0.0 -385766,2,2019-03-20 18:19:37,2019-03-20 18:28:31,N,1,129,129,1,1.21,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,2,1,0.0 -131793,2,2019-03-07 18:40:23,2019-03-07 18:58:26,N,1,51,32,1,3.35,14.0,1.0,0.5,0.0,0.0,,0.3,15.8,1,1,0.0 -203472,2,2019-03-11 12:46:20,2019-03-11 13:20:08,N,1,119,254,3,6.47,27.5,0.0,0.5,0.0,0.0,,0.3,28.3,1,1,0.0 -399106,2,2019-03-21 13:10:44,2019-03-21 13:21:19,N,1,74,75,1,1.27,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 -425811,2,2019-03-22 19:00:23,2019-03-22 19:02:51,N,1,95,95,1,0.51,4.0,1.0,0.5,2.0,0.0,,0.3,7.8,1,1,0.0 -36345,2,2019-03-02 19:27:41,2019-03-02 19:33:52,N,1,7,179,1,0.86,6.0,0.0,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 -323601,2,2019-03-17 10:19:28,2019-03-17 10:22:59,N,1,97,49,1,0.68,4.5,0.0,0.5,1.06,0.0,,0.3,6.36,1,1,0.0 -246980,2,2019-03-13 17:03:07,2019-03-13 17:12:04,N,1,75,263,2,1.32,8.0,1.0,0.5,2.51,0.0,,0.3,15.06,1,1,2.75 -269737,1,2019-03-14 18:03:35,2019-03-14 18:10:01,N,1,66,65,1,0.6,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1,0.0 -145416,2,2019-03-08 
12:21:02,2019-03-08 12:22:57,N,1,75,75,1,0.5,3.5,0.0,0.5,0.0,0.0,,0.3,4.3,2,1,0.0 -142333,2,2019-03-08 09:39:09,2019-03-08 10:03:48,N,5,215,16,1,7.14,21.62,0.0,0.5,0.0,0.0,,0.0,22.12,1,2,0.0 -381567,2,2019-03-20 15:09:47,2019-03-20 15:22:12,N,1,89,188,1,1.75,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 -476898,1,2019-03-25 13:10:35,2019-03-25 13:20:23,N,1,244,243,1,1.1,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/yellow_tripdata.csv index 3d254ce261c2..085f0cc37265 100644 --- a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/yellow_tripdata.csv +++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/yellow_tripdata.csv @@ -4,18 +4,3 @@ 7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0 9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5 2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0 -5030,2,2019-03-13 20:22:28,2019-03-13 20:42:11,2,9.81,1,N,138,244,1,28.0,0.5,0.5,7.01,5.76,0.3,42.07,0.0 -7848,4,2019-03-21 13:22:08,2019-03-21 13:31:36,1,0.86,1,N,229,229,1,7.5,0.0,0.5,2.16,0.0,0.3,12.96,2.5 -167,1,2019-03-15 22:37:06,2019-03-15 22:38:34,1,0.0,1,N,264,145,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0 -7197,2,2019-03-14 20:38:43,2019-03-14 20:45:04,5,1.04,1,N,151,166,1,6.0,0.5,0.5,1.1,0.0,0.3,8.4,0.0 -568,1,2019-03-09 09:29:44,2019-03-09 09:40:36,0,2.7,1,N,24,142,1,10.5,2.5,0.5,2.75,0.0,0.3,16.55,2.5 -2198,2,2019-03-11 18:32:36,2019-03-11 18:48:27,1,2.02,1,N,234,162,2,11.5,1.0,0.5,0.0,0.0,0.3,15.8,2.5 -5249,1,2019-03-15 09:35:22,2019-03-15 09:46:33,1,2.4,1,N,140,161,1,10.0,2.5,0.5,1.33,0.0,0.3,14.63,2.5 -5531,1,2019-03-04 19:44:33,2019-03-04 20:14:34,0,15.8,1,N,132,145,1,43.5,1.0,0.5,11.3,0.0,0.3,56.6,0.0 -4943,1,2019-03-03 
15:12:19,2019-03-03 15:16:06,1,1.2,1,N,237,236,1,5.5,2.5,0.5,1.1,0.0,0.3,9.9,2.5 -617,2,2019-03-20 14:48:47,2019-03-20 14:52:43,1,0.59,1,N,142,163,1,4.5,0.0,0.5,1.0,0.0,0.3,8.8,2.5 -7014,2,2019-03-03 23:56:36,2019-03-04 00:22:25,1,7.91,1,N,132,258,2,25.0,0.5,0.5,0.0,0.0,0.3,26.3,0.0 -2138,2,2019-03-23 14:47:31,2019-03-23 14:54:11,2,1.12,1,N,234,161,2,6.5,0.0,0.5,0.0,0.0,0.3,9.8,2.5 -5298,2,2019-03-20 15:47:08,2019-03-20 15:51:15,1,0.54,1,N,161,161,1,4.5,0.0,0.5,1.56,0.0,0.3,9.36,2.5 -9838,1,2019-03-13 20:42:59,2019-03-13 20:50:23,1,1.1,1,N,162,48,1,6.5,3.0,0.5,2.05,0.0,0.3,12.35,2.5 -2565,1,2019-03-13 10:28:36,2019-03-13 10:28:38,1,0.0,1,N,246,246,2,2.5,2.5,0.5,0.0,0.0,0.3,5.8,2.5 diff --git a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-01.csv index 288e8ac8a023..14f45f9cf3bc 100644 --- a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-01.csv +++ b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-01.csv @@ -4,18 +4,3 @@ 2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0 5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95, 4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56, -1505,2,2019-01-29 17:58:08,2019-01-29 18:11:19,1,0.72,1,N,13,13,2,9.0,1.0,0.5,0.0,0.0,0.3,10.8,0.0 -8131,1,2019-01-13 16:22:03,2019-01-13 16:33:06,2,1.4,1,N,161,170,1,8.5,0.0,0.5,3.0,0.0,0.3,12.3, -9601,2,2019-01-23 12:07:55,2019-01-23 12:11:19,5,0.57,1,N,239,239,1,4.0,0.0,0.5,1.0,0.0,0.3,5.8,0.0 -4522,2,2019-01-25 15:42:39,2019-01-25 15:50:54,1,1.26,1,N,170,229,1,7.5,0.0,0.5,1.0,0.0,0.3,9.3,0.0 -9008,1,2019-01-13 19:40:27,2019-01-13 19:50:54,3,2.1,1,N,239,263,1,9.5,0.0,0.5,2.05,0.0,0.3,12.35, -7012,2,2019-01-19 20:44:35,2019-01-19 
21:05:13,1,3.16,1,N,142,107,1,14.5,0.5,0.5,3.16,0.0,0.3,18.96, -7316,2,2019-01-07 22:38:10,2019-01-07 22:44:31,2,2.4,1,N,233,263,1,8.5,0.5,0.5,1.47,0.0,0.3,11.27, -6384,2,2019-01-25 07:35:59,2019-01-25 07:57:44,1,5.12,1,N,143,231,1,18.5,0.0,0.5,2.5,0.0,0.3,21.8,0.0 -3652,1,2019-01-26 20:43:57,2019-01-26 21:03:51,1,4.4,1,N,13,230,2,17.0,0.5,0.5,0.0,0.0,0.3,18.3,0.0 -6409,2,2019-01-06 22:27:16,2019-01-06 22:33:58,1,1.52,1,N,161,234,2,7.0,0.5,0.5,0.0,0.0,0.3,8.3, -1130,2,2019-01-15 20:05:59,2019-01-15 20:22:11,1,3.96,1,N,233,41,2,14.5,0.5,0.5,0.0,0.0,0.3,15.8, -5618,2,2019-01-14 14:47:18,2019-01-14 14:54:44,1,0.5,1,N,140,237,1,6.0,0.0,0.5,1.0,0.0,0.3,7.8, -8515,1,2019-01-16 16:26:17,2019-01-16 16:41:36,2,1.4,1,N,233,163,1,10.5,1.0,0.5,2.45,0.0,0.3,14.75, -5397,1,2019-01-23 08:31:58,2019-01-23 08:55:43,1,2.6,1,N,246,163,1,16.0,0.0,0.5,5.0,0.0,0.3,21.8,0.0 -4648,1,2019-01-30 19:59:30,2019-01-30 20:10:34,1,1.9,1,N,186,79,1,9.0,0.5,0.5,2.05,0.0,0.3,12.35,0.0 diff --git a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-02.csv index 573017273621..dd5bcc804be2 100644 --- a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-02.csv +++ b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-02.csv @@ -4,18 +4,3 @@ 4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5 1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5 6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5 -1633,2,2019-02-09 12:14:12,2019-02-10 11:50:00,3,3.02,1,N,48,236,1,12.5,0.0,0.5,1.58,0.0,0.3,17.38,2.5 -4450,1,2019-02-05 08:56:21,2019-02-05 09:02:06,1,0.8,1,N,262,75,1,5.5,2.5,0.5,1.7,0.0,0.3,10.5,2.5 -8929,2,2019-02-13 11:44:00,2019-02-13 
11:57:32,1,1.13,1,N,100,170,2,9.0,0.0,0.5,0.0,0.0,0.3,12.3,2.5 -6968,2,2019-02-27 20:52:49,2019-02-27 21:18:37,1,4.64,1,N,246,125,1,18.5,0.5,0.5,4.46,0.0,0.3,26.76,2.5 -7969,2,2019-02-01 11:12:35,2019-02-01 11:19:39,1,0.54,1,N,236,263,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8,0.0 -7695,1,2019-02-28 00:51:50,2019-02-28 00:55:37,1,1.3,1,N,263,74,2,6.0,0.5,0.5,0.0,0.0,0.3,7.3,0.0 -9904,1,2019-02-16 17:45:00,2019-02-16 17:51:09,1,1.3,1,N,170,162,1,6.5,2.5,0.5,1.95,0.0,0.3,11.75,2.5 -1580,2,2019-02-07 10:14:39,2019-02-07 10:51:38,1,17.49,2,N,186,132,2,52.0,0.0,0.5,0.0,5.76,0.3,61.06,2.5 -699,2,2019-02-18 11:01:16,2019-02-18 11:11:06,2,2.31,1,N,249,170,1,10.0,0.0,0.5,2.66,0.0,0.3,15.96,2.5 -5181,2,2019-02-28 22:39:19,2019-02-28 22:42:13,1,0.49,1,N,237,162,1,4.0,0.5,0.5,1.56,0.0,0.3,9.36,2.5 -2631,2,2019-02-01 00:55:00,2019-02-01 00:55:20,1,0.0,5,N,265,265,1,78.0,0.0,0.0,10.0,10.5,0.3,98.8,0.0 -4018,2,2019-02-11 16:59:03,2019-02-11 17:05:01,1,1.35,1,N,143,238,1,6.5,1.0,0.5,2.16,0.0,0.3,12.96,2.5 -7582,2,2019-02-08 17:29:00,2019-02-08 17:31:14,1,0.47,1,N,236,236,2,3.5,1.0,0.5,0.0,0.0,0.3,7.8,2.5 -1810,1,2019-02-17 13:10:12,2019-02-17 13:16:03,1,0.5,1,N,161,230,2,5.5,2.5,0.5,0.0,0.0,0.3,8.8,2.5 -5095,2,2019-02-09 14:30:05,2019-02-09 14:42:22,0,1.18,1,N,170,234,1,9.0,0.0,0.5,2.46,0.0,0.3,14.76,2.5 diff --git a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-03.csv index 3d254ce261c2..085f0cc37265 100644 --- a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-03.csv +++ b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-03.csv @@ -4,18 +4,3 @@ 7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0 9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5 2927,2,2019-03-03 
19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0 -5030,2,2019-03-13 20:22:28,2019-03-13 20:42:11,2,9.81,1,N,138,244,1,28.0,0.5,0.5,7.01,5.76,0.3,42.07,0.0 -7848,4,2019-03-21 13:22:08,2019-03-21 13:31:36,1,0.86,1,N,229,229,1,7.5,0.0,0.5,2.16,0.0,0.3,12.96,2.5 -167,1,2019-03-15 22:37:06,2019-03-15 22:38:34,1,0.0,1,N,264,145,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0 -7197,2,2019-03-14 20:38:43,2019-03-14 20:45:04,5,1.04,1,N,151,166,1,6.0,0.5,0.5,1.1,0.0,0.3,8.4,0.0 -568,1,2019-03-09 09:29:44,2019-03-09 09:40:36,0,2.7,1,N,24,142,1,10.5,2.5,0.5,2.75,0.0,0.3,16.55,2.5 -2198,2,2019-03-11 18:32:36,2019-03-11 18:48:27,1,2.02,1,N,234,162,2,11.5,1.0,0.5,0.0,0.0,0.3,15.8,2.5 -5249,1,2019-03-15 09:35:22,2019-03-15 09:46:33,1,2.4,1,N,140,161,1,10.0,2.5,0.5,1.33,0.0,0.3,14.63,2.5 -5531,1,2019-03-04 19:44:33,2019-03-04 20:14:34,0,15.8,1,N,132,145,1,43.5,1.0,0.5,11.3,0.0,0.3,56.6,0.0 -4943,1,2019-03-03 15:12:19,2019-03-03 15:16:06,1,1.2,1,N,237,236,1,5.5,2.5,0.5,1.1,0.0,0.3,9.9,2.5 -617,2,2019-03-20 14:48:47,2019-03-20 14:52:43,1,0.59,1,N,142,163,1,4.5,0.0,0.5,1.0,0.0,0.3,8.8,2.5 -7014,2,2019-03-03 23:56:36,2019-03-04 00:22:25,1,7.91,1,N,132,258,2,25.0,0.5,0.5,0.0,0.0,0.3,26.3,0.0 -2138,2,2019-03-23 14:47:31,2019-03-23 14:54:11,2,1.12,1,N,234,161,2,6.5,0.0,0.5,0.0,0.0,0.3,9.8,2.5 -5298,2,2019-03-20 15:47:08,2019-03-20 15:51:15,1,0.54,1,N,161,161,1,4.5,0.0,0.5,1.56,0.0,0.3,9.36,2.5 -9838,1,2019-03-13 20:42:59,2019-03-13 20:50:23,1,1.1,1,N,162,48,1,6.5,3.0,0.5,2.05,0.0,0.3,12.35,2.5 -2565,1,2019-03-13 10:28:36,2019-03-13 10:28:38,1,0.0,1,N,246,246,2,2.5,2.5,0.5,0.0,0.0,0.3,5.8,2.5 diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-01.csv index 07a92dc26d64..1168d9f9ef35 100644 --- a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-01.csv +++ 
b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-01.csv @@ -4,18 +4,3 @@ 517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1, 368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1, 155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1, -366694,2,2019-01-18 22:58:57,2019-01-18 23:15:37,N,1,75,185,1,8.71,24.5,0.5,0.5,0.0,0.0,,0.3,25.8,1,1, -474972,2,2019-01-24 17:59:00,2019-01-24 18:35:53,N,5,195,90,1,6.46,23.45,0.0,0.5,0.0,0.0,,0.0,23.95,1,2, -69932,1,2019-01-04 18:07:02,2019-01-04 18:34:37,N,1,97,35,2,4.7,20.0,1.0,0.5,0.0,0.0,,0.3,21.8,1,1, -244782,2,2019-01-13 02:16:18,2019-01-13 02:37:30,N,1,260,198,1,3.9,16.5,0.5,0.5,0.0,0.0,,0.3,17.8,2,1, -482363,2,2019-01-25 04:42:54,2019-01-25 04:47:06,N,1,82,56,1,0.74,5.0,0.5,0.5,0.0,0.0,,0.3,6.3,2,1, -573895,2,2019-01-29 11:36:48,2019-01-29 11:43:03,N,1,43,236,1,0.91,6.0,0.0,0.5,2.0,0.0,,0.3,8.8,1,1,0.0 -182986,2,2019-01-10 12:07:54,2019-01-10 12:19:31,N,1,29,155,1,5.49,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, -526993,1,2019-01-26 22:13:18,2019-01-26 22:13:51,N,1,33,33,1,0.1,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1, -490940,2,2019-01-25 13:09:50,2019-01-25 13:15:30,N,1,220,153,1,0.35,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,1,1, -145635,2,2019-01-08 16:01:58,2019-01-08 16:10:16,N,1,77,72,1,1.48,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,1,1, -242083,1,2019-01-12 23:43:28,2019-01-13 00:02:13,N,1,210,216,1,12.3,34.5,0.5,0.5,0.0,0.0,,0.3,35.8,2,1, -328924,2,2019-01-17 10:07:09,2019-01-17 10:25:59,N,1,32,242,1,4.71,17.0,0.0,0.5,0.0,0.0,,0.3,17.8,1,1, -132260,1,2019-01-07 21:00:39,2019-01-07 21:18:13,N,1,52,188,1,3.6,14.5,0.5,0.5,0.0,0.0,,0.3,15.8,2,1, -568032,2,2019-01-29 07:01:03,2019-01-29 07:54:28,N,1,225,251,1,16.91,55.0,0.0,0.5,0.0,11.52,,0.3,67.32,1,1,0.0 -92205,2,2019-01-05 20:26:49,2019-01-05 20:44:48,N,1,29,108,1,3.93,15.5,0.5,0.5,0.0,0.0,,0.3,16.8,1,1, diff --git 
a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-02.csv index 9a6442e61899..b19b1f827f4f 100644 --- a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-02.csv +++ b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-02.csv @@ -4,18 +4,3 @@ 150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75 245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0 151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0 -18916,2,2019-02-01 19:54:28,2019-02-01 19:58:41,N,1,74,262,1,1.04,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,1,1,0.0 -110154,2,2019-02-06 08:33:30,2019-02-06 08:48:52,N,1,66,87,1,2.14,12.0,0.0,0.5,4.66,0.0,,0.3,20.21,1,1,2.75 -335114,2,2019-02-16 18:40:24,2019-02-16 18:46:47,N,1,74,75,1,0.89,6.0,0.0,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 -453776,2,2019-02-22 22:38:18,2019-02-22 22:49:20,N,1,116,41,1,1.86,10.0,0.5,0.5,2.26,0.0,,0.3,13.56,1,1,0.0 -192685,2,2019-02-09 19:56:13,2019-02-09 20:13:56,N,1,167,235,1,2.59,13.0,0.0,0.5,0.0,0.0,,0.3,13.8,1,1,0.0 -574982,2,2019-02-28 22:42:50,2019-02-28 22:47:40,N,1,42,116,1,0.91,5.5,0.5,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 -155855,2,2019-02-08 09:02:21,2019-02-08 10:17:23,N,5,210,161,1,20.74,59.86,0.0,0.5,0.0,0.0,,0.0,60.36,1,2,0.0 -130109,2,2019-02-07 05:18:25,2019-02-07 05:24:30,N,1,62,188,4,1.54,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,1,1,0.0 -157145,2,2019-02-08 10:20:47,2019-02-08 10:47:36,N,1,74,31,2,7.73,25.5,0.0,0.5,5.26,0.0,,0.3,31.56,1,1,0.0 -411748,2,2019-02-20 22:31:32,2019-02-20 22:37:11,N,1,129,129,4,0.9,5.5,0.5,0.5,0.0,0.0,,0.3,6.8,2,1,0.0 -470538,2,2019-02-23 18:15:38,2019-02-23 18:27:05,N,1,126,213,1,2.21,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,1,1,0.0 -262705,2,2019-02-13 13:07:02,2019-02-13 
13:19:50,N,1,186,249,1,1.5,9.5,0.0,0.5,2.56,0.0,,0.3,15.36,1,1,2.5 -163160,2,2019-02-08 15:39:26,2019-02-08 16:06:31,N,1,33,17,1,2.9,17.5,0.0,0.5,0.0,0.0,,0.3,18.3,2,1,0.0 -263174,2,2019-02-13 15:06:57,2019-02-13 15:28:26,N,1,165,123,5,1.66,13.5,0.0,0.5,0.0,0.0,,0.3,14.3,1,1,0.0 -451270,2,2019-02-22 21:00:04,2019-02-22 21:00:24,N,1,82,82,1,0.06,2.5,0.5,0.5,0.0,0.0,,0.3,3.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-03.csv index 5104e10f24c5..c5f49aef4b67 100644 --- a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-03.csv +++ b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-03.csv @@ -4,18 +4,3 @@ 6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0 76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75 282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0 -439163,2,2019-03-23 12:07:55,2019-03-23 12:11:20,N,1,166,166,1,0.75,5.0,0.0,0.5,0.0,0.0,,0.3,5.8,2,1,0.0 -518619,2,2019-03-27 18:33:46,2019-03-27 18:40:59,N,1,7,223,1,0.59,6.0,1.0,0.5,1.95,0.0,,0.3,9.75,1,1,0.0 -385766,2,2019-03-20 18:19:37,2019-03-20 18:28:31,N,1,129,129,1,1.21,7.5,1.0,0.5,0.0,0.0,,0.3,9.3,2,1,0.0 -131793,2,2019-03-07 18:40:23,2019-03-07 18:58:26,N,1,51,32,1,3.35,14.0,1.0,0.5,0.0,0.0,,0.3,15.8,1,1,0.0 -203472,2,2019-03-11 12:46:20,2019-03-11 13:20:08,N,1,119,254,3,6.47,27.5,0.0,0.5,0.0,0.0,,0.3,28.3,1,1,0.0 -399106,2,2019-03-21 13:10:44,2019-03-21 13:21:19,N,1,74,75,1,1.27,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 -425811,2,2019-03-22 19:00:23,2019-03-22 19:02:51,N,1,95,95,1,0.51,4.0,1.0,0.5,2.0,0.0,,0.3,7.8,1,1,0.0 -36345,2,2019-03-02 19:27:41,2019-03-02 
19:33:52,N,1,7,179,1,0.86,6.0,0.0,0.5,1.36,0.0,,0.3,8.16,1,1,0.0 -323601,2,2019-03-17 10:19:28,2019-03-17 10:22:59,N,1,97,49,1,0.68,4.5,0.0,0.5,1.06,0.0,,0.3,6.36,1,1,0.0 -246980,2,2019-03-13 17:03:07,2019-03-13 17:12:04,N,1,75,263,2,1.32,8.0,1.0,0.5,2.51,0.0,,0.3,15.06,1,1,2.75 -269737,1,2019-03-14 18:03:35,2019-03-14 18:10:01,N,1,66,65,1,0.6,6.0,1.0,0.5,0.0,0.0,,0.3,7.8,2,1,0.0 -145416,2,2019-03-08 12:21:02,2019-03-08 12:22:57,N,1,75,75,1,0.5,3.5,0.0,0.5,0.0,0.0,,0.3,4.3,2,1,0.0 -142333,2,2019-03-08 09:39:09,2019-03-08 10:03:48,N,5,215,16,1,7.14,21.62,0.0,0.5,0.0,0.0,,0.0,22.12,1,2,0.0 -381567,2,2019-03-20 15:09:47,2019-03-20 15:22:12,N,1,89,188,1,1.75,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0 -476898,1,2019-03-25 13:10:35,2019-03-25 13:20:23,N,1,244,243,1,1.1,8.0,0.0,0.5,0.0,0.0,,0.3,8.8,2,1,0.0 diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-01.csv index 288e8ac8a023..14f45f9cf3bc 100644 --- a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-01.csv +++ b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-01.csv @@ -4,18 +4,3 @@ 2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0 5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95, 4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56, -1505,2,2019-01-29 17:58:08,2019-01-29 18:11:19,1,0.72,1,N,13,13,2,9.0,1.0,0.5,0.0,0.0,0.3,10.8,0.0 -8131,1,2019-01-13 16:22:03,2019-01-13 16:33:06,2,1.4,1,N,161,170,1,8.5,0.0,0.5,3.0,0.0,0.3,12.3, -9601,2,2019-01-23 12:07:55,2019-01-23 12:11:19,5,0.57,1,N,239,239,1,4.0,0.0,0.5,1.0,0.0,0.3,5.8,0.0 -4522,2,2019-01-25 15:42:39,2019-01-25 15:50:54,1,1.26,1,N,170,229,1,7.5,0.0,0.5,1.0,0.0,0.3,9.3,0.0 -9008,1,2019-01-13 
19:40:27,2019-01-13 19:50:54,3,2.1,1,N,239,263,1,9.5,0.0,0.5,2.05,0.0,0.3,12.35, -7012,2,2019-01-19 20:44:35,2019-01-19 21:05:13,1,3.16,1,N,142,107,1,14.5,0.5,0.5,3.16,0.0,0.3,18.96, -7316,2,2019-01-07 22:38:10,2019-01-07 22:44:31,2,2.4,1,N,233,263,1,8.5,0.5,0.5,1.47,0.0,0.3,11.27, -6384,2,2019-01-25 07:35:59,2019-01-25 07:57:44,1,5.12,1,N,143,231,1,18.5,0.0,0.5,2.5,0.0,0.3,21.8,0.0 -3652,1,2019-01-26 20:43:57,2019-01-26 21:03:51,1,4.4,1,N,13,230,2,17.0,0.5,0.5,0.0,0.0,0.3,18.3,0.0 -6409,2,2019-01-06 22:27:16,2019-01-06 22:33:58,1,1.52,1,N,161,234,2,7.0,0.5,0.5,0.0,0.0,0.3,8.3, -1130,2,2019-01-15 20:05:59,2019-01-15 20:22:11,1,3.96,1,N,233,41,2,14.5,0.5,0.5,0.0,0.0,0.3,15.8, -5618,2,2019-01-14 14:47:18,2019-01-14 14:54:44,1,0.5,1,N,140,237,1,6.0,0.0,0.5,1.0,0.0,0.3,7.8, -8515,1,2019-01-16 16:26:17,2019-01-16 16:41:36,2,1.4,1,N,233,163,1,10.5,1.0,0.5,2.45,0.0,0.3,14.75, -5397,1,2019-01-23 08:31:58,2019-01-23 08:55:43,1,2.6,1,N,246,163,1,16.0,0.0,0.5,5.0,0.0,0.3,21.8,0.0 -4648,1,2019-01-30 19:59:30,2019-01-30 20:10:34,1,1.9,1,N,186,79,1,9.0,0.5,0.5,2.05,0.0,0.3,12.35,0.0 diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-02.csv index 573017273621..dd5bcc804be2 100644 --- a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-02.csv +++ b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-02.csv @@ -4,18 +4,3 @@ 4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5 1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5 6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5 -1633,2,2019-02-09 12:14:12,2019-02-10 11:50:00,3,3.02,1,N,48,236,1,12.5,0.0,0.5,1.58,0.0,0.3,17.38,2.5 -4450,1,2019-02-05 
08:56:21,2019-02-05 09:02:06,1,0.8,1,N,262,75,1,5.5,2.5,0.5,1.7,0.0,0.3,10.5,2.5 -8929,2,2019-02-13 11:44:00,2019-02-13 11:57:32,1,1.13,1,N,100,170,2,9.0,0.0,0.5,0.0,0.0,0.3,12.3,2.5 -6968,2,2019-02-27 20:52:49,2019-02-27 21:18:37,1,4.64,1,N,246,125,1,18.5,0.5,0.5,4.46,0.0,0.3,26.76,2.5 -7969,2,2019-02-01 11:12:35,2019-02-01 11:19:39,1,0.54,1,N,236,263,2,6.0,0.0,0.5,0.0,0.0,0.3,6.8,0.0 -7695,1,2019-02-28 00:51:50,2019-02-28 00:55:37,1,1.3,1,N,263,74,2,6.0,0.5,0.5,0.0,0.0,0.3,7.3,0.0 -9904,1,2019-02-16 17:45:00,2019-02-16 17:51:09,1,1.3,1,N,170,162,1,6.5,2.5,0.5,1.95,0.0,0.3,11.75,2.5 -1580,2,2019-02-07 10:14:39,2019-02-07 10:51:38,1,17.49,2,N,186,132,2,52.0,0.0,0.5,0.0,5.76,0.3,61.06,2.5 -699,2,2019-02-18 11:01:16,2019-02-18 11:11:06,2,2.31,1,N,249,170,1,10.0,0.0,0.5,2.66,0.0,0.3,15.96,2.5 -5181,2,2019-02-28 22:39:19,2019-02-28 22:42:13,1,0.49,1,N,237,162,1,4.0,0.5,0.5,1.56,0.0,0.3,9.36,2.5 -2631,2,2019-02-01 00:55:00,2019-02-01 00:55:20,1,0.0,5,N,265,265,1,78.0,0.0,0.0,10.0,10.5,0.3,98.8,0.0 -4018,2,2019-02-11 16:59:03,2019-02-11 17:05:01,1,1.35,1,N,143,238,1,6.5,1.0,0.5,2.16,0.0,0.3,12.96,2.5 -7582,2,2019-02-08 17:29:00,2019-02-08 17:31:14,1,0.47,1,N,236,236,2,3.5,1.0,0.5,0.0,0.0,0.3,7.8,2.5 -1810,1,2019-02-17 13:10:12,2019-02-17 13:16:03,1,0.5,1,N,161,230,2,5.5,2.5,0.5,0.0,0.0,0.3,8.8,2.5 -5095,2,2019-02-09 14:30:05,2019-02-09 14:42:22,0,1.18,1,N,170,234,1,9.0,0.0,0.5,2.46,0.0,0.3,14.76,2.5 diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-03.csv index 3d254ce261c2..085f0cc37265 100644 --- a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-03.csv +++ b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-03.csv @@ -4,18 +4,3 @@ 7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0 
9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5 2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0 -5030,2,2019-03-13 20:22:28,2019-03-13 20:42:11,2,9.81,1,N,138,244,1,28.0,0.5,0.5,7.01,5.76,0.3,42.07,0.0 -7848,4,2019-03-21 13:22:08,2019-03-21 13:31:36,1,0.86,1,N,229,229,1,7.5,0.0,0.5,2.16,0.0,0.3,12.96,2.5 -167,1,2019-03-15 22:37:06,2019-03-15 22:38:34,1,0.0,1,N,264,145,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0 -7197,2,2019-03-14 20:38:43,2019-03-14 20:45:04,5,1.04,1,N,151,166,1,6.0,0.5,0.5,1.1,0.0,0.3,8.4,0.0 -568,1,2019-03-09 09:29:44,2019-03-09 09:40:36,0,2.7,1,N,24,142,1,10.5,2.5,0.5,2.75,0.0,0.3,16.55,2.5 -2198,2,2019-03-11 18:32:36,2019-03-11 18:48:27,1,2.02,1,N,234,162,2,11.5,1.0,0.5,0.0,0.0,0.3,15.8,2.5 -5249,1,2019-03-15 09:35:22,2019-03-15 09:46:33,1,2.4,1,N,140,161,1,10.0,2.5,0.5,1.33,0.0,0.3,14.63,2.5 -5531,1,2019-03-04 19:44:33,2019-03-04 20:14:34,0,15.8,1,N,132,145,1,43.5,1.0,0.5,11.3,0.0,0.3,56.6,0.0 -4943,1,2019-03-03 15:12:19,2019-03-03 15:16:06,1,1.2,1,N,237,236,1,5.5,2.5,0.5,1.1,0.0,0.3,9.9,2.5 -617,2,2019-03-20 14:48:47,2019-03-20 14:52:43,1,0.59,1,N,142,163,1,4.5,0.0,0.5,1.0,0.0,0.3,8.8,2.5 -7014,2,2019-03-03 23:56:36,2019-03-04 00:22:25,1,7.91,1,N,132,258,2,25.0,0.5,0.5,0.0,0.0,0.3,26.3,0.0 -2138,2,2019-03-23 14:47:31,2019-03-23 14:54:11,2,1.12,1,N,234,161,2,6.5,0.0,0.5,0.0,0.0,0.3,9.8,2.5 -5298,2,2019-03-20 15:47:08,2019-03-20 15:51:15,1,0.54,1,N,161,161,1,4.5,0.0,0.5,1.56,0.0,0.3,9.36,2.5 -9838,1,2019-03-13 20:42:59,2019-03-13 20:50:23,1,1.1,1,N,162,48,1,6.5,3.0,0.5,2.05,0.0,0.3,12.35,2.5 -2565,1,2019-03-13 10:28:36,2019-03-13 10:28:38,1,0.0,1,N,246,246,2,2.5,2.5,0.5,0.0,0.0,0.3,5.8,2.5 From 36c1ad872f0c801ec8f42208016ae8a57a32158e Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 26 Oct 2021 10:13:22 -0400 Subject: [PATCH 46/62] Listing all available inferred and configured DataConnectors --- 
 .../how_to_choose_which_dataconnector_to_use.md | 13 +++++++++++--
 ...w_to_configure_a_configuredassetdataconnector.md | 2 +-
 2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md
index 26395f0f6412..e19bb25dbd50 100644
--- a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md
+++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md
@@ -16,8 +16,17 @@ This guide demonstrates how to choose which `DataConnector`s to configure within
 Great Expectations provides three types of `DataConnector` classes. Two classes are for connecting to Data Assets stored as file-system-like data (this includes files on disk, but also S3 object stores, etc) as well as relational database data:
-- An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. Examples of this type of `DataConnector` include `InferredAssetFilesystemDataConnector` and `InferredAssetS3DataConnector`.
-- A ConfiguredAssetDataConnector allows users to have the most fine-tuning, and requires an explicit listing of each Data Asset you want to connect to. Examples of this type of `DataConnector` include `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector`.
+- An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure.
+- A ConfiguredAssetDataConnector allows users to have the most fine-tuning, and requires an explicit listing of each Data Asset you want to connect to.
+
+| InferredAssetDataConnectors | ConfiguredAssetDataConnectors |
+| --- | --- |
+| InferredAssetFilesystemDataConnector | ConfiguredAssetFilesystemDataConnector |
+| InferredAssetFilePathDataConnector | ConfiguredAssetFilePathDataConnector |
+| InferredAssetAzureDataConnector | ConfiguredAssetAzureDataConnector |
+| InferredAssetGCSDataConnector | ConfiguredAssetGCSDataConnector |
+| InferredAssetS3DataConnector | ConfiguredAssetS3DataConnector |
+| InferredAssetSqlDataConnector | ConfiguredAssetSqlDataConnector |
 The third type of `DataConnector` class is for providing a batch's data directly at runtime:
diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md
index 1ffe7467aa34..677152dfaedb 100644
--- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md
+++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md
@@ -140,7 +140,7 @@ In that case, the configuration would look like the following:
 Notice that we have specified a pattern that captures the year-month combination after `yellow_tripdata_` in the filename and assigns it to the `group_name` `month`.
-The configuration would also work with a regex capturing the entire filename (ie `pattern: (.*)\\.csv`). However, capturing the month on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the Data Asset. For more information about capture groups, refer to the Python documentation on [regular expressions](https://docs.python.org/3/library/re.html#re.Match.group)
+The configuration would also work with a regex capturing the entire filename (ie `pattern: (.*)\\.csv`). However, capturing the month on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the Data Asset. For more information about capture groups, refer to the Python documentation on [regular expressions](https://docs.python.org/3/library/re.html#re.Match.group).
 Later on we could retrieve the data in `yellow_tripdata_2019-02.csv` of `yellow_tripdata` as its own batch using `context.get_validator()` by specifying `{"month": "2019-02"}` as the `batch_identifier`.
From 9620eb663adc448d159be20c6764f0009792dc08 Mon Sep 17 00:00:00 2001
From: Nathan Farmer
Date: Tue, 26 Oct 2021 10:14:47 -0400
Subject: [PATCH 47/62] Clean up
---
 .../how_to_choose_which_dataconnector_to_use.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md
index e19bb25dbd50..79ff6fd6b7c8 100644
--- a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md
+++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md
@@ -49,7 +49,7 @@ If you have the following `/` directory in your filesystem, and yo
 /green_tripdata/2019-03.csv
 ```
-This config...
+This configuration:
Date: Tue, 26 Oct 2021 10:47:46 -0400
Subject: [PATCH 48/62] Escape underscore
---
 .../how_to_configure_a_configuredassetdataconnector.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md
index 677152dfaedb..37a2d6b3035e 100644
--- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md
+++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md
@@ -140,7 +140,7 @@ In that case, the configuration would look like the following:
 Notice that we have specified a pattern that captures the year-month combination after `yellow_tripdata_` in the filename and assigns it to the `group_name` `month`.
-The configuration would also work with a regex capturing the entire filename (ie `pattern: (.*)\\.csv`). However, capturing the month on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the Data Asset. For more information about capture groups, refer to the Python documentation on [regular expressions](https://docs.python.org/3/library/re.html#re.Match.group).
+The configuration would also work with a regex capturing the entire filename (e.g. `pattern: (.*)\.csv`). However, capturing the month on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the Data Asset. For more information about capture groups, refer to the Python documentation on [regular expressions](https://docs.python.org/3/library/re.html#re.Match.group).
 Later on we could retrieve the data in `yellow_tripdata_2019-02.csv` of `yellow_tripdata` as its own batch using `context.get_validator()` by specifying `{"month": "2019-02"}` as the `batch_identifier`.
@@ -256,9 +256,9 @@ Available data_asset_names (1 of 1): Unmatched data_references (3 of 3):['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv'] ``` -Notice that `yellow_tripdata` has 0 data_references, and there are 3 `Unmatched data_references` listed. +Notice that `yellow_tripdata` has 0 `data_references`, and there are 3 `Unmatched data_references` listed. This would indicate that some part of the configuration is incorrect and would need to be reviewed. -In our case, changing `pattern` to : `yellow_tripdata_(.*)\\.csv` will fix our problem and give the same output to above. +In our case, changing `pattern` to `yellow_tripdata_(.*)\.csv` will fix our problem and give the same output as above. ### Example 2: Basic configuration with more than one Data Asset From 3e62c56101369df35f0911f5826fb17ef0f94b4f Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 26 Oct 2021 10:51:03 -0400 Subject: [PATCH 49/62] Clean up --- .../how_to_configure_an_inferredassetdataconnector.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 098ccecb9b0d..eeca5f5174ae 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -20,7 +20,7 @@ Great Expectations provides two types of `DataConnector` classes for connecting - A ConfiguredAssetDataConnector requires an explicit listing of each Data Asset you want to connect to. This allows more fine-tuning, but also requires more setup. - An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. -InferredAssetDataConnector has fewer options, so it's simpler to set up.
It’s a good choice if you want to connect to a single Data Asset, or several `Data Assets` that all share the same naming convention. +InferredAssetDataConnector has fewer options, so it's simpler to set up. It’s a good choice if you want to connect to a single Data Asset, or several Data Assets that all share the same naming convention. If you're not sure which one to use, please check out [How to choose which DataConnector to use](./how_to_choose_which_dataconnector_to_use.md). From fed71265107c2ce6f5c783231df8d39090b77c4e Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 26 Oct 2021 11:07:30 -0400 Subject: [PATCH 50/62] Links to test scripts provided at the bottom of each document --- .../how_to_choose_which_dataconnector_to_use.md | 3 +++ .../how_to_configure_a_configuredassetdataconnector.md | 4 ++++ .../how_to_configure_a_runtimedataconnector.md | 3 +++ .../how_to_configure_an_inferredassetdataconnector.md | 4 ++++ 4 files changed, 14 insertions(+) diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md index 79ff6fd6b7c8..3276a0a5bdce 100644 --- a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md +++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md @@ -137,3 +137,6 @@ Unmatched data_references (0 of 0):[] - Additional examples and configurations for `ConfiguredAssetFilesystemDataConnector`s can be found here: [How to configure a ConfiguredAssetDataConnector](./how_to_configure_a_configuredassetdataconnector.md) - Additional examples and configurations for `InferredAssetFilesystemDataConnector`s can be found here: [How to configure an InferredAssetDataConnector](./how_to_configure_an_inferredassetdataconnector.md) - Additional examples and configurations for `RuntimeDataConnector`s can be found here: [How to configure a 
RuntimeDataConnector](./how_to_configure_a_runtimedataconnector.md) + +To view the full script used in this page, see it on GitHub: +- [how_to_choose_which_dataconnector_to_use.py](https://github.com/great-expectations/great_expectations/tree/develop/tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 37a2d6b3035e..8e70ae60bca4 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -398,3 +398,7 @@ Available data_asset_names (2 of 2): Unmatched data_references (0 of 0):[] ``` + +### Additional Notes +To view the full script used in this page, see it on GitHub: +- [how_to_configure_a_configuredassetdataconnector.py](https://github.com/great-expectations/great_expectations/tree/develop/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md index 46fb6991202b..a5f98bd8105d 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md @@ -153,3 +153,6 @@ Next, you would pass that request into `context.get_validator`: ```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L94-L98 ``` +### Additional Notes +To view the full script used in this page, see it on GitHub: +- 
[how_to_configure_a_runtimedataconnector.py](https://github.com/great-expectations/great_expectations/tree/develop/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index eeca5f5174ae..fbe5ed72ab2c 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -439,3 +439,7 @@ Available data_asset_names (2 of 2): Unmatched data_references (0 of 0):[] ``` + +### Additional Notes +To view the full script used in this page, see it on GitHub: +- [how_to_configure_an_inferredassetdataconnector.py](https://github.com/great-expectations/great_expectations/tree/develop/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py) From 6d4073d8c428dbca34b73e9c4fc53cf144ab0201 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 26 Oct 2021 13:28:08 -0400 Subject: [PATCH 51/62] Data Assets and data_references clarification --- .../how_to_choose_which_dataconnector_to_use.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md index 3276a0a5bdce..4174e8b3d701 100644 --- a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md +++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md @@ -28,6 +28,8 @@ Great Expectations provides three types of `DataConnector` classes. 
Two classes | InferredAssetS3DataConnector | ConfiguredAssetS3DataConnector | | InferredAssetSqlDataConnector | ConfiguredAssetSqlDataConnector | +InferredAssetDataConnectors and ConfiguredAssetDataConnectors are used to define Data Assets and their associated data_references. A Data Asset is an abstraction that can consist of one or more data_references to CSVs or relational database tables. + The third type of `DataConnector` class is for providing a batch's data directly at runtime: - A `RuntimeDataConnector` enables you to use a `RuntimeBatchRequest` to wrap either an in-memory dataframe, filepath, or SQL query, and must include batch identifiers that uniquely identify the data (e.g. a `run_id` from an AirFlow DAG run). @@ -38,7 +40,7 @@ If you aren't sure which type of the remaining `DataConnector`s to use, the foll ### When to use an InferredAssetDataConnector -If you have the following `/` directory in your filesystem, and you want to treat the `yellow_tripdata_*.csv` files as batches within the `yellow_tripdata` Data Asset, and do the same for files in the `green_tripdata` directory: +If you have the following `/` directory in your filesystem, and you want to treat the `yellow_tripdata_*.csv` files as batches within the `yellow_tripdata` Data Asset, and also do the same for files in the `green_tripdata` directory: ``` /yellow_tripdata/yellow_tripdata_2019-01.csv @@ -99,7 +101,7 @@ If you have the same `/` directory in your filesystem, /green_tripdata/2019-03.csv ``` -Then this config... 
+Then this configuration: Date: Wed, 27 Oct 2021 09:18:18 -0400 Subject: [PATCH 52/62] Minor revisions --- ...ow_to_choose_which_dataconnector_to_use.md | 4 ++-- ...onfigure_a_configuredassetdataconnector.md | 20 +++++++++-------- ...configure_an_inferredassetdataconnector.md | 22 +++++++++---------- ...configure_an_inferredassetdataconnector.py | 4 ++-- 4 files changed, 26 insertions(+), 24 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md index 4174e8b3d701..fcc52edcc99b 100644 --- a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md +++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md @@ -88,7 +88,7 @@ Note that the `InferredAssetFileSystemDataConnector` **infers** `data_asset_name ### When to use a ConfiguredAssetDataConnector -On the other hand, `ConfiguredAssetFilesSystemDataConnector` requires an explicit listing of each Data Asset you want to connect to. This tends to be helpful when the naming conventions for your Data Assets are less standardized. +On the other hand, `ConfiguredAssetFilesSystemDataConnector` requires an explicit listing of each Data Asset you want to connect to. This tends to be helpful when the naming conventions for your Data Assets are less standardized, but the user has a strong understanding of the semantics governing the segmentation of data (files, database tables). 
+ If you have the same `/` directory in your filesystem, @@ -124,7 +124,7 @@ Then this configuration: -...will make available the following Data Assets and data_references: +will make available the following Data Assets and data_references: ```bash Available data_asset_names (2 of 2): diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 8e70ae60bca4..5a845ca7d8a8 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -185,7 +185,7 @@ Continuing the example above, imagine you have the following files in the direct /yellow_tripdata_2019-03.csv ``` -Then this configuration... +Then this configuration: -...will make available `yelow_tripdata` as a single Data Asset with the following data_references: +will make available `yellow_tripdata` as a single Data Asset with the following data_references: ```bash Available data_asset_names (1 of 1): @@ -224,7 +224,7 @@ Once configured, you can get a `Validator` from the `Data Context` as follows: But what if the regex does not match any files in the directory? -Then this configuration... +Then this configuration: -...will give you this output +will give you this output: ```bash Available data_asset_names (1 of 1): @@ -276,7 +276,7 @@ Here’s a similar example, but this time two Data Assets are mixed together in /green_tripdata_2019-03.csv ``` -Then this configuration...
+Then this configuration: -...will now make `yellow_tripdata` and `green_tripdata` both available as Data Assets, with the following data_references: +will now make `yellow_tripdata` and `green_tripdata` both available as Data Assets, with the following data_references: ```bash Available data_asset_names (2 of 2): @@ -322,6 +322,8 @@ In the following example, files are placed folders that match the `data_asset_na /green_tripdata/2019-03.csv ``` +The following configuration: + -...will now make `yellow_tripdata` and `green_tripdata` available a Data Assets, with the following data_references: +will now make `yellow_tripdata` and `green_tripdata` available as Data Assets, with the following data_references: ```bash Available data_asset_names (2 of 2): @@ -366,7 +368,7 @@ In this example, the assets `yellow_tripdata` and `green_tripdata` are being exp /green_tripdata/green_tripdata_2019-03.csv ``` -The following configuration... +The following configuration: -...will make `yellow_tripdata` and `green_tripdata` available as Data Assets, with the following data_references: +will make `yellow_tripdata` and `green_tripdata` available as Data Assets, with the following data_references: ```bash Available data_asset_names (2 of 2): diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index fbe5ed72ab2c..3e5f22edc97c 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -141,7 +141,7 @@ The simplest approach would be to consider each file to be its own Data Asset. I -Notice that the `default_regex` is configured to have one capture group (`(.*)`) which captures the entire filename. That capture group is assigned to `data_asset_name` under `group_names`.
+Notice that the `default_regex` is configured to have one capture group (`(.*)`) which captures the entire filename. That capture group is assigned to `data_asset_name` under `group_names`. For InferredAssetDataConnectors, `data_asset_name` is a required `group_name`, and its associated capture group is the way each `data_asset_name` is inferred. Running `test_yaml_config()` would result in 3 Data Assets : `yellow_tripdata_2019-01`, `yellow_tripdata_2019-02` and `yellow_tripdata_2019-03`. However, a closer look at the filenames reveals a pattern that is common to the 3 files. Each have `yellow_tripdata_` in the name, and have date information afterwards. These are the types of patterns that InferredAssetDataConnectors allow you to take advantage of. @@ -209,7 +209,7 @@ Continuing the example above, imagine you have the following files in the direct /yellow_tripdata_2019-03.csv ``` -Then this configuration... +Then this configuration: -...will make available `yelow_tripdata` as a single Data Asset with the following data_references: +will make available `yellow_tripdata` as a single Data Asset with the following data_references: ```bash Available data_asset_names (1 of 1): @@ -262,7 +262,7 @@ in [How to configure a ConfiguredAssetDataConnector](./how_to_configure_a_config /green_tripdata_2019-03.csv ``` -The same configuration as Example 1... +The same configuration as Example 1: -...will now make `yellow_tripdata` and `green_tripdata` both available as Data Assets, with the following data_references: +will now make `yellow_tripdata` and `green_tripdata` both available as Data Assets, with the following data_references: ```bash Available data_asset_names (2 of 2): @@ -298,7 +298,7 @@ Unmatched data_references (0 of 0): [] ### Example 3: Nested directory structure with the data_asset_name on the inside -Here’s a similar example, with a nested directory structure...
+Here’s a similar example, with a nested directory structure: ``` /2018/10/yellow_tripdata.csv @@ -315,7 +315,7 @@ Here’s a similar example, with a nested directory structure... /2019/03/green_tripdata.csv ``` -Then this configuration... +Then this configuration: -...will now make `yellow_tripdata` and `green_tripdata` both available as Data Assets, with the following data_references: +will now make `yellow_tripdata` and `green_tripdata` both available as Data Assets, with the following data_references: ```bash Available data_asset_names (2 of 2): @@ -361,7 +361,7 @@ In the following example, files are placed in a folder structure with the `data_ /green_tripdata/2019-03.csv ``` -Then this configuration... +Then this configuration: -...will now make `yellow_tripdata` and `green_tripdata` into Data Assets, with each containing 3 data_references +will now make `yellow_tripdata` and `green_tripdata` into Data Assets, with each containing 3 data_references ```bash Available data_asset_names (2 of 2): @@ -407,7 +407,7 @@ In the following example, files are placed in a folder structure with the `data_ /green_tripdata/2019-03.csv ``` -Then this configuration... +Then this configuration: Date: Wed, 27 Oct 2021 09:23:54 -0400 Subject: [PATCH 53/62] Example for data_references --- .../how_to_choose_which_dataconnector_to_use.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md index fcc52edcc99b..fefa14c2e512 100644 --- a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md +++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md @@ -28,7 +28,7 @@ Great Expectations provides three types of `DataConnector` classes. 
Two classes | InferredAssetS3DataConnector | ConfiguredAssetS3DataConnector | | InferredAssetSqlDataConnector | ConfiguredAssetSqlDataConnector | -InferredAssetDataConnectors and ConfiguredAssetDataConnectors are used to define Data Assets and their associated data_references. A Data Asset is an abstraction that can consist of one or more data_references to CSVs or relational database tables. +InferredAssetDataConnectors and ConfiguredAssetDataConnectors are used to define Data Assets and their associated data_references. A Data Asset is an abstraction that can consist of one or more data_references to CSVs or relational database tables. For instance, you might have a `yellow_tripdata` Data Asset containing information about taxi rides, which consists of twelve data_references to twelve CSVs, each consisting of one month of data. The third type of `DataConnector` class is for providing a batch's data directly at runtime: From b9accbc227186e70dc5eab45b5b3701b2a709636 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 27 Oct 2021 09:45:50 -0400 Subject: [PATCH 54/62] Update glob_directive docstrings --- .../configured_asset_filesystem_data_connector.py | 2 +- .../data_connector/inferred_asset_filesystem_data_connector.py | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/great_expectations/datasource/data_connector/configured_asset_filesystem_data_connector.py b/great_expectations/datasource/data_connector/configured_asset_filesystem_data_connector.py index 188b8bc8d822..4a275aac0f57 100644 --- a/great_expectations/datasource/data_connector/configured_asset_filesystem_data_connector.py +++ b/great_expectations/datasource/data_connector/configured_asset_filesystem_data_connector.py @@ -49,7 +49,7 @@ def __init__( assets (dict): configured assets as a dictionary. 
These can each have their own regex and sorters execution_engine (ExecutionEngine): ExecutionEngine object to actually read the data default_regex (dict): Optional dict the filter and organize the data_references. - glob_directive (str): glob for selecting files in directory (defaults to *) + glob_directive (str): glob for selecting files in directory (defaults to **/*) or nested directories (e.g. */*/*.csv) sorters (list): Optional list if you want to sort the data_references batch_spec_passthrough (dict): dictionary with keys that will be added directly to batch_spec diff --git a/great_expectations/datasource/data_connector/inferred_asset_filesystem_data_connector.py b/great_expectations/datasource/data_connector/inferred_asset_filesystem_data_connector.py index f75388131fa6..be8e01e54cf9 100644 --- a/great_expectations/datasource/data_connector/inferred_asset_filesystem_data_connector.py +++ b/great_expectations/datasource/data_connector/inferred_asset_filesystem_data_connector.py @@ -48,6 +48,7 @@ def __init__( base_directory(str): base_directory for DataConnector to begin reading files execution_engine (ExecutionEngine): ExecutionEngine object to actually read the data default_regex (dict): Optional dict the filter and organize the data_references. + glob_directive (str): glob for selecting files in directory (defaults to *) or nested directories (e.g. 
*/*.csv) sorters (list): Optional list if you want to sort the data_references batch_spec_passthrough (dict): dictionary with keys that will be added directly to batch_spec """ From 4ea85cc4a74d3cc590072ceba88670b5096ef4cb Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 27 Oct 2021 10:10:04 -0400 Subject: [PATCH 55/62] Explain what glob_directive does in this context --- .../how_to_choose_which_dataconnector_to_use.md | 4 +++- .../how_to_configure_an_inferredassetdataconnector.md | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md index fefa14c2e512..dd6fa4bd2a05 100644 --- a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md +++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md @@ -74,7 +74,7 @@ This configuration: -...will make available the following Data Assets and data_references: +will make available the following Data Assets and data_references: ```bash Available data_asset_names (2 of 2): @@ -86,6 +86,8 @@ Unmatched data_references (0 of 0):[] Note that the `InferredAssetFileSystemDataConnector` **infers** `data_asset_names` **from the regex you provide.** This is the key difference between InferredAssetDataConnector and ConfiguredAssetDataConnector, and also requires that one of the `group_names` in the `default_regex` configuration be `data_asset_name`. +The `glob_directive` is provided to give the `DataConnector` information about the directory structure to expect for each Data Asset. The default `glob_directive` for the `InferredAssetFileSystemDataConnector` is `"*"` and therefore must be overridden when your data_references exist in subdirectories. 
+ ### When to use a ConfiguredAssetDataConnector On the other hand, `ConfiguredAssetFilesSystemDataConnector` requires an explicit listing of each Data Asset you want to connect to. This tends to be helpful when the naming conventions for your Data Assets are less standardized, but the user has a strong understanding of the semantics governing the segmentation of data (files, database tables). diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 3e5f22edc97c..27d0329e23f8 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -348,6 +348,8 @@ Available data_asset_names (2 of 2): Unmatched data_references (0 of 0):[] ``` +The `glob_directive` is provided to give the `DataConnector` information about the directory structure to expect for each Data Asset. The default `glob_directive` for the `InferredAssetFileSystemDataConnector` is `"*"` and therefore must be overridden when your data_references exist in subdirectories. 
+ ### Example 4: Nested directory structure with the data_asset_name on the outside In the following example, files are placed in a folder structure with the `data_asset_name` defined by the folder name (`yellow_tripdata` or `green_tripdata`) From c977227af8ea431a5b6b4696d3d81573acff0be7 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 27 Oct 2021 10:13:55 -0400 Subject: [PATCH 56/62] Better explanation for ConfiguredAssetDataConnector --- .../how_to_configure_an_inferredassetdataconnector.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 27d0329e23f8..244e28d8470d 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -17,7 +17,7 @@ can use for configuration. Great Expectations provides two types of `DataConnector` classes for connecting to Data Assets stored as file-system-like data (this includes files on disk, but also S3 object stores, etc) as well as relational database data: -- A ConfiguredAssetDataConnector requires an explicit listing of each Data Asset you want to connect to. This allows more fine-tuning, but also requires more setup. +- A ConfiguredAssetDataConnector allows you to specify that you have multiple Data Assets in a `Datasource`, but also requires an explicit listing of each Data Asset you want to connect to. This allows more fine-tuning, but also requires more setup. - An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure. InferredAssetDataConnector has fewer options, so it's simpler to set up. 
It’s a good choice if you want to connect to a single Data Asset, or several Data Assets that all share the same naming convention. From 3630de2c9d87543c7b46283e672544c7649fa6ea Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Wed, 27 Oct 2021 10:18:08 -0400 Subject: [PATCH 57/62] Also add clarification for what we mean by Data Asset to how_to_configure_a_configuredassetdataconnector.md --- .../how_to_configure_a_configuredassetdataconnector.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md index 5a845ca7d8a8..9a3236c44b7b 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md @@ -102,11 +102,9 @@ If you’re not familiar with the `test_yaml_config` method, please check out: [ ### 3. Add a ConfiguredAssetDataConnector to a Datasource configuration -ConfiguredAssetDataConnectors like `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector` require Data Assets to be -explicitly named. Each Data Asset can have their own regex `pattern` and `group_names`, and if configured, will override any -`pattern` or `group_names` under `default_regex`. +ConfiguredAssetDataConnectors like `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector` require Data Assets to be explicitly named. A Data Asset is an abstraction that can consist of one or more data_references to CSVs or relational database tables. For instance, you might have a `yellow_tripdata` Data Asset containing information about taxi rides, which consists of twelve data_references to twelve CSVs, each consisting of one month of data.
Each Data Asset can have their own regex `pattern` and `group_names`, and if configured, will override any `pattern` or `group_names` under `default_regex`. -Imagine you have the following files in `my_directory/`: +Imagine you have the following files in `/`: ``` /yellow_tripdata_2019-01.csv From d21bfe81f8d188a57e05fe67863b8dbf800990c4 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 2 Nov 2021 09:19:30 -0400 Subject: [PATCH 58/62] Example for loading a specific batch with batch_identifiers --- ...configure_an_inferredassetdataconnector.md | 53 ++++++++++++++++--- ...configure_an_inferredassetdataconnector.py | 21 ++++++++ 2 files changed, 68 insertions(+), 6 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 244e28d8470d..d3cf6bc09ec7 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -246,6 +246,47 @@ Once configured, you can get `Validators` from the `Data Context` as follows: ```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L294-L303 ``` +Since this `BatchRequest` does not specify which data_reference to load, the `ActiveBatch` for the validator will be the last data_reference that was loaded. In this case, `yellow_tripdata_2019-03.csv` is what is being used by `validator`. We can verify this with:
We can verfiy this with: + +```python +print(validator.active_batch_definition) +``` + +which outputs: +```bash +{ + "datasource_name": "taxi_datasource", + "data_connector_name": "default_inferred_data_connector_name", + "data_asset_name": "yellow_tripdata", + "batch_identifiers": { + "year": "2019", + "month": "03" + } +} +``` + +Notice that the `batch_identifiers` for this `batch_definition` specify `"year": "2019", "month": "03"`. The parameter `batch_identifiers` can be used in our `BatchRequest` to return the data_reference CSV of our choosing using the `group_names` defined in our `DataConnector`: + +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L308-L320 +``` + +```python +print(validator.active_batch_definition) +``` + +which outputs: +```bash +{ + "datasource_name": "taxi_datasource", + "data_connector_name": "default_inferred_data_connector_name", + "data_asset_name": "yellow_tripdata", + "batch_identifiers": { + "year": "2019", + "month": "02" + } +} +``` + ### Example 2: Basic configuration with more than one Data Asset Here’s a similar example, but this time two Data Assets are mixed together in one folder. 
@@ -326,13 +367,13 @@ Then this configuration: ]}> -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L306-L323 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L327-L345 ``` -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L334-L352 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L356-L375 ``` @@ -374,13 +415,13 @@ Then this configuration: ]}> -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L384-L403 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L405-L424 ``` -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L414-L438 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L435-L459 ``` @@ -420,13 +461,13 @@ Then this configuration: ]}> -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L468-L486 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L489-L507 ``` -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L497-L520 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L518-L541 ``` diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py 
b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index 436d7dadea53..c1566211380a 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -302,6 +302,27 @@ create_expectation_suite_with_name="", ) +# NOTE: The following code is only for testing and can be ignored by users. +assert isinstance(validator, ge.validator.validator.Validator) + +batch_request = BatchRequest( + datasource_name="taxi_datasource", + data_connector_name="default_inferred_data_connector_name", + data_asset_name="yellow_tripdata", + data_connector_query={ + "batch_filter_parameters": {"year": "2019", "month": "02"} + }, +) + +validator = context.get_validator( + batch_request=batch_request, + expectation_suite_name="", +) + +# NOTE: The following code is only for testing and can be ignored by users. 
+assert isinstance(validator, ge.validator.validator.Validator) +assert validator.active_batch_definition.batch_identifiers == {"year": "2019", "month": "02"} + # YAML datasource_yaml = """ name: taxi_datasource From 2b1086c83b1a15a8087d5621c1f4c6d48487fcb6 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 2 Nov 2021 09:23:04 -0400 Subject: [PATCH 59/62] Linting --- .../how_to_configure_an_inferredassetdataconnector.py | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py index c1566211380a..02ed1bc20605 100644 --- a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py +++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py @@ -309,9 +309,7 @@ datasource_name="taxi_datasource", data_connector_name="default_inferred_data_connector_name", data_asset_name="yellow_tripdata", - data_connector_query={ - "batch_filter_parameters": {"year": "2019", "month": "02"} - }, + data_connector_query={"batch_filter_parameters": {"year": "2019", "month": "02"}}, ) validator = context.get_validator( @@ -321,7 +319,10 @@ # NOTE: The following code is only for testing and can be ignored by users. 
assert isinstance(validator, ge.validator.validator.Validator) -assert validator.active_batch_definition.batch_identifiers == {"year": "2019", "month": "02"} +assert validator.active_batch_definition.batch_identifiers == { + "year": "2019", + "month": "02", +} # YAML datasource_yaml = """ From 0b79069b12282cd7bbf872504468abc0915eb467 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 2 Nov 2021 09:25:14 -0400 Subject: [PATCH 60/62] Re-align line numbers after lint --- ...how_to_configure_an_inferredassetdataconnector.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index d3cf6bc09ec7..6f777d772992 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -367,13 +367,13 @@ Then this configuration: ]}> -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L327-L345 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L328-L346 ``` -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L356-L375 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L357-L376 ``` @@ -415,13 +415,13 @@ Then this configuration: ]}> -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L405-L424 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L406-L425 ``` -```python 
file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L435-L459 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L436-L460 ``` @@ -461,13 +461,13 @@ Then this configuration: ]}> -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L489-L507 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L490-L508 ``` -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L518-L541 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L519-L542 ``` From 648f64d3482529464a19dc86b64b73f755d4e3c3 Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 2 Nov 2021 09:28:23 -0400 Subject: [PATCH 61/62] Batching core concepts link --- .../how_to_configure_an_inferredassetdataconnector.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index 6f777d772992..febfec680181 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -287,6 +287,9 @@ which outputs: } ``` +This ability to access specific Batches using `batch_identifiers` is very useful when validating Data Assets that span multiple files. +For more information on `batches` and `batch_identifiers`, please refer to the [Core Concepts document](../../reference/dividing_data_assets_into_batches.md). 
+ ### Example 2: Basic configuration with more than one Data Asset Here’s a similar example, but this time two Data Assets are mixed together in one folder. From b78069127bbebf15eb8d68bcc9396388eaf5ef7b Mon Sep 17 00:00:00 2001 From: Nathan Farmer Date: Tue, 2 Nov 2021 09:53:06 -0400 Subject: [PATCH 62/62] Clean up --- .../how_to_configure_an_inferredassetdataconnector.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md index febfec680181..b249d50864a3 100644 --- a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md +++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md @@ -252,7 +252,7 @@ Since this `BatchRequest` does not specify which data_reference to load, the `Ac print(validator.active_batch_definition) ``` -which outputs: +which prints: ```bash { "datasource_name": "taxi_datasource", @@ -267,14 +267,14 @@ which outputs: Notice that the `batch_identifiers` for this `batch_definition` specify `"year": "2019", "month": "03"`. The parameter `batch_identifiers` can be used in our `BatchRequest` to return the data_reference CSV of our choosing using the `group_names` defined in our `DataConnector`: -```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L308-L320 +```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L308-L318 ``` ```python print(validator.active_batch_definition) ``` -which outputs: +which prints: ```bash { "datasource_name": "taxi_datasource",