diff --git a/docs/changelog.md b/docs/changelog.md
index 108eed43e149..e150fe732b72 100644
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -3,6 +3,7 @@ title: Changelog
---
### Develop
+* [DOCS] Choosing and configuring DataConnectors (#3533)
### 0.13.40
* [FEATURE] Retrieve data context config through Cloud API endpoint #3586
@@ -56,7 +57,6 @@ title: Changelog
* [MAINTENANCE] Content and test script update (#3532)
* [MAINTENANCE] Provide Deprecation Notice for the "parse_strings_as_datetimes" Expectation Parameter in V3 (#3539)
-
### 0.13.37
* [FEATURE] Implement CompoundColumnsUnique metric for SqlAlchemyExecutionEngine (#3477)
* [FEATURE] add get_available_data_asset_names_and_types (#3476)
diff --git a/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md
new file mode 100644
index 000000000000..dd6fa4bd2a05
--- /dev/null
+++ b/docs/guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.md
@@ -0,0 +1,146 @@
+---
+title: How to choose which DataConnector to use
+---
+import Prerequisites from '../connecting_to_your_data/components/prerequisites.jsx'
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+This guide demonstrates how to choose which `DataConnector`s to configure within your `Datasource`s.
+
+
+
+- [Understand the basics of Datasources in the V3 (Batch Request) API](../../reference/datasources.md)
+- Learned how to configure a [Data Context using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md)
+
+
+
+Great Expectations provides three types of `DataConnector` classes. Two of them connect to Data Assets stored as file-system-like data (this includes files on disk, but also S3 object stores, etc.) or in relational databases:
+
+- An `InferredAssetDataConnector` infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure.
+- A `ConfiguredAssetDataConnector` gives users the most fine-grained control, and requires an explicit listing of each Data Asset you want to connect to.
+
+| InferredAssetDataConnectors | ConfiguredAssetDataConnectors |
+| --- | --- |
+| InferredAssetFilesystemDataConnector | ConfiguredAssetFilesystemDataConnector |
+| InferredAssetFilePathDataConnector | ConfiguredAssetFilePathDataConnector |
+| InferredAssetAzureDataConnector | ConfiguredAssetAzureDataConnector |
+| InferredAssetGCSDataConnector | ConfiguredAssetGCSDataConnector |
+| InferredAssetS3DataConnector | ConfiguredAssetS3DataConnector |
+| InferredAssetSqlDataConnector | ConfiguredAssetSqlDataConnector |
+
+InferredAssetDataConnectors and ConfiguredAssetDataConnectors are used to define Data Assets and their associated data_references. A Data Asset is an abstraction that can consist of one or more data_references to CSVs or relational database tables. For instance, you might have a `yellow_tripdata` Data Asset containing information about taxi rides, which consists of twelve data_references to twelve CSVs, each consisting of one month of data.
+
+The third type of `DataConnector` class is for providing a batch's data directly at runtime:
+
+- A `RuntimeDataConnector` enables you to use a `RuntimeBatchRequest` to wrap an in-memory dataframe, a filepath, or a SQL query, and must include batch identifiers that uniquely identify the data (e.g. a `run_id` from an Airflow DAG run).
+
+If you know, for example, that your Pipeline Runner will already have your batch data in memory at runtime, you can choose to configure a `RuntimeDataConnector` with unique batch identifiers. See [How to configure a RuntimeDataConnector](./how_to_configure_a_runtimedataconnector.md) and [How to create a Batch of data from an in-memory Spark or Pandas dataframe](./how_to_create_a_batch_of_data_from_an_in_memory_spark_or_pandas_dataframe.md) to get started with `RuntimeDataConnector`s.
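+
+For illustration only, here is a minimal sketch of a `RuntimeBatchRequest` wrapping an in-memory Pandas dataframe. The datasource and data connector names are assumptions; use the ones from your own configuration:
+
+```python
+import pandas as pd
+
+from great_expectations.core.batch import RuntimeBatchRequest
+
+# "taxi_datasource" and "default_runtime_data_connector_name" are hypothetical
+# names. The batch_identifiers keys must match the identifiers declared in the
+# RuntimeDataConnector's configuration.
+batch_request = RuntimeBatchRequest(
+    datasource_name="taxi_datasource",
+    data_connector_name="default_runtime_data_connector_name",
+    data_asset_name="taxi_data_in_memory",  # an arbitrary name chosen at runtime
+    runtime_parameters={"batch_data": pd.DataFrame({"fare_amount": [7.5, 12.0]})},
+    batch_identifiers={"run_id": "2021_11_01_run"},
+)
+```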
+
+If you aren't sure which of the remaining `DataConnector` types to use, the following examples demonstrate the difference using the two classes designed to connect to files on disk: `InferredAssetFilesystemDataConnector` and `ConfiguredAssetFilesystemDataConnector`.
+
+### When to use an InferredAssetDataConnector
+
+If you have the following `<base_directory>/` directory in your filesystem, and you want to treat the `yellow_tripdata_*.csv` files as batches within the `yellow_tripdata` Data Asset, and do the same for files in the `green_tripdata` directory:
+
+```
+/yellow_tripdata/yellow_tripdata_2019-01.csv
+/yellow_tripdata/yellow_tripdata_2019-02.csv
+/yellow_tripdata/yellow_tripdata_2019-03.csv
+/green_tripdata/2019-01.csv
+/green_tripdata/2019-02.csv
+/green_tripdata/2019-03.csv
+```
+
+This configuration:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py#L8-L26
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py#L37-L60
+```
+
+
+
+
+will make available the following Data Assets and data_references:
+
+```bash
+Available data_asset_names (2 of 2):
+ green_tripdata (3 of 3): ['green_tripdata/*2019-01.csv', 'green_tripdata/*2019-02.csv', 'green_tripdata/*2019-03.csv']
+ yellow_tripdata (3 of 3): ['yellow_tripdata/*2019-01.csv', 'yellow_tripdata/*2019-02.csv', 'yellow_tripdata/*2019-03.csv']
+
+Unmatched data_references (0 of 0):[]
+```
+
+Note that the `InferredAssetFilesystemDataConnector` **infers** `data_asset_names` **from the regex you provide.** This is the key difference between InferredAssetDataConnectors and ConfiguredAssetDataConnectors, and it requires that one of the `group_names` in the `default_regex` configuration be `data_asset_name`.
+
+The `glob_directive` gives the `DataConnector` information about the directory structure to expect for each Data Asset. The default `glob_directive` for the `InferredAssetFilesystemDataConnector` is `"*"`, so it must be overridden when your data_references exist in subdirectories.
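+
+To see what the connector infers, you can exercise the `pattern` from the configuration above with nothing but Python's `re` module (a sketch; the capture groups line up with the configured `group_names`):
+
+```python
+import re
+
+# The pattern from the InferredAssetFilesystemDataConnector configuration above.
+pattern = r"(.*)/.*(\d{4})-(\d{2})\.csv"
+
+match = re.match(pattern, "yellow_tripdata/yellow_tripdata_2019-01.csv")
+# group_names are ["data_asset_name", "year", "month"]:
+assert match.group(1) == "yellow_tripdata"  # the inferred data_asset_name
+assert match.group(2) == "2019"
+assert match.group(3) == "01"
+```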
+
+### When to use a ConfiguredAssetDataConnector
+
+On the other hand, `ConfiguredAssetFilesystemDataConnector` requires an explicit listing of each Data Asset you want to connect to. This tends to be helpful when the naming conventions for your Data Assets are less standardized, but you have a strong understanding of the semantics governing how the data is segmented (into files or database tables).
+
+If you have the same `<base_directory>/` directory in your filesystem,
+
+```
+/yellow_tripdata/yellow_tripdata_2019-01.csv
+/yellow_tripdata/yellow_tripdata_2019-02.csv
+/yellow_tripdata/yellow_tripdata_2019-03.csv
+/green_tripdata/2019-01.csv
+/green_tripdata/2019-02.csv
+/green_tripdata/2019-03.csv
+```
+
+Then this configuration:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py#L90-L114
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py#L125-L151
+```
+
+
+
+
+will make available the following Data Assets and data_references:
+
+```bash
+Available data_asset_names (2 of 2):
+ green_tripdata (3 of 3): ['2019-01.csv', '2019-02.csv', '2019-03.csv']
+ yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv']
+
+Unmatched data_references (0 of 0):[]
+```
+
+### Additional Notes
+
+- Additional examples and configurations for `ConfiguredAssetFilesystemDataConnector`s can be found here: [How to configure a ConfiguredAssetDataConnector](./how_to_configure_a_configuredassetdataconnector.md)
+- Additional examples and configurations for `InferredAssetFilesystemDataConnector`s can be found here: [How to configure an InferredAssetDataConnector](./how_to_configure_an_inferredassetdataconnector.md)
+- Additional examples and configurations for `RuntimeDataConnector`s can be found here: [How to configure a RuntimeDataConnector](./how_to_configure_a_runtimedataconnector.md)
+
+To view the full script used in this page, see it on GitHub:
+- [how_to_choose_which_dataconnector_to_use.py](https://github.com/great-expectations/great_expectations/tree/develop/tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py)
diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md
new file mode 100644
index 000000000000..9a3236c44b7b
--- /dev/null
+++ b/docs/guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.md
@@ -0,0 +1,404 @@
+---
+title: How to configure a ConfiguredAssetDataConnector
+---
+import Prerequisites from '../connecting_to_your_data/components/prerequisites.jsx'
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+This guide demonstrates how to configure a ConfiguredAssetDataConnector, and provides several examples you can use for configuration.
+
+
+
+- [Understand the basics of Datasources in 0.13 or later](../../reference/datasources.md)
+- Learned how to configure a [Data Context using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md)
+
+
+
+Great Expectations provides two `DataConnector` classes for connecting to Data Assets stored as file-system-like data (this includes files on disk, but also S3 object stores, etc.) or in relational databases:
+
+- A ConfiguredAssetDataConnector allows you to specify that you have multiple Data Assets in a `Datasource`, but also requires an explicit listing of each Data Asset you want to connect to. This allows more fine-tuning, but also requires more setup.
+- An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure.
+
+If you're not sure which one to use, please check out [How to choose which DataConnector to use](./how_to_choose_which_dataconnector_to_use.md).
+
+## Steps
+
+### 1. Instantiate your project's DataContext
+
+Import these necessary packages and modules:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L3-L4
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L1-L4
+```
+
+
+
+
+### 2. Set up a Datasource
+
+All of the examples below assume you’re testing configuration using something like:
+
+
+
+
+```python
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+execution_engine:
+ class_name: PandasExecutionEngine
+data_connectors:
+  <DATACONNECTOR NAME GOES HERE>:
+    <DATACONNECTOR CONFIGURATION GOES HERE>
+"""
+context.test_yaml_config(yaml_config=datasource_yaml)
+```
+
+
+
+
+```python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+        "<DATACONNECTOR NAME GOES HERE>": {
+            "<DATACONNECTOR CONFIGURATION GOES HERE>"
+ },
+ },
+}
+context.test_yaml_config(yaml.dump(datasource_config))
+```
+
+
+
+
+If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md)
+
+### 3. Add a ConfiguredAssetDataConnector to a Datasource configuration
+
+ConfiguredAssetDataConnectors like `ConfiguredAssetFilesystemDataConnector` and `ConfiguredAssetS3DataConnector` require Data Assets to be explicitly named. A Data Asset is an abstraction that can consist of one or more data_references to CSVs or relational database tables. For instance, you might have a `yellow_tripdata` Data Asset containing information about taxi rides, which consists of twelve data_references to twelve CSVs, each consisting of one month of data. Each Data Asset can have its own regex `pattern` and `group_names`; when configured, these override any `pattern` or `group_names` under `default_regex`.
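+
+As a sketch of what that override looks like (the asset names and `<base_directory>` are placeholders), one asset can rely on `default_regex` while another supplies its own `pattern` and `group_names`:
+
+```python
+datasource_yaml = r"""
+name: taxi_datasource
+class_name: Datasource
+execution_engine:
+  class_name: PandasExecutionEngine
+data_connectors:
+  default_configured_data_connector_name:
+    class_name: ConfiguredAssetFilesystemDataConnector
+    base_directory: <base_directory>
+    default_regex:
+      pattern: (.*)\.csv
+      group_names:
+        - month
+    assets:
+      yellow_tripdata: {}  # falls back to default_regex above
+      green_tripdata:  # overrides default_regex with its own pattern
+        pattern: green_tripdata_(\d{4})-(\d{2})\.csv
+        group_names:
+          - year
+          - month
+"""
+```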
+
+Imagine you have the following files in `<base_directory>/`:
+
+```
+/yellow_tripdata_2019-01.csv
+/yellow_tripdata_2019-02.csv
+/yellow_tripdata_2019-03.csv
+```
+
+We could create a Data Asset `yellow_tripdata` that contains 3 data_references (`yellow_tripdata_2019-01.csv`, `yellow_tripdata_2019-02.csv`, and `yellow_tripdata_2019-03.csv`).
+In that case, the configuration would look like the following:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L9-L25
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L36-L56
+```
+
+
+
+
+Notice that we have specified a pattern that captures the year-month combination after `yellow_tripdata_` in the filename and assigns it to the `group_name` `month`.
+
+The configuration would also work with a regex capturing the entire filename (e.g. `pattern: (.*)\.csv`). However, capturing the month on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the Data Asset. For more information about capture groups, refer to the Python documentation on [regular expressions](https://docs.python.org/3/library/re.html#re.Match.group).
+
+Later on, we could retrieve the data in `yellow_tripdata_2019-02.csv` from the `yellow_tripdata` Data Asset as its own Batch using `context.get_validator()` by specifying `{"month": "2019-02"}` as the `batch_identifiers`.
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L72-L87
+```
+
+This ability to access specific Batches using `batch_identifiers` is very useful when validating Data Assets that span multiple files.
+For more information on `batches` and `batch_identifiers`, please refer to the [Core Concepts document](../../reference/dividing_data_assets_into_batches.md).
+
+A corresponding configuration for S3 would look similar, but would require `bucket` and `prefix` values instead of `base_directory`.
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L99-L115
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L128-L147
+```
+
+
+
+
+The following examples will show scenarios that ConfiguredAssetDataConnectors can help you analyze, using `ConfiguredAssetFilesystemDataConnector`.
+
+### Example 1: Basic Configuration for a single Data Asset
+
+Continuing the example above, imagine you have the following files in the directory `<base_directory>/`:
+
+```
+/yellow_tripdata_2019-01.csv
+/yellow_tripdata_2019-02.csv
+/yellow_tripdata_2019-03.csv
+```
+
+Then this configuration:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L175-L191
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L202-L222
+```
+
+
+
+
+will make available `yellow_tripdata` as a single Data Asset with the following data_references:
+
+```bash
+Available data_asset_names (1 of 1):
+ yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv']
+
+Unmatched data_references (0 of 0):[]
+```
+
+Once configured, you can get a `Validator` from the `Data Context` as follows:
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L238-L248
+```
+
+But what if the regex does not match any files in the directory?
+
+Then this configuration:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L260-L276
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L287-L307
+```
+
+
+
+
+will give you this output:
+
+```bash
+Available data_asset_names (1 of 1):
+ yellow_tripdata (0 of 0): []
+
+Unmatched data_references (3 of 3):['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv']
+```
+
+Notice that `yellow_tripdata` has 0 `data_references`, and there are 3 `Unmatched data_references` listed.
+This would indicate that some part of the configuration is incorrect and would need to be reviewed.
+In our case, changing `pattern` to `yellow_tripdata_(.*)\.csv` will fix our problem and give the same output as above.
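+
+You can sanity-check a `pattern` against your filenames before wiring it into the configuration, using nothing but Python's `re` module:
+
+```python
+import re
+
+filenames = [
+    "yellow_tripdata_2019-01.csv",
+    "yellow_tripdata_2019-02.csv",
+    "yellow_tripdata_2019-03.csv",
+]
+
+# The misconfigured pattern matches nothing, so every file ends up
+# in Unmatched data_references.
+assert not any(re.fullmatch(r"green_tripdata_(.*)\.csv", f) for f in filenames)
+
+# The corrected pattern matches all three data_references.
+assert all(re.fullmatch(r"yellow_tripdata_(.*)\.csv", f) for f in filenames)
+```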
+
+
+### Example 2: Basic configuration with more than one Data Asset
+
+Here’s a similar example, but this time two Data Assets are mixed together in one folder.
+
+**Note**: For an equivalent configuration using `InferredAssetFilesystemDataConnector`, please see Example 2 in [How to configure an InferredAssetDataConnector](./how_to_configure_an_inferredassetdataconnector.md).
+
+```
+/yellow_tripdata_2019-01.csv
+/green_tripdata_2019-01.csv
+/yellow_tripdata_2019-02.csv
+/green_tripdata_2019-02.csv
+/yellow_tripdata_2019-03.csv
+/green_tripdata_2019-03.csv
+```
+
+Then this configuration:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L329-L351
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L362-L386
+```
+
+
+
+
+will now make `yellow_tripdata` and `green_tripdata` both available as Data Assets, with the following data_references:
+
+```bash
+Available data_asset_names (2 of 2):
+ green_tripdata (3 of 3): ['green_tripdata_2019-01.csv', 'green_tripdata_2019-02.csv', 'green_tripdata_2019-03.csv']
+ yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv']
+
+Unmatched data_references (0 of 0): []
+```
+
+### Example 3: Example with Nested Folders
+
+In the following example, files are placed in folders that match the `data_asset_names` we want (`yellow_tripdata` and `green_tripdata`), but the filenames follow different formats.
+
+```
+/yellow_tripdata/yellow_tripdata_2019-01.csv
+/yellow_tripdata/yellow_tripdata_2019-02.csv
+/yellow_tripdata/yellow_tripdata_2019-03.csv
+/green_tripdata/2019-01.csv
+/green_tripdata/2019-02.csv
+/green_tripdata/2019-03.csv
+```
+
+The following configuration:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L414-L438
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L449-L475
+```
+
+
+
+
+will now make `yellow_tripdata` and `green_tripdata` available as Data Assets, with the following data_references:
+
+```bash
+Available data_asset_names (2 of 2):
+ green_tripdata (3 of 3): ['2019-01.csv', '2019-02.csv', '2019-03.csv']
+ yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv']
+
+Unmatched data_references (0 of 0):[]
+```
+
+### Example 4: Example with Explicit data_asset_names and more complex nesting
+
+In this example, the assets `yellow_tripdata` and `green_tripdata` are being explicitly defined in the configuration, and have a more complex nesting pattern.
+
+```
+/yellow/tripdata/yellow_tripdata_2019-01.txt
+/yellow/tripdata/yellow_tripdata_2019-02.txt
+/yellow/tripdata/yellow_tripdata_2019-03.txt
+/green_tripdata/green_tripdata_2019-01.csv
+/green_tripdata/green_tripdata_2019-02.csv
+/green_tripdata/green_tripdata_2019-03.csv
+```
+
+The following configuration:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L502-L526
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py#L537-L565
+```
+
+
+
+
+will make `yellow_tripdata` and `green_tripdata` available as Data Assets, with the following data_references:
+
+```bash
+Available data_asset_names (2 of 2):
+ green_tripdata (3 of 3): ['green_tripdata_2019-01.', 'green_tripdata_2019-02.', 'green_tripdata_2019-03.']
+ yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.', 'yellow_tripdata_2019-02.', 'yellow_tripdata_2019-03.']
+
+Unmatched data_references (0 of 0):[]
+```
+
+### Additional Notes
+To view the full script used in this page, see it on GitHub:
+- [how_to_configure_a_configuredassetdataconnector.py](https://github.com/great-expectations/great_expectations/tree/develop/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py)
diff --git a/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md
new file mode 100644
index 000000000000..a5f98bd8105d
--- /dev/null
+++ b/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector.md
@@ -0,0 +1,158 @@
+---
+title: How to configure a RuntimeDataConnector
+---
+import Prerequisites from '../connecting_to_your_data/components/prerequisites.jsx'
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+This guide demonstrates how to configure a RuntimeDataConnector and only applies to the V3 (Batch Request) API. A `RuntimeDataConnector` allows you to specify a Batch using a Runtime Batch Request, which is used to create a Validator. A Validator is the key object used to create Expectations and validate datasets.
+
+
+
+- [Understand the basics of Datasources in the V3 (Batch Request) API](../../reference/datasources.md)
+- Learned how to configure a [Data Context using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md)
+
+
+
+A RuntimeDataConnector is a special kind of [Data Connector](../../reference/datasources.md) that enables you to use a RuntimeBatchRequest to provide a [Batch's](../../reference/datasources.md#batches) data directly at runtime. The RuntimeBatchRequest can wrap an in-memory dataframe, a filepath, or a SQL query, and must include batch identifiers that uniquely identify the data (e.g. a `run_id` from an Airflow DAG run). The batch identifiers that must be passed in at runtime are specified in the RuntimeDataConnector's configuration.
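+
+The only piece of configuration that is specific to a `RuntimeDataConnector` is the list of batch identifier names. As a sketch (the connector name and the `run_id` identifier are just examples), the relevant fragment looks like this:
+
+```python
+# Fragment of a Datasource configuration; the names listed under
+# batch_identifiers are exactly the keys every RuntimeBatchRequest
+# against this connector must supply at runtime.
+runtime_data_connector_yaml = """
+default_runtime_data_connector_name:
+  class_name: RuntimeDataConnector
+  batch_identifiers:
+    - run_id
+"""
+```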
+
+## Steps
+
+### 1. Instantiate your project's DataContext
+
+Import these necessary packages and modules:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L4-L5
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L2-L5
+```
+
+
+
+
+### 2. Set up a Datasource
+
+All of the examples below assume you’re testing configuration using something like:
+
+
+
+
+```python
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+execution_engine:
+ class_name: PandasExecutionEngine
+data_connectors:
+  <DATACONNECTOR NAME GOES HERE>:
+    <DATACONNECTOR CONFIGURATION GOES HERE>
+"""
+context.test_yaml_config(yaml_config=datasource_yaml)
+```
+
+
+
+
+```python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+        "<DATACONNECTOR NAME GOES HERE>": {
+            "<DATACONNECTOR CONFIGURATION GOES HERE>"
+ },
+ },
+}
+context.test_yaml_config(yaml.dump(datasource_config))
+```
+
+
+
+
+If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md)
+
+### 3. Add a RuntimeDataConnector to a Datasource configuration
+
+This basic configuration can be used in multiple ways depending on how the `RuntimeBatchRequest` is configured:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L10-L22
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L27-L41
+```
+
+
+
+
+Once the RuntimeDataConnector is configured, you can add your Datasource using:
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L49-L49
+```
+
+### Example 1: RuntimeDataConnector for access to file-system data
+
+At runtime, you would get a Validator from the Data Context by first defining a `RuntimeBatchRequest` with the `path` to your data defined in `runtime_parameters`:
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L50-L57
+```
+
+Next, you would pass that request into `context.get_validator`:
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L64-L68
+```
+
+### Example 2: RuntimeDataConnector that uses an in-memory DataFrame
+
+At runtime, you would get a Validator from the Data Context by first defining a `RuntimeBatchRequest` with the DataFrame passed into `batch_data` in `runtime_parameters`:
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L1-L1
+```
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L80-L80
+```
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L83-L92
+```
+
+Next, you would pass that request into `context.get_validator`:
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py#L94-L98
+```
+
+### Additional Notes
+To view the full script used in this page, see it on GitHub:
+- [how_to_configure_a_runtimedataconnector.py](https://github.com/great-expectations/great_expectations/tree/develop/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py)
diff --git a/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md
new file mode 100644
index 000000000000..b249d50864a3
--- /dev/null
+++ b/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.md
@@ -0,0 +1,491 @@
+---
+title: How to configure an InferredAssetDataConnector
+---
+import Prerequisites from '../connecting_to_your_data/components/prerequisites.jsx'
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+This guide demonstrates how to configure an InferredAssetDataConnector, and provides several examples you
+can use for configuration.
+
+
+
+- [Understand the basics of Datasources in 0.13 or later](../../reference/datasources.md)
+- Learned how to configure a [Data Context using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md)
+
+
+
+Great Expectations provides two types of `DataConnector` classes for connecting to Data Assets stored as file-system-like data (this includes files on disk, but also S3 object stores, etc.) or in relational databases:
+
+- A ConfiguredAssetDataConnector allows you to specify that you have multiple Data Assets in a `Datasource`, but also requires an explicit listing of each Data Asset you want to connect to. This allows more fine-tuning, but also requires more setup.
+- An InferredAssetDataConnector infers `data_asset_name` by using a regex that takes advantage of patterns that exist in the filename or folder structure.
+
+InferredAssetDataConnector has fewer options, so it's simpler to set up. It’s a good choice if you want to connect to a single Data Asset, or several Data Assets that all share the same naming convention.
+
+If you're not sure which one to use, please check out [How to choose which DataConnector to use](./how_to_choose_which_dataconnector_to_use.md).
+
+## Steps
+
+### 1. Instantiate your project's DataContext
+
+Import these necessary packages and modules:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L3-L4
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L1-L4
+```
+
+
+
+
+### 2. Set up a Datasource
+
+All the examples below assume you’re testing configurations using something like:
+
+
+
+
+```python
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+execution_engine:
+ class_name: PandasExecutionEngine
+data_connectors:
+  <DATACONNECTOR NAME GOES HERE>:
+    <DATACONNECTOR CONFIGURATION GOES HERE>
+"""
+context.test_yaml_config(yaml_config=datasource_yaml)
+```
+
+
+
+
+```python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+        "<DATACONNECTOR NAME GOES HERE>": {
+            "<DATACONNECTOR CONFIGURATION GOES HERE>"
+ },
+ },
+}
+context.test_yaml_config(yaml.dump(datasource_config))
+```
+
+
+
+
+If you’re not familiar with the `test_yaml_config` method, please check out: [How to configure Data Context components using test_yaml_config](../setup/configuring_data_contexts/how_to_configure_datacontext_components_using_test_yaml_config.md)
+
+### 3. Add an InferredAssetDataConnector to a Datasource configuration
+
+InferredAssetDataConnectors like `InferredAssetFilesystemDataConnector` and `InferredAssetS3DataConnector`
+require a `default_regex` parameter, with a configured regex `pattern` and capture `group_names`.
+
+Imagine you have the following files in `my_directory/`:
+
+```
+/yellow_tripdata_2019-01.csv
+/yellow_tripdata_2019-02.csv
+/yellow_tripdata_2019-03.csv
+```
+
+We can imagine two approaches to loading the data into GE.
+
+The simplest approach would be to consider each file to be its own Data Asset. In that case, the configuration would look like the following:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L9-L24
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L35-L53
+```
+
+
+
+
+Notice that the `default_regex` is configured to have one capture group (`(.*)`) which captures the entire filename. That capture group is assigned to `data_asset_name` under `group_names`. For InferredAssetDataConnectors `data_asset_name` is a required `group_name`, and its associated capture group is the way each `data_asset_name` is inferred.
+Running `test_yaml_config()` would result in 3 Data Assets: `yellow_tripdata_2019-01`, `yellow_tripdata_2019-02` and `yellow_tripdata_2019-03`.
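+
+You can see the inference in miniature with the `re` module: the capture group takes the whole file stem, so each file becomes its own Data Asset:
+
+```python
+import re
+
+# (.*) captures everything before ".csv" and is assigned to data_asset_name.
+match = re.match(r"(.*)\.csv", "yellow_tripdata_2019-01.csv")
+assert match.group(1) == "yellow_tripdata_2019-01"  # the inferred data_asset_name
+```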
+
+However, a closer look at the filenames reveals a pattern that is common to the 3 files. Each has `yellow_tripdata_` in the name, followed by date information. These are the types of patterns that InferredAssetDataConnectors allow you to take advantage of.
+
+We could instead treat the `yellow_tripdata_*.csv` files as batches within a single `yellow_tripdata` Data Asset by using a more specific regex `pattern` and adding `group_names` for `year` and `month`.
+
+**Note:** We have chosen to be more specific in the capture groups for the `year` and `month` by specifying the integer value (using `\d`) and the number of digits, but a simpler capture group like `(.*)` would also work. For more information about capture groups, refer to the Python documentation on [regular expressions](https://docs.python.org/3/library/re.html#re.Match.group).
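+
+Checked in isolation with the `re` module, such a pattern behaves like the sketch below (the exact pattern is an assumption; the first group infers the Data Asset name, the remaining groups become batch identifiers):
+
+```python
+import re
+
+pattern = r"(yellow_tripdata)_(\d{4})-(\d{2})\.csv"
+match = re.match(pattern, "yellow_tripdata_2019-02.csv")
+# group_names ["data_asset_name", "year", "month"] line up with the groups:
+assert match.groups() == ("yellow_tripdata", "2019", "02")
+```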
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L77-L94
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L105-L123
+```
+
+
+
+
+Running `test_yaml_config()` would result in 1 Data Asset `yellow_tripdata` with 3 associated data_references: `yellow_tripdata_2019-01.csv`, `yellow_tripdata_2019-02.csv` and `yellow_tripdata_2019-03.csv`, seen also in Example 1 below.
+
+A corresponding configuration for `InferredAssetS3DataConnector` would look similar but would require `bucket` and `prefix` values instead of `base_directory`.
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L147-L165
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L178-L197
+```
+
+
+
+
+The following examples will show scenarios that InferredAssetDataConnectors can help you analyze, using `InferredAssetFilesystemDataConnector`.
+
+
+### Example 1: Basic configuration for a single Data Asset
+
+Continuing the example above, imagine you have the following files in the directory `<base_directory>/`:
+
+```
+/yellow_tripdata_2019-01.csv
+/yellow_tripdata_2019-02.csv
+/yellow_tripdata_2019-03.csv
+```
+
+Then this configuration:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L225-L242
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L253-L271
+```
+
+
+
+
+will make available `yellow_tripdata` as a single Data Asset with the following data_references:
+
+```bash
+Available data_asset_names (1 of 1):
+ yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv']
+
+Unmatched data_references (0 of 0):[]
+```
+
+Once configured, you can get `Validators` from the `Data Context` as follows:
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L294-L303
+```
+
+Since this `BatchRequest` does not specify which data_reference to load, the `ActiveBatch` for the validator will be the last data_reference that was loaded. In this case, `yellow_tripdata_2019-03.csv` is what is being used by `validator`. We can verify this with:
+
+```python
+print(validator.active_batch_definition)
+```
+
+which prints:
+```bash
+{
+ "datasource_name": "taxi_datasource",
+ "data_connector_name": "default_inferred_data_connector_name",
+ "data_asset_name": "yellow_tripdata",
+ "batch_identifiers": {
+ "year": "2019",
+ "month": "03"
+ }
+}
+```
+
+Notice that the `batch_identifiers` for this `batch_definition` specify `"year": "2019", "month": "03"`. The `batch_identifiers` parameter can be used in our `BatchRequest` to retrieve the data_reference CSV of our choosing, using the `group_names` defined in our `DataConnector`:
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L308-L318
+```
+
+```python
+print(validator.active_batch_definition)
+```
+
+which prints:
+```bash
+{
+ "datasource_name": "taxi_datasource",
+ "data_connector_name": "default_inferred_data_connector_name",
+ "data_asset_name": "yellow_tripdata",
+ "batch_identifiers": {
+ "year": "2019",
+ "month": "02"
+ }
+}
+```
+
+This ability to access specific Batches using `batch_identifiers` is very useful when validating Data Assets that span multiple files.
+For more information on `batches` and `batch_identifiers`, please refer to the [Core Concepts document](../../reference/dividing_data_assets_into_batches.md).
+
+### Example 2: Basic configuration with more than one Data Asset
+
+Here’s a similar example, but this time two Data Assets are mixed together in one folder.
+
+**Note**: For an equivalent configuration using `ConfiguredAssetFilesystemDataConnector`, please see Example 2
+in [How to configure a ConfiguredAssetDataConnector](./how_to_configure_a_configuredassetdataconnector.md).
+
+```
+/yellow_tripdata_2019-01.csv
+/green_tripdata_2019-01.csv
+/yellow_tripdata_2019-02.csv
+/green_tripdata_2019-02.csv
+/yellow_tripdata_2019-03.csv
+/green_tripdata_2019-03.csv
+```
+
+The same configuration as Example 1:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L225-L242
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L253-L271
+```
+
+
+
+
+will now make `yellow_tripdata` and `green_tripdata` both available as Data Assets, with the following data_references:
+
+```bash
+Available data_asset_names (2 of 2):
+ green_tripdata (3 of 3): ['green_tripdata_2019-01.csv', 'green_tripdata_2019-02.csv', 'green_tripdata_2019-03.csv']
+ yellow_tripdata (3 of 3): ['yellow_tripdata_2019-01.csv', 'yellow_tripdata_2019-02.csv', 'yellow_tripdata_2019-03.csv']
+
+Unmatched data_references (0 of 0): []
+```
+
+
+### Example 3: Nested directory structure with the data_asset_name on the inside
+
+Here’s a similar example, with a nested directory structure:
+
+```
+/2018/10/yellow_tripdata.csv
+/2018/10/green_tripdata.csv
+/2018/11/yellow_tripdata.csv
+/2018/11/green_tripdata.csv
+/2018/12/yellow_tripdata.csv
+/2018/12/green_tripdata.csv
+/2019/01/yellow_tripdata.csv
+/2019/01/green_tripdata.csv
+/2019/02/yellow_tripdata.csv
+/2019/02/green_tripdata.csv
+/2019/03/yellow_tripdata.csv
+/2019/03/green_tripdata.csv
+```
+
+Then this configuration:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L328-L346
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L357-L376
+```
+
+
+
+
+will now make `yellow_tripdata` and `green_tripdata` both available as Data Assets, with the following data_references:
+
+```bash
+Available data_asset_names (2 of 2):
+ green_tripdata (3 of 6): ['2018/10/green_tripdata.csv', '2018/11/green_tripdata.csv', '2018/12/green_tripdata.csv']
+ yellow_tripdata (3 of 6): ['2018/10/yellow_tripdata.csv', '2018/11/yellow_tripdata.csv', '2018/12/yellow_tripdata.csv']
+
+Unmatched data_references (0 of 0):[]
+```
+
+The `glob_directive` gives the `DataConnector` information about the directory structure to expect for each Data Asset. The default `glob_directive` for the `InferredAssetFilesystemDataConnector` is `"*"`, so it must be overridden when your data_references exist in subdirectories.
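+
+As a quick sketch with the standard library, you can preview what a `glob_directive` will pick up by running `glob` from the base directory of the nested layout above:
+
+```python
+import glob
+
+# Run from the base directory containing the 2018/ and 2019/ folders.
+print(glob.glob("*"))          # the default "*": only top-level entries, e.g. ['2018', '2019']
+print(glob.glob("*/*/*.csv"))  # descends two levels to reach the monthly CSV files
+```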
+
+### Example 4: Nested directory structure with the data_asset_name on the outside
+
+In the following example, files are placed in a folder structure with the `data_asset_name` defined by the folder name (`yellow_tripdata` or `green_tripdata`).
+
+```
+/yellow_tripdata/yellow_tripdata_2019-01.csv
+/yellow_tripdata/yellow_tripdata_2019-02.csv
+/yellow_tripdata/yellow_tripdata_2019-03.csv
+/green_tripdata/2019-01.csv
+/green_tripdata/2019-02.csv
+/green_tripdata/2019-03.csv
+```
+
+Then this configuration:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L406-L425
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L436-L460
+```
+
+
+
+
+will now make `yellow_tripdata` and `green_tripdata` into Data Assets, with each containing 3 data_references:
+
+```bash
+Available data_asset_names (2 of 2):
+ green_tripdata (3 of 3): ['green_tripdata/2019-01.csv', 'green_tripdata/2019-02.csv', 'green_tripdata/2019-03.csv']
+ yellow_tripdata (3 of 3): ['yellow_tripdata/yellow_tripdata_2019-01.csv', 'yellow_tripdata/yellow_tripdata_2019-02.csv', 'yellow_tripdata/yellow_tripdata_2019-03.csv']
+
+Unmatched data_references (0 of 0):[]
+```
+
+### Example 5: Redundant information in the naming convention
+
+In the following example, files are placed in a folder structure with the `data_asset_name` defined by the folder name (`yellow_tripdata` or `green_tripdata`), but then the term `yellow_tripdata` is repeated in some filenames.
+
+```
+/yellow_tripdata/yellow_tripdata_2019-01.csv
+/yellow_tripdata/yellow_tripdata_2019-02.csv
+/yellow_tripdata/yellow_tripdata_2019-03.csv
+/green_tripdata/2019-01.csv
+/green_tripdata/2019-02.csv
+/green_tripdata/2019-03.csv
+```
+
+Then this configuration:
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L490-L508
+```
+
+
+
+
+```python file=../../../tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py#L519-L542
+```
+
+
+
+
+will not display the redundant information:
+
+```bash
+Available data_asset_names (2 of 2):
+ green_tripdata (3 of 3): ['green_tripdata/*2019-01.csv', 'green_tripdata/*2019-02.csv', 'green_tripdata/*2019-03.csv']
+ yellow_tripdata (3 of 3): ['yellow_tripdata/*2019-01.csv', 'yellow_tripdata/*2019-02.csv', 'yellow_tripdata/*2019-03.csv']
+
+Unmatched data_references (0 of 0):[]
+```
+
+### Additional Notes
+To view the full script used in this page, see it on GitHub:
+- [how_to_configure_an_inferredassetdataconnector.py](https://github.com/great-expectations/great_expectations/tree/develop/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py)
diff --git a/great_expectations/datasource/data_connector/configured_asset_filesystem_data_connector.py b/great_expectations/datasource/data_connector/configured_asset_filesystem_data_connector.py
index 188b8bc8d822..4a275aac0f57 100644
--- a/great_expectations/datasource/data_connector/configured_asset_filesystem_data_connector.py
+++ b/great_expectations/datasource/data_connector/configured_asset_filesystem_data_connector.py
@@ -49,7 +49,7 @@ def __init__(
assets (dict): configured assets as a dictionary. These can each have their own regex and sorters
execution_engine (ExecutionEngine): ExecutionEngine object to actually read the data
default_regex (dict): Optional dict the filter and organize the data_references.
- glob_directive (str): glob for selecting files in directory (defaults to *)
+            glob_directive (str): glob pattern for selecting files in the base directory (defaults to **/*); can be set to reach nested directories (e.g. */*/*.csv)
sorters (list): Optional list if you want to sort the data_references
batch_spec_passthrough (dict): dictionary with keys that will be added directly to batch_spec
diff --git a/great_expectations/datasource/data_connector/inferred_asset_filesystem_data_connector.py b/great_expectations/datasource/data_connector/inferred_asset_filesystem_data_connector.py
index f75388131fa6..be8e01e54cf9 100644
--- a/great_expectations/datasource/data_connector/inferred_asset_filesystem_data_connector.py
+++ b/great_expectations/datasource/data_connector/inferred_asset_filesystem_data_connector.py
@@ -48,6 +48,7 @@ def __init__(
base_directory(str): base_directory for DataConnector to begin reading files
execution_engine (ExecutionEngine): ExecutionEngine object to actually read the data
default_regex (dict): Optional dict the filter and organize the data_references.
+            glob_directive (str): glob pattern for selecting files in the base directory (defaults to *); can be set to reach nested directories (e.g. */*.csv)
sorters (list): Optional list if you want to sort the data_references
batch_spec_passthrough (dict): dictionary with keys that will be added directly to batch_spec
"""
diff --git a/sidebars.js b/sidebars.js
index c2509d305ec6..04735f9a0ac5 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -94,6 +94,10 @@ module.exports = {
type: 'category',
label: 'Core skills',
items: [
+ 'guides/connecting_to_your_data/how_to_choose_which_dataconnector_to_use',
+ 'guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector',
+ 'guides/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector',
+ 'guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector',
'guides/connecting_to_your_data/how_to_configure_a_dataconnector_to_introspect_and_partition_a_file_system_or_blob_store',
'guides/connecting_to_your_data/how_to_configure_a_dataconnector_to_introspect_and_partition_tables_in_sql',
'guides/connecting_to_your_data/how_to_create_a_batch_of_data_from_an_in_memory_spark_or_pandas_dataframe',
diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py
new file mode 100644
index 000000000000..1f35ad232ad2
--- /dev/null
+++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py
@@ -0,0 +1,175 @@
+from ruamel import yaml
+
+import great_expectations as ge
+
+context = ge.get_context()
+
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_inferred_data_connector_name:
+ class_name: InferredAssetFilesystemDataConnector
+ base_directory: /
+ glob_directive: "*/*.csv"
+ default_regex:
+ group_names:
+ - data_asset_name
+ - year
+ - month
+ pattern: (.*)/.*(\d{4})-(\d{2})\.csv
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+ "/", "../data/nested_directories_data_asset/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_inferred_data_connector_name": {
+ "class_name": "InferredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "glob_directive": "*/*.csv",
+ "default_regex": {
+ "group_names": [
+ "data_asset_name",
+ "year",
+ "month",
+ ],
+                "pattern": r"(.*)/.*(\d{4})-(\d{2})\.csv",
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][
+ "base_directory"
+] = "../data/nested_directories_data_asset/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert test_yaml == test_python
+
+context.add_datasource(**datasource_config)
+
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_inferred_data_connector_name"
+ ]
+)
+assert "green_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_inferred_data_connector_name"
+ ]
+)
+
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_configured_data_connector_name:
+ class_name: ConfiguredAssetFilesystemDataConnector
+ base_directory: /
+ assets:
+ yellow_tripdata:
+ base_directory: yellow_tripdata/
+ pattern: yellow_tripdata_(\d{4})-(\d{2})\.csv
+ group_names:
+ - year
+ - month
+ green_tripdata:
+ base_directory: green_tripdata/
+ pattern: (\d{4})-(\d{2})\.csv
+ group_names:
+ - year
+ - month
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+ "/", "../data/nested_directories_data_asset/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_configured_data_connector_name": {
+ "class_name": "ConfiguredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "assets": {
+ "yellow_tripdata": {
+ "base_directory": "yellow_tripdata/",
+                    "pattern": r"yellow_tripdata_(\d{4})-(\d{2})\.csv",
+ "group_names": ["year", "month"],
+ },
+ "green_tripdata": {
+ "base_directory": "green_tripdata/",
+                    "pattern": r"(\d{4})-(\d{2})\.csv",
+ "group_names": ["year", "month"],
+ },
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_configured_data_connector_name"][
+ "base_directory"
+] = "../data/nested_directories_data_asset/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert test_yaml == test_python
+
+context.add_datasource(**datasource_config)
+
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_configured_data_connector_name"
+ ]
+)
+assert "green_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_configured_data_connector_name"
+ ]
+)
diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py
new file mode 100644
index 000000000000..6af92cd5d71a
--- /dev/null
+++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py
@@ -0,0 +1,589 @@
+from ruamel import yaml
+
+import great_expectations as ge
+from great_expectations.core.batch import BatchRequest
+
+context = ge.get_context()
+
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_configured_data_connector_name:
+ class_name: ConfiguredAssetFilesystemDataConnector
+ base_directory: /
+ assets:
+ yellow_tripdata:
+ pattern: yellow_tripdata_(.*)\.csv
+ group_names:
+ - month
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+ "/", "../data/single_directory_one_data_asset/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_configured_data_connector_name": {
+ "class_name": "ConfiguredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "assets": {
+ "yellow_tripdata": {
+                "pattern": r"yellow_tripdata_(.*)\.csv",
+ "group_names": ["month"],
+ }
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_configured_data_connector_name"][
+ "base_directory"
+] = "../data/single_directory_one_data_asset/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+assert test_yaml == test_python
+
+context.add_datasource(**datasource_config)
+
+batch_request = BatchRequest(
+ datasource_name="taxi_datasource",
+ data_connector_name="default_configured_data_connector_name",
+ data_asset_name="yellow_tripdata",
+)
+
+context.create_expectation_suite(
+ expectation_suite_name="", overwrite_existing=True
+)
+
+validator = context.get_validator(
+ batch_request=batch_request,
+ expectation_suite_name="",
+ batch_identifiers={"month": "2019-02"},
+)
+print(validator.head())
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert isinstance(validator, ge.validator.validator.Validator)
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_configured_data_connector_name"
+ ]
+)
+
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_inferred_data_connector_name:
+ class_name: InferredAssetS3DataConnector
+ bucket: /
+ prefix: /
+ default_regex:
+ group_names:
+ - month
+ pattern: yellow_tripdata_(.*)\.csv
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace("/", "superconductive-public")
+datasource_yaml = datasource_yaml.replace(
+ "/", "data/taxi_yellow_trip_data_samples/"
+)
+
+# TODO: Uncomment once S3 testing in Azure Pipelines is re-enabled
+# test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_inferred_data_connector_name": {
+            "class_name": "InferredAssetS3DataConnector",
+ "bucket": "/",
+ "prefix": "/",
+ "default_regex": {
+ "group_names": ["month"],
+                "pattern": r"yellow_tripdata_(.*)\.csv",
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][
+ "bucket"
+] = "superconductive-public"
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][
+ "prefix"
+] = "data/taxi_yellow_trip_data_samples/"
+
+# TODO: Uncomment once S3 testing in Azure Pipelines is re-enabled
+# test_python = context.test_yaml_config(
+# yaml.dump(datasource_config), return_mode="report_object"
+# )
+#
+# assert test_yaml == test_python
+#
+# context.add_datasource(**datasource_config)
+#
+# assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+# assert "yellow_tripdata" in set(
+# context.get_available_data_asset_names()["taxi_datasource"][
+# "default_inferred_data_connector_name"
+# ]
+# )
+
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_configured_data_connector_name:
+ class_name: ConfiguredAssetFilesystemDataConnector
+ base_directory: /
+ assets:
+ yellow_tripdata:
+        pattern: yellow_tripdata_(.*)\.csv
+ group_names:
+ - month
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+ "/", "../data/single_directory_one_data_asset/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_configured_data_connector_name": {
+ "class_name": "ConfiguredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "assets": {
+ "yellow_tripdata": {
+ "pattern": "yellow_tripdata_(.*)\.csv",
+ "group_names": ["month"],
+ }
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_configured_data_connector_name"][
+ "base_directory"
+] = "../data/single_directory_one_data_asset/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+assert test_yaml == test_python
+
+context.add_datasource(**datasource_config)
+
+batch_request = BatchRequest(
+    datasource_name="taxi_datasource",
+    data_connector_name="default_configured_data_connector_name",
+    data_asset_name="yellow_tripdata",
+    data_connector_query={"batch_filter_parameters": {"month": "2019-02"}},
+)
+
+validator = context.get_validator(
+    batch_request=batch_request,
+    expectation_suite_name="<YOUR_EXPECTATION_SUITE_NAME>",
+)
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert isinstance(validator, ge.validator.validator.Validator)
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_configured_data_connector_name"
+ ]
+)
+
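+# The example below shows that the Data Asset name is user-chosen: the regex
+# pattern, not the asset name, controls which files are matched against it.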
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_configured_data_connector_name:
+ class_name: ConfiguredAssetFilesystemDataConnector
+    base_directory: <PATH>
+ assets:
+ yellow_tripdata:
+ pattern: green_tripdata_(.*)\.csv
+ group_names:
+ - month
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+    "<PATH>", "../data/single_directory_one_data_asset/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_configured_data_connector_name": {
+ "class_name": "ConfiguredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "assets": {
+ "yellow_tripdata": {
+ "pattern": "green_tripdata_(.*)\.csv",
+ "group_names": ["month"],
+ }
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_configured_data_connector_name"][
+ "base_directory"
+] = "../data/single_directory_one_data_asset/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert test_yaml == test_python
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_configured_data_connector_name"
+ ]
+)
+
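+# The example below configures two Data Assets ("yellow_tripdata" and
+# "green_tripdata") that live side by side in a single directory.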
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_configured_data_connector_name:
+ class_name: ConfiguredAssetFilesystemDataConnector
+    base_directory: <PATH>
+ assets:
+ yellow_tripdata:
+ pattern: yellow_tripdata_(\d{4})-(\d{2})\.csv
+ group_names:
+ - year
+ - month
+ green_tripdata:
+ pattern: green_tripdata_(\d{4})-(\d{2})\.csv
+ group_names:
+ - year
+ - month
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+    "<PATH>", "../data/single_directory_two_data_assets/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_configured_data_connector_name": {
+ "class_name": "ConfiguredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "assets": {
+ "yellow_tripdata": {
+ "pattern": "yellow_tripdata_(\d{4})-(\d{2})\.csv",
+ "group_names": ["year", "month"],
+ },
+ "green_tripdata": {
+ "pattern": "green_tripdata_(\d{4})-(\d{2})\.csv",
+ "group_names": ["year", "month"],
+ },
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_configured_data_connector_name"][
+ "base_directory"
+] = "../data/single_directory_two_data_assets/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+# NOTE: The following code is only for testing and can be ignored by users.
+# TODO: Uncomment the line below once ISSUE #3589 (https://github.com/great-expectations/great_expectations/issues/3589) is resolved
+# assert test_yaml == test_python
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_configured_data_connector_name"
+ ]
+)
+assert "green_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_configured_data_connector_name"
+ ]
+)
+
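+# The example below gives each Data Asset its own base_directory, so assets can
+# live in separate subdirectories with different filename conventions.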
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_configured_data_connector_name:
+ class_name: ConfiguredAssetFilesystemDataConnector
+    base_directory: <PATH>
+ assets:
+ yellow_tripdata:
+ base_directory: yellow_tripdata/
+ pattern: yellow_tripdata_(\d{4})-(\d{2})\.csv
+ group_names:
+ - year
+ - month
+ green_tripdata:
+ base_directory: green_tripdata/
+ pattern: (\d{4})-(\d{2})\.csv
+ group_names:
+ - year
+ - month
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+    "<PATH>", "../data/nested_directories_data_asset/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_configured_data_connector_name": {
+ "class_name": "ConfiguredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "assets": {
+ "yellow_tripdata": {
+ "base_directory": "yellow_tripdata/",
+ "pattern": "yellow_tripdata_(\d{4})-(\d{2})\.csv",
+ "group_names": ["year", "month"],
+ },
+ "green_tripdata": {
+ "base_directory": "green_tripdata/",
+ "pattern": "(\d{4})-(\d{2})\.csv",
+ "group_names": ["year", "month"],
+ },
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_configured_data_connector_name"][
+ "base_directory"
+] = "../data/nested_directories_data_asset/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert test_yaml == test_python
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_configured_data_connector_name"
+ ]
+)
+assert "green_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_configured_data_connector_name"
+ ]
+)
+
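+# The example below combines a connector-level default_regex with per-asset
+# base_directory and glob_directive settings to curate a more complex layout.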
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_configured_data_connector_name:
+ class_name: ConfiguredAssetFilesystemDataConnector
+    base_directory: <PATH>
+ default_regex:
+ pattern: (.*)_(\d{4})-(\d{2})\.(csv|txt)$
+ group_names:
+ - data_asset_name
+ - year
+ - month
+ assets:
+ yellow_tripdata:
+ base_directory: yellow/tripdata/
+ glob_directive: "*.txt"
+ green_tripdata:
+ base_directory: green_tripdata/
+ glob_directive: "*.csv"
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+    "<PATH>", "../data/nested_directories_complex/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_configured_data_connector_name": {
+ "class_name": "ConfiguredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "default_regex": {
+ "pattern": "(.*)_(\d{4})-(\d{2})\.(csv|txt)$",
+ "group_names": ["data_asset_name", "year", "month"],
+ },
+ "assets": {
+ "yellow_tripdata": {
+ "base_directory": "yellow/tripdata/",
+ "glob_directive": "*.txt",
+ },
+ "green_tripdata": {
+ "base_directory": "green_tripdata/",
+ "glob_directive": "*.csv",
+ },
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_configured_data_connector_name"][
+ "base_directory"
+] = "../data/nested_directories_complex/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert test_yaml == test_python
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_configured_data_connector_name"
+ ]
+)
+assert "green_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_configured_data_connector_name"
+ ]
+)
diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py
new file mode 100644
index 000000000000..dae9671b17f3
--- /dev/null
+++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py
@@ -0,0 +1,106 @@
+import pandas as pd
+from ruamel import yaml
+
+import great_expectations as ge
+from great_expectations.core.batch import RuntimeBatchRequest
+
+context = ge.get_context()
+
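+# A RuntimeDataConnector is configured with the batch_identifiers keys (here
+# "default_identifier_name") that every RuntimeBatchRequest must supply.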
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_runtime_data_connector_name:
+ class_name: RuntimeDataConnector
+ batch_identifiers:
+ - default_identifier_name
+"""
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_runtime_data_connector_name": {
+ "class_name": "RuntimeDataConnector",
+ "batch_identifiers": ["default_identifier_name"],
+ },
+ },
+}
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+assert test_yaml == test_python
+
+context.add_datasource(**datasource_config)
+
+batch_request = RuntimeBatchRequest(
+    datasource_name="taxi_datasource",
+    data_connector_name="default_runtime_data_connector_name",
+    data_asset_name="<YOUR_MEANINGFUL_NAME>",  # This can be anything that identifies this data_asset for you
+    runtime_parameters={"path": "<PATH_TO_YOUR_DATA>"},  # Add your path here.
+    batch_identifiers={"default_identifier_name": "<YOUR_MEANINGFUL_IDENTIFIER>"},
+)
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the BatchRequest above.
+batch_request.runtime_parameters[
+ "path"
+] = "./data/single_directory_one_data_asset/yellow_tripdata_2019-01.csv"
+
+validator = context.get_validator(
+ batch_request=batch_request,
+ create_expectation_suite_with_name="",
+)
+print(validator.head())
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert isinstance(validator, ge.validator.validator.Validator)
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_runtime_data_connector_name"
+ ]
+)
+
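+# The example below wraps an in-memory Pandas DataFrame instead of a path; the
+# DataFrame is passed via the "batch_data" runtime parameter.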
+path = ""
+# Please note this override is only to provide good UX for docs and tests.
+path = "./data/single_directory_one_data_asset/yellow_tripdata_2019-01.csv"
+df = pd.read_csv(path)
+
+batch_request = RuntimeBatchRequest(
+    datasource_name="taxi_datasource",
+    data_connector_name="default_runtime_data_connector_name",
+    data_asset_name="<YOUR_MEANINGFUL_NAME>",  # This can be anything that identifies this data_asset for you
+    runtime_parameters={"batch_data": df},  # Pass your DataFrame here.
+    batch_identifiers={"default_identifier_name": "<YOUR_MEANINGFUL_IDENTIFIER>"},
+)
+
+validator = context.get_validator(
+ batch_request=batch_request,
+ expectation_suite_name="",
+)
+print(validator.head())
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert isinstance(validator, ge.validator.validator.Validator)
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_runtime_data_connector_name"
+ ]
+)
diff --git a/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py
new file mode 100644
index 000000000000..02ed1bc20605
--- /dev/null
+++ b/tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py
@@ -0,0 +1,569 @@
+from ruamel import yaml
+
+import great_expectations as ge
+from great_expectations.core.batch import BatchRequest
+
+context = ge.get_context()
+
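+# The example below infers one Data Asset per CSV file: the whole filename stem
+# is captured as data_asset_name.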
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_inferred_data_connector_name:
+ class_name: InferredAssetFilesystemDataConnector
+    base_directory: <PATH>
+ default_regex:
+ group_names:
+ - data_asset_name
+ pattern: (.*)\.csv
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+    "<PATH>", "../data/single_directory_one_data_asset/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_inferred_data_connector_name": {
+ "class_name": "InferredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "default_regex": {
+ "group_names": ["data_asset_name"],
+ "pattern": "(.*)\.csv",
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][
+ "base_directory"
+] = "../data/single_directory_one_data_asset/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+assert test_yaml == test_python
+
+context.add_datasource(**datasource_config)
+
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata_2019-01" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_inferred_data_connector_name"
+ ]
+)
+
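+# The example below splits the filename into data_asset_name, year, and month,
+# so all monthly files roll up into a single Data Asset with many batches.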
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_inferred_data_connector_name:
+ class_name: InferredAssetFilesystemDataConnector
+    base_directory: <PATH>
+ default_regex:
+ group_names:
+ - data_asset_name
+ - year
+ - month
+ pattern: (.*)_(\d{4})-(\d{2})\.csv
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+    "<PATH>", "../data/single_directory_one_data_asset/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_inferred_data_connector_name": {
+ "class_name": "InferredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "default_regex": {
+ "group_names": ["data_asset_name", "year", "month"],
+ "pattern": "(.*)_(\d{4})-(\d{2})\.csv",
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][
+ "base_directory"
+] = "../data/single_directory_one_data_asset/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+assert test_yaml == test_python
+
+context.add_datasource(**datasource_config)
+
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_inferred_data_connector_name"
+ ]
+)
+
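+# The example below applies the same inferred configuration to data in S3;
+# <BUCKET> and <PREFIX> are placeholder tokens for your own bucket and key prefix.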
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_inferred_data_connector_name:
+ class_name: InferredAssetS3DataConnector
+    bucket: <BUCKET>
+    prefix: <PREFIX>
+ default_regex:
+ group_names:
+ - data_asset_name
+ - year
+ - month
+ pattern: (.*)_(\d{4})-(\d{2})\.csv
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace("<BUCKET>", "superconductive-public")
+datasource_yaml = datasource_yaml.replace(
+    "<PREFIX>", "data/taxi_yellow_trip_data_samples/"
+)
+
+# TODO: Uncomment once S3 testing in Azure Pipelines is re-enabled
+# test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_inferred_data_connector_name": {
+ "class_name": "InferredAssetFilesystemDataConnector",
+ "bucket": "/",
+ "prefix": "/",
+ "default_regex": {
+ "group_names": ["data_asset_name", "year", "month"],
+ "pattern": "(.*)_(\d{4})-(\d{2})\.csv",
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][
+ "bucket"
+] = "superconductive-public"
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][
+ "prefix"
+] = "data/taxi_yellow_trip_data_samples/"
+
+# TODO: Uncomment once S3 testing in Azure Pipelines is re-enabled
+# test_python = context.test_yaml_config(
+# yaml.dump(datasource_config), return_mode="report_object"
+# )
+#
+# assert test_yaml == test_python
+#
+# context.add_datasource(**datasource_config)
+#
+# assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+# assert "yellow_tripdata" in set(
+# context.get_available_data_asset_names()["taxi_datasource"][
+# "default_inferred_data_connector_name"
+# ]
+# )
+
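+# The example below repeats the filesystem configuration and then requests data:
+# a plain BatchRequest returns every batch of the asset, while adding a
+# data_connector_query filters down to a single year/month batch.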
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_inferred_data_connector_name:
+ class_name: InferredAssetFilesystemDataConnector
+    base_directory: <PATH>
+ default_regex:
+ group_names:
+ - data_asset_name
+ - year
+ - month
+ pattern: (.*)_(\d{4})-(\d{2})\.csv
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+    "<PATH>", "../data/single_directory_one_data_asset/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_inferred_data_connector_name": {
+ "class_name": "InferredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "default_regex": {
+ "group_names": ["data_asset_name", "year", "month"],
+ "pattern": "(.*)_(\d{4})-(\d{2})\.csv",
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][
+ "base_directory"
+] = "../data/single_directory_one_data_asset/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+assert test_yaml == test_python
+
+context.add_datasource(**datasource_config)
+
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_inferred_data_connector_name"
+ ]
+)
+
+batch_request = BatchRequest(
+ datasource_name="taxi_datasource",
+ data_connector_name="default_inferred_data_connector_name",
+ data_asset_name="yellow_tripdata",
+)
+
+validator = context.get_validator(
+ batch_request=batch_request,
+ create_expectation_suite_with_name="",
+)
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert isinstance(validator, ge.validator.validator.Validator)
+
+batch_request = BatchRequest(
+ datasource_name="taxi_datasource",
+ data_connector_name="default_inferred_data_connector_name",
+ data_asset_name="yellow_tripdata",
+ data_connector_query={"batch_filter_parameters": {"year": "2019", "month": "02"}},
+)
+
+validator = context.get_validator(
+ batch_request=batch_request,
+ expectation_suite_name="",
+)
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert isinstance(validator, ge.validator.validator.Validator)
+assert validator.active_batch_definition.batch_identifiers == {
+ "year": "2019",
+ "month": "02",
+}
+
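+# The example below infers batch identifiers from the directory structure
+# (year/month folders); glob_directive tells the connector to search the nested
+# year/month subdirectories instead of only the top level.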
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_inferred_data_connector_name:
+ class_name: InferredAssetFilesystemDataConnector
+    base_directory: <PATH>
+ glob_directive: "*/*/*.csv"
+ default_regex:
+ group_names:
+ - year
+ - month
+ - data_asset_name
+ pattern: (\d{4})/(\d{2})/(.*)\.csv
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+    "<PATH>", "../data/nested_directories_time/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_inferred_data_connector_name": {
+ "class_name": "InferredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "glob_directive": "*/*/*.csv",
+ "default_regex": {
+ "group_names": ["year", "month", "data_asset_name"],
+ "pattern": "(\d{4})/(\d{2})/(.*)\.csv",
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][
+ "base_directory"
+] = "../data/nested_directories_time/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert test_yaml == test_python
+
+context.add_datasource(**datasource_config)
+
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_inferred_data_connector_name"
+ ]
+)
+assert "green_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_inferred_data_connector_name"
+ ]
+)
+
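+# The example below treats each top-level directory as a Data Asset and captures
+# an extra file_name_root group from the filenames.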
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_inferred_data_connector_name:
+ class_name: InferredAssetFilesystemDataConnector
+    base_directory: <PATH>
+ glob_directive: "*/*.csv"
+ default_regex:
+ group_names:
+ - data_asset_name
+ - file_name_root
+ - year
+ - month
+ pattern: (.*)/(.*)(\d{4})-(\d{2})\.csv
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+    "<PATH>", "../data/nested_directories_data_asset/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_inferred_data_connector_name": {
+ "class_name": "InferredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "glob_directive": "*/*.csv",
+ "default_regex": {
+ "group_names": [
+ "data_asset_name",
+ "file_name_root",
+ "year",
+ "month",
+ ],
+ "pattern": "(.*)/(.*)(\d{4})-(\d{2})\.csv",
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][
+ "base_directory"
+] = "../data/nested_directories_data_asset/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert test_yaml == test_python
+
+context.add_datasource(**datasource_config)
+
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_inferred_data_connector_name"
+ ]
+)
+assert "green_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_inferred_data_connector_name"
+ ]
+)
+
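+# The example below is a simpler variant of the previous configuration: the
+# filename root is matched but not captured as a group.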
+# YAML
+datasource_yaml = """
+name: taxi_datasource
+class_name: Datasource
+module_name: great_expectations.datasource
+execution_engine:
+ module_name: great_expectations.execution_engine
+ class_name: PandasExecutionEngine
+data_connectors:
+ default_inferred_data_connector_name:
+ class_name: InferredAssetFilesystemDataConnector
+    base_directory: <PATH>
+ glob_directive: "*/*.csv"
+ default_regex:
+ group_names:
+ - data_asset_name
+ - year
+ - month
+ pattern: (.*)/.*(\d{4})-(\d{2})\.csv
+"""
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the yaml above.
+datasource_yaml = datasource_yaml.replace(
+    "<PATH>", "../data/nested_directories_data_asset/"
+)
+
+test_yaml = context.test_yaml_config(datasource_yaml, return_mode="report_object")
+
+# Python
+datasource_config = {
+ "name": "taxi_datasource",
+ "class_name": "Datasource",
+ "module_name": "great_expectations.datasource",
+ "execution_engine": {
+ "module_name": "great_expectations.execution_engine",
+ "class_name": "PandasExecutionEngine",
+ },
+ "data_connectors": {
+ "default_inferred_data_connector_name": {
+ "class_name": "InferredAssetFilesystemDataConnector",
+ "base_directory": "/",
+ "glob_directive": "*/*.csv",
+ "default_regex": {
+ "group_names": [
+ "data_asset_name",
+ "year",
+ "month",
+ ],
+ "pattern": "(.*)/.*(\d{4})-(\d{2})\.csv",
+ },
+ },
+ },
+}
+
+# Please note this override is only to provide good UX for docs and tests.
+# In normal usage you'd set your path directly in the code above.
+datasource_config["data_connectors"]["default_inferred_data_connector_name"][
+ "base_directory"
+] = "../data/nested_directories_data_asset/"
+
+test_python = context.test_yaml_config(
+ yaml.dump(datasource_config), return_mode="report_object"
+)
+
+# NOTE: The following code is only for testing and can be ignored by users.
+assert test_yaml == test_python
+
+context.add_datasource(**datasource_config)
+
+assert [ds["name"] for ds in context.list_datasources()] == ["taxi_datasource"]
+assert "yellow_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_inferred_data_connector_name"
+ ]
+)
+assert "green_tripdata" in set(
+ context.get_available_data_asset_names()["taxi_datasource"][
+ "default_inferred_data_connector_name"
+ ]
+)
diff --git a/tests/integration/test_script_runner.py b/tests/integration/test_script_runner.py
index 2673af971618..d42375993420 100755
--- a/tests/integration/test_script_runner.py
+++ b/tests/integration/test_script_runner.py
@@ -298,6 +298,30 @@ class BackendDependencies(enum.Enum):
"util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py",
"extra_backend_dependencies": BackendDependencies.MSSQL,
},
+ {
+ "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_choose_which_dataconnector_to_use.py",
+ "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations",
+ "data_dir": "tests/test_sets/dataconnector_docs",
+ "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py",
+ },
+ {
+ "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector.py",
+ "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations",
+ "data_dir": "tests/test_sets/dataconnector_docs",
+ "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py",
+ },
+ {
+ "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_configuredassetdataconnector.py",
+ "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations",
+ "data_dir": "tests/test_sets/dataconnector_docs",
+ "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py",
+ },
+ {
+ "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/how_to_configure_a_runtimedataconnector.py",
+ "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations",
+ "data_dir": "tests/test_sets/dataconnector_docs",
+ "util_script": "tests/integration/docusaurus/connecting_to_your_data/database/util.py",
+ },
# {
# "user_flow_script": "tests/integration/docusaurus/connecting_to_your_data/database/mysql_yaml_example.py",
# "data_context_dir": "tests/integration/fixtures/no_datasources/great_expectations",
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-01.csv
new file mode 100644
index 000000000000..1168d9f9ef35
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-01.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge
+198236,2,2019-01-11 01:12:46,2019-01-11 01:20:37,N,1,129,129,1,1.39,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1,
+33286,1,2019-01-02 23:41:23,2019-01-02 23:49:14,N,1,255,112,1,1.5,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1,
+517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1,
+368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1,
+155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1,
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-02.csv
new file mode 100644
index 000000000000..b19b1f827f4f
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-02.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge
+504629,2,2019-02-25 16:05:49,2019-02-25 16:11:15,N,1,179,179,1,0.89,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,2,1,0.0
+475628,1,2019-02-23 23:18:58,2019-02-23 23:31:00,N,1,210,55,1,4.2,15.0,0.5,0.5,0.0,0.0,,0.3,16.3,2,1,0.0
+150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75
+245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0
+151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-03.csv
new file mode 100644
index 000000000000..c5f49aef4b67
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/green_tripdata/green_tripdata_2019-03.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge
+337596,1,2019-03-18 08:20:31,2019-03-18 09:12:55,N,1,49,35,1,6.1,33.5,0.0,0.5,0.0,0.0,,0.3,34.3,1,1,0.0
+378981,2,2019-03-20 13:03:10,2019-03-20 13:16:30,N,1,152,41,1,1.14,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0
+6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0
+76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75
+282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-01.txt b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-01.txt
new file mode 100644
index 000000000000..14f45f9cf3bc
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-01.txt
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+4565,2,2019-01-15 02:04:57,2019-01-15 02:17:53,1,4.92,1,N,255,226,1,16.5,0.5,0.5,3.56,0.0,0.3,21.36,
+714,2,2019-01-08 00:38:49,2019-01-08 00:52:30,1,7.07,1,N,132,197,1,21.0,0.5,0.5,4.46,0.0,0.3,26.76,
+2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0
+5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95,
+4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56,
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-02.txt b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-02.txt
new file mode 100644
index 000000000000..dd5bcc804be2
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-02.txt
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+9873,2,2019-02-14 14:06:26,2019-02-14 14:24:36,2,1.24,1,N,68,161,2,11.5,0.0,0.5,0.0,0.0,0.3,14.8,2.5
+9976,1,2019-02-08 14:20:27,2019-02-08 14:25:30,0,0.7,1,N,186,249,2,5.0,2.5,0.5,0.0,0.0,0.3,8.3,2.5
+4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5
+1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5
+6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-03.txt b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-03.txt
new file mode 100644
index 000000000000..085f0cc37265
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_complex/yellow/tripdata/yellow_tripdata_2019-03.txt
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+7797,2,2019-03-20 15:40:46,2019-03-20 16:15:38,1,5.8,1,N,164,13,1,26.0,0.0,0.5,5.86,0.0,0.3,35.16,2.5
+671,1,2019-03-14 17:21:03,2019-03-14 17:29:45,1,1.5,1,N,239,166,1,8.0,3.5,0.5,2.45,0.0,0.3,14.75,2.5
+7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0
+9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5
+2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-01.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-01.csv
new file mode 100644
index 000000000000..1168d9f9ef35
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-01.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge
+198236,2,2019-01-11 01:12:46,2019-01-11 01:20:37,N,1,129,129,1,1.39,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1,
+33286,1,2019-01-02 23:41:23,2019-01-02 23:49:14,N,1,255,112,1,1.5,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1,
+517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1,
+368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1,
+155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1,
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-02.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-02.csv
new file mode 100644
index 000000000000..b19b1f827f4f
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-02.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge
+504629,2,2019-02-25 16:05:49,2019-02-25 16:11:15,N,1,179,179,1,0.89,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,2,1,0.0
+475628,1,2019-02-23 23:18:58,2019-02-23 23:31:00,N,1,210,55,1,4.2,15.0,0.5,0.5,0.0,0.0,,0.3,16.3,2,1,0.0
+150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75
+245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0
+151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-03.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-03.csv
new file mode 100644
index 000000000000..c5f49aef4b67
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/green_tripdata/2019-03.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge
+337596,1,2019-03-18 08:20:31,2019-03-18 09:12:55,N,1,49,35,1,6.1,33.5,0.0,0.5,0.0,0.0,,0.3,34.3,1,1,0.0
+378981,2,2019-03-20 13:03:10,2019-03-20 13:16:30,N,1,152,41,1,1.14,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0
+6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0
+76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75
+282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-01.csv
new file mode 100644
index 000000000000..14f45f9cf3bc
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-01.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+4565,2,2019-01-15 02:04:57,2019-01-15 02:17:53,1,4.92,1,N,255,226,1,16.5,0.5,0.5,3.56,0.0,0.3,21.36,
+714,2,2019-01-08 00:38:49,2019-01-08 00:52:30,1,7.07,1,N,132,197,1,21.0,0.5,0.5,4.46,0.0,0.3,26.76,
+2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0
+5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95,
+4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56,
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-02.csv
new file mode 100644
index 000000000000..dd5bcc804be2
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-02.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+9873,2,2019-02-14 14:06:26,2019-02-14 14:24:36,2,1.24,1,N,68,161,2,11.5,0.0,0.5,0.0,0.0,0.3,14.8,2.5
+9976,1,2019-02-08 14:20:27,2019-02-08 14:25:30,0,0.7,1,N,186,249,2,5.0,2.5,0.5,0.0,0.0,0.3,8.3,2.5
+4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5
+1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5
+6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-03.csv
new file mode 100644
index 000000000000..085f0cc37265
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_data_asset/yellow_tripdata/yellow_tripdata_2019-03.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+7797,2,2019-03-20 15:40:46,2019-03-20 16:15:38,1,5.8,1,N,164,13,1,26.0,0.0,0.5,5.86,0.0,0.3,35.16,2.5
+671,1,2019-03-14 17:21:03,2019-03-14 17:29:45,1,1.5,1,N,239,166,1,8.0,3.5,0.5,2.45,0.0,0.3,14.75,2.5
+7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0
+9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5
+2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/green_tripdata.csv
new file mode 100644
index 000000000000..870ec360c1ba
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/green_tripdata.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type
+448278,2,2018-10-20 13:58:45,2018-10-20 14:05:35,N,1,112,256,1,1.03,6.5,0.0,0.5,0.0,0.0,,0.3,7.3,2,1.0
+520261,1,2018-10-23 17:48:10,2018-10-23 17:55:35,Y,1,181,181,1,0.9,6.5,1.0,0.5,1.5,0.0,,0.3,9.8,1,1.0
+520094,2,2018-10-23 17:39:25,2018-10-23 18:42:28,N,1,75,133,1,14.6,48.5,1.0,0.5,11.21,5.76,,0.3,67.27,1,1.0
+465246,2,2018-10-21 03:41:56,2018-10-21 03:45:23,N,1,95,95,2,0.86,5.0,0.5,0.5,1.0,0.0,,0.3,7.3,1,1.0
+652784,2,2018-10-29 12:03:25,2018-10-29 12:14:19,N,1,18,31,1,2.24,10.0,0.0,0.5,0.0,0.0,,0.3,10.8,2,1.0
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/yellow_tripdata.csv
new file mode 100644
index 000000000000..61bb9b8b3f00
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/10/yellow_tripdata.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+6984,2,2018-10-30 10:59:02,2018-10-30 11:04:30,1,0.62,1,N,48,163,1,5.5,0.0,0.5,1.26,0.0,0.3,7.56,
+3030,2,2018-10-03 19:43:48,2018-10-03 20:01:51,1,3.93,1,N,137,239,1,15.0,1.0,0.5,2.5,0.0,0.3,19.3,
+9267,1,2018-10-16 07:10:28,2018-10-16 07:19:21,1,1.4,1,N,142,236,1,8.0,0.0,0.5,0.88,0.0,0.3,9.68,
+4342,1,2018-10-10 10:19:16,2018-10-10 10:33:52,1,2.1,1,N,142,50,2,11.5,0.0,0.5,0.0,0.0,0.3,12.3,
+8099,2,2018-10-01 19:18:44,2018-10-01 19:31:48,1,2.51,1,N,140,239,1,11.0,1.0,0.5,3.2,0.0,0.3,16.0,
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/green_tripdata.csv
new file mode 100644
index 000000000000..ae47bbf8b509
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/green_tripdata.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type
+206147,2,2018-11-09 21:58:22,2018-11-09 22:09:19,N,1,97,17,1,1.69,9.0,0.5,0.5,0.0,0.0,,0.3,10.3,2,1.0
+586398,2,2018-11-28 07:16:28,2018-11-28 07:40:30,N,1,81,250,1,4.39,20.5,0.0,0.5,0.0,0.0,,0.3,21.3,1,1.0
+410503,2,2018-11-19 08:36:21,2018-11-19 08:53:22,N,1,116,128,1,4.59,17.5,0.0,0.5,4.58,0.0,,0.3,22.88,1,1.0
+284524,2,2018-11-13 14:57:24,2018-11-13 15:23:48,N,1,53,121,1,5.38,21.5,0.0,0.5,0.0,0.0,,0.3,22.3,1,1.0
+652213,2,2018-11-30 20:18:38,2018-11-30 20:23:04,N,1,75,74,1,1.23,6.0,0.5,0.5,1.46,0.0,,0.3,8.76,1,1.0
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/yellow_tripdata.csv
new file mode 100644
index 000000000000..f61adc4debd6
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/11/yellow_tripdata.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+7201,2,2018-11-03 20:17:25,2018-11-03 20:25:26,1,1.15,1,N,166,238,1,7.5,0.5,0.5,2.2,0.0,0.3,11.0,
+2578,2,2018-11-14 09:03:52,2018-11-14 09:21:39,1,0.9,1,N,230,230,1,11.5,0.0,0.5,1.0,0.0,0.3,13.3,
+9216,1,2018-11-16 13:12:09,2018-11-16 13:24:41,1,1.5,1,N,238,143,2,9.5,0.0,0.5,0.0,0.0,0.3,10.3,
+5518,1,2018-11-15 20:44:20,2018-11-15 20:58:41,1,1.7,1,N,236,151,1,11.0,0.5,0.5,0.0,0.0,0.3,12.3,
+104,2,2018-11-07 00:01:14,2018-11-07 00:05:56,5,0.85,1,N,234,170,1,5.5,0.5,0.5,1.36,0.0,0.3,8.16,
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/green_tripdata.csv
new file mode 100644
index 000000000000..848fed127917
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/green_tripdata.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type
+644743,2,2018-12-29 22:07:39,2018-12-29 22:23:38,N,1,255,7,1,4.2,15.0,0.5,0.5,3.26,0.0,,0.3,19.56,1,1.0
+241539,1,2018-12-11 13:27:48,2018-12-11 14:01:25,N,1,244,87,1,13.4,40.5,0.0,0.5,8.25,0.0,,0.3,49.55,1,1.0
+519266,2,2018-12-22 23:20:48,2018-12-22 23:42:53,N,1,255,225,1,4.03,16.5,0.5,0.5,3.56,0.0,,0.3,21.36,1,1.0
+419676,2,2018-12-18 20:30:48,2018-12-18 20:42:37,N,1,166,116,1,2.59,11.0,0.5,0.5,0.0,0.0,,0.3,12.3,1,1.0
+110768,2,2018-12-05 21:20:39,2018-12-05 21:45:12,N,1,255,61,1,4.08,18.5,0.5,0.5,5.94,0.0,,0.3,25.74,1,1.0
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/yellow_tripdata.csv
new file mode 100644
index 000000000000..94f782e0e315
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2018/12/yellow_tripdata.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+6701,1,2018-12-07 22:07:55,2018-12-07 22:35:18,1,3.5,1,N,237,249,1,18.5,0.5,0.5,2.0,0.0,0.3,21.8,
+9645,2,2018-12-10 18:21:08,2018-12-10 18:33:12,1,1.38,1,N,114,158,1,9.0,1.0,0.5,2.16,0.0,0.3,12.96,
+4105,2,2018-12-15 23:27:23,2018-12-15 23:44:02,1,2.62,1,N,232,137,2,13.0,0.5,0.5,0.0,0.0,0.3,14.3,
+2743,2,2018-12-12 19:54:39,2018-12-12 19:57:53,1,0.41,1,N,237,237,1,4.0,1.0,0.5,1.16,0.0,0.3,6.96,
+4126,1,2018-12-21 18:57:55,2018-12-21 19:09:35,1,1.0,1,N,249,79,1,8.5,1.0,0.5,2.05,0.0,0.3,12.35,
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/green_tripdata.csv
new file mode 100644
index 000000000000..1168d9f9ef35
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/green_tripdata.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge
+198236,2,2019-01-11 01:12:46,2019-01-11 01:20:37,N,1,129,129,1,1.39,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1,
+33286,1,2019-01-02 23:41:23,2019-01-02 23:49:14,N,1,255,112,1,1.5,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1,
+517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1,
+368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1,
+155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1,
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/yellow_tripdata.csv
new file mode 100644
index 000000000000..14f45f9cf3bc
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/01/yellow_tripdata.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+4565,2,2019-01-15 02:04:57,2019-01-15 02:17:53,1,4.92,1,N,255,226,1,16.5,0.5,0.5,3.56,0.0,0.3,21.36,
+714,2,2019-01-08 00:38:49,2019-01-08 00:52:30,1,7.07,1,N,132,197,1,21.0,0.5,0.5,4.46,0.0,0.3,26.76,
+2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0
+5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95,
+4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56,
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/green_tripdata.csv
new file mode 100644
index 000000000000..b19b1f827f4f
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/green_tripdata.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge
+504629,2,2019-02-25 16:05:49,2019-02-25 16:11:15,N,1,179,179,1,0.89,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,2,1,0.0
+475628,1,2019-02-23 23:18:58,2019-02-23 23:31:00,N,1,210,55,1,4.2,15.0,0.5,0.5,0.0,0.0,,0.3,16.3,2,1,0.0
+150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75
+245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0
+151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/yellow_tripdata.csv
new file mode 100644
index 000000000000..dd5bcc804be2
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/02/yellow_tripdata.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+9873,2,2019-02-14 14:06:26,2019-02-14 14:24:36,2,1.24,1,N,68,161,2,11.5,0.0,0.5,0.0,0.0,0.3,14.8,2.5
+9976,1,2019-02-08 14:20:27,2019-02-08 14:25:30,0,0.7,1,N,186,249,2,5.0,2.5,0.5,0.0,0.0,0.3,8.3,2.5
+4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5
+1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5
+6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/green_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/green_tripdata.csv
new file mode 100644
index 000000000000..c5f49aef4b67
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/green_tripdata.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge
+337596,1,2019-03-18 08:20:31,2019-03-18 09:12:55,N,1,49,35,1,6.1,33.5,0.0,0.5,0.0,0.0,,0.3,34.3,1,1,0.0
+378981,2,2019-03-20 13:03:10,2019-03-20 13:16:30,N,1,152,41,1,1.14,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0
+6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0
+76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75
+282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0
diff --git a/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/yellow_tripdata.csv b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/yellow_tripdata.csv
new file mode 100644
index 000000000000..085f0cc37265
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/nested_directories_time/2019/03/yellow_tripdata.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+7797,2,2019-03-20 15:40:46,2019-03-20 16:15:38,1,5.8,1,N,164,13,1,26.0,0.0,0.5,5.86,0.0,0.3,35.16,2.5
+671,1,2019-03-14 17:21:03,2019-03-14 17:29:45,1,1.5,1,N,239,166,1,8.0,3.5,0.5,2.45,0.0,0.3,14.75,2.5
+7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0
+9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5
+2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0
diff --git a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-01.csv
new file mode 100644
index 000000000000..14f45f9cf3bc
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-01.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+4565,2,2019-01-15 02:04:57,2019-01-15 02:17:53,1,4.92,1,N,255,226,1,16.5,0.5,0.5,3.56,0.0,0.3,21.36,
+714,2,2019-01-08 00:38:49,2019-01-08 00:52:30,1,7.07,1,N,132,197,1,21.0,0.5,0.5,4.46,0.0,0.3,26.76,
+2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0
+5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95,
+4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56,
diff --git a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-02.csv
new file mode 100644
index 000000000000..dd5bcc804be2
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-02.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+9873,2,2019-02-14 14:06:26,2019-02-14 14:24:36,2,1.24,1,N,68,161,2,11.5,0.0,0.5,0.0,0.0,0.3,14.8,2.5
+9976,1,2019-02-08 14:20:27,2019-02-08 14:25:30,0,0.7,1,N,186,249,2,5.0,2.5,0.5,0.0,0.0,0.3,8.3,2.5
+4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5
+1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5
+6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5
diff --git a/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-03.csv
new file mode 100644
index 000000000000..085f0cc37265
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/single_directory_one_data_asset/yellow_tripdata_2019-03.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+7797,2,2019-03-20 15:40:46,2019-03-20 16:15:38,1,5.8,1,N,164,13,1,26.0,0.0,0.5,5.86,0.0,0.3,35.16,2.5
+671,1,2019-03-14 17:21:03,2019-03-14 17:29:45,1,1.5,1,N,239,166,1,8.0,3.5,0.5,2.45,0.0,0.3,14.75,2.5
+7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0
+9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5
+2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0
diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-01.csv
new file mode 100644
index 000000000000..1168d9f9ef35
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-01.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge
+198236,2,2019-01-11 01:12:46,2019-01-11 01:20:37,N,1,129,129,1,1.39,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1,
+33286,1,2019-01-02 23:41:23,2019-01-02 23:49:14,N,1,255,112,1,1.5,7.5,0.5,0.5,0.0,0.0,,0.3,8.8,2,1,
+517697,2,2019-01-26 15:32:49,2019-01-26 15:36:34,N,1,7,193,1,0.61,-4.5,0.0,-0.5,0.0,0.0,,-0.3,-5.3,3,1,
+368190,1,2019-01-19 00:30:30,2019-01-19 00:38:22,N,1,33,97,2,1.1,7.0,0.5,0.5,0.0,0.0,,0.3,8.3,2,1,
+155607,2,2019-01-09 08:43:12,2019-01-09 08:49:26,N,1,74,41,1,1.04,6.5,0.0,0.5,1.46,0.0,,0.3,8.76,1,1,
diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-02.csv
new file mode 100644
index 000000000000..b19b1f827f4f
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-02.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge
+504629,2,2019-02-25 16:05:49,2019-02-25 16:11:15,N,1,179,179,1,0.89,5.5,1.0,0.5,0.0,0.0,,0.3,7.3,2,1,0.0
+475628,1,2019-02-23 23:18:58,2019-02-23 23:31:00,N,1,210,55,1,4.2,15.0,0.5,0.5,0.0,0.0,,0.3,16.3,2,1,0.0
+150917,2,2019-02-07 23:32:04,2019-02-07 23:54:23,N,1,25,68,1,5.5,20.0,0.5,0.5,4.81,0.0,,0.3,28.86,1,1,2.75
+245133,2,2019-02-12 14:04:30,2019-02-12 14:24:53,N,1,25,225,1,3.45,14.5,0.0,0.5,2.5,0.0,,0.3,17.8,1,1,0.0
+151476,2,2019-02-08 00:39:15,2019-02-08 00:51:56,N,1,130,139,1,4.04,14.0,0.5,0.5,0.0,0.0,,0.3,15.3,2,1,0.0
diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-03.csv
new file mode 100644
index 000000000000..c5f49aef4b67
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/green_tripdata_2019-03.csv
@@ -0,0 +1,6 @@
+,VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type,congestion_surcharge
+337596,1,2019-03-18 08:20:31,2019-03-18 09:12:55,N,1,49,35,1,6.1,33.5,0.0,0.5,0.0,0.0,,0.3,34.3,1,1,0.0
+378981,2,2019-03-20 13:03:10,2019-03-20 13:16:30,N,1,152,41,1,1.14,9.5,0.0,0.5,0.0,0.0,,0.3,10.3,1,1,0.0
+6063,2,2019-03-01 10:48:22,2019-03-01 10:56:05,N,1,260,7,1,1.58,7.5,0.0,0.5,0.0,0.0,,0.3,8.3,2,1,0.0
+76887,1,2019-03-05 08:07:59,2019-03-05 08:30:54,N,1,75,143,2,3.6,16.5,2.75,0.5,4.0,0.0,,0.3,24.05,1,1,2.75
+282323,2,2019-03-15 11:31:28,2019-03-15 12:08:48,N,5,72,150,1,7.98,24.14,0.0,0.5,0.0,0.0,,0.0,24.64,1,2,0.0
diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-01.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-01.csv
new file mode 100644
index 000000000000..14f45f9cf3bc
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-01.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+4565,2,2019-01-15 02:04:57,2019-01-15 02:17:53,1,4.92,1,N,255,226,1,16.5,0.5,0.5,3.56,0.0,0.3,21.36,
+714,2,2019-01-08 00:38:49,2019-01-08 00:52:30,1,7.07,1,N,132,197,1,21.0,0.5,0.5,4.46,0.0,0.3,26.76,
+2748,1,2019-01-28 23:01:36,2019-01-28 23:17:58,1,3.5,1,N,142,107,1,14.0,0.5,0.5,3.05,0.0,0.3,18.35,0.0
+5424,1,2019-01-18 17:27:01,2019-01-18 17:38:41,1,1.5,1,N,79,232,1,9.0,1.0,0.5,2.15,0.0,0.3,12.95,
+4893,2,2019-01-07 18:37:42,2019-01-07 18:40:24,1,0.56,1,N,107,137,1,4.5,1.0,0.5,1.26,0.0,0.3,7.56,
diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-02.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-02.csv
new file mode 100644
index 000000000000..dd5bcc804be2
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-02.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+9873,2,2019-02-14 14:06:26,2019-02-14 14:24:36,2,1.24,1,N,68,161,2,11.5,0.0,0.5,0.0,0.0,0.3,14.8,2.5
+9976,1,2019-02-08 14:20:27,2019-02-08 14:25:30,0,0.7,1,N,186,249,2,5.0,2.5,0.5,0.0,0.0,0.3,8.3,2.5
+4907,2,2019-02-20 14:58:32,2019-02-20 15:08:43,0,0.71,1,N,230,163,1,7.5,0.0,0.5,2.7,0.0,0.3,13.5,2.5
+1781,2,2019-02-25 08:53:07,2019-02-25 09:03:02,1,1.2,1,N,13,231,1,8.0,0.0,0.5,2.26,0.0,0.3,13.56,2.5
+6888,1,2019-02-28 05:36:38,2019-02-28 05:52:50,1,4.3,1,N,239,125,1,15.5,3.0,0.5,3.85,0.0,0.3,23.15,2.5
diff --git a/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-03.csv b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-03.csv
new file mode 100644
index 000000000000..085f0cc37265
--- /dev/null
+++ b/tests/test_sets/dataconnector_docs/single_directory_two_data_assets/yellow_tripdata_2019-03.csv
@@ -0,0 +1,6 @@
+,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
+7797,2,2019-03-20 15:40:46,2019-03-20 16:15:38,1,5.8,1,N,164,13,1,26.0,0.0,0.5,5.86,0.0,0.3,35.16,2.5
+671,1,2019-03-14 17:21:03,2019-03-14 17:29:45,1,1.5,1,N,239,166,1,8.0,3.5,0.5,2.45,0.0,0.3,14.75,2.5
+7266,2,2019-03-09 00:18:20,2019-03-09 00:23:55,4,0.87,1,N,264,264,1,5.0,0.5,0.5,1.89,0.0,0.3,8.19,0.0
+9403,2,2019-03-11 19:54:08,2019-03-11 19:59:58,1,0.75,1,N,161,230,1,5.5,1.0,0.5,2.45,0.0,0.3,12.25,2.5
+2927,2,2019-03-03 19:54:42,2019-03-03 20:10:50,2,2.43,1,N,161,68,1,12.0,0.0,0.5,0.0,0.0,0.3,12.8,0.0