DataOps TestGen delivers simple, fast data quality test generation and execution by data profiling, new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing production testing of new data refreshes, and continuous anomaly monitoring of datasets. DataOps TestGen is part of DataKitchen's Open Source Data Observability.
Install with a single command using dk-installer.
python3 dk-installer.py tg install
You can also install using the provided docker-compose.yml.
Make a local copy of the compose file.
curl -o docker-compose.yml 'https://raw.githubusercontent.com/DataKitchen/dataops-testgen/main/deploy/docker-compose.yml'
If you are interested in integrating TestGen with the DataKitchen Observability platform, edit the compose file and set values for the environment variables OBSERVABILITY_API_URL and OBSERVABILITY_API_KEY.
Before running docker compose, create an env file named testgen.env to hold the secrets needed to run TestGen.
touch testgen.env
The following variables are required:
TESTGEN_USERNAME=
TESTGEN_PASSWORD=
TG_DECRYPT_SALT=
TG_DECRYPT_PASSWORD=
You can learn how each variable is used in the Configuration section.
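One way to fill in testgen.env is to generate the two encryption values randomly. The sketch below is an illustration, not part of TestGen: the helper name `build_env_file` and the placeholder credentials are assumptions, and it relies only on Python's standard `secrets` module.

```python
import secrets


def build_env_file(username: str, password: str) -> str:
    """Return the text of an env file holding the four required variables."""
    # token_urlsafe(24) encodes 24 random bytes as 32 URL-safe ASCII characters,
    # which satisfies the ASCII-only rule and the 16-character recommendation.
    salt = secrets.token_urlsafe(24)
    decrypt_password = secrets.token_urlsafe(24)
    lines = [
        f"TESTGEN_USERNAME={username}",
        f"TESTGEN_PASSWORD={password}",
        f"TG_DECRYPT_SALT={salt}",
        f"TG_DECRYPT_PASSWORD={decrypt_password}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder credentials (hypothetical): replace them before real use.
    print(build_env_file("admin", "change-me"), end="")
```

Redirecting the script's output to testgen.env produces a file that can be passed to `--env-file`. Keep the generated salt and password somewhere safe, since they are used to encrypt and decrypt user secrets.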
Then, run docker compose to start the services:
docker compose --env-file testgen.env up --detach
This will spin up a postgres service and a startup service that runs once to set up the database, and make the TestGen UI available at http://localhost:8501.
After verifying that TestGen is running, follow the quick start steps to get familiar with the tool.
TestGen includes a basic data set for you to play around with.
Once TestGen is running, you can use dk-installer to generate the demo data:
python3 dk-installer.py tg run-demo
If you are integrating TestGen with the DataKitchen Observability platform, pass the --export flag:
python3 dk-installer.py tg run-demo --export
You can also generate the demo data if you installed using docker compose. Use the TestGen CLI to run the quick start command:
docker compose --env-file testgen.env exec engine testgen quick-start
It also supports setting up the integration with DataKitchen Observability:
docker compose --env-file testgen.env exec engine testgen quick-start --observability-api-url <url> --observability-api-key <key>
NOTE: You don't need to pass the Observability URL and key as arguments if you set them up as environment variables in your compose file.
After you have the demo data from the quick-start command, complete the quick start with the following steps:
- Run profiling against the target demo database
docker compose --env-file testgen.env exec engine testgen run-profile --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d
- Generate test cases for all columns in the target demo database
docker compose --env-file testgen.env exec engine testgen run-test-generation --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d
- Run the generated tests
docker compose --env-file testgen.env exec engine testgen run-tests --project-key DEFAULT --test-suite-key default-suite-1
- Export the test results to Observability
docker compose --env-file testgen.env exec engine testgen export-observability --project-key DEFAULT --test-suite-key default-suite-1
- Simulate changes to the demo data
docker compose --env-file testgen.env exec engine testgen quick-start --simulate-fast-forward
- Export the test results for the simulated changes to Observability
docker compose --env-file testgen.env exec engine testgen export-observability --project-key DEFAULT --test-suite-key default-suite-1
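The steps above can be chained in a small script that stops at the first failure. This is a minimal sketch using only the commands listed above; the function names and the `export` switch are assumptions made for illustration.

```python
import subprocess

# Common prefix of every quick start command listed above.
COMPOSE = ["docker", "compose", "--env-file", "testgen.env", "exec", "engine", "testgen"]
TABLE_GROUP_ID = "0ea85e17-acbe-47fe-8394-9970725ad37d"
SUITE = ["--project-key", "DEFAULT", "--test-suite-key", "default-suite-1"]


def quick_start_commands(export: bool = True) -> list[list[str]]:
    """Build the quick start commands in the documented order."""
    steps = [
        ["run-profile", "--table-group-id", TABLE_GROUP_ID],
        ["run-test-generation", "--table-group-id", TABLE_GROUP_ID],
        ["run-tests", *SUITE],
    ]
    if export:
        steps.append(["export-observability", *SUITE])
    steps.append(["quick-start", "--simulate-fast-forward"])
    if export:
        steps.append(["export-observability", *SUITE])
    return [COMPOSE + step for step in steps]


def run_quick_start(export: bool = True) -> None:
    """Run each step in turn; check=True raises CalledProcessError on failure."""
    for command in quick_start_commands(export):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Call `run_quick_start(export=False)` to skip the two export steps when Observability is not configured. When running without a terminal (for example in CI), `docker compose exec` may need its `-T` flag to disable pseudo-TTY allocation.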
Invalidates the cache with the bootstrapped application, causing changes to the routing and plugins to take effect on every render.
Also changes the logging level for the testgen.ui logger from INFO to DEBUG.
default: no
Set it to yes to enable rotating file logs to be written under /var/log/testgen/.
default: no
Salt used to encrypt and decrypt user secrets. Only ASCII characters are allowed.
A minimum length of 16 characters is recommended.
Secret passcode used in combination with TG_DECRYPT_SALT to encrypt and decrypt user secrets. Only ASCII characters are allowed.
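The two rules above (ASCII only, and a recommended salt length of at least 16 characters) are easy to check before starting the services. The helper below is a hypothetical pre-flight check, not something TestGen ships.

```python
def check_decrypt_settings(salt: str, password: str) -> list[str]:
    """Return a list of problems with the two values (empty when both look fine)."""
    problems = []
    for name, value in (("TG_DECRYPT_SALT", salt), ("TG_DECRYPT_PASSWORD", password)):
        if not value:
            problems.append(f"{name} is not set")
        elif not value.isascii():
            problems.append(f"{name} contains non-ASCII characters")
    # The length check is a recommendation in the text above, not a hard limit.
    if salt and len(salt) < 16:
        problems.append("TG_DECRYPT_SALT is shorter than the recommended 16 characters")
    return problems
```

For example, passing `os.environ.get("TG_DECRYPT_SALT", "")` and `os.environ.get("TG_DECRYPT_PASSWORD", "")` and printing any returned messages gives a quick sanity check of the current environment.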
Username to log into the web application.
Password to log into the web application.
User to connect to the testgen application postgres database.
default: os.environ["TESTGEN_USERNAME"]
Password to connect to the testgen application postgres database.
default: os.environ["TESTGEN_PASSWORD"]
User with admin privileges in the testgen application postgres database used to create roles, users, database and schema. Required if the user in TG_METADATA_DB_USER does not have the required privileges.
default: os.environ["TG_METADATA_DB_USER"]
Password for the admin user to connect to the testgen application postgres database.
default: os.environ["TG_METADATA_DB_PASSWORD"]
User to be created in the testgen application postgres database.
Will be granted:
- read/write access to the tables test_results, test_suites and test_definitions
- read-only access to all other tables.
default: testgen_execute
User to be created in the testgen application postgres database. Will be granted read-only access to all tables.
default: testgen_report
Hostname where the testgen application postgres database is running.
default: localhost
Port at which the testgen application postgres database is exposed by the host.
default: 5432
Name of the database in postgres on which to store testgen metadata.
default: datakitchen
Name of the schema inside the postgres database on which to store testgen metadata.
default: testgen
Code used to uniquely identify the auto generated project.
default: DEFAULT
Name to assign to the auto generated project.
default: Demo
SQL flavor of the database the auto generated project will run tests against.
Supported flavors:
redshift
snowflake
mssql
postgresql
default: postgresql
Name assigned to identify the connection to the project database.
default: default
Maximum number of concurrent queries executed when fetching data from the project database.
default: 4
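A cap on concurrent queries is typically enforced with a bounded worker pool. The sketch below shows the general technique with Python's standard thread pool; it does not reproduce TestGen's implementation, and `run_queries` and `execute` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_QUERIES = 4  # the documented default


def run_queries(queries, execute, max_workers=MAX_CONCURRENT_QUERIES):
    """Call execute(query) for every query, with at most max_workers in flight.

    Results come back in the same order as the input queries.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(execute, queries))
```

A higher limit shortens the total run time when the database has spare capacity, while a lower limit reduces the load placed on it.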
Determines how many tests are grouped together in a single query. Increase it for better performance, or decrease it to better isolate test failures. Accepted values are 500 to 14,000.
default: 5000
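To make the trade-off concrete, the sketch below splits a list of tests into groups of a given size and enforces the accepted range. It only illustrates the grouping arithmetic; how TestGen builds its queries is not shown here, and `batch_tests` is a hypothetical helper.

```python
MIN_BATCH, MAX_BATCH, DEFAULT_BATCH = 500, 14_000, 5_000


def batch_tests(tests, batch_size=DEFAULT_BATCH):
    """Split tests into consecutive groups of at most batch_size items."""
    if not MIN_BATCH <= batch_size <= MAX_BATCH:
        raise ValueError(f"batch_size must be between {MIN_BATCH} and {MAX_BATCH}")
    tests = list(tests)
    return [tests[i:i + batch_size] for i in range(0, len(tests), batch_size)]
```

With 12,000 tests, the default of 5000 yields three groups (5000, 5000 and 2000 tests), while a value of 500 yields 24 smaller groups.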
Name of the schema to be created in the project database.
default: qc
Name of the database the auto generated project will run tests against.
default: demo_db
Name of the schema inside the project database the tests will be run against.
default: demo
User to be used by the auto generated project to connect to the database under testing.
default: os.environ["TG_METADATA_DB_USER"]
Password to be used by the auto generated project to connect to the database under testing.
default: os.environ["TG_METADATA_DB_PASSWORD"]
Hostname where the database under testing is running.
default: os.environ["TG_METADATA_DB_HOST"]
Port at which the database under testing is exposed by the host.
default: os.environ["TG_METADATA_DB_PORT"]
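Several of the defaults above chain together: the application database credentials fall back to the web application credentials, and the connection settings for the database under testing fall back to the application database settings. The sketch below resolves that chain for the variables named in this document. It assumes that TG_METADATA_DB_HOST and TG_METADATA_DB_PORT are the host and port settings that default to localhost and 5432; the function names and dictionary keys are illustrative only.

```python
import os


def first_set(env, *names, default=None):
    """Return the value of the first variable in names that is present in env."""
    for name in names:
        if name in env:
            return env[name]
    if default is None:
        raise KeyError(f"none of {names} is set")
    return default


def resolve_db_defaults(env=None):
    """Resolve the documented fallbacks from a mapping of environment variables."""
    env = os.environ if env is None else env
    metadata = {
        "user": first_set(env, "TG_METADATA_DB_USER", "TESTGEN_USERNAME"),
        "password": first_set(env, "TG_METADATA_DB_PASSWORD", "TESTGEN_PASSWORD"),
        "host": first_set(env, "TG_METADATA_DB_HOST", default="localhost"),
        "port": first_set(env, "TG_METADATA_DB_PORT", default="5432"),
    }
    # Without project-specific overrides, the database under testing
    # inherits every connection setting of the application database.
    project = dict(metadata)
    return {"metadata": metadata, "project": project}
```

With only TESTGEN_USERNAME and TESTGEN_PASSWORD set, both connections resolve to those credentials on localhost:5432, which matches the demo setup where a single postgres service hosts everything.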
For supported SQL flavors, set up the SQLAlchemy connection to trust the database server certificate.
default: no
Name assigned to the auto generated table group.
default: default
Key to be assigned to the auto generated test suite.
default: default-suite-1
Description for the auto generated test suite.
default: default_suite_desc
Comma-separated list of specific table names to include when running profiling for the project database.
A SQL filter supported by the project database's LIKE operator for table names to include.
default: %%
A SQL filter supported by the project database's LIKE operator for table names to exclude.
default: tmp%%
A SQL filter supported by the project database's LIKE operator representing ID columns.
default: %%id
A SQL filter supported by the project database's LIKE operator representing surrogate key columns.
default: %%sk
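To see what the default masks select, the sketch below emulates LIKE matching in Python, where `%` matches any run of characters and `_` matches exactly one. It is an approximation for experimenting with patterns: the real filtering happens in the project database, whose case sensitivity and escape rules may differ, and the way the include and exclude masks are combined in `select_tables` is an assumption.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into an equivalent compiled regex."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")  # any run of characters, including none
        elif char == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)


def like_match(pattern: str, name: str) -> bool:
    """Return True when the whole name matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(name) is not None


def select_tables(tables, include="%%", exclude="tmp%%"):
    """Keep tables that match the include mask and do not match the exclude mask."""
    return [t for t in tables if like_match(include, t) and not like_match(exclude, t)]
```

With the defaults, `select_tables(["orders", "tmp_load", "customers"])` keeps `orders` and `customers`. Note that a mask such as `%%id` matches any name ending in "id", so a column called `paid` would be treated as an ID column alongside `customer_id`.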
Toggle on to base profiling on a sample of records instead of the full table. Accepts Y or N.
default: N
API URL of your instance of Observability to which the project's events are sent.
Authentication key, created in your instance of Observability, with permission to send events.
Whether to verify the SSL certificate when exporting events to your instance of Observability.
default: yes
When exporting to your instance of Observability, the maximum number of events that will be sent to the events API in a single export.
default: 5000
When exporting to your instance of Observability, the type of event that will be sent to the events API.
default: dataset
When exporting to your instance of Observability, the key sent to the events API to identify the components.
default: default
Enables calling Docker Hub API to fetch the latest released image tag. The fetched tag is displayed in the UI menu.
default: yes
We recommend you start by going through the Data Observability Overview Demo.
For support requests, join the Data Observability Slack and post on the #support channel.
Talk and learn with other data practitioners who are building with DataKitchen. Share knowledge, get help, and contribute to our open-source project.
Join our community here:
For details on contributing or running the project for development, check out our contributing guide.
DataKitchen DataOps TestGen is Apache 2.0 licensed.