All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- core: Fix exception handling in telemetry and cli (#685)
- dbt: Add logging for ignored dbt tests (#682)
- dbt: Cleanup json dump debug statements
- snowflake: quote columns in queries, fixes #679. (#680)
- spark: try using describe table instead of iterating over columns (#678)
- core: Update the telemetry attribute (#672)
- core: Pin markupsafe (#677)
- snowflake: Collation removal for REGEX functions (#673)
- hive: update connection parameters (#662)
- core: Change invalid keys message to a warning instead of error (#656)
- core: submit a utc timestamp when creating scan (#651)
- core: Remove explicit permission setting on yml files created (#642)
- core: Fix SodaOTLPExporter constructor. Fixes #627 (#639)
- core: Do not export non-soda spans in console and OTLP exporters. Fixes #627 (#632)
- core: Fix exit code when running scans (#623)
- dbt: add telemetry for ingest command (#655)
- dbt: raise error if parsed results contain only null failures (#654)
- dbt: resolve key access error when sources are not present in manifest (#646)
- dbt: get artifacts from dbt Cloud via job_id (#647)
- bigquery: fix the create command (#653)
- sqlserver: Add encrypt and trust_server_certificate options (#643)
- core: add main module, so soda can be run using python -m sodasql
- core: move cryptography dependency to snowflake
- bigquery: introduce use_context_auth setting
- cloud: Make sure that title is sent to Soda Cloud for sql metrics
- dbt: add ingestion of source nodes
- dbt: Ingest test artifacts from dbt Cloud
- sqlserver: Fix column names quoting
- misc: Update classifiers on PyPI
- Core: Docker x86 workaround for running soda-sql on arm based machines (#590)
- Cloud: Add database name and schema to cloud scanStart command (#584)
- dbt: Adds ingest dbt tests
- sqlserver: Support for SQLServer Dialect (#564)
- sqlserver: Fix column quoting and analyze issues (#595)
- trino: Experimental Trino dialect (#596)
- Core: Enable Open Telemetry
- dbt: Add dbt package to releases
- Core: Use abstract method instead of exceptions (#566)
- Core: Save scan results to a json file (#569)
- Core: Introduce Open Telemetry tracing (off by default) (#563)
- Core: Support pathlib.PurePath for yaml files (#573)
- dbt: Add dbt package to parse manifest and run_results (#572)
- Core: Fix redshift CI test details
- Core: Fix typo in command help
- Core/Cloud: Add Soda Cloud metrics store support (#528)
- Core: Disable samples and failed row collection based on a setting (#517)
- Core: deprecate to_json and introduce to_dict (#510)
- Core: Add short_help (#546)
- Core: Add GitHub issue link in exception messages (#530)
- BigQuery: Fix auth scope parsing as a list (#524)
- BigQuery: remove unnecessary error during create (#520)
- Snowflake: Add support to set session parameters (#514)
- Core: Fix timeout validation
- Spark: Filter columns in spark dialect
- Core: Fix test connection method
- Spark: Add support for pyodbc/databricks
- Spark: Allow spark dialect to work without database specified
- Core: fix time option as it's always set to now by default (#473)
- Core: Update dev requirements
- Core: Update readme with dialect status (#477)
- Core: Update Tox in dev requirements to prevent version deadlock (#474)
- BigQuery: fix NoneType issue when credentials are not sufficient for BigQuery (#472)
- BigQuery: Update bigquery dependency version (#470)
- MySQL: Fix MySQL dialect issues (#475)
- Core: Implement option to limit the number of tables analyzed (#466)
- Core: Add --offline flag to scan cli to skip sending results to Soda Cloud (#192) (#448)
- Core: Update logging with module names (#447)
- Core: add --non-interactive and --time flag (#455)
- Core: Update unsupported columns message as warning (#460)
- Core/Soda Cloud: Validate warehouse connection method, basic Redshift connection validation (#125) (#454)
- Soda Cloud: truncate columns longer than a length of 200 (#453)
- Soda Cloud: Evaluate test expressions and send them to the cloud to be displayed (#239) (#449)
- BigQuery: Get bigquery account info from reading an external path (#451)
- Core: Change dependencies to use ranges (#435)
- Core: Adjust logging level with env var (#399) (#442)
- Core: add default encoding to utf8 everywhere (#441)
- Core: Implements sql_test_connection for all warehouses
- Athena: Support for AWS profile based connection (#397)
- Soda Cloud: Send error_code to backend when available (#351) (#443)
- Core: Add validation to valid_min, valid_max, and valid_values
- Core: Add support for using variables in tests
- BigQuery: Make AuthScopes configurable
- MySQL: More tests to check the completeness
- Spark: Merge and release first version of Spark dialect
- MySQL: Experimental MySQL Support
- Core: Add excluded_columns feature
- Core: Add limit to custom metrics failed_rows query (#392)
- Core: Updates log level from error to warning for columns with unsupported data types (#391)
- Core: Use the sampler failed_limit for failed rows limit (#394)
- Soda Cloud: Rename validValues -> allowedValues
- Soda Cloud: Rename semanticType to logicalType
- Soda Cloud: Update validity formats
- Soda Cloud: Add SODA_SCAN_ORIGIN env var support (#386)
- Maintenance: Update urllib3
- Maintenance: Pin dependencies to known working versions (#389)
- Maintenance: make the recreate_venv.sh script work again (#381)
- Examples/Docs: Add example AWS lambda function
- Hive: Add 'authentication' parameter to warehouse config
- Hive: Add string type
- Hive: Implement is_time function
- Soda Cloud: Fix maximumValue monitor type
- Soda Cloud: Prevent memory issues when flushing measurements
- Soda Cloud: Failed rows should add all columns
- Athena/Redshift: Add AWS profile support (Thank you! @Tonkonozhenko)
- Athena: Fix is_number (Thank you! @Tonkonozhenko)
- Docs: Document creating a new dialect (Thank you! @JCZuurmond)
- SQLServer Quoting issues
- spark-sql dialect (Kudos @JCZuurmond and @jchoekstra) https://github.com/sodadata/soda-sql/tree/feature/spark-sql-dialect
- Soda Cloud: Add origin 'external' to Soda Cloud scan start
- Athena: Add cast to decimal for numeric ops
- Soda Cloud: Send failed rows for custom cloud metrics (aka negative value metrics)
- GitHub Actions/CI improvements
- Athena/Redshift: Fix unwanted precision loss for doubles
- Soda Cloud: Fix failed row count measurements sent to Soda Cloud
- core: Skip tests when metrics are null
- Soda Cloud: Add support for custom metrics, filter with different semantic types
- core: Fix unspecified metrics calculation
- SQLServer test coverage
- core: Add is_temporal to identify date/time semantic type
- core: Fix valid_min and valid_max calculation for varchar with numeric data
- soda cloud: Add support for IS NULL sql expression
- snowflake: Privatekey authentication support
- core: enable passing any parameters as environment variables
- core: remove mandatory parsing of env_vars unless env_var function is used
- soda cloud: Add support for custom cloud metrics
- redshift: Update supported datatypes
- snowflake: Don't upcase or quote identifiers
- sqlserver: add trusted_connection parameter for AD/Windows Authentication
- Logs error message when invalid columns are configured in yml (#297)
- Adds nullable and semantic_type to schema metric (#294)
- Add warehouse dockerfile with pre-populated demodata
- validate scan time ISO 8601 compliance (#285)
- SQLServer: Fix soda analyze
- SQLServer: Limit (TOP) works in queries
- Metrics: scan command fails when a date validation is added
- Metrics: scan command fails when valid_format is added to
- Snowflake: role and other parameters can be configured in warehouse.yml
- Fixed metric_groups not calculating all relevant metrics
- Frequent values are calculated for all metric types
- Fix Soda CLI return code
- soda create bigquery command doesn't throw error anymore
- Added scopes to BigQuery credentials
- Default scan time is set to UTC
- Soda docs are now in a separate repository
- Separated code into different modules, see RFC
- Documentation updates
- Warehouse fixture cleanup
- Update dependencies to their latest versions
- Fix JSON serialization issues with decimals
- Refactor how errors are shown during a scan
- Add support for Metric Groups
- Add experimental support for SQL Metrics which produce failed_rows
- Add experimental support for dataset sampling (Soda Cloud)
- New parameter catalog added in Athena warehouse configuration
- Hive support dependencies have been included in the release package
- Athena complex row type fix
- Fixed metadata bug
- Fixed bug in serialization of group values
- Added Hive dialect (Thanks, @ilhamikalkan !)
- In scan, skipping columns with unknown data types
- Improved analyzer to avoid full scans (Thanks, @mmigdiso !)
- Fixed valid_values configuration
- Improved error handling for Postgres/Redshift
- Adaptations in .editorconfig to align with PEP-8
- Documentation improvements and additions
- Managed scan support in scan launching flow
- Improvements to error handling of failed scans
- Fixed histogram query filter bug. Thanks mmigdiso for the contribution!
- Added build badge to README.md
2.0.0b18 was a broken release missing some files. This release fixes that. It does not introduce any new features.
- Fixed bug in analyze command in BigQuery for STRUCT & ARRAY columns
- Added support for Python 3.9 (Big tnx to the Snowflake connector update!)
- Switched from yaml FullLoader to SafeLoader as per suggestion of 418sec. Tnx!
- Fixed Decimal jsonnable bug when serializing test results
- Improved logging
- Initial SQLServer dialect implementation (experimental), contributed by Eric. Tnx!
- Scan logging improvements
- Fixed BigQuery connection property docs
- Fixed json Decimal serialization bug when sending test results to Soda cloud
- Fixed validity bug (min/max)
- Add support for role assumption in Athena and Redshift
- Add support for detection of connectivity and authentication issues
- Improvements in cross-dialect quote handling
- Added create connection args / kwargs
- Fixed missing values in the soda scan start request
- Fixed Athena schema property #110, #111: Tnx Lieven!
- Improved Airflow & filtering docs
- Improved mins / maxs for text and time columns
- Renamed CLI command init to analyze!
- (Internal refactoring) Extracted dataset analyzer
- Added missing table metadata queries for athena and bigquery (#97)
- Internal changes (token authorization)
- Internal changes (scan builder, authentication and API changes)
- Fixed init files
- Fixed scan logs
- Moved the SQL metrics inside the scan YAML files, with option to specify the SQL inline or refer to a file.
- Added column level SQL metrics
- Added documentation on SQL metric variables
- Upgrade library dependencies (dev-requirements.in) to their latest patch
- In scan.yml files, extracted metric categories from metrics into a separate metric_groups element
- Fixed Snowflake CLI issue
- Packaging issue solved which blocked CLI bootstrapping
- Support for Snowflake
- Support for AWS Redshift
- Support for AWS Athena
- Support for GCP BigQuery
- Anonymous tests
- Improved docs on tests
- Support for AWS PostgreSQL
- Published docs online
Initial release