Releases: sdv-dev/SDGym

v0.9.1 - 2024-08-29

29 Aug 15:50

Bugs Fixed

  • AttributeError when running custom synthesizer with timeout - Issue #335 by @fealho

v0.9.0 - 2024-08-07

07 Aug 16:47

This release enables the diagnostic score to be computed in a benchmarking run. It also renames the IndependentSynthesizer to ColumnSynthesizer. Finally, it fixes a bug so that the time for all metrics will now be used to compute the Evaluate_Time column in the results.

Bugs Fixed

  • Cap numpy to less than 2.0.0 until SDGym supports it - Issue #313 by @gsheni
  • The returned Evaluate_Time does not include results from all metrics - Issue #310 by @lajohn4747

New Features

  • Rename IndependentSynthesizer to ColumnSynthesizer - Issue #319 by @lajohn4747
  • Allow the ability to compute diagnostic score in a benchmarking run - Issue #311 by @lajohn4747
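
For benchmarking scripts that still refer to the old synthesizer name, the rename above can be applied with a small lookup. This is a minimal sketch, not part of SDGym: `RENAMED_SYNTHESIZERS` and `update_synthesizer_names` are hypothetical names, and it assumes synthesizers are referenced by their string names.

```python
# Hypothetical helper (not an SDGym API): maps synthesizer names used
# before v0.9.0 to the names used from v0.9.0 onwards.
RENAMED_SYNTHESIZERS = {'IndependentSynthesizer': 'ColumnSynthesizer'}


def update_synthesizer_names(names):
    """Return a copy of `names` with any renamed synthesizer replaced."""
    return [RENAMED_SYNTHESIZERS.get(name, name) for name in names]


synthesizers = update_synthesizer_names(['IndependentSynthesizer', 'UniformSynthesizer'])
print(synthesizers)  # ['ColumnSynthesizer', 'UniformSynthesizer']
```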

v0.8.0 - 2024-06-07

07 Jun 15:12

This release adds support for both Python 3.11 and 3.12! It also drops support for Python 3.7.

This release adds a new parameter to benchmark_single_table called run_on_ec2. When enabled, it launches a t2.medium EC2 instance on the user's AWS account using the credentials specified in environment variables, and the benchmarking then runs on that instance. When run_on_ec2 is enabled, output_filepath must be provided and must be in the format {s3_bucket_name}/{path_to_file}.
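
The output_filepath format described above can be checked before starting a run. The following is a minimal sketch under stated assumptions: `split_s3_output_filepath` is a hypothetical helper, not an SDGym function, and it only checks that the string has a bucket part and a file path part separated by a slash. It does not contact AWS or verify that the bucket exists.

```python
def split_s3_output_filepath(output_filepath):
    """Hypothetical shape check for '{s3_bucket_name}/{path_to_file}'.

    Splits on the first '/' and returns (bucket_name, path_to_file).
    Raises ValueError if either part is missing.
    """
    bucket_name, separator, path_to_file = output_filepath.partition('/')
    if not bucket_name or not separator or not path_to_file:
        raise ValueError(
            "output_filepath must look like '{s3_bucket_name}/{path_to_file}', "
            f'got {output_filepath!r}'
        )
    return bucket_name, path_to_file


# 'my-bucket' and 'results/run.csv' are placeholder values.
bucket_name, path_to_file = split_s3_output_filepath('my-bucket/results/run.csv')
print(bucket_name, path_to_file)  # my-bucket results/run.csv
```

A string that passes this check would then be given as output_filepath to benchmark_single_table together with run_on_ec2 enabled.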

Documentation

  • Docs for AWS integration are incorrect - Issue #304 by @srinify

Maintenance

  • Add support for Python 3.11 - Issue #250 by @fealho
  • Remove anyio usage - Issue #252 by @lajohn4747
  • Drop support for Python 3.7 - Issue #254 by @R-Palazzo
  • Switch default branch from master to main - Issue #257 by @R-Palazzo
  • Transition from using setup.py to pyproject.toml to specify project metadata - Issue #266 by @R-Palazzo
  • Remove bumpversion and use bump-my-version - Issue #267 by @R-Palazzo
  • Switch to using ruff for Python linting and code formatting - Issue #268 by @gsheni
  • Add dependency checker - Issue #277 by @lajohn4747
  • Add bandit workflow - Issue #282 by @R-Palazzo
  • Cleanup automated PR workflows - Issue #286 by @R-Palazzo
  • Add support for Python 3.12 - Issue #288 by @fealho
  • Only run unit and integration tests on oldest and latest Python versions for macOS - Issue #294 by @R-Palazzo
  • Bump versions of SDV, SDMetrics and RDT - Issue #298

Bugs Fixed

  • The UniformSynthesizer should follow the sdtypes in metadata (not the data's dtypes) - Issue #248 by @lajohn4747
  • Fix minimum version workflow when pointing to github branch - Issue #280 by @R-Palazzo
  • Passing synthesizer as string fails if run_on_ec2 is enabled - Issue #306 by @lajohn4747

New Features

v0.7.0 - 2023-06-14

14 Jun 18:21

This release adds support for SDV 1.0 and PyTorch 2.0!

New Features

  • Add functions to top level import - Issue #229 by @fealho
  • Cleanup SDGym to the new SDV 1.0 metadata and synthesizers - Issue #212 by @fealho

Bugs Fixed

Internal

  • Increase code style lint - Issue #123 by @fealho
  • Remove code support for synthesizers that are not strings/classes - PR #236 by @fealho
  • Code Refactoring - Issue #215 by @fealho

Maintenance

v0.6.0 - 2023-02-01

01 Feb 18:18

This release introduces methods for benchmarking single table data and creating custom synthesizers, which can be based on existing SDGym-defined synthesizers or on user-defined functions. This release also adds support for Python 3.10 and drops support for Python 3.6.

New Features

  • Benchmarking progress bar should update on one line - Issue #204 by @katxiao
  • Support local additional datasets folder with zip files - Issue #186 by @katxiao
  • Enforce that each synthesizer is unique in benchmark_single_table - Issue #190 by @katxiao
  • Simplify the file names inside the detailed_results_folder - Issue #191 by @katxiao
  • Use SDMetrics silent report generation - Issue #179 by @katxiao
  • Remove arguments in get_available_datasets - Issue #197 by @katxiao
  • Accept metadata.json as valid metadata file - Issue #194 by @katxiao
  • Check if file or folder exists before writing benchmarking results - Issue #196 by @katxiao
  • Rename benchmarking argument "evaluate_quality" to "compute_quality_score" - Issue #195 by @katxiao
  • Add option to disable sdmetrics in benchmarking - Issue #182 by @katxiao
  • Prefix remote bucket with 's3' - Issue #183 by @katxiao
  • Benchmarking error handling - Issue #177 by @katxiao
  • Allow users to specify custom synthesizers' display names - Issue #174 by @katxiao
  • Update benchmarking results columns - Issue #172 by @katxiao
  • Allow custom datasets - Issue #166 by @katxiao
  • Use new datasets s3 bucket - Issue #161 by @katxiao
  • Create benchmark_single_table method - Issue #151 by @katxiao
  • Update summary metrics - Issue #134 by @katxiao
  • Benchmark individual methods - Issue #159 by @katxiao
  • Add method to create a sdv variant synthesizer - Issue #152 by @katxiao
  • Add method to generate a multi table synthesizer - Issue #149 by @katxiao
  • Add method to create single table synthesizers - Issue #148 by @katxiao
  • Updating existing synthesizers to new API - Issue #154 by @katxiao

Bug Fixes

  • Pip encounters dependency issues with ipython - Issue #187 by @katxiao
  • IndependentSynthesizer is printing out ConvergenceWarning too many times - Issue #192 by @katxiao
  • Size values in benchmarking results seems inaccurate - Issue #184 by @katxiao
  • Import error in the example for benchmarking the synthesizers - Issue #139 by @katxiao
  • Updates and bugfixes - Issue #132 by @csala

Maintenance

v0.5.0 - 2021-12-13

13 Dec 21:37

This release adds support for Python 3.9, and updates dependencies to accept the latest versions when possible.

Issues closed

v0.4.1 - 2021-08-20

13 Dec 18:27

This release fixes a bug where passing a JSON file as configuration for a multi-table synthesizer crashed the model.
It also adds a number of fixes and enhancements, including: (1) a function and CLI command to list the available synthesizer names,
(2) a curated set of dependencies, with Gretel made an optional dependency, (3) updating Gretel to use temp directories,
(4) using nvidia-smi to get the number of GPUs and (5) multiple Dockerfile updates to improve functionality.

Issues closed

v0.4.0 - 2021-06-17

17 Jun 16:52

This release adds new synthesizers for Gretel and ydata, and creates a Docker image for SDGym.
It also includes enhancements to the accepted SDGym arguments, adds a summary command to aggregate
metrics, and adds the normalized score to the benchmark results.

New Features

v0.3.1 - 2021-05-20

21 May 17:50

This release adds new features to store results and cache contents into an S3 bucket
as well as a script to collect results from a cache dir and compile a single results
CSV file.
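
The idea of compiling cached results into a single CSV file can be illustrated as follows. This is only a sketch of the general technique, not the script shipped with SDGym: `compile_results` is a hypothetical function, and it assumes the cache directory holds one CSV file per run, each with its own header row.

```python
import csv
from pathlib import Path


def compile_results(cache_dir, output_path):
    """Merge every *.csv file found directly in `cache_dir` into one CSV.

    Columns are the union of all headers, in first-seen order; values
    missing from a file are written as empty strings.
    Returns the number of data rows written.
    """
    rows = []
    fieldnames = []
    for csv_path in sorted(Path(cache_dir).glob('*.csv')):
        with csv_path.open(newline='') as csv_file:
            reader = csv.DictReader(csv_file)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)  # read all rows while the file is open

    with open(output_path, 'w', newline='') as output_file:
        writer = csv.DictWriter(output_file, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Writing the compiled file outside the cache directory keeps it from being picked up as an input on a later run.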

Issues closed

v0.3.0 - 2021-01-27

28 Jan 00:23

Major rework of the SDGym functionality to support a collection of new features:

  • Add relational and timeseries model benchmarking.
  • Use SDMetrics for model scoring.
  • Update datasets format to match SDV metadata based storage format.
  • Centralize default datasets collection in the sdv-datasets S3 bucket.
  • Add options to download and use datasets from different S3 buckets.
  • Rename synthesizers to baselines and adapt to the new metadata format.
  • Add model execution and metric computation time logging.
  • Add optional synthetic data and error traceback caching.