
Issue 159 new evaluation #160

Merged
merged 13 commits into from
Jun 26, 2020

Conversation

csala (Contributor) commented Jun 26, 2020:

Resolve #159

Replace the evaluation subpackage with a new evaluation module that uses SDMetrics.

Also add a new EVALUATION.md document with usage instructions.

Also update the Makefile, add README and tutorial tests, and add GitHub Actions.

EVALUATION.md Outdated
# SDV Evaluation

After using SDV to model your database and generate a synthetic version of it you
might want to evaluate how similar the syntehtic data is to your real data.
Member commented:

typo: syntehtic -> synthetic

EVALUATION.md Outdated
might want to evaluate how similar the syntehtic data is to your real data.

SDV has an evaluation module with a simple function that allows you to compare
the syntehtic data to your real data using [SDMetrics](https://github.com/sdv-dev/SDMetrics) and
Member commented:

typo: syntehtic -> synthetic

return synth, real, metadata


def evaluate(synth, real=None, metadata=None, root_path=None, table_name=None, get_report=False):
Member commented:

I think this function signature needs to be updated, real/metadata should be required?

csala (Contributor, Author) replied Jun 26, 2020:

No, the rationale here is that the only thing truly required is the synthetic data.
You can then either pass real data without metadata (and we build the metadata on the fly), pass metadata without real data (and we load the real data from the metadata), or pass both.

This logic is embedded in the _validate_arguments method, which tries to build whatever is needed for the evaluation from as little user input as possible.

@csala csala merged commit ca9d4ad into master Jun 26, 2020
@csala csala deleted the issue-159-new-evaluation branch June 29, 2020 08:43
JonathanDZiegler pushed a commit to JonathanDZiegler/SDV that referenced this pull request Feb 7, 2022
Successfully merging this pull request may close these issues.

Use SDMetrics for evaluation