Issue 159 new evaluation #160

Merged 13 commits on Jun 26, 2020
28 changes: 28 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,28 @@

```yaml
name: Generate Docs

on:
  push:
    branches: [ master ]

jobs:

  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Python
        uses: actions/setup-python@v1
        with:
          python-version: '3.7'

      - name: Build
        run: |
          python -m pip install --upgrade pip
          pip install -e .[dev]
          make docs

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{secrets.GITHUB_TOKEN}}
          publish_dir: docs/_build/html
```
40 changes: 40 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,40 @@

```yaml
name: Run Tests

on:
  push:
    branches: [ '*' ]
  pull_request:
    branches: [ master ]

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        python-version: [3.5, 3.6, 3.7]
        os: [ubuntu-latest, macos-latest]

    steps:
      - uses: actions/checkout@v1
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}

      - if: matrix.os == 'ubuntu-latest'
        name: Install graphviz - Ubuntu
        run: |
          sudo apt-get install graphviz

      - if: matrix.os == 'macos-latest'
        name: Install graphviz - MacOS
        run: |
          brew install graphviz

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install tox tox-gh-actions

      - name: Test with tox
        run: tox
```
26 changes: 6 additions & 20 deletions .travis.yml
```diff
@@ -1,32 +1,18 @@
 # Config file for automatic testing at travis-ci.org
-dist: trusty
+dist: bionic
 language: python
 python:
   - 3.7
   - 3.6
   - 3.5

-matrix:
-  include:
-    - python: 3.7
-      dist: xenial
-      sudo: required
-
 # Command to install dependencies
-install: pip install -U tox-travis codecov
+install:
+  - sudo apt-get update
+  - sudo apt-get install graphviz
+  - pip install -U tox-travis codecov

 after_success: codecov

 # Command to run tests
 script: tox
-
-deploy:
-
-  - provider: pages
-    skip-cleanup: true
-    github-token: "$GITHUB_TOKEN"
-    keep-history: true
-    local-dir: docs/_build/html
-    target-branch: gh-pages
-    on:
-      branch: master
-      python: 3.6
```
46 changes: 46 additions & 0 deletions EVALUATION.md
@@ -0,0 +1,46 @@
# SDV Evaluation

After using SDV to model your database and generate a synthetic version of it, you
might want to evaluate how similar the synthetic data is to your real data.
> Review comment (Member): typo: syntehtic -> synthetic
SDV has an evaluation module with a simple function that allows you to compare
the synthetic data to your real data using [SDMetrics](https://github.com/sdv-dev/SDMetrics) and
generate a simple standardized score.

> Review comment (Member): typo: syntehtic -> synthetic

## Evaluating your synthetic data

After you have modeled your database and generated samples out of the SDV models,
you will be left with a dictionary that contains table names and dataframes.

For example, if we model and sample the demo dataset:

```python3
from sdv import SDV
from sdv.demo import load_demo

metadata, tables = load_demo(metadata=True)

sdv = SDV()
sdv.fit(metadata, tables)

samples = sdv.sample_all(10)
```

`samples` will contain a dictionary with three tables, just like the `tables` dict.
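A quick way to see what came back is a minimal sketch using plain dict and pandas operations (not an SDV-specific API):

```python3
# Each key is a table name and each value is a pandas DataFrame
# holding the synthetic rows sampled for that table.
for name, df in samples.items():
    print(name, df.shape)
```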


At this point, you can evaluate how similar the two sets of tables are by using the
`sdv.evaluation.evaluate` function as follows:

```python3
from sdv.evaluation import evaluate

score = evaluate(samples, tables, metadata)
```

The output will be a maximization score that indicates how good the modeling was:
the higher the value, the more similar the sets of tables are. Notice that in most cases
the value will be negative.
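Because of this, the score is most useful in relative terms, for example to compare two synthetic versions of the same database. A minimal sketch, assuming a hypothetical second dict `other_samples` produced by another model:

```python3
# Hypothetical: `other_samples` is a second dict of synthetic tables,
# e.g. sampled from a model fitted with different settings.
score = evaluate(samples, tables, metadata)
other_score = evaluate(other_samples, tables, metadata)

# Higher is better, even when both scores are negative.
print('Best model:', 'first' if score > other_score else 'second')
```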

For further options, including visualizations and more detailed reports, please refer to
the [SDMetrics](https://github.com/sdv-dev/SDMetrics) library.