Processor - Record the adjusted dtype in Metadata for datetime columns to correct the Synthesizer error #528

Merged

@matheme-justyn commented May 10, 2024

for #466

  • feat

    • PETsARD/util/safe_dtype.py, __init__.py
      • create safe_infer_dtype - e0f5425
    • PETsARD/processor/base.py
      • add _adjust_metadata() in transform() - 6969af1
    • PETsARD/loader/metadata.py
      • use infer_dtype_after_preproc in to_sdv() - 9086e34
  • test

    • demo/dev/Issue466.ipynb
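
Going by the PR title and the change list above, the idea appears to be: re-infer each column's dtype after Processor.transform() and record it in the metadata, so the Synthesizer is not handed a datetime dtype in the metadata alongside numeric values in the data. Below is a minimal stdlib-only sketch of that idea, not the PETsARD implementation. The names `infer_kind`, `adjust_metadata`, and the `dtype_after_preproc` key are hypothetical, and encoding datetimes as POSIX timestamps is only an assumed stand-in for whatever the Processor actually does.

```python
from datetime import datetime


def infer_kind(values):
    """Return a coarse dtype label for a column's values (hypothetical helper)."""
    kinds = set()
    for value in values:
        if value is None:
            continue  # ignore missing values
        if isinstance(value, bool):
            kinds.add("bool")  # checked before int, since bool subclasses int
        elif isinstance(value, datetime):
            kinds.add("datetime")
        elif isinstance(value, (int, float)):
            kinds.add("numeric")
        else:
            kinds.add("object")
    if len(kinds) == 1:
        return kinds.pop()
    return "object"  # empty or mixed columns fall back to a safe label


def adjust_metadata(metadata, transformed):
    """Return a copy of metadata with a post-transform dtype label per column."""
    adjusted = {col: dict(info) for col, info in metadata.items()}
    for col, values in transformed.items():
        # "dtype_after_preproc" is a hypothetical key used only in this sketch.
        adjusted.setdefault(col, {})["dtype_after_preproc"] = infer_kind(values)
    return adjusted


raw = {"ordered_at": [datetime(2024, 5, 10), datetime(2024, 5, 14)]}
metadata = {col: {"dtype": infer_kind(vals)} for col, vals in raw.items()}

# Assumed stand-in for preprocessing: encode datetimes as float timestamps.
transformed = {"ordered_at": [v.timestamp() for v in raw["ordered_at"]]}

adjusted = adjust_metadata(metadata, transformed)
print(adjusted["ordered_at"])
# {'dtype': 'datetime', 'dtype_after_preproc': 'numeric'}
```

Keeping the original dtype next to the adjusted one, rather than overwriting it, leaves enough information to map synthetic values back to datetimes later. Whether PETsARD stores both is not stated in this PR.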

Solution evidence

#465

from PETsARD import Loader, Processor, Synthesizer

# Run load -> preprocess -> synthesize on each benchmark dataset.
for benchmark in ['bike-sales', 'energydata_complete', 'olist']:
    load = Loader(filepath=f'benchmark://{benchmark}')
    load.load()

    proc = Processor(metadata=load.metadata)
    proc.fit(data=load.data)
    preproc_data = proc.transform(data=load.data)

    # Pass the Processor's metadata, which transform() has adjusted.
    syn = Synthesizer('default')
    syn.create(data=preproc_data, metadata=proc._metadata)
    syn.fit_sample()

image


#466

from PETsARD import Loader, Processor, Synthesizer

# Same pipeline as for #465, but with the 'smartnoise-aim' synthesizer.
for benchmark in [
        'bike-sales',
        'energydata_complete',
        'olist'
]:
    load = Loader(filepath=f'benchmark://{benchmark}')
    load.load()

    proc = Processor(metadata=load.metadata)
    proc.fit(data=load.data)
    preproc_data = proc.transform(data=load.data)

    # Pass the Processor's metadata, which transform() has adjusted.
    syn = Synthesizer('smartnoise-aim')
    syn.create(data=preproc_data, metadata=proc._metadata)
    syn.fit_sample()

image


@matheme-justyn added the bug (Something isn't working) label May 10, 2024
@matheme-justyn self-assigned this May 10, 2024
@matheme-justyn changed the title from "466 synthesizer same columns have dtypem8s and dtypefloat64" to "Processor - record adjusted dtype in Metadata for datetime column" May 14, 2024
@matheme-justyn changed the title from "Processor - record adjusted dtype in Metadata for datetime column" to "Processor - Record the adjusted dtype in Metadata for datetime columns to correct the Synthesizer error" May 14, 2024
@matheme-justyn requested review from a user and mileschangmoda May 14, 2024 07:55
@matheme-justyn marked this pull request as ready for review May 14, 2024 07:56