Methods to generate Metadata from DataFrames #126

Closed
csala opened this issue Nov 12, 2019 · 0 comments · Fixed by #132
Labels: feature request (Request for a new feature)


@csala (Contributor) commented Nov 12, 2019

The Metadata class should be modified to assist in creating the metadata dictionary.

First, the metadata argument of the Metadata.__init__ method should be made optional (defaulting to None).
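A minimal sketch of the optional argument, written against a hypothetical class rather than the actual SDV code. The empty `{'tables': {}}` layout and the `root_path='.'` default are assumptions. Defaulting to `None` and building the dict inside the method also avoids sharing one mutable default dict between instances:

```python
import copy


class Metadata:
    """Hypothetical sketch, not the actual SDV implementation."""

    def __init__(self, metadata=None, root_path='.'):
        # With no argument, start from an empty specification. The dict is
        # created here, so every instance gets its own independent copy.
        if metadata is None:
            metadata = {'tables': {}}

        # Deep-copy so later edits do not mutate the caller's dictionary.
        self._metadata = copy.deepcopy(metadata)
        # Assumed default: relative CSV paths are resolved against this.
        self.root_path = root_path
```

With this change, `Metadata()` starts an empty dataset that the methods listed below can populate, while passing an existing dictionary keeps working as before.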

Then, the following methods should be added:

  • add_table(name, primary_key=None, fields=None, data=None, parent=None, foreign_key=None): Add a new table to the dataset.
    • All arguments except name are optional.
    • If fields is given, it can be either a list or a dict. A dict is validated and used directly.
    • If data is given and fields is either omitted or given as a list, the field specifications are built from the DataFrame dtypes; when fields is a list, only the listed fields are included. data can be either a DataFrame or a path to a CSV file. A relative path is interpreted relative to the Metadata root_path.
    • If a parent is given, a relationship is created between the foreign key of this table and the primary key of the parent. The parent table must already exist in the metadata.
    • Additional details, such as anonymization settings or regular expressions, must be set either inside the fields argument or afterwards.
  • remove_table(table): Remove the table from the metadata. If the table has child tables, those relationships must be removed first.
  • add_relationship(table, parent, foreign_key): Add a single relationship between two tables. The foreign_key field details are inherited from the parent table and validated against the table data.
  • remove_relationship(table, parent): Remove the relationship between two tables.
  • add_field(table, field, field_details): Add a new field to the table. If the field already exists, raise an exception suggesting update_field instead.
  • update_field(table, field, field_details): Replace the field specification with the given values.
  • remove_field(table, field): Remove a field from the table.
  • to_dict(): Return the complete metadata specification as a dictionary.
  • to_json(): Store the dict specification of the metadata as JSON in the given location, using indent=4.
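When data is passed as a CSV path, the relative-path rule could be handled roughly as follows. `resolve_data_path` is a hypothetical helper; the resolved path would then be loaded with something like `pandas.read_csv`:

```python
import os


def resolve_data_path(data, root_path):
    """Hypothetical helper: absolutize CSV paths, pass other data through."""
    if isinstance(data, (str, os.PathLike)):
        path = os.fspath(data)
        if not os.path.isabs(path):
            # Relative paths are interpreted against the metadata root_path.
            path = os.path.join(root_path, path)
        return os.path.abspath(path)
    # Anything else (for example an in-memory DataFrame) is returned as-is.
    return data
```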
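The relationship rules can be sketched as plain functions over a metadata dict. The layout is an assumption made for illustration: a `tables` dict whose entries hold `primary_key` and `fields`, with a `ref` entry on the foreign key field pointing at the parent. The public function names follow the issue, but the bodies and the `_children` helper are hypothetical, and validation against the table data is omitted:

```python
import copy


def add_relationship(metadata, table, parent, foreign_key):
    tables = metadata['tables']
    if table not in tables or parent not in tables:
        raise ValueError('Both tables must exist before relating them')
    parent_key = tables[parent].get('primary_key')
    if parent_key is None:
        raise ValueError(f'Parent table {parent!r} has no primary key')

    # Inherit the key's field details from the parent and point back at it.
    details = copy.deepcopy(tables[parent]['fields'][parent_key])
    details['ref'] = {'table': parent, 'field': parent_key}
    tables[table]['fields'][foreign_key] = details


def _children(metadata, parent):
    # Tables with at least one field referencing the given parent.
    return [
        name for name, spec in metadata['tables'].items()
        if any(f.get('ref', {}).get('table') == parent
               for f in spec['fields'].values())
    ]


def remove_relationship(metadata, table, parent):
    for details in metadata['tables'][table]['fields'].values():
        if details.get('ref', {}).get('table') == parent:
            del details['ref']  # keep the column, drop only the link


def remove_table(metadata, table):
    children = _children(metadata, table)
    if children:
        raise ValueError(f'Remove relationships with {children} first')
    del metadata['tables'][table]
```

Deep-copying the parent's key details keeps later edits to the child's foreign key from leaking back into the parent table.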
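A sketch of the field-editing methods as plain functions over an assumed `{'tables': {name: {'fields': {...}}}}` dict. The names follow the issue; raising on a missing field in `update_field` is an assumption, since the issue only specifies the error for a duplicate field in `add_field`:

```python
def add_field(metadata, table, field, field_details):
    fields = metadata['tables'][table]['fields']
    if field in fields:
        # Point the caller at update_field instead of silently overwriting.
        raise ValueError(
            f'Field {field!r} already exists in table {table!r}; '
            'use update_field instead'
        )
    fields[field] = dict(field_details)


def update_field(metadata, table, field, field_details):
    fields = metadata['tables'][table]['fields']
    if field not in fields:
        raise ValueError(f'Field {field!r} does not exist in table {table!r}')
    # Replace the whole specification rather than merging keys into it.
    fields[field] = dict(field_details)


def remove_field(metadata, table, field):
    # Raises KeyError if the table or field does not exist.
    del metadata['tables'][table]['fields'][field]
```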
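Serialization could be as simple as the sketch below. The `path` parameter name is an assumption, since the issue only says "the given location"; `to_dict` returns a deep copy so callers cannot modify the stored specification by accident:

```python
import copy
import json


def to_dict(metadata):
    # Deep copy so the returned dict is independent of the stored one.
    return copy.deepcopy(metadata)


def to_json(metadata, path):
    # 'path' is an assumed parameter name for the output file location.
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(to_dict(metadata), f, indent=4)
```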
@csala added the approved and feature request labels Nov 12, 2019
@csala added this to the 0.2.1 milestone Nov 12, 2019
JonathanDZiegler pushed a commit to JonathanDZiegler/SDV that referenced this issue Feb 7, 2022
* Update and standardize dependencies

* increase test epochs

* Simplify tvae test