Data Generator Request: RandomCity #144

Open
Tracked by #167
jensenbox opened this issue Jul 10, 2024 · 4 comments

@jensenbox
Contributor

I know you call them transformers but for some reason in my mind they just seem closer to data generators than something that transforms :)

Anyway, I am working on a table that has the address broken up like:

    address_line_1 character varying(1024),
    address_line_2 character varying(1024),
    city character varying(255),
    postal_code character varying(20),
    region character varying(255),
    country character varying(2),

I can use RealAddress for the address_line_1 and random data for the others but it would be nice to have city be something interesting.

@wwoytenko
Contributor

> I know you call them transformers but for some reason in my mind, they just seem closer to data generators than something that transforms :)

Hi! Thank you for your feedback. I will consider the naming, but both names are debatable, so we need to choose the more user-friendly one. Maybe I will put it to a vote in the future.

I named them transformers because some of them change the original data rather than generate new data. For instance, in the latest beta, you can generate a random email while keeping part of the original value:

- schema: "public"
  name: "account"
  transformers:
    - name: "RandomEmail"
      params:
        column: "email"
        engine: "hash"
        keep_original_domain: true
        local_part_template: "{{ first_name | lower }}.{{ last_name | lower }}.{{ .random_string | trunc 10 }}"
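
To illustrate the "transform rather than generate" point, here is a minimal Python sketch of a deterministic email replacement that keeps the original domain. It is not Greenmask's implementation: the function name `pseudonymize_email`, the `user.` prefix, and the `example.com` fallback domain are assumptions made up for this example. It only shows why a hash-based engine depends on the original value.

```python
import hashlib


def pseudonymize_email(original: str, keep_original_domain: bool = True) -> str:
    """Derive a stable replacement email from the original value.

    Hypothetical helper, not Greenmask code: the local part is built from a
    hash of the input, so the same input always maps to the same output.
    """
    local, sep, domain = original.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {original!r}")

    # Hash the whole original value and keep a short, readable prefix.
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()
    new_local = f"user.{digest[:10]}"

    # Optionally keep the part of the original value that is safe to expose.
    new_domain = domain if keep_original_domain else "example.com"
    return f"{new_local}@{new_domain}"
```

Calling the function twice with the same address returns the same masked value, which is what separates this kind of transformation from a purely random generator.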

@wwoytenko
Contributor

wwoytenko commented Jul 10, 2024

> I can use RealAddress for the address_line_1 and random data for the others but it would be nice to have city be something interesting.

Good point, I agree. I will try to make RealAddress more useful based on your feedback.

I also have an idea: people could provide their own addresses, or any other dataset, for instance as JSON. Greenmask would then map that data to the columns. For instance:

- schema: "public"
  name: "account_address"
  transformers:
    - name: "RandomDataFromFile"
      params:
        file: "/path/to/your/db.json"
        columns:
          - name: "address_line_1"
            value: "{{ db.address_line_1 }}"
          - name: "city"
            value: "{{ db.city }}"

And the file might look like this:

[
 {
   "address_line_1": "val1",
   "address_line_2": "val2",
   "city": "val3",
   "postal_code": "val4",
   "region": "val5",
   "country": "val6"
 }
]

Why this way? I think this could be used not only for addresses but for many other purposes, since it lets users define their own functional dependencies between attributes in the dataset they provide.
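
The functional-dependency argument can be sketched in a few lines of Python. This is only an illustration of the proposal, not an existing Greenmask feature: `pick_record`, `transform_row`, the `key_column` argument, and the sample records are all hypothetical. The point is that every mapped column is filled from the same record, so city, region, and country stay consistent with each other.

```python
import hashlib
import json

# Stand-in for the user-supplied file; the records are invented sample data.
DATASET_JSON = """
[
  {"address_line_1": "1 Main St", "city": "Springfield", "region": "IL",
   "postal_code": "62701", "country": "US"},
  {"address_line_1": "5 Rue Cler", "city": "Paris", "region": "IDF",
   "postal_code": "75007", "country": "FR"}
]
"""


def pick_record(records, key):
    """Choose one record deterministically from a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(records)
    return records[index]


def transform_row(row, records, mapping, key_column):
    """Return a copy of row with mapped columns taken from a single record.

    mapping is {table column: field name in the dataset}. Because all fields
    come from one record, dependencies between them are preserved.
    """
    record = pick_record(records, str(row[key_column]))
    new_row = dict(row)
    for column, field in mapping.items():
        new_row[column] = record[field]
    return new_row


RECORDS = json.loads(DATASET_JSON)
```

A call such as `transform_row(row, RECORDS, {"city": "city", "country": "country"}, "id")` would replace both columns from one record, so a French city is never paired with a US country code.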

@wwoytenko wwoytenko self-assigned this Jul 10, 2024
@jensenbox
Contributor Author

I was actually thinking the same thing when I wrote it, and I see both sides for sure. There are data generators and data transformers (or mutators). When I thought about how the documentation would be written, it did not make sense to put them in two sections either, so there should be one good name that covers both.

I asked the AI God what it thought:
[image]

@jensenbox
Contributor Author

For ease of use, you could even replace the file with a YAML array of values. They would of course have to evaluate down to strings, but you could do this with YAML anchors so that the same values can be reused in other parts of the configuration file.

@wwoytenko wwoytenko moved this to Open in Engineering Aug 3, 2024
@wwoytenko wwoytenko added this to the v0.2b2 milestone Aug 3, 2024
@wwoytenko wwoytenko mentioned this issue Aug 5, 2024
11 tasks
@wwoytenko wwoytenko modified the milestones: v0.2b2, v0.2rc Aug 28, 2024
@wwoytenko wwoytenko mentioned this issue Oct 8, 2024
19 tasks