Enriching Datasets and Region-Specific Functionality #413

Closed
ake2l opened this issue May 23, 2023 · 4 comments · Fixed by #429
Labels
enhancement New feature or request enhancement-ee This feature will be extended in Enterprise Version

Comments

@ake2l
Member

ake2l commented May 23, 2023

Summary

Our project currently provides limited dataset support, predominantly defaulting to US-based datasets. We aim to enhance our dataset variety for test data generation to support a diverse range of countries and regions. These enhancements are in line with the specifications found in the Benerator documentation. The primary focus of this expansion will be on the Person, Company, and Address Generators. Additionally, we should carry out rigorous testing on the region functionality to ensure it operates effectively.

Details

  • Dataset Limitations: At the moment, the test data generation capabilities of our project are hindered by the limited datasets we have at our disposal. These datasets do not sufficiently represent a wide range of countries. We are aiming to enrich these datasets, expanding their scope to cover multiple regions more robustly.
  • Default Dataset: The current system defaults to the US dataset when generating test data. We need to implement measures to prevent this automatic reversion and ensure the dataset specific to the selected region is retained.
  • Region-Specific Functionality: We intend to incorporate functionality into our data generator that allows users to select specific regions, as outlined in the Benerator documentation.
  • Person, Company, and Address Generators: As part of our commitment to enhancing our test data generator, we plan to enrich these specific data generators with comprehensive data from various countries and regions, thereby increasing their effectiveness and versatility.
  • Region Functionality Testing: To verify that datasets map to the correct geography, we will add region tests to the EE demos. The demo extension should look similar to this:
	<echo>Running AddressGenerator for different datasets and regions</echo>
	<generate type="AddressDE" count="50" consumer="Preview">
		<attribute name="address" generator="AddressGenerator" dataset="DE"/>
	</generate>
	<generate type="AddressBR" count="50" consumer="Preview">
		<attribute name="address" generator="AddressGenerator" dataset="BR"/>
	</generate>
	<generate type="AddressUS" count="50" consumer="Preview">
		<attribute name="address" generator="AddressGenerator" dataset="US"/>
	</generate>
	<generate type="AddressFR" count="50" consumer="Preview">
		<attribute name="address" generator="AddressGenerator" dataset="FR"/>
	</generate>
	<generate type="AddressBE" count="50" consumer="Preview">
		<attribute name="address" generator="AddressGenerator" dataset="BE"/>
	</generate>

Expected Outcome

Our primary goal is to provide enhanced dataset variety, superior region-specific functionality, and improved Person, Company, and Address Generators for more effective test data generation.

@ake2l ake2l added the enhancement New feature or request label May 23, 2023
@ake2l ake2l changed the title from "Enhancing Datasets and Region-Specific Functionality" to "Enriching Datasets and Region-Specific Functionality" May 23, 2023
@ake2l ake2l added the enhancement-ee This feature will be extended in Enterprise Version label May 23, 2023
@ake2l
Member Author

ake2l commented May 23, 2023

The goal is to avoid something like this:

<generate type="AddressFR" count="50" consumer="Preview">
	<attribute name="address" generator="AddressGenerator" dataset="FR"/>
</generate>

<generate type="AddressBE" count="50" consumer="Preview">
	<attribute name="address" generator="AddressGenerator" dataset="BE"/>
</generate>

which currently leads to this:

	at com.rapiddweller.domain.address.AddressGenerator.init(AddressGenerator.java:81) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.composite.SourcedGenerationStep.init(SourcedGenerationStep.java:60) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.statement.GenIterTask.initStatements(GenIterTask.java:302) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.statement.GenIterTask.init(GenIterTask.java:128) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.statement.GenIterStatement.beInitialized(GenIterStatement.java:230) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.statement.GenIterStatement.execute(GenIterStatement.java:154) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.statement.LazyStatement.execute(LazyStatement.java:63) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.statement.SequentialStatement.executeSubStatements(SequentialStatement.java:72) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.statement.SequentialStatement.execute(SequentialStatement.java:61) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.BeneratorRootStatement.execute(BeneratorRootStatement.java:65) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.DescriptorRunner.execute(DescriptorRunner.java:128) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.DescriptorRunner.runWithoutShutdownHook(DescriptorRunner.java:102) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.statement.IncludeStatement.includeDescriptor(IncludeStatement.java:95) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.statement.IncludeStatement.execute(IncludeStatement.java:73) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.statement.SequentialStatement.executeSubStatements(SequentialStatement.java:72) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.statement.SequentialStatement.execute(SequentialStatement.java:61) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.BeneratorRootStatement.execute(BeneratorRootStatement.java:65) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.DescriptorRunner.execute(DescriptorRunner.java:128) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.DescriptorRunner.runWithoutShutdownHook(DescriptorRunner.java:102) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.engine.DescriptorRunner.run(DescriptorRunner.java:94) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.main.Benerator.runFile(Benerator.java:261) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator.main.Benerator.run(Benerator.java:192) ~[rapiddweller-benerator-ce-3.2.0-jdk-11-20230517.094154-27.jar:?]
	at com.rapiddweller.benerator_ee.main.EEBenerator.runWithArgs(EEBenerator.java:102) ~[rapiddweller-benerator-ee-3.2.0-jdk-11-SNAPSHOT.jar:?]
	at com.rapiddweller.benerator_ee.main.EEBenerator.main(EEBenerator.java:78) ~[rapiddweller-benerator-ee-3.2.0-jdk-11-SNAPSHOT.jar:?]
[ERROR] 2023-05-23 02:58:25.228 [main] AddressGenerator - Cannot generate addresses for FR, falling back to United States

@tunglxfast
Contributor

tunglxfast commented Jun 6, 2023

We will build the data based on the list of countries under "Region nesting" in the Benerator docs:

europe=central_europe,western_europe,southern_europe,eastern_europe,northern_europe
western_europe=french,iberia (FR,MC,ES,AD,PT)
french=FR,MC
iberia=spanish,PT
spanish=ES,AD
central_europe=BE,NL,LU,DE,CH,AT,LI
southern_europe=IT,SM,VA,GR,CY,TR
eastern_europe=AL,SI,CZ,HU,PL,RU,RO,BG,HR,BA,EE,LT,LV,SK,UA
northern_europe=GB,IE,DK,SE,NO,FI,IS
near_east=AF,IR,IL,JO,KZ,PK,QA,SA,AE
africa=EG,GH,KE,ZA
north_america=US,CA
central_america=MX,BS
america=north_america,central_america,south_america
south_america=AR,BR,EC
asia=JP,IN,ID,KR,KP,MY,SG,TW,TH
oceania=AU,NZ
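
For illustration, the nesting above can be expanded into plain country codes with a small recursive lookup. This is only a sketch in Python, not Benerator's actual implementation: the names `REGION_NESTING`, `parse_nesting` and `resolve_region` are hypothetical, and only an excerpt of the listing is used.

```python
# Excerpt of the region nesting listed above (property-style "name=members").
REGION_NESTING = """
western_europe=french,iberia
french=FR,MC
iberia=spanish,PT
spanish=ES,AD
north_america=US,CA
oceania=AU,NZ
"""


def parse_nesting(text):
    """Parse 'name=a,b,c' lines into a dict of name -> list of members."""
    nesting = {}
    for line in text.strip().splitlines():
        name, _, members = line.partition("=")
        nesting[name.strip()] = [m.strip() for m in members.split(",") if m.strip()]
    return nesting


def resolve_region(name, nesting, _path=None):
    """Expand a region (or a single country code) into an ordered list of country codes."""
    path = set() if _path is None else _path
    if name not in nesting:
        return [name]  # not a known region, so treat it as a country code
    if name in path:
        raise ValueError(f"cyclic region nesting at {name!r}")
    path.add(name)
    countries = []
    for member in nesting[name]:
        for code in resolve_region(member, nesting, path):
            if code not in countries:  # keep first occurrence, drop duplicates
                countries.append(code)
    path.discard(name)
    return countries


NESTING = parse_nesting(REGION_NESTING)
print(resolve_region("western_europe", NESTING))  # ['FR', 'MC', 'ES', 'AD', 'PT']
```

The result for `western_europe` matches the country list given in parentheses in the listing, so a check like this could flag regions whose member countries have no dataset yet.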

@tunglxfast
Contributor

tunglxfast commented Jun 9, 2023

  • Added datasets for 12 countries: PT, MC, ES, AD, AU, NZ, AT, LI, TH, CA, VE, VN.
  • Nested datasets that can now be used: iberia, oceania.
  • The data is still in the branch "413-enriching-datasets-and-region-specific-functionality" and has not been merged into the main branch yet.

@tunglxfast
Contributor

tunglxfast commented Jun 19, 2023

Added more data to the branch "413-enriching-datasets-and-region-specific-functionality":

  • Added datasets for 9 more countries: NL, BE, LU, TR, IT, SM, VA, GR, CY.
  • Additional nested datasets that can now be used: western_europe, central_europe, southern_europe, north_america.
