Skip to content

Python program to auto populate a database with random useful data for working with mysql

Notifications You must be signed in to change notification settings

adriano-pinaffo/populate_data

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

populate_data

This project delivers a Python program that populates a database. The main use-case for this is for people training with relational databases and need to generate random data in tables. Another use-case is the one for people practicing their programming skills and want to play around with some elements and modules in python. This project is a personal development that should, in no way, be used in production environments.

This program will randomly download specific data from the Internet and feed tables in a local database. The data sources in the Internet are generatedata.com and randomlists.com. This program will easily simulate a company database where you may have data from customers, employees, suppliers, cash flows, sales, etc. This means you can create as many tables as you want, from a pre-selected list of fields. Aside from the regular tables, you can also create 'transactional' tables which, in this context, represent a table that will have have foreigner keys.

For example, you may create a customer table and a product table. A transaction table could be a sales table having information about products sold, with relations to existing custormer and existing products (plus their own fields). If not using an input file, -t option will contain the table instruction. The program will accept as many -t options as wanted. Transaction tables are defined in the -x option. As with -t, you can have as many -x options as you want.

Tables and transactions can be indicated in JSON format or in a colon-formated string. Check below to see how to use them.


Populating 1 table with JSON-formatted



Customer table creation using -t option with JSON format. The format is:
{"table_name": "quantity": 10, "drop_if_exists": true, "id_increment": true, "extra": ["extra_option"], "items": ["column1", "column2", "column3"]}
  • table_name: will be the database table name. Observe that this is the object main key.
  • quantity: number of items that will be downloaded and entered in the table (limited to 100 so far)
  • drop_if_exists: Default is true. will drop the table and create a new one if true. This will overwrite the --drop_if_exists option. By having this to no you can overcome the 100 items limitation (providing you keep the same columns throughout the runs)
  • id_increment: Default is true. will create an id field for the table and auto-increment it. This will overwrite the --id_increment option.
  • extra: Default is null. extra options. Useful for the product table only so far.
  • items: column names, as listed in the Fields section

Populating 1 table with colon-formatted option. Differently from the JSON-formatted -t option, all the fields must be present



Customer table creation using -t option with colon format. The format is: table_name:quantity:drop_if_exists:id_increment:column1,column2,column3:extra
  • table_name: will be the database table name. Observe that this is the object main key.
  • quantity: number of items that will be downloaded and entered in the table (limited to 100 so far)
  • drop_if_exists: will drop the table and create a new one if true. This will overwrite the --drop_if_exists option. By having this to no you can overcome the 100 items limitation (providing you keep the same columns throughout the runs)
  • id_increment: will create an id field for the table and auto-increment it. This will overwrite the --id_increment option.
  • items: column names, as listed in the Fields section. They must be separated by commas
  • extra: extra options. Useful for the product table only so far. Notice that in the colon formatted way, the extra option comes at the end

Populating 2 tables with JSON-formatted options



Product table is populated here. Notice that the extra option is informed with the content being a list with the string "product" and for items, no product name is indicated, as this is the default item for the table.

Populating 2 tables with colon-formatted options



Product table is populated here using the extra field. Differently from the JSON-formatted option, the extra field comes at the end, after the items. Similarly to the JSON-formatted option, the product name is not indicated, but is downloaded anyways.

Populating 4 related tables with JSON-formatted options



This populates 4 tables that are related using JSON-formatted -t options. "customer", "sales_dept" and "products" are regular tables and sales is a transaction table, gluing all of the other tables together in one transaction (a sale per se). Noitce that the only difference is that you need to provide as the "foreigner keys" the name of the foreigner table and its key in an object, all in the "items" item. In the database table, that column will be named foreigner_table__foreigner_key (separated by two underscore). Check below:
{"table_name": "quantity": 10, "drop_if_exists": true, "id_increment": true, "items": [{"foreigner_table": "foreigner_key"}, {"foreigner_table2": "foreigner_key"}, "column1", "column2", "column3"]}}

Populating 3 related tables with colon-formatted options



This populates 3 related tables with colon-formatted -t options. in the items area, foreigner tables and their keys are indicated by having them as one item separated by __ (2 underscores). Check below:
table_name:quantity:drop_if_exists:id_increment:foreigner_table__foreigner_column, foreigner_table2__foreigner_column,column1,column2,column3

Populating several tables using JSON file



Instead of typing the tables in the command line, you can choose to have all of your configuration (including the database connection) in a JSON file. Use -i option to input that file and load the configuration. Check out the "input_files" folder that comes with 2 examples of such files, one with the database connection and one without it.

Requirements


This program requires:
  • mysql/mariadb installed with a database created
  • python package
    mysqlclient
    installed. This package may require Python 3 and MySQL development headers and libraries, which can be installed like this:
    
    $ sudo apt-get install python3-dev default-libmysqlclient-dev build-essential # Debian / Ubuntu
    % sudo yum install python3-devel mysql-devel # Red Hat / CentOS
    

    Then install

    mysqlclient
    via pip:

    
    $ pip install mysqlclient
    
  • This program is using the older version of generatedata.com (old v3)

Fields

FieldField Description
namefield representing first name
surnamefield representing last name
emailfield representing an email address
birthdaywill populate field with date information
phonefield representing a phone number
companywill populate with company names
user_idfield representing a user identification
addresswill populate with home/business address
citywill populate with city names (not in sync with region for country)
regionwill populate with region names (State or Province) (not in sync with city or country)
countrywill populate with country names (not in sync with city or region)
coordinateswill populate with latitude and longitude
cvisarepresents a visa card (credit)
cvisa_cvvrepresents the visa card verification value
cmastercardrepresents a mastercard card (credit)
cmastercard_cvvrepresents the mastercard card verification value
usernamewill populate with usernames (first leter of name plus surname)
passwordwill populate with random password strings
salaryan amount that represents a salary
childrenan integer representing the number of children of a person
pricean amount representing the price of a product
stockan integer representing the product availability
contactfield representing someone's contact
quantityan integer representing some quantity (amount of products sold, for exemple)
datea date field (similar to birthday)
invoicean integer representing an invoice number

About

Python program to auto populate a database with random useful data for working with mysql

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages