leomaurodesenv/travel-dataset-generator

travel-dataset-generator

In this repository, we present the first travel dataset generator on GitHub.

The generated dataset is a good base for training data mining models, including, but not limited to, supervised learning (e.g., classification, regression) and unsupervised learning (e.g., clustering).

This generator produces flight and hotel data. Everything is randomly generated, including business users, hotels, and trips. See python/main.py for the available parameters.


Python

Run the dataset generator, python/main.py:

python main.py

Customizing the generation parameters:

#-----------------------------------------------------
#- Companies and Users
#-- Types of gender
defGenders = [str, str, ...]
#-- Ages of users 
defAgesInterval = {'min': int, 'max': int}
#-- Number of flights by user
defFlightsInterval = {'min': int, 'max': int}
#-- Companies
#--- Number of users by company
defCompanies = {
    'ABC': {'usersCount': int},
    'DEC': {'usersCount': int}, ...
}
#-- Number of Places of a Company
defCompaniesPlacesInterval = {'min': int, 'max': int}

#-----------------------------------------------------
#- Flight Agencies
#-- Types of flight
#--- Weight of price by type
defFlightTypes = {
    'economic': {'price': float},
    'premium': {'price': float}, ...
}
#-- Names of agency
defAgenciesName = [str, str, ...]

#-----------------------------------------------------
#- Places
#-- Names of place
defPlacesName = [str, str, ...]
#-- Distances between cities
defDistancesInterval = {'min': float, 'max': float}
#-- Plane speed - km/hour
defPlaceTravelKmPerHour = float 

#-----------------------------------------------------
#- Lodges (Accommodation)
#-- Number of lodges by place
defLodgesInterval = {'min': int, 'max': int}
#-- Prices of lodges
defLodgesPrices   = {'min': float, 'max': float}

#-----------------------------------------------------
#- Travels
#-- Number of days of a travel
defTravelsDays = {'min': int, 'max': int}
#-- Flights prices
defTravelsFlightPrices = {'init': float, 'interval': float}
#-- Probability of a flight with hotel
defTravelWithLodge = float # in the range [0, 1]
#-- Dates of the travels
defTravelDate = {'init': datetime, 'interval':{'min': int, 'max': int}}
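
As a rough illustration of how parameters shaped like the ones above could be filled in and sampled, here is a minimal sketch. The concrete values, the helpers `sample_int`, `sample_float` and `sample_trip`, and the reading of `defTravelDate['interval']` as a number of days after the initial date are all assumptions made for this example; they are not taken from python/main.py, which remains the reference for the actual behavior.

```python
import random
from datetime import datetime, timedelta

# Illustrative values only; the defaults in python/main.py may differ.
defDistancesInterval = {'min': 200.0, 'max': 4000.0}   # km between places
defPlaceTravelKmPerHour = 850.0                        # plane speed, km/hour
defTravelsDays = {'min': 1, 'max': 7}                  # length of a trip, days
defTravelWithLodge = 0.5                               # probability in [0, 1]
defTravelDate = {'init': datetime(2020, 1, 1),
                 'interval': {'min': 1, 'max': 30}}    # assumed: days after 'init'


def sample_int(interval, rng):
    """Hypothetical helper: integer in [min, max], both ends inclusive."""
    return rng.randint(interval['min'], interval['max'])


def sample_float(interval, rng):
    """Hypothetical helper: float drawn uniformly between min and max."""
    return rng.uniform(interval['min'], interval['max'])


def sample_trip(rng):
    """Hypothetical helper: draw one trip from the parameters above."""
    distance_km = sample_float(defDistancesInterval, rng)
    offset_days = sample_int(defTravelDate['interval'], rng)
    return {
        'distance_km': distance_km,
        # Assumed relation: flight time = distance / plane speed.
        'flight_hours': distance_km / defPlaceTravelKmPerHour,
        'days': sample_int(defTravelsDays, rng),
        'departure': defTravelDate['init'] + timedelta(days=offset_days),
        # True with probability defTravelWithLodge.
        'with_lodge': rng.random() < defTravelWithLodge,
    }


rng = random.Random(42)  # fixed seed so repeated runs give the same trip
print(sample_trip(rng))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible without touching the global random state; whether the generator itself seeds its randomness this way is not stated here.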

Notebook

Step-by-step of the generator:


License

This generator is available to researchers and data scientists under the Creative Commons BY license. If you publish or publicly use the generator, or any dataset derived from it, please acknowledge its creators by citing us.


About

A tool to generate synthetic datasets of corporate travel
