In this repository, we present the first travel dataset generator on GitHub.
The generated dataset is a good basis for learning data mining models, including, but not limited to, supervised learning (e.g., classification, regression) and unsupervised learning (e.g., clustering).
This generator produces flight and hotel data. Everything is randomly generated: business users, hotels, trips, and so on. See python/main.py to understand the parameters.
Run the dataset generator (python/main.py):
python main.py
Probability customization:
#-----------------------------------------------------
#- Companies and Users
#-- Types of gender
defGenders = [str, str, ...]
#-- Ages of users
defAgesInterval = {'min': int, 'max': int}
#-- Number of flights by user
defFlightsInterval = {'min': int, 'max': int}
#-- Companies
#--- Number of users by company
defCompanies = {
'ABC': {'usersCount': int},
'DEC': {'usersCount': int}, ...
}
#-- Number of Places of a Company
defCompaniesPlacesInterval = {'min': int, 'max': int}
#-----------------------------------------------------
#- Flight Agencies
#-- Types of flight
#--- Weight of price by type
defFlightTypes = {
'economic': {'price': float},
'premium': {'price': float}, ...
}
#-- Names of agency
defAgenciesName = [str, str, ...]
#-----------------------------------------------------
#- Places
#-- Names of place
defPlacesName = [str, str, ...]
#-- Distances between cities
defDistancesInterval = {'min': float, 'max': float}
#-- Plane speed - km/hour
defPlaceTravelKmPerHour = float
#-----------------------------------------------------
#- Lodges (Accommodation)
#-- Number of lodges by place
defLodgesInterval = {'min': int, 'max': int}
#-- Prices of lodges
defLodgesPrices = {'min': float, 'max': float}
#-----------------------------------------------------
#- Travels
#-- Number of days of a travel
defTravelsDays = {'min': int, 'max': int}
#-- Flights prices
defTravelsFlightPrices = {'init': float, 'interval': float}
#-- Probability of a flight with hotel
defTravelWithLodge = float # in the range [0, 1]
#-- Dates of the travels
defTravelDate = {'init': datetime, 'interval':{'min': int, 'max': int}}
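To make the type placeholders above more concrete, here is a minimal sketch that fills in a few of the parameters with illustrative values and draws one random trip from them. The values are not the repository's defaults, `sample_trip` is a hypothetical helper that does not exist in python/main.py, and it is an assumption that the `defTravelDate` interval is measured in days and that flight duration is simply distance divided by speed.

```python
import random
from datetime import datetime, timedelta

# Illustrative values only; these are NOT the repository's defaults.
defGenders = ['female', 'male', 'none']
defAgesInterval = {'min': 21, 'max': 65}
defDistancesInterval = {'min': 300.0, 'max': 3000.0}   # km
defPlaceTravelKmPerHour = 800.0
defTravelWithLodge = 0.7                               # probability in [0, 1]
defTravelDate = {'init': datetime(2019, 9, 26),
                 'interval': {'min': 1, 'max': 14}}    # assumed to be days


def sample_trip(rng):
    """Hypothetical helper: draw one trip from the settings above."""
    distance = rng.uniform(defDistancesInterval['min'],
                           defDistancesInterval['max'])
    gap_days = rng.randint(defTravelDate['interval']['min'],
                           defTravelDate['interval']['max'])
    return {
        'gender': rng.choice(defGenders),
        'age': rng.randint(defAgesInterval['min'], defAgesInterval['max']),
        'distance_km': round(distance, 2),
        # Assumed relation: duration = distance / speed.
        'flight_hours': round(distance / defPlaceTravelKmPerHour, 2),
        # True with probability defTravelWithLodge.
        'with_lodge': rng.random() < defTravelWithLodge,
        'date': defTravelDate['init'] + timedelta(days=gap_days),
    }


rng = random.Random(42)  # fixed seed so repeated runs give the same trip
trip = sample_trip(rng)
print(trip)
```

Seeding the random number generator is a design choice for reproducibility; whether the actual generator exposes a seed is not stated here.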
Step-by-step of the generator:
This generator is available to researchers and data scientists under the Creative Commons BY license. In case of publication and/or public use of the generator, or of any dataset derived from it, please acknowledge its creators by citing us.
- Created by Argo Solutions - http://useargo.com