
Domain: AI and Synthetic Data; how to create a large dataset of your own for practice


DataBells/create_synthetic_dataset


Dataset Overview:

Domain: AI and Synthetic Data

This synthetic dataset contains 10,000 records of used car sales from 2015 to 2024. Each record describes the car (make, model, type, mileage, engine power), the distributor and its location, and the sale itself (purchase and sale prices, margin, sale status, sales agent, and customer feedback). The dataset is designed for practicing data analysis: tracking price fluctuations, sales patterns, and agent performance across the decade. Feel free to explore the data and uncover patterns that could inform decision-making in car sales and marketing.

Why Use This Dataset?

  1. Analyzing sales trends and performance over time.
  2. Exploring relationships between car features, pricing, and sales margins.
  3. Developing machine learning models to predict sales status or customer feedback.
  4. Studying the impact of distributor strategies and sales agent performance.
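As a small illustration of the second use case, the sketch below averages the margin per car type over sold cars only. It uses only the Python standard library and the column names described in this README; the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows that reuse the dataset's column names.
rows = [
    {"Car Type": "SUV", "Car Sale Status": "Sold", "Margin-%": 10.0},
    {"Car Type": "SUV", "Car Sale Status": "Sold", "Margin-%": 20.0},
    {"Car Type": "Sedan", "Car Sale Status": "Sold", "Margin-%": 5.0},
    {"Car Type": "Sedan", "Car Sale Status": "Not Sold", "Margin-%": None},
]


def average_margin_by_type(rows):
    """Return the mean Margin-% per Car Type, ignoring unsold cars."""
    margins = defaultdict(list)
    for row in rows:
        if row["Car Sale Status"] == "Sold":
            margins[row["Car Type"]].append(row["Margin-%"])
    return {car_type: mean(values) for car_type, values in margins.items()}


summary = average_margin_by_type(rows)
print(summary)  # {'SUV': 15.0, 'Sedan': 5.0}
```

Unsold cars are skipped because they have no sale price and therefore no margin to average.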

Column Descriptions:

ID: Unique identifier for each record.
Distributor Name: Name of the car distributor.
Location: Location of the distributor’s office.
Car Name: The specific name of the car.
Manufacturer Name: Name of the car’s manufacturer.
Car Type: Type of car (e.g., Sedan, SUV, Hatchback, etc.).
Color: Car’s color.
Gearbox: Type of gearbox (e.g., Manual, Automatic).
Number of Seats: Total number of seats in the car.
Number of Doors: Number of doors in the car.
Energy: Fuel type used by the car (e.g., Petrol, Diesel, Electric).
Manufactured Year: Year the car was manufactured.
Price-$: Listed price of the car in USD.
Mileage-KM: Total kilometers the car has traveled.
Engine Power-HP: Horsepower (HP) of the car’s engine.
Purchased Date: Date the distributor purchased the car.
Car Sale Status: Indicates whether the car was sold to a customer (Sold/Not Sold).
Sold Date: Date the car was sold to a customer.
Purchased Price-$: Purchase price paid by the distributor.
Sold Price-$: Sale price paid by the customer.
Margin-%: Percentage margin earned by the distributor.
Sales Agent Name: Name of the sales agent who closed the deal.
Sales Rating: Rating given to the sales agent by the distributor.
Sales Commission-$: Commission paid to the sales agent by the distributor.
Feedback: Customer feedback on the sales experience.
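
A dataset with columns like these can be mocked up with a short generator. The following is a minimal sketch, not the code used to build this dataset: it covers only a subset of the columns, and the value pools, price ranges, 70% sold share, and margin formula (profit as a percentage of the purchase price) are all assumptions made for illustration.

```python
import csv
import random
from datetime import date, timedelta

# Hypothetical value pools; the README does not list the dataset's real values.
CAR_TYPES = ["Sedan", "SUV", "Hatchback"]
GEARBOXES = ["Manual", "Automatic"]
ENERGIES = ["Petrol", "Diesel", "Electric"]
LAST_DAY = date(2024, 12, 31)


def make_record(record_id, rng):
    """Build one synthetic record as a dict keyed by column name."""
    purchased = date(2015, 1, 1) + timedelta(days=rng.randrange(3600))
    purchased_price = rng.randrange(3_000, 40_000)
    record = {
        "ID": record_id,
        "Car Type": rng.choice(CAR_TYPES),
        "Gearbox": rng.choice(GEARBOXES),
        "Energy": rng.choice(ENERGIES),
        "Manufactured Year": rng.randint(2005, purchased.year),
        "Mileage-KM": rng.randrange(5_000, 250_000),
        "Engine Power-HP": rng.randrange(60, 400),
        "Purchased Date": purchased.isoformat(),
        "Purchased Price-$": purchased_price,
        "Car Sale Status": "Not Sold",
        "Sold Date": "",
        "Sold Price-$": "",
        "Margin-%": "",
    }
    # Assumption: roughly 70% of cars end up sold.
    if rng.random() < 0.7:
        sold_price = round(purchased_price * rng.uniform(1.02, 1.25))
        sold_date = min(purchased + timedelta(days=rng.randint(1, 180)), LAST_DAY)
        record["Car Sale Status"] = "Sold"
        record["Sold Date"] = sold_date.isoformat()
        record["Sold Price-$"] = sold_price
        # Assumed formula: profit as a percentage of the purchase price.
        record["Margin-%"] = round(
            (sold_price - purchased_price) / purchased_price * 100, 2
        )
    return record


def generate_records(n, seed=0):
    """Generate n records; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [make_record(i, rng) for i in range(1, n + 1)]


def write_csv(records, path):
    """Write the records to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)


records = generate_records(10_000, seed=42)
```

Calling `write_csv(records, "used_car_sales.csv")` (a hypothetical file name) saves the result to disk. Deriving the sold date, sold price, and margin from the purchase values keeps each record internally consistent, which matters more for practice data than the exact distributions chosen.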

Check my GitHub & Blog for the code and additional tweaks not included in the prompt.

Conclusion

This approach lets beginners create realistic datasets for analysis without having to hunt for real data. A basic understanding of programming and data analysis is helpful.

Thank you!
