A detailed analysis of the top 2,000 global companies in 2021, focusing on countries, profit-to-sales ratio and assets. The project uses Pandas, NumPy and Matplotlib.
- The dataset is courtesy of Kaggle: https://www.kaggle.com/shivamb/fortune-global-2000-companies-till-2021/version/1
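Loading the downloaded file and taking a first look at it might be sketched as follows. This is a minimal example, assuming the Kaggle CSV has columns named Rank, Name, Country, Sales, Profit and Assets, and that money values are stored as text such as `$124.5 B`; those column names and the toy rows are assumptions, not taken from the dataset. An in-memory string stands in for the file so the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up toy rows with assumed column names (not real figures).
# With the real download you would pass its path to pd.read_csv instead.
csv_text = """Rank,Name,Country,Sales,Profit,Assets
1,Company A,United States,$124.5 B,$8.2 B,$900 B
2,Company B,China,$843 M,-$120 M,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Column names, dtypes and non-null counts.
df.info()

# Number of missing values per column.
print(df.isna().sum())

# Tidy column names: trim whitespace, lower case, underscores for spaces.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
print(df.columns.tolist())
```

Money columns come back as text (`object` dtype), which is why a conversion step is needed before any arithmetic.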
- The questions answered in this project focus on three areas of interest:
  - Countries: which countries rank highest across various metrics, such as the number of companies on the list and the top 20 by average profit per company.
  - Profit to sales: which of the top companies have the best profit-to-sales ratio?
  - Assets: which companies have the most assets?
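Once the money columns are numeric, each of the three questions maps onto a short pandas expression. The sketch below is illustrative only: it assumes cleaned, lower-case column names (`name`, `country`, `sales`, `profit`, `assets`) and uses made-up toy rows rather than the real data; `profit_to_sales` is a hypothetical derived column.

```python
import pandas as pd

# Made-up toy rows standing in for the cleaned data (not real figures).
# Assumed columns; sales, profit and assets are already numeric.
companies = pd.DataFrame(
    {
        "name": ["Co A", "Co B", "Co C", "Co D", "Co E"],
        "country": ["United States", "China", "United States", "Japan", "China"],
        "sales": [500, 400, 300, 200, 100],
        "profit": [50, 80, 30, 10, 15],
        "assets": [900, 1200, 400, 700, 300],
    }
)

# Countries: number of companies on the list per country.
companies_per_country = companies["country"].value_counts()

# Countries: top 20 by average profit per company.
avg_profit_by_country = (
    companies.groupby("country")["profit"].mean().sort_values(ascending=False).head(20)
)

# Profit to sales: ratio per company, highest first.
companies["profit_to_sales"] = companies["profit"] / companies["sales"]
best_margin = companies.nlargest(10, "profit_to_sales")[["name", "profit_to_sales"]]

# Assets: companies with the most assets.
most_assets = companies.nlargest(10, "assets")[["name", "country", "assets"]]

print(companies_per_country, avg_profit_by_country, best_margin, most_assets, sep="\n\n")
```

`nlargest` is used instead of a full `sort_values` because only the top rows are needed; on the real data you would likely raise the `10` to whatever cut-off the question calls for.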
- Defining the business questions (1-business-questions.ipynb)
- Extracting the data; checking column info, spelling errors and column formats; tidying columns; finding and replacing errors; and finding any null or NaN values (2-extract-and-clean-data.ipynb)
- Transforming the data by writing functions that convert string representations to integers and by creating new columns from existing data (3.transform-the-data)
- Creating visualizations with Matplotlib (4-visualizations.ipynb)
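The transform step, converting text amounts to integers and deriving a new column, could be sketched like this. It assumes raw values look like `$124.5 B` or `-$120 M` (B for billions, M for millions); `money_to_int` and the toy rows are hypothetical, and the notebook's own functions may differ.

```python
import re
from decimal import Decimal

import pandas as pd

# Assumed suffixes in the raw columns: B = billions, M = millions, none = dollars.
MULTIPLIERS = {"B": 10**9, "M": 10**6, "": 1}
PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)\s*([BM]?)", re.IGNORECASE)


def money_to_int(value):
    """Hypothetical helper: turn text like '$124.5 B' or '-$120 M' into whole dollars.

    Missing values are returned as None so pandas can treat them as missing.
    """
    if pd.isna(value):
        return None
    text = str(value).replace("$", "").replace(",", "").strip()
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"Unrecognised money value: {value!r}")
    number, suffix = match.groups()
    # Decimal avoids binary floating-point rounding when scaling e.g. 21.3 by 10**9.
    return int(Decimal(number) * MULTIPLIERS[suffix.upper()])


# Made-up toy rows (not real figures).
raw = pd.DataFrame(
    {
        "name": ["Company A", "Company B", "Company C"],
        "sales": ["$124.5 B", "$843 M", "$21.3 B"],
        "profit": ["$8.2 B", "-$120 M", None],
    }
)

clean = raw.copy()
for column in ["sales", "profit"]:
    # Nullable Int64 keeps whole-dollar integers while still allowing missing values.
    clean[column] = clean[column].map(money_to_int).astype("Int64")

# New column derived from existing data.
clean["profit_to_sales"] = clean["profit"] / clean["sales"]
print(clean)
```

The nullable `Int64` dtype is a design choice here: a plain `int64` column cannot hold missing values, so a single blank profit would otherwise force the whole column to floats.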
Tools: Pandas, Matplotlib, NumPy
- Set up Jupyter notebooks (see below)
- Work through the examples step by step.
New to Pandas?
- I recommend learning the basics of Python and SQL before starting with Pandas.
- This introductory Pandas course is useful: https://www.youtube.com/watch?v=WcDaZ67TVRo&t=7270s
Pandas docs: https://pandas.pydata.org/pandas-docs/stable/index.html
Matplotlib docs: https://matplotlib.org/stable/tutorials/index.html
Jupyter Notebook and other useful tools can be installed as part of the Anaconda distribution:
Find out more in the Jupyter Notebook docs: