New York taxis study. Part of a homework assignment, using basic Python libraries with some intermediate implementations.

Edoardoba/NYC-taxis-study

Homework 02

Taxi

Research questions

Exploratory Data Analysis

  1. [RQ1] In which period of the year are taxis used most? Create a plot that shows, for each month, the average number of trips recorded per day. Because New York's zones differ, we want to visualize the same information for each borough. Do you notice any differences among them? Provide comments and plausible explanations for what you observe (e.g., which month has the highest daily average?).

  2. [RQ2] Which time slots have the most passengers? Define your own time slots and find those in which taxis carry the highest number of passengers across all of New York, then repeat the analysis for each borough. Present the results in a visualization and comment on them.

  3. [RQ3] Do all trips last the same amount of time? Let's focus on the distribution of trip duration. Provide a plot of it and comment on what you see. Run this analysis for NYC as a whole and for each borough (and, of course, comment on the results!).

  4. [RQ4] What is the most common payment method? Find out how payments are made in each borough and visualize the number of payments for each possible method. Then run the chi-squared test to see whether the payment method is associated with the borough, and comment on the results.

  5. [RQ5] Does a longer distance correlate with a longer trip duration on average? Make a plot that shows the dependence between trip distance and duration. Then compute the Pearson coefficient: is it significant? Comment on the results you obtain.
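
The two statistical tests named in RQ4 and RQ5 could be run roughly as in the sketch below. It is only an illustration and not the code from the notebooks: it assumes SciPy and NumPy are available, and the borough names, payment counts, and distance/duration values are invented placeholders rather than results from the taxi dataset.

```python
import numpy as np
from scipy.stats import chi2_contingency, pearsonr

# Hypothetical contingency table for RQ4 (made-up counts, not real results).
# Rows are boroughs, columns are payment methods.
boroughs = ["Manhattan", "Brooklyn", "Queens"]
payment_methods = ["credit card", "cash"]
observed = np.array([
    [700, 300],
    [250, 250],
    [300, 200],
])

# Chi-squared test of independence: a small p-value suggests that the
# payment method and the borough are not independent.
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p-value = {p_value:.3g}")

# Synthetic data for RQ5 (an assumption for illustration only):
# duration grows roughly linearly with distance, plus random noise.
rng = np.random.default_rng(0)
distance_miles = rng.uniform(0.5, 20.0, size=500)
duration_min = 3.0 * distance_miles + rng.normal(0.0, 5.0, size=500)

# Pearson coefficient and the p-value for the null hypothesis of
# no linear correlation between the two variables.
r, p_r = pearsonr(distance_miles, duration_min)
print(f"Pearson r = {r:.3f}, p-value = {p_r:.3g}")
```

With real data, the contingency table could be built from the trip records (for example with a cross-tabulation of borough against payment type), and the distance and duration arrays would be the corresponding cleaned columns.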

Before starting our analysis, we carefully studied the legend and the taxi_information PDF to better understand our dataset. For every analysis, we explicitly mention the information from these two files that is relevant to our results. We had to deal with a very large amount of data, so we decided to split the tasks across separate Jupyter notebooks to make our evaluation simpler and clearer. Let's dive into it. Here is how our homework is divided:

  1. Code_before_starting: All the code for the first section of the homework. We explain the choices we made and why we chose some methods over others.
  2. RQ1-2-3-4-5: These .ipynb files contain all the code for the research questions, with comments.
  3. Final_core.ipynb: All the code related to the first core question. The Student's t-test and p-values are discussed here.
  4. Finalmap: All the code related to the second core question, with the choropleth map and comments on it.

There are also several images (we think no description is needed for those).

Participants: Nagham Almagout, Edoardo Cantagallo, Giulia Maslov
