We want to use the NYC yellow taxi trip dataset for January 2019 and the taxi_zone_lookup table from this link: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

We want to analyse each of the variables in this data, such as trip duration, passenger count, fare amount, pick-up and drop-off locations, store_and_fwd_flag, and vendor ID.
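Trip duration would most likely have to be derived from the pick-up and drop-off timestamps. A minimal sketch with pandas, assuming the timestamp columns are named tpep_pickup_datetime and tpep_dropoff_datetime (check the downloaded file); the helper name add_trip_duration and the two sample rows are made up for illustration:

```python
import pandas as pd


def add_trip_duration(trips: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `trips` with a trip_duration_min column (minutes)."""
    out = trips.copy()
    # Assumed column names; adjust them if the file uses different headers.
    pickup = pd.to_datetime(out["tpep_pickup_datetime"])
    dropoff = pd.to_datetime(out["tpep_dropoff_datetime"])
    out["trip_duration_min"] = (dropoff - pickup).dt.total_seconds() / 60.0
    return out


# Two made-up rows standing in for the real January 2019 file.
sample = pd.DataFrame({
    "tpep_pickup_datetime": ["2019-01-01 00:10:00", "2019-01-01 08:30:00"],
    "tpep_dropoff_datetime": ["2019-01-01 00:25:00", "2019-01-01 09:15:30"],
})
sample = add_trip_duration(sample)
print(sample["trip_duration_min"])
```

On the real data, negative or extremely long durations produced this way are natural candidates for the outlier analysis.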

For example, we want to know: which regions have the most pickups and drop-offs; when the peak and off-peak hours for taking a taxi are; which days are the peak days for taking a taxi; what the relationship is between each variable and trip duration; which company gets the highest fare amounts; whether there is a relationship between a passenger count of zero and the store_and_fwd_flag variable; and what the most frequent trip duration is for each vendor. We also want to analyse outliers.
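Three of these questions (busiest pick-up zones, peak hours, and zero passengers versus store_and_fwd_flag) could be approached as in the sketch below. It assumes the trip columns are named PULocationID, tpep_pickup_datetime, passenger_count and store_and_fwd_flag, and that the lookup table has LocationID and Zone columns. The function names are hypothetical and the rows are invented sample data, not real results:

```python
import pandas as pd


def pickups_by_zone(trips: pd.DataFrame, zones: pd.DataFrame) -> pd.DataFrame:
    """Count pickups per zone, busiest first, with zone names attached."""
    counts = (
        trips["PULocationID"]
        .value_counts()
        .rename_axis("LocationID")
        .reset_index(name="pickups")
    )
    return counts.merge(zones[["LocationID", "Zone"]], on="LocationID", how="left")


def pickups_by_hour(trips: pd.DataFrame) -> pd.Series:
    """Count pickups for each hour of the day (0-23) that appears in the data."""
    hours = pd.to_datetime(trips["tpep_pickup_datetime"]).dt.hour
    return hours.value_counts().sort_index()


def zero_passenger_vs_flag(trips: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate zero/nonzero passenger counts against store_and_fwd_flag."""
    label = trips["passenger_count"].eq(0).map({True: "zero", False: "nonzero"})
    return pd.crosstab(label, trips["store_and_fwd_flag"],
                       rownames=["passengers"], colnames=["store_and_fwd_flag"])


# Invented sample rows; replace with the real trip file and lookup table.
trips = pd.DataFrame({
    "PULocationID": [161, 161, 237, 161, 237, 48],
    "tpep_pickup_datetime": [
        "2019-01-07 08:05:00", "2019-01-07 08:40:00", "2019-01-07 18:10:00",
        "2019-01-08 08:15:00", "2019-01-08 18:45:00", "2019-01-09 02:30:00",
    ],
    "passenger_count": [1, 0, 2, 0, 1, 1],
    "store_and_fwd_flag": ["N", "Y", "N", "N", "N", "N"],
})
zones = pd.DataFrame({
    "LocationID": [48, 161, 237],
    "Zone": ["Clinton East", "Midtown Center", "Upper East Side South"],
})

by_zone = pickups_by_zone(trips, zones)
by_hour = pickups_by_hour(trips)
flag_table = zero_passenger_vs_flag(trips)
print(by_zone, by_hour, flag_table, sep="\n\n")
```

The same pattern would extend to drop-offs (swap in the drop-off location column) and to peak days (group by the weekday of the pick-up timestamp instead of the hour). A cross-tabulation only shows counts; whether the association is meaningful would still need a statistical test on the full dataset.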