Do you ride a bike?

Bike sharing is the idea that you can rent a bike at one station and ride it to another. Users are charged by the amount of time that they take out a bike. The data set contains information about a bike sharing service in Washington DC (here is the data source: Capital Bikeshare, but you should not need any of the information for the exam).

Each row in the dataset consists of one rental/trip. You will be asked questions about this dataset. For all of your answers provide the R code necessary for a complete solution. Write your answers in form of an R markdown script (this script already contains the questions).

Question 1

Load the data at https://raw.githubusercontent.com/Stat579-at-ISU/stat579-at-isu.github.io/master/exams/data/bikesharing/bikes.csv into your R session. The dataset contains information for each trip. How many trips were there overall? How many factor variables are in the data, how many others (describe which ones)?

bikes <- read.csv("https://raw.githubusercontent.com/Stat579-at-ISU/stat579-at-isu.github.io/master/exams/data/bikesharing/bikes.csv")
# here is where you write your code for question #1

Replace this text by your answer.

Question 2

The variable Duration contains the length of the bike rental in seconds. How long was the longest rental (convert into days)? What other information is in the data on this trip? How many trips lasted more than one day?

# your code goes here

Replace this text by your answer.

Question 3

Start.Station describes the start of each trip. From which station did most trips get started? How many trips? Is that the same station at which most trips ended (End.Station)?

When a bike is not returned, the End.Station is marked as "". How often do bikes not get returned? What is reported for the duration of those trips? Change the value of Duration to NA.

# your code goes here

Replace this text by your answer.

Question 4

Plot barcharts of the number of trips on each day of the week, facet by Subscriber.Type. Make sure that the days of the week (wday) are in the usual order (start with Mondays). Describe any patterns you see in one to two sentences.

Introduce a new variable weekend into the data set that is TRUE for all Saturdays and Sundays and FALSE for all other days of the week. Draw the same plot as before but substitute wday by weekend. Describe the pattern.

# your code goes here

Replace this text by your answer.

Question 5

The first ten minutes of each trip are free. What is the percentage of free trips by Subscriber.Type? Calculate precisely, then visualize.

# your code goes here

Replace this text by your answer.

Question 6

Using methods from the dplyr package come up with summaries of the bike trip data for the two types of subscribers: Over the course of the day (as given by hour), calculate 1. how many trips are done,
2. average length of trips (in minutes), 3. standard deviation of the trip duration,

Draw a plot showing at least four of these variables. Describe patterns in the plot in one to two sentences.

# your code goes here

Replace this text by your answer.

Question 7

Write a function moment (x,k, na.rm=F) to calculate the kth central moment \(m_k\) of sample x as given in \(m_k = 1/n \cdot \sum_i (x_i - \mu)^k\), where \(\mu\) is the sample mean of numeric vector x and \(n\) is its length.

Make sure that the function deals with the parameter na.rm appropriately.

Use your function in the following framework: Construct a dataset for all combinations of Start.Station and End.Station and get the following statistics for the durations of trips between each pair of stations: 1) the number of rentals/trips 2) the average duration of a trip 3) the median duration of the trips 4) the second moment of the durations 5) the third moment of the durations

Skewness \(\gamma\) of a distribution is defined as the ratio of third moment and the second moment raised to the power of 1.5, i.e.  \[ \gamma = \frac{m_3}{m_2^{1.5}}. \] For between station summaries that are based on at least 50 trips, plot the difference in median and mean (on the horizontal axis) and skewness (vertical axis) in a scatterplot. Describe the pattern.

# your code goes here

Replace this text by your answer.