Bike sharing lets you rent a bike at one station and return it at another. Users are charged for the length of time they have a bike checked out. The dataset contains information about a bike sharing service in Washington, DC (the data source is Capital Bikeshare, but you should not need any further information about it for the exam).
Each row in the dataset represents one rental/trip. You will be asked questions about this dataset. For all of your answers, provide the R code necessary for a complete solution. Write your answers in the form of an R Markdown script (this script already contains the questions).
Load the data at https://raw.githubusercontent.com/Stat579-at-ISU/stat579-at-isu.github.io/master/exams/data/bikesharing/bikes.csv into your R session. The dataset contains information for each trip. How many trips were there overall? How many factor variables are in the data, and how many variables of other types (describe which ones)?
bikes <- read.csv("https://raw.githubusercontent.com/Stat579-at-ISU/stat579-at-isu.github.io/master/exams/data/bikesharing/bikes.csv")
# here is where you write your code for question #1
Replace this text by your answer.
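As a hedged illustration of counting rows and variable types, here is a minimal sketch on a made-up data frame (`toy` and its values are invented, not taken from the bike data). One point to keep in mind: since R 4.0.0, `read.csv()` reads strings as character vectors by default, so factor variables only appear if `stringsAsFactors = TRUE` is set or columns are converted afterwards.

```r
# Made-up stand-in for the bike data; only the column names mimic the exam.
toy <- data.frame(
  Duration = c(300, 1200, 900),
  Start.Station = c("A", "B", "A"),
  stringsAsFactors = TRUE   # mimic the pre-4.0 default of read.csv()
)

n_trips <- nrow(toy)                    # one row per trip
n_factors <- sum(sapply(toy, is.factor))  # number of factor columns
type_table <- table(sapply(toy, class))   # count of columns by class

n_trips
n_factors
type_table
```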
The variable `Duration` contains the length of the bike rental in seconds. How long was the longest rental (convert into days)? What other information is in the data on this trip? How many trips lasted more than one day?
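The unit conversion can be sketched as follows. This is only an illustration on invented durations (the data frame `toy` is hypothetical); it assumes `Duration` is numeric and measured in seconds, as stated above.

```r
# Invented durations in seconds; not values from the real data.
toy <- data.frame(Duration = c(600, 86400, 172800, 90000))

seconds_per_day <- 24 * 60 * 60            # 86400 seconds in one day
toy$days <- toy$Duration / seconds_per_day

longest_days <- max(toy$days, na.rm = TRUE)           # longest rental in days
longest_trip <- toy[which.max(toy$Duration), ]        # full record of that trip
n_over_one_day <- sum(toy$Duration > seconds_per_day, na.rm = TRUE)

longest_days
longest_trip
n_over_one_day
```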
# your code goes here
Replace this text by your answer.
`Start.Station` describes the start of each trip. From which station did the most trips start? How many trips started there? Is that the same station at which most trips ended (`End.Station`)?
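One possible way to find the busiest stations is to tabulate and sort. The sketch below uses invented station labels ("A", "B", "C") in a hypothetical data frame `toy`; on the real data, unreturned bikes (an empty `End.Station`) may need to be excluded before comparing.

```r
# Invented trips between hypothetical stations A, B and C.
toy <- data.frame(
  Start.Station = c("A", "B", "A", "C", "A", "B"),
  End.Station   = c("B", "B", "C", "B", "B", "A")
)

# Frequency tables, most frequent station first.
start_counts <- sort(table(toy$Start.Station), decreasing = TRUE)
end_counts   <- sort(table(toy$End.Station), decreasing = TRUE)

head(start_counts, 1)   # busiest start station and its number of trips
head(end_counts, 1)     # busiest end station and its number of trips

same_station <- names(start_counts)[1] == names(end_counts)[1]
same_station
```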
When a bike is not returned, the `End.Station` is marked as `""`. How often do bikes not get returned? What is reported for the duration of those trips? Change the value of `Duration` to `NA` for those trips.
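Recoding with a logical index might look like the sketch below. The data frame `toy` and all its values are invented for illustration; in particular, the durations shown for the unreturned trips say nothing about what the real data reports.

```r
# Invented data: two of four trips have an empty End.Station.
toy <- data.frame(
  End.Station = c("A", "", "B", ""),
  Duration = c(300, 120, 900, 45),
  stringsAsFactors = FALSE
)

not_returned <- toy$End.Station == ""   # TRUE where the bike was not returned
n_not_returned <- sum(not_returned)     # how often this happens

summary(toy$Duration[not_returned])     # what is reported for those trips

toy$Duration[not_returned] <- NA        # overwrite only those durations
toy
```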
# your code goes here
Replace this text by your answer.
Plot bar charts of the number of trips on each day of the week, faceted by `Subscriber.Type`. Make sure that the days of the week (`wday`) are in the usual order (start with Mondays). Describe any patterns you see in one to two sentences.
Introduce a new variable `weekend` into the dataset that is `TRUE` for all Saturdays and Sundays and `FALSE` for all other days of the week. Draw the same plot as before but replace `wday` by `weekend`. Describe the pattern.
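The weekday ordering and the weekend indicator can both be handled before plotting. The sketch below is a minimal illustration that assumes `wday` is coded with the abbreviations "Mon" to "Sun"; that coding is an assumption and should be checked against the actual values (for example with `unique(bikes$wday)`).

```r
# Invented data; the labels "Mon", ..., "Sun" are an assumed coding of wday.
toy <- data.frame(wday = c("Sat", "Mon", "Wed", "Sun", "Mon", "Fri"))

# Fix the level order so that plots and tables start with Monday
# instead of using alphabetical order.
day_order <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
toy$wday <- factor(toy$wday, levels = day_order)

# Logical indicator: TRUE on Saturdays and Sundays, FALSE otherwise.
toy$weekend <- toy$wday %in% c("Sat", "Sun")

table(toy$wday)      # counts appear in Monday-to-Sunday order
table(toy$weekend)
```

Plotting functions generally follow the level order of a factor, so setting the levels once is usually enough to get the bars in the desired order.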
# your code goes here
Replace this text by your answer.
The first ten minutes of each trip are free. What is the percentage of free trips by `Subscriber.Type`? Calculate precisely, then visualize.
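A percentage by group is the group-wise mean of a logical vector, times 100. The following sketch uses invented data; the subscriber labels "Casual" and "Registered" are hypothetical, and treating a trip of exactly 600 seconds as free is an assumption about how the ten-minute rule is applied.

```r
# Invented data; the subscriber labels are hypothetical.
toy <- data.frame(
  Subscriber.Type = c("Casual", "Casual", "Casual", "Casual",
                      "Registered", "Registered", "Registered", "Registered"),
  Duration = c(300, 1200, 900, 1500, 500, 590, 2000, 450)
)

free_limit <- 10 * 60                      # ten minutes in seconds
toy$free <- toy$Duration <= free_limit     # assumption: exactly 600 s counts as free

# Mean of a logical vector is the share of TRUE values; NAs are dropped.
pct_free <- tapply(toy$free, toy$Subscriber.Type, mean, na.rm = TRUE) * 100
pct_free

# One simple way to visualize the result with base graphics.
barplot(pct_free, ylab = "Free trips (%)", ylim = c(0, 100))
```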
# your code goes here
Replace this text by your answer.
Using methods from the dplyr package, come up with summaries of the bike trip data for the two types of subscribers. Over the course of the day (as given by `hour`), calculate

1. how many trips are done,
2. the average length of trips (in minutes),
3. the standard deviation of the trip duration.

Draw a plot showing at least four of these variables. Describe patterns in the plot in one to two sentences.
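The grouped summaries could be sketched with `group_by()` and `summarise()` as below. The data are invented, the output column names (`trips`, `mean_minutes`, `sd_minutes`) are arbitrary choices, and reporting the standard deviation in minutes rather than seconds is an assumption.

```r
library(dplyr)

# Invented data; subscriber labels and values are hypothetical.
toy <- data.frame(
  Subscriber.Type = c("Casual", "Casual", "Registered", "Registered", "Registered"),
  hour = c(8, 8, 8, 8, 17),
  Duration = c(600, 1200, 300, 900, 1500)   # seconds
)

by_hour <- toy %>%
  group_by(Subscriber.Type, hour) %>%
  summarise(
    trips = n(),                                        # number of trips
    mean_minutes = mean(Duration, na.rm = TRUE) / 60,   # average length in minutes
    sd_minutes = sd(Duration, na.rm = TRUE) / 60,       # spread in minutes
    .groups = "drop"
  )

by_hour
```

Groups with a single trip get `NA` for the standard deviation, because `sd()` needs at least two observations.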
# your code goes here
Replace this text by your answer.
Write a function `moment(x, k, na.rm = FALSE)` to calculate the kth central moment \(m_k\) of sample x as given in \(m_k = \frac{1}{n} \sum_i (x_i - \mu)^k\), where \(\mu\) is the sample mean of the numeric vector x and \(n\) is its length.
Make sure that the function deals with the parameter `na.rm` appropriately.
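One way such a function could look is sketched below. It is not the only reasonable design: here missing values are dropped before both the mean and \(n\) are computed when `na.rm = TRUE`, and the result is `NA` when `na.rm = FALSE` and `x` contains missing values, mirroring the behaviour of `mean()`.

```r
# Sketch of the kth central moment: m_k = 1/n * sum((x_i - mu)^k).
moment <- function(x, k, na.rm = FALSE) {
  stopifnot(is.numeric(x))
  if (na.rm) {
    x <- x[!is.na(x)]     # drop NAs so that n counts only observed values
  }
  mu <- mean(x)           # NA if x still contains missing values
  sum((x - mu)^k) / length(x)
}

x <- c(1, 2, 3, 4)
moment(x, 1)                              # first central moment is always 0
moment(x, 2)                              # 1.25
moment(c(x, NA), 2)                       # NA, because na.rm = FALSE
moment(c(x, NA), 2, na.rm = TRUE)         # 1.25
```

Note that the second central moment divides by \(n\), so it differs from `var(x)`, which divides by \(n - 1\).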
Use your function in the following framework: Construct a dataset for all combinations of `Start.Station` and `End.Station` and get the following statistics for the durations of trips between each pair of stations:

1. the number of rentals/trips
2. the average duration of a trip
3. the median duration of the trips
4. the second moment of the durations
5. the third moment of the durations
Skewness \(\gamma\) of a distribution is defined as the ratio of the third moment to the second moment raised to the power of 1.5, i.e. \[ \gamma = \frac{m_3}{m_2^{1.5}}. \] For between-station summaries that are based on at least 50 trips, plot the difference between median and mean (on the horizontal axis) and skewness (vertical axis) in a scatterplot. Describe the pattern.
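The per-pair summaries and the skewness ratio can be sketched together as follows. Everything here is illustrative: the data are invented, the column names (`trips`, `mean_dur`, `median_dur`, `m2`, `m3`, `median_minus_mean`) are arbitrary, the sign convention median minus mean is an assumption, and the threshold is lowered to 3 so that the tiny example keeps some rows (the exam asks for 50). A compact `moment()` is repeated so the block runs on its own.

```r
library(dplyr)

# Compact central moment, repeated here only to keep the block self-contained.
moment <- function(x, k, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  mean((x - mean(x))^k)
}

# Invented trips between two hypothetical stations.
toy <- data.frame(
  Start.Station = c("A", "A", "A", "A", "B", "B", "B"),
  End.Station   = c("B", "B", "B", "B", "A", "A", "A"),
  Duration      = c(1, 2, 3, 10, 4, 5, 6)
)

pair_stats <- toy %>%
  group_by(Start.Station, End.Station) %>%
  summarise(
    trips = n(),
    mean_dur = mean(Duration, na.rm = TRUE),
    median_dur = median(Duration, na.rm = TRUE),
    m2 = moment(Duration, 2, na.rm = TRUE),
    m3 = moment(Duration, 3, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    skewness = m3 / m2^1.5,                     # gamma = m_3 / m_2^1.5
    median_minus_mean = median_dur - mean_dur   # assumed sign convention
  )

min_trips <- 3   # toy threshold; the exam asks for at least 50 trips
plot_data <- filter(pair_stats, trips >= min_trips)

plot(plot_data$median_minus_mean, plot_data$skewness,
     xlab = "Median - mean of duration", ylab = "Skewness")
```

In this toy example the pair with one very long trip has a mean above its median and a positive skewness, while the symmetric pair has skewness 0.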
# your code goes here
Replace this text by your answer.