Flying etiquette

FiveThirtyEight is a website that was founded by Statistician and writer Nate Silver to publish results from opinion poll analysis, politics, economics, and sports blogging. (now taken over by Disney, and Silver was ousted) One of the featured articles considers flying etiquette. This article is based on data collected by FiveThirtyEight and publicly available on github. Use the code below to read in the data from the survey:

fly <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/flying-etiquette-survey/flying-etiquette.csv")

The following couple of lines of code provide a bit of cleanup of the demographic information by reaordering the levels of the corresponding factor variables. Run this code in your session.

fly$Age <- factor(fly$Age, levels=c("18-29", "30-44", "45-60", "> 60", ""))
fly$Household.Income <- factor(fly$Household.Income, levels = c("$0 - $24,999","$25,000 - $49,999", "$50,000 - $99,999", "$100,000 - $149,999", "150000", ""))
fly$Education <- factor(fly$Education, levels = c("Less than high school degree", "High school degree", "Some college or Associate degree", "Bachelor degree",  "Graduate degree", ""))
  1. Download the RMarkdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.

  2. Some people do not travel often by plane. Provide a (visual) breakdown of travel frequency (use variable How.often.do.you.travel.by.plane.). Reorder the levels in the variable by travel frequency from least frequent travel to most frequent. Draw a barchart of travel frequency and comment on it. Exclude all respondents who never fly from the remainder of the analysis. How many records does the data set have now?

  3. In the demographic variables (Education, Age, and Houshold.Income), replace all occurrences of the empty string “” by a missing value NA. How many responses in each variable do not have any missing values? How many responses have no missing values in any of the three variables? (Hint: think of the function is.na)

  4. Run the command below and interpret the output. What potential purpose can you see for the chart? What might be a problem with the chart? Find at least one purpose and one problem.

library(ggplot2)
fly$Education = with(fly, factor(Education, levels = rev(levels(Education))))

ggplot(data = fly, aes(x = 1)) + 
  geom_bar(aes(fill=Education), position="fill") + 
  coord_flip() +
  theme(legend.position="bottom") +
  scale_fill_brewer() + 
  xlab("Ratio") 

  1. Rename the variable In.general..is.itrude.to.bring.a.baby.on.a.plane. to baby.on.plane.. How many levels does the variable baby.on.plane have, and what are these levels? Rename the level labeled “” to “Not answered”. Bring the levels of baby.on.plane in an order from least rude to most rude. Put the level “Not answered” last. Draw a barchart of variable baby.on.plane. Interpret the result. (This question is very similar to question 2, but preps the data for the next question)

  2. Investigate the relationship between gender and the variables Do.you.have.any.children.under.18. and baby.on.plane. How is the attitude towards babies on planes shaped by gender and own children under 18? Find a plot that summarises your findings (use ggplot2).

For the submission: submit your solution in an R Markdown file and (just for insurance) submit the corresponding html/word file with it.