Star Wars

FiveThirtyEight is a website founded by statistician and writer Nate Silver to publish results from opinion poll analysis as well as articles on politics, economics, and sports. One of its featured articles discusses the popularity of the movies in the Star Wars franchise.

This article is based on a survey conducted by FiveThirtyEight, whose data are publicly available on GitHub. Use the code below to read in the survey data:

library(dplyr)
library(ggplot2)
library(readr)
starwars <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/star-wars-survey/StarWars.csv")

# The CSV has a two-row header: the first data row holds the second half of
# each column name (e.g. "Response"). Paste both rows together into full names:
line1 <- names(starwars)
line2 <- unlist(starwars[1,])
varnames <- paste(line1, line2)
# transliterate non-ASCII (multibyte) characters to plain ASCII (uses stringi):
names(starwars) <- enc2native(stringi::stri_trans_general(varnames, "latin-ascii"))

# the first row is now part of the names, so drop it:
starwars <- starwars[-1, ]
head(starwars)
## # A tibble: 6 x 38
##   `RespondentID N… `Have you seen … `Do you conside… `Which of the f…
##              <dbl> <chr>            <chr>            <chr>           
## 1       3292879998 Yes              Yes              Star Wars: Epis…
## 2       3292879538 No               <NA>             <NA>            
## 3       3292765271 Yes              No               Star Wars: Epis…
## 4       3292763116 Yes              Yes              Star Wars: Epis…
## 5       3292731220 Yes              Yes              Star Wars: Epis…
## 6       3292719380 Yes              Yes              Star Wars: Epis…
## # … with 34 more variables: `X5 Star Wars: Episode II Attack of the
## #   Clones` <chr>, `X6 Star Wars: Episode III Revenge of the Sith` <chr>, `X7
## #   Star Wars: Episode IV A New Hope` <chr>, `X8 Star Wars: Episode V The
## #   Empire Strikes Back` <chr>, `X9 Star Wars: Episode VI Return of the
## #   Jedi` <chr>, `Please rank the Star Wars films in order of preference with 1
## #   being your favorite film in the franchise and 6 being your least favorite
## #   film. Star Wars: Episode I The Phantom Menace` <chr>, `X11 Star Wars:
## #   Episode II Attack of the Clones` <chr>, `X12 Star Wars: Episode III Revenge
## #   of the Sith` <chr>, `X13 Star Wars: Episode IV A New Hope` <chr>, `X14 Star
## #   Wars: Episode V The Empire Strikes Back` <chr>, `X15 Star Wars: Episode VI
## #   Return of the Jedi` <chr>, `Please state whether you view the following
## #   characters favorably, unfavorably, or are unfamiliar with him/her. Han
## #   Solo` <chr>, `X17 Luke Skywalker` <chr>, `X18 Princess Leia Organa` <chr>,
## #   `X19 Anakin Skywalker` <chr>, `X20 Obi Wan Kenobi` <chr>, `X21 Emperor
## #   Palpatine` <chr>, `X22 Darth Vader` <chr>, `X23 Lando Calrissian` <chr>,
## #   `X24 Boba Fett` <chr>, `X25 C-3P0` <chr>, `X26 R2 D2` <chr>, `X27 Jar Jar
## #   Binks` <chr>, `X28 Padme Amidala` <chr>, `X29 Yoda` <chr>, `Which character
## #   shot first? Response` <chr>, `Are you familiar with the Expanded Universe?
## #   Response` <chr>, `Do you consider yourself to be a fan of the Expanded
## #   Universe? Response` <chr>, `Do you consider yourself to be a fan of the
## #   Star Trek franchise? Response` <chr>, `Gender Response` <chr>, `Age
## #   Response` <chr>, `Household Income Response` <chr>, `Education
## #   Response` <chr>, `Location (Census Region) Response` <chr>
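The name-repair step above is easier to follow on a tiny example. Below is a minimal sketch on a made-up tibble. `toy` and its values are hypothetical. Its first data row holds the second half of each header, mimicking the survey file's two-row header. The last line illustrates the `latin-ascii` transliteration on a single accented string.

```r
library(dplyr)  # re-exports tibble()

# Hypothetical two-row header: each column name is completed by the
# value in the first data row, as in the survey CSV.
toy <- tibble(
  Gender = c("Response", "Male", "Female"),
  `Please rank the films.` = c("Episode I", "3", "1")
)

line1 <- names(toy)          # first half of each name
line2 <- unlist(toy[1, ])    # second half, taken from row 1
names(toy) <- paste(line1, line2)
toy <- toy[-1, ]             # row 1 is now part of the names, so drop it

names(toy)
# "Gender Response" "Please rank the films. Episode I"

# Transliteration turns accented letters into plain ASCII:
stringi::stri_trans_general("Padm\u00e9", "latin-ascii")
# "Padme"
```

Some columns have an empty first header cell, such as the later episodes and characters. These get an automatic placeholder name from `read_csv()`, which is why the printed names start with `X5`, `X11`, and so on. Recent readr versions may use `...5`-style placeholders instead, so selecting those columns by position is more robust than selecting them by name.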
  1. Download the R Markdown file with these homework instructions and use it as a template for your work. Make sure to replace “Your Name” in the YAML header with your name.

  2. How many people responded to the survey? How many people have seen at least one of the movies? Use the variable `Have you seen any of the 6 films in the Star Wars franchise? Response` to answer this question. For the remainder of the homework, consider only the responses of participants who have seen at least one of the Star Wars films.
  3. The variables `Gender Response` and `Age Response` are two of the demographic variables collected. Use dplyr to provide a frequency breakdown for each variable. Does the result surprise you? Comment. Reorder the levels of `Age Response` from youngest to oldest and plot them in a bar chart.
  4. Variables 10 through 15 answer the question “Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.” for each of the films. Bring the data set into long form, introducing one variable for the Star Wars episode and one for the corresponding ranking. Find the average rank for each film. Do the average ranks differ between men’s and women’s rankings? How many responses is each average based on (make sure not to overlook missing values)? Show these numbers together with the averages.
  5. R2 D2 or C-3P0? Which of these two characters is more popular? Use the responses to variables 25 and 26 to answer this question. Note: first you need to define what you mean by “popularity” based on the available data. State your definition of popularity in words.
  6. Popularity contest: which of the surveyed characters is the most popular? Use the popularity measure you defined in the previous question to evaluate the responses for the characters in variables 16 through 29. Use an appropriate long form of the data to get to your answer. Visualize the result.
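Question 2 above amounts to counting rows and filtering on one answer. Here is a minimal sketch of that pattern. Only the column name is taken from the survey; the `toy` rows are hypothetical.

```r
library(dplyr)

# Hypothetical rows; only the column name is taken from the survey.
toy <- tibble(
  `Have you seen any of the 6 films in the Star Wars franchise? Response` =
    c("Yes", "No", "Yes", "Yes", "No")
)

# Every row is one respondent.
n_total <- nrow(toy)

# Frequency table of the screening question.
toy %>%
  count(`Have you seen any of the 6 films in the Star Wars franchise? Response`)

# Keep only respondents who have seen at least one film; filter() keeps
# rows where the condition is TRUE, so missing answers are dropped too.
seen <- toy %>%
  filter(`Have you seen any of the 6 films in the Star Wars franchise? Response` == "Yes")
n_seen <- nrow(seen)
```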
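For question 3, here is a sketch of the frequency breakdown and the level reordering. The rows are hypothetical. The age labels in `age_levels` are an assumption about the survey's categories, so confirm them with `count()` on the real data before reusing the vector.

```r
library(dplyr)
library(ggplot2)

# Hypothetical rows; the age labels are assumed, not read from the data.
toy <- tibble(
  `Gender Response` = c("Female", "Male", "Female", NA, "Male", "Female"),
  `Age Response`    = c("> 60", "18-29", "30-44", "45-60", "18-29", NA)
)

# Frequency breakdown for each demographic variable (NA gets its own row).
gender_counts <- toy %>% count(`Gender Response`)
age_counts    <- toy %>% count(`Age Response`)
gender_counts
age_counts

# Order the age groups from youngest to oldest instead of the default
# (alphabetical) order of a character variable.
age_levels <- c("18-29", "30-44", "45-60", "> 60")
toy <- toy %>%
  mutate(`Age Response` = factor(`Age Response`, levels = age_levels))

# Bar chart of the reordered age groups.
p <- ggplot(toy, aes(x = `Age Response`)) +
  geom_bar()
p
```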
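Question 4 combines a reshape with a grouped summary. The sketch below shows one way to do it with `tidyr::pivot_longer()`, on a hypothetical three-film tibble whose shortened column names only mimic the survey's. The ranks are stored as character, as in the survey. The episode is pulled out of the column name with a regular expression, and `n` counts non-missing ranks rather than rows.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data; ranks are character, as in the survey.
toy <- tibble(
  `Gender Response` = c("Male", "Female", "Female"),
  `rank Star Wars: Episode I The Phantom Menace`      = c("3", "1", NA),
  `rank Star Wars: Episode IV A New Hope`             = c("1", "2", "1"),
  `rank Star Wars: Episode V The Empire Strikes Back` = c("2", "3", "2")
)

long <- toy %>%
  # on the real data, cols = 10:15 selects the six ranking columns
  pivot_longer(cols = starts_with("rank"),
               names_to = "film", values_to = "rank") %>%
  mutate(episode = sub(".*Episode ([IV]+).*", "\\1", film),  # "I", "IV", ...
         rank = as.numeric(rank))

# Average rank per episode and gender, with the number of ranks it rests on.
summary_tbl <- long %>%
  group_by(episode, `Gender Response`) %>%
  summarise(avg_rank = mean(rank, na.rm = TRUE),
            n = sum(!is.na(rank)),   # responses, not rows
            .groups = "drop")
summary_tbl
```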
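Questions 5 and 6 need a popularity measure. One possible definition is the share of a character's non-missing ratings that are “Very favorably” or “Somewhat favorably”. This is an assumption; the assignment leaves the choice to you. The sketch applies it to a made-up two-character tibble, so the column names, the answer labels, and the values are hypothetical. Check the real labels with `count()` first.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical ratings; the answer labels are assumed, not read from the data.
toy <- tibble(
  `C-3P0` = c("Very favorably", "Somewhat unfavorably", NA, "Somewhat favorably"),
  `R2 D2` = c("Very favorably", "Very favorably", "Unfamiliar (N/A)",
              "Neither favorably nor unfavorably (neutral)")
)

favorable <- c("Very favorably", "Somewhat favorably")

popularity <- toy %>%
  # on the real data, cols = 16:29 selects all surveyed characters
  pivot_longer(cols = everything(), names_to = "character", values_to = "rating") %>%
  filter(!is.na(rating)) %>%               # ignore skipped questions
  group_by(character) %>%
  summarise(n = n(),                       # answers the share is based on
            popularity = mean(rating %in% favorable),
            .groups = "drop") %>%
  arrange(desc(popularity))
popularity

# Horizontal bar chart with the most popular character on top.
p <- ggplot(popularity, aes(x = reorder(character, popularity), y = popularity)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Share of favorable ratings")
p
```

Counting “Unfamiliar (N/A)” in the denominator, as done here, penalizes lesser-known characters. Dropping those answers instead measures popularity only among people who know the character. Either choice works as long as it is stated in words.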

Due date: see the course website and Canvas.

Submission: submit your solution as an R Markdown file and, just for insurance, the corresponding knitted HTML or Word file.