FiveThirtyEight is a website founded by statistician and writer Nate Silver that publishes opinion poll analysis as well as politics, economics, and sports blogging. One of its featured articles discusses the popularity of the movies in the Star Wars franchise.
This article is based on a survey conducted by FiveThirtyEight and made publicly available on GitHub. Use the code below to read in the data from the survey:
library(dplyr)
library(ggplot2)
library(readr)
starwars <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/star-wars-survey/StarWars.csv", locale = readr::locale(encoding = "latin1"))
# the following lines are necessary to fix the multibyte problem and make proper names
# part of the names:
line1 <- names(starwars)
line1 <- gsub("\\.\\.\\.[0-9]+", "", line1) # get rid of the placeholder names readr created ("...5", "...6", ...); the dots must be escaped
line2 <- unlist(starwars[1,])
varnames <- paste(line1, line2)
# clean up some of the multibyte characters:
#names(starwars) <- enc2native(stringi::stri_trans_general(varnames, "latin-ascii"))
starwars <- starwars[-1,]
head(starwars)
## # A tibble: 6 × 38
## RespondentID Have you seen any of the 6 films in the …¹ Do you consider your…²
## <dbl> <chr> <chr>
## 1 3292879998 Yes Yes
## 2 3292879538 No <NA>
## 3 3292765271 Yes No
## 4 3292763116 Yes Yes
## 5 3292731220 Yes Yes
## 6 3292719380 Yes Yes
## # ℹ abbreviated names:
## # ¹`Have you seen any of the 6 films in the Star Wars franchise?`,
## # ²`Do you consider yourself to be a fan of the Star Wars film franchise?`
## # ℹ 35 more variables:
## # `Which of the following Star Wars films have you seen? Please select all that apply.` <chr>,
## # ...5 <chr>, ...6 <chr>, ...7 <chr>, ...8 <chr>, ...9 <chr>,
## # `Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.` <chr>, …
Download the R Markdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.
How many people responded to the survey? How many people have seen at least one of the movies? Use the variable `Have you seen any of the 6 films in the Star Wars franchise?` to answer this question. For the remainder of the homework, only consider responses of participants who have seen at least one of the Star Wars films.
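As a starting point, the counting and filtering step could be sketched as follows. This is a minimal illustration on a made-up data frame, not the survey itself: the tibble `toy` and its short column name `seen_any` are hypothetical stand-ins for `starwars` and its long question column.

```r
library(dplyr)

# Hypothetical toy data standing in for the survey.
# `seen_any` plays the role of the "Have you seen any of the 6 films ..." column.
toy <- tibble(
  RespondentID = 1:5,
  seen_any = c("Yes", "No", "Yes", "Yes", "No")
)

# total number of responses
n_respondents <- nrow(toy)

# number of respondents who have seen at least one film
n_seen <- sum(toy$seen_any == "Yes", na.rm = TRUE)

# keep only those respondents for all later steps
seen <- filter(toy, seen_any == "Yes")
```

With the real data, the long column name has to be wrapped in backticks inside `filter()`.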
Variables Gender and Age are two of the demographic variables collected. Use dplyr to provide a frequency breakdown for each variable. Does the result surprise you? Comment. Reorder the levels in the variable Age from youngest to oldest.
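One possible way to approach the breakdown and the reordering is sketched below on a made-up data frame. The tibble `toy` is hypothetical, and the four age labels are an assumption about how the survey codes age groups; check the values that actually occur in Age before fixing the level order.

```r
library(dplyr)

# Hypothetical toy data; the age labels are assumed, not taken from the survey.
toy <- tibble(
  Gender = c("Male", "Female", "Female", NA, "Male", "Female"),
  Age    = c("> 60", "18-29", "30-44", NA, "45-60", "18-29")
)

# frequency breakdown of Gender (missing values get their own row)
gender_counts <- count(toy, Gender)

# turn Age into a factor whose levels run from youngest to oldest
age_levels <- c("18-29", "30-44", "45-60", "> 60")
toy <- mutate(toy, Age = factor(Age, levels = age_levels))

# the breakdown now follows the level order instead of alphabetical order
age_counts <- count(toy, Age)
```

Without the explicit `levels` argument, the groups would sort alphabetically, which puts "> 60" first.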
Variables 10 through 15 hold, for each of the films, the answer to the question: “Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.” Bring the data set into a long form. Introduce one variable for the Star Wars episode and one for the corresponding ranking. Find the average rank for each of the films. Do the average ranks differ between men’s and women’s rankings? How many responses is each average based on? Show these numbers together with the averages.
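The reshaping step could look roughly like the sketch below, which uses `pivot_longer()` from the tidyr package (not loaded in the setup code above, so this is an added assumption). The tibble `toy` and its columns `ep1` and `ep2` are hypothetical; in the survey the six ranking columns are read in as character, which is why the ranks are converted to numbers here.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two ranking columns stored as character.
toy <- tibble(
  RespondentID = 1:4,
  Gender = c("Male", "Female", "Female", "Male"),
  ep1 = c("1", "2", NA, "2"),
  ep2 = c("2", "1", "1", "1")
)

# long form: one row per respondent and episode
long <- toy %>%
  pivot_longer(cols = c(ep1, ep2), names_to = "episode", values_to = "rank") %>%
  mutate(rank = as.numeric(rank))

# average rank per episode and gender, with the number of responses behind it
avg_rank <- long %>%
  filter(!is.na(rank)) %>%
  group_by(episode, Gender) %>%
  summarise(mean_rank = mean(rank), n = n(), .groups = "drop")
```

Dropping missing ranks before summarising keeps `n` equal to the number of responses that actually enter each average.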
R2 D2 or C-3P0? Which of these two characters is the more popular
one? Use responses to variables 25 and 26 to answer this question. Note:
first you need to define what you mean by “popularity” based on the
available data.
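As an illustration only, one candidate definition of popularity is the share of favorable ratings among all non-missing responses. The sketch below assumes that the responses include the labels "Very favorably" and "Somewhat favorably"; the tibble `toy`, its column names, and the helper `share_favorable()` are hypothetical. Other definitions, such as a numeric score per rating, are equally defensible.

```r
library(dplyr)

# Hypothetical toy ratings; the labels are assumed to match the survey's wording.
toy <- tibble(
  r2d2 = c("Very favorably", "Somewhat favorably", "Very unfavorably", NA),
  c3p0 = c("Very favorably", "Somewhat unfavorably", "Very unfavorably",
           "Somewhat favorably")
)

favorable <- c("Very favorably", "Somewhat favorably")

# hypothetical helper: proportion of favorable answers among non-missing ones
share_favorable <- function(x) {
  x <- x[!is.na(x)]
  mean(x %in% favorable)
}

popularity <- summarise(toy, across(everything(), share_favorable))
```

Whatever definition is chosen, it should state how neutral, unfamiliar, and missing responses are treated.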
Popularity contest: which of the surveyed characters is the most popular? Use the popularity measure you defined in the previous question to evaluate the responses for characters 16 through 29. Use an appropriate long form of the data to get to your answer. Visualize the result.
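A rough sketch of the long-form comparison and a matching plot follows. It again uses a hypothetical toy tibble, tidyr's `pivot_longer()`, and the share of favorable ratings as a stand-in popularity measure; the character columns and response labels are assumptions for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy ratings for three characters.
toy <- tibble(
  `Han Solo`      = c("Very favorably", "Very favorably", "Somewhat favorably"),
  Yoda            = c("Very favorably", "Somewhat unfavorably", NA),
  `Jar Jar Binks` = c("Very unfavorably", "Somewhat unfavorably", "Very favorably")
)

favorable <- c("Very favorably", "Somewhat favorably")

# long form: one row per respondent and character, missing ratings dropped
long <- toy %>%
  pivot_longer(everything(), names_to = "character", values_to = "rating") %>%
  filter(!is.na(rating))

# popularity per character, with the number of ratings it is based on
pop <- long %>%
  group_by(character) %>%
  summarise(share = mean(rating %in% favorable), n = n(), .groups = "drop")

# horizontal bar chart, characters ordered by popularity
p <- ggplot(pop, aes(x = reorder(character, share), y = share)) +
  geom_col() +
  coord_flip() +
  labs(x = "Character", y = "Share of favorable ratings")
```

Ordering the bars by the measure rather than alphabetically makes the ranking readable at a glance.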
Due date: please refer to the website and Canvas for the due date.
For the submission: submit your solution in an R Markdown file and (just for insurance) submit the corresponding HTML/Word file with it.