For all of the questions below, incorporate the necessary R code directly into your answers.
The Behavioral Risk Factor Surveillance System (BRFSS) is an annual survey conducted by the Centers for Disease Control and Prevention (CDC) to assess behavioral risk factors and chronic diseases. The survey covers six individual-level behavioral health risk factors associated with the leading causes of premature mortality and morbidity among adults: 1) cigarette smoking, 2) alcohol use, 3) physical activity, 4) diet, 5) hypertension, and 6) safety belt use.
A subset of the data concentrating on Iowa with records for 2012 is given at https://raw.githubusercontent.com/Stat579-at-ISU/stat579-at-isu.github.io/master/exams/data/iowa-brfss-2012.csv
A codebook describing the survey and a listing of all variables is available at http://www.cdc.gov/brfss/annual_data/2012/pdf/CODEBOOK12_LLCP.pdf. You should be able to answer all of the following questions without the help of the codebook.
iowa <- read.csv("https://raw.githubusercontent.com/Stat579-at-ISU/stat579-at-isu.github.io/master/exams/data/iowa-brfss-2012.csv")
# your code goes here
Make the variable a factor variable and change the level names accordingly.
# your code goes here
# your code goes here
HEIGHT3 and WEIGHT2 are reported height and weight of survey participants. What would you (roughly) expect for a relationship between the two variables? Draw a scatterplot. Does the result surprise you? Comment.
# your code goes here
Introduce a new variable ‘height’ into the dataset that corresponds to reported height in centimeters [cm] (i.e. for metric measurements you need to subtract 9000, but for ft/inches you need to do a bit more). For your convenience: 1 ft equals 30.48 cm, 1 in equals 2.54 cm. After converting, round to the nearest centimeter. Introduce NAs as appropriate.
Using the ggplot2 package, draw a histogram of the resulting variable facetted by gender. Make sure that the histograms are displayed on top of each other. Get rid of all warning messages. Comment on the result.
# your code goes here
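The height conversion described above can be sketched on a few made-up values. This is only an illustration, not the expected answer: the vector `height3` and the helper `to_cm` are hypothetical, and the sketch assumes that codes from 200 to 711 encode feet and inches (e.g. 504 for 5 ft 4 in), that codes from 9000 to 9998 are metric, and that everything else (such as 7777 or 9999) is a non-answer. Check these assumptions against the actual data before relying on them.

```r
# Made-up HEIGHT3 codes (assumption: same coding scheme as the survey variable)
height3 <- c(504, 511, 9165, 7777, 9999, NA, 600)

# Hypothetical helper: convert a coded height to centimeters
to_cm <- function(x) {
  feet   <- x %/% 100                 # leading digit(s): feet
  inches <- x %% 100                  # last two digits: inches
  imperial <- feet * 30.48 + inches * 2.54
  metric   <- x - 9000                # metric codes are 9000 + cm
  out <- ifelse(x >= 9000 & x <= 9998, metric,
         ifelse(x >= 200 & x <= 711, imperial, NA))
  round(out)                          # nearest centimeter
}

height <- to_cm(height3)
height
# 163 180 165 NA NA NA 183
```

Values outside both assumed ranges, as well as existing missing values, end up as NA, which is one way to "introduce NAs as appropriate".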
Which category did respondents pick most often?
Introduce a new variable into the data set that is (in case of a valid answer to SEATBELT) TRUE, if a respondent always wears a seatbelt, and FALSE if not. Deal with missing values appropriately. What percentage of women (SEX = 2) always wear a seatbelt compared to men (SEX = 1)? Using ggplot2, draw a plot that corresponds to these percentages.
# your code goes here
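As a rough illustration of the seatbelt indicator described above, the following sketch works on a small made-up data frame rather than on the survey data. It assumes that SEATBELT code 1 means "always", that codes 1 to 5 are valid answers, and that all other codes are non-answers; the names `toy`, `always`, and `pct` are hypothetical.

```r
# Made-up records (assumption: 1 = always, 1-5 = valid answers, rest = non-answers)
toy <- data.frame(
  SEX      = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  SEATBELT = c(1, 2, 1, 9, 3, 1, 1, 5, NA, 1)
)

# TRUE/FALSE for valid answers, NA otherwise
valid <- toy$SEATBELT %in% 1:5
toy$always <- ifelse(valid, toy$SEATBELT == 1, NA)

# Percentage of respondents with a valid answer who always wear a seatbelt, by SEX
pct <- tapply(toy$always, toy$SEX, function(a) 100 * mean(a, na.rm = TRUE))
pct
```

Dropping the NA values before averaging means the percentages refer only to respondents who gave a valid answer, which is one defensible way to handle missing values here.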
Show two plots using ggplot2: one showing the relationship between age and the percentage of drink driving, and a separate one showing the relationship between exercise and the number of poor health days. In both plots, incorporate information on the number of respondents and gender.
Comment on both plots.
# your code goes here
summary(clean(iowa$MENTHLTH, c(88, 77, 99), c(0, NA, NA)))
iowa %>% mutate(
  HLTHPLN1 = clean(HLTHPLN1, c(1,2,7,9), c("Yes", "No", NA, NA))
) %>% count(HLTHPLN1)
# your code goes here
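The calls above rely on a function `clean` whose definition is not shown here. A minimal sketch of one definition that would be compatible with those calls follows; it assumes the arguments are the vector, the codes to replace, and their replacements (the argument names `x`, `from`, and `to` are hypothetical), and that values not listed in `from` are left unchanged.

```r
# Hypothetical definition of clean(): replace each code in `from`
# with the value at the same position in `to`; keep all other values
clean <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  idx <- match(x, from)               # position in `from`, NA if not listed
  ifelse(is.na(idx), x, to[idx])
}

# Numeric recoding: 88 becomes 0, 77 and 99 become NA
clean(c(88, 5, 77, 99, 30), c(88, 77, 99), c(0, NA, NA))
# 0 5 NA NA 30

# Recoding to labels: the result is a character vector
clean(c(1, 2, 7, 9, 1), c(1, 2, 7, 9), c("Yes", "No", NA, NA))
# "Yes" "No" NA NA "Yes"
```

Note that when the replacements are character strings, any values left untouched are coerced to character as well, so it is worth listing every code that occurs in the variable.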
m1 <- glm(DRNKDRI2 > 0 ~ AGE, data=iowa, family=binomial(logit))
This creates an object m1 in your R session. Investigate the object and find out whether it contains an element named ‘aic’. If it does, report its value.
# your code goes here
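To show how a fitted model object can be inspected, the sketch below fits the same kind of model to simulated data, so that it runs without downloading the survey file. The data frame `toy`, its simulated values, and the object name `m_toy` are made up for illustration; only the inspection steps carry over to m1.

```r
# Simulated stand-in for the survey data (values are made up)
set.seed(1)
toy <- data.frame(AGE = sample(18:90, 200, replace = TRUE))
toy$DRNKDRI2 <- rbinom(200, size = 3, prob = plogis(-1 - 0.02 * toy$AGE))

m_toy <- glm(DRNKDRI2 > 0 ~ AGE, data = toy, family = binomial(logit))

class(m_toy)             # a glm object is a list with class attributes
names(m_toy)             # names of all elements stored in the object
"aic" %in% names(m_toy)  # TRUE if an element called 'aic' exists
m_toy$aic                # its value; AIC(m_toy) should agree
```

`str(m_toy, max.level = 1)` gives a more detailed but still compact overview of the same list.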