Another look at the Behavioral Risk Factor Surveillance System

We are, again, using the data from the Behavioral Risk Factor Surveillance System (BRFSS). Just as a reminder, the BRFSS surveys six individual-level behavioral health risk factors associated with the leading causes of premature mortality and morbidity among adults: 1) cigarette smoking, 2) alcohol use, 3) physical activity, 4) diet, 5) hypertension, and 6) safety belt use.

A subset of the data concentrating on Iowa with records for 2022 is given at

url <- "https://raw.githubusercontent.com/Stat579-at-ISU/stat579-at-isu.github.io/master/homework/data/brfss-iowa-2022.csv"

The following code reads the data into your R session:

iowa <- read.csv(url)

A codebook describing the survey and a listing of all variables is available at https://www.cdc.gov/brfss/annual_data/2022/zip/codebook22_llcp-v2-508.zip. Download it, and unzip it. Open the file in a browser.

For each of the questions, show the code necessary to retrieve the answer. Make sure to also write the answer to the question in a sentence.

  1. Download the RMarkdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.
  2. Load the dataset into your session and store it in the object iowa.
  3. Check the codebook for an explanation of the variable DRNK3GE5. Do a frequency breakdown of the variable DRNK3GE5 (visually). Comment (remember the three sentences!). Introduce a variable bingedays into the iowa data set that encodes 88 as 0, and 77 and 99 as NA.
    Hint: a combination of mutate and ifelse might be helpful.

Find the following summaries:

a. What is the average number of times respondents admitted to binge drinking in the past 30 days?
b. On how many reports is this average based (exclude missing values)?
  1. Current smoking status is imputed in the variable X_SMOKER3 (corresponds to variable _SMOKER3 in the codebook). Make X_SMOKER3 a factor. Relabel levels 1, 2, 3, 4 to Current Smoker, Current Smoker, Former Smoker and Never Smoked (yes, Current Smoker is repeated on purpose) and level 9 to NA. Describe the relationship between smoking status and age (use X_AGE_G - read up on _AGE_G in the codebook) based on an appropriate visualization.

  2. What percentage of the population has never smoked? Calculate this percentage by age groups (X_AGE_G) and gender (SEX1). Report also on the number of respondents these percentages are based on (exclude any missing values).

For the submission: submit your solution in an R Markdown file and (just for insurance) submit the corresponding html/word file with it.