Behavioral Risk Factor Surveillance System

The Behavioral Risk Factor Surveillance System (BRFSS) is an annual survey conducted by the Centers for Disease Control and Prevention (CDC) to assess health-related risk behaviors and chronic diseases. The survey covers six individual-level behavioral health risk factors associated with the leading causes of premature mortality and morbidity among adults: 1) cigarette smoking, 2) alcohol use, 3) physical activity, 4) diet, 5) hypertension, and 6) safety belt use.

A subset of the 2022 data, restricted to respondents from Iowa, is available at the following URL:

url <- "https://raw.githubusercontent.com/Stat579-at-ISU/stat579-at-isu.github.io/master/homework/data/brfss-iowa-2022.csv"

The following code reads the data into your R session:

iowa <- read.csv(url)

A codebook describing the survey and listing all variables is available at https://www.cdc.gov/brfss/annual_data/2022/zip/codebook22_llcp-v2-508.zip. Download and unzip it, then open the file in a browser.

For each of the questions, show the code necessary to get to the answer. Make sure to also write the answer to the question in a sentence.

  1. Download the RMarkdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.

  2. Load the dataset into your session and store it in the object iowa.

  3. How many rows and how many columns does the data set have? Which types of variables does it contain?

  4. Use ggplot2 to draw a scatterplot of height (HEIGHT3) and weight (WEIGHT2), faceted by gender (SEXVAR). State your expectation regarding the relationship between the variables, and comment on the plot you see.

library(tidyverse)
iowa %>% ggplot(aes(x = HEIGHT3, y = WEIGHT2)) + geom_point()
## Warning: Removed 221 rows containing missing values (`geom_point()`).

  5. Temporarily restrict weight and height to values below 2500, then plot the values again. Describe the plot you see.
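As a rough illustration of what "temporarily restrict" can look like, the sketch below filters a small made-up data frame instead of the survey data. The object names `toy` and `toy_small` and all values in them are assumptions for illustration only; the point is that the restriction lives in a pipeline step and the original data frame is left unchanged.

```r
library(dplyr)

# Toy stand-in for the survey data (values are made up for illustration).
toy <- data.frame(
  HEIGHT3 = c(508, 600, 7777, 9165, NA),
  WEIGHT2 = c(150, 9999, 180, 9070, 200)
)

# Keep only rows where both variables are below 2500.
# filter() also drops rows in which a condition evaluates to NA.
toy_small <- toy %>% filter(HEIGHT3 < 2500, WEIGHT2 < 2500)

print(toy_small)
```

Because the filtered result is stored separately (or piped straight into a plot), `toy` itself still contains every row.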

It turns out that the following coding scheme is used for HEIGHT3:

HEIGHT3 value    Interpretation
200 - 711        Height (ft/inches), e.g. 410 is 4 feet, 10 inches
7777             Don’t know/Not sure
9000 - 9998      Height (meters/centimeters); the leading 9 indicates a metric measurement, e.g. 9165 is 1 meter 65 cm
9999             Refused
BLANK            Not asked or missing
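One way to read the coding scheme above is as a conversion rule. The following is a minimal sketch, not part of the assignment: `height3_to_inches` is a hypothetical helper name, and it assumes the ranges exactly as listed in the table. Codes outside the two height ranges (7777, 9999, and blanks read as NA) are returned as NA.

```r
# Hypothetical helper: convert HEIGHT3 codes to inches.
# Codes outside the two documented height ranges (7777, 9999, NA) become NA.
height3_to_inches <- function(code) {
  inches <- rep(NA_real_, length(code))

  # 200 - 711: the first digit is feet, the last two digits are inches.
  imperial <- !is.na(code) & code >= 200 & code <= 711
  inches[imperial] <- (code[imperial] %/% 100) * 12 + code[imperial] %% 100

  # 9000 - 9998: drop the leading 9; what remains is the height in centimeters
  # (e.g. 9165 -> 165 cm, i.e. 1 meter 65 cm).
  metric <- !is.na(code) & code >= 9000 & code <= 9998
  inches[metric] <- (code[metric] - 9000) / 2.54

  inches
}

# Example codes taken from the table, plus the non-height codes.
height3_to_inches(c(410, 9165, 7777, 9999, NA))
```

The logical vectors `imperial` and `metric` are the same kind of expressions that can be passed to filter to select one group of measurements at a time.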
  6. Use a combination of filter and logical expressions to answer the following questions:
     1. What proportion of the height measurements are recorded in metric units?
     2. What is the range of (standard) height measurements for men, and what is the range of (standard) height measurements for women? Draw histograms of (standard) height, faceted by gender. Make sure to choose an appropriate bin width. Comment on the histograms and your choice of bin width.
     3. How many missing values (?is.na) does the variable HEIGHT3 have?
  7. Fact finding: you might have to use the codebook to answer the following questions. If you use the codebook, please say so in a comment in your answer. For all answers, include the code you used to get to the answer.
     1. What is the mode of the number of adults (NUMADULT) in a household in Iowa in 2022?
     2. EDUCA is the variable containing the highest grade or year of school completed. Is the percentage of college graduates in Iowa higher or lower than the nation’s average (based on the BRFSS sample)?
     3. Out of the people asked, what percentage of people got their flu shot (FLSHTMY3) in July 2022 or after?
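Several of the questions above come down to counting how often a logical condition holds. The sketch below shows the general pattern on a made-up vector `x` (an assumption for illustration, not survey data): counting missing values, computing a proportion from a logical expression, and finding a mode by tabulating.

```r
# Toy vector standing in for a survey variable (made-up values).
x <- c(2, 1, 2, 3, NA, 2, 1, NA)

# Count missing values: is.na() gives TRUE/FALSE, and sum() counts the TRUEs.
n_missing <- sum(is.na(x))

# Proportion of non-missing values equal to 2: mean() of a logical vector.
prop_two <- mean(x == 2, na.rm = TRUE)

# Mode: tabulate the values (table() ignores NA by default)
# and pick the most frequent one.
counts <- table(x)
mode_value <- as.numeric(names(counts)[which.max(counts)])

n_missing
prop_two
mode_value
```

Note that `which.max()` returns only the first maximum, so a tie between two values would need a separate check.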

For the submission: submit your solution as an R Markdown file and (just for insurance) submit the corresponding HTML or Word file with it.