We are, again, using the data from the Behavioral Risk Factor Surveillance System (BRFSS). Just as a reminder, the BRFSS surveys six individual-level behavioral health risk factors associated with the leading causes of premature mortality and morbidity among adults: 1) cigarette smoking, 2) alcohol use, 3) physical activity, 4) diet, 5) hypertension, and 6) safety belt use.
A subset of the data concentrating on Iowa with records for 2022 is given at
url <- "https://raw.githubusercontent.com/Stat579-at-ISU/stat579-at-isu.github.io/master/homework/data/brfss-iowa-2022.csv"
The following code reads the data into your R session:
iowa <- read.csv(url)
A codebook describing the survey and listing all variables is available at https://www.cdc.gov/brfss/annual_data/2022/zip/codebook22_llcp-v2-508.zip. Download and unzip it, then open the file in a browser.
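Codebook variable names that start with an underscore (for example _SMOKER3) will appear in iowa with an X prefix. This is because read.csv, with its default check.names = TRUE, repairs names that are not syntactically valid in R. A minimal sketch on made-up inline data (the values are illustrative, not taken from the survey):

```r
# Toy CSV text standing in for the real file; the values are made up.
csv_text <- "_SMOKER3,_AGE_G,DRNK3GE5\n4,2,88\n1,5,3"

# read.csv() uses check.names = TRUE by default, so invalid names are repaired.
toy <- read.csv(text = csv_text)
names(toy)

# The same repair, applied directly to the codebook names.
repaired <- make.names(c("_SMOKER3", "_AGE_G", "DRNK3GE5"))
repaired
```

When looking up a column such as X_SMOKER3 in the codebook, search for the name without the leading X.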
For each of the questions, show the code necessary to retrieve the answer. Make sure to also write the answer to the question in a sentence.
Do a frequency breakdown of the variable DRNK3GE5 in the iowa data set (visually). Comment (remember the three sentences!). Introduce a variable bingedays into the iowa data set that encodes 88 as 0, and 77 and 99 as NA. mutate and ifelse might be helpful. Find the following summaries:
a. What is the average number of times respondents admitted to binge drinking in the past 30 days?
b. On how many reports is this average based (exclude missing values)?
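The recoding and the two summaries could be sketched as follows. This is only an illustration on a made-up toy data frame (toy and its values are assumptions, not survey data), and it assumes the dplyr package is installed. The same pattern would apply to iowa.

```r
library(dplyr)

# Toy stand-in for iowa; the values are made up for illustration.
toy <- data.frame(DRNK3GE5 = c(88, 2, 77, 5, 99, 88, NA, 1))

# Frequency table of the raw codes, including missing values.
freq <- table(toy$DRNK3GE5, useNA = "ifany")

# Recode: 88 -> 0, 77 and 99 -> NA, everything else unchanged.
toy <- toy %>%
  mutate(bingedays = ifelse(DRNK3GE5 == 88, 0,
                     ifelse(DRNK3GE5 %in% c(77, 99), NA, DRNK3GE5)))

avg_days  <- mean(toy$bingedays, na.rm = TRUE)  # average over non-missing reports
n_reports <- sum(!is.na(toy$bingedays))         # number of non-missing reports
```

On the toy data, avg_days is 1.6 and n_reports is 5. Passing freq to barplot() would give a visual version of the frequency breakdown.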
Current smoking status is imputed in the variable X_SMOKER3 (corresponds to variable _SMOKER3 in the codebook). Make X_SMOKER3 a factor. Relabel levels 1, 2, 3, 4 to Current Smoker, Current Smoker, Former Smoker and Never Smoked (yes, Current Smoker is repeated on purpose) and level 9 to NA.
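One possible way to do this relabeling in a single step is base R's factor(), which accepts repeated labels (in R 3.4.0 or later) and merges the corresponding codes into one level. The sketch below uses a made-up vector codes in place of iowa$X_SMOKER3:

```r
# Toy codes standing in for iowa$X_SMOKER3; the values are made up.
codes <- c(1, 2, 3, 4, 9, 4, 3)

# Codes missing from `levels` (here 9) become NA; the repeated label
# merges codes 1 and 2 into a single level.
smoker <- factor(codes,
                 levels = 1:4,
                 labels = c("Current Smoker", "Current Smoker",
                            "Former Smoker", "Never Smoked"))

levels(smoker)
table(smoker, useNA = "ifany")
```

The resulting factor has three levels, and the entry that was coded 9 is NA.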
Describe the relationship between smoking status and age (use X_AGE_G - read up on _AGE_G in the codebook) based on an appropriate visualization.
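One candidate visualization is a stacked bar chart of the share of each smoking status within each age group. The sketch below uses base R on a made-up toy data frame. The helper plot_smoking_by_age is a hypothetical name introduced here for illustration, and other chart types (for example a mosaic plot) may be just as appropriate.

```r
# Toy stand-in for iowa; age-group codes and smoking status are made up.
toy <- data.frame(
  X_AGE_G = c(1, 1, 2, 2, 3, 3, 3),
  smoker  = factor(c("Never Smoked", "Current Smoker", "Never Smoked",
                     "Former Smoker", "Former Smoker", "Former Smoker",
                     "Never Smoked"))
)

# Share of each smoking status within each age group (rows sum to 1).
props <- prop.table(table(toy$X_AGE_G, toy$smoker), margin = 1)

# Hypothetical helper: one stacked bar per age group, split by smoking status.
plot_smoking_by_age <- function(p) {
  barplot(t(p), legend.text = TRUE,
          xlab = "Age group (X_AGE_G code)", ylab = "Proportion")
}
```

Calling plot_smoking_by_age(props) draws the chart. Because every bar has height 1, differences in the mix of smoking statuses across age groups are easy to compare even when the groups differ in size.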
What percentage of the population has never smoked? Calculate this percentage by age groups (X_AGE_G) and gender (SEX1). Report also on the number of respondents these percentages are based on (exclude any missing values).
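A grouped summary of this kind might be computed along the following lines. The sketch assumes the dplyr package and a made-up toy data frame with the same column names. It drops rows with missing smoking status only, so any missing values in the grouping variables would need their own handling.

```r
library(dplyr)

# Toy stand-in for iowa; all codes are made up for illustration.
toy <- data.frame(
  X_AGE_G   = c(1, 1, 1, 1, 2, 2, 2, 2),
  SEX1      = c(1, 1, 2, 2, 1, 1, 2, 2),
  X_SMOKER3 = c(4, 1, 4, 4, 3, 9, 4, 2)
)

never_by_group <- toy %>%
  # Relabel the codes; 9 is not among the levels and becomes NA.
  mutate(smoker = factor(X_SMOKER3, levels = 1:4,
                         labels = c("Current Smoker", "Current Smoker",
                                    "Former Smoker", "Never Smoked"))) %>%
  # Exclude respondents with missing smoking status.
  filter(!is.na(smoker)) %>%
  group_by(X_AGE_G, SEX1) %>%
  # n: respondents per group; pct_never: percentage who never smoked.
  summarise(n = n(),
            pct_never = 100 * mean(smoker == "Never Smoked"),
            .groups = "drop")

never_by_group
```

Each row of never_by_group holds one combination of age group and gender, together with the number of respondents the percentage is based on.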
For the submission: submit your solution as an R Markdown file and (just for insurance) submit the corresponding HTML/Word file with it.