class: center, middle, inverse, title-slide

.title[
# dplyr examples: happiness
]
.author[
### Heike Hofmann
]

---

# The Happy data from GSS

The General Social Survey (GSS) has been run by NORC since 1972 (every other year since 1994) to keep track of current opinions across the United States.

An excerpt of the GSS data called `happy` is available from the `classdata` package:

```r
remotes::install_github("heike/classdata")
```

```r
library(classdata)
head(happy)
```

```
##   year age         degree       finrela         happy    health       marital
## 1 1972  23       bachelor       average not too happy      good never married
## 2 1972  70 lt high school above average not too happy      fair       married
## 3 1972  48    high school       average  pretty happy excellent       married
## 4 1972  27       bachelor       average not too happy      good       married
## 5 1972  61    high school above average  pretty happy      good       married
## 6 1972  26    high school above average  pretty happy      good never married
##      sex polviews          partyid wtssall wtssnr
## 1 female     <NA>     ind,near dem  0.4446     NA
## 2   male     <NA> not str democrat  0.8893     NA
## 3 female     <NA>      independent  0.8893     NA
## 4 female     <NA> not str democrat  0.8893     NA
## 5 female     <NA>  strong democrat  0.8893     NA
## 6   male     <NA>     ind,near dem  0.4446     NA
```

You can find a codebook with explanations for each of the variables at https://gssdataexplorer.norc.org/

---
class: inverse

# Your Turn

Load the `happy` data from the `classdata` package.

- How many variables and how many observations does the data have? What do the variables mean?
- Plot the variable `happy`. Introduce a new variable `nhappy` that has values 1 for `not too happy`, 2 for `pretty happy`, 3 for `very happy` and `NA` for missing values. There are multiple ways to do this. Avoid `for` loops.
- Based on the newly introduced numeric scores, what is the average happiness of respondents?

---
class: inverse

# Your Turn

- How does average happiness change over the course of a lifetime? Is this relationship different for men and women? Draw plots.
- Are people now happier than ten years ago? How is happiness related to time?

---
class: inverse

# Your Turn

- Are Republicans or Democrats happier? Compare average happiness levels over `partyid`.
- How is relative financial standing (`finrela`) associated with average happiness levels? Is this association different for men and women?
- Find a plot that shows the differences for each one of the summaries.

---
class: inverse

# Your Turn: asking questions

- What other variable(s) might be associated with happiness? Plot it.
- Submit your code in Canvas for one point of extra credit.

---

# Helper functions (1)

- `n()` provides the number of rows of a subset:

```r
library(dplyr)
happy %>% group_by(sex) %>% summarise(n = n())
```

```
## # A tibble: 3 × 2
##   sex        n
##   <fct>  <int>
## 1 male   31977
## 2 female 40301
## 3 <NA>     112
```

- `tally()` is a combination of `summarise()` and `n()`:

```r
happy %>% group_by(sex) %>% tally()
```

```
## # A tibble: 3 × 2
##   sex        n
##   <fct>  <int>
## 1 male   31977
## 2 female 40301
## 3 <NA>     112
```

---

# Helper functions (2)

- `count()` is a further shortcut for `group_by()` followed by `tally()`:

```r
happy %>% count(sex, degree)
```

```
## # A tibble: 18 × 3
##    sex    degree             n
##    <fct>  <fct>          <int>
##  1 male   lt high school  6181
##  2 male   high school    15611
##  3 male   junior college  1786
##  4 male   bachelor        5286
##  5 male   graduate        3028
##  6 male   <NA>              85
##  7 female lt high school  7960
##  8 female high school    20804
##  9 female junior college  2565
## 10 female bachelor        5952
## 11 female graduate        2916
## 12 female <NA>             104
## 13 <NA>   lt high school    51
## 14 <NA>   high school       31
## 15 <NA>   junior college     4
## 16 <NA>   bachelor          10
## 17 <NA>   graduate           9
## 18 <NA>   <NA>               7
```

- `count()` doesn't introduce any grouping

---

# Grouping and Ungrouping

- `ungroup()` removes a grouping structure from a data set
- This is necessary before making changes to a grouping variable (such as re-ordering or re-labelling)
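
A minimal sketch of why `ungroup()` matters. It uses a small made-up tibble called `toy` (a hypothetical stand-in for `happy`, with purely illustrative values) and assumes dplyr 1.0 or later for the `.groups` argument:

```r
library(dplyr)

# `toy` is a hypothetical stand-in for `happy`; the values are made up
toy <- tibble(
  sex    = factor(c("male", "female", "female", "male", "female")),
  degree = c("bachelor", "high school", "bachelor", "high school", "graduate")
)

# summarising over two grouping variables peels off only the last one,
# so the result is still grouped by `sex`
counts <- toy %>%
  group_by(sex, degree) %>%
  summarise(n = n(), .groups = "drop_last")
group_vars(counts)      # "sex"

# count() gives the same numbers without leaving any grouping behind
group_vars(toy %>% count(sex, degree))   # character(0)

# after ungroup(), the former grouping variable can be re-labelled
relabelled <- counts %>%
  ungroup() %>%
  mutate(sex = factor(sex, levels = c("female", "male"), labels = c("F", "M")))
group_vars(relabelled)  # character(0)
levels(relabelled$sex)  # "F" "M"
```

Checking `group_vars()` after a summary is a quick way to see whether a result is still grouped before modifying its columns.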