dplyr examples: happiness

class: center, middle, inverse, title-slide

.title[
# dplyr examples: happiness
]
.author[
### Heike Hofmann
]

---

# The Happy data from GSS

The General Social Survey (GSS) has been run by NORC since 1972, in most years until 1994 and every other year since then, to keep track of current opinions across the United States.

An excerpt of the GSS data called `happy` is available from the `classdata` package:

```r
remotes::install_github("heike/classdata")
```

```r
library(classdata)
head(happy)
```

```
##   year age         degree       finrela         happy    health       marital
## 1 1972  23       bachelor       average not too happy      good never married
## 2 1972  70 lt high school above average not too happy      fair       married
## 3 1972  48    high school       average  pretty happy excellent       married
## 4 1972  27       bachelor       average not too happy      good       married
## 5 1972  61    high school above average  pretty happy      good       married
## 6 1972  26    high school above average  pretty happy      good never married
##      sex polviews          partyid wtssall wtssnr
## 1 female     <NA>     ind,near dem  0.4446     NA
## 2   male     <NA> not str democrat  0.8893     NA
## 3 female     <NA>      independent  0.8893     NA
## 4 female     <NA> not str democrat  0.8893     NA
## 5 female     <NA>  strong democrat  0.8893     NA
## 6   male     <NA>     ind,near dem  0.4446     NA
```

You can find a codebook with explanations for each of the variables at https://gssdataexplorer.norc.org/

---
class: inverse
# Your Turn

Load the `happy` data from the `classdata` package.

- How many variables and how many observations does the data have? What do the variables mean?

- Plot the variable `happy`. Introduce a new variable `nhappy` that has values 1 for `not too happy`, 2 for `pretty happy`, 3 for `very happy` and `NA` for missing values. There are multiple ways to get to that. Avoid `for` loops.

- Based on the newly introduced numeric scores, what is the average happiness of respondents?

---
class: inverse
# Your Turn

- How does average happiness change over the course of a lifetime? Is this relationship different for men and women? Draw plots.

- Are people happier now than ten years ago? How is happiness related to time?

---
class: inverse
# Your Turn

- Are Republicans or Democrats happier? Compare average happiness levels over `partyid`.

- How is relative financial standing (`finrela`) associated with average happiness levels? Is this association different for men and women?

- Find a plot that shows the differences for each one of the summaries.

---
class: inverse
# Your Turn: asking questions

- What other variable(s) might be associated with happiness? Plot it.

- Submit your code in Canvas for one point of extra credit.

---

# Helper functions (1)

- `n()` provides the number of rows of a subset:

```r
library(dplyr)
happy %>% group_by(sex) %>% summarise(n = n())
```

```
## # A tibble: 3 × 2
##   sex        n
##   <fct>  <int>
## 1 male   31977
## 2 female 40301
## 3 <NA>     112
```

- `tally()` is a combination of `summarise()` and `n()`:

```r
happy %>% group_by(sex) %>% tally()
```

```
## # A tibble: 3 × 2
##   sex        n
##   <fct>  <int>
## 1 male   31977
## 2 female 40301
## 3 <NA>     112
```
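As a quick check of this equivalence, the following sketch compares both forms on a small made-up data frame. The data frame `toy` and its values are invented for illustration and are not part of the GSS excerpt.

```r
library(dplyr)

# Toy data, invented for illustration (not the GSS excerpt)
toy <- data.frame(
  sex = factor(c("male", "female", "female", "male", "female"),
               levels = c("male", "female"))
)

# Long form: count the rows in each group with summarise() and n()
by_summarise <- toy %>% group_by(sex) %>% summarise(n = n())

# Short form: tally() does the same in one step
by_tally <- toy %>% group_by(sex) %>% tally()

# The counts per level of sex agree
identical(by_summarise$n, by_tally$n)
```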

---

# Helper functions (2)

- `count()` is a further shortcut that combines `group_by()` and `tally()`:

```r
happy %>% count(sex, degree)
```

```
## # A tibble: 18 × 3
##    sex    degree             n
##    <fct>  <fct>          <int>
##  1 male   lt high school  6181
##  2 male   high school    15611
##  3 male   junior college  1786
##  4 male   bachelor        5286
##  5 male   graduate        3028
##  6 male   <NA>              85
##  7 female lt high school  7960
##  8 female high school    20804
##  9 female junior college  2565
## 10 female bachelor        5952
## 11 female graduate        2916
## 12 female <NA>             104
## 13 <NA>   lt high school    51
## 14 <NA>   high school       31
## 15 <NA>   junior college     4
## 16 <NA>   bachelor          10
## 17 <NA>   graduate           9
## 18 <NA>   <NA>               7
```

- `count()` doesn't introduce any grouping
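One way to see this is to inspect the grouping of the result with `group_vars()`. The sketch below uses a small made-up data frame (`toy` is invented for illustration, not the GSS data) and assumes a recent version of dplyr (1.0 or later).

```r
library(dplyr)

# Toy data, invented for illustration (not the GSS excerpt)
toy <- data.frame(
  sex    = c("male", "female", "female", "male"),
  degree = c("bachelor", "bachelor", "graduate", "graduate")
)

# group_by() followed by tally() drops only the last grouping variable,
# so the result is still grouped by sex
grouped <- toy %>% group_by(sex, degree) %>% tally()
group_vars(grouped)

# count() on ungrouped data returns an ungrouped result
counted <- toy %>% count(sex, degree)
group_vars(counted)
```

The counts in `grouped` and `counted` are the same; only the grouping structure attached to the result differs.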

---

# Grouping and Ungrouping

- `ungroup()` removes the grouping structure from a data set

- ungrouping is necessary before making changes to a grouping variable (such as re-ordering or re-labelling its levels)
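A minimal sketch of this pattern, on made-up data: the data frame `toy`, its values, and the labels `F` and `M` are invented for illustration. The grouping on `sex` is removed with `ungroup()` before the variable is re-labelled.

```r
library(dplyr)

# Toy data, invented for illustration (not the GSS excerpt)
toy <- data.frame(
  sex    = factor(c("male", "female", "female", "male")),
  nhappy = c(2, 3, 1, 2)
)

# A grouped mutate() keeps the grouping by sex on its result
with_avg <- toy %>%
  group_by(sex) %>%
  mutate(avg = mean(nhappy))
group_vars(with_avg)

# dplyr refuses to modify a grouping variable inside mutate(),
# so remove the grouping first and re-label afterwards
relabelled <- with_avg %>%
  ungroup() %>%
  mutate(sex = factor(sex, levels = c("female", "male"), labels = c("F", "M")))

group_vars(relabelled)   # no grouping left
levels(relabelled$sex)
```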