class: center, middle, inverse, title-slide

.title[
# dplyr examples: mutate + group_by
]
.author[
### Heike Hofmann
]

---
class: inverse, middle

# FBI data

---
class: inverse, middle

# `group_by` and `mutate`

---

# Working with `fbi`

`fbi` data included in the `classdata` package

upgrade the package: `remotes::install_github("heike/classdata")`

```r
library(tidyverse)
library(classdata)
data("fbi", package="classdata")
tail(fbi)
```

```
## # A tibble: 6 × 8
##   state       state_id state_abbr  year population type      count violent_crime
##   <chr>          <int> <chr>      <int>      <int> <chr>     <int> <lgl>
## 1 Puerto Rico       43 PR          2020    3159343 robbery    1177 TRUE
## 2 Puerto Rico       43 PR          2020    3159343 aggravat…  3342 TRUE
## 3 Puerto Rico       43 PR          2020    3159343 burglary   2952 FALSE
## 4 Puerto Rico       43 PR          2020    3159343 larceny    8311 FALSE
## 5 Puerto Rico       43 PR          2020    3159343 motor_ve…  1978 FALSE
## 6 Puerto Rico       43 PR          2020    3159343 arson        NA FALSE
```

---
class: inverse

# Your turn

For this your turn use the `fbi` data from the `classdata` package

- Use `mutate` to introduce a variable `rate` into the `fbi` data
- Use `mutate` to reorder types of crimes by (median) rate.
- Plot crime rates by type in side-by-side boxplots. Medians of the boxplots should be ordered.

---

# `group_by` and `mutate`

Introduce a ranking by rate for each type of crime.

```r
fbi <- fbi %>%
  group_by(type) %>%
  mutate(
    rate = count/population*100000,
    rank = rank(rate) # ranks from lowest rate to highest rate
  )
fbi %>% filter(rank == 1) %>% select(type, state, year, rate)
```

```
## # A tibble: 8 × 4
## # Groups:   type [8]
##   type                state         year    rate
##   <chr>               <chr>        <int>   <dbl>
## 1 robbery             North Dakota  1997   6.40
## 2 aggravated_assault  North Dakota  1983  31.3
## 3 homicide            North Dakota  1994   0.157
## 4 motor_vehicle_theft Vermont       2016  29.5
## 5 rape_legacy         Puerto Rico   2013   0.723
## 6 rape_revised        Puerto Rico   2013   0.946
## 7 burglary            Puerto Rico   2020  93.4
## 8 larceny             Puerto Rico   2020 263.
```

---
class: inverse

# Your turn

For this your turn use the `fbi` data from the `classdata` package

- Introduce a ranking by rate for states for each type of crime and year.
- Focus on the top states. Find a visual that shows how often each state managed to take the top spot since 1961.

![](03_dplyr-examples_files/figure-html/unnamed-chunk-3-1.png)

---

# Keywords: which function is it?

- `mutate`: introduce, replace, reorder, ...
- `summarise`: calculate, average, summary, ...
- `group_by`: for each, across, ...
- `filter`: exclude, only consider, ...

---

# Avoiding potential traps

- using the $ notation in tidyverse can lead to strange behavior and error messages
- don't forget to save statements back into the dataset (`mutate`, `arrange`) or new data objects (`summarise`, `filter`)
- when using the pipe `%>%`: what is output from lhs, first parameter on rhs?
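
---

# Avoiding potential traps: a sketch

The three traps above can be illustrated on a small made-up tibble. This is only a minimal sketch: `toy`, its columns, and the helper names (`trap`, `has_double_before`, `same`) are assumptions for illustration and are not part of the `fbi` data.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration, not the fbi data)
toy <- tibble(
  type  = c("a", "a", "b", "b"),
  count = c(1, 3, 10, 30)
)

# Trap 1: `$` inside a grouped mutate ignores the grouping
trap <- toy %>%
  group_by(type) %>%
  mutate(
    avg_dollar = mean(toy$count), # uses the whole column: 11 in every row
    avg_group  = mean(count)      # uses the group only: 2 for "a", 20 for "b"
  ) %>%
  ungroup()

# Trap 2: without assignment the result is only printed; `toy` stays unchanged
toy %>% mutate(double = count * 2)
has_double_before <- "double" %in% names(toy) # FALSE

toy <- toy %>% mutate(double = count * 2)     # save back into the dataset
has_double_after <- "double" %in% names(toy)  # TRUE

# Trap 3: the output of the lhs becomes the first argument of the rhs,
# so these two calls describe the same operation
same <- identical(toy %>% filter(count > 2), filter(toy, count > 2))
```

Comparing `avg_dollar` and `avg_group` in `trap` shows why a bare column name is usually the safer choice inside `mutate` once `group_by` is involved.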