class: center, middle, inverse, title-slide

.title[
# Stat 579: Logical variables and Filters
]
.author[
### Heike Hofmann
]

---
class: inverse, center, middle

# Logical variables, filters, and updating data

---

## Logical vectors

- Vectors consisting of values `TRUE` and `FALSE`
- Usually created with a logical comparison
  - `<, >, ==, !=, <=, >=`
  - `x %in% c(1, 4, 3, 7)`
- Used in `subset` or `dplyr::filter`

---

## Combining logical expressions

- `&` and `|` are the logical *and* and *or*
- `!` is the logical negation
- use parentheses `()` when linking expressions to avoid misinterpretation

---

## Logical Operations

![](images/logical.png)

---
class: inverse

## Your turn

Define vector `a` to be `a <- c(1, 15, 3, 20, 5, 8, 9, 10, 1, 3)`

Find the expression for the logical vector that is `TRUE` where `a` is:

- less than 20
- squared value is at least 100 or less than 10
- equals 1 or 3
- even

<br>
Hint: have a look at the help for the operator `%%`

---

## `filter {dplyr}`

`filter` is a function in the package `dplyr`.

`filter(data, ...)` returns the subset of `data` for which the conditions specified by the logical expressions in `...` are true, e.g.

- `filter(fbi, year == 2014)`
- `filter(fbi, type == "larceny", state %in% c("Iowa", "Minnesota"))`

Multiple expressions are implicitly combined by a logical and (`&`).

Note that the command `subset` works very similarly.

Caution! There is another function called `filter` in the `stats` package. Use `::` to make sure you use the right one: `dplyr::filter`

---
class: inverse

## Your turn

Use the `fbi` data from the `classdata` package

- Get a subset of all crimes in Iowa. Plot incidences/rates for one type of crime over time.
- Get a subset of all crimes in 2009. Plot the number or rate for one type of crime by state.
- Get a subset of the data that includes the number of homicides for the last five years. Find the rate of homicides, extract all states that have a rate greater than 90% of the rates across all states, and plot (Hint: `?quantile`).

Extra credit (1 point): submit your code (regardless of whether it works or not) in Canvas (yourturn-checkin-1).

---
class: inverse

## Your turn - Information Retrieval

Use the `fbi` data object to answer the following questions:

- How many reports of burglaries are there for California?
- For any of the violent crimes, which state had the highest rate (and for which crime) in 2014? In 2020?

Use the `fbiwide` data object to answer the following question:

- In how many states were fewer cars stolen than robberies committed in 2014? (Which states are those?)

---

## Information extraction: useful commands

Number of records in a data set:

```
nrow(dataset)
```

Quantiles:

```
quantile(variable, probs = 0.001, na.rm = TRUE)
```

Find all indices for which an expression is `TRUE`:

```
which(logical_variable)
```

Retrieve index of maximum/minimum value:

```
which.max(variable)
which.min(variable)
```

---

# Slicing a data set

- the function `dplyr::slice` is a variant of `filter` that returns a subset of a dataset by position

```
slice(fbi, 1:6) # same as head(fbi)
```

- `slice_min` and `slice_max` return the rows of a dataset with the smallest or largest values of a variable (by default the single smallest/largest value, including ties)

```
slice_max(fbiwide, burglary)
```

---

## Updating elements in a vector

You can take a subset and update the original data:

```r
a <- 1:4
a
```

```
## [1] 1 2 3 4
```

```r
a[2:3] <- 0
a
```

```
## [1] 1 0 0 4
```

```r
replace(a, a == 0, -1)
```

```
## [1]  1 -1 -1  4
```

```r
ifelse(a == 0, -1, a)
```

```
## [1]  1 -1 -1  4
```

Very useful in combination with logical subsetting.

The command `case_when` (from `dplyr`) allows us to combine multiple `ifelse` statements into one.

---

## Updating elements in a data set

Data sets and their parts can be used on the left-hand side of an assignment:

```r
library(classdata)
# introduces new variable in the data set fbiwide
fbiwide$homicide.rate <- fbiwide$homicide/fbiwide$population*100000
names(fbiwide)
```

```
##  [1] "state"               "state_id"            "state_abbr"         
##  [4] "year"                "population"          "violent_crime"      
##  [7] "homicide"            "rape_legacy"         "rape_revised"       
## [10] "robbery"             "aggravated_assault"  "property_crime"     
## [13] "burglary"            "larceny"             "motor_vehicle_theft"
## [16] "arson"               "homicide.rate"
```

If that variable already exists, it is overwritten (updated).

---

## `mutate {dplyr}`

`mutate` is a function from the `dplyr` package.

It allows us to introduce/update variables in a dataset:

```r
library(dplyr) # provides mutate() and the pipe %>%
fbiwide <- fbiwide %>%
  mutate(
    homicide.rate = homicide/population*100000
  )
```

Allows us to focus on the WHAT, rather than the HOW.

---
class: inverse

## Your turn

Use the `fbi` data from the package `classdata`

- Introduce a variable `personal` into the dataset that is `TRUE` for personal crimes and `FALSE` for property crimes. Do not use the variable `Violent.crime`.
- Now introduce a variable `class` into the dataset that has two levels, `personal` and `property`, classifying the types of crimes reported. Think of `ifelse` or `replace`.

---
class: inverse

## Your turn

1. Load the 2022 BRFSS data for Iowa from https://raw.githubusercontent.com/Stat579-at-ISU/stat579-at-isu.github.io/master/homework/data/brfss-iowa-2022.csv
2. How many observations and how many variables does the data set have?
3. Draw a barchart of the variable `SLEPTIM1`
4. Read up on the variable encoding in the [codebook](https://www.cdc.gov/brfss/annual_data/2022/zip/codebook22_llcp-v2-508.zip)
5. Some of the values should be encoded as `NA`. Use `mutate` to fix those values. Then re-draw the barchart. Include gender (`SEXVAR`) in the chart.