Stat 579: Logical variables and Filters

class: center, middle, inverse, title-slide

.title[
# Stat 579: Logical variables and Filters
]
.author[
### Heike Hofmann
]

---

class: inverse, center, middle
# Logical variables, filters, and updating data

---

## Logical vectors

- Vectors consisting of values `TRUE` and `FALSE`

- Usually created with a logical comparison

- `<, >, ==, !=, <=, >=`

- `x %in% c(1, 4, 3, 7)`

- Used in `subset` or `dplyr::filter`
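
A minimal sketch of these comparisons, using a made-up vector `x` (an assumption for illustration, not part of the course data):

```r
# Hypothetical example vector (not part of the course data)
x <- c(2, 7, 4, 1, 9)

x > 3                  # TRUE where x exceeds 3
x != 4                 # TRUE where x differs from 4
x %in% c(1, 4, 3, 7)   # TRUE where x matches any of the listed values

# Logical vectors can be summed: each TRUE counts as 1
sum(x > 3)

# Logical subsetting keeps the elements where the condition is TRUE
x[x > 3]
```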

---

## Combining logical expressions

- `&` and `|` are the logical *and* and *or*

- `!` is the logical negation

- Use parentheses `()` when linking expressions to avoid misinterpretation: `&` is evaluated before `|`
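
A small sketch of combining expressions, again with a made-up vector `x` (hypothetical, for illustration only). The last two lines show why parentheses matter:

```r
# Hypothetical example vector
x <- c(-5, 0, 3, 12, 40)

in_range <- x > 0 & x < 20     # both conditions must hold
outside  <- x <= 0 | x >= 20   # at least one condition holds
identical(outside, !in_range)  # negation flips every element

# `&` is evaluated before `|`, so these two expressions differ:
no_parens   <- x > 0 | x < 0 & x > 100    # same as x > 0 | (x < 0 & x > 100)
with_parens <- (x > 0 | x < 0) & x > 100
no_parens
with_parens
```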

---

## Logical Operations

![](images/logical.png)

---
class: inverse
## Your turn

Define vector `a` to be `a <- c(1, 15, 3, 20, 5, 8, 9, 10, 1, 3)`

Find the expression for the logical vector that is TRUE where `a` is:

- less than 20

- squared value is at least 100 or less than 10

- equals 1 or 3

- even <br>
Hint: have a look at the help for the operator `%%`

---

## `filter {dplyr}`

`filter` is a function in the package `dplyr`

`filter(data, ...)` returns the subset of `data` for which the conditions specified by the logical expressions in `...` are true, e.g.

- `filter(fbi, year == 2014)`
- `filter(fbi, type == "larceny", state %in% c("Iowa", "Minnesota"))`

Multiple expressions are implicitly combined by a logical and (`&`).

Note that the command `subset` works very similarly.

Caution! There is another function called `filter` in the `stats` package. Use `::` to make sure you use the right one: `dplyr::filter`.
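
A sketch of how the comma-separated conditions behave, using a small made-up data frame `crimes` (hypothetical numbers, not the real `fbi` data):

```r
library(dplyr)

# Hypothetical toy data (made-up numbers, not the real `fbi` data)
crimes <- data.frame(
  state = c("Iowa", "Iowa", "Minnesota", "Texas"),
  year  = c(2013, 2014, 2014, 2014),
  type  = c("larceny", "larceny", "burglary", "larceny"),
  count = c(10, 12, 7, 30)
)

# Conditions separated by commas are combined with a logical and
by_comma <- dplyr::filter(crimes, year == 2014, type == "larceny")

# The same subset, with the and written out explicitly
by_and <- dplyr::filter(crimes, year == 2014 & type == "larceny")

by_comma

# Base R equivalent
subset(crimes, year == 2014 & type == "larceny")
```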

---
class: inverse
## Your turn

Use the `fbi` data from the `classdata` package

- Get a subset of all crimes in Iowa. Plot incidences/rates for one type of crime over time.

- Get a subset of all crimes in 2009. Plot the number or rate for one type of crime by state.

- Get a subset of the data that includes the number of homicides for the last five years. Find the rate of homicides, extract all states whose rate is greater than 90% of the rates across all states, and plot (Hint: `?quantile`).

Extra credit (1 point): submit your code (regardless of whether it works or not) in Canvas (yourturn-checkin-1).

---
class: inverse
## Your turn - Information Retrieval

Use the `fbi` data object to answer the following questions:

- how many reports of burglaries are there for California?

- for any of the violent crimes, which state had the highest rate (and for which crime) in 2014? in 2020?

Use the `fbiwide` data object to answer the following question:

- in how many states were fewer cars stolen than robberies made in 2014? (which states are those?)

---

## Information extraction: useful commands

Number of records in a data set:

```
nrow(dataset)  
```

Quantiles:

```
quantile(variable, probs = 0.001, na.rm = TRUE)
```

Find all indices for which an expression is TRUE:
```
which(logical_variable)
```

Retrieve index of maximum/minimum value:
```
which.max(variable)
which.min(variable) 
```
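
A sketch of these commands on a made-up vector `rate` (hypothetical values, chosen only for illustration):

```r
# Hypothetical rates, including one missing value
rate <- c(3.1, 7.4, NA, 1.2, 9.8, 4.5)

length(rate)                    # number of elements (nrow() is for data sets)
q90 <- quantile(rate, probs = 0.9, na.rm = TRUE)
q90
which(rate > 4)                 # positions where the condition is TRUE (NA is dropped)
which.max(rate)                 # position of the largest value
rate[which.min(rate)]           # the smallest value itself
```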

---

## Slicing a data set

- the function `dplyr::slice` is a variant of `filter` that returns a subset of a dataset by position

```
slice(fbi, 1:6) # same as head(fbi)
```
- `slice_min` and `slice_max` return the record of a dataset with the smallest or largest value of a variable (several records if there are ties)

```
slice_max(fbiwide, burglary)
```
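
A sketch of the slicing functions on a small made-up data frame `toy` (hypothetical numbers), including what happens with ties:

```r
library(dplyr)

# Hypothetical toy data (made-up numbers)
toy <- data.frame(
  state    = c("A", "B", "C", "D"),
  burglary = c(40, 95, 12, 95)
)

slice(toy, 2:3)                   # rows 2 and 3, selected by position
smallest <- slice_min(toy, burglary)
largest  <- slice_max(toy, burglary)   # ties are kept: B and D both appear
one_only <- slice_max(toy, burglary, n = 1, with_ties = FALSE)

smallest
largest
one_only
```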

---

## Updating elements in a vector

You can take a subset and update the original data:

```r
a <- 1:4
a
```

```
## [1] 1 2 3 4
```

```r
a[2:3] <- 0
a
```

```
## [1] 1 0 0 4
```

```r
replace(a, a == 0, -1)
```

```
## [1]  1 -1 -1  4
```

```r
ifelse(a==0, -1, a)
```

```
## [1]  1 -1 -1  4
```

Very useful in combination with logical subsetting

The command `case_when` allows us to combine multiple `ifelse` statements into one
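
A sketch comparing nested `ifelse` calls with `case_when`, on a made-up vector `x` (hypothetical, for illustration only):

```r
library(dplyr)

# Hypothetical example vector
x <- c(-3, 0, 2, 15)

# Nested ifelse calls
nested <- ifelse(x < 0, "negative", ifelse(x == 0, "zero", "positive"))

# The same logic with case_when: conditions are checked in order,
# and TRUE in the last line catches everything left over
cases <- case_when(
  x < 0  ~ "negative",
  x == 0 ~ "zero",
  TRUE   ~ "positive"
)

nested
cases
```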

---

## Updating elements in a data set

Data sets and their parts can be used as the left-hand side of an assignment:

```r
library(classdata)

# introduces new variable in the data set fbiwide
fbiwide$homicide.rate <- fbiwide$homicide/fbiwide$population*100000

names(fbiwide)
```

```
##  [1] "state"               "state_id"            "state_abbr"         
##  [4] "year"                "population"          "violent_crime"      
##  [7] "homicide"            "rape_legacy"         "rape_revised"       
## [10] "robbery"             "aggravated_assault"  "property_crime"     
## [13] "burglary"            "larceny"             "motor_vehicle_theft"
## [16] "arson"               "homicide.rate"
```

If that variable already exists, it is overwritten/updated.
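
A sketch of creating, overwriting, and partially updating a variable, using a small made-up data frame `toy` (hypothetical numbers, not the real `fbiwide` data):

```r
# Hypothetical toy data (made-up numbers)
toy <- data.frame(
  state      = c("A", "B", "C"),
  homicide   = c(10, 50, 4),
  population = c(1e6, 2e6, 5e5)
)

# New variable: rate per 100,000 inhabitants
toy$rate <- toy$homicide / toy$population * 100000

# The variable now exists, so this assignment overwrites it
toy$rate <- round(toy$rate, 1)

# Update only the rows selected by a logical expression
toy$homicide[toy$state == "C"] <- NA

toy
```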

---

## `mutate {dplyr}`

`mutate` is a function from the `dplyr` package

It allows us to introduce/update variables in a dataset

```r
library(dplyr)  # provides mutate() and the pipe %>%

fbiwide <- fbiwide %>% 
  mutate(
    homicide.rate = homicide/population*100000
  )
```

Allows us to focus on the WHAT, rather than the HOW.
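
A sketch of `mutate` adding and overwriting several variables in one call, on a small made-up data frame `toy` (hypothetical numbers, not the real `fbiwide` data):

```r
library(dplyr)

# Hypothetical toy data (made-up numbers)
toy <- data.frame(
  state      = c("A", "B", "C"),
  homicide   = c(10, 50, 4),
  population = c(1e6, 2e6, 5e5)
)

toy <- toy %>%
  mutate(
    rate      = homicide / population * 100000,  # new variable
    high_rate = rate > 2,                        # can use `rate` right away
    state     = tolower(state)                   # overwrites an existing variable
  )

toy
```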

---
class: inverse
## Your turn

Use the `fbi` data from the `classdata` package

- Introduce a variable `personal` into the dataset that is `TRUE` for personal crimes, and `FALSE` for property crimes. Do not use the variable `violent_crime`.
- Now introduce a variable `class` into the dataset that has two levels, `personal` and `property`, classifying the types of crimes reported. Think of `ifelse` or `replace`.

---
class: inverse
## Your turn

1. Load the 2022 BRFSS data for Iowa from https://raw.githubusercontent.com/Stat579-at-ISU/stat579-at-isu.github.io/master/homework/data/brfss-iowa-2022.csv

2. How many observations and how many variables does the data set have?

3. Draw a barchart of the variable `SLEPTIM1`

4. Read up on the variable encoding in the [codebook](https://www.cdc.gov/brfss/annual_data/2022/zip/codebook22_llcp-v2-508.zip)

5. Some of the values should be encoded as `NA`. Use `mutate` to fix those values. Then re-draw the barchart. Include gender (`SEXVAR`) in the chart.