The R package alr3 contains a data set called banknote, consisting of (physical) measurements on 200
Swiss bank notes, 100 of which are genuine, while the other half is
counterfeit. Load this data set (you might have to install the package)
using the code below. Also run the cryptic last line (the one calling
factor()); it will make your life a lot easier for the rest of the
homework. It turns the variable Y explicitly into a factor variable,
i.e. makes it discrete. We will discuss this in more detail later in
the course material.

# install.packages("alr3", repos = c("http://R-Forge.R-project.org"), dep = TRUE)
library(alr3) # if this throws an error of the form 'there is no package called alr3', uncomment the line above, run it once, then comment the line out again and run the code chunk again.
data(banknote)
banknote$Y <- factor(banknote$Y)
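
To see what the factor conversion does, a quick check along the following lines may help. This is only a minimal sketch: it assumes alr3 is installed, and it does not assume anything about how Y is stored before the conversion beyond what the text above says.

```r
library(alr3)
data(banknote)

# How Y is stored before the conversion (check this yourself).
class(banknote$Y)

# The conversion from the homework code.
banknote$Y <- factor(banknote$Y)

class(banknote$Y)    # "factor"
levels(banknote$Y)   # the distinct categories of Y
table(banknote$Y)    # number of bank notes per category
str(banknote)        # overview of all variables in the data set
```

Once Y is a factor, plotting functions treat it as a set of categories rather than as a number, which is what the fill and color mappings in the tasks below rely on.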
Use ggplot2 to draw a barchart of Y (0 is
genuine, 1 is counterfeit). Map Y to the fill color of the
barchart.

Use ggplot2 to draw a histogram of one of
the variables in the dataset that shows a distinction between genuine
and counterfeit banknotes. Use fill color to show this difference.
Choose the binwidth such that there are no gaps in the middle range of
the histogram.

Use ggplot2 to draw a scatterplot of two
(continuous) measurements, colored by Y. Try to find a pair of
measurements that allows you to separate genuine from counterfeit
banknotes perfectly.

For the submission: submit your solution in an R Markdown file and (just for insurance) submit the corresponding HTML or Word file with it.
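
The general shape of the three plotting tasks above can be sketched as follows. This is a starting point under stated assumptions, not a solution: the column names Length and Left are assumed to exist in banknote (check with names(banknote)), and both the variables and the binwidth are arbitrary placeholders that are not claimed to separate the two groups well.

```r
library(alr3)
library(ggplot2)

data(banknote)
banknote$Y <- factor(banknote$Y)

# Barchart of Y: geom_bar() counts the rows in each level of Y,
# and fill = Y colors each bar by its level.
p_bar <- ggplot(banknote, aes(x = Y, fill = Y)) +
  geom_bar()

# Histogram of one measurement, filled by Y.
# Length and binwidth = 0.2 are placeholders; try other variables
# and adjust the binwidth until the middle range has no gaps.
p_hist <- ggplot(banknote, aes(x = Length, fill = Y)) +
  geom_histogram(binwidth = 0.2)

# Scatterplot of two measurements, colored by Y.
# Length and Left are placeholders; swap in other pairs.
p_scatter <- ggplot(banknote, aes(x = Length, y = Left, colour = Y)) +
  geom_point()

print(p_bar)
print(p_hist)
print(p_scatter)
```

By default geom_histogram() stacks the two groups on top of each other. Setting position = "identity" together with an alpha value below 1 overlays them instead, which can make a difference between the groups easier to see.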