class: center, middle, inverse, title-slide

# Visualisation and other cool things with R
### Antoine Bichat
### Mines Nancy
### December 15, 2017

---
class: center, middle, inverse

# Good practices

---

## Writing code

* Use the arrow (`<-`) for assignment and the equals sign (`=`) for setting arguments
* Leave a space after commas, and around arrows and equals signs
* Use line breaks

--

```r
x <- rnorm(n = 100, mean = 3, sd = 0.1)
hist(x, col = "grey", border = "black", breaks = 30,
     main = "Normal distribution", xlab = "", ylab = "Frequency")
```

<img src="index_files/figure-html/writing-1.svg" style="display: block; margin: auto;" />

---

## Pipe operator %>%

```r
# install.packages("magrittr")
library(magrittr)
```

* To clearly express a sequence of multiple operations
* `x %>% f` is equivalent to `f(x)`
* `x %>% f(y)` is equivalent to `f(x, y)`

--

```r
round(mean(sqrt(1:9), na.rm = TRUE), digits = 2)
```

```
## [1] 2.15
```

--

```r
1:9 %>%
  sqrt %>%
  mean(na.rm = TRUE) %>%
  round(digits = 2)
```

```
## [1] 2.15
```

---
class: center, middle

<blockquote class="twitter-tweet" data-lang="fr" align="center"><p lang="en" dir="ltr">This is a reason for pipes. <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> <a href="https://t.co/SFKVN2GpZZ">pic.twitter.com/SFKVN2GpZZ</a></p>&mdash; Tajti András (@atajti) <a href="https://twitter.com/atajti/status/889112630673235968?ref_src=twsrc%5Etfw">July 23, 2017</a></blockquote>

<br>

<blockquote class="twitter-tweet" data-lang="fr" align="center"><p lang="en" dir="ltr">And once again, Stackoverflow reminds us why we REALLY need the pipe. 
<a href="https://twitter.com/hashtag/RStats?src=hash&ref_src=twsrc%5Etfw">#RStats</a> <a href="https://twitter.com/hashtag/tidyverse?src=hash&ref_src=twsrc%5Etfw">#tidyverse</a> <a href="https://t.co/diXgQiVMhO">pic.twitter.com/diXgQiVMhO</a></p>&mdash; Colin Fay (@_ColinFay) <a href="https://twitter.com/_ColinFay/status/889901747480793090?ref_src=twsrc%5Etfw">July 25, 2017</a></blockquote>

---

<img src="so-logo.png" alt="Drawing" style="width: 410px;"/>

.center[
<img src="so-meme.png" alt="Drawing" style="width: 380px;"/>
]

---
class: middle, center, inverse

# Tips with R and RStudio

---

### Keyboard shortcuts

* Ctrl+L: clears the console
* Ctrl+Enter: runs the selection
* Ctrl+Shift+C: (un)comments the selection
* Ctrl+D: deletes the current line
* Alt+-: inserts an assignment arrow surrounded by spaces

.footnote[
[*] On Macs, use Command instead of Ctrl.
]

--

### Code shortcuts

* `rm(list = ls())` removes all objects from the current workspace
* `graphics.off()` shuts down all open graphics devices
* `cat("\014")` clears the console

---

## Not recommended code tips

```r
sqrt(x <- 4) ; x ; 3 -> y ; y
```

```
## [1] 2
```

```
## [1] 4
```

```
## [1] 3
```

---

## Useful code tips

```r
(x <- LETTERS[1:4])
```

```
## [1] "A" "B" "C" "D"
```

```r
x %<>% tolower
x
```

```
## [1] "a" "b" "c" "d"
```

```r
dput(x)
```

```
## c("a", "b", "c", "d")
```

---

## Time and {beepr}

```r
T1 <- Sys.time()
m4 <- rnorm(n = 100000)^4
T2 <- Sys.time()
difftime(T2, T1)
```

```
## Time difference of 0.05402994 secs
```

```r
mean(m4)
```

```
## [1] 2.982003
```

```r
# install.packages("beepr")
library(beepr)
beep()
```

---

## TeleR

.center[
<img src="teleR1.png" alt="Drawing" style="height: 490px;"/>
<img src="teleR2.png" alt="Drawing" style="height: 490px;"/>
]

---
class: middle, center, inverse

# Visualisation
## With [**ggplot2**](http://ggplot2.org/)

---

## Iris dataset
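
A quick way to get to know the dataset before plotting, as a minimal sketch. `iris` ships with base R (package {datasets}), so nothing needs to be installed. It holds 150 flowers, 50 for each of three species, with four measurements per flower.

```r
# iris is built into R (package {datasets}); no installation needed
data(iris)

dim(iris)            # number of rows and columns
str(iris)            # column names and types
head(iris, n = 5)    # first five flowers
table(iris$Species)  # number of flowers per species
```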
---

## Scatterplot

```r
# install.packages("ggplot2")
library(ggplot2)
ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Petal.Length))
```

<img src="index_files/figure-html/iris2-1.svg" style="display: block; margin: auto;" />

---

## Scatterplot with legend

```r
ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Petal.Length,
                 color = Species, shape = Species)) +
  labs(x = "Sepal length", y = "Petal Length") +
  theme_bw()
```

<img src="index_files/figure-html/iris3-1.svg" style="display: block; margin: auto;" />

---

## Histogram

```r
ggplot(iris) +
  geom_histogram(aes(x = Sepal.Length), fill = "grey50", color = "black") +
  labs(x = "Sepal length", y = "Count") +
  theme_bw()
```

<img src="index_files/figure-html/iris4-1.svg" style="display: block; margin: auto;" />

---

## Histogram with faceting

```r
ggplot(iris) +
  geom_histogram(aes(x = Sepal.Length, fill = Species), color = "black") +
  facet_grid(.~Species) +
  labs(x = "Sepal length", y = "Count") +
  theme_bw()
```

<img src="index_files/figure-html/iris5-1.svg" style="display: block; margin: auto;" />

---

## Boxplot

```r
p <- ggplot(iris) +
  geom_boxplot(aes(x = Species, y = Sepal.Length, fill = Species), alpha = 0.7) +
  scale_fill_manual(values = c("firebrick", "forestgreen", "royalblue")) +
  labs(x = "Species", y = "Sepal length")
p + theme_bw()
```

<img src="index_files/figure-html/iris6-1.svg" style="display: block; margin: auto;" />

---

## Theory of data visualisation

The "gg" in ggplot2 stands for __Grammar of Graphics__

<br>

```
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>),
                  stat = <STAT>,
                  position = <POSITION>) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
```

<br>

You can uniquely describe any plot as a combination of these 7 parameters (data, geom, mappings, stat, position, coordinate system and facets).
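
As an illustration of the template (a sketch, not the only way to fill it in), the faceted histogram from the earlier slide can be written with every component spelled out. The stat, position, number of bins and coordinate system below are the defaults that ggplot2 otherwise fills in silently for `geom_histogram()`; `p_full` is just an illustrative name.

```r
library(ggplot2)

p_full <- ggplot(data = iris) +                    # <DATA>
  geom_histogram(mapping = aes(x = Sepal.Length,   # <GEOM_FUNCTION> and <MAPPINGS>
                               fill = Species),
                 stat = "bin",                     # <STAT> (default for histograms)
                 position = "stack",               # <POSITION> (default for histograms)
                 bins = 30,                        # default number of bins, made explicit
                 color = "black") +
  coord_cartesian() +                              # <COORDINATE_FUNCTION> (default)
  facet_grid(. ~ Species)                          # <FACET_FUNCTION>

p_full
```

Swapping a single component, for example `coord_polar()` for `coord_cartesian()`, gives a different plot from the same data and mappings.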
---

<br>
<br>

<blockquote class="twitter-tweet" data-lang="fr" align="center"><p lang="en" dir="ltr">Then comes the inevitable "rotate ggplot2 axes" google, which brings me to this life-saving <a href="https://twitter.com/StackOverflow?ref_src=twsrc%5Etfw">@StackOverflow</a> post: <a href="https://t.co/ZuZH52wsYZ">https://t.co/ZuZH52wsYZ</a></p>&mdash; Tanya Cashorali (@tanyacash21) <a href="https://twitter.com/tanyacash21/status/910241426453270528?ref_src=twsrc%5Etfw">September 19, 2017</a></blockquote>

---
class: center, middle, inverse

# Data manipulation
## with the [**tidyverse**](https://www.tidyverse.org/)

---

## Tidyverse?

```r
# install.packages("tidyverse")
library(tidyverse)
```

The tidyverse is a collection of packages designed for data science

<br>

We will use {ggplot2}, {dplyr} and {tidyr}

<br>

It imports the `%>%` pipe from {magrittr} but not the `%<>%` pipe

<br>

There are also the {purrr}, {readr} and {tibble} packages

---

## Diamonds dataset

```r
data(diamonds) # {ggplot2}
```

<table> <thead> <tr> <th style="text-align:right;"> carat </th> <th style="text-align:left;"> cut </th> <th style="text-align:left;"> color </th> <th style="text-align:left;"> clarity </th> <th style="text-align:right;"> depth </th> <th style="text-align:right;"> table </th> <th style="text-align:right;"> price </th> <th style="text-align:right;"> x </th> <th style="text-align:right;"> y </th> <th style="text-align:right;"> z </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.23 </td> <td style="text-align:left;"> Ideal </td> <td style="text-align:left;"> E </td> <td style="text-align:left;"> SI2 </td> <td style="text-align:right;"> 61.5 </td> <td style="text-align:right;"> 55 </td> <td style="text-align:right;"> 326 </td> <td style="text-align:right;"> 3.95 </td> <td style="text-align:right;"> 3.98 </td> <td style="text-align:right;"> 2.43 </td> </tr> <tr> <td style="text-align:right;"> 0.21 </td> <td style="text-align:left;"> Premium </td> <td style="text-align:left;"> 
E </td> <td style="text-align:left;"> SI1 </td> <td style="text-align:right;"> 59.8 </td> <td style="text-align:right;"> 61 </td> <td style="text-align:right;"> 326 </td> <td style="text-align:right;"> 3.89 </td> <td style="text-align:right;"> 3.84 </td> <td style="text-align:right;"> 2.31 </td> </tr> <tr> <td style="text-align:right;"> 0.23 </td> <td style="text-align:left;"> Good </td> <td style="text-align:left;"> E </td> <td style="text-align:left;"> VS1 </td> <td style="text-align:right;"> 56.9 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:right;"> 327 </td> <td style="text-align:right;"> 4.05 </td> <td style="text-align:right;"> 4.07 </td> <td style="text-align:right;"> 2.31 </td> </tr> <tr> <td style="text-align:right;"> 0.29 </td> <td style="text-align:left;"> Premium </td> <td style="text-align:left;"> I </td> <td style="text-align:left;"> VS2 </td> <td style="text-align:right;"> 62.4 </td> <td style="text-align:right;"> 58 </td> <td style="text-align:right;"> 334 </td> <td style="text-align:right;"> 4.20 </td> <td style="text-align:right;"> 4.23 </td> <td style="text-align:right;"> 2.63 </td> </tr> <tr> <td style="text-align:right;"> 0.31 </td> <td style="text-align:left;"> Good </td> <td style="text-align:left;"> J </td> <td style="text-align:left;"> SI2 </td> <td style="text-align:right;"> 63.3 </td> <td style="text-align:right;"> 58 </td> <td style="text-align:right;"> 335 </td> <td style="text-align:right;"> 4.34 </td> <td style="text-align:right;"> 4.35 </td> <td style="text-align:right;"> 2.75 </td> </tr> <tr> <td style="text-align:right;"> 0.24 </td> <td style="text-align:left;"> Very Good </td> <td style="text-align:left;"> J </td> <td style="text-align:left;"> VVS2 </td> <td style="text-align:right;"> 62.8 </td> <td style="text-align:right;"> 57 </td> <td style="text-align:right;"> 336 </td> <td style="text-align:right;"> 3.94 </td> <td style="text-align:right;"> 3.96 </td> <td style="text-align:right;"> 2.48 </td> </tr> 
<tr> <td style="text-align:right;"> 0.24 </td> <td style="text-align:left;"> Very Good </td> <td style="text-align:left;"> I </td> <td style="text-align:left;"> VVS1 </td> <td style="text-align:right;"> 62.3 </td> <td style="text-align:right;"> 57 </td> <td style="text-align:right;"> 336 </td> <td style="text-align:right;"> 3.95 </td> <td style="text-align:right;"> 3.98 </td> <td style="text-align:right;"> 2.47 </td> </tr> <tr> <td style="text-align:right;"> 0.26 </td> <td style="text-align:left;"> Very Good </td> <td style="text-align:left;"> H </td> <td style="text-align:left;"> SI1 </td> <td style="text-align:right;"> 61.9 </td> <td style="text-align:right;"> 55 </td> <td style="text-align:right;"> 337 </td> <td style="text-align:right;"> 4.07 </td> <td style="text-align:right;"> 4.11 </td> <td style="text-align:right;"> 2.53 </td> </tr> <tr> <td style="text-align:right;"> 0.22 </td> <td style="text-align:left;"> Fair </td> <td style="text-align:left;"> E </td> <td style="text-align:left;"> VS2 </td> <td style="text-align:right;"> 65.1 </td> <td style="text-align:right;"> 61 </td> <td style="text-align:right;"> 337 </td> <td style="text-align:right;"> 3.87 </td> <td style="text-align:right;"> 3.78 </td> <td style="text-align:right;"> 2.49 </td> </tr> <tr> <td style="text-align:right;"> 0.23 </td> <td style="text-align:left;"> Very Good </td> <td style="text-align:left;"> H </td> <td style="text-align:left;"> VS1 </td> <td style="text-align:right;"> 59.4 </td> <td style="text-align:right;"> 61 </td> <td style="text-align:right;"> 338 </td> <td style="text-align:right;"> 4.00 </td> <td style="text-align:right;"> 4.05 </td> <td style="text-align:right;"> 2.39 </td> </tr> </tbody> </table> --- ## group_by() and summarise() ```r diamonds %>% group_by(cut) %>% summarise(Mean = mean(carat), Max = max(carat)) %>% knitr::kable(format = 'html') ``` <table> <thead> <tr> <th style="text-align:left;"> cut </th> <th style="text-align:right;"> Mean </th> <th 
style="text-align:right;"> Max </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Fair </td> <td style="text-align:right;"> 1.0461366 </td> <td style="text-align:right;"> 5.01 </td> </tr> <tr> <td style="text-align:left;"> Good </td> <td style="text-align:right;"> 0.8491847 </td> <td style="text-align:right;"> 3.01 </td> </tr> <tr> <td style="text-align:left;"> Very Good </td> <td style="text-align:right;"> 0.8063814 </td> <td style="text-align:right;"> 4.00 </td> </tr> <tr> <td style="text-align:left;"> Premium </td> <td style="text-align:right;"> 0.8919549 </td> <td style="text-align:right;"> 4.01 </td> </tr> <tr> <td style="text-align:left;"> Ideal </td> <td style="text-align:right;"> 0.7028370 </td> <td style="text-align:right;"> 3.50 </td> </tr> </tbody> </table> --- ## filter(), mutate(), arrange(), and select() ```r diamonds %>% filter(x>0 & y>0 & z>0) %>% mutate(vol = x*y*z) %>% arrange(vol) %>% select(carat, cut, color, x, y, z, vol) %>% head %>% knitr::kable(format = 'html') ``` <table> <thead> <tr> <th style="text-align:right;"> carat </th> <th style="text-align:left;"> cut </th> <th style="text-align:left;"> color </th> <th style="text-align:right;"> x </th> <th style="text-align:right;"> y </th> <th style="text-align:right;"> z </th> <th style="text-align:right;"> vol </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> Premium </td> <td style="text-align:left;"> D </td> <td style="text-align:right;"> 3.73 </td> <td style="text-align:right;"> 3.68 </td> <td style="text-align:right;"> 2.31 </td> <td style="text-align:right;"> 31.70798 </td> </tr> <tr> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> Premium </td> <td style="text-align:left;"> F </td> <td style="text-align:right;"> 3.73 </td> <td style="text-align:right;"> 3.71 </td> <td style="text-align:right;"> 2.33 </td> <td style="text-align:right;"> 32.24324 </td> </tr> <tr> <td style="text-align:right;"> 0.2 
</td> <td style="text-align:left;"> Premium </td> <td style="text-align:left;"> E </td> <td style="text-align:right;"> 3.81 </td> <td style="text-align:right;"> 3.78 </td> <td style="text-align:right;"> 2.24 </td> <td style="text-align:right;"> 32.26003 </td> </tr> <tr> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> Premium </td> <td style="text-align:left;"> E </td> <td style="text-align:right;"> 3.79 </td> <td style="text-align:right;"> 3.75 </td> <td style="text-align:right;"> 2.27 </td> <td style="text-align:right;"> 32.26237 </td> </tr> <tr> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> Premium </td> <td style="text-align:left;"> E </td> <td style="text-align:right;"> 3.79 </td> <td style="text-align:right;"> 3.77 </td> <td style="text-align:right;"> 2.26 </td> <td style="text-align:right;"> 32.29156 </td> </tr> <tr> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> Premium </td> <td style="text-align:left;"> D </td> <td style="text-align:right;"> 3.77 </td> <td style="text-align:right;"> 3.72 </td> <td style="text-align:right;"> 2.31 </td> <td style="text-align:right;"> 32.39636 </td> </tr> </tbody> </table>

---

## Use with {ggplot2}

```r
color_cut <- diamonds %>%
  group_by(color, cut) %>%
  summarise(price = mean(price))
ggplot(color_cut, aes(x = color, y = price)) +
  geom_line(aes(group = cut), color = "grey80") +
  geom_point(aes(color = cut)) +
  theme_bw()
```

<img src="index_files/figure-html/diamonds5-1.svg" style="display: block; margin: auto;" />

---

```r
diamonds_count <- diamonds %>%
  group_by(color) %>%
  summarise(count = n())
ggplot() +
  geom_bar(data = diamonds, aes(x = color, fill = color), color = "black") +
  geom_text(data = diamonds_count, aes(label = count, x = color, y = count/2)) +
  theme_bw() +
  theme(axis.title.x = element_blank(), legend.position = "none") +
  labs(y = "Count")
```

<img src="index_files/figure-html/diamonds6-1.svg" style="display: block; margin: auto;" />

---

## Tidy data

.center[
<img src="tidy.png" alt="Drawing" style="height: 240px;"/>
]

1. Each variable forms a column
2. Each observation forms a row
3. Each type of observational unit forms a table

---

## Messy data

.center[
<img src="messydata.png" alt="Drawing" style="height: 490px;"/>
]

---

## Grades dataset

```r
set.seed(1)
Name <- c("Jasmine", "Kate", "Mike", "Peter", "Thomas")
Test1 <- rnorm(n = 5, mean = c(11, 15, 16, 10, 8), sd = 2) %>% round
Test2 <- rnorm(n = 5, mean = c(11, 15, 16, 10, 8), sd = 2) %>% round
Test3 <- rnorm(n = 5, mean = c(11, 15, 16, 10, 8), sd = 2) %>% round
Test4 <- rnorm(n = 5, mean = c(11, 15, 16, 10, 8), sd = 2) %>% round
Test5 <- rnorm(n = 5, mean = c(11, 15, 16, 10, 8), sd = 2) %>% round
grades <- data.frame(Name, Test1, Test2, Test3, Test4, Test5)
```

<table> <thead> <tr> <th style="text-align:left;"> Name </th> <th style="text-align:right;"> Test1 </th> <th style="text-align:right;"> Test2 </th> <th style="text-align:right;"> Test3 </th> <th style="text-align:right;"> Test4 </th> <th style="text-align:right;"> Test5 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Jasmine </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 13 </td> </tr> <tr> <td style="text-align:left;"> Kate </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 16 </td> <td style="text-align:right;"> 16 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 17 </td> </tr> <tr> <td style="text-align:left;"> Mike </td> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> 17 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 16 </td> </tr> <tr> <td style="text-align:left;"> Peter </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 11 </td> <td 
style="text-align:right;"> 6 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> 6 </td> </tr> <tr> <td style="text-align:left;"> Thomas </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 9 </td> </tr> </tbody> </table> --- ## Tidy datasets with {tidyr} ```r grades_gather <- grades %>% gather(key = Test, value = Grade, -Name) ```
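
In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`, which does the same reshaping with more explicit argument names. A minimal sketch on a small hand-typed extract of the grades table, assuming a recent {tidyr} is installed (`grades_small` and `grades_long` are illustrative names, not objects from these slides):

```r
library(tidyr)

# Two students and two tests, copied from the grades table
grades_small <- data.frame(Name  = c("Jasmine", "Kate"),
                           Test1 = c(10, 15),
                           Test2 = c(9, 16))

# Every column except Name becomes a (Test, Grade) pair
grades_long <- grades_small %>%
  pivot_longer(cols = -Name, names_to = "Test", values_to = "Grade")

grades_long
```

The result has one row per student and test (4 rows here), with the columns `Name`, `Test` and `Grade`.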
---

```r
ggplot(grades_gather) +
  geom_bar(aes(x = Test, y = Grade, fill = Name), color = "black", stat = "identity") +
  facet_grid(.~Name) +
  guides(fill = "none") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

<img src="index_files/figure-html/grades5-1.svg" style="display: block; margin: auto;" />

---
class: middle, center, inverse

# Practice makes perfect

---

## Glucose dataset

```r
Glucose <- read.table("Glucose.txt")
```
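
`read.table()` is called here without `header = TRUE`. That works if the file was written by `write.table()` with row names, because the header line then has one field fewer than the data lines and `read.table()` detects it on its own. Whether `Glucose.txt` was produced that way is an assumption. A self-contained sketch of the round trip, using a made-up two-row table (the values and the temporary file are purely illustrative; only the column names mirror those used on the following slides):

```r
# Hypothetical miniature of the Glucose table (values are made up)
glucose_demo <- data.frame(Subject = c(1, 1),
                           Time    = c(0, 0.5),
                           conc    = c(4.9, 6.2),
                           Meal    = c("10am", "10am"))

# write.table() writes row names by default, so the header line
# has one field fewer than the data lines
tmp <- tempfile(fileext = ".txt")
write.table(glucose_demo, file = tmp)

# read.table() then detects the header and the row names by itself
glucose_back <- read.table(tmp)
str(glucose_back)
```

For a file without row names, pass `header = TRUE` explicitly.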
---

<img src="index_files/figure-html/glucose3-1.svg" style="display: block; margin: auto;" />

---

```r
Glucose %>%
  mutate(Subject = as.factor(Subject)) %>%
  ggplot(aes(Time, conc, color = Subject)) +
  geom_point() +
  geom_line() +
  facet_wrap(~Meal) +
  theme_bw() +
  labs(x = "Time", y = "Concentration")
```

<img src="index_files/figure-html/glucose3c-1.svg" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/glucose4-1.svg" style="display: block; margin: auto;" />

---

```r
Glucose_stat <- Glucose %>%
  group_by(Time) %>%
  summarise(mean = mean(conc, na.rm = TRUE),
            median = median(conc, na.rm = TRUE),
            q5 = quantile(conc, probs = 0.05, na.rm = TRUE),
            q95 = quantile(conc, probs = 0.95, na.rm = TRUE))
ggplot(Glucose_stat, aes(x = Time)) +
  geom_errorbar(aes(ymin = q5, ymax = q95)) +
  geom_line(aes(y = mean), color = "blue") +
  geom_point(aes(y = median), color = "red", size = 5) +
  scale_x_continuous(breaks = c(-0.25, 0, 0.5, 1, 1.5, 2, 3, 4, 5, 6, 7)) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(y = "Concentration", title = "Summary of Concentration over time",
       caption = "5th and 95th percentile are in black, median is in red and mean in blue.")
```

<img src="index_files/figure-html/glucose4c-1.svg" style="display: block; margin: auto;" />

---

## Pokemon dataset

```r
Pokemon <- read.table("Pokemon.txt")
Pokemon %<>% mutate(Generation = as.factor(Generation))
```

<table> <thead> <tr> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Type.1 </th> <th style="text-align:left;"> Type.2 </th> <th style="text-align:right;"> Total </th> <th style="text-align:right;"> HP </th> <th style="text-align:right;"> Attack </th> <th style="text-align:left;"> Generation </th> <th style="text-align:left;"> Legendary </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Bulbasaur </td> <td style="text-align:left;"> Grass </td> <td style="text-align:left;"> Poison </td> <td style="text-align:right;"> 318 </td> 
<td style="text-align:right;"> 45 </td> <td style="text-align:right;"> 49 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> FALSE </td> </tr> <tr> <td style="text-align:left;"> Ivysaur </td> <td style="text-align:left;"> Grass </td> <td style="text-align:left;"> Poison </td> <td style="text-align:right;"> 405 </td> <td style="text-align:right;"> 60 </td> <td style="text-align:right;"> 62 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> FALSE </td> </tr> <tr> <td style="text-align:left;"> Venusaur </td> <td style="text-align:left;"> Grass </td> <td style="text-align:left;"> Poison </td> <td style="text-align:right;"> 525 </td> <td style="text-align:right;"> 80 </td> <td style="text-align:right;"> 82 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> FALSE </td> </tr> <tr> <td style="text-align:left;"> VenusaurMega Venusaur </td> <td style="text-align:left;"> Grass </td> <td style="text-align:left;"> Poison </td> <td style="text-align:right;"> 625 </td> <td style="text-align:right;"> 80 </td> <td style="text-align:right;"> 100 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> FALSE </td> </tr> <tr> <td style="text-align:left;"> Charmander </td> <td style="text-align:left;"> Fire </td> <td style="text-align:left;"> NA </td> <td style="text-align:right;"> 309 </td> <td style="text-align:right;"> 39 </td> <td style="text-align:right;"> 52 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> FALSE </td> </tr> <tr> <td style="text-align:left;"> Charmeleon </td> <td style="text-align:left;"> Fire </td> <td style="text-align:left;"> NA </td> <td style="text-align:right;"> 405 </td> <td style="text-align:right;"> 58 </td> <td style="text-align:right;"> 64 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> FALSE </td> </tr> <tr> <td style="text-align:left;"> Charizard </td> <td style="text-align:left;"> Fire </td> <td 
style="text-align:left;"> Flying </td> <td style="text-align:right;"> 534 </td> <td style="text-align:right;"> 78 </td> <td style="text-align:right;"> 84 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> FALSE </td> </tr> <tr> <td style="text-align:left;"> CharizardMega Charizard X </td> <td style="text-align:left;"> Fire </td> <td style="text-align:left;"> Dragon </td> <td style="text-align:right;"> 634 </td> <td style="text-align:right;"> 78 </td> <td style="text-align:right;"> 130 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> FALSE </td> </tr> </tbody> </table>

---

<img src="index_files/figure-html/pkmn3-1.svg" style="display: block; margin: auto;" />

---

```r
ggplot(Pokemon, aes(x = Generation, y = Total, fill = Generation)) +
  geom_violin(aes(fill = Generation)) +
  geom_boxplot(alpha = 0) +
  geom_jitter(width = 0.3) +
  scale_fill_discrete(guide = "none") +
  labs(x = "Generation", y = "Total statistics") +
  theme_bw()
```

<img src="index_files/figure-html/pkmn3c-1.svg" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pkmn4-1.svg" style="display: block; margin: auto;" />

---

```r
Pokemon_poison <- Pokemon %>%
  filter(Type.1 == "Poison" | Type.2 == "Poison") %>%
  arrange(desc(Total))
ggplot(Pokemon_poison, aes(x = Attack, y = Sp..Atk)) +
  geom_smooth(method = "lm", color = "purple") +
  geom_point(color = "purple") +
  geom_text(aes(label = Name), check_overlap = TRUE) +
  theme_bw() +
  labs(y = "Special attack")
```

<img src="index_files/figure-html/pkmn4c-1.svg" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pkmn5-1.svg" style="display: block; margin: auto;" />

---

```r
ggplot(Pokemon, aes(x = 1, fill = Legendary)) +
  geom_bar(position = "stack") +
  coord_polar(theta = "y") +
  theme_void() +
  labs(x = NULL, y = NULL)
```

<img src="index_files/figure-html/pkmn5c-1.svg" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pkmn6-1.svg" 
style="display: block; margin: auto;" />

---

```r
pkmn_total_summary <- Pokemon %>%
  group_by(Generation) %>%
  summarise(Mean = mean(Total), Median = median(Total),
            q25 = quantile(Total, 0.25), q75 = quantile(Total, 0.75)) %>%
  gather(key = Statistic, value = Value, -Generation)
ggplot() +
  geom_histogram(data = Pokemon, aes(x = Total, fill = Generation), bins = 80, col = "black") +
  geom_vline(data = pkmn_total_summary, aes(xintercept = Value, linetype = Statistic)) +
  facet_grid(Generation~., scales = "free_y") +
  guides(fill = "none") +
  labs(x = "Total statistics", y = "Count") +
  theme_bw()
```

<img src="index_files/figure-html/pkmn6c-1.svg" style="display: block; margin: auto;" />

---
class: middle, center, inverse

# Going further and references

---

### ggplot2

* ggplot2: Elegant Graphics for Data Analysis (Hadley Wickham)
* [**R User Grenoble presentation**](https://privefl.github.io/R-presentation/ggplot2.html)

### Tidy data

* [**Tidy data**](http://tidyr.tidyverse.org/articles/tidy-data.html)
* [**Messy data**](https://medium.com/@miles.mcbain/tidying-the-australian-same-sex-marriage-postal-survey-data-with-r-5d35cea07962)

### Visualisation

* [**The Worst Chart In The World**](http://www.businessinsider.fr/us/pie-charts-are-the-worst-2013-6/)

---
class: center, middle, inverse

# Thanks for your attention :)

<a href="mailto:antoine.bichat@mines-nancy.org">antoine.bichat@mines-nancy.org</a>

.footnote[Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).]