class: center, middle, inverse, title-slide

# Metagenomics with R
## R Users Grenoble
### Antoine Bichat
### November 23, 2017

---
class: center, middle, inverse

# Metagenomics

---

# What is this?

* Metagenomics is the study of the genomes of all species living in a given environment

.footnote[
[*] This presentation is simplified to make the topic understandable in 5 minutes.
]

--

* The goal is to determine the microbial composition of different samples

--

* Mainly _Bacteria_, _Archaea_ and _Fungi_

--

* More and more companies are interested in metagenomics

.center[
![](Nestle-Research.jpg)
![](enterome.jpg)
]

---

# Abundance table

* Output of the bioinformatics pipeline

* Input for statistical analysis

<br>

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Taxon </th>
   <th style="text-align:right;"> Sample1 </th>
   <th style="text-align:right;"> Sample2 </th>
   <th style="text-align:right;"> Sample3 </th>
  </tr>
 </thead>
 <tbody>
  <tr>
   <td style="text-align:left;"> Escherichia coli </td>
   <td style="text-align:right;"> 27.8 </td>
   <td style="text-align:right;"> 22.1 </td>
   <td style="text-align:right;"> 19.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Enterobacter cloacae </td>
   <td style="text-align:right;"> 23.8 </td>
   <td style="text-align:right;"> 16.5 </td>
   <td style="text-align:right;"> 24.2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Bifidobacterium longum </td>
   <td style="text-align:right;"> 7.9 </td>
   <td style="text-align:right;"> 21.2 </td>
   <td style="text-align:right;"> 27.9 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Klebsiella sp </td>
   <td style="text-align:right;"> 3.6 </td>
   <td style="text-align:right;"> 10.5 </td>
   <td style="text-align:right;"> 11.5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Staphylococcus aureus </td>
   <td style="text-align:right;"> 1.7 </td>
   <td style="text-align:right;"> 13.4 </td>
   <td style="text-align:right;"> 9.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Bacteroides fragilis </td>
   <td style="text-align:right;"> 19.3
</td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> 0.8 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Other </td>
   <td style="text-align:right;"> 15.9 </td>
   <td style="text-align:right;"> 16.3 </td>
   <td style="text-align:right;"> 7.6 </td>
  </tr>
 </tbody>
</table>

---
class: middle, inverse, center

# R Pipeline

---

# Biological process

* One gene (16S), present in all bacteria but variable between them, is isolated and sequenced

<br>

--

* Sequencing output: one FASTA file per sample

```
> GTCGATCGATGCCCTAGCCGATAGATCCCGATATAGCCGATAGAAAATATACGA...
> GTCGATCGATGCCCTAGCCGATAGATCGCGATATAGCCGATAGAAAATATACGT...
> GTCGATCGATGCCCTAGCCGATAGATCGCGATATAGCCGATAGAAAATATACGA...
> GTCGATCGATGCCATAGCCGATAGATCCCGATATAGCCGATAGAAAATATACGA...
...
```

---

# Clustering

Similar sequences are grouped

* Similarity threshold between sequences
* Error correction of sequences

<br>

```
> GTCGATCGATGCCCTAGCCGATAGATCCCGATATAGCCGATAGAAAATATACGA... -> Group 1
> GTCGATCGATGCCCTAGCCGATAGATCGCGATATAGCCGATAGAAAATATACGT... -> Group 2
> GTCGATCGATGCCCTAGCCGATAGATCGCGATATAGCCGATAGAAAATATACGA... -> Group 1
> GTCGATCGATGCCATAGCCGATAGATCCCGATATAGCCGATAGAAAATATACGA... -> Group 3
...
```

---

# Annotation

A species is assigned to each group

* NBC (Naive Bayesian Classifier)
* BLAST (Basic Local Alignment Search Tool)

<br>

```
Group 1 -> Escherichia coli
Group 2 -> Enterobacter cloacae
Group 3 -> Bifidobacterium longum
...
```

---

# DADA2

The R package [**dada2**](https://benjjneb.github.io/dada2/) uses error correction and NBC

<br>
<br>
<br>
<br>

.center[
<img src="bioconductor_logo.jpg" alt="Drawing" style="width: 500px;"/>
]

---

# Other bioinformatics tools

<br>

<blockquote class="twitter-tweet" data-lang="fr" align="center"><p lang="en" dir="ltr">The R package implementing our new method for identifying contaminants in amplicon/metagenomics data is available now, w/ documentation and tutorial vignette: <a href="https://t.co/lL1PsF6HnB">https://t.co/lL1PsF6HnB</a> <a href="https://twitter.com/hashtag/kitome?src=hash&ref_src=twsrc%5Etfw">#kitome</a> <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a></p>— Benjamin Callahan (@bejcal) <a href="https://twitter.com/bejcal/status/932619941622239232?ref_src=twsrc%5Etfw">November 20, 2017</a></blockquote>

---
class: middle, center, inverse

# Data visualisation
## With [**ggplot2**](http://ggplot2.tidyverse.org/)

---

# Sample composition...

<div style="margin-top: -30px"></div>

according to delivery mode

.center[
<img src="barplot_compo.png" alt="Drawing" style="width: 650px;"/>
]

---

# Distribution of groups...

<div style="margin-top: -30px"></div>

according to delivery mode and age

<br>
<br>

Each sample is assigned to one group according to its composition

.center[
<img src="barplot_groups.png" alt="Drawing" style="width: 600px;"/>
]

---
class: center, middle, inverse

# Thanks for your attention :)

.footnote[Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).]
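
---

# Backup: abundance table in R

A minimal sketch of how the small abundance table from the talk could be typed into R, checked, and drawn as a stacked bar chart of sample composition. The object names `abundance` and `long` are illustrative assumptions, the values are the relative abundances (in %) from the abundance table slide, and the figures shown in the talk were not necessarily produced this way.

```r
# Relative abundances (%) taken from the abundance table slide.
abundance <- data.frame(
  Taxon = c("Escherichia coli", "Enterobacter cloacae", "Bifidobacterium longum",
            "Klebsiella sp", "Staphylococcus aureus", "Bacteroides fragilis",
            "Other"),
  Sample1 = c(27.8, 23.8, 7.9, 3.6, 1.7, 19.3, 15.9),
  Sample2 = c(22.1, 16.5, 21.2, 10.5, 13.4, 0.0, 16.3),
  Sample3 = c(19.0, 24.2, 27.9, 11.5, 9.0, 0.8, 7.6)
)

# Sanity check: the abundances of each sample should add up to 100 %.
sample_totals <- colSums(abundance[, -1])
stopifnot(all(abs(sample_totals - 100) < 1e-8))

# Reshape to long format (one row per taxon and sample) with base R only.
# unlist() walks the table column by column, so samples are repeated with
# `each` and taxa with `times`.
long <- data.frame(
  Taxon     = rep(abundance$Taxon, times = ncol(abundance) - 1),
  Sample    = rep(names(abundance)[-1], each = nrow(abundance)),
  Abundance = unlist(abundance[, -1], use.names = FALSE)
)

# Stacked bar chart, one bar per sample (only drawn if ggplot2 is installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = Sample, y = Abundance, fill = Taxon)) +
    geom_col() +
    labs(y = "Relative abundance (%)")
  print(p)
}
```

The long format is used because **ggplot2** expects one row per observation; with real data the same reshaping would apply to the table produced by the bioinformatics pipeline.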