# Metagenomics with R
## R Users Grenoble
### Antoine Bichat
### November 23, 2017

---

# Metagenomics

---

# What is this?

* Metagenomics is the study of the genomes of all species living in a given environment
.footnote[
[*] This presentation is simplified to make the topic understandable in 5 minutes.
]

* The goal is to determine the microbial composition of different samples

* Mainly _Bacteria_, _Archaea_ and _Fungi_

* More and more companies are interested in metagenomics
.center[
![](Nestle-Research.jpg)
![](enterome.jpg)
]

---

# Abundance table

* Output of the bioinformatics pipeline

* Input for statistical analysis

<br>

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Taxon </th>
   <th style="text-align:right;"> Sample1 </th>
   <th style="text-align:right;"> Sample2 </th>
   <th style="text-align:right;"> Sample3 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Escherichia coli </td>
   <td style="text-align:right;"> 27.8 </td>
   <td style="text-align:right;"> 22.1 </td>
   <td style="text-align:right;"> 19.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Enterobacter cloacae </td>
   <td style="text-align:right;"> 23.8 </td>
   <td style="text-align:right;"> 16.5 </td>
   <td style="text-align:right;"> 24.2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Bifidobacterium longum </td>
   <td style="text-align:right;"> 7.9 </td>
   <td style="text-align:right;"> 21.2 </td>
   <td style="text-align:right;"> 27.9 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Klebsiella sp. </td>
   <td style="text-align:right;"> 3.6 </td>
   <td style="text-align:right;"> 10.5 </td>
   <td style="text-align:right;"> 11.5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Staphylococcus aureus </td>
   <td style="text-align:right;"> 1.7 </td>
   <td style="text-align:right;"> 13.4 </td>
   <td style="text-align:right;"> 9.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Bacteroides fragilis </td>
   <td style="text-align:right;"> 19.3 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> 0.8 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Other </td>
   <td style="text-align:right;"> 15.9 </td>
   <td style="text-align:right;"> 16.3 </td>
   <td style="text-align:right;"> 7.6 </td>
  </tr>
</tbody>
</table>
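
A minimal sketch of how such a table could be stored in R, with taxa in rows and samples in columns. The object names `abundance` and `dominant` are assumptions for illustration; the values are copied from the table. Each column sums to 100, which suggests relative abundances in percent.

```r
# Values copied from the table above (taxa in rows, samples in columns).
# The object name `abundance` is hypothetical.
abundance <- matrix(
  c(27.8, 22.1, 19.0,
    23.8, 16.5, 24.2,
     7.9, 21.2, 27.9,
     3.6, 10.5, 11.5,
     1.7, 13.4,  9.0,
    19.3,  0.0,  0.8,
    15.9, 16.3,  7.6),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("Escherichia coli", "Enterobacter cloacae", "Bifidobacterium longum",
      "Klebsiella sp.", "Staphylococcus aureus", "Bacteroides fragilis", "Other"),
    c("Sample1", "Sample2", "Sample3")
  )
)

# Sanity check: every sample should sum to 100 (up to rounding)
colSums(abundance)

# Most abundant taxon in each sample
dominant <- rownames(abundance)[apply(abundance, 2, which.max)]
names(dominant) <- colnames(abundance)
dominant
```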

---
class: middle, inverse, center

# R Pipeline

---
# Biological process

* A single gene (16S rRNA), present in all bacteria but variable between species, is isolated and sequenced

<br>

* Output of the sequencing: one FASTA file per sample
```
> GTCGATCGATGCCCTAGCCGATAGATCCCGATATAGCCGATAGAAAATATACGA...
> GTCGATCGATGCCCTAGCCGATAGATCGCGATATAGCCGATAGAAAATATACGT...
> GTCGATCGATGCCCTAGCCGATAGATCGCGATATAGCCGATAGAAAATATACGA...
> GTCGATCGATGCCATAGCCGATAGATCCCGATATAGCCGATAGAAAATATACGA...
...
```
---
# Clustering

Similar sequences are grouped

* Similarity threshold between sequences

* Error correction of sequences
  
<br>

```
> GTCGATCGATGCCCTAGCCGATAGATCCCGATATAGCCGATAGAAAATATACGA... -> Group 1
> GTCGATCGATGCCCTAGCCGATAGATCGCGATATAGCCGATAGAAAATATACGT... -> Group 2
> GTCGATCGATGCCCTAGCCGATAGATCGCGATATAGCCGATAGAAAATATACGA... -> Group 1
> GTCGATCGATGCCATAGCCGATAGATCCCGATATAGCCGATAGAAAATATACGA... -> Group 3 
...
```
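
The threshold idea can be sketched in base R as a greedy clustering on the number of mismatches. This is only an illustration: the reads are toy data, `hamming()` and `cluster_reads()` are hypothetical helpers, and real tools rely on alignments and error models rather than this simple rule.

```r
# Toy reads of equal length (hypothetical data, not real 16S sequences)
reads <- c("ACGTACGTAC", "ACGTACGTAA", "TTGTACGGAC", "ACGTACGTAC", "TTGTACGGAT")

# Number of mismatching positions between two sequences of equal length
hamming <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Greedy clustering: a read joins the first group whose representative
# differs by at most `max_mismatch` positions, otherwise it starts a new group
cluster_reads <- function(reads, max_mismatch = 1) {
  centroids <- character(0)
  groups <- integer(length(reads))
  for (i in seq_along(reads)) {
    d <- vapply(centroids, hamming, integer(1), b = reads[i])
    hit <- which(d <= max_mismatch)
    if (length(hit) > 0) {
      groups[i] <- hit[1]
    } else {
      centroids <- c(centroids, reads[i])
      groups[i] <- length(centroids)
    }
  }
  groups
}

groups <- cluster_reads(reads)
groups
```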
---
# Annotation

A species is assigned to each group

* NBC (Naive Bayesian Classifier)
  
* BLAST (Basic Local Alignment Search Tool)

<br>

```
Group 1 -> Escherichia coli
Group 2 -> Enterobacter cloacae
Group 3 -> Bifidobacterium longum
...
```
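
Once each group has a species, counting reads per species gives one column of the abundance table. A minimal sketch, assuming a hypothetical vector `groups` of group labels for one sample; the classifier itself (NBC or BLAST) is not implemented here, only the lookup that follows it.

```r
# Group assigned to each read of one sample (hypothetical values)
groups <- c(1, 2, 1, 3, 1, 2)

# Annotation: species assigned to each group, as on the slide
annotation <- c("1" = "Escherichia coli",
                "2" = "Enterobacter cloacae",
                "3" = "Bifidobacterium longum")

# Replace each group label by its species name
species <- unname(annotation[as.character(groups)])

# Count reads per species and convert to percentages
counts <- table(species)
percent <- round(100 * prop.table(counts), 1)
percent
```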
---
# DADA2

The R package [**dada2**](https://benjjneb.github.io/dada2/) combines error correction (to group sequences) with an NBC (to annotate them)

<br>
<br>
<br>
<br>
.center[
<img src="bioconductor_logo.jpg" alt="Drawing" style="width: 500px;"/>
]

---
# Other bioinformatics tools
<br>

<blockquote class="twitter-tweet" data-lang="fr" align="center"><p lang="en" dir="ltr">The R package implementing our new method for identifying contaminants in amplicon/metagenomics data is available now, w/ documentation and tutorial vignette: <a href="https://t.co/lL1PsF6HnB">https://t.co/lL1PsF6HnB</a> <a href="https://twitter.com/hashtag/kitome?src=hash&amp;ref_src=twsrc%5Etfw">#kitome</a> <a href="https://twitter.com/hashtag/rstats?src=hash&amp;ref_src=twsrc%5Etfw">#rstats</a></p>&mdash; Benjamin Callahan (@bejcal) <a href="https://twitter.com/bejcal/status/932619941622239232?ref_src=twsrc%5Etfw">November 20, 2017</a></blockquote>

---
class: middle, center, inverse

# Data visualisation
## With [**ggplot2**](http://ggplot2.tidyverse.org/)

---
# Sample composition...
<div style="margin-top: -30px"></div>
according to delivery mode

---
# Distribution of groups ...
<div style="margin-top: -30px"></div>
according to delivery mode and age
<br>
<br>
Each sample is assigned to one group according to its composition
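
The slides do not say how the groups are built. As one possible sketch, hierarchical clustering on Euclidean distances (an assumption, not necessarily the method used here) can assign samples to groups from an abundance table. The table is the three-sample example shown earlier; `abundance` and `sample_groups` are hypothetical names, and delivery mode and age are not part of this toy data.

```r
# Example abundance table (taxa in rows, samples in columns), taxon names omitted
abundance <- matrix(
  c(27.8, 22.1, 19.0,
    23.8, 16.5, 24.2,
     7.9, 21.2, 27.9,
     3.6, 10.5, 11.5,
     1.7, 13.4,  9.0,
    19.3,  0.0,  0.8,
    15.9, 16.3,  7.6),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Sample1", "Sample2", "Sample3"))
)

# Distances between samples: dist() works on rows, hence the transpose
d <- dist(t(abundance), method = "euclidean")

# Hierarchical clustering, then cut the tree into two groups
# (the number of groups, k = 2, is an arbitrary choice for this sketch)
tree <- hclust(d, method = "average")
sample_groups <- cutree(tree, k = 2)
sample_groups
```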

---

# Thanks for your attention :)