Skip to contents

Preliminaries

This vignette illustrates the most useful functions of yatah.

Data

For this example, we use data from Zeller et al. (2014). It is the abundances of bacteria present in 199 stool samples.

abundances <- as_tibble(yatah::abundances)
print(abundances, max_extra_cols = 2)
#> # A tibble: 1,585 × 200
#>    lineages       `CCIS00146684ST-4-0` `CCIS00281083ST-3-0` `CCIS02124300ST-4-0`
#>    <chr>                         <dbl>                <dbl>                <dbl>
#>  1 k__Bacteria               100.                  99.8                   96.3  
#>  2 k__Viruses                  0.00697              0.128                  3.70 
#>  3 k__Bacteria|p…             66.2                 24.6                   74.2  
#>  4 k__Bacteria|p…             19.1                 74.4                   11.9  
#>  5 k__Bacteria|p…             12.1                  0.0428                 7.22 
#>  6 k__Bacteria|p…              1.86                 0.428                  0.765
#>  7 k__Bacteria|p…              0.758                0.388                  2.28 
#>  8 k__Viruses|p_…              0.00697              0.128                  3.70 
#>  9 k__Bacteria|p…              0.00155              0.00415                0    
#> 10 k__Bacteria|p…             62.4                 21.7                   62.3  
#> # ℹ 1,575 more rows
#> # ℹ 196 more variables: `CCIS02379307ST-4-0` <dbl>, `CCIS02856720ST-4-0` <dbl>,
#> #   …
taxonomy <- select(abundances, lineages)
taxonomy
#> # A tibble: 1,585 × 1
#>    lineages                                  
#>    <chr>                                     
#>  1 k__Bacteria                               
#>  2 k__Viruses                                
#>  3 k__Bacteria|p__Firmicutes                 
#>  4 k__Bacteria|p__Bacteroidetes              
#>  5 k__Bacteria|p__Actinobacteria             
#>  6 k__Bacteria|p__Verrucomicrobia            
#>  7 k__Bacteria|p__Proteobacteria             
#>  8 k__Viruses|p__Viruses_noname              
#>  9 k__Bacteria|p__Candidatus_Saccharibacteria
#> 10 k__Bacteria|p__Firmicutes|c__Clostridia   
#> # ℹ 1,575 more rows

Filtering

Here, we have all the present bacteria at all different ranks. As we are just interested in genera that belong to the Gammaproteobacteria class, we filter() the lineages with is_clade() and is_rank(). The genus name is accessible with get_last_clade().

gammap_genus <-
  taxonomy %>% 
  filter(is_clade(lineages, "Gammaproteobacteria"),
         is_rank(lineages, "genus")) %>% 
  mutate(genus = get_last_clade(lineages))
gammap_genus
#> # A tibble: 26 × 2
#>    lineages                                                                genus
#>    <chr>                                                                   <chr>
#>  1 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteria… Esch…
#>  2 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pasteurellales… Haem…
#>  3 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteria… Ente…
#>  4 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadale… Pseu…
#>  5 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteria… Ente…
#>  6 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pasteurellales… Aggr…
#>  7 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteria… Hafn…
#>  8 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pasteurellales… Acti…
#>  9 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Xanthomonadale… Sino…
#> 10 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteria… Citr…
#> # ℹ 16 more rows

Taxonomic table

It is useful to have a taxonomic table. taxtable() do the job.

gammaprot_table <-
  gammap_genus %>% 
  pull(lineages) %>% 
  taxtable()
as_tibble(gammaprot_table)
#> # A tibble: 26 × 6
#>    kingdom  phylum         class               order             family    genus
#>    <chr>    <chr>          <chr>               <chr>             <chr>     <chr>
#>  1 Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Enteroba… Esch…
#>  2 Bacteria Proteobacteria Gammaproteobacteria Pasteurellales    Pasteure… Haem…
#>  3 Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Enteroba… Ente…
#>  4 Bacteria Proteobacteria Gammaproteobacteria Pseudomonadales   Pseudomo… Pseu…
#>  5 Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Enteroba… Ente…
#>  6 Bacteria Proteobacteria Gammaproteobacteria Pasteurellales    Pasteure… Aggr…
#>  7 Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Enteroba… Hafn…
#>  8 Bacteria Proteobacteria Gammaproteobacteria Pasteurellales    Pasteure… Acti…
#>  9 Bacteria Proteobacteria Gammaproteobacteria Xanthomonadales   Sinobact… Sino…
#> 10 Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Enteroba… Citr…
#> # ℹ 16 more rows

Taxonomic tree

To have a tree, use taxtree() with a taxonomic table in input. By default, it collapses ranks with only one subrank.

gammaprot_tree <- taxtree(gammaprot_table)
gammaprot_tree
#> 
#> Phylogenetic tree with 26 tips and 7 internal nodes.
#> 
#> Tip labels:
#>   Escherichia, Enterobacteriaceae_noname, Enterobacter, Hafnia, Citrobacter, Pantoea, ...
#> Node labels:
#>   Gammaproteobacteria, Enterobacteriaceae, Pasteurellaceae, Pseudomonadales, Moraxellaceae, Xanthomonadales, ...
#> 
#> Rooted; includes branch lengths.
plot(gammaprot_tree, show.node.label = TRUE, cex = 0.7, 
     main = "Taxonomy of Gammaproteobacteria")

References

Zeller, Georg, Julien Tap, Anita Y Voigt, Shinichi Sunagawa, Jens Roat Kultima, Paul I Costea, Aurélien Amiot, et al. 2014. “Potential of Fecal Microbiota for Early-Stage Detection of Colorectal Cancer.” Molecular Systems Biology 10 (11): 766.