Select features that exceed a background level in at least a defined number of samples.

Usage

step_select_background(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  background_level = NULL,
  n_samples = NULL,
  prop_samples = NULL,
  res = NULL,
  skip = FALSE,
  id = rand_id("select_background")
)

# S3 method for class 'step_select_background'
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this step. See recipes::selections() for more details.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

background_level

Background level to exceed.

n_samples, prop_samples

Minimum count (n_samples) or proportion (prop_samples) of samples in which a feature must exceed background_level in order to be retained.

res

This parameter is only produced after the recipe has been trained.

skip

A logical. Should the step be skipped when the recipe is baked by recipes::bake()? While all operations are baked when recipes::prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

id

A character string that is unique to this step to identify it.

x

A step_select_background object.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Author

Antoine Bichat

Examples

rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_select_background(all_numeric_predictors(),
                         background_level = 4, prop_samples = 0.5) %>%
  prep()
rec
#> 
#> ── Recipe ──────────────────────────────────────────────────────────────────────
#> 
#> ── Inputs 
#> Number of variables by role
#> outcome:   1
#> predictor: 4
#> 
#> ── Training information 
#> Training data contained 150 data points and no incomplete rows.
#> 
#> ── Operations 
#>  Background filtering on: Sepal.Length Sepal.Width, ... | Trained
tidy(rec, 1)
#> # A tibble: 4 × 3
#>   terms        kept  id                     
#>   <chr>        <lgl> <chr>                  
#> 1 Sepal.Length TRUE  select_background_fdVHI
#> 2 Sepal.Width  FALSE select_background_fdVHI
#> 3 Petal.Length TRUE  select_background_fdVHI
#> 4 Petal.Width  FALSE select_background_fdVHI
bake(rec, new_data = NULL)
#> # A tibble: 150 × 3
#>    Sepal.Length Petal.Length Species
#>           <dbl>        <dbl> <fct>  
#>  1          5.1          1.4 setosa 
#>  2          4.9          1.4 setosa 
#>  3          4.7          1.3 setosa 
#>  4          4.6          1.5 setosa 
#>  5          5            1.4 setosa 
#>  6          5.4          1.7 setosa 
#>  7          4.6          1.4 setosa 
#>  8          5            1.5 setosa 
#>  9          4.4          1.4 setosa 
#> 10          4.9          1.5 setosa 
#> # ℹ 140 more rows
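
The kept column above can be reproduced by hand, which makes the selection rule easier to see. The following is a minimal base R sketch, not the package's internal code. It assumes that "exceeds" means strictly greater than background_level and that the sample threshold is inclusive ("at least"). The names above, prop_above, n_above, kept and kept_by_count are illustrative only.

```r
# Base R sketch of the selection rule; not the package's internal code.
# Assumptions: strict ">" against the background level, inclusive ">=" on
# the sample threshold.
background_level <- 4
prop_samples <- 0.5

# Numeric predictors of iris (columns 1 to 4).
x <- iris[, 1:4]

# Logical matrix: TRUE where a measurement is above the background level.
above <- x > background_level

# Share and count of samples above background, per feature.
prop_above <- colMeans(above)
n_above <- colSums(above)

# Keep features that are above background in at least half of the samples.
kept <- prop_above >= prop_samples

# The same decision expressed as a count (75 of the 150 samples).
kept_by_count <- n_above >= prop_samples * nrow(x)

data.frame(terms = names(kept), kept = unname(kept))
```

Under these assumptions, Sepal.Length and Petal.Length pass the threshold while Sepal.Width and Petal.Width do not, which agrees with the tidy() output shown in the example.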