Select variables with the lowest (adjusted) p-value of a Kruskal-Wallis test against an outcome.
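For intuition about what this step ranks on, the raw Kruskal-Wallis p-values can be computed in base R with kruskal.test(), testing each numeric predictor against the groups defined by the outcome. This is a minimal sketch that assumes the step runs the standard stats::kruskal.test() on each selected variable. It is not the step's own implementation, and the predictors vector is chosen here only for illustration.

```r
# Sketch (not the step's implementation): Kruskal-Wallis p-value of each
# numeric predictor in iris, using Species as the grouping variable.
predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

pv <- vapply(
  predictors,
  function(col) kruskal.test(iris[[col]], iris$Species)$p.value,
  numeric(1)
)

# Smaller p-values indicate a stronger difference between the groups,
# so these are the variables such a filter would rank first.
sort(pv)
```

On iris, these values should line up with the pv column of the tidy() output in the Examples (about 8.92e-22 for Sepal.Length), which supports that assumption.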
Arguments
- recipe
A recipe object. The step will be added to the sequence of operations for this recipe.
- ...
One or more selector functions to choose variables for this step. See recipes::selections() for more details.
- role
Not used by this step since no new variables are created.
- trained
A logical to indicate if the quantities for preprocessing have been estimated.
- outcome
Name of the outcome variable whose groups each selected variable is tested against.
- n_kept
Number of variables to keep.
- prop_kept
A numeric value between 0 and 1 representing the proportion of variables to keep. n_kept and prop_kept are mutually exclusive.
- cutoff
Threshold beyond which (below or above) the variables are discarded.
- correction
Multiple testing correction method. One of p.adjust.methods. Defaults to "none".
- res
This parameter is only produced after the recipe has been trained.
- skip
A logical. Should the step be skipped when the recipe is baked by recipes::bake()? While all operations are baked when recipes::prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE, as it may affect the computations for subsequent operations.
- id
A character string that is unique to this step to identify it.
- x
A step_select_kruskal object.
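The combined effect of correction, n_kept and prop_kept can be approximated in base R. In this sketch, n_to_keep() is a hypothetical helper, and rounding prop_kept times the number of variables to the nearest whole number (minimum 1) is an assumption, not necessarily the step's rule. The adjustment itself uses stats::p.adjust(), whose accepted method names are exactly p.adjust.methods.

```r
# Hypothetical helper: how many variables to keep, given at most one of
# n_kept or prop_kept (mirroring the mutual-exclusivity rule).
n_to_keep <- function(n_vars, n_kept = NULL, prop_kept = NULL) {
  if (!is.null(n_kept) && !is.null(prop_kept)) {
    stop("`n_kept` and `prop_kept` are mutually exclusive.")
  }
  if (!is.null(prop_kept)) {
    stopifnot(prop_kept > 0, prop_kept <= 1)
    # Assumption: round to the nearest whole count, keeping at least 1.
    return(max(1L, as.integer(round(prop_kept * n_vars))))
  }
  if (!is.null(n_kept)) return(min(as.integer(n_kept), n_vars))
  n_vars  # neither given: keep everything
}

# Raw Kruskal-Wallis p-values of the iris predictors against Species.
predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
pv <- vapply(predictors,
             function(col) kruskal.test(iris[[col]], iris$Species)$p.value,
             numeric(1))

# Adjust for multiple testing; method must be one of p.adjust.methods.
qv <- p.adjust(pv, method = "fdr")

# Keep the variables with the smallest adjusted p-values.
k <- n_to_keep(length(qv), prop_kept = 0.5)
kept <- names(qv)[order(qv)][seq_len(k)]
kept
```

With correction = "fdr" and prop_kept = 0.5 this keeps Petal.Length and Petal.Width, the same variables marked as kept in the Examples. The standard p.adjust() methods do not reorder p-values (at most they create ties, as "fdr" does for the two Petal variables here), so when selecting by n_kept or prop_kept the correction mainly affects tie-breaking. It should matter more for cutoff, assuming cutoff is applied to the adjusted p-values.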
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
Examples
rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_select_kruskal(all_numeric_predictors(), outcome = "Species",
                      correction = "fdr", prop_kept = 0.5) %>%
  prep()
rec
#>
#> ── Recipe ──────────────────────────────────────────────────────────────────────
#>
#> ── Inputs
#> Number of variables by role
#> outcome: 1
#> predictor: 4
#>
#> ── Training information
#> Training data contained 150 data points and no incomplete rows.
#>
#> ── Operations
#> • Kruskal filtering against Species on: Sepal.Length, ... | Trained
tidy(rec, 1)
#> # A tibble: 4 × 5
#>   terms              pv       qv kept  id
#>   <chr>           <dbl>    <dbl> <lgl> <chr>
#> 1 Sepal.Length 8.92e-22 1.19e-21 FALSE select_kruskal_jF0Hq
#> 2 Sepal.Width  1.57e-14 1.57e-14 FALSE select_kruskal_jF0Hq
#> 3 Petal.Length 4.80e-29 9.61e-29 TRUE  select_kruskal_jF0Hq
#> 4 Petal.Width  3.26e-29 9.61e-29 TRUE  select_kruskal_jF0Hq
bake(rec, new_data = NULL)
#> # A tibble: 150 × 3
#>    Petal.Length Petal.Width Species
#>           <dbl>       <dbl> <fct>
#>  1          1.4         0.2 setosa
#>  2          1.4         0.2 setosa
#>  3          1.3         0.2 setosa
#>  4          1.5         0.2 setosa
#>  5          1.4         0.2 setosa
#>  6          1.7         0.4 setosa
#>  7          1.4         0.3 setosa
#>  8          1.5         0.2 setosa
#>  9          1.4         0.2 setosa
#> 10          1.5         0.1 setosa
#> # ℹ 140 more rows