Uses generic dplyr code to compute the boxplot statistics. Because of this approach, the calculations run inside the database automatically when `data` comes from a database or sparklyr connection. The `class()` of such tables in R includes tbl_sql, tbl_dbi, or tbl_spark.
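The sketch below shows roughly what such generic dplyr code can look like. It is not dbplot's own source: `boxplot_stats_sketch()` is a hypothetical helper that produces the columns seen in the example output further down. In that output the `iqr` column appears to hold `coef` times the interquartile range (for `am == 0`, 1.5 × (19.2 − 14.95) ≈ 6.38), so the sketch does the same. Because it uses only `group_by()`, `summarise()`, and `mutate()`, dbplyr could in principle translate it to SQL for a remote table, assuming the back end supports a translation of `quantile()`.

```r
library(dplyr)

# Hypothetical helper, not dbplot's implementation: boxplot statistics
# computed with plain dplyr verbs only.
boxplot_stats_sketch <- function(data, x, var, coef = 1.5) {
  data %>%
    # .add = TRUE keeps any grouping variables already present in `data`
    group_by({{ x }}, .add = TRUE) %>%
    summarise(
      n = n(),
      lower = quantile({{ var }}, 0.25, na.rm = TRUE),
      middle = quantile({{ var }}, 0.5, na.rm = TRUE),
      upper = quantile({{ var }}, 0.75, na.rm = TRUE),
      max_raw = max({{ var }}, na.rm = TRUE),
      min_raw = min({{ var }}, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    mutate(
      iqr = (upper - lower) * coef,  # coef times the interquartile range
      min_iqr = lower - iqr,         # lower fence
      max_iqr = upper + iqr,         # upper fence
      ymax = pmin(max_raw, max_iqr), # fence clipped to the observed maximum
      ymin = pmax(min_raw, min_iqr)  # fence clipped to the observed minimum
    )
}

boxplot_stats_sketch(mtcars, am, mpg)
```

In this simplified sketch the whisker ends are the fences clipped to the observed range; it does not search for the most extreme observation inside each fence.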

Among database back ends, it currently works only with Spark, Hive, and SQL Server connections.

This function also accepts an input tbl that already contains grouping variables, which is useful when creating faceted boxplots.
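To illustrate what an already-grouped input means, the following plain dplyr sketch (it does not call dbplot; `cyl` is just an illustrative facet variable) shows how an existing grouping variable combines with the boxplot variable `am` when the new group is added with `.add = TRUE`. Each `cyl`/`am` combination then gets its own statistics, one set per facet panel.

```r
library(dplyr)

# Illustration only: an input already grouped by `cyl`,
# as it might be before faceting boxplots by cylinder count.
grouped_input <- mtcars %>% group_by(cyl)

# Adding `am` with .add = TRUE keeps `cyl`, so every cyl/am pair
# becomes its own group.
per_panel <- grouped_input %>%
  group_by(am, .add = TRUE) %>%
  summarise(n = n(), middle = median(mpg), .groups = "drop")

print(per_panel)
```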

Usage

db_compute_boxplot(data, x, var, coef = 1.5)

Arguments

data

A table (tbl). It can already contain additional grouping variables.

x

A discrete variable by which to group the boxplots

var

A continuous variable whose distribution each boxplot summarizes

coef

Length of the whiskers as a multiple of the IQR. Defaults to 1.5.

Examples

library(dbplot)
library(dplyr)

mtcars %>% db_compute_boxplot(am, mpg)
#> # A tibble: 2 x 12
#>      am     n lower middle upper max_raw min_raw   iqr min_iqr max_iqr  ymax
#>   <dbl> <int> <dbl>  <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl>   <dbl> <dbl>
#> 1     0    19  15.0   17.3  19.2    24.4    10.4  6.38    8.57    25.6  24.4
#> 2     1    13  21     22.8  30.4    33.9    15   14.1     6.9     44.5  33.9
#> # … with 1 more variable: ymin <dbl>