Uses generic dplyr code to compute the boxplot statistics. Because of this approach, the calculations run inside the database automatically when `data` is backed by a database or sparklyr connection. In R, such tables have one of the following classes: tbl_sql, tbl_dbi, tbl_spark
It currently only works with Spark, Hive, and SQL Server connections.
Note that this function supports an input tbl that already contains grouping variables. This can be useful when creating faceted boxplots.
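As a minimal sketch of the pre-grouped case, the call below groups the local `mtcars` data by `cyl` before computing the boxplot statistics per `am`. It assumes that dplyr and dbplot are installed and that the existing grouping is kept alongside `x` in the result; the choice of `cyl` as the faceting variable is only an illustration.

```r
library(dplyr)
library(dbplot)

# Group by cyl first (illustrative choice), then compute the
# boxplot statistics of mpg for each value of am within each cyl group.
res <- mtcars %>%
  group_by(cyl) %>%
  db_compute_boxplot(am, mpg)

res
```

Under that assumption the result has one row per `cyl` and `am` combination, so the `cyl` column could then serve as the faceting variable in a plot.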
db_compute_boxplot(data, x, var, coef = 1.5)
Argument | Description |
---|---|
data | A table (tbl); it can already contain additional grouping variables |
x | A discrete variable by which to group the boxplots |
var | A continuous variable |
coef | Length of the whiskers as a multiple of the IQR. Defaults to 1.5 |
mtcars %>% db_compute_boxplot(am, mpg)
#> # A tibble: 2 x 12
#>      am     n lower middle upper max_raw min_raw   iqr min_iqr max_iqr  ymax
#>   <dbl> <int> <dbl>  <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl>   <dbl> <dbl>
#> 1     0    19  15.0   17.3  19.2    24.4    10.4  6.38    8.57    25.6  24.4
#> 2     1    13  21     22.8  30.4    33.9    15   14.1     6.9     44.5  33.9
#> # … with 1 more variable: ymin <dbl>
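To illustrate how `coef` appears to feed into the columns of the example output, here is a local-only sketch written in plain dplyr. `boxplot_sketch()` is a hypothetical helper, not part of the package, and it is not the package's implementation. It assumes that `iqr` holds the interquartile range already multiplied by `coef`, and that `ymin` and `ymax` are the whisker ends clipped to the observed range; the numbers in the example output are consistent with both assumptions.

```r
library(dplyr)

# Hypothetical helper for illustration only; works on local data frames.
boxplot_sketch <- function(data, x, var, coef = 1.5) {
  data %>%
    group_by({{ x }}, .add = TRUE) %>%   # keep any existing groups
    summarise(
      n       = n(),
      lower   = quantile({{ var }}, 0.25, names = FALSE),
      middle  = quantile({{ var }}, 0.50, names = FALSE),
      upper   = quantile({{ var }}, 0.75, names = FALSE),
      max_raw = max({{ var }}),
      min_raw = min({{ var }}),
      .groups = "drop"
    ) %>%
    mutate(
      iqr     = (upper - lower) * coef,   # assumption: IQR scaled by coef
      min_iqr = lower - iqr,              # lower whisker limit
      max_iqr = upper + iqr,              # upper whisker limit
      ymax    = pmin(max_raw, max_iqr),   # whisker end, clipped to the data
      ymin    = pmax(min_raw, min_iqr)
    )
}

res <- boxplot_sketch(mtcars, am, mpg)
res
```

Passing a smaller `coef` in this sketch narrows `min_iqr` and `max_iqr`, so more observations fall outside the whisker limits.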