Uses dplyr operations to create boxplot calculations. Because of this approach, the calculations automatically run inside the database if `data` has a database or sparklyr connection. The `class()` of such tables in R are: tbl_sql, tbl_dbi, tbl_spark
Requires database support for percentile/quantile functions. Confirmed to work with:
DuckDB (recommended for local examples) - uses quantile()
Spark/Hive (via sparklyr) - uses percentile_approx()
SQL Server (2012+) - uses PERCENTILE_CONT()
PostgreSQL (9.4+) - uses percentile_cont()
Oracle (9i+) - uses PERCENTILE_CONT()
Does NOT work with SQLite, MySQL < 8.0, or MariaDB (no percentile support).
Note that this function supports input tbl that already contains grouping variables. This can be useful when creating faceted boxplots.
Value
An ungrouped data.frame with boxplot statistics including columns for the grouping variable(s), quartiles (lower, middle, upper), whisker bounds (ymin, ymax), and the count of observations per group.
Examples
# \dontrun{
library(DBI)
library(dplyr)
con <- dbConnect(duckdb::duckdb(), ":memory:")
db_mtcars <- copy_to(con, mtcars, "mtcars")
db_mtcars |>
db_compute_boxplot(am, mpg)
#> # A tibble: 2 × 12
#> am n lower middle upper max_raw min_raw iqr min_iqr max_iqr ymax
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 13 21 22.8 30.4 33.9 15 14.1 6.9 44.5 33.9
#> 2 0 19 15.0 17.3 19.2 24.4 10.4 6.38 8.57 25.6 24.4
#> # ℹ 1 more variable: ymin <dbl>
dbDisconnect(con)
# }