Uses generic dplyr code to create histogram bins. Because of this approach, the calculations run inside the database automatically when `data` is backed by a database or sparklyr connection. In R, the `class()` of such tables includes `tbl_sql`, `tbl_dbi`, or `tbl_spark`.
```r
db_compute_bins(data, x, bins = 30, binwidth = NULL)
```
Argument | Description |
---|---|
data | A table (tbl) |
x | A continuous variable, given as an unquoted column name |
bins | Number of bins. Defaults to 30. |
binwidth | Single value that sets the width of the bins; it overrides `bins` |
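To illustrate why plain dplyr verbs are enough for this, here is a minimal sketch of equal-width binning written only with functions that dbplyr can usually translate to SQL (`min()`, `max()`, `floor()`, `ifelse()`, `count()`). `sketch_compute_bins()` is a hypothetical helper, not the dbplot source. The floor-based rule it uses (each row is labelled with the lower edge of its bin, and the maximum value is folded into the last bin) is an assumption, chosen so that the `binwidth = 10` example produces the same two bins.

```r
library(dplyr)
library(rlang)

# Hypothetical helper, NOT the dbplot implementation: a sketch of
# equal-width histogram binning built only from dplyr verbs and
# functions that dbplyr can usually translate to SQL.
sketch_compute_bins <- function(data, x, bins = 30, binwidth = NULL) {
  x <- enquo(x)
  binned <- data %>%
    filter(!is.na(!!x)) %>%
    # Range of the variable, kept as columns so the pipeline stays lazy
    mutate(.lo = min(!!x), .hi = max(!!x))
  if (is.null(binwidth)) {
    # Default: `bins` equal-width bins spanning [min, max]
    binned <- mutate(binned, .width = (.hi - .lo) / (!!bins), .nbins = !!bins)
  } else {
    # A fixed `binwidth` overrides `bins`; the bin count follows from the range
    binned <- mutate(binned, .width = !!binwidth,
                     .nbins = as.integer((.hi - .lo) / (!!binwidth)))
  }
  binned %>%
    mutate(
      # Zero-based bin index (assumed floor-based rule)
      .bin = floor(((!!x) - .lo) / .width),
      # The maximum value would open one extra bin; fold it into the last one
      .bin = ifelse(.bin == .nbins, .bin - 1, .bin),
      # Label each row with the lower edge of its bin
      !!as_label(x) := .lo + .bin * .width
    ) %>%
    # Rows per bin; empty bins simply do not appear in the result
    count(!!x, name = "count")
}

# Local data frame: mpg 10.4 gets 18 rows, mpg 20.4 gets 14 rows
sketch_compute_bins(mtcars, mpg, binwidth = 10)
```

Computing `min()` and `max()` inside `mutate()` keeps the whole calculation in one lazy pipeline. On a database backend, dbplyr would typically express these as window aggregates, which most, though not necessarily all, SQL engines support.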
```r
library(dplyr)
library(dbplot)

# Returns record count for 30 bins in mpg
# (only non-empty bins are returned, hence 19 rows)
mtcars %>%
  db_compute_bins(mpg)
#> # A tibble: 19 x 2
#>      mpg count
#>    <dbl> <int>
#>  1  10.4     2
#>  2  12.8     1
#>  3  13.5     1
#>  4  14.3     2
#>  5  15.1     4
#>  6  15.9     1
#>  7  16.7     1
#>  8  17.4     2
#>  9  18.2     1
#> 10  19.0     3
#> 11  20.6     2
#> 12  21.4     3
#> 13  22.2     2
#> 14  23.7     1
#> 15  25.3     1
#> 16  26.8     1
#> 17  30.0     2
#> 18  32.3     1
#> 19  33.1     1

# Returns record count for bins of size 10
mtcars %>%
  db_compute_bins(mpg, binwidth = 10)
#> # A tibble: 2 x 2
#>     mpg count
#>   <dbl> <int>
#> 1  10.4    18
#> 2  20.4    14
```