Using RStudio Workbench inside of Qubole
Overview
Qubole users can request access to RStudio Workbench (formerly RStudio Server Pro). This allows users to use sparklyr
to interact directly with Spark from within the Qubole cluster.
Advantages and limitations
Advantages:
- Ability for users to connect sparklyr directly to Spark within Qubole
- Provides a high-bandwidth connection between R and the Spark JVM processes because they are running on the same machine
- Can load data from the cluster directly into an R session since RStudio Workbench is installed within the Qubole cluster
- A unique, persistent home directory for each user
Limitations:
- Persistent packages must be managed using Qubole Environments, not directly from within RStudio
- RStudio Workbench installed within a Qubole cluster will be limited to the compute resources and lifecycle of that particular Spark cluster
- Non-Spark jobs will use CPU and RAM resources within the Qubole cluster
Access RStudio Workbench
RStudio Workbench can be accessed from the cluster resources menu.
Use sparklyr
Use the following R code to establish a connection from sparklyr to the Qubole cluster:
library(sparklyr)
sc <- spark_connect(method = "qubole")
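Once the connection exists, a typical next step is to run a query through it. The following is a minimal sketch, not an official Qubole example: it assumes the dplyr package is available in the cluster's environment, and the table name mtcars_spark is an arbitrary name chosen for illustration. It keeps the aggregation inside Spark and brings only the small result back into the R session, which matters because non-Spark work consumes CPU and RAM on the cluster.

```r
library(sparklyr)
library(dplyr)

# Connect as shown above; this only works from an R session
# running inside a Qubole cluster.
sc <- spark_connect(method = "qubole")

# Copy a small built-in data set to Spark.
# "mtcars_spark" is a hypothetical table name used only for this example.
mtcars_tbl <- copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# These dplyr verbs are translated to Spark SQL and executed by Spark.
# collect() then pulls only the aggregated rows into the R session.
mpg_by_cyl <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE), n_cars = n()) %>%
  collect()

print(mpg_by_cyl)

# Release the connection when finished.
spark_disconnect(sc)
```

Calling collect() only after the aggregation means the full table never has to fit in the memory of the R session.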
Additional information
For more information on using RStudio Workbench inside of Qubole, refer to the Qubole documentation.