::key_set(service = "my-database",
keyringusername = "myusername")
Securing Credentials
As with every programming language, it is important to avoid publishing code with your credentials in plain text. There are several options to protect your credentials in R. In order of preference, here are the methods that we will cover:
Integrated security with DSN
Integrated security without DSN
Encrypt credentials with the
keyring
packageUse a configuration file with the
config
packageEnvironment variables using the
.Renviron
fileUsing the
options
base R commandPrompt for credentials using the RStudio IDE.
Integrated security with DSN
The very best solution consists of two pieces that are usually out of the hands of the analyst:
There is integrated security between the terminal and the database, usually via Kerberos.
An ODBC connection has been pre-configured in the Desktop or Server by someone with sufficient access rights. The ODBC connection will have a unique Data Source Name, or DSN.
<- DBI::dbConnect(odbc::odbc(), "My DSN Name") con
This is considered the best option, because no information about the database, other than an internal alias, is available in plain text code.
Integrated security without DSN
In cases where the analyst is not able to setup a DSN, or a quick prototype is needed, it is possible to pass the necessary connection settings as arguments to the DBI::dbConnect
command. With integrated security, there are three arguments that are typically required: the driver name, the network path to the server (or cluster), and the database name. Other arguments may be required depending on the type of database; for instance, in the example below, the port number is required for a connection with an Apache Impala database.
<- DBI::dbConnect(odbc::odbc(),
con Driver = "impala",
Host = "hadoop-cluster",
Schema = "default",
Port = 21050)
Encrypt credentials with keyring
The keyring package uses the operating system’s credential storage. It currently supports:
- Keychain on MacOS
- Credential Store on Windows
- the Secret Service API on Linux
You can now install from CRAN ::: {.cell}
install.packages("keyring")
:::
or the dev version from github ::: {.cell}
::install_github("r-lib/keyring") devtools
:::
Keyrings and keys
Inside the credential store, there are keyrings
that contain multiple keys
. A keyring
is secured with a single password, which then grants access to to all of its keys
. In our case, we will store a single database connection credential per key
. We will use the default keyring, which is automatically unlocked when a user signs in.
Our first key
A key
has four main attributes:
service
- The key’s unique identifier; we will use the database server’s path for our keys.keyring
- The key’s ‘parent’ keyring. If not specified, the default keyring is used.username
password
The way keyring
retrieves data is by passing the keyring
name and a service
name. If no keyring
is passed, the default keyring is used. The keyring::key_set()
function is used to create the key; a prompt will appear asking for the password that should be used for the key:
Retrieve credentials
The key_list()
command is used to retrieve the username
:
::key_list("my-database") keyring
To extract the only the username
, so as to pass it inside the connection string, use:
::key_list("my-database")[1,2] keyring
To extract the password
, use key_get()
:
::key_get("my-database", "myusername") keyring
These functions can be used to supply the database credentials without storing them in plain text or an environment variable:
<- dbConnect(odbc::odbc(),
con Driver = "SQLServer",
Server = "my-database",
Port = 1433,
Database = "default",
UID = keyring::key_list("my-database")[1,2],
PWD = keyring::key_get("my-database", "myusername"))
The default keyring is unlocked anytime the user is signed in. If a new keyring is created and used, the Operating System will prompt the user for the keyring
password when the code executes.
Stored in a file with config
The config
package is meant to make it easier to deploy content, but we also use it to keep the credentials outside of the R script by saving them in the config.yml
file. In fact, all of the connection arguments can be saved in the yml
file.
install.packages("config")
Here is an example yml
file:
default:
datawarehouse:
driver: 'Postgres'
server: 'mydb-test.company.com'
uid: 'local-account'
pwd: 'my-password'
port: 5432
database: 'regional-sales'
This is how the connection arguments would be called inside the R script:
<- config::get("datawarehouse")
dw
<- DBI::dbConnect(odbc::odbc(),
con Driver = dw$driver,
Server = dw$server,
UID = dw$uid,
PWD = dw$pwd,
Port = dw$port,
Database = dw$database
)
Use Environment variables
The .Renviron
file can be used to store the credentials, which can then be retrieved with Sys.getenv()
. Here are the steps:
- Create a new file defining the credentials:
= "username"
userid = "password" pwd
Save it in your home directory with the file name
.Renviron
. If you are asked whether you want to save a file whose name begins with a dot, say YES.Note that by default, dot files are usually hidden. However, within RStudio, the file browser will make .Renviron visible and therefore easy to edit in the future.
Restart R. .Renviron is processed only at the start of an R session.
Retrieve the credentials using
Sys.getenv()
while opening the connection: ::: {.cell}
<- DBI::dbConnect(odbc::odbc(),
con Driver = "impala",
Host = "database.rstudio.com",
UID = Sys.getenv("userid"),
PWD = Sys.getenv("pwd")
)
:::
Using options()
You can record the user name and password as a global option in R. Use the options()
command to set a custom option, and then use getOption()
to retrieve it.
The following example code sets credentials. When trying this out, be sure to remove these lines from your published work:
options(database_userid = "myuserid")
options(database_password = "mypassword")
This is how the credentials can be called within the published work:
<- DBI::dbConnect(odbc::odbc(),
con Driver = "impala",
Host = "database.rstudio.com",
UID = getOption("database_userid"),
PWD = getOption("database_password")
)
Prompt for Credentials
The RStudio IDE’s API can be used to prompt the user to enter the credentials in a popup box that masks what is typed:
<- DBI::dbConnect(odbc::odbc(),
con Driver = "impala",
Host = "database.rstudio.com",
UID = rstudioapi::askForPassword("Database user"),
PWD = rstudioapi::askForPassword("Database password")
)