R and Rstudio
R is a popular statistical programming language for data science and analytics. We deploy R through Singularity containers; see the modules and containers pages if you need a refresher.
R
We encourage users to employ the containerized versions of R instead of compiling from source and running bare-metal. We'll use Docker Hub containers as that is where the most regular updates to R come from.
User Environment
If you use a stock (non-custom) R container you'll likely want to run install.packages() at some point. On a non-shared platform like your local setup (where you have full administrative privileges), R usually installs packages into central paths. You don't want to do that on HYAK, so you need to specify user paths.
caution
If you plan on using multiple R versions, set R_LIBS separately for each container (i.e., R version) so packages compiled against one version of R don't conflict with another. Using sub-folders named after the R version is sufficient.
You can set custom R environment variables with the .Renviron file. I set the R_LIBS environment variable to point to a folder I created in "scrubbed" as an example, but you will want to use a shared lab space or other path unique to your environment.
info
If you set R_LIBS to your home directory you can quickly run out of inodes, as R likes to create a lot of files. Use your lab directory instead.
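A minimal sketch of the setup described above, assuming a helper function named set_r_libs (hypothetical, not part of HYAK or R) and a lab folder path that you must replace with your own. It creates one library sub-folder per R version and writes a matching R_LIBS line to a .Renviron file.

```shell
# Hypothetical helper: create a per-version library folder and point R_LIBS at it.
#   $1 = base folder for R libraries (e.g. a lab directory, not your home directory)
#   $2 = R version of the container you plan to use (e.g. 4.0.3)
#   $3 = .Renviron file to write (optional, defaults to ~/.Renviron)
set_r_libs() (
    base="$1"
    version="$2"
    renviron="${3:-$HOME/.Renviron}"
    lib_dir="$base/$version"

    mkdir -p "$lib_dir" || exit 1
    # Note: this overwrites the target file; merge by hand if you already have one.
    printf 'R_LIBS=%s\n' "$lib_dir" > "$renviron" || exit 1
    echo "$lib_dir"
)

# Example call (placeholder path, substitute your own lab folder):
# set_r_libs "/gscratch/mylab/$USER/R" 4.0.3
```

Because a single .Renviron holds one R_LIBS value, rerun the helper (or keep one file per version) when you switch to a container with a different R version.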
Base Container
Let's say we wanted to use R-4.0.3 from Docker Hub.
Be sure to do this from a build node: you need internet access to reach Docker Hub for the download, and compute resources to convert the image from a Docker to a Singularity container.
The command will take a minute and create the SIF file in your current directory.
You can run the R binary within the container like below.
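The pull and run steps above can be sketched as follows. The r-base:4.0.3 tag and the explicit output file name are assumptions chosen for illustration, and the guard only exists so the snippet does nothing harmful on a machine without Singularity.

```shell
# Assumed image reference and output file name; adjust the tag to the version you need.
IMAGE="docker://r-base:4.0.3"
SIF="r-base_4.0.3.sif"

if command -v singularity >/dev/null 2>&1; then
    # Download the Docker image and convert it to a SIF file (run this on a build node).
    [ -f "$SIF" ] || singularity pull "$SIF" "$IMAGE"
    # Run R from inside the container: print its version and library search paths.
    singularity exec "$SIF" Rscript -e 'R.version.string; .libPaths()'
else
    echo "singularity not found; load it first, e.g. 'module load singularity'"
fi
```

If R_LIBS is set as described in the user environment section, the folder you chose should appear in the library search paths that R prints.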
You can run install.packages() as you normally would if you were working with R locally, and it will install all the files to whatever path you set R_LIBS to in the user environment instructions.
Tidyverse Container
The most popular library for R is the Tidyverse, which includes packages like ggplot2, dplyr, and others. It is not included in the r-base Docker Hub container used in the previous section.
Your options are to:
- run install.packages("tidyverse"), or
- use a Docker container with it pre-installed.
Option 1, while ok, uses a lot (and I mean a lot) of inodes as well as taking a long time to compile. It's much leaner on the cluster and faster to use a pre-built container if you know you'll use the Tidyverse.
The Rocker Project manages popular Docker containers for R, including a pre-built one with the Tidyverse, so you can grab the latest tagged container from Docker Hub.
The earlier instructions on the R user environment still apply. The pull, including the Docker to Singularity conversion, takes a few minutes and creates a separate SIF file.
Now when you run this container's R binary you can successfully load the Tidyverse.
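As a sketch of that workflow, assuming the rocker/tidyverse image with a 4.0.3 tag (pick whichever tag you actually need) and an output file name chosen here for illustration:

```shell
# Assumed image tag and output file name.
IMAGE="docker://rocker/tidyverse:4.0.3"
SIF="tidyverse_4.0.3.sif"

if command -v singularity >/dev/null 2>&1; then
    # Pull once; the conversion to SIF takes a few minutes on a build node.
    [ -f "$SIF" ] || singularity pull "$SIF" "$IMAGE"
    # Loading the package should now work without any install.packages() call.
    singularity exec "$SIF" Rscript -e 'library(tidyverse)'
else
    echo "singularity not found; load it first, e.g. 'module load singularity'"
fi
```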
Success!
Module
We previously provided bare-metal R binaries, compiled from source, as modules and have since migrated to containers. However, some version 3 variants of R are still available as modules.
As a reminder, all "contrib" prefixed modules are created and maintained by the user community (i.e., not supported by the HYAK team).
Rstudio
Rstudio is an integrated development environment (IDE) for R. It's a front-end interface, historically a desktop application, but in this instance it is delivered through your browser.
Rstudio runs in a Singularity container on a compute node and is then routed through the login node back to your local computer via port forwarding.
First you need to get the Rocker Rstudio container.
- Get an interactive session (e.g., srun).
- Load Singularity (i.e., module load singularity).
- Pull a version of Rocker Rstudio (e.g., singularity pull docker://rocker/rstudio:4.1.0).
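The three steps above can be sketched as a short shell session. The srun options in the comment are placeholders (partition, account, time, and memory depend on your allocation), and the explicit output file name is an assumption made here so the result is predictable.

```shell
# Step 1 (typed by hand, placeholders in angle brackets):
#   srun -p <partition> -A <account> --time=1:00:00 --mem=8G --pty /bin/bash

# Steps 2 and 3, run inside that interactive session.
IMAGE="docker://rocker/rstudio:4.1.0"
SIF="rstudio_4.1.0.sif"

# Load Singularity if an environment module system is present.
if command -v module >/dev/null 2>&1; then
    module load singularity
fi

if command -v singularity >/dev/null 2>&1; then
    # Pull the Rocker Rstudio image and convert it to a SIF file.
    [ -f "$SIF" ] || singularity pull "$SIF" "$IMAGE"
else
    echo "singularity not found; are you in an interactive session on the cluster?"
fi
```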
You will need to get our GitHub gist, which was adapted for KLONE from the tutorial by Rocker. View it as "Raw", then download the text to your home directory. I saved it as a rstudio-server.job file. You will need to modify a few things:
- The GSCRATCH path. I set it to my scrubbed directory on KLONE, but if you have a persistent lab folder you should use that instead.
- The R_LIBS_USER path. I set it to my scrubbed directory on KLONE as well. Note that if you have an R folder in your home directory, it will supersede this path when installing R packages. Your home directory is limited and can't be expanded, so you will almost certainly fill it up.
- The path to the Rstudio container in the script.
Submit the script as an sbatch job. If you're successful, a file named like rstudio-server.job.177885 will pop up in your home directory. The suffix matches the job number you see. Check its contents for instructions on how to connect to your Rstudio session.
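A minimal sketch of the submission step, assuming the job script is saved as rstudio-server.job in the current directory. The helper output_file_for is hypothetical and only rebuilds the file name pattern described above.

```shell
JOB_SCRIPT="rstudio-server.job"

# Hypothetical helper: build the expected output file name from a job ID,
# following the <script name>.<job ID> pattern described above.
output_file_for() {
    echo "${JOB_SCRIPT}.$1"
}

if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints only the job ID (followed by ";cluster" on some setups).
    job_id="$(sbatch --parsable "$JOB_SCRIPT")"
    job_id="${job_id%%;*}"
    echo "Once the job starts, read: $(output_file_for "$job_id")"
else
    echo "sbatch not found; run this from a KLONE login node"
fi
```

The output file only appears once the job has started, so it may take a moment before you can view it with cat.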
In a new terminal prompt on your laptop, copy and paste the SSH command from the SLURM output. You will get your 2FA prompt, and after logging in the system will appear to hang. This is expected: leave this window open, as it is your connection to the Rstudio session running on KLONE. If you are disconnected and reconnect, you can resume your Rstudio session.
The Rstudio session ends either when it hits the job runtime limit and self-terminates or, preferably, when you close it manually using the scancel command provided in the output file with the specific jobID. If this file is accidentally deleted, you can always see all your running jobs with a sacct -X command on your active KLONE login prompt to get the jobID.
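For illustration, the job ID can also be recovered from the output file name, since the suffix matches the job number. The file name below is the example from the text; substitute your own before cancelling anything.

```shell
# Example output file name from the text above; replace with your own.
OUTPUT_FILE="rstudio-server.job.177885"

# Strip everything up to and including the last dot to get the job ID.
JOB_ID="${OUTPUT_FILE##*.}"
echo "Job ID: $JOB_ID"

if command -v scancel >/dev/null 2>&1; then
    # List your jobs, one line per job allocation, to double-check the ID.
    sacct -X
    # End the Rstudio session by cancelling its job.
    scancel "$JOB_ID"
else
    echo "scancel not found; run this from a KLONE login node"
fi
```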
The credentials are randomly generated for each sbatch job. Once you log in you should see the Rstudio environment, with both your KLONE home directory and gscratch folders mounted.