Blog from March, 2019

WEST is again looking to host interns for summer 2019. These are suitable for graduate or undergraduate students. The following is a sampling of the tasks interns could work on:

  1. Assistance with machine learning (painting, training and verifying)

    1. "thump" detection in wind turbine blades

    2. waterfowl detection in stationary video

    3. object detection from manned aerial flights

  2. R package maintenance

    1. spatially balanced sampling routines

    2. Distance sampling routines

    3. Capture recapture routines

  3. SQL and data husbandry (clean and store large data sets using code).

  4. Web-site modifications to facilitate client communications and paid webinars.

To apply, please complete the simple application (upload a resume). If you have questions, contact Former user (Deleted) in the EPSCoR office or Abby Hoffman (Abby was an intern in 2018).

If you’re not aware, Stan (https://mc-stan.org/) is a Bayesian modeling platform that has become increasingly popular over the last few years. Stan uses the Hamiltonian Monte Carlo (HMC) algorithm (or some variant, like the No-U-Turn Sampler) to sample from posterior distributions. This is unlike JAGS, which uses a Gibbs sampler. Compared to Gibbs samplers, HMC has proven to be fairly efficient (i.e., it takes less time to explore the posterior) for large, continuous models – though it is not a panacea for all distributions and models (but nothing is).

I want to use Stan, so I put in a software request for the software to be added to Teton. ARCC ended up installing RStan (for use with R), PyStan (for Python) and CmdStan (for direct access to the program, rather than using a wrapper library from R or Python). Below are some notes on using RStan with R.

Using RStan in R with the rstan, rstanarm, and brms packages

To use RStan with R, at a minimum, you need the rstan package (in addition to the RStan program, indicated above). The other two packages, rstanarm and brms, aren’t necessary, but they are very nice R interfaces for running Bayesian regression (in the case of rstanarm), as well as some more advanced Bayesian models (brms), without having to write your own Stan code (in the form of a .stan file).

Moving off the login node with more memory and time

To install the rstan package, I found that I needed more memory than was allotted with the standard srun command, so I added a memory flag (with 10 GB of memory) when I moved off the login node and onto a compute node:

srun --pty --account="ecoisolab" -t 0-02:00 --mem=10G /bin/bash

Note, I’m also adding a bit more time to my login, just in case I run into some installation problems.

Loading all the necessary dependencies

To use RStan with R v 3.5.2 on Teton, you need to load a couple of dependencies once you are off the login node (for more on using Teton: Teton: beginner's guide). All dependencies can be loaded in a single line:

ml swset/2018.05 gcc/7.3.0 r/3.5.2-py27 r-rstan/2.17.2-py27 r-rcpp r-rcppeigen

Then you can load R:

R

Ensuring your R library is correctly specified

This section will eventually become its own post, as this issue is bigger than using Stan.

I learned this the hard way, but you will likely need to set your personal library as the default place to install and upgrade packages, rather than using Teton’s library. This is necessary for a few reasons:

  1. Teton’s library is not writable. This means R will constantly ask you if you want to install packages to your personal library, which is not terrible, but kind of annoying.

  2. Ensures your libraries are accessible between sessions. This should be the default, but the steps we’re going to add will make sure this is true.

  3. By default, R will try to use Teton’s library, which tends to be out of date. The result of not having up-to-date packages is version conflicts, which will limit your ability to actually use the packages you’ve installed.

To clarify this last point, R prioritizes package loading based on the position of the library paths (yes, you can have multiple library paths), which you can see via the function .libPaths() from within R. This means that if a package needs rlang 0.3.0, which you have installed in your library, but your library is below Teton’s in the library path, then Teton’s version of rlang (0.1.9) will be loaded instead, and the package that depends on it will fail to load (i.e., you will get an error).
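To make the search order concrete, here is a minimal shell sketch of that lookup. It is only an illustration, not R's own code: the function name first_library_with is made up for this example, and it assumes an installed package is a directory that contains a DESCRIPTION file with a Version: line.

```shell
# Illustration only: walk the library paths in order and stop at the
# first one that contains the package, the way the paragraph above
# describes. first_library_with is a hypothetical helper name.
first_library_with() {
    local pkg="$1"
    shift
    local lib version
    for lib in "$@"; do
        if [ -f "$lib/$pkg/DESCRIPTION" ]; then
            # Pull the version number out of the DESCRIPTION file.
            version=$(sed -n 's/^Version:[[:space:]]*//p' "$lib/$pkg/DESCRIPTION" | head -n 1)
            printf '%s %s\n' "$lib" "$version"
            return 0
        fi
    done
    # The package was not found in any of the libraries.
    return 1
}
```

If you pass the library paths in the same order that .libPaths() reports them, the line printed is the library (and version) that wins. Swapping the order of the arguments shows why moving your own library to the top changes which version gets picked up.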

Without any of the steps below, you could work around this situation by specifying library("<package>", lib.loc = "<location/of/your/library>"), but that could get pretty tedious pretty fast.

Okay, below are the steps I’m going to recommend:

  1. Assuming R is open, make sure your working directory is set to your home directory:

    1. getwd()

    2. In my case, this returns “/pfs/tsfs1/home/jmercer4”, which is what I want.

  2. Quit your R session: q()

  3. From the console, make and edit a .Renviron file. This will ensure your library (and not Teton’s) is at the top of R’s library search tree, when R is looking where to install and load packages.

    1. touch .Renviron – “touch” makes a file.

    2. vi .Renviron – “vi” opens the .Renviron file in the vim text editor.

    3. Press i to start editing (this puts vim in insert mode).

    4. Add the following to the top of the file: R_LIBS=~/R/x86_64-pc-linux-gnu-library/3.5

    5. Quit: press <ESC> (<CTRL>+c also works), then type :wq! and press <ENTER> to save your work (you can view the contents of the file, to make sure it saved properly, via cat .Renviron).

  4. Make and edit the .Rprofile. This is probably redundant, but ensures your library is used.

    1. touch .Rprofile

    2. vi .Rprofile

    3. Press i

    4. Add the following to the top of the file: .libPaths("/pfs/tsfs1/home/<username>/R/x86_64-pc-linux-gnu-library/3.5"). Don’t forget to change <username> to your actual username.

    5. <ESC>, then :wq!

  5. Confirm that our edits took effect.

    1. Open R: R

    2. Check that both .Renviron and .Rprofile exist in your default directory (need to be in the default so R knows to load these files and not the similar files in the installation directory).

      1. file.exists("~/.Rprofile")

      2. file.exists("~/.Renviron")

    3. The top value of .libPaths() should be "/pfs/tsfs1/home/<username>/R/x86_64-pc-linux-gnu-library/3.5"

Done!

If you’d like to learn more about .Rprofile, .Renviron, and how R starts up more generally, I found this site to be pretty useful: https://csgillespie.github.io/efficientR/3-3-r-startup.html.
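If you would rather not edit the two files by hand in vim, the steps above can be sketched as a small shell function. This is only one way to do it, under a few assumptions: the function names are hypothetical, the library path is the R 3.5 one used in this post, and existing files are left alone rather than edited. It also creates the library directory up front, since R ignores library paths that do not exist.

```shell
# Write one line to a file, but never overwrite a file that is already
# there (hypothetical helper for this example).
write_if_absent() {
    if [ -e "$1" ]; then
        printf 'skipping %s (already exists, edit it by hand)\n' "$1" >&2
    else
        printf '%s\n' "$2" > "$1"
    fi
}

# Create the personal library plus .Renviron and .Rprofile.
# The optional argument is the directory to set up (default: $HOME).
setup_r_library() {
    local target_dir="${1:-$HOME}"
    local lib_dir="$target_dir/R/x86_64-pc-linux-gnu-library/3.5"

    # R drops library paths that do not exist, so create it first.
    mkdir -p "$lib_dir"

    # .Renviron: point R_LIBS at the personal library.
    write_if_absent "$target_dir/.Renviron" "R_LIBS=$lib_dir"

    # .Rprofile: put the same directory at the top of .libPaths().
    write_if_absent "$target_dir/.Rprofile" ".libPaths(\"$lib_dir\")"
}
```

Running setup_r_library with no argument targets your home directory. Afterwards, start R and check .libPaths() as in step 5 above.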

Installing rstan

The first time I tried to install rstan, I ran out of memory. If you run into the same problem, see the section on “Troubleshooting package installation” below. Once I had enough memory, though, it worked fine, using the following command (from within R):

install.packages("rstan")

To run Stan models using the rstan interface, you will also need to install the Rcpp and inline packages. To do that, use the following command:

install.packages(c("Rcpp", "inline"), type = "source")

This is because Stan requires that the Rcpp and rstan packages be compiled from source in the same way when installed (in practice, with the same compiler). I got some segmentation fault and out-of-memory errors when I didn’t install the two packages listed above.
NOTE: If you’ve already installed Rcpp before on Teton then you may not need to do it again. Just make sure that you have the gcc module loaded before starting your R session.

Once you run this line, R will ask you a couple of questions:

  1. Local installation? You definitely want to install your packages locally.

    1. Note: this may not be an issue if you followed the instructions from the section above (“Ensuring your R library is correctly specified”). In particular, setting up the .Rprofile file helped with this part of the process.

  2. Which CRAN mirror to use? Doesn’t really matter, but I’ve been using Berkeley (60).

You’ll then see a bunch of C++ output, indicating the library is compiling. Hopefully you don’t see any errors. Warnings seem to be fine.

Check the package has installed properly: library(rstan).

Installing rstanarm

This is the package that prompted me to write the “Ensuring your R library is correctly specified” section, because that is when I learned about how R loads libraries and about Teton’s default library. To reduce headaches, I’d suggest following the instructions in that section before continuing.

Also, you should probably update your packages before continuing: update.packages(ask = FALSE) (ask = FALSE keeps R from asking you to verify an update for each package that is out of date).

Run install.packages("rstanarm").

Confirm the library is loaded with library(rstanarm). You should get messages when it loads, but no errors. *fingers crossed*

Installing brms

Using the command install.packages("brms"), I was able to install the package with no problems.
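As a quick sanity check from the console, you can also confirm that all three packages ended up in your personal library without opening R. The sketch below is a hypothetical helper (the name check_r_packages is mine), and it assumes that an installed package shows up as a directory containing a DESCRIPTION file. It only tells you the files are there, not that the package loads cleanly, so library(<package>) is still the real test.

```shell
# Report which of the named packages are present in a library directory.
# Returns non-zero if at least one of them is missing.
check_r_packages() {
    local lib_dir="$1"
    shift
    local pkg
    local missing=0
    for pkg in "$@"; do
        if [ -f "$lib_dir/$pkg/DESCRIPTION" ]; then
            printf 'installed: %s\n' "$pkg"
        else
            printf 'missing:   %s\n' "$pkg"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: check_r_packages ~/R/x86_64-pc-linux-gnu-library/3.5 rstan rstanarm brms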

Troubleshooting package installation

The first time I tried to install rstan, I ran out of memory and was kicked off my node (which is why I suggest adding more memory when calling srun). This can cause a problem when trying to re-install the package once logged on again, as there will be a leftover lock directory in your library (assuming you used the exact command I used).

The easiest way to “unlink” (i.e., unlock) the directory is to use R’s unlink function. In my specific case, this was:

unlink('/pfs/tsfs1/home/jmercer4/R/x86_64-pc-linux-gnu-library/3.5/00LOCK-rstan', recursive = TRUE)

If your package doesn’t install, look for a similar path and adapt the above line of code for your circumstance.

Once you’ve unlinked the directory, you can then re-run your call to install.packages. For a bit more on this topic: https://stackoverflow.com/questions/14382209/r-install-packages-returns-failed-to-create-lock-directory.
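The same cleanup can be done from the console instead of from R. Here is a minimal sketch, assuming the leftover directories sit directly inside your library and are named 00LOCK or 00LOCK-<package>, as in my case above. The function name remove_stale_locks is made up for this example, and you should only run it when no installation is in progress.

```shell
# Print and remove leftover 00LOCK* directories from an R library.
# Only the top level of the library is searched, so installed packages
# themselves are not touched.
remove_stale_locks() {
    local lib_dir="$1"
    find "$lib_dir" -maxdepth 1 -type d -name '00LOCK*' -print -exec rm -rf {} +
}
```

For example: remove_stale_locks ~/R/x86_64-pc-linux-gnu-library/3.5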

Moving forward

I have yet to actually run a Stan program on Teton. As I figure out how to do that, I’ll be sure to add another write-up, which will eventually be added to the Knowledge base.

If you have questions, improvements, or alternative strategies related to any of this, please leave them in the comments, so I can either address them here or in future documentation.