Blog from June, 2018 - UW Data Science Center

"A simple Metropolis-Hastings MCMC in R"

Jason Mercer posted on Jun 25, 2018

To complement our work with Alex, I thought I’d share a blog post and its accompanying paper, both of which I found useful and interesting:

https://theoreticalecology.wordpress.com/2010/09/17/metropolis-hastings-mcmc-in-r/

The paper associated with the blog:

Statistical inference for stochastic simulation models – theory and application

mcmc

Took a while, but I fixed mcmclm.R

Alex Buerkle posted on Jun 21, 2018

There were a few problems that led us to get poor estimates of sigma with mcmclm.R in our meeting on the 20th of June. The most important one was fundamental and took me a long time to find: for the simulations that I put together, the prior on the sigmas turned out to be informative, because we misspecified the prior by mixing up scale and rate. As a consequence, we kept digging ourselves into a hole by estimating very small sigmas. Initially I fixed this by placing a uniform prior on the sigmas, but I now have the gamma prior specified correctly. Beyond that, I cleaned up the code so that it works with local variables, rather than variables from a function’s calling environment. I also added some output to compare the simulated parameters, those estimated with MCMC, and what we would have gotten from a linear model fit with lm().

Sorry about the trouble with this and my slowness to figure out what the problem was. The dgamma() prior worked with a previous example, because the data were simulated with a much smaller standard deviation. Thanks for your patience.

The updated version of mcmclm.R has the fixes and additions.
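As a rough illustration of how a scale/rate mix-up can make a gamma prior unexpectedly informative, here is a minimal sketch. The hyperparameter values and the variable names (shape, value, sigma) are made up for this example and are not the ones used in mcmclm.R; the direction of the mix-up shown here is also an assumption, chosen because it reproduces the "very small sigmas" symptom.

```r
# Illustrative values only; these are NOT the hyperparameters used in mcmclm.R.
shape <- 2
value <- 0.1   # meant as a rate, so the intended prior mean is shape / value
sigma <- 5     # a hypothetical "true" residual standard deviation

# dgamma(x, shape, rate = 1, scale = 1/rate, log = FALSE):
# rate and scale are reciprocals, so the same number means very different priors.
lp_intended <- dgamma(sigma, shape = shape, rate  = value, log = TRUE)
lp_mistaken <- dgamma(sigma, shape = shape, scale = value, log = TRUE)

# Prior means under each reading of `value`.
mean_intended <- shape / value   # rate parameterization
mean_mistaken <- shape * value   # scale parameterization

cat("prior mean (rate): ", mean_intended, "\n")
cat("prior mean (scale):", mean_mistaken, "\n")
cat("log prior at sigma (rate): ", lp_intended, "\n")
cat("log prior at sigma (scale):", lp_mistaken, "\n")
```

With the number read as a scale, the prior mass sits close to zero and a sigma of 5 is heavily penalized, so the sampler is pulled toward small sigmas. Naming the argument explicitly (rate = or scale =) rather than relying on position makes this kind of slip easier to spot.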

Some resources on reproducibility and generating compendia for manuscripts and projects

Jason Mercer posted on Jun 15, 2018

I asked a question on Twitter the other day (see full discussion here) about where people store their data and code for the long term. The response was pretty neat, and I’ve summarized the resources that people suggested. Hopefully this will be useful to some of you, and may even provide a jumping-off point for further discussion of what resources are available and how people organize their projects. Feel free to edit and reorganize as you see fit.

Standards for code and data sharing – projects and manuscripts as a compendium

Standards for sharing code

Written in the context of neuroscience, but seems generalizable.

https://www.nature.com/articles/nn.4550

Reproducible research in R (rrr) package

Part of the rOpenSci initiative. It isn’t really a package per se, but a description, with examples, of how to make a “package” (usually with the devtools package) that serves as the outline for a manuscript’s compendium.

https://github.com/ropensci/rrrpkg

Packaging data analytical work reproducibly using R (and friends)

A paper providing some best practices related to producing a compendium.

https://peerj.com/preprints/3192/

Open Science Framework

This seems like the most comprehensive option for generating a compendium that can be shared with collaborators, and it plays nicely with common online tools (e.g., GitHub, Google Drive, Mendeley).

https://osf.io/

Code (and data) storage sites (with elements of version control) - Public

Zenodo

Connects to GitHub. Can generate a DOI.

https://zenodo.org/

Figshare

https://figshare.com/

GitHub

Git-based site for version control and sharing code.

https://github.com/

GitLab

Git-based site for version control and sharing code. Its free plans seem to offer somewhat more than GitHub’s.

https://about.gitlab.com/

University of Wyoming’s Library

Turns out that the library can store data and code (on the PetaLibrary), as well as provide DOIs. You just need to contact them and they can set you up.

Code (and data) storage sites – Private

Amazon S3 buckets

I don’t really know anything about this, but it seems like an interesting option for those using other AWS products:

https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html

Code Ocean

Seems like a private version of the Open Science Framework, but don’t quote me on that.

https://codeocean.com/

Bitbucket

Part of the Atlassian universe of tools.

https://bitbucket.org

Open questions

How to deal with more than one language?

I like the idea of making a package/compendium, but if I use devtools, that is specific to R (as far as I can tell) and may make it difficult to use multiple languages (e.g., Python, C++ [might be able to get around this with reticulate and Rcpp, respectively – but what about other languages?]).

Is there a best option for open and reproducible science?

Seems to me like there is no “one” solution to reproducibility or long-term storage of code and data. But maybe there are some resources that are better than others?

What incentives exist at UWyo (or other universities and organizations) for open and reproducible science?

Are those incentives expected to change in the future?

Additional resources

A fascinating video on open science with some thoughts as to why wiki-style sites for scientific problems tend to die: https://www.youtube.com/watch?v=DnWocYKqvhw.

MCMC for linear model (2-way ANOVA analog)

Alex Buerkle posted on Jun 13, 2018

Here’s a cleaned-up version of the linear model code we went through a bit on 13 June. I think it is worth revisiting on the 20th and modifying in interesting ways (continuous and more covariates). Can you have a look at this? Then we’ll live-code some modifications in our meeting on the 20th.

Lynda.com access for UWyo employees

Jason Mercer posted on Jun 12, 2018

I just saw this at the bottom of a UWIT email: apparently UWyo employees (I think this includes most graduate students) have access to Lynda.com, which provides a ton of online tutorials on data-science-related topics, like CSS, web design, Git, Python, etc. I’ve generally heard good things about Lynda, but it is a subscription service. I’ve not been motivated enough to shell out the cash for a membership, so UWIT’s promotion seems like a good way to lower the financial barrier.

For more: https://uwyo.teamdynamix.com/TDClient/Requests/ServiceDet?ID=25023.

Ensuring Rmd equations render in web pages

Jason Mercer posted on Jun 10, 2018

I sometimes like to upload the html files that I produce from .Rmd code. I’ve noticed, though, that when I upload the html to a web page, it often looks different from what I see when I open the same html file locally: the equations look like garbage.

To get the page looking like I’d expected, I’ve found that I can add a script to the top of the knitted html file that loads MathJax, which then formats my equations. Below is an example of the code I add to the head of my html. Pay attention to the text between <script…> and </script>:

<head>
<!-- Place the following script somewhere in the head of your html file. -->
<script type="text/javascript" async
  src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
</head>

An example of properly formatted equations: https://theartoflittlequestions.blogspot.com/2018/06/penman-monteith-and-priestley-taylor.html

More about MathJax: http://docs.mathjax.org/en/latest/start.html

Cheers!

youshouldknow

Using the `conflicted` package

Jason Mercer posted on Jun 09, 2018

I’ve been trying out the conflicted package and have found it to be pretty useful so far. It is a super simple package, with no actual functions (it “runs” in the background). Below is an example of some of its behavior when I had zoo loaded.

CWx_15min$TIMESTAMP <- as.Date(CWx_15min$TIMESTAMP)
Error: as.Date found in 2 packages. You must indicate which one you want with ::
 * zoo::as.Date
 * base::as.Date

This isn’t, perhaps, a super useful example, but it does give a sense of what happens when a function conflict is detected. To be clear, I fixed the above error by naming the package explicitly:

CWx_15min$TIMESTAMP <- base::as.Date(CWx_15min$TIMESTAMP)

For more on the package, see the GitHub page: https://github.com/r-lib/conflicted.

Cheers!

youshouldknow

Generating equation numbers in RMarkdown files

Jason Mercer posted on Jun 08, 2018

I came across a MathJax script that you can add to the top of a .Rmd file, which then automatically generates numbers to the right of your display equation (i.e., $$eqn$$):

<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  TeX: { 
      equationNumbers: { 
            autoNumber: "all",
            formatNumber: function (n) {return +n}
      } 
  }
});
</script>

The script is placed at the top of the Rmd file, below the YAML block. It works when the knitted document is viewed as a webpage; I’m not sure if it works for other output formats.

It also appears that the numbers don’t show up for in-line equations. And you can stop a number from showing up to the right of a display equation using \nonumber in the LaTeX code. For example: $$E=mc^2\nonumber$$ would not have a number associated with it.

Source: https://stackoverflow.com/questions/35026405/auto-number-equations-in-r-markdown-documents-in-rstudio

youshouldknow

CryptoKitties

Jason Mercer posted on Jun 07, 2018

I know next to nothing about blockchain, its versions (WTH is the Ethereum blockchain?), or how it is used with cryptocurrencies. But I have recently learned that it has been applied to virtual cats known as CryptoKitties. More here: https://www.cryptokitties.co/

I thought this was kind of fun, because there are elements of data science, as well as “genetics”, involved in breeding these virtual kitties. Here’s a bit more about their genetics: https://hackernoon.com/hacking-the-cryptokitties-genome-1cb3e7dddab3

Probably the best part of the CryptoKitties, though, is that they are hypoallergenic.

silly