To complement our work with Alex, I thought I’d share with you all a blog post and accompanying paper that I thought were pretty useful and interesting:
https://theoreticalecology.wordpress.com/2010/09/17/metropolis-hastings-mcmc-in-r/
The paper associated with the blog:
Statistical inference for stochastic simulation models – theory and application
There were a few problems that were leading us to get poor estimates of sigma with mcmclm.R in our meeting on the 20th of June. The most important one was fundamental and took me a long time to find: for the simulations that I put together, the prior on the sigmas turned out to be informative, because we misspecified it by mixing up the scale and rate arguments of dgamma(). As a consequence, we kept digging ourselves into a hole by estimating very small sigmas. Initially I fixed this by placing a uniform prior on the sigmas, but I now have the gamma prior specified correctly. Beyond that, I cleaned up the code so that it works with local variables, rather than variables from a function’s calling environment. I also added some output to compare the simulated parameters, those estimated with MCMC, and what we would have gotten from a linear model fit with lm().
Sorry about the trouble with this and my slowness to figure out what the problem was. The dgamma() prior worked with a previous example, because the data were simulated with a much smaller standard deviation. Thanks for your patience.
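For anyone who wants to see how easy this slip is to make, here is a minimal sketch of the rate-versus-scale mix-up. The shape and rate values are hypothetical (they are not the ones from mcmclm.R), and prior_mean() is just a helper I made up for the illustration. The point is only that rate and scale are reciprocals in dgamma(), so passing one where the other was intended quietly changes where the prior puts its mass.

```r
# Hypothetical prior parameters, chosen only for illustration.
shape <- 2
rate  <- 0.5                      # intended prior mean: shape / rate = 4
sigma_grid <- c(0.5, 1, 2, 5)

# rate and scale are reciprocals, so these two calls give the same density.
d_rate  <- dgamma(sigma_grid, shape = shape, rate = rate)
d_scale <- dgamma(sigma_grid, shape = shape, scale = 1 / rate)
same_density <- isTRUE(all.equal(d_rate, d_scale))

# Hypothetical helper: numerically integrate s * density to get the prior mean.
prior_mean <- function(shape, scale) {
  integrate(function(s) s * dgamma(s, shape = shape, scale = scale),
            lower = 0, upper = Inf)$value
}

mean_intended <- prior_mean(shape, scale = 1 / rate)  # about shape / rate
mean_mistaken <- prior_mean(shape, scale = rate)      # rate value used as scale

print(same_density)
print(c(intended = mean_intended, mistaken = mean_mistaken))
```

With these made-up numbers the mistaken prior has a much smaller mean than the intended one, which is the kind of thing that could pull sigma estimates toward small values when the data were simulated with a larger standard deviation.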
The updated version of mcmclm.R has the fixes and additions.
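If you don’t have mcmclm.R handy, the sketch below gives a rough idea of the kind of comparison described above. To be clear, this is not the code in mcmclm.R: it is a minimal random-walk Metropolis sampler written for this post, and the simulated values, the priors (including the gamma prior on sigma), the proposal step sizes, and the function names log_posterior() and run_metropolis() are all assumptions made for illustration. Note that the data are passed in as arguments, rather than being picked up from the calling environment.

```r
set.seed(42)

# Simulate data from a simple linear model (all values are illustrative).
n <- 100
x <- runif(n, 0, 10)
true_b0 <- 2
true_b1 <- 0.5
true_sigma <- 3
y <- true_b0 + true_b1 * x + rnorm(n, sd = true_sigma)

# Log-posterior; x and y are arguments, not variables from the calling
# environment.
log_posterior <- function(theta, x, y) {
  b0 <- theta[1]
  b1 <- theta[2]
  sigma <- theta[3]
  if (sigma <= 0) return(-Inf)
  log_lik <- sum(dnorm(y, mean = b0 + b1 * x, sd = sigma, log = TRUE))
  log_prior <- dnorm(b0, 0, 100, log = TRUE) +
    dnorm(b1, 0, 100, log = TRUE) +
    dgamma(sigma, shape = 2, rate = 0.5, log = TRUE)  # assumed prior, mean 4
  log_lik + log_prior
}

# Random-walk Metropolis with independent normal proposals.
run_metropolis <- function(start, n_iter, x, y, step = c(0.3, 0.05, 0.15)) {
  chain <- matrix(NA_real_, nrow = n_iter, ncol = 3)
  current <- start
  current_lp <- log_posterior(current, x, y)
  for (i in seq_len(n_iter)) {
    proposal <- current + rnorm(3, sd = step)
    proposal_lp <- log_posterior(proposal, x, y)
    # Accept with probability min(1, posterior ratio), on the log scale.
    if (log(runif(1)) < proposal_lp - current_lp) {
      current <- proposal
      current_lp <- proposal_lp
    }
    chain[i, ] <- current
  }
  chain
}

fit_lm <- lm(y ~ x)
chain <- run_metropolis(start = c(0, 0, 1), n_iter = 20000, x = x, y = y)
post_mean <- colMeans(chain[-(1:5000), ])  # drop the first 5000 as burn-in

# Side-by-side comparison: simulated values, MCMC posterior means, lm().
comparison <- rbind(
  simulated = c(true_b0, true_b1, true_sigma),
  mcmc = post_mean,
  lm = c(coef(fit_lm), summary(fit_lm)$sigma)
)
colnames(comparison) <- c("b0", "b1", "sigma")
print(round(comparison, 2))
```

With a prior this weak relative to 100 data points, I would expect the MCMC and lm() rows to land close to each other; if they don’t, the prior (or the sampler) deserves a second look.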
I asked a question on Twitter the other day (see full discussion here) about where people go to store their data and code for the long term. The response was pretty neat, and I’ve summarized the resources that were suggested. Hopefully this will be useful to some of you, and may even provide a jumping-off point for further discussion of what resources are available and how people organize their projects. Feel free to edit and reorganize as you see fit.
Standards for code and data sharing – projects and manuscripts as a compendium
Standards for sharing code
Written in the context of neuroscience, but seems generalizable.
https://www.nature.com/articles/nn.4550
Reproducible research in R (rrr) package
Part of the ropensci initiative. Isn’t really a package (per se), but a description with examples of making a “package” (usually with the devtools package) that serves as your outline for generating a compendium for a manuscript.
https://github.com/ropensci/rrrpkg
Packaging data analytical work reproducibly using R (and friends)
A paper providing some best practices related to producing a compendium.
https://peerj.com/preprints/3192/
Open Science Framework
This seems like the most comprehensive option for generating a compendium that can be shared with collaborators, and it plays nicely with common online tools (e.g., GitHub, Google Drive, Mendeley).
More on using OSF with R:
Webinar with some additional information - https://www.youtube.com/watch?v=cnE3AcdeGVY
An R package for pushing and pulling data to/from OSF - https://github.com/CenterForOpenScience/osfr
Code (and data) storage sites (with elements of version control) – Public
Zenodo
Connects to GitHub. Can generate a DOI.
Figshare
GitHub
Git-based site for version control and sharing code.
GitLab
Git-based site for version control and sharing code. Seems like its free plans have somewhat more comprehensive offerings than GitHub’s.
University of Wyoming’s Library
Turns out that the library can store data and code (on the PetaLibrary), as well as provide DOIs. You just need to contact them and they can set you up.
Code (and data) storage sites – Private
Amazon S3 buckets
I don’t really know anything about this, but seems like an interesting option for those using other AWS products:
https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html
Code Ocean
Seems like a private version of the Open Science Framework, but don’t quote me on that.
Bitbucket
Part of the Atlassian universe of tools.
Open questions
How to deal with more than one language?
I like the idea of making a package/compendium, but if I use devtools, that is specific to R (as far as I can tell) and may make it difficult to use multiple languages (e.g., Python, C++ [might be able to get around this with reticulate and Rcpp, respectively – but what about other languages?]).
Is there a best option for open and reproducible science?
Seems to me like there is no “one” solution to reproducibility or long-term storage of code and data. But maybe there are some resources that are better than others?
What incentives exist at UWyo (or other universities and organizations) for open and reproducible science?
Are those incentives expected to change in the future?
Additional resources
A fascinating video on open science with some thoughts as to why wiki-style sites for scientific problems tend to die: https://www.youtube.com/watch?v=DnWocYKqvhw.
Here’s a cleaned-up version of the linear model code we went through a bit on 13 June. I think it is worth revisiting on the 20th and modifying in interesting ways (continuous and more covariates). Can you have a look at this before then? We’ll live-code some modifications in our meeting on the 20th.
I just saw this at the bottom of a UWIT email: apparently UWyo employees (I think this includes most graduate students) have access to Lynda.com, which provides a ton of online tutorials on data-science-related topics like CSS, web design, Git, and Python. I’ve generally heard good things about Lynda, but it is a subscription service. I’ve not been motivated enough to shell out the cash for a membership, but UWIT’s promotion seems like a good way to lower the financial barrier.
For more: https://uwyo.teamdynamix.com/TDClient/Requests/ServiceDet?ID=25023.
I sometimes like to upload the html files that I produce from .Rmd code. I’ve noticed, though, that when I upload the html to a web page, it often looks different from what I see when I open the html file locally: the equations look like garbage.
To get the page looking like I’d expected, I’ve found that I can add a script to the head of the knitted html file that loads MathJax, which then formats my equations. Below is an example of the code I add to the head of my html - pay attention to the text between <script…> and </script>:
<head>
  <!-- Place the following script somewhere in the head of your html file. -->
  <script type="text/javascript" async
    src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-MML-AM_CHTML">
  </script>
</head>
An example of properly formatted equations: https://theartoflittlequestions.blogspot.com/2018/06/penman-monteith-and-priestley-taylor.html
More about MathJax: http://docs.mathjax.org/en/latest/start.html
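If pasting the script in by hand gets old, the step can be scripted from R. The sketch below is only one way to do it: add_mathjax() is a hypothetical helper I wrote for this post (it is not part of rmarkdown or knitr), and it assumes the html has an opening <head> tag. It works on a character vector of lines so it can be tried without touching any files.

```r
# Hypothetical helper (not part of rmarkdown or knitr): add a MathJax
# <script> tag immediately after the opening <head> tag of an html document.
add_mathjax <- function(html_lines) {
  mathjax_tag <- paste0(
    '<script type="text/javascript" async src="',
    "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/",
    'MathJax.js?config=TeX-MML-AM_CHTML"></script>'
  )
  # Match <head> or <head ...attributes>, but not <header>.
  head_pattern <- "(<head([[:space:]][^>]*)?>)"
  head_line <- grep(head_pattern, html_lines)[1]
  if (is.na(head_line)) stop("No <head> tag found in the html.")
  html_lines[head_line] <- sub(
    head_pattern, paste0("\\1\n", mathjax_tag), html_lines[head_line]
  )
  html_lines
}

# Small in-memory demonstration with a made-up page.
demo_html <- c("<html>", "<head>", "<title>Demo</title>", "</head>",
               "<body>$$E=mc^2$$</body>", "</html>")
patched <- add_mathjax(demo_html)
cat(patched, sep = "\n")
```

For a real file, the same function could be wrapped as writeLines(add_mathjax(readLines("my_page.html")), "my_page.html"), where my_page.html is a placeholder name.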
Cheers!
I’ve been trying out the conflicted package and have found it to be pretty useful so far. It is a super simple package: you don’t need to call any of its functions for it to do its job (it “runs” in the background once loaded). Below is an example of some of its behavior when I had zoo loaded.
CWx_15min$TIMESTAMP <- as.Date(CWx_15min$TIMESTAMP)
Error: as.Date found in 2 packages. You must indicate which one you want with ::
 * zoo::as.Date
 * base::as.Date
This isn’t, perhaps, a super useful example, but does provide a sense of what happens when a function conflict is detected. And to be clear, I fixed the above error with the following:
CWx_15min$TIMESTAMP <- base::as.Date(CWx_15min$TIMESTAMP)
For more on the package, see the GitHub page: https://github.com/r-lib/conflicted.
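As an alternative to prefixing every call with base::, a preference can be declared once near the top of a script. The sketch below assumes an installed version of conflicted that exports conflict_prefer(); the requireNamespace() guard is only there so the example still runs without the package, and the small data frame is a made-up stand-in for CWx_15min.

```r
# Assumes a version of the conflicted package that exports conflict_prefer();
# the guard lets the rest of the script run if the package is not installed.
if (requireNamespace("conflicted", quietly = TRUE)) {
  library(conflicted)
  # Declare once that base::as.Date should win any as.Date conflict.
  conflict_prefer("as.Date", "base")
}

# Made-up stand-in for the CWx_15min data frame used above.
CWx_15min <- data.frame(
  TIMESTAMP = c("2018-06-13", "2018-06-20"),
  stringsAsFactors = FALSE
)
CWx_15min$TIMESTAMP <- as.Date(CWx_15min$TIMESTAMP)
class(CWx_15min$TIMESTAMP)
```

The explicit base::as.Date() fix is arguably clearer for a one-off call; declaring a preference mostly pays off when the same function shows up many times in a script.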
Cheers!
I came across a MathJax script that you can add to the top of a .Rmd file, which then automatically generates numbers to the right of your display equation (i.e., $$eqn$$):
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  TeX: {
    equationNumbers: {
      autoNumber: "all",
      formatNumber: function (n) {return +n}
    }
  }
});
</script>
The script is placed at the top of the Rmd file, below the YAML block. I’m not sure if this works for other output formats, but it does work when the document is rendered as a webpage.
It also appears that the numbers don’t show up for in-line equations. And you can stop a number from showing up to the right of a display equation using \nonumber in the LaTeX code. For example, $$E=mc^2\nonumber$$ would not have a number associated with it.
I know next to nothing about blockchain, its versions (WTH is the Ethereum blockchain?), or how it is used with cryptocurrencies. But I have recently learned that it has been applied to virtual cats known as CryptoKitties. More here: https://www.cryptokitties.co/
I thought this was kind of fun, because there are elements of data science, as well as “genetics”, when breeding these virtual kitties. Here’s a bit more about their genetics: https://hackernoon.com/hacking-the-cryptokitties-genome-1cb3e7dddab3
Probably the best part of the CryptoKitties, though, is that they are hypoallergenic.