Previous meetings archive

The schedule for the current time period can be found here.

Fall 2020

The semester has started. Welcome everyone.

What shall we do for the working group during the pandemic? Are you up for another video meeting? I (Alex) am up for it and would be happy to participate or lead, but we’d need to poll for a different time. Drop your thoughts here, or in the Comments below.

Spring 2020

  • 5 February 2020 – Check-in meeting with updates from participants, and wish list for semester topics (see list below).

  • 12 February 2020 – @Alex Buerkle will lead some regular expression gymnastics. For these data wrangling exercises, we will work with regular expressions and | (pipe) and a few UNIX friends (we will save cut, sed, awk, uniq, and sort for another day).  Please bring a laptop.

  • 19 February 2020 – @Dylan Perkins (End User Support Manager, ARCC) will give a presentation on "containers and software environments" and lead some hands-on work with Conda environments.

  • 26 February 2020 – @Alex Buerkle will lead some additional regular expression gymnastics.  Please enter requests and suggestions on the linked page.

  • 4 March 2020 – Lars Kotthoff will lead a discussion of compiled versus interpreted languages, how interpreted languages regularly rely on compiled code, what steps compilers use to optimize code execution, and what really happens when you compile a Stan model and run it from Python or R.

    • For those with less of a computer science background who would like a bit more information about the layers of abstraction between binary (the 0s and 1s we discussed) and interpreted languages, I (Jason) found Crash Course's Computer Science playlist to be super interesting. It's 41 videos, but they do an amazing job of linking our day-to-day interactions with computers to the fundamental elements that make computers actually work.

  • 11 March 2020 – Because this is an optional meeting, and in response to the anticipated spread of COVID-19, we are going to suspend meetings of the working group for now. We will see how things look after spring break and evaluate the rest of the semester's schedule. Meanwhile, carry on with your other work and stay well.

  • 18 March 2020 – Spring break



  • future – Hierarchical statistical modeling of mixtures 1 (@Alex Buerkle will lead) – conceptual, graphical, and mathematical introduction to one or a few examples of mixture models

    • Mixture models are used when we want to assign cases (individuals, samples) to source populations (stone artifacts assigned to known or unobserved source quarries, migrant animals assigned to a birth place), or when we want to assign fractions of an observation to sources (fraction of a diet to different diet items: plants, animals, etc.; ancestry of an individual to different human populations).

  • future – Hierarchical statistical modeling of mixtures 2 (@Alex Buerkle or someone from his lab will lead) – coding and using a model in JAGS

  • future –  Hierarchical statistical modeling of mixtures 3 (@Alex Buerkle or someone from his lab will lead) – marginalizing the discrete parameters, in JAGS, then Stan
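As a rough illustration of what "marginalizing the discrete parameters" means for the third session, here is a minimal Python sketch. It is not the JAGS or Stan code planned for the working group. It assumes a Gaussian mixture with known source means and a shared standard deviation, and all function names and numbers are hypothetical. Instead of sampling a source label for each observation, the likelihood sums over the possible sources.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def component_terms(x, weights, means, sd):
    """log(weight_k) + log density of x under source k, for every source k."""
    return [math.log(w) + normal_logpdf(x, mu, sd) for w, mu in zip(weights, means)]


def marginal_loglik(data, weights, means, sd):
    """Mixture log-likelihood with the discrete source labels summed out."""
    return sum(log_sum_exp(component_terms(x, weights, means, sd)) for x in data)


def source_probabilities(x, weights, means, sd):
    """Probability that observation x came from each source, given the parameters."""
    terms = component_terms(x, weights, means, sd)
    norm = log_sum_exp(terms)
    return [math.exp(t - norm) for t in terms]


if __name__ == "__main__":
    # Hypothetical example: two sources with means -2 and 2, equal mixing weights.
    data = [-2.3, -1.8, 0.1, 1.9, 2.4]
    print(marginal_loglik(data, [0.5, 0.5], [-2.0, 2.0], 1.0))
    print(source_probabilities(0.1, [0.5, 0.5], [-2.0, 2.0], 1.0))
```

The assignment probabilities for each case can still be recovered afterwards, as source_probabilities does, so marginalizing does not lose the ability to assign cases to sources.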

Future topics for spring 2020:
  • Statistical modeling topics

  • Continue reading Interpretable Machine Learning

  • Low-level computing information

    • What does compilation mean? Interpreted versus compiled code and languages.

    • Optimization 

  • Simple introduction to Stan focusing on regularization and mixing models.

  • More visualization with d3 and JavaScript

  • Teton usage

    • How to use?

  • Machine learning

    • Deep learning – Something that combines ML with inferential modeling

    • Limitations of deep learning

    • Is anyone besides @Alex Buerkle interested in reading and discussing Machine Learning: A Probabilistic Perspective by Kevin Patrick Murphy? It is available for free as an ebook from the UW library. Here are the leading pages of the book, with table of contents. Has anyone already read it?

  • Optimization

    • What is optimization?

      • How do ecologists vs other disciplines view optimization? 

    • What is an objective function and how does it relate to optimization?

      • How can we use different objective functions to penalize a problem for different outcomes of interest?

      • How does least squares or the maximum likelihood relate to the concept of an objective function?

    • Some optimization algorithms of interest:

      • Simulated annealing

      • Integer Linear Programming

      • Mixed Integer Nonlinear Programming

    • A list of different optimization problems: https://neos-guide.org/content/optimization-tree-alphabetical

    • A list of algorithms by optimization type: https://neos-guide.org/content/algorithms-by-type

Fall 2019

Spring 2019

  • 13 February – Organization and plan for the semester. Further discussion and work on improving our beginner's guide to using the Teton system. We have only scratched the surface of what was on the whiteboard at our 13 September meeting (see below).

  • 20 February – HPC: An introduction to using SLURM on Teton (UW's high performance computing system) w/ @Alex Buerkle.

  • 27 February – Reproducible research: Using R with Rmarkdown/git/LaTeX/Overleaf w/ @Jessi Rick + discussion of reproducibility. Please feel free to share your own reproducible workflows.

  • 6 March – Reproducible research: Using make for reproducibility w/ @Joshua Harrison + more discussion.

  • 13 March – Hands-on session to put reproducible research topics into action, or to get/give one-on-one help with other computing tasks (how to log in to and navigate Teton, how to submit SLURM jobs, how to set up an initial LaTeX document).

  • 20 March – Spring break

  • 27 March – Discuss A Quick Guide to Organizing Computational Biology Projects and different approaches to organizing the work of individuals, research groups, and larger collaboratives.

  • 3 April – Another hands-on session to put data science into action, or to get/give one-on-one help with other computing tasks (how to log in to and navigate Teton, how to submit SLURM jobs, how to set up an initial LaTeX document).

  • 10 April – in person Q&A with a non-academic data scientist: Fawn Hornsby, Data Infrastructure Lead and Biometrician at WEST, Inc in Laramie (MS in statistics from UW).

  • 17 April – Q&A with a non-academic data scientist: Johan Grahnen, Senior Data Scientist at Microsoft Cloud and AI (PhD in Molecular and Cellular Life Sciences at UW, 2012).

  • 24 April – in person Q&A with a newly non-academic data scientist: Marie-Agnes Tellier, Senior Environmental Statistician at Trihydro Corporation in Laramie (PhD in Statistics at UW, 2018).

  • 1 May – @Joshua Harrison and @Vivaswat Shastry will lead a tutorial and Q&A about the use of git for version control.

  • 8 May – Semester wrap-up, with hands-on help and discussion of what we learned from our Q&As with non-academic data scientists.

Fall 2018

  • 6 Sept – Presentation by @Liz Mandeville and discussion: Machine learning

  • 13 Sept – working meeting in which we'll brainstorm any and all questions about research computing resources at UW, and work to document them and answer some of them in the Knowledge base.  

  • 20 Sept – Q&A with a non-academic data scientist: Joseph Murray, PhD in electrical engineering (mid 2000s, applied machine learning), who now works as a principal scientist for FICO (credit scores and services to financial clients). Joe works on methods to detect financial fraud and money laundering.

  • 27 Sept – Working meeting on how-tos and questions about research computing resources at UW. Some groups want to step through demos of different tasks, and others can transfer some of our 13 Sept whiteboard work to the Knowledge Base.

  • 4 Oct – Q&A with a non-academic data scientist: Alison Appling of the USGS (Tucson, Arizona); PhD in ecology

  • 11 Oct – another working meeting related to how to's and questions about research computing resources at UW

  • 18 Oct – either another working meeting about research computing, or a conversation with a non-academic data scientist (in this case with a bit more emphasis about their work and future directions in data science; waiting to get confirmation from outside scientist)

  • 25 Oct – Short round-robin of topics of interest (20 min), followed by a working demo of how to submit 20 jobs to SLURM on Teton using a script.

  • 1 Nov  – Q&A with a non-academic data scientist: Justin Abold-Labreche (U.S. Internal Revenue Service; Ph.D. Law / Criminology, Oxford University).  Published interview

  • 8 Nov – Working session: bring a data science problem to share, come to help solve problems, or come listen in and encourage.  

  • 15 Nov – Discuss a cool paper: Reverse-engineering ecological theory from data (pdf)

  • 29 Nov – Q&A with a non-academic data scientist: Sergio Ballester, a friend of Jimena's, whose software development company deploys data-gathering drones to improve industrial and agricultural practices (e.g., sugar, coffee, and pineapple plantations) in Costa Rica (https://www.indigoia.com), among others.

  • 6 Dec – Watch and discuss a cool talk on YouTube: Mike Bostock – Design is a search problem. We talked about a few different things afterwards, but here are some links to some tech that came up:

Presentations

Data sets of interest

Q&A with non-academic data scientists:

Wish list for future topics:

  • A couple of ideas:

    • Crash course in creating databases.

    • Crash course on interfacing with databases using R, Python, etc.

    • Static website design (bonus: using git).

      • Jekyll + GitHub or GitLab or ...

      • Blogdown/Hugo + Netlify.

    • Dynamic website design with database support.

      • shiny (http://shiny.rstudio.com/gallery/) + hosting on a website.

      • Using shiny in a presentation.

    • Package or tool of the week (sharing some useful bit of code or an algorithm).

      • Rcpp.

      • dplyr.

    • Coding up different Bayesian samplers in R, Python, or C++.

    • Using Stan and JAGS on Teton.

    • Using make to streamline workflow (JGH: if nobody else is interested, then I will figure this out and write a post on it).

    • Issue tracking in the wiki. It would be nice to have a global list of things that could be improved upon. Not sure how management of those issues would work, though.

Previously:

On April 4th, let's discuss the following paper:  Deep Learning for Population Genetic Inference (Sheehan S, Song YS (2016) PLoS Comput Biol 12(3): e1004845. https://doi.org/10.1371/journal.pcbi.1004845). It provides details of methods, an application of deep learning in biology, and a comparison to Approximate Bayesian Computation.  A complementary paper is Supervised Machine Learning for Population Genetics: A New Paradigm, which looks interesting and is a contemporary review, but has fewer details.

  • No meeting on June 6th, we'll resume on June 13th with coding a linear model and MH from scratch.  

  • On May 30th, we'll discuss this short intro to Bayesian analysis, with its example of how to code Metropolis-Hastings updates in R. Beyond this, in future meetings this summer, we'll build on this first step and code our own MCMC for parameters of a linear model in R, add a reversible jump feature for model selection in the MCMC, and learn how to use Rcpp (and probably RcppGSL or just GSL directly) and code an equivalent MCMC model in C.

  • On June 13th, we took an initial and somewhat disorganized look at a linear model using MCMC and by-hand Metropolis-Hastings. In this case, the model is equivalent to a two-way ANOVA, where we're considering the effect of genotype and planting treatment on phenotype. After our meeting, Alex cleaned up the code some more (including a more interesting simulation, so now there are two), and here's a clearer version: mcmclm.R. We will likely return to this next week and extend it in some way (e.g., simulating and modeling a continuous covariate). Once we're comfortable with this, we'll do a sparse linear model with reversible jump MCMC.

  • On June 20th, we kept working on mcmclm.R and found that we were doing a terrible job of estimating sigma with the simulated data.  The updated version of mcmclm.R fixes this by changing to a uniform prior for sigma.

  • For June 27th, we'll look at mcmclm.R again, modify it for continuous covariates and work toward reversible jump MCMC, to find evidence for and estimate sparse models. (Alex is going to be away)

  • Alex will be away July 4 and July 11th too.
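For readers who missed these sessions, the general idea of a by-hand Metropolis-Hastings sampler for a linear model can be sketched as follows. This is a minimal Python illustration and not a port of mcmclm.R: it assumes a single continuous covariate, flat priors on the intercept and slope, and a Uniform(0, sigma_max) prior on sigma. The function names, step sizes, and simulated values are all hypothetical.

```python
import math
import random


def simulate_data(n=50, a=1.0, b=2.0, sigma=0.5, seed=42):
    """Simulate y = a + b*x + Normal(0, sigma) noise with a centered covariate."""
    rng = random.Random(seed)
    x = [(i - (n - 1) / 2.0) / 10.0 for i in range(n)]
    y = [a + b * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y


def log_posterior(params, x, y, sigma_max=10.0):
    """Log posterior, up to a constant, for intercept a, slope b, and sigma.

    Assumed priors: flat on a and b, Uniform(0, sigma_max) on sigma.
    """
    a, b, sigma = params
    if sigma <= 0.0 or sigma >= sigma_max:
        return -math.inf  # outside the support of the uniform prior
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return -len(y) * math.log(sigma) - sse / (2.0 * sigma ** 2)


def metropolis(x, y, n_iter=20000, step=(0.05, 0.04, 0.04), seed=1):
    """Random-walk Metropolis sampler that updates (a, b, sigma) jointly."""
    rng = random.Random(seed)
    current = [0.0, 0.0, 1.0]
    current_lp = log_posterior(current, x, y)
    samples = []
    for _ in range(n_iter):
        # Symmetric normal proposal, so the Hastings correction cancels.
        proposal = [c + rng.gauss(0.0, s) for c, s in zip(current, step)]
        proposal_lp = log_posterior(proposal, x, y)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(tuple(current))
    return samples


def posterior_means(samples, burn_in=5000):
    """Average each parameter over the samples kept after burn-in."""
    kept = samples[burn_in:]
    return tuple(sum(s[i] for s in kept) / len(kept) for i in range(3))


if __name__ == "__main__":
    x, y = simulate_data()
    draws = metropolis(x, y)
    print(posterior_means(draws))
```

The covariate is centered in the simulation so that the intercept and slope are nearly uncorrelated in the posterior, which lets a simple joint random-walk proposal mix reasonably well. The step sizes were chosen for this simulated data set and would need tuning for other data.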