...

  • – Organizational meeting

    • develop list of topics for the semester

    • pair and share – break out into smaller groups and share what you have been up to with respect to data science, what you’ve been wanting to learn, and learn about other people in the group.

  • Lars Kotthoff will do a brief intro to mlr3pipelines

  • Dylan Perkins (ARCC/End User Support): Intro to shared computing at UW – Teton compute and storage resources (the video is below; the slides are at https://docs.google.com/presentation/d/145AVEOLHi22CPn0IwpLkVCNWJ7ZzGJq1fZDWt4IBE8I/edit?usp=sharing).

    • As a follow-up, please contact arcc-info@uwyo.edu with questions.

    • Please drop them a line if you are interested in helping test the new browser-based graphical interface to the Teton compute resources that they are developing. They would appreciate having several people test the system.

      dylan_perkins_ARCC_15sept21.mp4
  • – Hands-on Teton

    • Tasks you would like help with or demonstrated, from entry-level to advanced

      • We did one big group screen share to show:

        • how to configure SSH so that we only need to provide our Teton password and two-factor validation (2FA) once per session

        • how to launch an interactive SLURM session to do some text wrangling with UNIX command-line tools

        • two ways of using text editors to write bash scripts that can be executed on Teton; this included making a SLURM-compliant script that we submitted with sbatch

        • how our script used /dev/shm, /lscratch, and /gscratch for simulation output, and how to move data home and clean up after yourself at the end

      • We did not get to demo how to install R packages yourself. Instead, I started a Knowledge Base entry on this, which you are welcome to add to, edit, and improve.
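The script demonstrated in the session was a bash/SLURM batch script; the sketch below re-expresses only its "work in scratch, copy results home, clean up" pattern in Python. The names `run_with_scratch` and `simulate` are hypothetical, and the scratch root (for example `/dev/shm` or `/lscratch` from the demo) is an assumption to check against ARCC's documentation. If the given directory does not exist, the sketch falls back to the system temp directory.

```python
import os
import shutil
import tempfile
from pathlib import Path


def run_with_scratch(simulate, results_dir, scratch_root=None):
    """Run `simulate` in a private scratch directory, copy the files it writes
    into `results_dir`, and always remove the scratch directory afterwards.

    `simulate` is a hypothetical callable that takes the scratch Path and
    writes its output files there.
    """
    # Assumed scratch root, e.g. "/dev/shm" (RAM-backed) or a node-local path;
    # fall back to the system temp directory if it is missing.
    if scratch_root is None or not os.path.isdir(scratch_root):
        scratch_root = tempfile.gettempdir()
    scratch = Path(tempfile.mkdtemp(prefix="sim_", dir=scratch_root))
    results = Path(results_dir)
    results.mkdir(parents=True, exist_ok=True)
    try:
        simulate(scratch)
        copied = []
        for f in scratch.iterdir():
            if f.is_file():
                shutil.copy2(f, results / f.name)  # move the data "home"
                copied.append(f.name)
        return sorted(copied)
    finally:
        # Clean up after yourself, even if the simulation raised an error.
        shutil.rmtree(scratch, ignore_errors=True)
```

For example, `run_with_scratch(my_sim, "results", "/dev/shm")` would run a hypothetical `my_sim` in `/dev/shm` and return the names of the files it copied into `results`. The `try`/`finally` is the key design choice: scratch space gets cleaned up whether or not the job succeeds.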

  • – Short demonstration of LaTeX as implemented in Overleaf, followed by a hands-on session for participants to sign up for a free account, make documents with one or more templates, and ask questions of more experienced users. Alex Buerkle will do the initial demo and will ask for helpers to assist others during the hands-on session.

  • – an introduction to Bayesian modeling. Eryn McFarlane and Topher Weiss-Lehman will discuss the basis of Bayesian thinking and talk about why one might want to use Bayesian methods.

  • – an introduction to computational Bayesian modeling (Andrew Siefert, Joshua Harrison). Why do computers help when doing Bayesian statistics? What do sampling and convergence mean? A high-level overview of the different tools one can use to do Bayesian statistics, finishing with an illustration of a model specified with R and Stan so that folks can get an idea of the modeling process.

    • Animation of samplers for Bayesian modeling: https://chi-feng.github.io/mcmc-demo/app.html

    • HERE is a git repo that has the code for the mini-talk that Josh gave. We can keep posting Bayesian stuff here if we want. Feel free to submit pull requests. If you have not used git, you can go to that link, view the different files, and download them as you like.
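To make "sampling" and "convergence" concrete, here is a minimal Python sketch; it is not taken from Josh's repo. It swaps in random-walk Metropolis, the simplest MCMC sampler (Stan itself uses Hamiltonian Monte Carlo via NUTS), on a standard normal target standing in for a posterior. It runs several chains from scattered starting points, discards warm-up draws, and computes the classic Gelman-Rubin R-hat, where values near 1 suggest the chains agree. Stan reports a refined, split version of this statistic. All function names here are hypothetical.

```python
import math
import random


def log_target(x):
    """Unnormalized log density of a standard normal, standing in for a posterior."""
    return -0.5 * x * x


def metropolis(log_p, n_iter, start, step, rng):
    """Random-walk Metropolis: propose x + Normal(0, step) and accept it with
    probability min(1, p(proposal) / p(current)); otherwise repeat the current value."""
    x, lp = start, log_p(start)
    draws = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_p(proposal)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        draws.append(x)
    return draws


def run_chains(n_chains=4, n_draws=4000, warmup=1000, step=1.0, seed=1):
    """Run several chains from scattered starting points and drop the warm-up draws."""
    rng = random.Random(seed)
    chains = []
    for _ in range(n_chains):
        start = rng.uniform(-10.0, 10.0)
        draws = metropolis(log_target, warmup + n_draws, start, step, rng)
        chains.append(draws[warmup:])
    return chains


def r_hat(chains):
    """Classic (non-split) Gelman-Rubin potential scale reduction factor."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


chains = run_chains()
pooled_draws = [x for c in chains for x in c]
print("posterior mean ~", round(sum(pooled_draws) / len(pooled_draws), 3))
print("R-hat =", round(r_hat(chains), 3))
```

Running `r_hat(run_chains())` should give a value very close to 1, whereas chains stuck in different regions (e.g. `[[0, 1, 0, 1], [10, 11, 10, 11]]`) give a value far above 1. The animation linked above shows the same propose/accept behavior visually.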

  • – Breakout groups for hands-on and Q&A regarding Bayesian models for parameter estimation and inference. Request a group below. We’ll add one or two more at the beginning of the meeting.

    • Bayesian hierarchical modeling using brms (an R interface that uses lme4-style model formulas to create and run models in Stan)

    • More about specifying models in Stan

    • More about model specification itself, before one writes code to implement the model.

  • – Regular expressions tutorial and practice – Alex will lead and we will do hands-on work in breakout groups

    • We will use R and https://regexr.com as sandboxes for learning

    • Regular expressions are a tool for parsing, modifying, and wrangling text and are available in many programming languages, command-line tools, and text editors.
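As a warm-up for the sandboxes above, here is a small Python sketch of three everyday regex tasks: pulling named fields out of a filename, reordering dates with capture groups and backreferences, and normalizing whitespace. The file-naming scheme and the helper names are hypothetical. Most of this syntax carries over to R's `grepl()`, `sub()`, and `gsub()` and to regexr.com, though details such as named groups differ between regex engines.

```python
import re

# Hypothetical naming scheme: <site>_<plant number>_rep<replicate>.<fastq|csv>
FILENAME = re.compile(
    r"(?P<site>[A-Za-z]+)_(?P<plant>\d+)_rep(?P<rep>\d+)\.(?P<ext>fastq|csv)"
)


def parse_name(name):
    """Return the named fields as a dict, or None if `name` does not fit the scheme."""
    match = FILENAME.fullmatch(name)  # the whole string must match, not just a part
    return match.groupdict() if match else None


def us_to_iso(text):
    """Rewrite MM/DD/YYYY dates as YYYY-MM-DD using capture groups and backreferences."""
    return re.sub(r"\b(\d{2})/(\d{2})/(\d{4})\b", r"\3-\1-\2", text)


def tidy_whitespace(text):
    """Collapse runs of spaces and tabs into a single space and trim the ends."""
    return re.sub(r"[ \t]+", " ", text).strip()


for name in ["siteA_12_rep3.fastq", "notes.txt"]:
    print(name, "->", parse_name(name))
print(us_to_iso("sampled 09/15/2021"))
print(tidy_whitespace("  a \t b   c "))
```

Using `fullmatch` rather than `search` is deliberate: it rejects names that merely contain a matching fragment, which is usually what you want when validating filenames.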

  • – Reproducible research with R, Git, LaTeX, etc. Jessi Rick (others are welcome to join in)

    • Let Jessi know if you have workflows/ideas that you’d like to add to the discussion

  • – …

...

  • More with mlr3 - I’ve (J. Harrison) been using the software for simple ML tasks on manageable data, but am curious how to scale up to larger data and whether mlr3 is even the right tool for that task.

  • Scaling up machine learning. Would like to do a hands-on project on some really large data (leaving this vague, so we can find something useful to the broader group).

  • A more introductory overview of machine learning for beginners. Maybe someone can give some background for part of a group meeting, and several of us can share how we are implementing machine learning (or would like to) in our own research? Joshua Harrison

  • Text mining intro - a basic primer of questions and tools and maybe a follow-up if anyone is interested in digging deeper. Maybe with a focus on how the methods have been applied to biological questions, rather than social science questions.

  • (we did this) Machine learning pipelines – I (LK) could do a brief intro to mlr3pipelines (https://mlr3pipelines.mlr-org.com).

  • Bayesian methods that don’t involve MCMC: variational inference, INLA, ABC

  • (we did this) A primer of what Bayesian stats are and why one might want to use them.

  • Bayesian machine learning/neural networks

  • (we did this) Stan (super basic, please?)

  • Overview of various statistical tests (when to use what for what purpose, etc.)

  • Practicing with loops and apply functions in R

  • Regular expressions tutorials/practice

  • Approximate Bayesian computation, see https://www.pnas.org/content/104/6/1760

  • Parallelization in R

  • Intermediate bash tips & tricks

  • Bayesian multilevel modeling (using the brms R package or other methods)

  • Nonlinear modelling (frequentist and Bayesian approaches)

  • Compositional data analysis, potentially as applied to microbiome data

  • Functional programming in R

  • Replicable data cleaning and manipulation

  • Intro to replicability in data science (e.g., GitHub)

    • Best practices/how-to for sharing code, data, etc. (e.g., on GitHub)

  • Database management

  • Webscraping

  • Interactive plotting

  • Spatial analysis

  • A primer of Overleaf, LaTeX, R Markdown, and integration of these tools.

  • Math for machine learning/statistics: linear algebra, probability, calculus

  • Spatial data

  • Series of short (5-10 minute) presentations by someone on their research with Q&A afterwards

  • Effective story-telling and visualization for scientific research with applications in R

  • How to make a dashboard, with an example for hands-on learning.

  • General tools for API calls in Python or R.

  • Machine learning collaborative group (see HERE for more)

  • Causal inference (see this page if you’re interested in a reading group)

...