In this working group we learn about methods and ideas for confronting models with data, and about data science more generally. Everyone is welcome and encouraged to attend.

Meeting schedule and topics:

In the Spring 2023 semester, we will meet in person again, Fridays at 1 p.m. in EN 2101.

Spring 2023

Fall 2022

We’re going to try something different this semester and have a series of presentations from faculty and students across UW to see the breadth of data science (in the widest sense). The format is a short-ish talk/demo, followed by questions – we want this to be as interactive as possible! In the Fall 2022 semester, we met at 1 p.m. Fridays (Mountain Time) on Zoom.

If you’re interested in contributing a talk/demo/?, get in touch with Lars Kotthoff.

Spring 2022


We are inviting some speakers to join us this semester, with a few different potential topics including computer security, value systems associated with data sharing, data science in biotechnology companies, careers in data science, etc. If you have suggestions or requests, please add them here or in the wishlist below.

Wish list for topics

Please add and edit this list, including expanding the plan and volunteering to lead on one of these topics.


Fall 2021


Spring 2021

Below is R code to accompany the beta posterior distribution of an allele frequency in a population. It uses the closed-form conjugate solution P(p|x,n) ∝ P(x|p,n) * P(p): the product of a binomial likelihood and a beta prior is again a beta distribution, here with x = 160 of n = 200 alleles observed.

# posterior mean (x + a) / (n + a + b) for x = 160 of n = 200 observations
(160+1)/(160+1 + 200-160+1)  # expectation with P(p)=beta(1,1)
(160+0.1)/(160+0.1 + 200-160+0.1) # expectation with P(p)=beta(0.1,0.1)

p <- seq(0, 1, 0.001)
plot(p, dbeta(p, shape1=160+1, shape2=40+1), type="l")  # posterior under a beta(1,1) prior

# compare posteriors for 160/200 vs. 16/20 observations, with 95% credible intervals
par(mfrow=c(2,1))
plot(p, dbeta(p, shape1=160+1, shape2=40+1), type="l", xlim=c(0.5, 1), col="red")
abline(v=qbeta(c(0.025, 0.975), shape1=161, shape2=41), col="red")
abline(v=qbeta(c(0.025, 0.975), shape1=16+1, shape2=4+1), col="blue")
lines(p, dbeta(p, shape1=16+1, shape2=4+1), col="blue")

# the two priors, and the posterior under the flat beta(1,1) prior
par(mfrow=c(3,1))
plot(p, dbeta(p, 1, 1), type="l") # beta(1,1)
plot(p, dbeta(p, 0.1, 0.1), type="l")  # beta(0.1, 0.1)
plot(p, dbeta(p, 160+1, 200-160+1), type="l")
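As a quick sanity check on the conjugacy claim (a sketch reusing the same x = 160 of n = 200 example), the likelihood-times-prior product can be normalised numerically on a grid and compared to the closed-form beta density:

```r
# numerical check of conjugacy: binomial likelihood * beta(1,1) prior,
# normalised on a grid, should match dbeta with the updated parameters
p <- seq(0, 1, 0.001)
post.grid <- dbinom(160, 200, p) * dbeta(p, 1, 1)
post.grid <- post.grid / (sum(post.grid) * 0.001)  # normalise to a density
max(abs(post.grid - dbeta(p, 160+1, 200-160+1)))   # close to zero
```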
# tune two parameters of the rpart classification tree learner via grid search
require(rpart)

set.seed(1)  # fix the train/test split for reproducibility
train.indices = sample(1:150, 150 * 2/3)
test.indices = setdiff(1:150, train.indices)

# fit rpart with the given control parameters and return holdout accuracy
evalParams = function(...) {
    model = rpart(Species ~ ., iris[train.indices, ], ...)
    preds = predict(model, newdata = iris[test.indices, -5], type = "class")
    return(list(pars = list(...), acc = mean(preds == iris[test.indices, "Species"])))
}

pars = expand.grid(minbucket = 1:20, minsplit = 1:20)
res = lapply(1:nrow(pars), function(i) do.call(evalParams, as.list(pars[i,])))
best = which.max(sapply(res, function(x) x$acc))
res[[best]]
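Because the grid is exhaustive, the results can also be visualised as an accuracy surface over the two parameters. A self-contained sketch of the same tuning (the seed and split here are illustrative, not the ones above):

```r
require(rpart)

set.seed(42)  # illustrative seed for a reproducible split
train.indices = sample(1:150, 100)
test.indices = setdiff(1:150, train.indices)

pars = expand.grid(minbucket = 1:20, minsplit = 1:20)
acc = sapply(1:nrow(pars), function(i) {
    model = rpart(Species ~ ., iris[train.indices, ],
                  minbucket = pars$minbucket[i], minsplit = pars$minsplit[i])
    preds = predict(model, iris[test.indices, -5], type = "class")
    mean(preds == iris[test.indices, "Species"])
})

# accuracy surface; expand.grid varies minbucket fastest, so it fills the rows
image(1:20, 1:20, matrix(acc, nrow = 20), xlab = "minbucket", ylab = "minsplit")
```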

# ...and now with mlr/MBO
# adapted from https://mlrmbo.mlr-org.com/articles/supplementary/machine_learning_with_mlrmbo.html
require(mlr)
require(mlrMBO)

# tune same parameters
par.set = makeParamSet(
  makeIntegerParam("minbucket", 1, 20),
  makeIntegerParam("minsplit", 1, 20)
)

ctrl = makeMBOControl()
tune.ctrl = makeTuneControlMBO(mbo.control = ctrl, budget = 10)  # 10 evaluations
# tune on the built-in iris task with 3-fold cross-validation
res = tuneParams(makeLearner("classif.rpart"), iris.task, cv3, par.set = par.set, control = tune.ctrl)

plot(cummin(getOptPathY(res$opt.path)), type = "l", ylab = "mmce", xlab = "iteration")

Ideas for Spring 2021