In this working group we learn about methods and ideas for confronting models with data, and about data science generally.  Anyone is encouraged and welcome to attend.

In fall 2021, we will meet online again, using Zoom, Wednesdays 11-noon (Mountain Time).

This was based on a poll: http://whenisgood.net/gc33wpw. Fifty-seven people responded to the poll, which is a great response. Unfortunately, it also means that our best time works for 35 of us, while 22 have a conflict.

We can also consider a second weekly meeting slot for the many interested people who won’t be able to make this time. People are welcome to use these webpages on Confluence to organize additional interest groups.

Meeting schedule and topics:

Zoom link

Fall 2021

In the queue to place on the schedule:

Wish list for topics for fall 2021
Machine learning

Spring 2021

Below is R code to accompany the beta distribution of allele frequencies in a population. It uses the closed-form posterior P(p|x,n) ∝ P(x|p,n)*P(p): the product of a binomial likelihood and a beta prior is itself a beta distribution, beta(a + x, b + n - x) for a beta(a, b) prior with x successes in n trials.

# posterior mean of p after observing x = 160 successes in n = 200 trials
(160+1)/(160+1 + 200-160+1)        # expectation with P(p) = beta(1, 1)
(160+0.1)/(160+0.1 + 200-160+0.1)  # expectation with P(p) = beta(0.1, 0.1)

p <- seq(0, 1, 0.001)
plot(p, dbeta(p, shape1=160+1, shape2=40+1), type="l")  # posterior beta(161, 41)

# compare posteriors for n = 200 (red) and n = 20 (blue) with the same sample proportion
par(mfrow=c(2,1))
plot(p, dbeta(p, shape1=160+1, shape2=40+1), type="l", xlim=c(0.5, 1), col="red")
abline(v=qbeta(c(0.025, 0.975), shape1=160+1, shape2=40+1), col="red")  # 95% credible interval
abline(v=qbeta(c(0.025, 0.975), shape1=16+1, shape2=4+1), col="blue")
lines(p, dbeta(p, shape1=16+1, shape2=4+1), col="blue")

# two priors and the resulting posterior
par(mfrow=c(3,1))
plot(p, dbeta(p, 1, 1), type="l")        # flat prior, beta(1, 1)
plot(p, dbeta(p, 0.1, 0.1), type="l")    # beta(0.1, 0.1)
plot(p, dbeta(p, 160+1, 200-160+1), type="l")  # posterior beta(161, 41)
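The closed-form intervals above can also be sanity-checked by simulation: draw from the posterior with rbeta() and summarize the draws. A minimal sketch (the choice of 1e5 draws is arbitrary):

```r
# Monte Carlo check of the posterior beta(160 + 1, 40 + 1)
draws <- rbeta(1e5, shape1 = 160 + 1, shape2 = 40 + 1)
mean(draws)                       # close to the closed-form mean 161/202
quantile(draws, c(0.025, 0.975))  # compare with qbeta(c(0.025, 0.975), 161, 41)
```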
# tune two parameters of the rpart classification tree learner
require(rpart)

set.seed(1)  # for a reproducible train/test split
train.indices = sample(1:150, 150 * 2/3)
test.indices = setdiff(1:150, train.indices)

evalParams = function(...) {
    model = rpart(Species ~ ., iris[train.indices, ], ...)
    preds = predict(model, newdata = iris[test.indices, -5], type = "class")
    # list(...) keeps the parameter settings alongside the holdout accuracy
    list(pars = list(...), acc = mean(preds == iris[test.indices, "Species"]))
}

# grid search over all 400 combinations
pars = expand.grid(minbucket = 1:20, minsplit = 1:20)
res = lapply(1:nrow(pars), function(i) do.call(evalParams, as.list(pars[i, ])))
best = which.max(sapply(res, function(x) x$acc))
res[[best]]  # best setting found and its holdout accuracy
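The grid search above scores each setting on a single train/test split, so the accuracy estimate is noisy; one alternative is to cross-validate each parameter setting instead. A sketch in base R (the function name cvAcc and the choice of 5 folds are my own):

```r
library(rpart)

# k-fold cross-validated accuracy for one (minbucket, minsplit) setting
cvAcc = function(minbucket, minsplit, k = 5) {
    folds = sample(rep(1:k, length.out = nrow(iris)))
    accs = sapply(1:k, function(f) {
        fit = rpart(Species ~ ., iris[folds != f, ],
                    minbucket = minbucket, minsplit = minsplit)
        preds = predict(fit, newdata = iris[folds == f, -5], type = "class")
        mean(preds == iris$Species[folds == f])
    })
    mean(accs)  # average accuracy across the k folds
}

cvAcc(minbucket = 5, minsplit = 10)
```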

# ...and now with mlr/MBO
# adapted from https://mlrmbo.mlr-org.com/articles/supplementary/machine_learning_with_mlrmbo.html
require(mlr)
require(mlrMBO)

# tune same parameters
par.set = makeParamSet(
  makeIntegerParam("minbucket", 1, 20),
  makeIntegerParam("minsplit", 1, 20)
)

ctrl = makeMBOControl()
tune.ctrl = makeTuneControlMBO(mbo.control = ctrl, budget = 10)  # 10 evaluations total
res = tuneParams(makeLearner("classif.rpart"), iris.task, cv3,
                 par.set = par.set, control = tune.ctrl)

plot(cummin(getOptPathY(res$opt.path)), type = "l", ylab = "mmce", xlab = "iteration")
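Once tuning finishes, the selected hyperparameters and their cross-validated error can be read off the result and used to configure a final learner. A sketch continuing from the res object above:

```r
res$x  # best (minbucket, minsplit) found by MBO
res$y  # cross-validated mmce at that setting

# configure and fit a final model with the tuned values
tuned.lrn = setHyperPars(makeLearner("classif.rpart"), par.vals = res$x)
final.model = train(tuned.lrn, iris.task)
```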

Ideas for spring 2021