Machine Learning

Discussed at meeting on 06 September 2018; notes taken by Libby Megna


Machine Learning - What is it, and what can it offer biologists?

by Liz Mandeville

Learning Objectives:

  1. Define machine learning
  2. Understand the distinction between supervised and unsupervised machine learning
  3. Identify the limits of prediction and classification schemes

Definition: Machine learning enables computers to do tasks without being explicitly programmed for those tasks.

Neural networks are one type of machine learning, but they do not encompass the whole field

Machine learning includes:

  • linear regression
  • logistic regression
  • decision trees
  • support vector machines
  • naive Bayes
  • k-nearest neighbors
  • k-means
  • random forests
  • dimensionality reduction (e.g. PCA)

Machine learning algorithms are ubiquitous in commercial applications, e.g. Netflix, Facebook, email spam filters

Machine learning has lots of potential in biology: medical imaging (classifying MRI images or classifying cells) and wildlife management (classifying camera trap images; e.g. paper from UW Comp Sci dept (Clune? sp?))

Cool guide: A visual introduction to machine learning (super cool visualizations)

A model often doesn't perform as well on test data (data it has not seen) as it did on the training data it was fit to
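
A minimal sketch of that gap between training and test performance, using only the Python standard library. Everything here is made up for illustration: the two overlapping classes, the sample sizes, and the choice of a 1-nearest-neighbor classifier, which memorizes its training data and so scores perfectly on it.

```python
import random

def nearest_label(point, train_points, train_labels):
    """1-nearest neighbor: return the label of the closest training point."""
    best = min(
        range(len(train_points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, train_points[i])),
    )
    return train_labels[best]

def accuracy(points, labels, train_points, train_labels):
    """Fraction of points whose predicted label matches the true label."""
    hits = sum(
        nearest_label(p, train_points, train_labels) == y
        for p, y in zip(points, labels)
    )
    return hits / len(points)

def make_data(n, rng):
    """Hypothetical data: class 0 centered at (0, 0), class 1 at (1, 1), overlapping."""
    points, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        points.append((rng.gauss(y, 1.0), rng.gauss(y, 1.0)))
        labels.append(y)
    return points, labels

rng = random.Random(42)  # fixed seed so the run is repeatable
train_x, train_y = make_data(100, rng)
test_x, test_y = make_data(200, rng)

# Each training point is its own nearest neighbor, so training accuracy is perfect.
train_acc = accuracy(train_x, train_y, train_x, train_y)
# New points drawn from the same overlapping classes are harder.
test_acc = accuracy(test_x, test_y, train_x, train_y)

print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

Training accuracy says little about how the model will do on new data, which is why a held-out test set is used to evaluate it.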

Supervised vs. unsupervised machine learning

  • Supervised ML requires labeled training data
  • Unsupervised ML looks for patterns within the data without labeled categories
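
The two bullets above can be sketched side by side on the same toy data. This is only an illustration: the measurements and species names are hypothetical, the supervised method is a simple nearest-centroid classifier, and the unsupervised method is a bare-bones k-means with k = 2.

```python
# Hypothetical measurements (e.g. body length in mm) for two made-up species.
lengths = [10.1, 9.8, 10.5, 11.0, 19.7, 20.3, 21.0, 20.1]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]  # only the supervised method sees these

def mean(values):
    return sum(values) / len(values)

# Supervised: use the labels to learn one centroid per class.
def fit_centroids(xs, ys):
    return {label: mean([x for x, y in zip(xs, ys) if y == label]) for label in set(ys)}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Unsupervised: k-means with k = 2, which never looks at the labels.
def two_means(xs, iterations=10):
    centers = [min(xs), max(xs)]  # simple starting guess
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in xs]

centroids = fit_centroids(lengths, labels)
print(predict(centroids, 12.0))  # supervised answer is a named class

clusters = two_means(lengths)
print(clusters)  # unsupervised answer is an anonymous group number
```

The supervised method returns a named category because it was given names to learn from. The unsupervised method only reports which observations group together, and interpreting those groups is left to the biologist.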

Classifying candy ("organisms") activity - discussion points:

  • Which procedure was supervised and which unsupervised? (Procedure A was supervised, Procedure B was unsupervised)
  • How does sample size or variation in the training set affect the outcome? (If your training dataset is different from your test dataset, accuracy may decline.)
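
A small sketch of the second discussion point, with invented numbers: a size threshold is learned from one training population, then applied to a test set that resembles the training data and to one that has shifted (here, every individual is 5 units larger). The threshold rule and all values are assumptions for illustration only.

```python
def mean(values):
    return sum(values) / len(values)

def fit_threshold(small, large):
    """Learn a cutoff halfway between the two class means of the training data."""
    return (mean(small) + mean(large)) / 2

def accuracy(threshold, small, large):
    """Fraction classified correctly: 'small' below the cutoff, 'large' at or above it."""
    correct = sum(x < threshold for x in small) + sum(x >= threshold for x in large)
    return correct / (len(small) + len(large))

# Hypothetical training data for two size classes.
train_small, train_large = [10, 11, 12, 13], [20, 21, 22, 23]
threshold = fit_threshold(train_small, train_large)

# Test set that resembles the training data.
similar_acc = accuracy(threshold, [10.5, 12.5], [20.5, 22.5])
# Test set from a population where everything is 5 units larger.
shifted_acc = accuracy(threshold, [15.5, 17.5], [25.5, 27.5])

print(f"threshold: {threshold}")
print(f"accuracy on similar test set: {similar_acc}")
print(f"accuracy on shifted test set: {shifted_acc}")
```

The model itself has not changed between the two evaluations. Accuracy drops only because the shifted test set no longer looks like what the model was trained on.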

Connections to biological problems

  • Would phylogeny be a supervised or unsupervised algorithm? Could make arguments for both, but Liz argues that it is unsupervised because you don't know the actual evolutionary relationships before you start.
  • What about using DNA barcoding to identify thousands of individuals to species? DNA barcoding for animal species relies on using a few hundred base pairs of mitochondrial DNA to identify a species.
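
One way to think about the barcoding question: if a labeled reference library already exists, identification looks like supervised classification. The sketch below assigns a query to the reference with the fewest mismatches. The species names and 20-base sequences are invented (real barcodes are a few hundred base pairs), and a plain mismatch count on pre-aligned sequences stands in for whatever distance measure a real pipeline would use.

```python
# Hypothetical reference library: labeled sequences, already aligned to equal length.
reference_library = {
    "Species A": "ATGCCGTTAGCTAGGCTTAA",
    "Species B": "ATGCTGTTAGGTAGGATTAA",
    "Species C": "TTGCCGATAGCTTGGCTAAA",
}

def hamming(a, b):
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def identify(query, library):
    """Return the best-matching reference name and its mismatch count."""
    best = min(library, key=lambda name: hamming(query, library[name]))
    return best, hamming(query, library[best])

# Hypothetical query from an unidentified individual.
query = "ATGCCGTTAGCTAGGCTTAT"
species, mismatches = identify(query, reference_library)
print(species, mismatches)
```

A query from a species missing from the library would still be assigned to its nearest reference, so a real workflow would also need a cutoff on the number of mismatches before accepting an identification.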

Applying machine learning to biology

  • Defining your question well is essential
  • There is a potential trade-off between prediction and mechanistic understanding: you could get good predictions but have no idea of the underlying mechanisms
  • What are your goals, and what are your data like?

No shortage of easy-to-use ML tools in R

~~~

Random notes by Libby

I think this is a good read on the trade-off between predictive power and simplicity/mechanistic understanding: Breiman 2001

Deep learning is just a neural net with more layers


From Liz: Here is the presentation PDF and the candy-based exercise. Note that these materials were prepared for a teaching demo I had to do for a job interview.