Machine Learning

Discussed at meeting on 06 September 2018; notes taken by Libby Megna


Machine Learning - What is it, and what can it offer biologists?

by Liz Mandeville

Learning Objectives:

  1. Define machine learning
  2. Understand the distinction between supervised and unsupervised machine learning
  3. Identify the limits of prediction and classification schemes

Definition: Machine learning enables computers to do tasks without being explicitly programmed for those tasks.

Neural networks are one type of machine learning, but they do not encompass the whole field.

Machine learning includes:

  • linear regression
  • logistic regression
  • decision trees
  • support vector machines
  • naive Bayes
  • k-nearest neighbors
  • k-means
  • random forests
  • dimensionality reduction (e.g. PCA)

Machine learning algorithms are ubiquitous in commercial applications, e.g. Netflix, Facebook, email spam filters

Machine learning has lots of potential in biology: medical imaging (classifying MRI images or classifying cells) and wildlife management (classifying camera trap images; e.g. a paper from the UW Comp Sci dept (Clune? sp?))

Cool guide: A visual introduction to machine learning (super cool visualizations)

A model often does not perform as well on test data as it did on the training data
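
The gap between training and test performance can be sketched with a small example. The data below are invented for illustration, and the classifier is a 1-nearest-neighbor rule chosen only because it memorizes its training set; none of this comes from the presentation itself.

```python
# Toy 1-D data (invented for illustration): (measurement, label).
# Two training points carry labels that go against the general trend.
train = [(1.0, "A"), (2.0, "A"), (2.5, "B"), (3.0, "A"),
         (4.0, "B"), (4.5, "A"), (5.0, "B"), (6.0, "B")]
test = [(1.2, "A"), (2.4, "A"), (3.2, "A"),
        (3.9, "B"), (4.6, "B"), (5.8, "B")]


def predict_1nn(train_data, x):
    """Return the label of the training example closest to x."""
    _, label = min(train_data, key=lambda pair: abs(pair[0] - x))
    return label


def accuracy(train_data, data):
    """Fraction of examples in `data` whose label is predicted correctly."""
    correct = sum(predict_1nn(train_data, x) == label for x, label in data)
    return correct / len(data)


train_acc = accuracy(train, train)  # each point is its own nearest neighbor
test_acc = accuracy(train, test)    # new points can land next to the odd labels
print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

Training accuracy is perfect here only because every training point is its own nearest neighbor. The test points that fall next to the two atypical training labels are misclassified, so test accuracy is lower.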

Supervised vs. unsupervised machine learning

  • Supervised ML requires labeled training data
  • Unsupervised ML looks for patterns within the data without labeled categories
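
A minimal sketch of the distinction, using made-up 2-D measurements. The supervised part is a nearest-centroid classifier that needs the labels; the unsupervised part is a bare-bones k-means with k = 2 that never sees them. Both are illustrative choices, not methods named in the talk, and the function names are hypothetical.

```python
from math import dist

# Toy 2-D measurements (invented for illustration): two loose groups.
points = [(1.0, 1.2), (1.3, 0.9), (0.8, 1.1),
          (4.0, 4.2), (4.3, 3.9), (3.8, 4.1)]
labels = ["A", "A", "A", "B", "B", "B"]  # used only by the supervised part


def mean_point(pts):
    """Component-wise mean of a non-empty list of 2-D points."""
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))


def fit_centroids(pts, labs):
    """Supervised: use the labels to compute one centroid per class."""
    return {lab: mean_point([p for p, l in zip(pts, labs) if l == lab])
            for lab in sorted(set(labs))}


def predict(centroids, p):
    """Assign a new point to the class with the closest centroid."""
    return min(centroids, key=lambda lab: dist(centroids[lab], p))


def kmeans_two_clusters(pts, iterations=10):
    """Unsupervised: split points into 2 clusters without any labels."""
    centers = [pts[0], pts[-1]]  # simple deterministic starting centers
    assignments = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        assignments = [min(range(2), key=lambda i: dist(centers[i], p))
                       for p in pts]
        # Move each center to the mean of its assigned points.
        for i in range(2):
            members = [p for p, a in zip(pts, assignments) if a == i]
            if members:
                centers[i] = mean_point(members)
    return assignments


centroids = fit_centroids(points, labels)
print(predict(centroids, (1.1, 1.0)))   # supervised: returns a named class
print(kmeans_two_clusters(points))      # unsupervised: returns cluster indices
```

The supervised model answers with one of the class names it was given. The clustering only returns arbitrary cluster indices, and it is up to the analyst to decide what, if anything, those groups mean.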

Classifying candy ("organisms") activity - discussion points:

  • Which procedure was supervised and which unsupervised? (Procedure A was supervised, Procedure B was unsupervised)
  • How does sample size or variation in the training set affect the outcome? (If your training dataset is different from your test dataset, accuracy may decline.)
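
The last discussion point can be illustrated with a hedged sketch. The sizes are invented, and the scenario (a second batch in which every item is larger) is an assumption made for the example, not part of the candy exercise as run. The classifier is a single size cut-off placed halfway between the two class means.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def fit_threshold(small_sizes, large_sizes):
    """Learn a cut-off halfway between the two class means."""
    return (mean(small_sizes) + mean(large_sizes)) / 2


def classify(size, threshold):
    """Label an item by comparing its size with the learned cut-off."""
    return "large" if size > threshold else "small"


def accuracy(samples, threshold):
    """Fraction of (size, label) samples classified correctly."""
    correct = sum(classify(size, threshold) == label
                  for size, label in samples)
    return correct / len(samples)


# Training data (invented): sizes for the two classes.
threshold = fit_threshold([1.0, 1.2, 1.4], [3.0, 3.2, 3.4])

# Test set drawn from the same range as the training data.
similar_test = [(1.1, "small"), (1.3, "small"),
                (3.1, "large"), (3.3, "large")]
# Test set from a hypothetical batch where everything is about 1.2 units larger.
shifted_test = [(2.3, "small"), (2.5, "small"),
                (4.3, "large"), (4.5, "large")]

similar_acc = accuracy(similar_test, threshold)
shifted_acc = accuracy(shifted_test, threshold)
print(f"similar test set: {similar_acc:.2f}")
print(f"shifted test set: {shifted_acc:.2f}")
```

The cut-off learned from the training data still separates the similar test set, but in the shifted batch the "small" items fall above it, so accuracy drops even though the classifier itself has not changed.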

Connections to biological problems

  • Would phylogeny be a supervised or unsupervised algorithm? Could make arguments for both, but Liz argues that it is unsupervised because you don't know the actual evolutionary relationships before you start.
  • What about using DNA barcoding to identify thousands of individuals to species? DNA barcoding for animal species relies on using a few hundred base pairs of mitochondrial DNA to identify a species.
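
One possible way to think about the barcoding question: if a reference library of sequences from known species already exists, identification resembles supervised classification, with the library acting as labeled training data. The sketch below assumes pre-aligned sequences of equal length and uses invented 12-base snippets and placeholder species names; real barcodes are a few hundred base pairs, and real pipelines are more involved than a Hamming-distance lookup.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical reference library: species name -> barcode snippet (invented).
reference = {
    "species_1": "ACGTACGTACGT",
    "species_2": "ACGTTCGAACGA",
    "species_3": "TTGTACGGACCT",
}


def identify(query, library):
    """Return the library species whose sequence is closest to the query."""
    return min(library, key=lambda name: hamming(query, library[name]))


# A query that differs from species_1 at a single position.
print(identify("ACGTACGTACGA", reference))
```

A lookup like this always returns the closest library entry, so an individual from a species missing from the library would still be assigned to something. That is one concrete limit of a classification scheme.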

Applying machine learning to biology

  • Defining your question well is essential
  • There is a potential trade-off between prediction and mechanistic understanding: you could get good predictions but have no idea of the underlying mechanisms
  • What are your goals, and what are your data like?

No shortage of easy-to-use ML tools in R

~~~

Random notes by Libby

I think this is a good read on the trade-off between predictive power and simplicity/mechanistic understanding: Breiman 2001

Deep learning is just a neural net with more layers


From Liz: Here is the presentation PDF and the candy-based exercise. Note that these materials were prepared for a teaching demo I had to do for a job interview.

