Machine Learning Overview

This page is a concise overview of topics in machine learning, with links to readings and other learning materials. Roughly, these topics are the union of topics covered in various ML books and courses.

This is a resource to help you get up to speed in various topics if you're trying to learn ML on your own or broaden your ML knowledge.

Books

Courses

Overview of Topics

This overview contains links to particular pages in textbooks, lectures, blog posts, and videos covering each topic, ordered roughly from easiest to hardest to understand, with videos listed at the end. In other words, for each topic, introductory material is listed first and more advanced material afterwards, although in some cases you may find the more advanced material easier to understand.

The blog posts and some of the videos are introductory and give the overall gist of a method, but may contain mathematical or conceptual errors. Videos that are recorded lectures should be reliable.

  • Introduction to Machine Learning MLBook p. 1-15 PML p. 1-28
    • “Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.” (Arthur Samuel, 1959)
  • Basic Machine Learning Concepts
    • Inductive Bias MLBook p. 39-45
    • Overfitting/Underfitting
    • Approximation error vs. estimation error (also called the bias-variance tradeoff) CIML p. 71-72, Bartlett notes
    • Features
    • Hyperparameters
    • Train/dev/test split
    • Don't look at the test data CIML p. 25 “Do not look at your test data. Even once. Even a tiny peek. Once you do that, it is not test data any more. Yes, perhaps your algorithm hasn’t seen it. But you have. And you are likely a better learner than your learning algorithm.”
  • Additional ML Topics
    • Generative vs Discriminative Classifiers
    • Bayesian statistics
    • MLE vs MAP estimation (and examples of MAP in machine learning) Blog
  • Classification
  • Regression
  • Practicalities
    • *Hyperparameters and Model Selection
      • Train/dev/test split Video
        • The most practical and principled way to select the model and hyperparameters is on a development set.
    • Feature Selection and Feature Engineering CIML p. 55-62
    • Regularization
      • Early-stopping
      • L2 regularization
      • L1 regularization
      • Pruning (for decision trees)
    • Evaluation
      • Accuracy
      • Precision, Recall, F1: Macro vs Micro averaging
      • *Area Under the Curve (AUC) Blog
      • Tests of Significance CIML p. 67-69
    • Data Resampling Methods
      • k-fold Cross Validation Be careful using this method on NLP datasets! Because NLP datasets are often non-IID, k-fold cross validation is generally not recommended on them (it can overestimate performance). A thoughtfully-chosen train/dev/test split is usually better.
      • *Bootstrap Resampling
      • Jackknife
    • Debugging ML CIML p. 69-71 Blog
  • Deep Learning
  • *Reinforcement Learning
  • Graphical Models
    • Bayesian Networks Bishop p. 360
      • Hidden Markov Models (HMMs)
    • Undirected Graphical Models (MRFs and CRFs) Bishop p. 383
      • Linear-chain Conditional Random Fields
    • Factor Graphs Bishop p. 399
    • Inference
      • Variable Elimination
      • Belief Propagation (Sum-Product and Max-Product Algorithms) Bishop p. 402-415
      • Junction Tree Algorithm
      • Loopy Belief Propagation Bishop p. 417-418
      • Variational Inference
    • Sampling Methods
  • Combining Models
    • Ensembling
    • Mixture of Experts
    • *Boosting
    • Bayesian Model Averaging
  • Unsupervised Methods
  • Structured Prediction
    • Structured Perceptron
    • Structured SVM
    • Conditional Random Fields (CRFs)
  • Probability and Statistics Background
    • Terminology
      • Probability Distribution (referred to as just a “Distribution”)
      • To sample from a probability distribution
      • Parameters
      • Random Variable
      • Independent
      • Independent and Identically Distributed (IID)
      • Joint Distribution
      • Marginal Distribution (referred to as just a Marginal). Also to marginalize
        • To compute a marginal, you marginalize (sum) over the other random variables
    • Probability Distributions: Uniform, Normal, Poisson, Binomial, etc.
    • *Bias-Variance Decomposition Lecture, Notes This is a statistics term, used when analyzing mean squared error in regression or density estimation, for example. In machine learning, the decomposition is more properly described as approximation error (≈ bias) and estimation error (≈ variance), because for non-numeric outputs, such as class labels in multi-class classification, you cannot compute a predictor's bias E[ŷ] − y or variance E[(ŷ − E[ŷ])²]. However, these terms are often applied to ML somewhat loosely.
    • Density Estimation
      • Histograms
      • Kernel Density Estimators
    • Gaussian Processes
  • Theory
    • Concept Learning
    • Hypothesis Space
    • Inductive Bias MLBook p. 39-45
    • Bias-Variance Tradeoff
    • VC dimension
    • NP-hardness of Learning
    • PAC Learning Theory
    • PAC-Bayesian Learning Theory
  • Information Theory Murphy p. 56-61
    • Entropy
    • Cross-entropy
    • Mutual Information
    • KL-Divergence
  • Software
    • R
    • scikit-learn
    • TensorFlow
    • PyTorch
    • NLTK
    • SpaCy
    • OpenCV
  • ML Glossary (glossary of slightly more advanced terms)
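The marginalization step described under Probability and Statistics Background ("to compute a marginal, you marginalize (sum) over the other random variables") can be sketched in a few lines of Python. This is a minimal illustration; the joint distribution values and variable names below are invented for the example:

```python
# Joint distribution P(Weather, Activity) stored as a dict.
# These probabilities are made up purely for illustration.
joint = {
    ("sunny", "walk"): 0.30,
    ("sunny", "read"): 0.10,
    ("rainy", "walk"): 0.05,
    ("rainy", "read"): 0.55,
}

def marginal(joint, axis):
    """Compute a marginal by summing the joint over the other variable(s)."""
    dist = {}
    for outcome, p in joint.items():
        key = outcome[axis]
        dist[key] = dist.get(key, 0.0) + p
    return dist

p_weather = marginal(joint, 0)   # P(Weather): sum over Activity
p_activity = marginal(joint, 1)  # P(Activity): sum over Weather
```

Each marginal sums to 1, since the joint does; summing out one variable simply pools its probability mass.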
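The macro vs. micro averaging distinction listed under Evaluation can likewise be illustrated with a short sketch. The helper names and toy labels below are my own, not taken from any of the linked readings; for real work, a library implementation (e.g. scikit-learn's metrics) would be used instead:

```python
def per_class_counts(y_true, y_pred, label):
    """True positives, false positives, false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn

def precision_recall(y_true, y_pred, average):
    labels = sorted(set(y_true) | set(y_pred))
    counts = [per_class_counts(y_true, y_pred, c) for c in labels]
    if average == "macro":
        # Macro: score each class separately, then average the scores,
        # so every class counts equally regardless of its frequency.
        precs = [tp / (tp + fp) if tp + fp else 0.0 for tp, fp, fn in counts]
        recs = [tp / (tp + fn) if tp + fn else 0.0 for tp, fp, fn in counts]
        return sum(precs) / len(labels), sum(recs) / len(labels)
    # Micro: pool the counts across classes, then compute one score,
    # so every individual prediction counts equally.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return tp / (tp + fp), tp / (tp + fn)
```

On an imbalanced dataset the two can differ substantially: micro-averaging is dominated by the frequent classes, while macro-averaging surfaces poor performance on rare classes.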
ml/ml_overview.1652865382.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
