====== History of ML ======

    * [[https://dl.acm.org/doi/10.1147/rd.21.0002|Friedberg 1958 - A Learning Machine: Part I]] January 1958.
    * Andrew, A. M., "Learning Machines", Proceedings of the Symposium on the Mechanisation of Thought Processes, H.M. Stationery Office, London, England, 1959. [[https://www.google.com/books/edition/Mechanisation_of_Thought_Processes/3Kk5yAEACAAJ?hl=en|vol1]] [[https://www.google.com/books/edition/Mechanisation_of_Thought_Processes/Ip8hAQAAIAAJ?hl=en|vol2]] Symposium held Nov 24-27, 1958. Cited by Carbonell p. 515.
    * [[http://gpbib.cs.ucl.ac.uk/gp-html/DBLP_conf_ifip_KilburnGS59.html|Kilburn, Grimsdale & Sumner 1959 - Experiments in Machine Learning and Thinking]] {{papers:Kilburn-1959.pdf|pdf}} (June 15-20, 1959). Cited by Carbonell p. 530. Work started in Feb 1957 on the Mark I computer. Perhaps the first mention of the term "**machine learning**" in a paper that has experiments on a computer. First instance of **genetic programming**?
    * [[http://www.cs.virginia.edu/~evans/greatworks/samuel1959.pdf|Samuel 1959 - Some Studies in Machine Learning Using the Game of Checkers]] (July, 1959).
    * [[https://www.sciencedirect.com/science/article/pii/S0019995859800140|Martens 1959 - Two notes on machine “Learning”]] {{papers:Martens-1959.pdf|pdf}}
  
===== Misc Topics =====

===== Other Methods (Prior to 2010) =====
  * Support Vector Machines
    * {{papers:multi-classsupportvectormachines.pdf|Weston & Watkins 1998 - Multi-class Support Vector Machines}} Tech Report CSD-TR-98-04 [[https://sites.google.com/site/jeisongutierrez/Multi-ClassSupportVectorMachines.pdf|link]]

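As a rough illustration of the multi-class hinge idea in Weston & Watkins 1998: every wrong class is penalized when its score comes within a unit margin of the true class's score. A hedged sketch (function name, weights, and data are mine, not from the tech report; see the paper for the exact formulation and the training objective):

```python
import numpy as np

def weston_watkins_loss(W, x, y):
    """Multi-class hinge loss in the spirit of Weston & Watkins (1998).

    Each wrong class k != y should score at least a unit margin below
    the true class y; each violation is penalized linearly.
    W: (num_classes, dim) weight matrix, x: (dim,) sample, y: true label.
    """
    scores = W @ x                       # one score per class
    margins = scores - scores[y] + 1.0   # margin violations, +1 margin
    margins[y] = 0.0                     # the true class incurs no loss
    return np.maximum(0.0, margins).sum()
```

With weights that separate class 0 along the first coordinate, a sample scored well above the other classes incurs zero loss, while mislabeling it produces a positive penalty.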
===== Optimization (prior to 1980) =====
Focusing on papers related to machine learning.
    * [[https://link.springer.com/article/10.1007/BF01074757|Ermol'ev & Shor 1968 - Method of random walk for the two-stage problem of stochastic programming and its generalization]] [[https://link.springer.com/content/pdf/10.1007/BF01074757.pdf|pdf]]
    * [[https://link.springer.com/article/10.1007/BF01071485|Ermol'ev & Tuniev 1968 - Direct methods of solving some stochastic programming problems]] [[https://link.springer.com/content/pdf/10.1007/BF01071485.pdf|pdf]] Cited by [[https://link.springer.com/article/10.1007/BF01071048|Guseva 1971]]. Minimization of a linear program with stochastic constraints.
    * **[[https://link.springer.com/article/10.1007/BF01071091|Ermol'ev 1969 - On the method of generalized stochastic gradients and quasi-Féjer sequences]]** [[https://link.springer.com/content/pdf/10.1007/BF01071091.pdf|pdf]] I believe this may be the introduction of SSGD (stochastic sub-gradient descent) as we know it. Calls sub-gradients "generalized gradient vectors," and calls the stochastic sub-gradient a "generalized stochastic gradient vector, or briefly, the stochastic quasi-gradient vector." Assumes convexity, since it assumes a sub-gradient exists. Cited by [[https://link.springer.com/article/10.1007/BF01071541|Nurminskii 1974]] and [[https://link.springer.com/article/10.1007/BF01071048|Guseva 1971]].
    * [[https://link.springer.com/article/10.1007/BF01071048|Guseva 1971 - Convergence rate of the method of generalized stochastic gradients]] [[https://link.springer.com/content/pdf/10.1007/BF01071048.pdf|pdf]] First proof of a rate of convergence for an SGD-like algorithm. (Have to look closer; I'm not sure it's actually SGD.) Assumes convexity. Notes: g(x) is the subgradient of f(x), called the reference function; it is called the support functional in [[https://www.researchgate.net/publication/265400868_A_General_Method_for_Solving_Extremum_Problems|Polyak 1967]] (citation [7]).
    * [[https://link.springer.com/article/10.1007/BF01071541|Nurminskii 1974 - Minimization of nondifferentiable functions in the presence of noise]] [[https://link.springer.com/content/pdf/10.1007/BF01071541.pdf|pdf]]
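The "generalized stochastic gradient" scheme these papers study can be sketched in a few lines: draw a random sample, step along a subgradient of that sample's convex (possibly nondifferentiable) loss, and shrink the step size over time. A minimal illustrative sketch, not taken from any of the papers above — one-dimensional regression under the absolute loss, whose subgradient is sign(residual) times the input:

```python
import random

def stochastic_subgradient_descent(data, steps=2000, step0=0.1):
    """Minimize the convex, nondifferentiable empirical loss
    f(w) = (1/n) * sum_i |w * x_i - y_i|
    by stepping along a subgradient evaluated at one random sample."""
    w = 0.0
    for t in range(1, steps + 1):
        x, y = random.choice(data)
        r = w * x - y
        # A subgradient of |r| w.r.t. w is sign(r) * x (0 is valid at r == 0).
        g = (1 if r > 0 else -1 if r < 0 else 0) * x
        w -= (step0 / t ** 0.5) * g   # diminishing step sizes
    return w
```

On data generated as y = 2x, the iterates hover around the minimizer w = 2; the diminishing step sizes damp the oscillation, which is the behavior the convergence-rate analyses above quantify.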
ml/history_of_ml.1657540264.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
