Machine learning & artificial neural networks
For cognitive context, see Computer Science and General culture in computer science & AI.
Machine learning
History
Introductory
Deisenroth, Faisal, Ong, Mathematics for machine learning
Strang, Linear algebra and learning from data
MacKay, Information theory, inference and learning algorithms
James et al, An introduction to statistical learning
Kearns and Vazirani, An introduction to computational learning theory
Devroye et al, A probabilistic theory of pattern recognition
Intermediate
Mitchell, Machine learning
Wilmott, Machine learning
Shalev-Shwartz & Ben-David, Understanding machine learning
Zhang, Mathematical analysis of machine learning algorithms
Mohri, Rostamizadeh & Talwalkar, Foundations of machine learning
Murphy, Machine learning: a probabilistic perspective
Bishop, Pattern recognition and machine learning
Hastie, Tibshirani & Friedman, The elements of statistical learning
Vapnik, The nature of statistical learning theory
Kaplan, Notes on contemporary machine learning for physicists
Advanced
Mumford & Desolneux, Pattern theory
Grohs & Kutyniok (eds), Mathematical aspects of deep learning
Huang, Statistical mechanics of neural networks
Singular Learning Theory
Watanabe, Mathematical Theory of Bayesian Statistics for Unknown Information Source
Watanabe, Mathematical theory of Bayesian statistics
Watanabe, Algebraic geometry and statistical learning theory
- For more references, see the student project on Singular learning theory and statistical inference.
Artificial neural networks
History
Domingos, The master algorithm
Schmitt, Blueprints for intelligence: a visual history of ANNs
Anderson & Rosenfeld, Talking nets: an oral history of neural networks
Wang & Raj, On the origin of deep learning
Schmidhuber, Deep learning in neural networks: an overview
Juergen Schmidhuber's homepage
Introductory
Classical
Rojas, Neural networks: a systematic introduction
Minsky & Papert, Perceptrons: an introduction to computational geometry
Anthony & Bartlett, Neural network learning: theoretical foundations
Murphy, Probabilistic machine learning: an introduction
Deep learning
Prince, Understanding deep learning
Calin, Deep learning architectures: a mathematical approach
Bishop & Bishop, Deep learning: Foundations and concepts
Goodfellow, Bengio & Courville, Deep learning
Roberts, Yaida & Hanin, The principles of deep learning theory (free draft)
Intermediate
Haykin, Neural networks and learning machines
Aggarwal, Neural networks and deep learning
Advanced
Bronstein, Bruna, Cohen & Velickovic, Geometric deep learning
Attention mechanism
IBM think | wiki | GfG | medium | adaline
Niu et al, A review of the attention mechanism of deep learning
Soydaner, Attention mechanism in neural networks: where it comes and where it goes
Guo et al, Attention mechanisms in computer vision: a survey
Sun et al, Efficient attention mechanisms for large language models: a survey
Hernandez & Amigo, Attention mechanisms and their applications to complex systems
Ruan & Zhang, Towards understanding how attention mechanism works in deep learning
...
Transformers
Vaswani et al, Attention is all you need
Phuong and Hutter, Formal algorithms for transformers
Lin et al, A survey of transformers
Fnu et al, Understanding the architecture of vision transformer and its variants: A review
Gumaan, Universal Approximation Theorem for a Single-Layer Transformer
Omidi et al, Memory-augmented transformers
GPTs:
...
MoE
Zhang et al, Mixture of Experts in Large Language Models
LLMs & RLLMs
Zhang et al, Survey of Large Language Models in Extended Reality
Liu et al, A Comprehensive Evaluation on Quantization Techniques for Large Language Models
Zhang et al, A Survey of Reinforcement Learning for Large Reasoning Models
LLM alignment, explainability & interpretability
Pan et al, A Survey on Training-free Alignment of Large Language Models
Palikhe et al, Towards Transparent AI: A Survey on Explainable Language Models
...
Multimodal LLMs
Jaegle et al, Perceiver: general perception with iterated attention
Jaegle et al, Perceiver IO: A general architecture for structured inputs and outputs
Song et al, How to bridge the gap between modalities
Bae et al, Graph perceiver IO: a general architecture for graph-structured data
Carolan et al, A review of multimodal large language and vision models
Kibria et al, Decoding the multimodal maze
Chen et al, A survey of multimodal hallucination evaluation and detection
Xu et al, MARS2025 challenge
...
Embodied
Sanghai & Brown, Advances in transformers for robotics applications: a review
Shao et al, Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey
Applications to science
Hu et al, A survey of scientific large language models
Wei et al, From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery
Ma et al, A Survey of Deep Learning for Geometry Problem Solving
...