Machine Learning and Neural Networks

Machine learning & artificial neural networks

For cognitive context, see Computer Science and General culture in computer science & AI.

Also see the Student project on Statistical Physics applied to SLT and BL.

Machine learning

History

Timeline | Brief

Introductory

Deisenroth, Faisal & Ong, Mathematics for machine learning

Strang, Linear algebra and learning from data

MacKay, Information theory, inference and learning algorithms

James et al, An introduction to statistical learning

Kearns & Vazirani, An introduction to computational learning theory

Devroye et al, A probabilistic theory of pattern recognition

Intermediate

Mitchell, Machine learning

Wilmott, Machine learning

Shalev-Shwartz & Ben-David, Understanding machine learning

Zhang, Mathematical analysis of machine learning algorithms

Mohri, Rostamizadeh & Talwalkar, Foundations of machine learning

Murphy, Machine learning: a probabilistic perspective

Bishop, Pattern recognition and machine learning

Hastie, Tibshirani & Friedman, The elements of statistical learning

Vapnik, The nature of statistical learning theory

Kaplan, Notes on contemporary machine learning for physicists

Advanced

Mumford & Desolneux, Pattern theory

Grohs & Kutyniok (eds), Mathematical aspects of deep learning

Huang, Statistical mechanics of neural networks

Singular Learning Theory

Watanabe, Review and Prospect of Algebraic Research in Equivalent Framework between Statistical Mechanics and Machine Learning Theory

Watanabe, Mathematical Theory of Bayesian Statistics for Unknown Information Source

Watanabe, Mathematical theory of Bayesian statistics

Watanabe, Algebraic geometry and statistical learning theory

For more references, see the student project on Singular learning theory and statistical inference.

Artificial neural networks

History

Domingos, The master algorithm

Schmitt, Blueprints for intelligence: a visual history of ANNs

Anderson & Rosenfeld, Talking nets: an oral history of neural networks

Wang & Raj, On the origin of deep learning

Schmidhuber, Deep learning in neural networks: an overview

Juergen Schmidhuber's homepage

Introductory

Classical

Rojas, Neural networks: a systematic introduction

Minsky & Papert, Perceptrons: an introduction to computational geometry

Anthony & Bartlett, Neural network learning: theoretical foundations

Murphy, Probabilistic machine learning: an introduction

Deep learning

Prince, Understanding deep learning

Calin, Deep learning architectures: a mathematical approach

Bishop & Bishop, Deep learning: Foundations and concepts

Goodfellow, Bengio & Courville, Deep learning

Roberts, Yaida & Hanin, The principles of deep learning theory (free draft)

Intermediate

Haykin, Neural networks and learning machines

Aggarwal, Neural networks and deep learning

Advanced

Bronstein, Bruna, Cohen & Velickovic, Geometric deep learning

Attention mechanism

IBM think | wiki | GfG | medium | adaline

Niu et al, A review of the attention mechanism of deep learning

Soydaner, Attention mechanism in neural networks: where it comes and where it goes

Guo et al, Attention mechanisms in computer vision: a survey

Sun et al, Efficient attention mechanisms for large language models: a survey

Hernandez & Amigo, Attention mechanisms and their applications to complex systems

Ruan & Zhang, Towards understanding how attention mechanism works in deep learning

...

Transformers

IBM think | wiki | eventum

Vaswani et al, Attention is all you need

Phuong & Hutter, Formal algorithms for transformers

Lin et al, A survey of transformers

Fnu et al, Understanding the architecture of vision transformer and its variants: A review

Gumaan, Universal Approximation Theorem for a Single-Layer Transformer

Ravindran, Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers

Omidi et al, Memory-augmented transformers

GPTs:

...

MoE

IBM think | wiki

Zhang et al, Mixture of Experts in Large Language Models

LLMs & RLLMs

Zhang et al, Survey of Large Language Models in Extended Reality

Liu et al, A Comprehensive Evaluation on Quantization Techniques for Large Language Models

Zhang et al, A Survey of Reinforcement Learning for Large Reasoning Models

LLM alignment, explainability & interpretability

Pan et al, A Survey on Training-free Alignment of Large Language Models

Palikhe et al, Towards Transparent AI: A Survey on Explainable Language Models

...

Multimodal LLMs

IBM think | GfG | medium

Jaegle et al, Perceiver: general perception with iterated attention

Jaegle et al, Perceiver IO: A general architecture for structured inputs and outputs

Song et al, How to bridge the gap between modalities

Bae et al, Graph perceiver IO: a general architecture for graph-structured data

Carolan et al, A review of multimodal large language and vision models

Kibria et al, Decoding the multimodal maze

Chen et al, A survey of multimodal hallucination evaluation and detection

Xu et al, MARS2025 challenge

...

Embodied

Sanghai & Brown, Advances in transformers for robotics applications: a review

Shao et al, Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

Applications to science

Hu et al, A survey of scientific large language models

Wei et al, From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery

Ma et al, A Survey of Deep Learning for Geometry Problem Solving

...

Local SEENET-MTP schools

Trans-Carpathian Student Circle