This is a collection of things I have read and either enjoyed or learned from, noted down for future reference. Some I have not read yet, but expect to be enjoyable or valuable.
Websites:
- rsrch space | varepsilon
- Main Content | Jeremy Kun
- colah’s blog
- projects | Bones
- Sorta Insightful | Alex Irpan
- Deep Generative Models | CS 326 Notes
- Lil’Log | Lilian Weng
- An Opinionated Guide to ML Research
- Neel Nanda
Books:
- Linear Algebra Done Right
- The Algorithm Design Manual
- Understanding Deep Learning
- Deep Learning - Foundations and Concepts
- An Infinite Descent into Pure Mathematics
- A Programmer’s Introduction to Mathematics
- Poor Charlie’s Almanack
- The Napkin
- Nonlinear Dynamics and Chaos
- Alice’s Adventures in a Differentiable Wonderland — Volume I, A Tour of the Land
Technical Blogs:
- jsomers.net | I should have loved biology
- An Intuition for Attention
- GPT in 60 Lines of NumPy
- The Annotated Transformer
- Neural network training makes beautiful fractals | Jascha’s blog
- I’m Switching Into AI Safety
- “How Do You Feel About Grad School?” and The 5 Year Update on Skipping Grad School (and Whether I’d Recommend It)
- Become a person who Actually Does Things
- An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
- Concrete Steps to Get Started in Transformer Mechanistic Interpretability
- Learning how to learn
- On Those Undefeatable Arguments for AI Doom
- How I got to OpenAI | agentydragon
- KL is All You Need
- Autodidax: JAX core from scratch
- Why Deep Learning Works Even Though It Shouldn’t | Ryan Moulton
- C++ Is An Absolute Blast
- Notes on China
- i sensed anxiety and frustration at NeurIPS’24 – Kyunghyun Cho
- ARC-AGI Without Pretraining | iliao2345
- attention is logarithmic, actually
Papers:
- Relational NN
- Minimum Description Length Principle
- Automatic Gradient Descent: Deep Learning without Hyperparameters
- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
- Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
- ACT
- VQ-VAE
- Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations
- Partially Stochastic Infinitely Deep Bayesian Neural Networks
Writing I enjoyed:
- The Colors Of Her Coat - by Scott Alexander
- escaping flatland: career advice for CS undergrads
- Do Ten Times as Much
Other:
Compilations: