Archives
- 21 Nov KV cache – The how not to waste your FLOPS starter
- 11 Nov Attention scores, Scaling and Softmax
- 01 Nov The Hidden Beauty of Sinusoidal Positional Encodings in Transformers
- 28 Oct Vanishing and exploding Gradients – A non-flat-earther's perspective.
- 25 Oct Teaching an AI to Drive a Taxi – A Friendly Guide to Q-Learning
- 22 Oct Recall and Precision – A Practical Case Against Memorization