Deep Learning (8)
- Speculative Decoding – Making Language Models Generate Faster Without Losing Their Minds
- Mixture of Experts – Scaling Transformers Without Breaking the FLOPs Bank
- Doing MORE to Consume LESS – Flash Attention V1
- A Beginner's Guide to Vision Language Models (VLMs)
- KV Cache – The "How Not to Waste Your FLOPs" Starter
- Attention Scores, Scaling, and Softmax
- The Hidden Beauty of Sinusoidal Positional Encodings in Transformers
- Vanishing and Exploding Gradients – A Non-Flat-Earther's Perspective