Machine Learning 16

Why Modern LLMs Dropped Mean Centering (And Got Away With It) Feb 22, 2026
The Epsilon Trap: When Adam Stops Being Adam Jan 17, 2026
Entropic Instruction Following: Does Semantic Coherence Help LLMs Follow Instructions? Dec 2, 2025
Elements Of Mechanistic Interpretability: From Observation to Causation Oct 26, 2025
SFT vs. DPO (/ RLHF) - A Visual Guide to What Your LLM Actually Learns Aug 30, 2025
Do You Need A Matryoshka Model? Jun 22, 2025
Chunking Strategies for RAG - Breaking Down Documents for Better Retrieval May 31, 2025
Speculative Decoding - Making Language Models Generate Faster Without Losing Their Minds Apr 21, 2025
Mixture of Experts – Scaling Transformers Without Breaking the FLOPS Bank Mar 16, 2025
Doing MORE To consume LESS – Flash Attention V1 Feb 8, 2025
KV cache – The how not to waste your FLOPS starter Nov 21, 2024
Attention scores, Scaling and Softmax Nov 11, 2024
The Hidden Beauty of Sinusoidal Positional Encodings in Transformers Nov 1, 2024
Vanishing and exploding Gradients – A non-flat-earther's perspective. Oct 28, 2024
Teaching an AI to Drive a Taxi – A Friendly Guide to Q-Learning Oct 25, 2024
Recall and Precision – A Practical Case Against Memorization Oct 22, 2024