inference (3)

Speculative Decoding - Making Language Models Generate Faster Without Losing Their Minds (Apr 21, 2025)
Guidance – Structuring your outputs is easier than you think (Jan 4, 2025)
KV cache – The how not to waste your FLOPS starter (Nov 21, 2024)