Speculative Decoding - Making Language Models Generate Faster Without Losing Their Minds
Speculative decoding speeds up autoregressive text generation by combining a small draft model with a larger verifier model. This two-step dance slashes latency while preserving quality, an essenti...