Chunking Strategies for RAG - Breaking Down Documents for Better Retrieval

A comprehensive guide to chunking strategies for Retrieval-Augmented Generation, from basic splitting to advanced semantic and agentic approaches.

May 31, 2025 Machine Learning, Deep-learning

Speculative Decoding - Making Language Models Generate Faster Without Losing Their Minds

Speculative decoding speeds up autoregressive text generation by combining a small draft model with a larger verifier model. This two-step dance slashes latency while preserving quality, an essenti...

Apr 21, 2025 Machine Learning, Deep-learning

Mixture of Experts – Scaling Transformers Without Breaking the FLOPS Bank

Mixture of Experts (MoE) lets you scale transformer models to billions of parameters without proportional compute costs. By selectively routing tokens through specialized experts, MoE achieves mass...

Mar 16, 2025 Machine Learning, Deep-learning

Doing MORE To consume LESS – Flash Attention V1

Flash Attention played a major role in making LLMs more accessible to consumers. This algorithm embodies how a set of what one might consider "trivial ideas" can come together and form a powerful s...

Feb 8, 2025 Machine Learning, Deep-learning

Guidance – Structuring your outputs is easier than you think

In this post, we explore how to simplify and optimize the output generation process in language models using guidance techniques. By pre-structuring inputs and constraining the output space, we can ...

Jan 4, 2025 Language Models, Optimization

A beginner's guide to Vision Language Models (VLMs)

The amount of visual data that we constantly ingest is massive, and our ability to function in an environment may greatly improve when we have access to this modality, thus being able to use it as a...

Dec 23, 2024 Deep-learning, Vision Language Models

Row of the contextualized representation needed for predicting the next token

KV cache – The how not to waste your FLOPS starter

You've probably heard of Transformers by now. They're everywhere, so much so that newborn babies are gonna start saying "Transformers" as their first word. This blog will explore an important co...

Nov 21, 2024 Machine Learning, Deep-learning

Mean and variance of the unscaled dot product with varying hidden dim

Attention scores, Scaling and Softmax

If you're familiar with the attention mechanism, then you know that before applying a softmax to the attention scores, we need to rescale them by a factor of $\frac{1}{\sqrt{D_k}}$ where $D_k$ is t...

Nov 11, 2024 Machine Learning, Deep-learning

The dot product of positional encoding with its transpose

The Hidden Beauty of Sinusoidal Positional Encodings in Transformers

In this blog we will shed light on a crucial component of the Transformer architecture that hasn't been given the attention it deserves, and you'll also get to see some pretty visualizations!

Nov 1, 2024 Machine Learning, Deep-learning

Small gradient means, with sigmoid and no layer norm

Vanishing and exploding Gradients – A non-flat-earther's perspective.

In this post we will explore how exploding and vanishing gradients may happen, and how normalization and a change of activation functions can help us deal with these issues.

Oct 28, 2024 Machine Learning, Deep-learning