Chunking Strategies for RAG - Breaking Down Documents for Better Retrieval
A comprehensive guide to chunking strategies for Retrieval-Augmented Generation, from basic splitting to advanced semantic and agentic approaches.
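As a taste of the simplest strategy the guide covers, here is a minimal sketch of fixed-size chunking with overlap; the function name and parameters are illustrative, not from the post itself:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks with overlap,
    so content straddling a chunk boundary appears in both
    neighboring chunks and remains retrievable."""
    chunks = []
    step = chunk_size - overlap  # how far each new chunk advances
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk shares its last `overlap` characters with the start of the next one, which is the basic trade-off fixed-size splitting makes: redundancy in exchange for not cutting context in half.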
Speculative decoding speeds up autoregressive text generation by combining a small draft model with a larger verifier model. This two-step dance slashes latency while preserving quality, an essenti...
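The draft-then-verify dance can be sketched in a few lines; this is a toy greedy version with made-up `draft_next`/`verify_next` callables standing in for the two models, not the post's actual implementation:

```python
def speculative_step(draft_next, verify_next, prefix, k=4):
    """One round of greedy speculative decoding: the cheap draft
    model proposes k tokens, the expensive verifier checks them
    and keeps the longest agreeing prefix plus one of its own
    tokens (a correction on mismatch, a bonus token otherwise)."""
    proposal = list(prefix)
    for _ in range(k):
        proposal.append(draft_next(proposal))
    accepted = list(prefix)
    for i in range(len(prefix), len(proposal)):
        target = verify_next(proposal[:i])  # verifier's own pick here
        if proposal[i] == target:
            accepted.append(proposal[i])    # draft guessed right
        else:
            accepted.append(target)         # verifier's correction
            break
    else:
        accepted.append(verify_next(accepted))  # all k accepted: bonus token
    return accepted
```

When draft and verifier agree, one verifier pass yields k+1 tokens instead of 1, which is where the latency win comes from.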
Mixture of Experts (MoE) lets you scale transformer models to billions of parameters without proportional compute costs. By selectively routing tokens through specialized experts, MoE achieves mass...
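The routing idea reduces to a small gating computation; this toy top-k router over raw score lists (names and shapes are illustrative) shows why only a few experts run per token:

```python
import math

def route_tokens(token_scores, top_k=2):
    """Toy MoE gating: for each token, pick the top_k experts by
    router score and weight them by a softmax over just those
    selected scores. Only the chosen experts would run a forward
    pass, which is how MoE decouples parameter count from compute."""
    assignments = []
    for scores in token_scores:
        ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
        chosen = ranked[:top_k]
        exps = [math.exp(scores[e]) for e in chosen]
        total = sum(exps)
        assignments.append([(e, w / total) for e, w in zip(chosen, exps)])
    return assignments
```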

In this post, we explore how to simplify and optimize the output generation process in language models using guidance techniques. By pre-structuring inputs and restraining the output space, we can ...
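The core mechanism behind restraining the output space is just masking: before sampling, every token outside the allowed set gets its logit set to negative infinity. A minimal sketch (function name and toy logits are assumptions, not the post's code):

```python
def constrained_greedy(logits, allowed_ids):
    """Greedy token choice with the output space restricted to
    allowed_ids: disallowed tokens are masked to -inf, so the
    argmax can only land on a permitted token."""
    masked = [v if i in allowed_ids else float("-inf")
              for i, v in enumerate(logits)]
    return max(range(len(masked)), key=masked.__getitem__)
```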
The amount of visual data that we constantly ingest is massive, and our ability to function in an environment may greatly improve when we have access to this modality, thus being able to use it as a...

In this post we will explore how exploding and vanishing gradients may happen, and how normalization and a change of activation functions can help us deal with these issues.
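One way to see the vanishing-gradient half of the story numerically: the sigmoid's derivative is at most 0.25, so backpropagating through a deep stack multiplies many factors ≤ 0.25 together. A tiny illustration (not from the post):

```python
import math

def sigmoid_grad_chain(depth, x=0.0):
    """Product of sigmoid derivatives through `depth` layers, all
    evaluated at pre-activation x. Since sigmoid'(x) <= 0.25, the
    product shrinks exponentially with depth — the classic
    vanishing-gradient picture that motivates ReLU and normalization."""
    s = 1.0 / (1.0 + math.exp(-x))
    d = s * (1.0 - s)  # sigmoid'(x), maximized at x=0 where it equals 0.25
    return d ** depth
```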

Let's kick start your journey into reinforcement learning with a cool taxi-driving simulation! You'll get hands-on with Q-learning, starting from random exploration all the way to nailing it. Plus,...
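The whole Q-learning loop hinges on one update rule, Q(s,a) ← Q(s,a) + α(r + γ·maxₐ′ Q(s′,a′) − Q(s,a)); here it is as a standalone tabular update (the table layout and parameter defaults are illustrative, not the post's exact code):

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update. Q is a list of per-state
    lists of action values; the TD target bootstraps from the best
    action value available in the next state."""
    best_next = max(Q[next_state])
    td_target = reward + gamma * best_next
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]
```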
A dissection of a system built for failure through the lens of an interview