Tags: Activation Patching (1), Adam (1), Chunking (1), computational efficiency (1), Context Window (1), contrastive learning (1), convergence (1), DPO (1), efficiency (2), Embeddings (2), Evaluation (1), Fine-Tuning (1), flops (1), gating (1), gradients (1), grammar (1), guidance (1), gymnasium (1), hardware (1), Indexing (1), inference (3), Instruction Following (1), Interpretability (1), interview (1), LLM (4), LLMs (2), math (3), Mechanistic Interpretability (1), memory (1), metrics (1), MoE (1), NLP (5), normalization (2), Optimization (1), optimizations (1), positional embedding (1), Probing (1), python (4), PyTorch (1), q-learning (1), RAG (1), Retrieval (1), RLHF (1), sampling (1), scaling (1), SFT (1), speculative decoding (1), structured generation (1), training stability (1), transformers (7), Transformers (2), tutorial (7), vision (1)