Tags: Activation Patching (1), Adam (2), Chunking (1), computational efficiency (1), Context Window (1), contrastive learning (1), convergence (1), DPO (1), efficiency (2), Embeddings (2), Evaluation (1), Fine-Tuning (1), flops (1), gating (1), Geometry (1), gradients (1), grammar (1), guidance (1), gymnasium (1), hardware (1), Indexing (1), inference (3), Instruction Following (1), Interpretability (1), interview (1), LLM (5), LLMs (2), math (3), Mechanistic Interpretability (1), memory (1), Meta-Learning (1), metrics (1), MoE (1), NLP (6), normalization (2), Normalization (1), Optimization (2), optimizations (1), positional embedding (1), Probing (1), python (4), PyTorch (2), q-learning (1), RAG (1), Retrieval (1), RLHF (1), sampling (1), scaling (1), SFT (1), speculative decoding (1), structured generation (1), training stability (1), transformers (7), Transformers (3), tutorial (7), vision (1)