
Elements of Mechanistic Interpretability: From Observation to Causation
We strip mechanistic interpretability down to three key experiments: watching a model 'think', finding where it stores concepts, and performing 'causal surgery' to change its 'thought process'.



