Deep dives into LLM architecture, post-training, and the bridge between research and production.
Why on-policy self-distillation is a natural complement to standard RL, and how SDPO, OPSD, and GLM-5 implement it.
Why standard SFT causes catastrophic forgetting, and how contextual on-policy self-distillation fixes it, explained from first principles.
How PLE, STEM, Engram, and LongCat decouple knowledge storage from reasoning by moving static facts into embedding modules.