writing
Interactive essays, research write-ups, and the occasional exploration.
Where Hyperbolic Geometry Helps (and Where It Doesn’t)
Summer 2026 · Representation learning · Hyperbolic embeddings
A write-up of our prototype-embedding study on WikiArt. Across 150 runs and three reference trees, hyperbolic prototypes preserve local hierarchy in a way that matched Euclidean prototypes don’t, while the Euclidean prototypes come out ahead on classification and global tree fidelity. The result is a local-vs-global trade-off.
Steering Mixture-of-Experts Models by Choosing the Route
Spring 2026 · Alignment · MoE routing
A continuation of activation steering: in sparse MoE models, behavior can be steered by biasing router decisions toward or away from behavior-linked experts.
Making Concept Steering Actually Work
Spring 2026 · Alignment · Concept steering
Reading Davarmanesh, Wilson, and Radhakrishnan on attention-guided feature learning, and why better measurement (token choice, soft labels, and enrichment-based layer selection) may matter more than fancier steering algorithms.
Steering Large Language Models with Concept Vectors
Spring 2026 · Alignment · Concept steering
A reflection on Beaglehole et al. (Science, 2026): concept vectors, steering, and steerable oversight.
What Survives Post-Training Inside a Language Model?
Spring 2026 · Alignment · Mechanistic interpretability
A mechanistic walkthrough of Du et al. (COLM 2025): which internal mechanisms persist under post-training (knowledge, truthfulness) and which get rewritten (refusal).
Open Problems in Mechanistic Interpretability
Spring 2026 · Alignment · Mechanistic interpretability
An interactive summary of Sharkey, Chughtai et al. (2025): decomposition, sparse dictionary learning (and its eight limitations), circuit discovery, scalable oversight, and where I'm staking my own research direction toward alignment.
What a Semester of Graduate Game Theory Taught Me
Spring 2026 · Game theory
Nash equilibrium and its discontents, the refinement ladder, monotone comparative statics, supermodular games, potential games, and where the analogies to alignment hold up (and where they don't).
A Game-Theoretic View of High-Skill Tech Hiring
Fall 2025 · Game theory · Labor markets
Why recruiting researchers at labs like Midjourney looks less like a job board and more like a repeated auction with hidden values. Includes interactive visualizations of payoff matrices, mixed strategies, and reputation dynamics.