writing

Interactive essays, research write-ups, and the occasional exploration.

Where Hyperbolic Geometry Helps (and Where It Doesn’t)

Summer 2026 · Representation learning · Hyperbolic embeddings

A write-up of our prototype-embedding study on WikiArt. Across 150 runs and three reference trees, hyperbolic prototypes preserve local hierarchy in a way that matched Euclidean prototypes don’t, while classification and global tree fidelity go the other way: a local-vs-global trade-off.

read · code · pdf

Steering Mixture-of-Experts Models by Choosing the Route

Spring 2026 · Alignment · MoE routing

A continuation of activation steering: in sparse MoE models, behavior can be steered by biasing router decisions toward or away from behavior-linked experts.

read

Making Concept Steering Actually Work

Spring 2026 · Alignment · Concept steering

Reading Davarmanesh, Wilson, and Radhakrishnan on attention-guided feature learning, and why better measurement (token choice, soft labels, and enrichment-based layer selection) may matter more than fancier steering algorithms.

read

Steering Large Language Models with Concept Vectors

Spring 2026 · Alignment · Concept steering

A reflection on Beaglehole et al. (Science, 2026): concept vectors, steering, and steerable oversight.

read

What Survives Post-Training Inside a Language Model?

Spring 2026 · Alignment · Mechanistic interpretability

A mechanistic walkthrough of Du et al. (COLM 2025): which internal mechanisms persist under post-training (knowledge, truthfulness) and which get rewritten (refusal).

read

Open Problems in Mechanistic Interpretability

Spring 2026 · Alignment · Mechanistic interpretability

An interactive summary of Sharkey, Chughtai et al. (2025): decomposition, sparse dictionary learning and its eight limitations, circuit discovery, scalable oversight, and where I’m staking my own research direction within alignment.

read

What a Semester of Graduate Game Theory Taught Me

Spring 2026 · Game theory

Nash equilibrium and its discontents, the refinement ladder, monotone comparative statics, supermodular games, potential games, and where the analogies to alignment hold up (and where they don’t).

read

A Game-Theoretic View of High-Skill Tech Hiring

Fall 2025 · Game theory · Labor markets

Why recruiting researchers at labs like Midjourney looks less like a job board and more like a repeated auction with hidden values. Interactive visualizations of payoff matrices, mixed strategies, and reputation dynamics.

read

Geospatial Embeddings for Pollution Prediction

Fall 2024 · Geospatial ML · Air pollution

An interactive walkthrough of my UC Berkeley honors thesis: using satellite-derived embeddings to predict block-level NO₂ across the Bay Area. A bit of ML archaeology with Ridge regression, SVR, and Random Forests.

read · pdf · talk