writing

Interactive essays, research write-ups, and the occasional exploration.

Steering Mixture-of-Experts Models by Choosing the Route

Spring 2026 · Alignment · MoE routing

A continuation of activation steering: in sparse MoE models, behavior can be steered by biasing router decisions toward or away from behavior-linked experts.

Making Concept Steering Actually Work

Spring / Summer 2026 · Alignment · Concept steering

Reading Davarmanesh, Wilson, and Radhakrishnan on attention-guided feature learning, and why better measurement (token choice, soft labels, and enrichment-based layer selection) may matter more than fancier steering algorithms.

Steering Large Language Models with Concept Vectors

Spring 2026 · Alignment · Concept steering

A reflection on Beaglehole et al. (Science, 2026): concept vectors, steering, and steerable oversight.

What Survives Post-Training Inside a Language Model?

Spring 2026 · Alignment · Mechanistic interpretability

A mechanistic walkthrough of Du et al. (COLM 2025): which internal mechanisms persist under post-training (knowledge, truthfulness) and which get rewritten (refusal).

Open Problems in Mechanistic Interpretability

Spring 2026 · Alignment · Mechanistic interpretability

An interactive summary of Sharkey, Chughtai et al. (2025): decomposition, sparse dictionary learning and its eight limitations, circuit discovery, scalable oversight, and where I'm staking my own research direction in alignment.

What a Semester of Graduate Game Theory Taught Me

Spring 2026 · Game theory

Nash equilibrium and its discontents, the refinement ladder, monotone comparative statics, supermodular games, and potential games, plus where the analogies to alignment hold up (and where they don't).

A Game-Theoretic View of High-Skill Tech Hiring

Fall 2025 · Game theory · Labor markets

Why recruiting researchers at labs like Midjourney looks less like a job board and more like a repeated auction with hidden values. Interactive visualizations of payoff matrices, mixed strategies, and reputation dynamics.

Geospatial Embeddings for Pollution Prediction

Fall 2024 · Geospatial ML · Air pollution

An interactive walkthrough of my UC Berkeley honors thesis: using satellite-derived embeddings to predict block-level NO₂ across the Bay Area. A bit of ML archaeology with Ridge regression, SVR, and Random Forests.
