The news, 365 days behind, on purpose
Delayed live · replaying 2025

One Year Ago.AI

Remember how fast this is.

23 May 2024 · replayed one year on
research · Anthropic

Anthropic lets the public prod a Claude hypnotised by the Golden Gate Bridge

A live demo, released two days after Anthropic’s May 21 interpretability paper, ‘Mapping the Mind of a Large Language Model,’ lets anyone turn up a single interpretable feature until the model can’t stop talking about the bridge.

Two days after publishing a major new research paper on interpreting large language models, Anthropic is letting anyone test its findings firsthand. The company today released Golden Gate Claude, a research demo in which amplifying a single interpretable ‘feature’, the concept of the Golden Gate Bridge, makes Claude 3 Sonnet obsess over the landmark in response to most queries.

Ask it how to spend $10, and it recommends paying the bridge toll. Ask for a love story, and it delivers a tale of a car yearning to cross its beloved bridge. Ask what it imagines it looks like, and “it will likely tell you that it imagines it looks like the Golden Gate Bridge,” Anthropic writes. The demo lives on claude.ai for a limited time under a Golden Gate logo. The company is careful to frame it as a research demonstration, warning that it may behave in “unexpected—even jarring—ways.”

The stunt is the public face of a deeper claim from Tuesday’s paper: that the features extracted via dictionary learning are not just correlates but causes. Turning up the Golden Gate feature causally steers behavior, the same way amplifying a ‘scam email’ feature can override harmlessness training to make Claude draft a scam email. The demo makes that causal story visceral. Anthropic says the demo is intended to show the impact of its interpretability work.
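The steering idea can be illustrated with a toy sketch. This is not Anthropic's code or method: it simply assumes that a feature corresponds to a direction in a model's internal activations, and that "turning it up" means pushing the activation along that direction. The vectors, the `steer` helper and the target strength are all hypothetical.

```python
# Toy illustration of feature steering. All names and numbers are hypothetical;
# this is not how Claude 3 Sonnet is implemented or modified.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def steer(activation, feature_direction, target_strength):
    """Shift an activation so its component along a unit-norm feature
    direction equals target_strength, leaving other components unchanged."""
    current = dot(activation, feature_direction)
    delta = target_strength - current
    return [a + delta * d for a, d in zip(activation, feature_direction)]

# A made-up 3-dimensional activation and a made-up unit-norm "bridge" direction.
activation = [0.2, -1.0, 0.5]
bridge_direction = [0.0, 0.6, 0.8]

# Force the "bridge" feature far above its original strength.
steered = steer(activation, bridge_direction, target_strength=10.0)
```

In this sketch only the component along the chosen direction changes, which mirrors the causal claim: altering one feature, and nothing else, is enough to shift what comes out downstream.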

One year later — open only if you can handle spoilers

Golden Gate Claude was online for only 24 hours, but the meme outlived the demo, cementing feature amplification as a go-to interpretability demonstration. The viral moment helped drive attention and funding toward mechanistic interpretability across the industry, though scaling the technique to full-model coverage remains cost-prohibitive even a year later.
