Signal & Noise — Episode 7

03 / Threads

What moved across voices

The Vatican Moment — Anthropic's institutional legitimacy spike

Pope Leo XIV · Chris Olah · Boris Cherny · Jack Clark

Pope Leo XIV's Magnifica Humanitas explicitly draws on Anthropic-style interpretability framing — the encyclical's name was chosen in dialogue with Leo XIII's 1891 Rerum novarum on capital and labor, casting AI as the labor question of this century. Chris Olah present at the Vatican; Cherny amplifies inside Anthropic. Anthropic Institute formally announced as parallel research arm led by Clark. The Builder reads it as mainstream legitimacy. The Skeptic reads it as institutional capture working both ways.

Productivity bubble fissures become audible

Gary Marcus · Arvind Narayanan · Daniel Stenberg via Simon Willison

Four independent voices on the same beat in one week. Marcus on Uber COO Andrew Macdonald admitting no proportional gains. Marcus on retirement-fund AI-bubble exposure. Narayanan deflates Google's $916-OS claim with independent-eval critique. Stenberg names curl-maintenance breakdown under AI-assisted security reports. The gap between productivity claims and productivity reality is now publicly auditable, and not only by skeptics.

AI-written prose as trust-destroyer

Paul Graham via Willison · Armin Ronacher · Ethan Mollick · François Chollet

Paul Graham: 'It feels like being lied to.' Ronacher: issue reports laundered through LLMs lose the human voice and become guesswork. Mollick ships Choosing to Stay Human — explicit framing of what to keep human. Chollet: 'Whenever AI tells me I'm absolutely right, my trust drops.' Recognizable AI writing now actively damages relationships. The Builder side has no clean answer; the only adopted position is Mollick's: choose.

Software 3.0 operationalized — Claude Code, Datasette Agent, FST

Andrej Karpathy · Boris Cherny · Simon Willison · Lakshya Agrawal · Erik Schluntz

Cherny ships /usage as harness observability; 'use auto' becomes the official #1 tip. Willison ships Datasette Agent + 3 plugins, ending a 3-year LLM-Datasette convergence. Schluntz's 'ask without derailing' feature lands. Agrawal et al. publish FST (fast-context + slow-weight) on arXiv: 3× more sample-efficient than RL alone, with 70% less KL divergence from the base model, read here as less catastrophic forgetting. Karpathy's Sequoia transcript becomes canonical: December 2025 as agentic inflection point; verifiability × training attention as capability rubric; LLM Wiki + LLM Council as named patterns.

04 / How to Do It This Week

The practitioner synthesis

Prompting & inference 04

Use auto mode in Claude Code instead of micromanaging per task

via Boris Cherny · x.com/bcherny/status/2058519809214607704
Send the same prompt to GPT/Claude/Gemini (LLM Council pattern) — asking one model is the mistake

via Andrej Karpathy · x.com/karpathy
Discourage sycophancy in system prompts — when AI says 'you're absolutely right', drop trust

via François Chollet · x.com/fchollet/status/2057571937384313025
Apply TDD red/green for coding agents — test suite is the contract

via Simon Willison · x.com/simonw/status/2058992972734341445
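
The LLM Council tip above can be sketched in a few lines. This is a minimal illustration, not Karpathy's implementation: the three model functions are hypothetical stubs standing in for real GPT/Claude/Gemini clients, and the exact-match comparison is an assumption that only suits short factual answers (longer answers would need a human or a judge model to compare them).

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical stand-in type: any function that takes a prompt and returns text.
ModelFn = Callable[[str], str]


def ask_council(prompt: str, models: Dict[str, ModelFn]) -> Dict[str, str]:
    """Send the identical prompt to every model and collect answers by name."""
    return {name: fn(prompt) for name, fn in models.items()}


def summarize(answers: Dict[str, str]) -> dict:
    """Report the most common answer and which models dissent from it."""
    # Assumption: answers are short enough that case/whitespace-normalized
    # exact match is a meaningful notion of agreement.
    normalized = {name: text.strip().lower() for name, text in answers.items()}
    majority, votes = Counter(normalized.values()).most_common(1)[0]
    dissenters = sorted(n for n, t in normalized.items() if t != majority)
    return {
        "majority": majority,
        "votes": votes,
        "dissenters": dissenters,
        "unanimous": not dissenters,
    }


# Stub models (hypothetical); swap in real API clients in practice.
def model_a(prompt: str) -> str:
    return "Paris"


def model_b(prompt: str) -> str:
    return "paris "


def model_c(prompt: str) -> str:
    return "Lyon"


council = {"model_a": model_a, "model_b": model_b, "model_c": model_c}
answers = ask_council("What is the capital of France?", council)
report = summarize(answers)
print(report)
```

The useful output is the dissenter list: disagreement between models is the signal to look closer before trusting any single answer.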

Tools, repos, libraries 06

datasette-agent 0.1a4

Conversational AI over SQLite — uvx --prerelease=allow --with datasette-agent

via Simon Willison · github.com/datasette/datasette-agent
datasette-agent-charts 0.1a2

Chart-generation plugin with 'View SQL query' buttons

via Simon Willison · github.com/datasette/datasette-agent-charts
datasette-agent-sprites 0.1a0

Fly Sprites sandbox execution for agent-run commands

via Simon Willison · github.com/datasette/datasette-agent-sprites
rasbt/LLMs-from-scratch (DSA impl)

DeepSeek Sparse Attention from-scratch reference impl for builders

via Sebastian Raschka · github.com/rasbt/LLMs-from-scratch
Fast-Slow Training (FST) code

Fast-context + slow-weight composition; 3× sample-efficient over RL

via Tiwari, Sareen, Agrawal et al. · rishabhtiwari.ai/projects/fst/code/
LLM Wiki gist

Agent-maintained persistent markdown wiki across raw docs — summaries, entity pages, contradictions

via Andrej Karpathy · gist.github.com/karpathy/442a6bf555914893e9891c…

Architectural & model-selection 04

/usage observability (Claude Code next version)

Track which Skills/Agents/MCPs/Plugins consume budget — first-class observability primitive for harness composition

via Boris Cherny · x.com/bcherny/status/2057476878110261587
KV sharing + per-layer embeddings (Gemma 4)

Long-context inference economics shifting — re-evaluate vendor selection if you ship long-context workloads

via Sebastian Raschka · magazine.sebastianraschka.com/p/recent-developm…
Gated DeltaNet (Qwen3-Next)

Hybrid attention is becoming the default recommendation; check architecture cards before picking a base model for fine-tuning

via Sebastian Raschka · x.com/rasbt/status/2057599925878169761
Gemini Flash 3.5 as default workhorse

The 'always biggest model' era is ending; cost-sensitive scale shifts to Flash-class defaults

via Nathan Lambert · www.interconnects.ai/p/some-ideas-for-what-come…

Methodological frames 07

Models / Apps / Harnesses three-layer split

When evaluating any AI capability claim, separately ask which model, which app, which harness — improvements in one layer get conflated with the others

via Ethan Mollick · www.oneusefulthing.org/p/sign-of-the-future-gpt-55
Verifiability × training-attention rubric

Before delegating a task, score two things: (a) is the success signal automatic? (b) was this task emphasized in lab training? Both high = trust; either low = supervise

via Andrej Karpathy · karpathy.bearblog.dev/sequoia-ascent-2026/
Independent-eval-or-it-didn't-happen

When labs publish productivity-cost numbers, weight the claim down until independent replication exists; if none in 2 weeks, the number is marketing

via Arvind Narayanan · www.normaltech.ai/p/did-googles-ai-agents-reall…
Audit-the-fine-print discipline

For any lab productivity claim, ask: which fraction of which work, on what tasks, with what acceptance rate? Most numbers don't survive the question

via Gary Marcus · garymarcus.substack.com/p/checking-the-math-beh…
Goodhart-check before measuring AI usage

Tokens-per-employee is the Claudeonomics anti-pattern; measure decisions-improved-per-dollar instead. When a measure becomes a target, it ceases to be a good measure

via Cassie Kozyrkov · decision.substack.com/p/tokenmaxxing
What's now possible that wasn't

Replace 'what can AI speed up?' with 'what information transformation was impossible before?' — Karpathy and Chollet converged on this framing the same week

via Andrej Karpathy + François Chollet · x.com/fchollet/status/2058982905368773040
Issue-report hygiene against LLM-laundering

When receiving an issue: ask the human to state in their own voice (a) what they did, (b) what they expected, (c) what actually happened. Don't accept LLM rewrites that hide the actual observation

via Armin Ronacher via Simon Willison · lucumr.pocoo.org/2026/5/24/pi-oss/
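
Of the frames above, the verifiability × training-attention rubric is the easiest to turn into a checklist. The sketch below is one possible encoding, not Karpathy's: the 0 to 1 scale, the 0.7 threshold, the `Task` class, and the example scores are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    verifiability: float       # 0..1: how automatic is the success signal?
    training_attention: float  # 0..1: your estimate of lab training emphasis


def delegation_mode(task: Task, threshold: float = 0.7) -> str:
    """Both scores at or above the threshold -> 'trust'; either below -> 'supervise'."""
    if task.verifiability >= threshold and task.training_attention >= threshold:
        return "trust"
    return "supervise"


# Illustrative scores only; real scores are a judgment call per team.
tasks = [
    Task("fix failing unit test", verifiability=0.9, training_attention=0.9),
    Task("draft legal memo", verifiability=0.2, training_attention=0.5),
    Task("niche DSL refactor with tests", verifiability=0.9, training_attention=0.3),
]

modes = {t.name: delegation_mode(t) for t in tasks}
for name, mode in modes.items():
    print(f"{name}: {mode}")
```

The third task shows why both axes matter: a task with a strong automatic check still lands in "supervise" when the model has likely seen little of that kind of work.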

Papers worth a closer read 03

Learning, Fast and Slow: Towards LLMs That Adapt Continually

Fast-context + slow-weight composition; 3× more sample-efficient than RL alone, 70% less KL divergence from base, continues to learn where RL stalls — production candidate for continual-learning workloads

· arxiv.org/abs/2605.12484
MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Skill-centric lifecycle with unit tests + runtime feedback; treats skills as long-lived testable assets — adjacent to JC-OS's learning-graduation discipline

· arxiv.org/abs/2605.27366
BRANE: Natural Language Query to Configuration for Retrieval Agents

Per-query config selection beats static config tuning — matches best-fixed accuracy at up to 89% lower cost on MuSiQue, BrowseComp-Plus, FinanceBench

· arxiv.org/abs/2605.27361

06 / Position shifts

What changed in the stance map

Person	Theme	Shift	Note
Anthropic (institutional)	Cultural legitimacy	ESCALATED	From 'we publish' to 'we are cited by the Vatican'. Anthropic Institute formalized; Chris Olah at papal encyclical rollout
Boris Cherny / Claude Code	Distribution + observability	NEW THEME	/usage shipping; 'use auto' official endorsement; teaching-basics push signals demographic expansion beyond power-users
Ethan Mollick	AI / human boundary	SHIFTED	From 'use AI to do work' to 'choose what to keep human' — additive axis, not retraction; convergence with Graham, Chollet, Ronacher this week
François Chollet	AI-as-productivity framing	SHIFTED	'Productivity-booster for prior workflows is wrong framing' — same-week convergence with Karpathy on 'what's now possible?'
Gary Marcus	AI productivity bubble	ESCALATED	Three-prong case: Uber COO admits no gains; retirement-fund exposure piece; OpenAI/Anthropic headline math audit
Arvind Narayanan	Lab-claim audit	NEW PUBLIC ANGLE	First explicit 'audit the lab claim' piece this year — Google's $916-OS deflation; companion-piece to Marcus same week
Simon Willison	Agentic security	ESCALATED	Copilot Cowork exfiltration + curl-maintainer pressure together name new failure mode: agentic features create exfiltration vectors AND break OSS maintenance economics simultaneously

07 / Cross-references

Who built on whom

Simon Willison → Daniel Stenberg (curl) · Quoted curl pressure piece without editorial framing · cites
Simon Willison → Paul Graham · PG on AI-written founder emails feeling like being lied to · cites
Simon Willison → Armin Ronacher · Issue-report hygiene against LLM-laundered submissions · cites
Boris Cherny → Chris Olah · Cherny posted Olah Vatican quote (245 replies); internal Anthropic distribution of institutional moment · amplifies
Ethan Mollick → Paul Graham · Choosing to Stay Human in dialogue with PG founder-email thread · builds-on
François Chollet → Andrej Karpathy · Same-week convergence on what-is-now-possible question; neither cites the other · agrees-with
Gary Marcus → Arvind Narayanan · Both audit lab claims same week: Uber COO + Google $916-OS · agrees-with
Pope Leo XIV → Chris Olah / Anthropic · Magnifica Humanitas draws on Anthropic-style interpretability framing · cites
Sebastian Raschka → DeepSeek team · DSA from-scratch impl as builder substrate · builds-on
Liron Shapira → a16z · 'This aged well': Shapira substack rebuttal of earlier a16z hype piece · disagrees-with

08 / Source registry

The voices

Andrej Karpathy

@karpathy

Bear Blog · YouTube · X · Anthropic

Simon Willison

@simonw

simonwillison.net · X · GitHub

Garry Tan

@garrytan

YouTube · X · YC

Ethan Mollick

@emollick

One Useful Thing substack · X

Gary Marcus

@garymarcus

Marcus on AI substack · X · CNBC

Cassie Kozyrkov

—

Decision Intelligence substack

Nathan Lambert

—

Interconnects substack · Allen AI

Arvind Narayanan

—

Normal Tech substack · Princeton

Yejin Choi

—

papers · talks · NVIDIA/Stanford

Melanie Mitchell

@MelMitchell1

Santa Fe Institute · X

Yannic Kilcher

@yaborobot

YouTube

Sebastian Raschka

@rasbt

Ahead of AI substack · X · GitHub

Lakshya Agrawal

@LakshyAAAgrawal

papers · X · UC Berkeley

MIT Technology Review

—

AI section editorial

Jack Clark

@jackclarkSF

Import AI substack · X · Anthropic Institute

Boris Cherny

@bcherny

X · Anthropic / Claude Code

Chris Olah

@ch402

X · Anthropic interpretability

François Chollet

@fchollet

X · ARC · Keras

Erik Schluntz

@eschluntz