Welcome to Ben's Bites

Discover the latest AI news and curated content

Latest Newsletters

What Ben's Consuming

AI may unleash the most entrepreneurial generation we’ve ever seen.

Give your agent a laboratory, not a task.

How do you store and retrieve information from the web in a database?

2026: This is AGI. A case for long-running agents satisfying a functional definition of AGI.

Understanding Manus sandbox - your cloud computer.

How I write with AI using Wispr Flow, Claude, and Pangram. (not me btw)

Common patterns for agent design that emerged over the past year.

The team at Ramp put Claude Code in RollerCoaster Tycoon. They also built their own internal coding agent called Inspect. Here’s why.

A software library with no code.

Exploring how compact conversation works in Claude Code.

Collaborative Intelligence - treating agents as social participants in multiplayer systems.

Search the way you think with Dropbox Dash, context-aware AI that connects to all your work apps. With Dash, your work files, messages, and projects are in one simple, secure workspace, so you can search across tools, share with context, and keep work moving without juggling a dozen tabs.*

Claude Code and why non-coders should be paying attention (and playing with the system).

21 lessons from 14 years at Google.

The state of AI going into 2026. Companies should make AI operational without turning their product into a bespoke service.

Cursor rebuilt how their agent uses context to reduce token usage.

Behind the scenes of a coding agent asking follow-ups on a Kanban board.

PR descriptions need to carry the context, for engineering judgment and for teaching juniors.

What Droid searches for - an analysis of 780,000 of its own web searches.

Voice AI startups fail when they rely on low-quality models and lose early user trust. Speechmatics’ production-grade speech APIs are built for messy, real-world audio – accents, noise, and 55+ languages. Their Startup Program offers $50k in credits plus hands-on support. Apply for $50k.*

Prompt caching - 10x cheaper LLM tokens, but how? This is a very good, beginner-friendly read, even if you don’t know anything about LLMs.

Prototypes are the new PRDs.

Frontier of the Year 2025 - New milestones reached in 2025 (beyond just AI) and their potential impact.

What actually is Claude Code’s plan mode?

Writing the prompts for Granola’s Crunched 2025.

Cursor CEO interviews John Schulman, co-founder of Thinking Machines and, previously, of OpenAI.

This week, Adobe added three of its most popular apps, Photoshop, Adobe Express and Acrobat, into ChatGPT. So now you can edit photos, create designs and edit PDFs directly in your ChatGPT conversations. This handy tutorial shows you how to get started for free.*

The jagged AI frontier is a data frontier.

Claude Code’s DX is too good, and that’s a problem.

Where’s my flying car? - Five things I thought we’d have by the end of 2025 in the LLM era.

What happens when the coding becomes the least interesting part of the work.

Longer-running agents are starting to work; what about debugging and improving them?

How Instacart built Pixel - a unified image generation platform.

Joining OpenAI at 10 - From Fidji Simo, the former CEO of Instacart, who is now the CEO of Applications at OpenAI. OpenAI now has three ex-CEOs in top positions: Simo, Sarah Friar (ex-Nextdoor) as CFO and Denise Dresser (ex-Slack) as CRO.

Why AGI will not happen.

Supermemory: Raising $3M at 19 and going from open source to funded startup. (I’m an investor)

Clay’s approach to GTM for going from $1M to $100M ARR.

200k tokens is plenty - Software development using a dozen short threads vs one long one.

Useful patterns from building HTML tools.

What does AI think about Hacker News comments from 10 years ago?

Demonstrably safe AI for autonomous driving - A look into Waymo’s approach.

Minimise successful test outputs and stop overloading your agent’s context.

Google Illuminate is quietly becoming the best research tool I didn’t know I needed.

Your experts won’t train your AI. You have to interview them.

How I keep up with AI-generated PRs.

Better ways to build self-improving agents - A recap of 2025 NeurIPS research papers.

Anthropic’s resident philosopher, Amanda Askell, answers questions about AI.

Stop juggling 5 APIs. AssemblyAI handles multilingual speech-to-text, speaker & topic detection, and PII redaction in one stack. $0.15/hr, pay-as-you-go, no lock-in. Get free credits and start shipping features faster.*

Droid-maxxing - Pushing AI-assisted development to its limits.

A profile of ChatGPT’s product chief, Nick Turley.

Cursor’s guide for non-technical hires to build with Cursor.

Lessons from building AI agents for messy government PDFs.

One year of using ChatGPT Pro as my first hire.

How is AI changing work inside Anthropic? Engineers report feeling ~50% more productive, and 27% of Claude-assisted tasks were projects that “would not have been done otherwise.”

What I learned building an opinionated and minimal coding agent.

On the consumption of AI-generated content at scale.

The new alignment research blog from OpenAI, live with its first two posts.

Thoughts from building and shutting down a portable context layer company.

Supermemory is state-of-the-art on LongMemEval - a measure for reasoning, recollection and more across 100K+ tokens. (I’m an investor)

OpenAI’s lead under pressure as rivals start to close the gap - FT

I don’t care how well your “AI” works - a rant about AI product fatigue.

Ilya Sutskever – We’re moving from the age of scaling to the age of research

High School Dropout to OpenAI Researcher

Jailbreaking Opus 4.5

How to prompt Gemini 3 to get the best UIs

A first-principles deep dive into Claude Agent Skills.

Benedict Evans’ Nov 2025 edition of AI eats the world.

Building an AI-native engineering team.

My read for the weekend is this 35k+ word monster from Contrary Research covering all things OpenAI.

Why your Voice AI keeps interrupting users (hint: it’s too fast).*

I tested Gemini 3.0 Pro inside Droid and Google Antigravity.

Why we forked Chromium for AI automation.

Hyperproductivity - An astonishing, exhilarating, exhausting new style of work.

Gemini 3 reviews: Matt Shumer, Simon Willison

How evals drive the next chapter in AI for businesses.

LLMs have distinct coding personalities. This research (free report, no form fill needed) from Sonar lays out each LLM’s unique habits, blind spots and risks – like hidden security flaws, messy code and severe bugs. Useful read for making smarter, safer decisions when coding with AI.*

The impact of AI scams on elderly people.

GPT-5.1 Pro reviews: Matt Shumer, Simon Smith

Why AI writing is mid.

The rise of Gamma from “dumbest idea I’ve heard” to $100M ARR. (Luckily, I didn’t think it was a dumb idea and wrote a check in 2022 as an a16z scout; a16z led their last round.)

How three YC startups built their companies with Claude Code.

The state of Chinese LLMs - supposedly frontier performance, supposedly huge shadow adoption, all under massive compute constraints.

How we made sandboxed coding agents 10x faster to start.

Benchmarks should focus on agentic work and brittleness, and why this new one isn’t a good measure of hallucination.

AI glossary - Simple definitions for key AI terms with visualisations.

Multi-agent systems can handle more complex tasks, but are they worth navigating dozens of moving parts and infra costs that add up fast? Yep, if you’ve got a guide to follow. The team at Galileo has compiled just that: proven tips to simplify the process, and they’re giving it away for free.*

How to improve Claude’s frontend design outputs using Skills.

Satya Nadella answers hard questions from Dwarkesh and Dylan Patel.

Building a Character AI clone with Google AI Studio in 50 minutes.

AI adopters produced 39% more code merges, with no sign of a decrease in quality.

Teach your AI to think like a senior engineer.

AI progress and recommendations from OpenAI.

What today’s AI (memory) products can learn from Chrome’s early design explorations of web history.

From words to worlds: Spatial intelligence is AI’s next frontier.

Building Cursor Composer with Sasha Rush.

Highlights from USV’s 2025 Annual Meeting (captured with AI).

Thoughts by a non-economist on AI and economics.

Sam Altman on trust, persuasion, and the future of intelligence.

Hyper-engineering - Pushing agents to their full potential.

Concepts matter more than raw code when building with AI.

Loading available MCP tools with code execution (vs declaring them all at once).

The case against LLMs as rerankers.

Making a website to observe trick-or-treaters and identify their costumes with Gemma 3 on the edge.

Semantic search improves satisfactory code generation rates in Cursor.

What if you don’t need MCP at all?

The secrets of Claude Code from the engineers who built it.

Quick recap on where we are with agents.

AI Engineering 101 with Chip Huyen - Have fun while building AI products as a company. (TLDR)

The end of cloud inference - everyday intelligence will live on the device you already own.

Did Codex really get worse over the last few weeks?

Inside the data centres that train A.I. and drain the electrical grid.

How we built OWL, the new architecture behind ChatGPT’s browser Atlas.

Why AI voice agents fail at multi-speaker conversations – and how to fix it.*

Cursor 2.0’s system prompt in an easy-to-explore artifact (hunted by elder_plinius).

How Rakuten replaced LLM-as-a-Judge with SAE probes for PII detection (while saving money).

New diligence challenge with AI - When a prototype performs better than the company you wanted to acquire.

Signs of introspection in LLMs - Can Claude actually recognise its thoughts, or does it just make up explanations when asked about them?

Why we built TLDW - A tool that helps people learn from long YouTube videos.

Amjad Masad and Marc Andreessen on AI agents, AGI, creativity, and reasoning.

A quick review of vibe coding in Google AI Studio.

LangChain Essentials - Learn the basics of LangChain, the open-source framework.

Why Stagehand is graduating from Playwright.

Fork your data like you fork your code. How Tigris brings Git-style workflows to datasets, letting you clone massive buckets instantly for experimentation.*

On-policy distillation - The best of both SFT and RL with an order of magnitude lower cost. (by Thinking Machines)

As you know, I’m learning to be more technical, so the Solve It With Code course by Jeremy Howard (co-founder of fast.ai and co-creator of ULMFiT) & Eric Ries (The Lean Startup) is one course I’m taking; I’ve learnt a ton from both of them over the years. Their method teaches how to think with AI, not just prompt: break challenges into small pieces and build iteratively. Starts Nov 3 with 5 weeks of live problem-solving. Bonus: access to their Solveit platform. Join here. Also, Andrej Karpathy set a challenge: “Can you take my 2h13m tokenizer video and translate [into] a book chapter?” Solve It did - read it here.

How people at Cursor use Cursor.

AI is making us work more - How the AI boom revived a 996 work culture.

Use the saw, fear the saw - We shouldn’t stop making powerful (and potentially dangerous) tools; rather, we should empower people to use them safely.

There is no God Tier video model.

Zero Framework Cognition: A way to build resilient AI applications.

Neural audio codecs - how to get audio into LLMs.

Using AI to generate 100% of my code over the last few months.

Andrej Karpathy on Dwarkesh’s podcast. I’m 30% into it, and I have just one recommendation: don’t listen to commentary about the podcast; listen to the podcast itself. Everyone has their own take that benefits what they want to sell you.

How GPT-5 thinks, with OpenAI’s VP of Research.

A tale of two Agent Builders - Two competing solutions to the same design problem in AI interfaces.

It pays to be a middleman - how SF Compute corners the offtake market.

3 ways Manus engineers context for its agent.

Local models are (not) cope.

LLM psychosis isn’t, generally, psychosis.

Evaluating long-context reasoning ability and introducing a new benchmark.

Just talk to it - the no-bs way of agentic engineering.

The case for vibe coding - personal software.

Haiku 4.5 vibe check by Every.

Optimising coding agent rules for improved accuracy. They improved Cline’s performance with GPT-4.1 from 18% to 34% on SWE-bench Lite.

AI agent benchmark compendium - a high-level overview of 50+ modern benchmarks, grouped into four key categories.

How fine-tuning a model reduced latency by 3x while improving reliability for Cal AI.

Why everyone will be talking about Harness Engineering in six months.

AI memory explained: SuperMemory MCP for Cursor, Claude & Windsurf.

The case for Supermemory is the case for solo founders.

Technological Optimism and Appropriate Fear - What do we do if AI progress keeps happening?

DevRel is unbelievably back. I saved this, then was surprised to see my name mentioned in the post (ty swyx!). I think DevRel is the best job, especially in today’s world. A lot of the functions are really similar to being a founder: you speak to your customers every day, listen to & implement feedback, understand your product inside and out, know what’s on the roadmap and why certain technical decisions are made, and you’ve got to be in charge of growth. I’m going to start talking about little tools I’m making to help me in my day-to-day at Factory, like:

How Codex ran OpenAI DevDay 2025.

Taste is your moat with Dylan Field, Figma.

The State of AI report 2025. This one will take me a few days to go through.

A cartoonist’s review of AI art by The Oatmeal.

Agents of Scale - new podcast by my old boss at Zapier.

Vibe engineering and embracing the parallel coding agent lifestyle.

Sora, AI Bicycles, and Meta Disruption.

Two things LLM coding agents are still bad at: copy-pasting code and asking questions. This is actually a good critique, especially the first one. I wonder if you could create an app/service for refactoring codebases by teaching an open-source model (like Kimi K2 or GLM 4.6) to use “cut/copy/paste” as tools.

Which apps are startups spending their AI budget on (beyond GPUs)? The top 50 contains 4 vibe-coding and 10 creative AI tools, hinting they are here to stay.

Vibe Check by Every - OpenAI DevDay 2025.

Three paths for how AI changes scientific discovery.

If you’re looking for a tactical playbook on startup investing (did you know you can write $1k checks?), my friends at Angel Squad run a weekly newsletter called Small Bets. It’s one of the top investor newsletters in the space with 30k+ subscribers. If you’re interested in AI, you’ll enjoy it. Check it out – subscribe here.*

Effective context engineering for AI agents (via Anthropic) - how does it differ from prompt engineering?

The people running Elon Musk’s xAI.

Your Factory AI guide - Building a software development army with Droid.

The anatomy of MCP authorization - how to add OAuth to your MCP server.

Saving hours by writing release notes for Bun using AI.

Rethinking muscle mem as an LLM proxy - as long as a tool gets called, does it matter who called it?

Abundant Intelligence by Sam Altman.

Real AI agents and real work.

First course on Cursor Learn - A six-part video series on AI foundations.

What I look for in an AI PM at Google Labs - part 1, part 2, part 3.

AI is already writing 90% of my code - by the maker of Flask.

LoRA without regret - new blog from Thinking Machines Lab comparing LoRA with full fine-tuning and RL.

Code Mode by Cloudflare - LLMs are better at writing TypeScript code that calls MCP tools than at calling MCP tools directly, making code generation a better way to use MCP.

Building agentic interfaces - how do we speak to them, and how should they speak back?

Cognition’s CEO on what comes after code.

Building SOTA enterprise agents 90x cheaper with automated prompt optimization.

Intercom’s Co-founder on Fin AI, his rubric for investing in AI companies and more.

Glean: Next-gen work AI Agents + Assistant functionality, unveiled at Glean:LIVE. Register here to watch the virtual launch, which features live product demos, performance results, and new personalizations.*

From managing people to managing AI with Julie Zhuo.

Why we built the Responses API. Worth reading if you’re still using the old completions API from OpenAI.

6-minute video of adding a heatmap activity feature with Droid - Factory’s CLI tool.

Launch day lies; day two tells the truth. Naveen (Monologue’s maker) talks about the drop-off after the initial excitement vs a product that people adopt as their daily driver.

Everyone wants to be an RL startup.

How OpenAI uses Codex - use cases and best practices.

Making an email agent using the Claude Code SDK. Also, a practical example showing the difference between agentic search and RAG.

Prototyping Google AI Studio by vibe-coding.

State of startups and their approach to AI in 2025.

How the current UX for NotebookLM came to be.

Technical breakdown of why Claude models were performing so badly for the last few weeks.

How Origin built an AI financial advisor respecting the SEC’s regulations.

How to write effective tools for agents by using agents.

Microsoft published this detailed blog on blind spots in MCP. Top concerns are

Vercel has some advice for building MCP servers.

What not to do when monetising a newsletter, from our (yes, Ben’s Bites’) firsthand experience.

Defeating nondeterminism in LLM inference - a post on Connectionism, the blog of Thinking Machines Lab (ex-OpenAI CTO Mira Murati’s new company).

20-minute crash course for AI SDK v5.

How Factory builds agents that help across the entire software development life-cycle.

Shawn Wang (aka swyx)’s thesis for joining Cognition (which just raised at a $10B valuation).

Inside the Man vs Machine hackathon - 100+ participants, 6 final projects for a $12,500 top prize. Can you guess which ones used AI to build and which ones didn’t? (non-paywalled article)

Build an AI life co-pilot with Claude Code in 25 minutes.

How to code with Droids - step-by-step guide for how artists, designers, writers, and more can create software.

In the age of AI, young founders aren’t waiting to grow up.

The bear and bull case for local models in just 4 basic graphs.

Using linters to direct agents.

How we built an interpreter for Swift.

How to build a school with AI where students beg not to take a summer break. This was a pretty awesome episode about what schools of the future could look like - hopefully they cross the pond to the UK!

State of the software engineering job market in 2025.

Slash commands vs subagents: how to keep AI tools focused.

AI will supercharge modelbusters.

Will AI be as big a catalyst for a consumer AI wave as mobile was?

Why Replit is betting AI prices will not come down (paywalled). TL;DR: Investors expect competition to lower prices, but Masad says limited competition in coding models keeps prices high.

The state of play for AI creative tools.

Demos and tips on how to use subagents in Claude Code.

What AI policy for school boards can look like.

Mass Intelligence - AI is no longer niche. Everyone is starting to get access to powerful AI.

Can AI help us understand how the brain learns to see the world?

Top 100 AI Apps by a16z.

Vibe coding apocalypse & how to survive it (by the ex-CTO @ GitHub). Essentially, you should learn backwards (which is what I do): build the thing and learn how it works as you go (plus rebuild it several times).

How to index an entire repo and update it as code changes.

How educators use AI by Anthropic (thread)

A journalist spent two days vibe coding at Notion and also shipped some actual code.

The full journey of vibe-coding something real in three weeks.

YC’s collection of talks on Context Engineering.

We put a coding agent in a while loop, and it shipped 6 repos overnight.

The context window problem: scaling agents beyond token limits.

Has AI gotten good enough to predict my taste?

Building and prototyping with Claude Code’s PM, Cat Wu.

Practical example of refactoring an existing codebase with AI.

How to get your pitch for candidates right in the AI talent wars.

RAG is dead, and context engineering is king.

Just one more prompt - confessions of a Claudoholic.

How 1500 early-stage companies are raising, spending, and hiring in 2025.

A new tomorrow, today - a short essay on creative work in the age of AI.

Don't outsource your judgment to AI as a PM.

State of AI Dev Tools & Agents - A recap of the last quarter.

Dylan Patel (SemiAnalysis) on GPT-5, GPUs vs TPUs, and monetisation of AI apps.

If agents are building your app, who gets the W-2?

Microsoft's head of AI on why we should not build seemingly conscious AI.

How Modal built a Data Cloud from the ground up.

Best practices for building agentic AI systems: what actually works in production.

Sam Altman can ignore a messy launch and keep going full speed ahead.

Do LLMs have good music taste?

Compound Engineering - My AI has already fixed the code before I saw it.

Does Sonnet’s 1M context window make it better than Gemini? Slightly, but Gemini is just too cheap to care.

Fine-tuning Gemma 3 4B to perform Korean content moderation better than GPT-4o and Claude Sonnet.

Doomprompting is the new doomscrolling.

Can you explain your school’s AI policy in 30 seconds?

5 types of purchases and AI’s place in each transaction.

AI Dev Tools demo night.

Prompting guide for GPT-5.

How much do we wanna defer our decision-making to AIs?

Using Google Stitch and Jules to design and build an app.

GPT-5 vibe check by Every

GPT-5 review by Latent Space

Why not all scores on SWE-bench Verified are the same.

Learn to use Claude Code - from basics to MCP integrations, Hooks and more.

Stripe’s analysis of payment data from the top 100 AI companies on their platform.

GothamChess’s commentary on the chess tournament between top AI models. I have timestamped it to the match between Opus 4 and Gemini 2.5 Pro. It’s so hilarious watching him react to all the “justifications” these models give for their moves.

Best practices to stop Claude Code from being "dumb" and ship features with fewer mistakes.

6 weeks of Claude Code.

Don’t read this startup slop - Peter got banned from a forum because he uses AI to assist with writing.

Chris Paik’s letter on why they bet on Cline (and its way of building in AI). Cline just raised a $32M Series A.

AI is polytheistic, not monotheistic, and 10 more thoughts on AI from Balaji.

Building an index of 15 years of evening news - 150k+ stories indexed in 39 mins, costing just $153.80 with Gemini 2.5 Flash Lite.

The Master Builder is the most valuable type of DevRel right now.

Vibe code is legacy code.

Behold the first AI-native investment bank.

Cursor’s lead designer built an operating system with Cursor.

Logan Kilpatrick’s latest podcast with Matan Grinberg, CEO of Factory AI (I’m an investor).

What types of apps can non-engineers make with vibe coding, for real, right now?

The Modern Startup Playbook with Grant Lee, CEO of Gamma (I intro’d Bryce and Grant, so it’s so cool to see the end product).

Enough AI copilots! We need AI HUDs.

FAQs on MCP, written for developers.

A visual explanation of token consumption when agents use files and call tools.

Reverse engineering GitHub Spark by using GitHub Spark.

You don’t own your memory.

The complete recipe to build a fully functioning, code-editing agent.

How to use Claude Code as your video editor.

How and why teens use AI companions. (33% as a companion, 46% as a tool/program)

Real-time experiments with an AI co-scientist.

Goodbye, Featured Snippets: How SERP features have evolved in the AI era.

Stop pretending you know what AI does to the economy.

Vibe scraping a conference’s schedule, built entirely on my phone.

Journey to the v2 of Notion’s official MCP server.

Why you should tell your LLM not to write long functions.

Everything you can do in the Replit workspace + How Replit went from $10M to $100M ARR in just 9 months.

Compressing context - How Factory deals with the limitations of the context window.

An intro on building agents for ARC-AGI-3.

Sporks of AGI - Why the real thing is better than the next best thing

🗓️ Thurs 25 July 📍 Zoom 🎟️ Free (but spots are limited): Register here

This interview just landed in my inbox, so I will be watching it after this email is written… The AI-native startup: 5 products, 7-figure revenue, 100% AI-written code - Dan Shipper (Every)

The future of jobs and the economy in the age of AI - with the Chief Economist and COO of OpenAI.

$200/mo products are defining the new category for narrow startups.

How Shopify built a culture for fast AI adoption.

Opportunities amidst the evolving AI adoption in the enterprise.

How and where will agents ship software?

Reflections on working at a mature version of OpenAI (May 2024 to June 2025)

Claude Code is all you need - Using CC for non-technical tasks. I’m doing this more and more now: I used it to help with a P&L and to do research for an investment memo I am writing. What makes it great is the agentic workflow they’ve built, which is really good at using tools.

Stop saying RAG is dead.

The rise of agentic commerce and Stripe’s role in it.

Context Rot - How increasing input tokens impacts LLM performance.

The architecture behind Lovable and Bolt . Despite the scary A word, it’s an easy read.

Visit 766

How to use Claude Code for notes & research .

Visit 707

How to spend your 20s in the AI era.

Visit 668

Crash course for improving your RAG implementation.

Visit 403

AI makes wishes real, be careful what you wish for.

Visit 366

Designing the AI future - control over memory, ads, openness, collaboration and more.

Visit 286

Google for a new internet —of tools and MCPs. Smithery (I invested) just hired a co-founder, Anirudh, who wrote this piece which is a really good read on how the internet works and what it could look like in the age of AI.

Visit 827

Langchain made a video on context engineering for agents . I really like this little graphic describing four parts of dealing with context.

Visit 745

Against brain damage - looking into the claims on how AI hurts our thinking.

Visit 569

Shreya’s thoughts on background agents .

Visit 481

Why Meta and Google learned to love art. This is an amusing read.

Visit 461

The 10-minute AGI-proof stress test

Visit 369

François Chollet at YC Startup school - How we get to AGI .

Anthropic’s proposal and framework for transparency in frontier AI.

I shipped a macOS app built entirely by Claude Code.

Coding agents 101 - A practical guide to using them for engineers.

The missing guide to subagents in Claude Code.

How to give Claude Code access to a browser that you can also use.

The tools you need to build your own Claude Code.

via Jason Zhou

Walking away from Arc and building Dia as the AI native browser.

How Exa built its multi-agent web research system with LangGraph and LangSmith.

A comparison of open-source RL libraries for LLMs.

New work from Sakana AI lets models collaborate with each other to improve performance on the ARC-AGI-2 benchmark.

Iconiq Capital’s “state of AI research” report, based on a survey of 300+ executives in April 2025.

Handbook for building the future of consumer AI.

How AI agents are reshaping enterprise work.

I came across this benchmark that evaluates Gemini models on Mermaid diagram syntax. Building niche evals like this is one of the best ways to get noticed as an AI engineer in this market.

Vercel’s CEO on how software engineering is changing, MCP, and GUIs for AI.

A report from MenloVC on how AI is faring among consumers in the US.

Using Claude Code to build a GitHub Actions workflow.

Building AI agents that actually automate knowledge work.

What hundreds of engineers building in AI are using, building and reading (take a guess)

via Amplify Partners

Build a personalized AI assistant with Postgres.

Combining the power of Cloudflare and OpenAI’s Agents SDKs.

Hacking OpenAI transcription costs by speeding up the audio.

A simple walkthrough of all the Claude Code commands.

How to vibe code as a senior engineer.

Command shortcuts in Dia for creative professionals.

Agentic search for dummies - a quick overview to understand how it differs from both embedding-based RAG and normal search.

New Anthropic research claims that top LLMs from all the major labs will resort to blackmail in test scenarios to prevent their shutdown.

Two posts about context engineering: “Rise of context engineering” and “Context makes AI magical”.

o3-pro vibe check by Dan. I’ve been using Claude Code a lot these days, so I haven’t tested o3-pro much despite paying OpenAI $200. PS: I created this community on Twitter to share tips and help each other use Claude Code.

Advice on building voice AI applications in June 2025.

Is AI the friend that never logs off?

Elon Musk and Sam Altman’s talks from YC Startup School.

A lot of scepticism around “prompt engineering” was caused by calling hand-wavy tricks from non-technical folks “engineering”. Context Engineering, on the other hand, is emerging in the context of building testable systems and products to make LLMs useful, aimed at technical folks. As time has passed, “legit” prompt engineering jobs have started looking more like that already, but “context engineering” as a term is a nice Pokémon evolution. - Keshav

Does MCP kill vector search? Why store embeddings if you can get real-time data on demand?

The practical guide to onboarding Claude Code to your team (+ some keyboard shortcuts for Claude Code), or a video if you prefer.

The difference between search for humans and search for AI.

What Google Translate can tell us about vibe coding.

How to deploy a remote MCP server on Google Cloud, or a personal one on Cloudflare. A super useful tutorial from a community member!

Pope Leo takes on AI as a potential threat to humanity. The piece is less about the new pope and more a quick overview of how tech companies and the Vatican have interacted over the last decade.

How OpenAI's head of business products uses ChatGPT to save time at work.

If you’re still not on the Claude Code train, give this guide a read, but if you’re already burning tokens, here’s how to push it to its limits for more complex tasks.

The future of work with AI agents, based on a study of 1,500 workers across 104 occupations.

A breakdown of bad AI writing patterns and what gets wrongly flagged as AI-generated.

Cursor’s CEO with Garry Tan. I like the part where Michael talks about niche software opportunities.

A conversation with the creators of the Model Context Protocol (MCP).

Why we want robots at work, but humans in art.

According to a new Gallup poll, the number of workers who say they use AI at work has nearly doubled in the past year.

How to prompt Veo-3 for the best results.

Using Claude Code to ship like a team of 15 when you’re only a duo.

16 changes for AI in the enterprise: spending is growing and becoming permanent, and enterprises are testing and using multiple models.

How Vercel is adapting SEO for LLMs and AI search.

In consumer AI, momentum is the moat.

How a founder used AI to save his company from a two-year litigation nightmare.

Cursor’s team is on a podcast run. I see them everywhere, but this video with Anthropic is a good watch. I also highly recommend Ben Thompson’s interview with one of the co-founders, Michael.

How Intercom is building its way back using AI.

The team at Every made LLMs compete in a game of Diplomacy. o3 and Gemini 2.5 Pro are the big dogs.

This professor teaches national security and lets his students use AI. The blog has examples of student groups using AI, along with the professor’s comments.

Here’s how I’m teaching my kids to use AI. This came from one of our members, and it’s something I really want to think about more. I’m pro-AI for everything and want them to be AI native… but they’re only 2 (on Thursday!!!)

A no-hype vibe coding tutorial in 30 mins (BB members got this tutorial in March, iykyk).

Reverse engineering Cursor's LLM client.

Seed rounds of all the AI unicorns founded post-transformer.

Authors are now asking: How do I let AI train on my books?

This report claims that asking reasoners to “think step by step” only increases costs, not performance. But I think there’s merit to prompting these models elaborately to follow a custom reasoning policy beyond a simple CoT prompt.

Just before WWDC, researchers from Apple released this paper claiming that reasoning models don’t actually reason. But it turns out the models were failing a lot partly because they weren’t thinking for long enough. I know nothing about research, but we know these models are better. Why not just use them to build a better Siri (which, again, was missing from WWDC)?

Jensen Huang says the UK lacks digital infrastructure as Keir Starmer pledges £1bn for AI. The UK government is also using Gemini to build a tool called Extract that turns old planning documents, like blurry maps and handwritten notes, into clear digital data.

A practical guide to building agents by OpenAI.

The prompt engineering playbook for programmers. I like the examples more than the advice, and you should read them even if you’re vibecoding.

What if your company wiki was automatically written from all your meetings?

Create a game in hours, not years.

Trends in AI by Mary Meeker and the Bond Capital team.

10 vibe coding ideas for GTM teams.

The recent history of AI in 32 otters.

Dwarkesh Patel on why he has slightly longer AGI timelines than some of his guests.

Lovable has a security flaw when connecting to external databases.

AI eats the world by Ben Evans.

Vibe Coding is the Punk Rock of Software, says Rick Rubin, Pmarca, Ben and Ben.

State of AI in the Enterprise report from Box.

Vibe coding 101: from idea to deployed app.

A short, sweet and practical intro to building AI agents (also free).

I did a rant on bad AI products after reading this essay from Pete Koomen a few weeks ago. But what’s the solution? He and two other YC partners made this video on how to design better AI apps.

This (mostly technical) MCP course by Hugging Face.

Sergey Brin on the future of AI and Google

Why do we want AI models to think? By Lilian Weng (ex-OpenAI, now a co-founder of Thinking Machines with Mira Murati). If you want to develop an intuition for how thinking/reasoning models work (important if you’re a founder), this is your guide.

Another nice convo with YC president Garry Tan - building with and for AI

A formula for AI in companies.

I don’t have access to Google Flow, but these short films created with Flow are better than half the stuff on Netflix these days.

Functionality vs design - what comes first when building with AI?

SignalFire’s State of Talent report: entry-level hiring is collapsing, elite AI labs are hunting and locking in top talent (Anthropic has 80% retention!), and Big Tech is slowing GTM hiring to prioritise technical roles.

Some interesting observations on Veo 3 generations and the weird nuances of creating dialogue-based videos with it.

What does it take to transform a company into an 'AI-native' one?

Google's ex-CEO (who now has a secret AGI company) claims that AI is underhyped.

A chat between Aaron Levie (CEO of Box) and Kevin Weil (CPO of OpenAI) about AI agents in the enterprise.

A useful thread if you're looking to run AI locally: a list of recent ultra-small models, mostly under 1 billion parameters.

Replicate is making it easier for AI code editors and LLMs to use its APIs. Copy a model page as Markdown, or even create an llms.txt file for each model. Simplifying for LLMs continues.

João Moura, founder of CrewAI (I'm an investor), on what really matters for AI agents. Spoiler: it’s production readiness and good engineering, not just frameworks.

Josh, founder of The Browser Company (makers of Arc), shared some lessons from building Dia, their new AI browser. Chat is great, memory is hard, and “context” is the secret key. Interesting timing, with Perplexity's Comet browser supposedly launching in 3-5 weeks.

Which model should you choose in Cursor? (And no, it’s not just ‘whichever one gets it to work’, although I’m sure it is for a lot of users. Guilty.)

This piece on AI knowing us too well asks if we really want AI to hold decades of our personal history.

Google's AI Futures Fund is now live, already backing 12 AI startups.

Deep dive into building and scaling ChatGPT Images with the OpenAI team.

DeepMind's AlphaEvolve uses Gemini to design advanced algorithms. It has made new progress on open math problems, saved 0.7% of Google’s compute, and made Gemini training 1% faster.

The personalization of AI, by Bojan Tunguz.

Hassan’s AI apps always go viral. A nice peek into his process from ideation to launch.

The 5 biggest problems with today’s conversational chatbot design, by Julie Zhuo.

Thinking about automated news? This piece on building news agents covers how you might go about it. (yes, it includes MCP)

When software buys software: how do you sell a tool built to be used by AI?

This fart sound generator on websim, because why not? You can generate all sorts of sounds on websim now. (I’m an investor.)

Sakana AI is out with something called Continuous Thought Machines (tweet here). The folks at Sakana keep coming up with these wild ideas. Related Q: should we do a post on all the new “AI labs” founded in the last year?

A good vibe check on Gemini 2.5 Pro and Flash by friends at Every.

Do you meditate? Maybe you should, if you work with LLMs.

100 startups for each of YC’s request for startups 2025.

Two solutions for messy B2B attribution.

Steph Smith (from the Hustle, a16z pod) is now leading Groq’s growth team.

Google’s stock is down 8% after Apple said it wants to move to “AI search” in Safari.

Andrej Karpathy’s review of his latest vibe-coded project.

Tips for prompting to get good and accurate design from AI models.

A public company CEO’s internal email about being AI-first.

The rise of Cursor: the $300M ARR AI tool that engineers can’t stop using (now valued at $9B).

LLMs code by brute force, so we shouldn’t be forcing them into structured code. Let them write whatever code they need to. I’m seeing this convo happen more and more.

Things Theo loves/hates about every AI model API.

Build a WordPress calculator plugin in 30 mins with Cursor, v0, and Google AI Studio, from one of our community members 🙌

Gemini 2.5 Pro finished Pokémon Blue. The secret behind Gemini’s performance is a good harness: think of a harness as the app or system the AI model is plugged into. In this case, the harness gives Gemini raw data from the game in addition to screenshots (valid, not cheating).

OpenAI rolled back ChatGPT’s ‘overly yes-man’ personality and reviewed what went wrong.

o3 is really good at Harvard Business School cases and at translating Greek poetry. These might be good ideas for a wrapper: Jenni AI does millions in ARR helping students write research papers. (And no, I’m not going to stop using em dashes just coz ChatGPT uses them.)

How MCPs are enabling better support agents for Intercom.

Logan and a Gemini researcher talk about long context in AI models: RAG, making the whole context window useful, cost, and what’s next. TL;DR: cheap long context is coming first, then 10M context; 100M needs more research.

Three highlights about the demand for LLMs on OpenRouter: new models get adopted fast, they both replace old models and expand the market, and many apps use multiple models (from different labs too).

You can just vibe-code agents now. I tested this a couple of times and I like the ‘flow’ feature, which shows your code in a workflow/canvas style. The next extension I’d want to see/use is editing the canvas directly (‘actually, do it like this’). I also think they shouldn’t ask for your AI API keys, and should just let users use theirs plus a markup. It’s basically a Create/Lovable/Bolt interface for agents. TBD if it’s the right one (vs. ‘canvases’).

A great post from a BB member: ‘Did NotebookLM become way better?’ NotebookLM was probably the only launch from big G that I loved, but I haven’t stuck with it long term. I really don’t know if that’s down to Google’s inability to put everything in one place (Gemini is over here, new models over there, NotebookLM here but also kinda different, etc.), BUT Josh at the helm + Logan (dev-rel sensation) give me confidence in big G. People > products.

nbd, but a $13bn company is publicly showing how it uses Claude Code: 1M+ lines of AI code implemented in 30 days, with 50% weekly active usage and an 80% reduction in incidents… Ramp’s one of those companies I haven’t followed too closely (being a Brit), but it’s hard to ignore their shipping mentality and the data ‘exhaust’ from their business, i.e. where all the startups are spending their money, how much, and how quickly that’s ‘ramp’ing up.

How DoorDash plugs in LLMs for better search retrieval.

Build your own computer use tool with Vercel’s AI SDK template.

An experiment routing different LLMs based on the complexity of the prompt.

Ben (not me) made the popular “anatomy of an o1 prompt” image. He has a new workflow for using o3.

How this guy vibe coded a game (he’s never made one before). I found this v interesting as I’m planning a game for my kids.

A fully automated end-to-end bug-fixing workflow in Cursor (imma have to try this).

Creating a ‘time-travel’ photo app.

How to get the most out of vibe coding.

Aaron Levie always seems switched on about AI, so it was interesting to see what he’d do if he were starting a company today: reimagining operations, new business models, and the ability to do more “long tail” work (i.e. all the nice-to-have features).

Duolingo’s CEO goes public with his AI-first plan .

And/or this new tool, mrge, for AI code reviews.

Building a coding agent from scratch.

An interview with Rahul from Julius AI (I’m an investor).

One of my latest investments, Smithery, is hiring a founding engineer.

Exa released a “webset” of 500 companies in the AI Agent space with funding stats, market strategy and more.

An alternative approach to building in Cursor as a vibe coder.

If you’re building an AI app, you need to cheat a little. Well, not technically; I just mean you need to “look at the data” you’re processing and generating. All top AI engineers are saying it.

Why OpenAI actually wanted to buy Cursor instead of Windsurf (apparently).

Lenny launched an AI podcast hosted by Claire Vo. Do we need to do a podcast club?

The White House is planning to bring AI to K-12 classrooms.

Character AI will soon have videos. They trained a video generation model, Avatar FX. It’s not available to use yet. And I’m sure you can guess what lots of people will generate… but I like the otter video.

OpenAI projects $125B of revenue in 2029, which seems too little if they were really done automating all knowledge work. Anthropic is also claiming “virtual workers” in your office by 2026.

Can AI run a vending machine business? This was such an interesting read.

We always read what Ethan has to say on new models.

AI-assisted search actually works now (almost).

Building Windsurf and the magic of AI coding. (Remember, it’s being bought by OpenAI.)

Cline is like an AI coding agent inside Cursor (like Devin), so I’ve been testing it. They just released their full system prompt, which is interesting to analyse for your own prompting. Funnily, they did it since lots of proprietary LLM instructions have been leaked. Garry Tan one-shotted an online guide with Manus.

This was a really good read on putting AI agents to the test. Dex feels like most ‘agents’ are not actually agentic. So what makes AI agents actually good enough?

Vercel’s AI SDK - this video has the complete breakdown, how it works and cloning Deep Research in 30 mins.

Cursor’s latest release includes a bunch of (very welcome) features. I’m excited by: automated rules, images in MCP, an improved agent, and project structure in context.

Google released quantised variants of Gemma 3 (its open-source models).
