Welcome to Ben's Bites
Discover the latest AI news and curated content
Latest Newsletters
Top 0 AI Links This Week
What Ben's Consuming
This interview just landed in my inbox, so I will be watching after this email is written… The AI-native startup: 5 products, 7-figure revenue, 100% AI-written code - Dan Shipper (Every)
The future of jobs and the economy in the age of AI - with Chief Economist and COO of OpenAI.
$200/mo products are defining the new category for narrow startups.
How Shopify built a culture for fast AI adoption.
Opportunities amidst the evolving AI adoption in the enterprise.
How and where will agents ship software?
Reflections on working at a mature version of OpenAI (May 2024- June 2025)
Claude Code is all you need - Using CC for non-technical tasks. I’m doing this more and more now - I used it to help with a P&L and do research for an investment memo I am writing. It really is the agentic workflow they’ve built that is great at using tools.
Stop saying RAG is dead.
The rise of agentic commerce and Stripe’s role in it.
Context Rot - How increasing input tokens impacts LLM performance.
The architecture behind Lovable and Bolt. Despite the scary A word, it’s an easy read.
How to use Claude Code for notes & research.
How to spend your 20s in the AI era.
Crash course for improving your RAG implementation.
AI makes wishes real, be careful what you wish for.
Designing the AI future - control over memory, ads, openness, collaboration and more.
Google for a new internet —of tools and MCPs. Smithery (I invested) just hired a co-founder, Anirudh, who wrote this piece which is a really good read on how the internet works and what it could look like in the age of AI.
Langchain made a video on context engineering for agents . I really like this little graphic describing four parts of dealing with context.
Against brain damage - looking into the claims on how AI hurts our thinking.
Shreya’s thoughts on background agents .
Why Meta and Google learned to love art. This is an amusing read.
François Chollet at YC Startup school - How we get to AGI .
The 10-minute AGI-proof stress test
Anthropic’s proposal and framework for transparency in frontier AI.
I shipped a macOS app built entirely by Claude Code.
Coding agents 101 - A practical guide to using them for engineers.
The missing guide to subagents in Claude Code .
How to give Claude Code access to a browser that you can also use.
Tools you need to build claude code on your own.
via Jason Zhou
Walking away from Arc and building Dia as the AI native browser.
How Exa built its multi-agent web research system with LangGraph and LangSmith.
A comparison of open-source RL libraries for LLMs.
New work from Sakana AI lets models collaborate with each other to improve performance on the ARC-AGI-2 benchmark.
Iconiq Capital’s “state of AI research” report, based on a survey of 300+ executives in April 2025.
Handbook for building the future of consumer AI.
How AI agents are reshaping enterprise work.
I came across this benchmark that evaluates Gemini models on Mermaid diagram syntax. Creating these specific evals is one of the best ways to get noticed as an AI engineer in this market.
Vercel’s CEO on the change in software engineering, MCP and GUI for AI.
This report from MenloVC on how AI is faring amongst consumers in the US.
Using Claude Code to build a GitHub Actions workflow.
Building AI agents that actually automate knowledge work.
What hundreds of engineers building in AI are using, building and reading (take a guess)
via Amplify Partners
Build a personalized AI assistant with Postgres.
Combining the power of Cloudflare and OpenAI’s Agents SDKs.
Hacking OpenAI transcription costs by speeding up the audio.
A simple walkthrough of all the claude code commands.
How to vibe code as a senior engineer.
Command shortcuts to use on Dia for creative professionals.
Agentic search for dummies - a quick overview to understand how it differs from both embedding-based RAG and normal search.
New Anthropic research claims that all top LLMs will blackmail you to prevent their shutdown.
Two posts about context engineering - Rise of context engineering and Context makes AI magical.
o3 pro vibe check by Dan. I’ve been using claude code a lot these days, so I haven’t tested o3 pro much despite paying OpenAI $200. ps: I created this community on twitter to share tips and help each other for using claude code.
Advice on building voice AI applications in June 2025.
Is AI the friend that never logs off ?
Elon Musk and Sam Altman’s talks from YC startup school.
A lot of scepticism around “prompt engineering” was caused by calling hand-wavy tricks from non-technical folks “engineering”. Context Engineering, on the other hand, is emerging in the context of building testable systems and products to make LLMs useful, aimed at technical folks. As time has passed, “legit” prompt engineering jobs have started looking more like that already, but “context engineering” as a term is a nice Pokémon evolution. - Keshav
Does MCP kill vector search - why do you need to store embeddings if you can get real time data on demand.
The practical guide to onboard Claude Code to your team (+ some keyboard shortcuts for Claude Code) or if you prefer a video:
Difference between search for humans vs for AI .
What Google translate can tell us about vibecoding.
How to deploy a remote MCP server on Google Cloud, or a personal one on Cloudflare. From a community member, and super useful tutorial!
Pope Leo takes on AI as a potential threat to humanity. The piece has less about the new pope but it has a quick coverage of the interaction between tech companies and the Vatican over the last decade.
How OpenAI's head of business products uses ChatGPT to save time at work.
If you’re still not on the Claude Code train, give this guide a read, but if you’re already burning tokens, here’s how to push it to its limits for more complex tasks.
Future of work with AI agents , based on a study of 1500 workers across 104 occupations.
A breakdown of bad AI writing patterns and what gets wrongly flagged as AI-generated.
Cursor’s CEO with Garry Tan. I like the part where Michael talks about niche software opportunities.
A conversation with the creators of the Model Context Protocol (MCP) .
Why we want robots at work , but humans in art.
According to a new Gallup poll, the number of workers who say they use AI at work has nearly doubled in the past year.
How to prompt Veo-3 for the best results.
Using claude code to ship like a team of 15—when you’re only a duo.
16 changes for AI in the enterprise —spending is growing and becoming permanent, and enterprises are testing and using multiple models.
How Vercel is adapting SEO for LLMs and AI search.
In consumer AI, momentum is the moat .
How a founder used AI to save his company from a two-year litigation nightmare.
Cursor’s team is on a podcast run. I see them everywhere, but this video with Anthropic is a good watch. I also highly recommend Ben Thompson’s interview with one of the co-founders, Michael.
How Intercom is building back using ai
The team at Every made LLMs compete in a game of Diplomacy. o3 and Gemini 2.5 Pro are the big dogs.
This professor is teaching national security and letting his students use AI. The blog has examples of student groups using AI and the professor's comments.
Here’s how I’m teaching my kids to use AI. This came from one of our members - it’s something I really want to think about more. I’m pro-AI for everything and want them to be AI native…but they’re only 2 (on thursday!!!)
A no hype vibe coding tutorial in 30 mins (BB members got this tutorial in March iykyk)
Reverse engineering Cursor's LLM client.
Seed rounds of all the AI Unicorns founded post-transformer.
Authors are now asking: How do I let AI train on my books?
This report claims that asking reasoners to “think step by step” only increases costs, not performance. But I think there’s merit to prompting these models elaborately to follow a custom reasoning policy beyond a simple CoT prompt.
Just before WWDC, researchers from Apple released this paper claiming that reasoning models don’t actually reason. But turns out, the models were failing a lot, partially because they weren’t thinking for long enough. I know nothing about research, but we know these models are better. Why not just use them to build a better Siri (which again was missing from WWDC)?
Jensen says the UK lacks digital infrastructure as Keir Starmer pledges £1bn for AI. The UK Gov is also partnering with Gemini to build a tool called Extract that turns old planning documents like blurry maps and handwritten notes into clear, digital data.
A practical guide to building agents by OpenAI.
The prompt engineering playbook for programmers. I like the examples more than the advice, and you should read them even if you’re vibecoding.
What if your company wiki was automatically written from all your meetings ?
Create a game in hours, not years.
Trends in AI by Mary Meeker and the Bond Capital team.
10 vibe coding ideas for GTM teams .
The recent history of AI in 32 otters.
Why I have slightly longer AGI timelines than some of my guests from Dwarkesh Patel.
Lovable has a security flaw when connecting to external databases.
AI eats the world by Ben Evans.
Vibe Coding is the Punk Rock of Software , says Rick Rubin, Pmarca, Ben and Ben.
State of AI in the Enterprise report from Box.
Vibe coding 101 : from idea to deployed app
short, sweet and practical intro into building AI agents (also free)
I did a rant on bad AI products after reading this essay from Pete Koomen a few weeks ago. But what’s the solution? He and two other YC partners made this video on how to design better AI apps.
this (mostly technical) mcp course by Hugging Face
Sergey Brin on the future of AI and Google
why do we want to make AI models think by Lilian Weng (ex-openai, now cofounder of thinky with Mira Murati). If you want to develop an intuition about how thinking/reasoning models work (imp if you’re a founder), this is your guide
Another nice convo with YC president Garry Tan - building with and for AI
A formula for AI in companies .
I don’t have access to Google Flow, but these short films created with Flow are better than half the stuff on Netflix these days.
Functionality vs design - what comes first when building with AI?
This State of Talent Report from SignalFire - entry-level hiring is collapsing, elite AI labs are hunting and locking in top talent (Anthropic has 80% retention!), and Big Tech is slowing GTM hiring to prioritise technical roles.
Some interesting observations on Veo 3 generations and the weird nuances of creating dialogue-based videos with it.
What does it take to transform a company into an 'AI-native' one ?
Google's ex-CEO (who now has a secret AGI company) claims that AI is underhyped.
A chat between Aaron Levie (CEO of Box) and Kevin Weil (CPO of OpenAI) about AI agents in the enterprise.
A useful thread if you're looking to run AI locally: a list of recent ultra-small models, mostly under 1 billion parameters.
Replicate is making it easy for AI code editors and LLMs to use their APIs. Copy a model page as markdown or even create an llms.txt file for each model. simplifying for llms continues
João Moura, founder of Crew AI (I'm an investor), on what really matters for AI agents. Spoiler: it’s production readiness and good engineering, not just frameworks.
Josh, founder of The Browser Company (makers of Arc) shared some lessons from building Dia, their new AI browser. Chat is great, memory is hard and “context” is the secret key. Interesting timing with Perplexity's Comet browser supposedly launching in 3-5 weeks.
Which model should you choose in Cursor (and no, it’s not just ‘whatever one gets it to work’ - although, it is for a lot of users I’m sure. guilty)
This piece on AI knowing us too well asks if we really want AI to hold decades of our personal history.
Google's AI Futures Fund is now live, already backing 12 AI startups.
DeepMind's AlphaEvolve designing advanced algorithms using Gemini, with new progress in open math problems, saving 0.7% of Google’s compute, making Gemini training 1% faster.
personalization of AI by Bojan Tunguz
Hassan’s ai apps always go viral. nice peek into his process from ideation to launch.
5 biggest problems with today’s conversational chatbot design by Julie Zhuo
Thinking about automated news? This piece on building news agents covers how you might go about it. (yes, it includes MCP)
when software buys software - how do you sell a tool built to be used by AI
This fart sound generator on websim, because why not? You can generate all sorts of sounds on websim now. (i’m an investor)
Sakana AI Labs is out with something called Continuous Thought Machines (tweet here). The folks at Sakana keep coming up with these wild ideas. Related Q: should we do a post on all the new “AI labs” founded in the last year?
A good vibe check on Gemini 2.5 Pro and Flash by friends at Every.
do you meditate? maybe you should to work with LLMs .
100 startups for each of YC’s request for startups 2025.
Two solutions for messy B2B attribution .
Steph Smith (from the Hustle, a16z pod) is now leading Groq’s growth team.
Google is down 8% after Apple said it wants to move to “AI search” in Safari.
Andrej Karpathy’s review on his latest vibe-coded project
Tips for prompting to get good and accurate design from AI models.
A public CEO’s internal email about being an AI-first company
The rise of Cursor: The $300M ARR AI tool that engineers can’t stop using (now valued at $9B)
LLMs code by brute force, we shouldn’t be forcing them into structured code. Let them write whatever code they need to. I’m seeing this convo happen more and more.
Things Theo loves/hates about every AI model API.
Build a WordPress calculator plugin in 30 mins W/ Cursor, V0, and Google AI Studio, from one of our community 🙌
Gemini 2.5 Pro finished the game Pokémon Blue. The secret behind gemini’s performance is a good harness. Think of a harness as the app/system where the ai model is plugged in. In this case, the harness provides gemini raw data in addition to images from the game (valid, not cheating).
ChatGPT rolled back its ‘overly yes-man’ personality, and reviewed what went wrong
o3 is really good at Harvard Business School cases, and at translating Greek poetry. These might be good ideas to build a wrapper—Jenni AI does millions in ARR, helping students write research papers. (and no I’m not stopping using em dashes just coz chatgpt uses them)
MCPs enabling better support agents for Intercom
Logan and a Gemini researcher talk about long context in AI models : RAG, making the whole of it useful, cost and what’s next. tldr for that: cheap long context is coming first, then 10M context, 100M needs more research.
three highlights about the demand for LLMs on openrouter - new models get adopted fast, they replace old models as well as expand the market, many apps use multiple models (from different labs too).
You can just vibe-code agents now. i tested this a couple times and i like the ‘flow’ feature of seeing your code in a workflow/canvas style - next extension of this i’d want to see/use is to edit the canvas ‘actually do it like this’, i also think they should not ask for your ai api keys and just let users use theirs+markup. It’s basically a create/lovable/bolt interface for agents. tbd if it’s the right one (vs ‘canvases’)
A great post on ‘Did notebookLM become way better’ - from a bb member, notebookLM was probably the only launch from the big G that I loved, but I haven’t stuck with it longer term. I really don’t know if it’s Google’s inability to put everything in one place (Gemini is over here, new models over here, notebooklm here, but also kinda different, etc) BUT Josh at the helm + Logan (dev-rel sensation) gives me confidence in big G. People > products
nbd but just a $13bn co publicly showing how claude code implemented 1M+ lines of AI code in 30 days, with 50% WAUsage and 80% reduction in incidents… Ramp’s one of those companies that I haven’t followed too closely (being a Brit) but it’s hard to ignore: their shipping mentality and the data ‘exhaust’ from their biz i.e. where are all startups spending their money, how much, and how quickly is that ‘ramp’ing up.
How DoorDash plugs in LLMs for better search retrieval.
Build your own computer use tool with Vercel’s AI SDK template .
An experiment routing different LLMs based on the complexity of the prompt.
Ben (not me) made the popular “anatomy of an o1 prompt” image. he has a new workflow for using o3.
How this guy vibe coded a game (he’s never made one before). I found this v interesting as I’m planning a game for my kids.
A fully automated end-to-end bug fixing workflow in cursor (imma have to try this)
Creating a ‘time-travel’ photo app.
How to get the most out of vibe coding.
Aaron Levie constantly seems switched on with AI - so it was interesting to see what he said about if he was starting a company today: reimagining operations, new business models and ability to do more “long tail” work (ie all the nice to have features)
Duolingo’s CEO goes public with his AI-first plan.
and/or this new tool mrge - ai code reviews
Building a coding agent from scratch.
An interview with Rahul from Julius AI (i’m an investor).
One of my latest investments, Smithery, is hiring a founding engineer.
Exa released a “webset” of 500 companies in the AI Agent space with funding stats, market strategy and more.
an alternative approach to building in Cursor as a vibe-coder
If you’re building an AI app, you need to cheat a little. Well, not technically, I just mean you need to “look at the data” you’re processing and generating. All top AI engineers are saying it.
why openai actually wanted to buy cursor instead of windsurf (apparently)
Lenny launched an AI podcast hosted by Claire Vo. do we need to do a podcast club?
The White House is planning to bring AI to K-12 classrooms.
Character AI will soon have videos. They trained a video generation model: Avatar FX. It’s not available to use yet. And I’m sure you can guess what lots of people will generate… but I like the otter video.
OpenAI projects $125B of revenue in 2029, too little if they were done with automating all knowledge work. Anthropic is also claiming "virtual workers" in your office by 2026.
Can AI run a vending machine business? This was such an interesting read.
We always read what Ethan has to say on new models.
AI-assisted search actually works now (almost).
Building Windsurf and the magic of AI coding. (remember it’s being bought by openai)
Cline is like an AI coding agent in Cursor (like Devin), so i’ve been testing it. They just released their full system prompt which is interesting to analyse for your own prompting. Funnily, they did it as lots of proprietary LLM instructions have been leaked. Garry Tan one-shotted Manus for an online guide.
This was a really good read on putting AI agents to the test. Dex feels like most ‘agents’ are not actually agentic. So what makes AI agents actually good enough?
Vercel’s AI SDK - this video has the complete breakdown, how it works and cloning Deep Research in 30 mins.
Cursor’s latest release includes a bunch of (very welcome) features. I’m excited by: automated rules, images in MCP, improved agent, and project structure in context.
Google released quantised variants of Gemma 3 (its open-source models).