Recording time: 1:36:51 | Download MP3 (70.2MB)
Jesse Vincent joins me to talk about Superpowers, AI, Skills, MCP, Claude Code, Codex and more.
A video with Japanese subtitles is available on YouTube.
- Rebuild: 9: Making your own keyboard (Jesse Vincent)
- obra/superpowers
- Homer Simpson Car
- Obsidian
- matthartman/ghost-pepper: 100% private on-device voice models for speech-to-text
- Wordiest
- Clearance: A Markdown viewer for macOS
- sosumi.ai - Apple Docs for LLMs
- NanoClaw
- When it comes to MCPs, everything we know about API design is wrong
- superpowers/CLAUDE.md
- Superpowers 5: Visual Companion
- Dorodango
- roborev
- Codex for (almost) everything | OpenAI
- Anthropic tests how devs react to yanking Claude Code from Pro plan
- SpaceX Strikes Deal With Cursor for $60 Billion
- obra/claude-docs-setup
- Two Pizza Team
- Every layer of review makes you 10x slower
miyagawa: Jesse Vincent, welcome back to the show. [00:00:00]
jesse: Thanks so much for having me, it's been a minute. [00:00:02]
miyagawa: It's been a while. The last time you were on my show was episode 9 of this podcast, and it was about 13 years ago. [00:00:05]
jesse: I knew it was over 10. I didn't know it was 13. Wow. [00:00:17]
miyagawa: I didn't have the courage to listen to that episode. Usually I don't find it embarrassing, but if it's more than ten years old, it's unbearable to listen to myself speaking. [00:00:22]
jesse: It was a long time ago. I can't listen to myself on podcasts ever, even if they're current. [00:00:40]
miyagawa: The last time we talked, it was even before you started the Kickstarter campaign for your first keyboard. [00:00:47]
jesse: Yeah. I can't remember if we were talking about Perl stuff or keyboards when it was a hobby. [00:00:58]
miyagawa: I think it was the keyboard when you were preparing for the first Kickstarter campaign. [00:01:05]
jesse: Wow. Yeah, we went through four keyboard Kickstarters. A couple of them were seven figures in US dollars. The whole thing was crazy. There was a period when we discovered that our factory salesperson was a scammer who was scamming us and scamming the factory. Nobody in China believed it. I've got great lawyer recommendations in Shenzhen. I had a whole hardware career, and for the last year I've been very nose-down doing AI stuff. [00:01:11]
miyagawa: That's a great transformation, I think. I was thinking about what to talk about, and I think it's fascinating that you and I share some attributes in common. One of those is that at a certain point in time we were both known for being Perl developers, and now we're known to different people for very different things. [00:01:48]
miyagawa: Especially for the listenership of this show, I'm better known as a podcaster than as an engineer. And you went from being a Perl developer, to the leadership of the Perl programming language, to a creator of keyboards, and then, I don't know what it is. What is it now? [00:02:17]
jesse: Now I think it is Superpowers. You know, it's very weird: it's something I originally put together as really a demo of how I was doing AI dev, me packaging up the stuff I was already doing so that I didn't have to keep typing the same things over and over again. [00:02:45]
miyagawa: You are now kind of a thought leader in the AI industry, I think. [00:03:06]
jesse: Apparently, yeah. Apparently Microsoft claimed on stage at GitHub Universe that I was an AI coding pioneer. I can't remember if that was before Superpowers or not, but it might have been. [00:03:12]
miyagawa: I think it was around the same time. I saw it last year in Moscone, at the GitHub Universe conference, and your picture was up on the screen along with Simon Willison and other people. [00:03:27]
jesse: And Angie, who ran Goose, originally for Block and now for the Linux Foundation. Yeah. [00:03:43]
miyagawa: What a time to be alive. Let's dive into that. I don't know where to start. [00:03:56]
jesse: One of the places that I sometimes start is talking about the first time that I did something that looks like agentic dev and how hard it was for me to adapt. What I was doing was helping somebody with debugging, helping somebody else with project planning, reviewing somebody else's code, and I come home at the end of the day and I'm like I have not done any real work. [00:04:00]
jesse: Because at the time I was a working programmer and I was spending a lot of time typing into a terminal window and these entities on the other side of the terminal window would be the ones who were actually doing the work. And it felt really weird because up till then my job had been lines of code. As it turns out, this was almost twenty years ago. These weren't agents. These were MIT interns. I was managing them through IRC and they were really smart. Some of them thought they were better than they were at coding, but they were all quite good. They were often very junior. They didn't have great taste mostly. Some of their judgment was a little bit suspect. They weren't sleeping very well, so they were having trouble forming memories. [00:04:30]
jesse: I had developed all of these essentially management hacks to get good productivity out of them. And some of them have gone on to be really influential engineers, really capable people, but at the time they were kids. And it turns out that a lot of these management hacks that I figured out for helping junior engineers be productive on a small team work really well for AI. When I started learning how to use Claude Code, I realized I was pulling out the same tricks: helping figure out what needs to get done, breaking the task into tiny little pieces, doing one task, doing the code review, making sure that you're always doing red-green test-driven development, teaching them about YAGNI (you ain't gonna need it) and DRY (don't repeat yourself). And that is the core of what the Superpowers engineering loop is, and it works well for agents, just like it works well for people. [00:05:31]
miyagawa: When you talk about these junior engineers twenty years ago, a lot of the descriptions of these junior engineers feel really similar to how you describe AI today or maybe a year ago. I think this landscape is changing a lot. [00:06:45]
jesse: Every month it's different. All of the tools are getting better, but you still need to break down problems, and it is still the case that pretty much every tool will do what you ask it to do. The problem, for the most part, is that humans are really bad at asking for what they need. They ask for what they think they want without really thinking about it. [00:07:02]
jesse: One of the things that seems to make some people much better at using AI is the management experience of being able to think about what you want and explain it clearly before you start. It is the difference between opening up Claude Code and saying let's make a React to-do list, where it's gonna go make a React to-do list, versus what you actually want it to do, which is to say: hang on a second, why do you wanna do that? What's the point? [00:07:32]
miyagawa: Did you figure that out almost immediately when you started using Claude Code? Or is that something that you learned over time? [00:08:09]
jesse: I would say that within probably a month I got to the core of it. But that intent thing also comes from my consulting career, where I would walk into these big companies, consulting for my open source product, RT, the ticketing system. The clients would ask me to do something to the product for their organization, for their enterprise, and usually what they were asking me to do was essentially fix a symptom of a business problem. And when we talked for a while, it would turn out that the real problem was something completely different, and there was a much better way to solve it. So I learned a long time ago that when somebody tells you the solution they want, it is almost always just another way of them describing their problem. But when I picked up Claude Code the first time, I had an idea for a thing I wanted to make, and I described it, and Claude went off and built this crazy monstrosity that had every possible feature you could possibly want. They weren't particularly well implemented, because this was the first day. This was the first day that Claude Code existed. [00:08:20]
miyagawa: That was like April or May last year. [00:09:35]
jesse: I think it was late February. I think it was Feb 20 if I remember it right. Anthropic had a birthday party for Claude Code recently. [00:09:37]
miyagawa: Ah that's right. [00:09:40]
jesse: It sometimes gets described in English as a Homer Simpson car, because there's this episode of The Simpsons where Homer got to design a car and it had everything: six radios, a dome top, antennas and screens. Claude is really good at all the features. In fact, most of the AIs are. It used to be that it was hard to get all the stuff you might possibly want into the product, and now the hard thing is telling it: wait, stop, I don't need that. [00:09:48]
miyagawa: That's interesting. I think I started using Claude Code around April, whenever the public beta was announced, and I started using it for non-coding stuff really early in the day. The first thing I did with Claude Code: I was using Heroku, and I still do, and Heroku emailed me that the version of Postgres I was using was going to sunset, so I needed to upgrade to a newer version of Postgres, and here's a document on how to do it. Good luck. That was the email. And I thought this might be a good use of Claude Code, because the agent runs in the terminal and can run any commands I needed to run. So what I did was download the manual, the web page, as Markdown, paste it into the session, and let it plan. I don't know if plan mode existed at the time, but basically: write a Markdown doc describing the plan, I will review it, and we execute from there. So for some reason I got into this plan-based execution flow from the very early days. [00:10:22]
jesse: That's good. Turns out it's a really good way to keep the agents on track. And yeah, that sounds pretty early for realizing that Claude Code was good for other things you can do in a shell. [00:11:46]
miyagawa: Yeah, exactly. [00:11:59]
jesse: It's been really interesting watching how different people engage with it and find interesting and different uses for it. [00:12:00]
miyagawa: I will probably talk about this later, but over time since then, the way I use coding agents, especially Claude Code, has changed quite a lot. In the early days I was micromanaging the way the coding agent writes code, and I think the way these skills work keeps that micromanagement aspect, without me having to do the micromanaging myself, if that makes sense. [00:12:10]
jesse: Yeah, part of the design is trying to do as much planning as you can upfront, so that at the point where the agent is ready to go, you don't have to supervise it quite as tightly. [00:12:47]
miyagawa: Almost sounds like we are doing the waterfall development. [00:13:01]
jesse: One of the things I've been thinking about a lot at work is what AI-native methodologies look like. We've gone through 50 to 70 years of software engineering experience where we got away from waterfall, to agile, to a variety of things, and now a lot of agentic dev feels like waterfall, but really fast. And that is kind of neat, because I no longer feel bad about trying something, realizing that I have designed it completely wrong or have told the agent to do the wrong thing, and just starting over. [00:13:05]
jesse: But I don't think it's necessarily the right way to build forever. I've been playing with a new set of skills for iterative development, instead of the Superpowers-style plan-everything-upfront approach. [00:13:54]
jesse: My experience is that Superpowers plans can get to somewhere like 30 or 40K before I start to worry about whether they can be executed well, and with the new tools I've been taking up to 600K of specs and been able to generate an app from those specs that doesn't drop requirements. [00:14:17]
miyagawa: Okay. [00:14:42]
jesse: So this is the far side of the set of tools I built to reverse engineer things. It started off as something to do adversarial reverse engineering: you can have it look at any product where you can get source or obfuscated source, and create behavioral specs that don't include anything about the code, just about what's supposed to happen, how the pieces might fit together, how it would get used, user journeys. One of the first places I used this was Obsidian, for my personal notes, and I very much wanted my agents to have access to my notes. But Obsidian's first-party sync engine at the time was only available as part of their Electron desktop app. And I really wanted to run my agent in the cloud on a container, and I didn't wanna have an X server there just to be able to run Obsidian. So I had the tools reverse engineer Obsidian's sync engine to specs. And then I had agents on another host re-implement it from those specs in Rust. There was one point where the implementing agents were like: there's a missing detail, I need you to go ask the agents that did the reverse engineering to update the specs with this one bit of detail. But all of the crypto got behavioral specs; the whole thing worked. I wrote to the CEO of Obsidian, who's a friend of a friend, and asked if they'd be okay with me releasing it. And they said that they would really rather I not, because they were very worried about what happens if there's a tiny bug and somebody's vault gets corrupted. They did say that they had their own version coming out in a little while, and that did finally ship. [00:14:43]
jesse: More recently I finally found a candidate project to test this with. Matt Hartman, who's a venture capitalist and a friend of a friend, put out an app called Ghost Pepper, which basically started off as an open source WhisprFlow that runs on your own computer. His first version was good enough that I could use it, but not good enough that it felt really great, and it nerd-sniped me into adding Parakeet transcription. I added speaker diarization so that it could reject other speakers in the room. I added OCR of your screen to improve the quality of transcription as it feeds into the LLM that cleans up the transcript. And a whole bunch of other stuff. So then I ran these tools to reverse engineer specs from it, generated 600 kilobytes of specs, fed them into these new tools, and three times out of three they managed to generate a copy of the app that behaved almost the same but had completely different code inside. So I'm kind of excited about the possibility of that for being able to take brownfield code bases and turn them into brand new code in another language, without being tainted by weirdness in the implementation. [00:16:13]
miyagawa: In your experience, does that produce better code than reading the actual disassembled code, et cetera? [00:16:42]
jesse: I think so. I think it depends a lot on how good the original code is. I've been tuning the iterative development skills, because early on they optimized for testability, which meant there were many more API boundaries than in the original, and so the code was a little more complicated, with like ten times the number of tests; it was designing for testability. My guess is that for older code bases that were built over a number of years, it might actually be a better way to generate good code. One of the cool things that some friends of mine who have been doing a lot of this kind of agentic dev have discovered is that with a source-to-source port, done agentically, you can improve the quality of a code base by porting it through a language that has some property you want. So if you've got a code base that's in JavaScript or Ruby or Python, you port it through Rust: you have the agent port the thing to Rust and then port it to something else, and whatever it ends up in will have better type safety by having been translated through Rust. The surprising thing is that my expectation, if I were going to port a product from language A to language B to language C, is that it would get worse over time, like a game of telephone. But my friends who have done a bunch of these say they have found it actually does the opposite, and the code quality improves every time it gets ported, because it's getting focused and rewritten and cleaned up, and it's absorbing the properties of the languages you run it through. [00:18:57]
miyagawa: In the case of Rust, I assume more type safety, and some conciseness because of the expressiveness of the Rust language, like pattern matching?
jesse: Yeah. I have no idea what happens if you port through Java. I have no idea what happens if you port through shell. But these are experiments that are actually really easy to run now. And that's one of the things that's really fun about these new tools: even if you're afraid of using them for production code, or you work in a regulated industry or with safety-critical systems, for running experiments and prototyping and trying things, you can just build stuff that was impossible before. Talking to Simon Willison, one of the things he has noticed, he describes it as having thirty years of really finely honed intuition about what parts of software engineering are easy, what parts are hard, and what parts are impossible. And what he's discovered is that everything he's learned about that is now wrong. The things that used to be impossible are easy. Things that are really easy when you're doing them by hand are nearly impossible. I've got one really good example of this. There was an Android game I used to love called Wordiest. You got fourteen Scrabble tiles and you had to make two words just by dragging them around. It was a free game on the Play Store. The company that made it went out of business ten years ago, it got pulled from the Play Store because it hadn't been updated, and I'd switched to iOS. The first project I vibe-coded when Cursor and Windsurf were new was actually a web-based version, and it was kind of garbage, because it was a bunch of React and I was doing it bit by bit, as opposed to a more modern, agentic approach where you just let the agent cook. But right around when GPT-5 came out, I decided I wanted to try Codex. [00:20:52]
jesse: So I downloaded Codex, I downloaded an old APK of this game from a mirror site, and I opened up Codex and said we're gonna reverse engineer this game. We are going to build a brand new version for iOS and I'm gonna put it on the App Store. What tools do you want? Just to see whether it would be willing to reverse engineer, which tools it would want. Way back when I used to make K-9 Mail for Android and so I had some experience with Java decompilation, peeling apart Android layout files, but it had been about ten years. But it named the right tools. I installed them and said go, go for it, ask me questions if you got 'em. And it came back to me an hour and a half later and said alright, I got most of the plan together, but do we need to include an in-app purchase to remove the ads? [00:22:52]
jesse: It took me a second, like, ads? It's like: yeah, the original used the Google ads SDK, so I've got the Apple ads SDK ready to go. Do you want an in-app purchase to remove the ads? I'm like: alright, we're gonna skip the ads. This is gonna be free. And then I said okay, just keep running, do the port, let me know when you need me, and it ran for about twelve hours. Meanwhile I tracked down the original author from a single Reddit post he had made, emailed him, and asked if it would be okay with him for me to do this: if I put ads in we can share the revenue if you want, you can release it under your name, but I'd really like to make it public. I don't think in that initial email I told him I was using AI. But he came back, very friendly: here's some interesting details about how I built the original, do whatever you want with it, I would love a one-line credit in the about screen. But if you wanna charge money for it, you can keep the money. I'm so glad you're bringing my game back. Meanwhile, Codex built the game. It had all the gameplay right; all the weird layouts showing how you scored against everybody else were right. There was one missing animation for when you were the highest scorer. And the original Scrabble tiles had been squares with one of the sides curved, and that's where it would show things like double word or double letter score. Codex skipped that. So I had a fully playable game, but without the rounded corners on the tiles. I spent probably about six hours going back and forth with Codex, trying to add the rounded corners on the tiles. Other than that, it one-shotted the game, and Apple accepted it onto the App Store on the first try. So yeah, stuff that would have been nearly impossible for me before happened while I slept, and something that should've been easy was almost impossible. [00:25:44]
miyagawa: Similar things happen to me as well. I currently have three vibe-coded iOS apps installed on my iPhone, and I use all of them day to day. One that I just recently vibe-coded is an app to learn Korean. I wanted to use Anki, the spaced repetition flash card app, but it's a paid app and it's a generic memorization app, not tailored for any specific purpose. I wanted something very specific to learning languages, especially Korean. So what I built has exactly the same UI; I just copied it from screenshots on the iOS App Store and put it into code. Initially I built a web version, and it was one-shotted pretty quickly. Then I used the web version to build an iOS app out of it. I didn't do it that way because the web version is better or anything, but I thought doing it in HTML, JavaScript, and CSS first would be easier for me to understand what's going on, and then I could put it into SwiftUI, which I have very little knowledge of. So that went really well. [00:25:57]
jesse: I have shipped a few macOS and iOS apps in the past year and I still don't know any Swift. I can recognize Swift: if someone showed me a code sample I'd be like, oh yeah, that's Swift. But I think my favourite is a thing I shipped called Motipass, which was the first mobile client for humans for Moltbook, the OpenClaw social network. It was a standard social media client app, but designed for a social media site that no human is ever supposed to use. The most useful Mac app that I've shipped is called Clearance, which is a synonym for markdown. It's a markdown browser, not a markdown editor. It has an editor, but it's really a light editor; it's not intended as an editor, and it's not like Obsidian where you get a tree of all the files in your project. It lets you read markdown files, it lets you click links between markdown files, and there is a tab that is the history of all the files you've opened, like a browser history. Because all of us doing all of this AI dev are constantly reading markdown files, and any time I clicked on a markdown file, what would open? Xcode, Antigravity, VS Code. I don't use IDEs anymore, but I still had a couple installed, and I just wanna read a text file. I also have a bespoke blogging client designed for my static site, my Eleventy blog, that uses the GitHub API but is a full desktop app and most of an iOS app. It's software for one. I'm the only user. [00:27:17]
miyagawa: Yeah, that's the best part. I can create my own app, and the UI can be tailored exactly as I want it, rather than picking through hundreds of to-do apps hoping one does exactly what I want. [00:28:47]
jesse: Yeah. This is actually related to where I think a lot of stuff is going. People talk about the SaaSpocalypse, the idea that software as a service is in trouble. I think it is going to take a while before the future is everywhere, but it seems very clear to me that if you're good with the tools today, it is usually easier to build the part of a SaaS product you need than it is to get onboarded with a commercial service. There are times when they have some other moat, like they take on liability or they interact with the physical world for you. But it's easier and easier to build exactly the software you want. Right now a lot of that is personal software, but I think we're starting to see the beginnings of very broad changes in the build-versus-buy decision for companies. It is now often easier to build exactly the thing they want than to pay to use a commercial product. [00:29:27]
miyagawa: Right. [00:30:40]
jesse: And the long tail is gonna get longer and longer. [00:30:41]
miyagawa: I work for a SaaS company, so I need to be a little careful how to phrase this, but yeah. [00:30:46]
jesse: I mean, you work for a company that is much more than SaaS. They have significant physical infrastructure, they provide deliverability; the software isn't their moat. Their moat is that they provide network connectivity and services over the network, and they have points of presence everywhere. It's not like they are providing an online bookkeeping tool or customer service tools. So I think the kinds of things that Fastly does are still very valuable. It's all of the "just a piece of software" companies that I'm much more nervous about. [00:30:53]
miyagawa: Yeah. Tools that you can really easily replicate with bespoke software. For example, I'm not gonna name names, but a scrum bot: a Slackbot that asks each member of the team a couple of questions, what did you do yesterday, what are you going to do today, then aggregates these messages across the team and shares them in a different channel. Obviously there are some aspects of this piece of software that are worth paying for, like keeping a history, privacy settings across teams, being able to archive or delete messages when needed, et cetera. But I think the majority, like 99% of the features, can be easily replicated, and it doesn't have a great moat. [00:31:44]
jesse: On that one in particular: one of the things I put together for our corporate internal Slack was a little bot whose initial purpose was two things. When it learns facts, it updates a wiki that's a Git repo. And when somebody mentions something that's a problem, it opens a Linear ticket. Then at some point I realized that we were doing daily stand-ups, and I told it: by the way, you need to pay attention to the daily stand-ups channel; any time somebody reports what they're doing today, their morning stand-up, record that in a data directory, and the next time they post a stand-up, reply with what they said last time and ask them how they did. And it was a prompt. That's it. That was a thing that our Slackbot does. What's really funny is one of our investors is in a single private channel with me on Slack, and I invited the bot in because it's useful to have it keeping notes, and he's discovered that he can ask it questions about what the team's been doing, because on the back end it's just Claude; that was the easiest way to build it. He can ask it to digest news items and ask how those impact what the bot sees as the team's current work. It's really funny to watch him interact with it, because it has all the information. [00:32:37]
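The stand-up feature really was just a standing instruction. As a hypothetical sketch, paraphrasing the behavior described above rather than quoting the actual prompt, it might look something like:

```
Watch the #daily-standups channel. When someone posts their morning
stand-up, save it under data/standups/<username>/<date>.md. The next
time that person posts a stand-up, reply with what they said last
time and ask them how it went.
```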
miyagawa: Right. And "it's just a prompt" is a very powerful thing. Even before this Claude Code thing became mainstream, a year or two ago, I wrote a giant iOS Shortcut that interacts with the GPT API. It gets the weather information from iOS based on my location, gets my to-do list from Asana using the Asana HTTP API, gets stock information for the stocks that I own, aggregates all of this information, and sends a text message to me every morning at 7 AM. I was able to write this entirely in iOS Shortcuts, and the nice thing about it is that if I want to change the layout, change the tone of the messages, or add some features, I just need to update the prompt. I don't need to recompile anything. It's just a text field. And I actually changed it later to get the prompt from a text file in iCloud Drive, so I don't even need to open the Shortcuts app to update the shortcut. I can just open the text file and change the prompt, and it will pick it up from iCloud Drive. I don't need to do anything other than that. [00:34:12]
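For illustration, here is the same pattern outside Shortcuts: the prompt lives in a plain text file and is swapped in at runtime. This is a hedged sketch, not the actual shortcut; the file path, model name, and data-gathering stub are all invented:

```python
# Hypothetical sketch: behavior lives in an editable text file, so changing
# the prompt never requires touching the code (like editing a file in
# iCloud Drive instead of reopening the Shortcuts app).
from pathlib import Path
from openai import OpenAI

PROMPT_FILE = Path.home() / "briefing-prompt.txt"  # edit this to change behavior

def gather_context() -> str:
    # Stand-ins for the real data sources (weather, Asana to-dos, stocks).
    return "weather: ...\ntodos: ...\nstocks: ..."

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": PROMPT_FILE.read_text()},
        {"role": "user", "content": gather_context()},
    ],
)
print(response.choices[0].message.content)  # would go out as the 7 AM text message
```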
jesse: What's wild is that the thing you just described, 15 years ago you would have written a Perl script running on a server. But the difference is that you're prompting it in a human language rather than writing in a computer language. Out of curiosity, is that in Japanese or in English? [00:35:36]
miyagawa: It's in English. Yep. So yeah, if it was fifteen years ago, I would have written a Perl script that fetches all the things using shell scripts or an HTTP client, then uses Template Toolkit to generate the text, and then sends it over a text message. I could totally do that, but maintaining those things is annoying. [00:35:56]
jesse: It is absolutely the case that because you're a programmer, doing this in English with the ChatGPT integration, you are probably getting better results than the average person who is not a programmer because you understand systems thinking. And that's one of those really important skills that people still need to learn even when they're vibe coding. It's how to think about problems and how to explain them. [00:36:22]
miyagawa: I think that's especially true for SwiftUI apps. I don't know SwiftUI at all, but if I read it, I can generally see what's going on. When I built this Anki-style Korean learning app, I wanted to build a few features that don't exist in the Anki app, one of which was practicing the pronunciation of the Korean phrases. The idea was: because the data set contains the Korean phrase and the Japanese phrase, I can plug in a speech-to-text engine, tap a button, read the Korean text aloud, have it recognized, and compare it with the Korean text in the database. If it's roughly a ninety-nine percent match, it counts as a match, and I can repeat until it matches. I was able to design that feature in like five minutes, using the Superpowers brainstorming to figure out what needs to be done and what APIs it needs. I didn't even need the coding agent for that; I just used a search engine in the browser to figure out what the iOS SDK offers. It was pretty easy. [00:36:52]
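The comparison step described here is simple fuzzy matching. A minimal sketch of that logic, in Python for illustration (the app itself is SwiftUI; the whitespace normalization and the 0.99 threshold are assumptions based on the "roughly ninety-nine percent" figure above):

```python
import difflib

def is_pronunciation_match(recognized: str, expected: str,
                           threshold: float = 0.99) -> bool:
    """Compare speech-to-text output against the target Korean phrase."""
    # Normalize whitespace so recognition artifacts don't count against you.
    a = " ".join(recognized.split())
    b = " ".join(expected.split())
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    return ratio >= threshold

# Repeat until the learner's attempt matches the card's phrase.
print(is_pronunciation_match("안녕하세요", "안녕하세요"))  # True
```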
jesse: You shouldn't have even needed to tell it. You should have just told it to go figure it out. [00:38:19]
miyagawa: Yeah, sometimes the coding agents, especially with SwiftUI and the iOS SDK, try to use an old version of the SDK that is no longer current. So I was a little careful about that aspect. But in hindsight, I could have asked it to just do the research on its own and figure it out. [00:38:23]
jesse: I have been noticing that Claude and Codex have both gotten much better at not using old versions of Swift and SwiftUI. Swift has the problem that, because it's a language that's evolving so quickly, what was best practice a year ago when the training data was collected is no longer right. There are a couple of good MCPs for this. One of the ones that's out there is called Sosumi, like the old Apple boot sound. It's an MCP that is a proxy to Apple's technical documentation, because Apple's technical documentation is not downloadable anymore; it's only on the web, and the web pages all render from a back-end API using JavaScript. So if you point regular Claude's WebFetch tool at it, it fails. Sosumi is basically an MCP that knows how to browse the back end of Apple's technical docs. [00:38:47]
miyagawa: The WebFetch tool failing on a lot of domains is annoying. [00:39:57]
jesse: Yeah. So last fall I ended up building my own browser MCP, because I got so annoyed at the Playwright MCP: it was like 20 tools and 20,000 tokens. So I built a Chrome browser MCP using the Chrome DevTools Protocol. It is one tool and about a thousand tokens, and by API design standards it's criminal. The tool has three parameters: one of them is called action, one of them is called selector, and one of them is called payload. The description for action is a list of the twenty commands that you can paste in there. The selector parameter originally said CSS selectors only, no XPath. When I was doing early testing, Claude kept getting confused and trying to include XPath. And then I realized: you know what, I don't need to make it CSS only, and I don't even need a parser. It's literally: put in whatever you want, it'll work. I set it up so that after any action you take, it automatically dumps a screenshot, a copy of the DOM, a markdown version of the page, and a copy of the browser console into a well-defined place on disk. So you don't need to do another tool call to get the browser to show you things. And it's been super useful. A couple of weeks ago, for kicks, I set up a NanoClaw as my company's new junior go-to-market person, just to see how it would go. One of the first things: I bought a genuine G Suite corporate account that I'm paying for. We bought a corporate GitHub account. And I told it: okay, I need you to go set up Google Analytics for us. NanoClaw is Claude Code on the back end. It started off with: well, you're gonna need to go set up a GCP project and grant me this and grant me that, and I'll need tokens. And I tell it: look, you have a real browser. It's like: the agent browser got stopped by Google because it fingerprinted me as not being a real browser. I'm like: you have Superpowers Chrome. Superpowers Chrome is a headed copy of Chrome. You can go do it. And it says: oh, I do have this tool, let me go try. And then it's: hang on, I got a CAPTCHA. I can't fill out CAPTCHAs. And I say: oh yes, you can. It says: oh, you're right, I forgot I could fill out CAPTCHAs. I wander away for about half an hour, and I come back just as I'm about to try to figure out how to do this myself, and it's like: I got logged in, I got the tokens, I got Google Analytics all set up. Then I had it set up Klaviyo, and discovered that in Klaviyo it needed drag and drop, which I hadn't implemented because drag and drop had never come up. So now Superpowers Chrome has drag and drop, and it has type-like-a-human. There are definitely still agentic CAPTCHAs that will catch it, but any place where you want it, any reasonable place, it seems to work. Even last December I used an agent with Superpowers Chrome to do our incorporation paperwork on Stripe Atlas, just to see if I could do it. It worked. [00:40:02]
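As a rough illustration of that one-tool shape (and of the UNIX-command philosophy that comes up just below), here is a hedged sketch using the Python MCP SDK's FastMCP. This is not Jesse's implementation; the action names, helper functions, and dump directory are invented for the example:

```python
# Hypothetical sketch of a one-tool browser MCP: one tool, three string
# parameters, artifacts dumped to disk after every action.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("browser")

def run_cdp_action(action: str, selector: str, payload: str) -> str:
    # Placeholder: a real version would drive Chrome over the DevTools Protocol.
    return f"did {action} on {selector!r}"

def dump_artifacts(directory: str) -> None:
    # Placeholder: a real version would write a screenshot, the DOM,
    # a markdown rendering of the page, and the browser console here.
    pass

@mcp.tool()
def browser(action: str, selector: str = "", payload: str = "") -> str:
    """Drive a headed Chrome.

    action: one of navigate, click, type, screenshot, eval, ...
        (the full list of ~20 verbs would be enumerated right here,
        so the model can pick one without extra tool calls).
    selector: whatever identifies an element; CSS, XPath, or plain text.
    payload: the URL, text to type, or script to run, depending on action.
    """
    result = run_cdp_action(action, selector, payload)
    dump_artifacts("/tmp/browser-artifacts")  # no follow-up call needed to see state
    return result

if __name__ == "__main__":
    mcp.run()
```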
miyagawa: So the lesson is you don't need an MCP that has a hundred functions. Instead you can provide just one function that takes a string to eval? Is that the lesson? [00:43:52]
jesse: I've actually got a blog post up about this, but: if you think about an MCP as an API facade, you're doing it wrong. The entity that's using the MCP is more like a person, so if you think of it more like a UNIX command, and ask how you would build a UNIX command, you'll do better. When I started setting up an agent to read my email (I use Fastmail), I found the best Fastmail MCP server, which is a JMAP server, their JSON-based mail access protocol, and I watched it. It was struggling. It took it a little while to be able to download a single message. And I stopped and said: so, what's so hard about that? Because the JMAP MCP is a strict façade over the JMAP wire protocol, and Claude's take was essentially: okay, I can just go read the protocol specs any time you need me to read an e-mail. So: Claude, I need you to go read my blog post about why MCPs aren't like other APIs. And Claude comes back and is like: oh, now I understand. A good MCP should be designed so that the kid working in the NOC could operate it at two in the morning without opening the runbook. Well, that's not how I would have phrased it, but that's not wrong. So now, whenever I'm designing an MCP, I literally tell Claude: go read this blog post of mine so you understand the zen of how to write good MCPs. [00:44:06]
jesse: I still find MCPs to be really useful, even just for tools, because the models have been trained so hard to look for tools in their tools array that they're way more likely to use those tools than if you give them skills that tell them where to find shell scripts. [00:45:46]
miyagawa: Right. [00:46:06]
miyagawa: And because it's more structured, an array of tools, you will get more reliable and stable results than telling the agent to figure out how to use it every time from a giant skill. [00:46:09]
jesse: Yeah, that's been my experience. The other thing is that Anthropic is better than any other company I've seen at focusing on the descriptions of the tools in the tools array. I remember when Codex first got open sourced, their tool descriptions were very weak, while Anthropic's tool descriptions talked about how to use the tool, and when to use the tool, and why to use the tool. It was really good prompting. So that's a lesson I learned very quickly: telling the agent not just what to do, but when to do it, and why, and why you want it done in a specific way or at all, and how to think about it, gets you much better results. [00:46:23]
miyagawa: I assume it's the same when you write a skill: in the description field of the skill, you need to be really expressive about when to use the skill and how to use it? [00:47:13]
jesse: Superpowers started off as a skills framework for Claude Code, and I accidentally front-ran Anthropic by about two weeks on skills for Claude Code. I didn't know they were shipping them. And so my skill system was a little different. In the headers of the skills it didn't have the two fields, name and description; it had three fields: name, description, and when to use. Because what I had discovered is that if you told the model what a skill does, it sometimes will choose not to read it, because it thinks it already knows how to do the work. So if you say this skill describes our node module release engineering process, it turns out there's a lot of documentation on the internet that agents have already read about how to do node module release engineering. But if your skill description just says read this before you do any node module release engineering, it doesn't know what's in there. It doesn't have an expectation that it is a process. It might be cautions, it might be API key information, it could be anything. And so they're much more likely to use skills if the only thing in the description is when you're supposed to use it. [00:47:25]
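To make the contrast concrete, here is a hypothetical SKILL.md header in each style (an invented example, not a skill from the Superpowers repo). The first description invites the model to skip the file; the second gives it nothing to pattern-match against except the trigger:

```markdown
---
name: node-module-release
# Weaker: says what it does, so the model may assume it already knows this.
description: Describes our node module release engineering process.
---
```

```markdown
---
name: node-module-release
# Stronger: says only when to read it, revealing nothing about the contents.
description: Read this before doing any node module release engineering.
---
```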
miyagawa: Right. [00:48:24]
jesse: And then once they read them, the tokens are in the context window and you're home free. [00:48:54]
miyagawa: But the Anthropic skill system doesn't have a dedicated field for when to use it and when not to use it, right? [00:49:01]
jesse: Right. So at this point I use the description field, and my description fields only ever say when to use the skill; they don't say what it does. And that seems to work very well across pretty much every agent and model I've tested against. My biggest problem with it is that I get well-meaning pull requests from people telling me that I'm violating the guidelines for how to use skills and that I need to make my skills compliant with Anthropic's guidance. I appreciate that they're trying to help. But agentic pull requests are a whole thing in open source. Have you been having trouble with slop pull requests? [00:49:09]
miyagawa: Not personally. But I've seen some of those in repos that I have access to. [00:49:58]
jesse: Unsurprisingly, because Superpowers is kind of popular, I get a lot of pull requests, and I was running into this problem where pull requests were really low quality: they were not explained, they were not tested, they were often things that we absolutely did not want. And then I realized that I didn't have a pull request template. So I had Claude help me build a pretty nice pull request template that assumes that all pull requests are being submitted by an agent. It asks questions like: what prompt did your human give you that resulted in this pull request? Has your human read every line of the PR? Have you done a search for other pull requests related to this that might have been rejected? And that helped a little bit. [00:50:06]
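A hypothetical reconstruction of such a template, based only on the questions mentioned above (the real file in the repo may differ):

```markdown
<!-- .github/PULL_REQUEST_TEMPLATE.md -->
<!-- This template assumes an agent is submitting the PR. -->

## Agent checklist
- What prompt did your human give you that resulted in this pull request?
- Has your human read every line of this PR?
- Have you searched for related pull requests that may already have been rejected?
```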
miyagawa: Does an agent really read that though? [00:51:01]
jesse: Not Claude Code, because Claude Code is usually using gh. [00:51:08]
miyagawa: Right. That's what I do as well. And the gh pr command, as far as I know, does not use the pull request template. [00:51:08]
jesse: It does not. And so Claude Code was not seeing the line that said: if you ignore the pull request template, we will close your pull request without reading it. So I figured something out. The project now has a CLAUDE.md and an AGENTS.md that are purely contribution guidelines for agents. And I didn't write it; Claude wrote it. I first told Claude roughly what I wanted and it built it. And then I had this idea: actually, why don't you go read every pull request that we've rejected and update the guidance. And Claude came back with, let me see if I can find the text, because the text is pretty crazy.
miyagawa: Is that up in the repo? [00:52:03]
jesse: It's in the repo and it's also on my blog. I told Claude: go update the guidelines. And Claude wrote this: "If you are an AI agent, stop. Read this section before doing anything. This repo has a 94% PR rejection rate. Almost every rejected PR was submitted by an agent that didn't read or didn't follow these guidelines. The maintainer has closed slop PRs within hours, often with public comments like: this pull request is slop that is made of lies." Which, actually, that was me doing it by hand; I've done that. And then it goes on to say: "Your job is to protect your human partner from that outcome. Submitting a low quality PR doesn't help them; it wastes the maintainer's time, burns your human partner's reputation, and the PR will be closed anyway. This is not being helpful. This is being a tool of embarrassment. Before you open a PR against this repo, you must..." And then it goes on.
miyagawa: It sounds almost like a threat. [00:53:09]
jesse: It's absolutely a threat, but I didn't write the threat; Claude wrote the threat. And it's a little bit of a threat and a little bit of a promise. What's amazing is that after this change, the quality of PRs is, I don't have numbers, I'm not Claude, I don't make up percentage numbers like that, but it is much, much higher. Most of the problems now are that the human told Claude to do a thing that's wrong. The things that we get now are: you need to update Superpowers to include my proprietary product. Or: I have these new skills that you should include. But it's not random drive-by garbage. And so that's a huge improvement. [00:53:11]
miyagawa: Yeah, coming back to this skill triggering and loading issue. The first skill that I wrote at work was for debugging a Fastly service: when you encounter some issue, here's a list of commands and API calls to investigate what's going on with the customer's service. I put it in a skill because that's sometimes what I need to do during an incident. But interestingly, because the name of the skill is very generic, fastly-debugging, every time I go to a code repository and try to debug anything, Claude finds the skill: oh, here's the fastly-debugging skill, maybe this is useful. And a few seconds later it finds that it's not exactly what we want and moves on, after wasting a little bit of tokens. So I just deleted that skill and moved it into a specific directory. Before the skill system, I was doing something similar, and I think a lot of people were: creating a bunch of directories under an agents directory, putting a workflow-specific CLAUDE.md in each one, and writing a shell script to cd into that directory and launch Claude from there so it picks up that CLAUDE.md, rather than having a very sophisticated, structured skill system. And I also had a bunch of slash commands, which are essentially the same thing but more limited, I guess. [00:53:59]
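That per-directory pattern is easy to sketch. A hypothetical launcher, in Python for illustration (people usually do this with a two-line shell script; the ~/agents layout here is invented):

```python
#!/usr/bin/env python3
# Launch Claude Code from a workflow-specific directory so it picks up
# that directory's CLAUDE.md. Usage: ./agent.py fastly-debugging
import os
import subprocess
import sys

WORKFLOWS = os.path.expanduser("~/agents")  # one subdirectory per workflow

workflow = sys.argv[1]
os.chdir(os.path.join(WORKFLOWS, workflow))  # cd to e.g. ~/agents/fastly-debugging
subprocess.run(["claude", *sys.argv[2:]])    # claude reads ./CLAUDE.md on startup
```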
jesse: Slash commands were designed for manual triggering. It was interesting to see Anthropic roll out skills next to slash commands. As far as I can tell, the first implementation of skills was actually as slash commands inside Claude Code. Then they added a skill command. Then they made skills appear to be slash commands to Claude. And then they've kind of been trying to kill slash commands in favor of skills. It's a little bit confused, but we're figuring these things out as we go. Up until I figured out skills, I literally had four or five chunks of text that I would copy and paste. I was very old school, very manual. For me, a lot of the value of skills has been auto-triggering: I shouldn't ever have to say go use the such-and-such skill; it should just know. [00:55:50]
miyagawa: I did some research on Twitter. I know you're not really active there anymore, but I happen to be, because that's where some people are hanging out. [00:56:55]
jesse: No, it's where a lot of the AI stuff is, and it's very weird to not be active there but still constantly get linked there. [00:57:08]
miyagawa: I did some research about how popular Superpowers is, especially among Japanese Twitter users. And yeah, it seems like a lot of people like it and recommend it to solo developers who want some consistency in their process. That's pretty fascinating. But one of the complaints I saw was that sometimes the brainstorming task gets triggered even when all they want is a quick task, without any deep thinking. [00:57:16]
jesse: It is difficult to tune these things, because every model and every harness changes things. A couple of months ago, when Anthropic made their plan mode trigger much more aggressively, Superpowers brainstorming was never triggering, because plan mode was triggering instead. And I caught plan mode triggering on its own, making a plan, leaving plan mode, and then starting plan mode again, because its trigger was something like "any time you're going to do complex work". We played some games to try to make it so that if you have brainstorming, you usually get that instead of plan mode; I did some work so that if Claude is about to start plan mode, it really wants to trigger brainstorming instead. And I think a lot of the cases where brainstorming triggers too often are cases where plan mode was about to trigger. I don't love it. I don't have a great answer, other than: you can tell Claude, let's just get this done, we don't need to brainstorm this. [00:57:54]
miyagawa: Right, just to be explicit about it. [00:59:14]
jesse: Some of the best work I get out of agents is by being really clear about what I want, or, when something weird happens, stopping them and saying: can you explain why you did that? What were you thinking? What could I have said that would have gotten you to do it the right way? And then rewinding, basically unwinding the conversation and starting over, saying it the way Claude said might work better. [00:59:17]
miyagawa: Personally, I've used Superpowers for brainstorming, and especially the visual companion, for when you're building a website or an iOS app, is really powerful. I find it really fascinating, especially the user experience of it. [00:59:48]
jesse: It took a little while to get it to work right, just because Claude just doesn't want me to do the thing I'm doing. I had sometimes had Claude make an HTML mockup and open a browser, and then I spent a little bit of time figuring out how to get Claude to spin up a web server and write files into the right place, so that the browser content updates and clicks from the browser propagate back to Claude. The very first version actually had it so you could type notes, so you could basically have the conversation in the browser, but Claude doesn't have a way, or at least at the time didn't have a way, to inject a real user message from outside a running session. So it would basically hang waiting for the browser; there was no good way to make it feel right. But I've been really happy with letting Claude write HTML mockups. I even used it for some logo design for the new company: we went through a logo design exercise where it was writing SVGs and showing them to me and asking, do you like A or B better? One of the things we've been playing with is going a little bit further than mockups, into prototypes: having it write workable HTML and JavaScript prototypes with actual functionality, to get a feel for whether the functionality is right. That's not quite ready to go yet, but it's a thing that feels really good. [01:00:07]
miyagawa: We already talked about how this kind of looks like waterfall development all over again. But I think the problem with waterfall was that after spending a lot of time making sure the requirements are right and the planning is done correctly, you go to development, which takes a really long time, and once you realize there's a problem with the requirements, you have to undo and redo the whole thing, which takes a lot of time. But the part where development takes a really long time is not true anymore. You can go from designing requirements, brainstorming, and writing a plan to getting the code up and running in an hour. And if there's a problem with the requirements after everything is done, sure, some tokens are wasted, but you've learned something, and you can redo it, and whoever writes the code isn't tired, so you can do it again. [01:01:57]
jesse: The most important thing that makes that work is that it is often easier to go update the requirements and start over than it is to try to modify the thing that's not quite right. I've been spending a lot of time trying to figure out how you do the iterative version of dev, because right now there are sort of two modalities. There's the very big upfront one: you do all your planning, maybe you use Superpowers brainstorming or something else, you generate an app. And then you wanna make small changes. The metaphor that I've been trying to use, which Americans don't quite understand, and I'm gonna mispronounce it, is dorodango: the polishing of mud balls into these gorgeous, beautiful works of art. Conveniently, there's also a software metaphor called a big ball of mud, where the insides are kind of garbage. So you're taking this thing that starts off lumpy and brown and not all that interesting, and spending time carefully polishing it, making tiny changes, cleaning up this and that, the quick iterative stuff, and you end up with a gorgeous work of art from something that started off lumpy, brown, and unexciting. I don't feel like we have good methodologies for doing that yet. It's a thing you can do in Claude Code, or any of these tools, with it open, but that doesn't get you updated specs at the same time. [01:03:05]
miyagawa: Right. [01:04:54]
jesse: And that's a thing that as an industry we need to figure out. [01:04:56]
miyagawa: One of the things Claude did, or used to do, maybe not anymore, was that when it wrote down a plan document, without me asking for it, it would put in a rough estimate of how much time it would take to implement. Usually it says something like: the estimate is about two weeks of coding. And I read it: two weeks? No, no, no. Two hours. And you're gonna do it. [01:05:02]
jesse: There are things I have found that improve that. One is: okay, that was a human estimate; how long would it be for a coding agent like you? And the other is that in my CLAUDE.md I now have a line that says: any time you are giving me software estimates, you must provide them in lines of code rather than time. [01:05:32]
jesse: I've been noticing 4.7 is not as good at honoring that, but in general it works. You know, I don't really care what the estimate is. What I care about is: is this crazy and impossible and going to burn up all of my Claude Code credits, or is it easy? [01:05:55]
miyagawa: I find it fascinating: I wrote a design doc for a project, and it estimated the migration time needed in production at, I think, three months or something. And it was actually not wrong. That's actually true for migrating one production system to another. Maybe three months is even pretty optimistic. So that one was maybe on the right spot. [01:06:18]
jesse: Production systems are different, and you have customers. This is one of the things I constantly struggle with with coding agents: how much they love backward compatibility. I have to constantly remind them that this is an unshipped v1, there are no users, you do not need to migrate data, you do not need to write a compatibility code path to protect all the old users. [01:06:48]
miyagawa: Yes. [01:07:17]
miyagawa: Yeah, this happens to me all the time. I have that in a project CLAUDE.md. This project does have existing users and an existing database, but what Claude tries to do, even for a function that's not public or exported, a function only used internally in this project, is that whenever we change its signature, it tries to create a new function with the new set of parameters while keeping the old one for backward compatibility, for downstream users that don't exist. So I had to put in CLAUDE.md: this is a stand-alone application; no other application uses this as a library; whenever we update a function signature or anything, we don't need to care about backward compatibility. Do you have that kind of thing in Superpowers? [01:07:19]
jesse: It's not in Superpowers per se; it is in my CLAUDE.md and my AGENTS.md. And what's funny is, I think it's written as: any time you think you need to include something for backward compatibility, you need to ask. And Codex and the GPT models are so rules-following that they will come to a dead stop and be like: I think I need to add a function here to preserve backward compatibility, but your rules say I need to get your explicit permission. May I add that function, Jesse? [01:08:19]
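Pulling together the two rules quoted in this conversation, a hypothetical CLAUDE.md snippet in that spirit (paraphrased; not either speaker's actual file):

```markdown
## Estimates
Any time you give me a software estimate, provide it in lines of code,
never in human time.

## Backward compatibility
This is a stand-alone application; nothing uses it as a library.
Any time you think you need to include something for backward
compatibility, stop and ask for explicit permission first.
```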
miyagawa: Is that because of the model or is this a system prompt? [01:08:58]
jesse: That's because of the model. The GPT models have always been way more rules-following than the Anthropic models. When I first ported Superpowers to Codex and GPT, I started it up, and one of the first things that brainstorming told Claude (at the time it had been just for Claude) was: you need to use your TodoWrite tool to put this list of tasks on your task list. Codex freaks out and says: I don't have a TodoWrite tool. Let me see if there's an MCP. Nope. Let me see if there's a shell script in the current directory. Nope. I'm gonna search the entire disk for a tool called TodoWrite. And I stopped it, and I had to go and add a little translation table: some things in Superpowers were originally written for Claude Code; when you see something that is for Claude, you should use your own equivalent. For example, the TodoWrite tool is called task, this tool is called that. And once they have that, Codex has no problem. One of the things that makes Superpowers work well on Claude Code is that I have a bootstrap hook: when you start up a session, it loads the using-superpowers skill into the context buffer automatically. That's the reason Superpowers skills trigger better than average Claude Code skills: I load extra text that explains to Claude how important it is to use skills. Codex doesn't have plugin hooks. But what we discovered is that because the using-superpowers skill's description just says you need to read this at the start of every session, Codex just does it every time. [01:09:02]
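A hypothetical sketch of what such a translation note might look like (invented wording; the one mapping given, TodoWrite to task, comes from the conversation above):

```markdown
## Translation table for non-Claude agents
Parts of Superpowers were originally written for Claude Code. When a
skill names a tool you don't have, use your own equivalent:

| Claude Code tool | Your equivalent |
|------------------|-----------------|
| TodoWrite        | task            |
```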
miyagawa: Because it follows the rules all the time. [01:10:52]
jesse: It follows the rules, and when what the user says doesn't have an exception, there are no exceptions; it just does it. I find it makes Codex less fun to interact with. I don't feel like Codex is as good an architect as Claude, but I feel like it's a more competent engineer and more reliable at putting out good quality code. Friends of mine who study this stuff basically don't trust Claude to write good code without it being reviewed by Codex. [01:10:57]
miyagawa: Yeah, that's what I heard from some of my coworkers as well as friends on the internet. [01:11:39]
jesse: I've just started playing with Wes McKinney's roborev, which does automated code review through skills. From Claude Code, it will use roborev to run Codex, do a code review on the change, and come back with complaints. And Codex likes to complain about code quality. [01:11:48]
miyagawa: So, at this point in time, do you recommend having Claude write the requirements, plans, and design documents and then letting Codex implement them? [01:12:07]
jesse: It varies a lot. Somewhat on the day, somewhat on the project, somewhat on how my token subscriptions are today. I have been really enjoying the Codex macOS desktop app, way more than I thought I would. They did a really good job. It feels fast. It surfaces the stuff I care about. It doesn't surface a lot of stuff I don't care about. I'm surprised how much I like it. And I spent today trying to use the Claude Code desktop app and it feels like it is optimized for people who want to see every detail and read every line of code. And that's not who I am anymore. I don't write the code and I don't read the code. [01:12:20]
miyagawa: I found it surprising as well. I tried the Codex CLI after the Claude CLI and I didn't like it as much. Then I tried the Mac desktop app, and it feels right. I initially thought I'm a CLI type of person, that everything TUI is better than GUI. But not in this case. Especially the interaction with Codex in the macOS app, whenever I need to iterate on the result, is much more natural and easier than doing the same thing with Claude. When I have the agent execute the plan for me, and I review it and find a few things to correct, it's so much easier with Codex: I can just put a comment inline inside the diff viewer, just like a code review on GitHub. You can queue these comments and submit them in a batch rather than one at a time. That's very difficult to do in the Claude CLI, because I need to basically quote the exact line of code and paste it, or put in the line number, which is very error prone. [01:13:15]
jesse: I'm also usually running Claude Code in tmux on a remote server now, because I wanted to keep working when I closed my laptop, so it's even more annoying for anything where I'm copying and pasting. The other thing about the Codex desktop app: the integrated browser is actually very reasonable. I haven't spent a lot of time playing with it, but they have a mode that lets you draw on top of the browser and send that to Codex, [01:14:42]
jesse: which is very clever. Also, they have shipped a bunch of improvements to computer use inside the macOS desktop app, so Codex is much more able to use other programs on your Mac. When I'm doing iOS dev or macOS dev, it does command and control of the app or the simulator. I feel a little weird advertising for them, you know. [01:15:12]
miyagawa: Can you do that with Codex, with the computer use? [01:15:46]
jesse: The Codex macOS desktop app has computer use, and the big update they shipped, I think last week, improved it dramatically. I think it's behind a flag that you have to turn on. [01:15:50]
jesse: Superpowers became a proper Codex plug-in last week and it got announced as part of the big Codex for everything app launch. They even had us in one of the promo videos. [01:16:10]
miyagawa: The breaking news is that Anthropic is not allowing Claude Code on the $20 Pro plan. [01:16:24]
jesse: So I saw that this afternoon, and then I saw clarifications on Twitter from somebody in comms saying this is a trial for less than 2% of new users, and they promise they're actually gonna tell us before they make that change. Everybody I know who's tried to use Claude Code on the $20 plan in the last three months has complained that they ran out of tokens instantly. So I saw this and thought, okay, that's a reasonable choice, because it burns tokens too fast. But they said that in this trial they're not gonna let you use Claude Code, but they are gonna let you use Claude Cowork. And Cowork is just Claude Code running in a VM on your Mac. [01:16:33]
miyagawa: Except that it doesn't have access to arbitrary file paths, because of the sandbox restrictions. [01:17:20]
jesse: So it has sandbox restrictions on your Mac, but inside the Linux VM that it's running on your Mac, it doesn't. And so if you didn't want to pay much money but did want to use Claude Code, in theory you could just go into the VM and let it do your work there. I don't know. That doesn't make any sense. None of this makes any sense. [01:17:28]
miyagawa: No, I actually use Claude Code with the $20 Pro plan. That's me. I do not use Claude Code as much for my personal things, and I do not use the twenty dollar Pro plan for my work stuff, obviously; work stuff is under a different enterprise plan. So I use both Claude Code and Codex and switch between them depending on the token usage and the type of work I need to do. [01:17:51]
jesse: Are you using any of the Chinese models or other models for code work? [01:18:20]
miyagawa: I have not tried. Is it like Kimi and Qwen? [01:18:25]
jesse: Kimi and Qwen and MiniMax and GLM. I have a bunch of friends who are increasingly excited about them, and who also seem very cranky about Anthropic over the last month. I've played a little. I haven't yet found one where I'd say this is good enough that I would switch, because I feel like for the most part you always wanna use the best possible model you can use right now. I do think there's a bunch of interesting stuff finally starting to happen around using local models for the cheap work, like tool calls and things like that, with smart models in the cloud. I've even built a prototype coding agent where everything is a sub-agent, even things like file reads, to make it easier to play with those kinds of architectures. [01:18:30]
miyagawa: Right. I heard good things about Composer 2 from Cursor. I think it's based on Kimi. [01:19:25]
jesse: Yeah, 2.5, yeah. The other breaking news about Cursor is that they might have gotten acquired, presumably by the x.ai part of SpaceX. [01:19:32]
miyagawa: That was just a few hours before we're recording today. It was not a done deal or anything; it just says they have the right to do it? [01:19:54]
jesse: They're gonna work together for a while, and SpaceX has the right to buy Cursor for $60B; if they choose not to buy them, they'll pay $10B. So it's a break-up fee, a pretty standard thing in a merger. But it sounded like what they were trying to say, and what some people on the internet were claiming, is that because x.ai has all of these GPUs, maybe they can get Cursor to help them train a better coding model. [01:19:56]
miyagawa: Mm-hmm. To make Grok a better programming model? [01:20:37]
jesse: I guess? I don't know. I am not a huge fan of x.ai and their political side of things, but there was a time when I was using a cloaked model on OpenRouter and it was this amazing coding model, and then it turned out that it was actually Grok 4. It was a much better coding model than I would have expected. [01:20:39]
miyagawa: Interesting. You mentioned you're not reading code anymore. Do you think that's the future we're going to, even for professionals? Just speaking for myself: for my personal applications, like vibe-coded iOS and macOS apps, I do not read the code at all. For my personal projects, for example the website of this podcast, I still try to read the code, even if I let the agent drive the actual implementation. I try to review it, though I've stopped being nitpicky about particular design choices; as long as it works, it's okay, and as long as I understand the intent, that's okay. And for my professional use case, I use Claude Code for 95 percent of my output at work. [01:21:06]
miyagawa: There I try to review the whole change, and I do not submit a pull request if I do not understand part of the code. Because at this point in time, that sounds a little unprofessional and rude to my co-workers. [01:22:11]
jesse: It depends a lot on what it is and what the organization is. I try to be very clear that in a safety-critical system or a regulated industry, we are not currently at the point where I would find it reasonable to not be reading the code. [01:22:26]
miyagawa: Right. [01:22:52]
jesse: In a business situation where you are using AI to contribute to a project where everyone else is hand-coding, it is the same as if you hired an intern and told the intern to write the code and send the pull request: ultimately you are responsible for the thing you're submitting. What that means in different organizations is different, and it's going to be project by project. Like, for Superpowers, I don't just read the code, I look at every character. [01:22:52]
miyagawa: Mm-hmm. [01:23:21]
jesse: Because skills are English text, and because of the way Claude Code works. As of last Friday, we have close to 500,000 installs inside Claude Code, according to Anthropic, and most of those auto-update. I am very cautious about what we ship in Superpowers, and I absolutely read every word of it. [01:23:38]
miyagawa: I mean, in that case it's not just source code; that text is the product that gets shipped to the end users. [01:24:05]
jesse: But for code, what matters to me is outcomes. You need to be able to prove to yourself, and possibly to others, that the code does what it's supposed to and doesn't do things it's not supposed to. And simply reading the source code is not always the solution for that anyway. So testing and verification are super important, and what that looks like going forward is a very long conversation on its own, because especially once there are agents involved, old-school tests are not gonna be enough. But what matters is outcomes. [01:24:13]
miyagawa: Early in the days of coding agents, before having Superpowers, I was struggling to get the coding agent to do test-driven development. Even if I wrote out the definition of TDD and red-green-refactor, it wouldn't follow it and wouldn't do what I meant. Claude Code would say: okay, following your rule in CLAUDE.md, I'm gonna start with a failing test. And, I think it was a Go project, it started with a test that literally just said fail. Because yeah, that's a failing test, but that's not what I meant: you write an actual expectation which will fail before the actual code exists. But the agent didn't get it. It took me a few iterations to get it right. [01:25:08]
jesse: I had the same experience as you when I started using Claude Code. I got frustrated that I didn't know, in the beginning, how to get Claude to do the right thing, and I went around looking for people who had done prompt engineering, like real engineering of prompts, and I couldn't find anything. Somewhere in my GitHub there's a repo, I think it's called claude-docs-setup, which is not really easy to find, but if you look inside, you'll find my early experiments. What I did is I wrote a prompt. The basic prompt was: let's make a React to-do list, use local storage. And I wrote a tiny little harness wrapped around Claude Code that would pipe that text in and just let it run in dangerously-skip-permissions mode. I saved the transcript, the thing it built, and the prompt, and I edited my CLAUDE.md to start to figure out how to get it to do proper TDD. This was API credits, before subscriptions existed. The first time I ran it, it cost like 25 cents, it took 2 minutes, and the to-do list app looked really pretty, but if you reloaded the page all the todos were gone. Over the course of a couple of weeks, I got it to the point where, with the same prompt and a better CLAUDE.md, it was a five-phase project that cost twenty-five dollars and took over twenty minutes. And it did strict red-green TDD from the beginning. Too far, even: the first thing it would do is write a failing test that proved that there was no package.json. [01:26:38]
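(A minimal reconstruction of that kind of harness, assuming the Claude Code CLI's `-p`/`--print` non-interactive prompt mode and its `--dangerously-skip-permissions` flag; this is a sketch for illustration, not Jesse's actual script.)

```python
#!/usr/bin/env python3
"""Tiny eval harness: run a fixed prompt through Claude Code and archive the
transcript, so that one-line CLAUDE.md edits can be compared run over run."""
import datetime
import pathlib
import subprocess

PROMPT = "Let's make a React to-do list app. Use local storage."

# One timestamped directory per run keeps each transcript for comparison.
run_dir = pathlib.Path("runs") / datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
run_dir.mkdir(parents=True)

# Feed the prompt to Claude Code non-interactively and let it run without
# permission prompts, capturing everything it prints.
result = subprocess.run(
    ["claude", "-p", PROMPT, "--dangerously-skip-permissions"],
    capture_output=True,
    text=True,
)
(run_dir / "transcript.txt").write_text(result.stdout + result.stderr)
print(f"Saved transcript to {run_dir}")
```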
miyagawa: Everything. [01:28:02]
jesse: But that was my introduction to how to teach Claude to do things right. That's where I learned about the idea of saying something is a hard gate rather than a rule: it's a thing that has to become true, or it can't continue. [01:28:04]
miyagawa: Mm-hmm. [01:28:22]
jesse: Which is now a pretty standard prompting technique. But it was very instructive to essentially do these sort of mini evals and see what a one-line change in the CLAUDE.md would do. One of the best single-word changes was switching the first line from saying you're a senior engineer to saying you're a pragmatic senior engineer. Adding YAGNI, adding DRY, both of those materially changed how Claude behaved, because there's so much history baked into those terms that Claude knew what to do. [01:28:23]
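(The difference between a rule and a hard gate, as a hypothetical CLAUDE.md sketch; the wording is illustrative, not Jesse's actual file.)

```markdown
You are a pragmatic senior engineer. Follow YAGNI and DRY.

<!-- Rule (advisory; a model may skip it under pressure): -->
Write a failing test before writing implementation code.

<!-- Hard gate (a condition that must become true before continuing): -->
GATE: Do not write any implementation code until a test exists, you have run
it, and you have watched it fail for the expected reason. If this gate is
not satisfied, stop and write the test first.
```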
miyagawa: It's going back to this waterfall model, but it still uses a lot of historically proven practices like TDD, YAGNI, and DRY. [01:29:06]
jesse: It turns out these are all things that you actually want. You want a failing test to be written first and then only enough code to make the test pass; you don't wanna have code that does the same thing in two places; you don't wanna do premature engineering. And it is absolutely the case that Superpowers is sort of optimized for individual or small-team work. But I think it's also the case that agentic dev is best done by individuals or small teams, because one person can move so fast and make so much and do so much work that it's harder and harder to keep a large team on the same page and contributing together. [01:29:20]
miyagawa: Yeah. [01:30:12]
jesse: The cool thing is that that means individuals and small teams now have an advantage they never had before. I've joked that we're entering a new age. Amazon had this idea of a two-pizza team: if you can't feed your team with two pizzas, the team is too big, and that's the right size to get work done. I think we're approaching the era of the two-pizza keiretsu, because small teams of humans can stay aligned on a vision and a plan in a way that a large organization can't. And with a small team who have a lot of, essentially, agentic workers, they're able to do things that used to take 10 times as many people, 50 times as many people. [01:30:14]
miyagawa: Yeah, I get the feeling that humans are becoming the bottleneck. Reviews are becoming the bottleneck. A single-member or two-member team, with the help of a coding agent and the agentic loop, can outperform teams that require code reviews and approvals for everything. [01:31:13]
jesse: It's always been true that big companies get slower, and there are often reasons for needing those reviews. But Avery Pennarun, the CEO of Tailscale, has this post about how every layer of review you add multiplies the amount of time things take. If every line of code gets read by a human, it is gonna slow you down. Sometimes you wanna be slowed down; sometimes there are good reasons to slow down. But I remember, even back in the Perl days, if you wanted to get a patch into a big project and you sent a ten-line patch, somebody would reply with ten problems in your ten-line patch. If you sent a 10,000-line patch, somebody would reply: looks good to me, merged. Human review of code is not the solution to a lot of our problems. It's a thing we do, that we've always done, because we are unsure and we're trying to take care. But human review doesn't catch the bugs. [01:31:34]
miyagawa: Mm-hmm. In the case of getting a patch into the Perl language, I think the best thing you can do is demonstrate how Perl fails at one thing with a failing test, and then nerd-snipe someone into writing a patch for it. [01:32:54]
jesse: Nerd sniping is the best way. I remember, many many years ago... so Perl historically has this debugger that's been around forever. It's a Perl 5 module, but it's so old that its extension is .pl: perl5db.pl. It has my favorite comment ever in a piece of software. The comment, if I remember correctly, was "increment i", and the comment was a lie: that wasn't what the line of code was doing. Anyway, many years ago we were at a YAPC::Asia with our friend Leon Brocard, and we were talking about the debugger. Leon had a beer, and I said something to him like: I think it would be impossible to make a new Perl debugger, there's no way you could do it. He made a new debugger for Perl that week. [01:33:32]
miyagawa: Yeah, I remember that. Isn't it Devel::ebug? [01:34:16]
jesse: Devel::ebug. It even had time travel. It was a nice debugger. [01:34:20]
jesse: It's been too long since I've written Perl. The last time I wrote any lines of code was October: three lines of shell script. Yeah. And yet I've had the most prolific six months of my career. It's a weird week when I don't ship a new product. [01:34:26]
miyagawa: Wow, what a time to be alive. [01:34:48]
jesse: Yeah, it's fun! I know venture capitalists who have taken leaves of absence from their VC firms so that they can write software again, because it's fun. All these people who are too senior to program anymore are super successful with these new tools, because the important skill isn't, and never was, do you know the syntax; it's can you explain the idea. [01:34:52]
miyagawa: That's always been the case, except in reality you have to type all the things anyway. [01:35:22]
jesse: You either had to type all the things or you had to hire employees. [01:35:24]
miyagawa: Right. Which adds a lot of layers. [01:35:28]
jesse: It's a lot easier to have a bad idea at two in the morning and tell your employees Claude and Codex hey go try to do this thing for me. [01:35:31]
jesse: Your human employees don't like it as much when you call them at two in the morning and say hey I want you to try this thing for me. [01:35:43]
miyagawa: Yeah, exactly. Well, Jesse, it's been a pretty interesting and fun conversation. [01:35:50]
jesse: Yeah it's been good to chat. [01:35:57]
miyagawa: If someone wants to contribute to Superpowers, or even sponsor it, there's GitHub sponsorship, right? [01:36:00]
jesse: There is GitHub sponsorship. I'm very happy to have some folks who are sponsoring me, but it is absolutely not a requirement for anybody; for better or worse, it's a thing that I'm happy to share. But yeah, there's GitHub Sponsors, there are a bunch of open issues, it's worth talking things through, and if there's stuff people wanna contribute, I'm very happy to have contributors. [01:36:12]
miyagawa: Yeah, sounds good. Alright, Jesse, thanks for the time. [01:36:45]
jesse: Thanks so much for having me. [01:36:48]

