**Lenny Rachitsky:** Today my guest is Alexander Imbiricos, product lead for Codex, OpenAI's incredibly popular and powerful coding agent. In the words of Nick Turley, head of ChatGPT and former podcast guest, "Alex is one of my all-time favorite humans I've ever worked with, and bringing him and his company into OpenAI ended up being one of the best decisions we've ever made." Similarly, Kevin Weil, OpenAI CPO, said, "Alex is simply the best." In our conversation, we chat about what it's truly like to build product at OpenAI and how Codex allowed the Sora team to ship the Sora app, which became the number one app in the App Store in under a month. We also cover the 20x growth Codex is seeing right now and what they did to make it so good at coding, why his team is now focused on making it easier to review code, not just write code, his AGI timelines, his thoughts on when AI agents will actually be really useful, and so much more.
**Lenny Rachitsky:** Alexander, thank you so much for being here, and welcome to the podcast.
**Alexander Imbiricos:** Thank you so much. I've been following for ages and I'm excited to be here.
**Lenny Rachitsky:** I'm even more excited. I really appreciate that. I'm going to start with your time at OpenAI. You joined OpenAI about a year ago. Before that, you had your own startup for about 5 years, and before that, you were a product manager at Dropbox. I imagine OpenAI is very different from every other place you've worked. What is most different about how OpenAI operates, and what's something you've learned there that you think you're going to take with you wherever you go, assuming you ever leave?
**Alexander Imbiricos:** By far, I would say the speed and ambition of working at OpenAI are dramatically more than what I could have imagined. It's kind of an embarrassing thing to say, because everyone who's a startup founder thinks, "Oh yeah, my startup moves super fast, the talent bar is super high, and we're super ambitious." But I have to say, working at OpenAI made me reimagine what that even means. We hear this a lot. It feels like with every AI company, people say, "Oh my god, I can't believe how fast they're moving."
**Lenny Rachitsky:** Is there an example where you thought, "Wow, that wouldn't have happened this quickly anywhere else"?
**Alexander Imbiricos:** The most obvious thing that comes to mind is the explosive growth of Codex itself. Codex's scale grew 10x super fast, in a matter of months, and it has grown well beyond that since. Once you've lived through that, or at least speaking for myself, having lived through that now, I feel like anytime I spend my time building a tech product, there's that speed and scale that I now need to meet. If I think of what I was doing at my startup, it moved way slower. There's always this balance with startups of how much you commit to an idea versus finding out that it's not working and pivoting. But one thing I've realized at OpenAI is that the amount of impact we can have, and in fact need to have to do a good job, is so high that I have to be way more ruthless with how I spend my time now.
**Lenny Rachitsky:** Before we get to Codex, is there a way that they've structured the org, or a way that OpenAI operates, that allows the team to move this quickly? Everyone wants to move super fast, so I imagine there's a structural approach that allows this to happen.
**Alexander Imbiricos:** One thing is that the technology we're building with has transformed so many things, both how we build and what kinds of things we can enable for users. We spend most of our time talking about improvements in the foundation models, but I believe that even if we had no more progress with models starting today, which is absolutely not the case, we would still be way behind on product. There's so much more product to build. So I think the moment is just right, if that makes sense.
But there are a lot of counterintuitive things that surprised me when I arrived, as far as how things are structured. One example: when I was working at my startup, and before that at Dropbox, it was very important, especially as a PM, to rally the ship. You made sure you were pointed in the right direction, and then you could accelerate in that direction. But here, because we don't know exactly what capabilities will come up soon, we don't know what's going to work technically, and we don't know what's going to land even if it works technically, it's much more important for us to be humble, learn a lot more empirically, and try things quickly. The org is set up that way, to be incredibly bottoms-up.
This is again one of those things where, as you were saying, everyone wants to move fast. A lot of people like to say that they're bottoms-up, but OpenAI is truly, truly bottoms-up, and that's been a learning experience for me. I don't think it'll even make sense to work at a non-AI company in the future; I don't even know what that means. But if I were to imagine it, or go back in time, I think I would run things totally differently.
**Lenny Rachitsky:** What I'm hearing is that the approach is "ready, fire, aim" more than "ready, aim, fire." That may not come across well, but I've actually heard this a lot at AI companies: because you don't know how people will use it, it doesn't make sense to spend a lot of time making it perfect. It's better to get it out there in a primordial way, see how people use it, and then go big on that use case.
**Alexander Imbiricos:** Yeah. To use this analogy a little bit, I feel like there is an aim component, but it's much fuzzier. It's roughly, "What do we think can happen?" Someone I've learned a ton from working here is a research lead, and he likes to say that at OpenAI, you can have really good conversations about something that's a year or more from now. There's a lot of ambiguity in what will happen, but that's a right sort of timeline. And we can have really good conversations about what's happening in the next few weeks or months. But there's this awkward middle ground, as you start approaching a year but you're not at a year, where it's very difficult to reason about.
So as far as aiming, I think we want to know what futures we're trying to build towards. A lot of the problems we're dealing with in AI, such as alignment, are problems you need to think about really far into the future, so we're aiming fuzzily there. But when it comes down to the more tactical questions, like what product we will build and how people will use that product, that's where we're much more, "Let's find out empirically."
**Lenny Rachitsky:** That's a good way of putting it. Something else: when people hear companies like yours saying, "We're going to be bottoms-up, we're going to try a bunch of stuff, we're not going to have an exact plan for where it's going in the next few months," the key is that you hire the best people in the world. That feels like a really key ingredient for being this successful at bottoms-up work. After that, it's basically just supervising.
**Alexander Imbiricos:** I was, again, surprised or even shocked when I arrived at the level of individual drive and autonomy that everyone here has. So the way that OpenAI runs, like many things, is not something you can read about or hear on a podcast and say, "I'm just going to deploy this at my company." Maybe this is a harsh thing to say, but very few companies have the talent caliber to be able to do that. So it might need to be adjusted if you were going to implement it.
**Lenny Rachitsky:** Okay, so let's talk Codex. You lead work on Codex. How's Codex going? Are there any numbers you can share? Also, not everyone knows exactly what Codex is, so explain what Codex is.
**Alexander Imbiricos:** Totally. I have the very lucky job of living in the future and leading product on Codex. Codex is OpenAI's coding agent. Super concretely, that means it's an IDE extension, like a VS Code extension, or a terminal tool that you can install. When you do so, you can pair with Codex to answer questions about code, write code, run tests, execute code, and do a bunch of the work in that thick middle section of the software development life cycle, which is all about writing code that you're going to get into production.
More broadly, we think of what Codex currently is as just the beginning of a software engineering teammate. When we use a big word like "teammate," some of the things we're imagining are that it's not only able to write code, but it also participates early on, in the ideation and planning phases of writing software, and further downstream, in validating, deploying, and maintaining code.
To make that a little more fun, one thing I like to imagine is that Codex today is a bit like a really smart intern that refuses to read Slack and doesn't check DataDog or Sentry unless you ask it to. So no matter how smart it is, how much are you going to trust it to write code without you also working with it? That's how people mostly use it today: they pair with it. But we want to get to the point where it can work just like a new intern that you hire, where you don't only ask them to write code, you ask them to participate across the cycle. Then you know that even if they don't get something right on the first try, they're eventually going to be able to iterate their way there.
**Lenny Rachitsky:** I thought the point about not reading Slack and DataDog was that it's not distracted, it's constantly focused and always in flow. But I get what you're saying: it doesn't have all the context on everything that's going on.
**Alexander Imbiricos:** And that's not only true when it's performing a task. If you think of the best human teammates, you don't tell them what to do. Maybe when you first hire them, you have a couple of meetings and you learn how to communicate with this person. Then you give them some starter tasks and delegate a few things. But eventually you just say, "Great, you're working with this set of people in this area of the codebase. Feel free to work with other people in other parts of the codebase too. You tell me what you think makes sense to do." We think of this as proactivity, and one of our major goals with Codex is to get to proactivity.
I think this is critically important to achieving the mission of OpenAI, which is to deliver the benefits of AGI to all humanity. I like to joke, and it's a half joke, that AI products today are actually really hard to use, because you have to be very thoughtful about when they could help you. If you're not prompting a model to help you, it's probably not helping you at that time. If you think of how many times the average user prompts AI today, it's probably tens of times per day. But if you think of how many times people could actually get benefit from a really intelligent entity, it's thousands of times per day. So a large part of our goal with Codex is to figure out the shape of an actual teammate agent that is helpful by default.
**Lenny Rachitsky:** When people think about Cursor and even Claude Code, it's an IDE that helps you code, auto-completes code, and maybe does some agentic work. What I'm hearing is that the vision here is different: it's a teammate. It's like a remote teammate building code for you, one that you talk to and ask to do things. And it also does IDE auto-complete and things like that. Is that a differentiator in the way you think about Codex?
**Alexander Imbiricos:** It's basically this idea that if you're a developer and you're trying to get something done, we want you to feel like you have superpowers and are able to move much, much faster. But we don't think that, in order to reap those benefits, you should need to sit there constantly thinking, "How can I invoke AI at this point to do this thing?" We want you to be able to plug it into the way you work and have it just start doing things without you having to think about it.
**Lenny Rachitsky:** Okay, I have a lot of questions along these lines, but how's it going? Are there any stats or numbers you can share about how Codex is doing?
**Alexander Imbiricos:** Codex has been growing absolutely explosively since the launch of GPT-5 back in August. There are definitely some interesting product insights to talk about as to how we unlocked that growth, if you're interested. The last number we shared was that we had grown well over 10x since August. In fact, it's been about 20x since then. Also, the Codex models are serving many trillions of tokens a week now, and Codex is basically our most served coding model.
One really cool thing is that we decided to set up the Codex team as a tightly integrated product and research team that iterates on the model and the harness together. It turns out that lets you do a lot more and try many more experiments on how these things work together. At first, we were just training these models for use in our first-party harness, which we were very opinionated about. What we've started to see more recently is that other major API coding customers are now starting to adopt these models as well. So we've reached a point where the Codex model is the most served coding model in the API as well.
**Lenny Rachitsky:** You hinted at what unlocked this growth, and I'm extremely interested in hearing about that. Before, and maybe this was before you joined the team, it felt like Claude Code was killing it. Everyone was using Claude Code. It was by far the best way to code. And then all of a sudden Codex comes around. I remember Karpathy tweeted that he has never seen a model like this. I think the tweet said that the gnarliest bugs he runs into, the ones he spends hours trying to figure out and nothing else has solved, he gives to Codex, lets it run for an hour, and it solves them. What did you guys do?
**Alexander Imbiricos:** We have this strong mission here at OpenAI to, basically, build AGI. So we think a lot about how we can shape the product so that it can scale. Earlier I was mentioning that if you're an engineer, you should be getting help from AI thousands of times per day. We thought a lot about the primitives for that when we launched our first version of Codex, which was Codex Cloud. That was basically a product that had its own computer and lived in the cloud, and you could delegate to it. The coolest part was that you could run many, many tasks in parallel.
But one of the challenges we saw was that it's a little harder to set up, both in terms of environment configuration, like giving the model the tools it needs to validate its changes, and in terms of learning how to prompt in that way. Going back to the teammate analogy, it's as if you hired a teammate but were never allowed to get on a call with them, and could only go back and forth asynchronously over time. That works for some teammates, and eventually that's actually how you want to spend most of your time, so that's still the future. But it's hard to adopt initially.
So we still have that vision: what we're trying to get you to is a teammate that you delegate to and that is proactive. We're seeing that grow, but the key unlock is that you first need to land with users in a way that's much more intuitive and trivial to get value from. The way the vast majority of users discover Codex today is that they either download an IDE extension or run it in their CLI, and the agent works there with you on your computer interactively. It works within a sandbox, which is a really cool piece of tech that helps keep that safe and secure, but it has access to all your dependencies. So if the agent needs to run a command, it can do so within the sandbox, and we don't have to set up any environment. If it's a command that doesn't work in the sandbox, it can just ask you. So you can get into this really strong feedback loop using the model. Over time, our team's job is to turn that feedback loop into something where, as a byproduct of using the product, you are configuring it so that you can delegate to it down the line.
To keep going back to the analogy: if you hire a teammate and ask them to do work, but you just give them a fresh computer from the store, it's going to be hard for them to do their job. But as you work with them side by side, you can say, "Oh, you don't have a password for this service we use. Here's the password. And don't worry, feel free to run this command." Then it's much easier for them to go off and do work for hours without you.
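The sandbox loop described here (try the command inside the sandbox, and ask the user only when the sandbox cannot run it) can be illustrated with a small Python sketch. This is not how Codex is implemented; `SandboxDenied`, `run_with_escalation`, and the fake runners are hypothetical names invented for illustration, and no real commands are executed.

```python
from typing import Callable


class SandboxDenied(Exception):
    """Raised by the hypothetical sandbox when a command needs access it does not grant."""


def run_with_escalation(
    command: str,
    run_sandboxed: Callable[[str], str],
    run_unsandboxed: Callable[[str], str],
    ask_user: Callable[[str], bool],
) -> str:
    """Try the sandbox first; fall back to asking the user only if the sandbox refuses."""
    try:
        return run_sandboxed(command)
    except SandboxDenied:
        # The sandbox could not run the command, so the agent asks before going further.
        if ask_user(command):
            return run_unsandboxed(command)
        return "skipped: " + command


# Illustrative stand-ins: this fake sandbox refuses anything that starts with "curl".
def fake_sandbox(command: str) -> str:
    if command.startswith("curl"):
        raise SandboxDenied(command)
    return "sandbox ran: " + command


def fake_host(command: str) -> str:
    return "host ran: " + command
```

Most commands never reach the user in this sketch, which is the point of the design: the common case has no setup and no prompts, and approval is requested only for the exceptions.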
**Lenny Rachitsky:** So what I'm hearing is that the initial version of Codex was almost too far in the future. It was a remote agent in the cloud that codes for you asynchronously. And what you did was say, "Okay, let's come back a little bit. Let's integrate into the way engineers already work, in IDEs and locally, and help them on-ramp to this new world."
**Alexander Imbiricos:** Totally. This was quite interesting, because we dogfood product a ton at OpenAI, "dogfood" as in we use our own product. Codex has been accelerating OpenAI over the course of the entire year, and the cloud product was a massive accelerator for the company as well. It just turns out that this is one of those places where the signal we got from dogfooding is a little different from the signal you get from the general market. At OpenAI, we train reasoning models all day, so we're very used to this kind of prompting: think up front, run things massively in parallel, let it take some time, and come back to it later asynchronously. So now when we build, we still get a ton of signal from dogfooding internally, but we're also very cognizant of the different ways that different audiences use the product.
**Lenny Rachitsky:** That's really funny. It's like, live in the future, but maybe not too far in the future. I can see how everyone at OpenAI is living very far in the future, and sometimes that won't work for everyone. What about intelligence and training data? Is there something else that helped Codex accelerate its ability to actually code? Is it better, cleaner data? Is it more that the models are advancing? Is there anything else that really helped?
**Alexander Imbiricos:** There are a few components here. You were mentioning models, and the models have improved a ton. In fact, just last Wednesday, we shipped GPT-5.1 Codex Max, a very accurately named model. It is awesome, both because for any given task that you were using GPT-5.1 Codex for, it's roughly 30% faster at accomplishing that task, and because it unlocks a ton of intelligence. If you use it at our higher reasoning levels, it's even smarter. As for that tweet you mentioned from Karpathy, the "Hey, give this your gnarly bugs" one: obviously there's a ton going on in the market right now, but Codex Max is definitely carrying that mantle of tackling the hardest bugs.
But I will say that how we're thinking about this is evolving a little, from "we're just going to think about the model, so let's train the best model" to really thinking about what an agent actually is overall. I'm not going to try to define "agent" exactly, but the stack we think of it as having is this: you have a model, a really smart reasoning model that knows how to do a specific kind of task really well, and we can talk about how we make that possible. But then we need to serve that model through an API into a harness, and both of those things also have a really big role here.
For instance, one of the things we're really proud of is that you can have GPT-5.1 Codex Max work for really long periods of time. That's not the normal case, but you can set it up to do that, or it might just happen. We now routinely hear people say, "It ran overnight," or, "It ran for 24 hours." For a model to work continuously for that amount of time, it's going to exceed its context window. We have a solution for that, which we call compaction. Compaction is a feature that uses all three layers of the stack. You need a model that has a concept of compaction and knows, "As I start to approach my context window, I might be asked to prepare to be run in a new context window." At the API layer, you need an API that understands this concept and has an endpoint you can hit to make this change. And at the harness layer, you need a harness that can prepare the payload for this to be done. So shipping this compaction feature, which made this behavior possible for anyone using Codex, actually meant working across all three layers, and I think that's increasingly going to be true.
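As a rough illustration of the compaction idea, here is a minimal Python sketch of the harness-side bookkeeping only. It is an assumption-laden toy, not OpenAI's implementation: `CompactingContext`, `count_tokens`, `naive_summary`, the 80% threshold, and the word-based token count are all hypothetical, and the `summarize` callable stands in for the model and API steps described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class CompactingContext:
    max_tokens: int                          # hypothetical context-window size
    summarize: Callable[[List[str]], str]    # stand-in for the model writing hand-off notes
    threshold: float = 0.8                   # compact once 80% of the window is used
    messages: List[str] = field(default_factory=list)
    compactions: int = 0

    def used(self) -> int:
        """Total tokens currently held in the context."""
        return sum(count_tokens(m) for m in self.messages)

    def append(self, message: str) -> None:
        """Add a message, compacting if the window is close to full."""
        self.messages.append(message)
        if self.used() >= self.threshold * self.max_tokens:
            self._compact()

    def _compact(self) -> None:
        # Replace the whole history with one summary, as if starting a new context window.
        summary = self.summarize(self.messages)
        self.messages = [summary]
        self.compactions += 1


def naive_summary(messages: List[str]) -> str:
    """Toy summarizer: keep only the first three words of each message."""
    return "SUMMARY: " + " | ".join(" ".join(m.split()[:3]) for m in messages)
```

In a real system the summary would be produced by the model itself and the swap would go through the API, which is why the feature spans all three layers; the sketch only shows when a harness might trigger it.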
Another maybe underappreciated version of this: if you think about all the different coding products out there, they all have very different tool harnesses, with very different opinions on how the model should work. Maybe you have a strong opinion that it should work using semantic search. Maybe you have a strong opinion that it should call bespoke tools. Or maybe, as in our case, you have a strong opinion that it should just use the shell and work in the terminal. You could try to train a model to be good at all the different ways it could work, but you can move much faster if you're optimizing for just one of those worlds. So the way we built Codex is that it just uses the shell, and to make that safe and secure, we have a sandbox that the model is used to operating in. So to go all the way back to your question, I think one of the biggest accelerators is that we're building all three things in parallel, tuning each one, and constantly experimenting with how they work together, with a tightly integrated product and research team.
**Lenny Rachitsky:** How do you think you win in this space? Do you think it'll always be this kind of race, with models constantly leapfrogging each other? Or is there a world where someone just runs away with it and no one else can ever catch up? Is there a path to just, "We win"?
**Alexander Imbiricos:** 还是回到构建队友这个理念。不仅是一个参与团队规划和优先级的队友,不仅是一个能认真测试代码、帮你维护和部署的队友,甚至是一个——如果你再想想工程师队友——他还能帮你发日历邀请、调整站会时间,对吧?在我看来,如果我们假设每天或每周都有某个实验室部署一项疯狂的新能力,我们人类根本不可能跟上并使用所有这些技术。所以我觉得我们需要达到这样一个世界:你有一个 AI 队友或者超级助手(super assistant),你只管跟它说话,它就知道怎么主动帮忙。你不需要去读最新的使用技巧,只要接入它就能获得帮助。
**Alexander Imbiricos:** Again, comes back to this idea of like building a teammate. And not just a teammate that, you know, participates in team planning and prioritization. Not just a teammate that, you know, really tests its code and like helps you maintain and deploy it. But even a teammate, you know, like if you think again, an engineering teammate, they can also like schedule a calendar invite, right? Or move standup. Or do whatever, right? And so, in my mind, if we just imagine that every day or every week some like crazy new capability is just going to be deployed by a research lab, it's just impossible for us like, you know, as humans to keep up and like use all this technology. And so, I think we need to get to this world where you kind of just have like an AI teammate or a super assistant that you just talk to and it just knows how to be helpful like on its own. Right? And so, you don't have to be like reading the latest tips for how to use it and you just like you plug it in and it just provides help.
这就是我认为我们正在构建的东西的形态,如果能做到,我觉得它会是一个非常有粘性的赢家产品。我脑子里的想象是——也许一个有趣的话题是:聊天是 AI 的正确界面吗?我觉得当你不知道该用它做什么的时候,聊天是非常好的界面。就像跟队友在 Teams 或 Slack 上聊天一样,你可以问任何东西,它就像一个万能公约数。所以你可以跟超级助手聊任何话题,不管是不是编码。
And so, that's kind of the shape of what I think we're building, and I think that will be a very sticky, winning product if we can do it. The shape I have in my head, at least, is this. Maybe a fun topic is: is chat the right interface for AI? I think chat is a very good interface when you don't know what you're supposed to use it for. In the same way, if I'm in Teams or Slack with a teammate, chat is pretty good. I can ask for whatever I want. It's kind of the common denominator for everything. So you can chat with a super assistant about whatever topic you want, whether it's coding or not.
然后如果你是某个特定领域的专家,比如编码,你可以拉起一个 GUI 深入查看和编辑代码。所以我觉得作为 OpenAI,我们需要构建的是:你有 ChatGPT,它是一个对所有人都普遍可用的工具。你甚至在工作之外就开始用它,你习惯了被 AI 加速这件事。然后你到了工作中,很自然地就会想"我直接问它就好了",不需要知道所有的连接器或功能。你只管问它寻求帮助,它会在那个时刻以最好的方式帮助你。甚至在你没有求助的时候也会主动介入。在我看来,如果我们能做到这一点,这就是真正的赢家产品。
And then if you are like a functional expert in a specific domain such as coding, there's like a GUI that you can pull up to go really deep and like look at the code and like work with the code. So, I think like what we need to build as OpenAI is basically this idea of like you have chat, ChatGPT, and that is a tool that's like ubiquitously available to like everyone. You start using it even like outside of work, right? To just help you. You become very comfortable with the idea of being accelerated with AI. And so, then you get to work and you just can naturally just, yeah, I'm just going to ask it for this and I don't need to know about all the connectors or like all the different features. I'm just going to ask it for help and it'll surface to me the best way that it can help at this point in time. And maybe even chime in when I didn't ask it for help. So, in my mind if we can get to that, I think that's, you know, that's how we really build like the winning product.
**Lenny Rachitsky:** 这真的很有意思,因为我跟 ChatGPT 负责人 Nick Turley 聊天的时候,他说 ChatGPT 最初的名字就是 super assistant 之类的。有意思的是一边是超级助手这个方向,另一边是 Codex 这个方向——几乎像是 to C 版本和 to B 版本。我听到的是,想法是从编码和构建开始,然后扩展到为你做各种事情——安排会议、在 Slack 里发消息、交付设计。这是不是某种意义上 ChatGPT 的企业版?还是有别的什么?
**Lenny Rachitsky:** This is so interesting because in my chat with Nick Turley, the head of ChatGPT, I think he shared that the original name for ChatGPT was super assistant or something like that. And it's interesting that there's that approach to the super assistant and then there's this Codex approach. It's almost like the B2C version and the B2B version. What I'm hearing is that the idea here is you start with coding and building, and then it's doing all this other stuff for you: scheduling meetings, probably posting in Slack, shipping designs. Is the idea that this is the business version of ChatGPT in a sense, or is there something else there?
**Alexander Imbiricos:** 好的,这就进入一年时间尺度的讨论了。很多事情可能更快发生,但从模糊程度来看,我觉得我们在一年的尺度上。我给你一个论点和一个可能的路径,至于具体怎么实现,谁知道呢。基本上,如果我们要构建一个超级助手,它必须能做事情。所以我们会有一个模型,它能够影响你的世界。我们过去一年学到的一个经验是,模型在能使用计算机的时候效率高得多。所以现在变成了"我们需要一个能使用计算机的超级助手"。那问题来了:它该怎么使用计算机?有很多方法——你可以尝试侵入操作系统用无障碍 API,或者你可以让它点击操作屏幕,但那有点慢且不稳定。还有另一种方式——事实证明,模型使用计算机的最好方式就是写代码。所以我们逐渐形成了这样的理念:如果你想构建任何智能体,也许你都应该构建一个编码智能体。对于非技术用户来说,他们甚至不会知道自己在使用一个编码智能体,就像没人会想自己到底在不在用互联网一样,只会想"Wi-Fi 连上了没"。
**Alexander Imbiricos:** Yeah, so you know, so we're getting to the like the one-year time horizon conversation. A lot of this might happen sooner, but in terms of fuzziness, I think we're at the one-year. So, I'll give you like a contention and like a plausible way to get there, but as for how it happens, who knows. So, basically, if we're going to build a super assistant, it has to be able to do things, right? So, like we're going to have a model and it's going to be able to do stuff affecting your world. And one of the learnings I think we've seen over the past year or so is that for models to do stuff, they're much more effective when they can use a computer. Right? Okay. So, now we're like, "Okay, we need the super assistant that can use a computer." Right? Or many computers. And now the question is, okay, well, how should it use the computer? Right? And there's lots of ways to use a computer. You know, you could try to hack the OS and like use accessibility APIs. Maybe a bit easier is you could point and click. That's a little slow, you know, and unpredictable sometimes. And another way — it turns out the best way for models to use computers is simply to write code. Right? And so, we're kind of getting to this idea where like, "Well, if you want to build any agent, maybe you should be building a coding agent." And maybe to the user, a non-technical user, they won't even know they're using a coding agent the same way that no one thinks about are they using the internet or not. It's just they're more just like, "Is Wi-Fi on?"
所以我觉得我们在 Codex 做的事情是,我们在构建一个软件工程队友,作为其中的一部分,我们在构建一个能通过写代码来使用计算机的智能体。我们已经看到了一些需求拉动。虽然还很早期,但我们开始看到有人把 Codex 用于编码相邻的产品用途。随着这个趋势发展,我觉得我们会自然地发现:只要存在一种可以通过写代码解决的方式——即使你在做金融分析——也许都应该让智能体写代码来做。
So, I think that what we're doing with Codex is building a software engineering teammate, and as part of that, we're building an agent that can use a computer by writing code. We're already seeing some pull for this. It's quite early, but we're starting to see people using Codex for coding-adjacent purposes. As that develops, I think we'll naturally find that we should always have the agent write code if there is a coding way to solve a problem. Even if you're doing financial analysis, maybe write some code for that.
你刚才问的是不是超级助手的两端。在我看来,编码就是任何智能体(包括 ChatGPT)的核心能力。所以我们真正在构建的是这个能力。但最酷的地方在于,智能体写的代码是可以被 import 的,代码是可组合的、可互操作的。一种非常简化的智能体观是:给它一台电脑,让它点点点去操作。但未来要到达那里,路径很难规划,因为构建智能体的很多问题不是"智能体能不能做到",而是"我们怎么帮智能体理解它所在的上下文"——使用它的团队有自己的做事方式,有规范指南,可能需要对智能体能做什么、不能做什么有确定性的保证。
So basically, you were asking whether these are the two ends of the super assistant product, of ChatGPT. In my mind, coding is a core competency of any agent, including ChatGPT, and what we really think we're building is that competency. Here's the really cool thing about agents writing code: you can import code. Code is composable and interoperable. One very reductive view we could have of an agent is that it's just going to be given a computer and it's going to point and click and go around. That is the future, but it's difficult to chart a path there, because a lot of the questions around building agents aren't "can the agent do it?" They're more about how we help the agent understand the context that it's working in. The team that's using it probably has a way that they like to do things. They have guidelines. They probably want certain deterministic guarantees about what the agent can or cannot do.
或者他们需要确保智能体理解某些细节。比如,如果我们在看一个崩溃报告工具,通过连接器接入,每个子团队可能都有不同的元提示(meta prompt)来分析崩溃。所以我们到了这样一个阶段:我们有一个坐在电脑前的智能体,但我们需要让它对团队或用户来说是可配置的。智能体经常做的事情,我们可能就想内化为它的一项内置能力。所以我觉得最终会到达你说的那种通用化的状态——一个能为任何需求自己编写脚本的智能体。但关键是,我们能不能把智能体经常做的事或做得好的事记住并存储下来,这样它就不需要再为同一件事重新写脚本?如果我刚加入一个团队,你已经在团队里了,我可以直接使用智能体之前已经写好的那些脚本。
Or they want to know that the agent understands sort of this detail. Like an example would be, you know, if we're looking at a crash reporting tool hitting a connector for it, every sub-team probably has a different meta prompt for like how they want the crashes to be analyzed. Right? And so we start to get to this thing where like, yeah, we have this agent sitting in front of a computer, but we need to make that configurable for the team or for the user, right? And let them like — stuff that the agent does often, we probably just want to like build in as a competency that this agent has that it can do. So I think we end up with this generalizable thing that you were saying of like an agent that can just write its own scripts for whatever it wants to do. But I think that the really key part here is can we make it so that everything that the agent has to do often or that it does well, we can just like remember and store so that the agent doesn't have to write a script for that again, right? Or maybe like if I just joined a team and you are already on the same team as me, I can just like use all those scripts that the agents have written already.
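The idea of remembering scripts so the agent (or a new teammate) doesn't have to rewrite them can be sketched as a small shared library keyed by task. This is an illustrative assumption, not a description of any Codex feature: `ScriptLibrary`, `get_or_write`, and `fake_agent_writes_script` are hypothetical names, and the "agent" here is a plain function that returns a string.

```python
from typing import Callable


class ScriptLibrary:
    """Hypothetical team-wide store of scripts an agent has already written."""

    def __init__(self) -> None:
        self._scripts: dict[str, str] = {}
        self.generated = 0  # how many times the agent had to write a new script

    def get_or_write(self, task: str, write_script: Callable[[str], str]) -> str:
        """Reuse the stored script for a task, or ask the agent to write one once."""
        if task not in self._scripts:
            self._scripts[task] = write_script(task)
            self.generated += 1
        return self._scripts[task]


def fake_agent_writes_script(task: str) -> str:
    """Stand-in for the agent: a real one would generate working code."""
    return f"# script for: {task}\nprint('done: {task}')"
```

Two teammates asking for the same task through the same library get the same stored script, and the agent only does the writing work the first time.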
**Lenny Rachitsky:** 对,就像这个队友可以分享它从跟公司其他人合作中学到的东西。作为一个比喻完全说得通。感觉你是 Karpathy 那一派的——觉得今天的智能体不太行,大部分输出质量堪忧(slop),但未来会变得很厉害。你认同这个看法吗?
**Lenny Rachitsky:** Yeah, it's like if this is our teammate, they can share things that it's learned from working with other people at the company. Just makes sense as a metaphor. It feels like you're in the Karpathy camp of agents today are not that great and mostly slop and maybe in the future they'll be awesome. Does that resonate?
**Alexander Imbiricos:** 差不多——我觉得编码智能体已经相当不错了。没错。但编码之外的智能体还是非常早期的。这只是我个人的看法,但我觉得一旦这些智能体能以可组合的方式使用代码,它们会好非常多。
**Alexander Imbiricos:** I think so — I think coding agents are pretty great. Uh that's right. Yeah. And then I think like agents outside of coding, it's still like very early. And you know, this is just my opinion, but I think they're going to get a whole lot better once they can use coding and like in a composable way.
**Alexander Imbiricos:** 这就是为工程师做产品的有趣之处。在我之前的创业经历中,我们也有很长时间是为软件工程师做产品的,他们是特别有趣的用户群体。他们自己也喜欢做东西,而且在想象技术用途方面往往比我们还有创意。所以为工程师做产品,你能观察到大量涌现的行为和洞察,然后内化到产品中。
**Alexander Imbiricos:** This is kind of the fun part of building for software engineers. At my startup we were building for software engineers too for a lot of that journey, and they're such a fun audience to build for because they also like building for themselves and are often even more creative than we are in thinking about how to use the technology. So by building for software engineers, you get to observe a ton of emergent behaviors and things that you should build into the product.
**Lenny Rachitsky:** 我喜欢你这么说,因为很多为工程师做产品的人其实很头疼,因为工程师总在抱怨:"这个太烂了,你为什么这么设计?"
**Lenny Rachitsky:** I love how you say that, because a lot of people building for engineers get really annoyed, because the engineers are always complaining about stuff. They're like, "Ah, this sucks. Why did you build it this way?"
**Lenny Rachitsky:** 我觉得你能享受其中,大概是因为你们做的工具对工程师来说确实太好了,真的能解决问题、帮他们写代码。说到这个方向——你知道总有人讨论工作会怎样、编程的未来、要不要学编程。很显然你描述的方式是它是一个队友,跟你一起工作让你变得更强大,而不是取代你。你怎么看这个超级智能的工程队友对整个工程领域的影响?
**Lenny Rachitsky:** I love that you enjoy it, but I think it's probably because you're building such an amazing tool for engineers that can actually solve problems and just you know, code for them. Um kind of along those lines, you know, there's always this talk of what will happen to jobs, coding, do you have to learn coding, all these things? Clearly the way you're describing it is it's a teammate. It's going to work with you making more superhuman. It's not going to replace you. What's the way you just think about the impact on the field of engineering having this super intelligent engineering teammate?
**Alexander Imbiricos:** 我觉得这有两个面。我们刚讨论的一个面是,也许每个智能体都应该使用代码、本质上是一个编码智能体。在我看来,这只是一个更大理念的一小部分:随着代码变得越来越普及——你甚至可以说在 AI 之前它就已经无处不在了——随着代码进一步普及,它会被用于更多的目的。所以对具备这项能力的人才的需求只会更大。这是我的观点。这是一个非常复杂的话题,我们讨论了很多,需要看看实际会怎么发展。
**Alexander Imbiricos:** I think there's there's two sides to it, but the one we were just talking about is this idea that maybe every agent should actually use code and be a coding agent. And in my mind that's just like a small part of this like broader idea that like, hey, as we make code even more ubiquitous. I mean, you could probably claim it's ubiquitous today even pre-AI, right? And as you make code even more ubiquitous, it's actually just going to be used for many more purposes. And so there's just going to be a ton more need for people with this like humans with this competency. So that's my view. I think this is like quite a complex topic. So you know, it's something we talk about a lot and we have to kind of see how it pans out.
但我觉得作为在这个领域做产品的团队,我们能做的就是始终思考:我们如何把工具做成让人感到被最大程度加速的样子?而不是做一个让人更困惑——不知道自己该干什么的工具。举个现在的例子——当你和编码智能体合作时,它写了大量代码。但对很多软件工程师来说,写代码其实是最有趣的部分之一。结果你变成了在审查 AI 写的代码,而审查代码对很多工程师来说是不那么有趣的工作。我觉得这种情况在无数微决策中不断出现,所以我们作为产品团队一直在想:怎么让这个过程更有趣?怎么让你感到更被赋能?哪里不好用?我认为审查智能体写的代码就是目前不太好玩的一个环节。那我们能做什么?
But I think what we can do as a product team building in this space is always think about how to build a tool so that it feels like we're maximally accelerating people, rather than building a tool that makes it less clear what you should do as the human. To give an example: nowadays when you work with a coding agent, it writes a ton of code, but it turns out writing code is one of the most fun parts of software engineering for many software engineers. So then you end up reviewing AI code, and that's often a less fun part of the job. I think we see this play out all the time in a ton of micro-decisions, so we as a product team are always thinking: how do we make this more fun? How do we make you feel more empowered? Where is this not working? And I would argue that reviewing agent-written code is a place that is less fun today. So then I think, okay, what can we do about that?
我们可以发布一个代码审查功能,帮你对 AI 写的代码建立信心。另外我们可以让智能体更好地自己验证工作。甚至细到微观层面的决策——如果你有一个智能体验证工作的能力,比如在 Codex 网页版里,有一个面板展示智能体的工作成果,你最先看到的是什么?是代码 diff,还是代码生成的图像预览?如果你从"如何赋能人类、让他们感觉最大程度被加速"的角度思考,你显然应该先看图像。在 AI 审查通过之后才需要你看代码。
Well, we can ship a code review feature that helps you build confidence in the AI-written code. Another thing we can do is make the agent better able to validate its own work. And it gets all the way down into micro-decisions. Say the agent has a capability to validate its work, and, thinking of Codex web right now, you have a pane that reflects the work the agent did. What do you see first? Do you see the diff, or do you see the image preview of what the code produces? If you're thinking about this from the perspective of "how do I empower the human, how do I make them feel as accelerated as possible?", you obviously see the image first. You shouldn't be reviewing the code until you've seen the image, and maybe until it's been reviewed by an AI and it's now time for you to take a look.
**Lenny Rachitsky:** 我之前采访 Cursor 的 CEO Michael Truell 时,他有一个愿景是我们会走向代码之上的某种东西。我也看到了规格驱动开发(spec driven development)的兴起:你写规格,然后 AI 帮你写代码,你在更高的抽象层上工作。你觉得这是未来的方向吗?工程师不再需要实际写代码或看代码,而是专注于更高层的抽象?
**Lenny Rachitsky:** When I had Michael Truell, CEO of Cursor, on the podcast, he had this vision of us moving to something beyond code. And I've seen this rise of something called spec-driven development, where you just write the spec and then the AI writes the code for you, so you start working at this higher abstraction level. Is that where you see us going? Engineers not having to actually write code or look at code, with a higher level of abstraction that we focus on instead?
**Alexander Imbiricos:** 我觉得抽象层级在不断提升,而且这种趋势今天就已经在发生了。现在的编码智能体大多是 prompt 到 patch(补丁)。我们开始看到有人做规格驱动或计划驱动的开发。这也是人们问"怎么让 Codex 跑一个很长的任务"时的常见做法——先跟它一起写一个 plan.md,一个 markdown 的计划文件。你觉得满意之后,再让它去执行。如果那个计划有可验证的步骤,它就能工作更长的时间。
**Alexander Imbiricos:** Yeah, I think there are constantly these levels of abstraction, and they're already playing out today. Today, coding agents are mostly prompt to patch. We're starting to see people doing spec-driven development or plan-driven development. That's actually one of the answers when people ask, "Hey, how do you run Codex on a really long task?" Often you collaborate with it first to write a plan.md, a markdown file that's your plan. Once you're happy with that, you ask it to go off and do the work, and if that plan has verifiable steps, it'll work for much longer.
所以我们完全看到了这个趋势。规格驱动开发是一个有趣的想法,但我不确定最终会走那条路,因为很多人也不喜欢写规格。不过有些人确实会以那种方式工作,这是可能的。但有个半开玩笑的想法是——很多团队现在的工作方式其实没有规格,但团队自驱力很强,事情就是会被做完。这几乎就是——我现在临时想到的,名字不太好——但就是"闲聊驱动开发"(chatter driven development)。就是事情在社交媒体上、在团队沟通工具里发生着,然后代码就被写出来并部署了。
Um so we're totally seeing that. I think spec driven development is like an interesting idea. It's not clear to me that it'll work out that way cuz a lot of people don't like writing specs either. But it seems plausible that some people will work that way. You know, like a bit of a joke idea though is like if you think of the way that many teams work today, they often like don't necessarily have specs, but the team is just really self-driven and so stuff just gets done and so almost that is like — I'm coming up with this on the spot so it's not a good name, but like chatter driven development. Where it's just like stuff is happening, you know, on social media and like in your team communications tools and then as a result like code gets written and deployed.
所以我更倾向于这个方向。我甚至不一定想写规格——只有我喜欢写规格的时候才写。其他时候我可能只想说"嘿,这是客服频道,告诉我有什么值得关注的,如果是个小 bug 就直接修了,不需要我为此写一份规格。"
Right? So yeah, I think I'm a little bit more oriented in that way of you know, I don't even necessarily want to have to write a spec. Like sometimes I want to only if I like writing specs. Right? Other times I might just want to say like, hey, here's like the customer service channel and like tell me what's interesting to know, but if it's a small bug, just fix it. I don't want to have to write a spec for that.
我有一个假想的未来,有时候会作为挑衅性的话题跟人分享:在一个拥有真正出色的智能体的世界里,做一个独立创业者是什么样的?一个糟糕但有趣的想法是:它其实是一个手机 App。智能体有的每个想法都以竖屏短视频的形式出现在你手机上。你觉得是坏主意就左划,觉得好就右划,想给反馈可以长按说话再划。在这个世界里,你的工作基本上就是把这个 App 接入所有信号系统和记录系统,然后坐在那里划就行了。
I have this hypothetical future that I sometimes like to share with people as a provocation: in a world where we have truly amazing agents, what does it look like to be a solopreneur? One terrible idea for how it could look is that it's actually a mobile app, and every idea the agent has for something to do is a vertical video on your phone. You swipe left if you think it's a bad idea and swipe right if it's a good idea, and you can press and hold and speak to your phone if you want to give feedback on the idea before you swipe. In this world, your job is basically to plug this app into every single signal system and system of record, and then you just sit back and swipe.
**Lenny Rachitsky:** 我觉得这个太好了。这就像 Tinder 加 TikTok 加 Codex。挺离谱的。不,这真的很棒。所以这个想法是——这个智能体在监听和观察你,关注市场和用户动态,然后它说"我觉得应该做这个"。就像一个主动的工程师,"来,我们应该做这个功能"或者"修这个东西"。
**Lenny Rachitsky:** I don't know. I love this. So this is like Tinder meets TikTok meets Codex. It's pretty terrible. No, this is great. So the idea here is that this agent is watching and listening to you, paying attention to the market and your users, and it's like, "Here's something I should do." It's like a proactive engineer: "Here, we should build this feature or fix this thing."
**Alexander Imbiricos:** 没错。我觉得这是一个很好的想法。用最低门槛的方式跟你沟通。
**Alexander Imbiricos:** Exactly. I think it's a really good idea. Communicating with you in like the lowest like effort way.
**Lenny Rachitsky:** 对,左划右划加竖屏信息流,然后是 Sora 视频。好的,我终于看到这些是怎么串在一起的了。
**Lenny Rachitsky:** Yeah, swipe left or right and in vertical feed. And then the Sora video. Okay, I see how this all connects now.
**Alexander Imbiricos:** 说清楚,我们没有在做这个东西,但这是个有趣的想法。不过你现在可以看到,在这个例子中它正在消费外部信号。
**Alexander Imbiricos:** Yeah, to be clear, we're not building that, but like, you know, it's a fun idea. I mean, you can see now like in this example though, like one of the things that it's doing is it's consuming external signals, right?
我觉得另一个很有趣的事情是——如果我们想想迄今为止最成功的 AI 产品是什么?我会说——其实有点容易搞混——OpenAI 第一次使用 Codex 这个品牌,其实是给 GitHub Copilot 背后的模型起的名字,那是好几年前的事了。我们最近决定复用这个品牌,因为实在太好了——Codex,代码执行。但我觉得 IDE 里的自动补全才是目前最成功的 AI 产品之一。它的神奇之处在于,当它能快速给你提供建议时,如果对了你就被加速了,如果错了也不会太烦人。它有时确实烦人,但大部分时候还好。所以你可以构建这样一种混合主动性系统(mixed initiative system),能根据上下文响应你正在做的事情。
I think the other really interesting thing is: if we think about what the most successful AI product to date is, what would it be? Funnily enough, and not to confuse things, the first time we used the brand Codex at OpenAI was for the model powering GitHub Copilot. This was way back in the day, years ago. We decided to reuse that brand recently because it's just so good: Codex, code execution. But I think auto-completion in IDEs is one of the most successful AI products today. Part of what's so magical about it is that it can surface ideas for helping you really rapidly. When it's right, you're accelerated. When it's wrong, it's not that annoying. It can be annoying, but not that annoying. So you can create this mixed-initiative system that's contextually responding to what you're attempting to do.
在我看来,这对于我们 OpenAI 做产品来说非常有启发。比如当我想到发布一个浏览器——就像我们做的 Atlas——在我看来,一个很有意思的事情是我们可以在你日常浏览的过程中,根据上下文主动呈现帮助你的方式。这样我们就突破了"只看代码"或"只在终端里"的限制,进入了"一个真正的队友要处理的远不止代码"这个理念。队友还要处理大量的网页内容,那我们怎么在这方面帮你?
And so in my mind, this is like a really interesting thing for us as OpenAI as we're building. So for instance, you know, when I think about launching a browser, which we did with Atlas, right? Like in my mind, one of the really interesting things we can then do is we can then like contextually surface like ways that we can help you as you're going about your day. Right? And so we break out of this like you know, we're just looking at code or we're just in your terminal into this idea that like, "Hey, like a real teammate is dealing with a lot more than just code, right? They're dealing with a lot of things that are web content." So like, you know, how can we help you with that?
**Lenny Rachitsky:** 天哪,信息量太大了。我很喜欢。好的,浏览器中的自动补全——在你浏览和日常工作中呈现所有能帮到你的东西——这真的很有意思。我想聊聊 Atlas,我们待会再回来。Codex 代表"代码执行",我之前不知道,这个命名真的很聪明,现在懂了。
**Lenny Rachitsky:** Man, there's so much there. I love this. Okay, so auto complete on web with the browser, that's so interesting. Just like here's all the things that we can help you with as you're browsing and going about your day. I want to talk about Atlas. I'll come back to that. Codex, code execution, did not know that. That's really clever. I get it now.
好,然后这个"闲聊驱动开发",让我想到了之前采访过的 Block 的 CTO Dhanji Prasanna,他们有一个叫 Goose 的内部智能体产品。他提到 Block 的一个工程师让 Goose 一直看着他的屏幕、听他的每个会议,然后主动做他可能想做的事:提交 PR、发邮件、起草 Slack 消息。所以他正在做的就是你描述的那种事情,只不过是非常早期的版本。
Okay, and then this chatter-driven development. No, this is a really good idea. It reminds me that I had Dhanji Prasanna, CTO of Block, on the podcast, and they have this product called Goose, which is their own internal agent. He talked about an engineer at Block who has Goose watch his screen and listen to every meeting, and it proactively does work that he would probably want done. So it ships a PR, sends an email, drafts a Slack message. He's doing exactly what you're describing, in a very early way.
**Alexander Imbiricos:** 很有意思。我打赌如果我们去问他们,这种生产力的瓶颈是什么——他们有说吗?
**Alexander Imbiricos:** Yeah, that's super interesting. And you know, I bet you if we went and asked them what the bottleneck to that productivity is — did they share what it is?
**Lenny Rachitsky:** 大概是需要检查确认这些做的事情是不是对的。
**Lenny Rachitsky:** Uh probably looking at it and just making sure this is the right thing to do.
**Alexander Imbiricos:** 对。我们现在就看到了这个。我们有一个 Codex 的 Slack 集成。大家很喜欢,如果有什么需要快速处理的事情,直接在 Slack 里 @Codex——"你觉得这个 bug 是什么原因?"不一定要是工程师。甚至数据科学家也在大量用 Codex 回答问题——"你觉得这个指标为什么变了?发生了什么?"问题类的需求,你在 Slack 里就能直接得到答案,太好用了。但当它写代码的时候,你还是得回去看代码。所以我觉得现在真正的瓶颈是验证代码是否正确以及做代码审查。
**Alexander Imbiricos:** Yeah. So we see this now. We have a Slack integration for Codex, and people love it. If there's something you need done quickly, people will just at-mention Codex: "Why do you think this bug is happening?" It doesn't have to be an engineer. Data scientists here are using Codex a ton just to answer questions: "Why do you think this metric moved? What happened?" For questions, you get the answer right back in Slack. It's amazing, super useful. But when it's writing code, you have to go back and look at the code. So I think the real bottleneck right now is validating that the code worked and doing code review.
所以在我看来,如果我们想达到你刚才说的那位朋友的那种状态,我们真的需要搞清楚怎么让人们配置编码智能体,使其在后期阶段更加自主。
So in my mind, if we wanted to get to something like you know, the friend you were talking about's world, I think we really need to figure out how to get people to configure their coding agents to be much more autonomous on those later stages of the work.
**Lenny Rachitsky:** 说得通。就像你说的,写代码——我当了 10 年工程师。写代码很有趣,进入心流、构建、设计架构、测试,都很爽。但看别人的代码,还得为生产环境的稳定性负责,如果代码做了什么蠢事导致系统崩溃——那就不太有趣了。现在构建变容易了,我从走在前沿的公司那里一直听到的是,瓶颈现在在两头:前面是搞清楚该做什么,后面是——好了,这 100 个 PR 谁来审?
**Lenny Rachitsky:** It makes sense. Like you said, writing code — I used to be an engineer, I was an engineer for 10 years. Really fun to write code, really fun to just get in the flow, build, architect, test. Not so fun to look at everyone else's code and just have to go through and be on the hook if it is doing something dumb that's going to take down production. And now that building has become easier, what I've always heard from companies that are really at the cutting edge of this is the bottleneck is now like figuring out what to build and then it's at the end of like — okay, we have all this — all 100 PRs to review. Who's going to go through all that?
**Alexander Imbiricos:** 对,没错。
**Alexander Imbiricos:** Right. Yeah.
**Lenny Rachitsky:** Codex 对你作为产品经理的工作方式有什么影响?工程这边的影响很明显——代码帮你写了。它对你作为 PM 的工作方式和 OpenAI 的 PM 们有什么改变?
**Lenny Rachitsky:** What has the impact of Codex been on the way you operate as a product person, as a PM? It's clear how engineering is impacted: code is written for you. What has it done to the way you operate, and the way PMs operate at OpenAI?
**Alexander Imbiricos:** 我觉得主要是感觉被赋能了很多。我一直是偏技术的 PM。尤其是做面向工程师的产品时,我觉得 dogfooding 是必须的。但除此之外,我就是觉得作为 PM 能做的事情多太多了。Scott Belsky 提过一个概念叫"压缩人才栈"(compressing the talent stack),不确定我说得对不对。但基本意思是:也许这些角色之间的边界不再像以前那么必要了,因为每个人都能做更多事。每当一个人能做更多,就能跳过一个沟通环节,让团队效率更高。
**Alexander Imbiricos:** Yeah, I mean I think mostly I just feel like much more empowered. I've always been sort of more technical leaning PM. And especially when I'm working on products for engineers, I feel like it's necessary to like, you know, dogfood the product. But even beyond that, I just feel like I can do much much more as a PM. And you know, Scott Belsky talks about this idea of like compressing the talent stack. I'm not sure if I phrased that right. But it's basically this idea that like maybe the boundaries between these roles are a little bit like less needed than before because people can just do much more. And every time someone can do more, you can like skip one communication boundary and make the team like that much more efficient.
我们在很多职能上都看到了这一点。你问的是产品方面——回答问题现在容易多了,直接问 Codex 就行。很多 PM 类型的工作——了解什么在变化——直接让 Codex 帮忙。做原型通常比写规格还快,这是很多人都在说的事。
Right? So I think we see it you know, in a bunch of functions now, but I guess since you asked about like product specifically, you know, now like answering questions much much easier. You know, just ask Codex for thoughts on that. A lot of like PM type work, understanding what's changing, again, just ask Codex for help with that. Prototyping is often faster than writing specs. This is something that a lot of people have talked about.
我觉得有一点稍微让人意外的是——我们做 Codex 主要是为了写要部署到生产环境的代码。但实际上我们看到现在有大量一次性代码是用 Codex 写的。回到代码无处不在这个理念。你会看到有人想做一个分析——比如我想搞清楚什么东西,就说"给 Codex 一堆数据,让它做一个交互式的数据查看器"。这以前太麻烦了,但现在完全值得让智能体去做。
Something that's maybe not super surprising, but slightly surprising, is this: we're mostly building Codex to write code that's going to be deployed to production, but we actually see a lot of throwaway code written with Codex now. It goes back to this idea of ubiquitous code. Say someone wants to do an analysis. If I want to understand something, it's, "Okay, give Codex a bunch of data, and then ask it to build an interactive data viewer for this data." That was just too annoying to do in the past, but now it's totally worth the time to get an agent to go do it.
类似的,我在设计团队也看到了很酷的例子。一个设计师想做一个动画——就是 Codex 里的硬币动画。通常要编程做这个动画太费劲了,他们就 vibe code 了一个动画编辑器。然后用这个编辑器做出了动画,提交到了代码仓库。实际上我们的设计师在这方面获得了巨大的加速。说到"压缩人才栈",我们的设计师非常像 PM。他们做大量的产品工作,实际上还有一整个 vibe code 的 Codex App 侧边原型。我们很多讨论的方式是快速碰一下——因为同时在推进一万件事——然后设计师去想该怎么做,但不是再开一次会来讨论,而是直接 vibe code 一个原型。
Similarly, I've seen some pretty cool prototypes on our design team. A designer wanted to build an animation, the coin animation in Codex. Normally it would be too annoying to program this animation, so they just vibe coded an animation editor. Then they used the animation editor to build the animation, which they checked into the repo. There's a ton of acceleration for our designers, and speaking of compressing the talent stack, I think our designers are very PM-y. They do a ton of product work, and they actually have an entire vibe-coded side prototype of the Codex app. So a lot of how we talk about things is that we'll have a really quick jam, because there are 10,000 things going on, and then the designer will go think about how this should work. But instead of talking about it again, they'll just vibe code a prototype of it.
**Lenny Rachitsky:** Okay, so autocomplete on the web with the browser, that's so interesting. Just like, here's all the things that we can help you with as you're browsing and going about your day. I want to talk about Atlas. I'll come back to that. Codex, code execution, did not know that. That's really clever. I get it now. Okay, and then what is chatter-driven development? This is a really good idea. It reminds me, I had Dhanji Prasanna on the podcast, CTO of Block, and they have this product called Goose, which is their own internal agent thing. And he talked about an engineer at Block who just has Goose watch his screen and listen to every meeting, and it proactively does work that he would probably want done. So it ships a PR, sends an email, drafts a Slack message. So he's doing exactly what you're describing in kind of a very early way.
**Alexander Imbiricos:** 很有意思。我打赌如果我们去问他们这种生产力的瓶颈是什么——他们说了吗?
**Alexander Imbiricos:** Yeah, that's super interesting. And you know, I bet you — so if we went and asked them what the bottleneck to that productivity is, did they share what it is?
**Lenny Rachitsky:** 大概是需要检查确认做的事情是不是对的。
**Lenny Rachitsky:** Uh, probably looking at it and just making sure this is the right thing to do.
**Alexander Imbiricos:** 对。我们现在就看到了这个问题。比如我们有一个 Codex 的 Slack 集成,大家很喜欢。如果有什么需要快速处理的事情,直接在 Slack 里 @Codex 就行——"你觉得这个 bug 是什么原因?"不一定要是工程师。甚至数据科学家也在大量用 Codex 回答问题——"你觉得这个指标为什么变了?发生了什么?"
**Alexander Imbiricos:** Yeah. So we see this now. We have a Slack integration for Codex, and people love it. If there's something you need to do quickly, people will just at-mention Codex: "Why do you think this bug is happening?" It doesn't have to be an engineer. Even data scientists here are using Codex a ton to just answer questions like, "Why do you think this metric moved? What happened?"
问题类的需求,你在 Slack 里就能直接得到答案,太好用了。但当它写代码的时候,你还是得回去看代码。所以真正的瓶颈是验证代码是否正确以及做代码审查。
So for questions, you get the answer right back in Slack. It's amazing, super useful. But when it's writing code, then you have to go back and look at the code. And so the real bottleneck right now is validating that the code worked and doing code review.
所以在我看来,要达到你说的那位朋友的那种状态,我们真的需要搞清楚怎么让人们把编码智能体配置得在后期阶段更加自主。
So in my mind, if we wanted to get to something like the world of the engineer you were talking about, I think we really need to figure out how to get people to configure their coding agents to be much more autonomous on those later stages of the work.
**Lenny Rachitsky:** 说得通。就像你说的,写代码——我曾经当了 10 年工程师。写代码很有趣,进入心流、构建、设计架构、测试,都很爽。但看别人的代码,还得为生产环境的稳定性负责,如果代码搞了什么蠢事导致系统崩溃——那就不太有趣了。现在构建变得更容易了,我从很多走在前沿的公司那里听到的是,瓶颈现在转移到了两头:前面是搞清楚该做什么,后面是——好了,这 100 个 PR 谁来审?
**Lenny Rachitsky:** It makes sense. I used to be an engineer, I was an engineer for 10 years. It's really fun to write code, really fun to just get in the flow, build, architect, test. It's not so fun to look at everyone else's code and be on the hook if it's doing something dumb that's going to take down production. And now that building has become easier, what I've always heard from companies that are really at the cutting edge of this is that the bottleneck is now at both ends: figuring out what to build, and then at the end it's like, okay, we have 100 PRs to review. Who's going to go through all that?
**Alexander Imbiricos:** 没错。
**Alexander Imbiricos:** Right. Yeah.
**Lenny Rachitsky:** Codex 对你作为产品经理的工作方式有什么影响?工程师这边的影响很明显——代码帮你写了。它对你作为 PM 的工作方式和 OpenAI 的 PM 们有什么改变?
**Lenny Rachitsky:** What has the impact of Codex been on the way you operate as a product person, as a PM? It's clear how engineering is impacted: code is written for you. What has it done to the way you operate and the way PMs operate at OpenAI?
**Alexander Imbiricos:** 我觉得主要是感觉被赋能了很多。我一直是偏技术的 PM,尤其是做面向工程师的产品时,我觉得 dogfooding 是必须的。但除此之外,我就是觉得作为 PM 能做的事情多太多了。
**Alexander Imbiricos:** Yeah, I mean I think mostly I just feel much more empowered. I've always been a more technical-leaning PM. And especially when I'm working on products for engineers, I feel like it's necessary to dogfood the product. But even beyond that, I just feel like I can do much, much more as a PM.
Scott Belsky 提过一个概念叫"压缩人才栈"(compressing the talent stack),不确定我用词是否准确,但基本意思是:也许这些角色之间的边界不再像以前那么必要了,因为每个人都能做更多事。每当一个人能做更多的事,就能跳过一个沟通环节,让团队效率更高。
And Scott Belsky talks about this idea of like compressing the talent stack. I'm not sure if I'm phrasing that right. But it's basically this idea that like maybe the boundaries between these roles are a little bit less needed than before because people can just do much more. And every time someone can do more, you can like skip one communication boundary and make the team that much more efficient.
我们在很多职能上都看到了这一点。你问的是产品方面——现在回答问题容易多了,直接问 Codex 就行。很多 PM 的工作——了解什么在变化——直接让 Codex 帮忙。做原型通常比写规格还快,这一点很多人都谈过了。
So I think we see it in a bunch of functions now, but I guess since you asked about product specifically — now like answering questions is much much easier. You know, just ask Codex for thoughts on that. A lot of like PM type work, understanding what's changing, again, just ask Codex for help with that. Prototyping is often faster than writing specs. This is something that a lot of people have talked about.
一个稍微出乎意料的现象是,我们做 Codex 主要是为了写会部署到生产环境的代码,但实际上现在有大量的"一次性代码"(throwaway code)是用 Codex 写的。回到代码无处不在的理念。
Something that's slightly surprising is like we see — we're mostly building Codex to write code that's going to be deployed to production. But actually we see a lot of throwaway code written with Codex now. It's kind of going back to this idea of like, you know, ubiquitous code.
比如有人想做一个分析。如果我想搞懂什么东西,可以说"给 Codex 一堆数据,让它做一个交互式的数据查看器"。这以前太麻烦了不值得做,但现在完全值得花时间让智能体去做。
So you'll see someone wants to do an analysis. If I want to understand something, it's like, "Okay, just give Codex a bunch of data, but then ask it to build an interactive data viewer for this data." Right? That was just too annoying to do in the past, but now it's totally worth the time of just getting an agent to go do something.
类似的,我在设计团队也看到了很酷的例子。一个设计师想做一个动画——就是 Codex 里的硬币动画。通常要编程做这个动画太费劲了,他们就用 vibe coding 做了一个动画编辑器,然后用这个编辑器做出了动画,最后把动画提交到了代码仓库。
Similarly, I've seen like some pretty cool prototypes on our design team about — like a designer basically wanted to build an animation. And this is the coin animation in Codex. And it was like normally it'd be too annoying to program this animation, so they just vibe coded an animation editor. And then they used the animation editor to build the animation, which they then checked into the repo.
事实上,我们的设计师在这方面获得了巨大的加速。说到"压缩人才栈",我们的设计师非常像 PM,他们做大量的产品工作,实际上他们有一整个 vibe coded 的 Codex App 侧边原型。我们很多讨论的方式是快速碰一下——因为同时在推进一万件事——然后设计师去想这个东西该怎么做,但不是再开一次会来讨论,而是直接 vibe code 一个原型。我们试用一下,如果喜欢,他们就把原型做成真正的 PR 来合入。根据他们对代码库的熟悉程度——Codex 的 UI 是用 Rust 写的,这比较难——也许他们自己合入,也许做到差不多然后让工程师帮忙合入 PR。
Actually, our designers — there's a ton of acceleration there and like speaking of compressing the talent stack, I think our designers are very PM-y. So they do a ton of product work and they actually have like an entire vibe coded sort of side prototype of the Codex app. And so a lot of how we talk about things is like we'll have a really quick jam because there's like 10,000 things going on. And then the designer will like go think about how this should work, but instead of like talking about it again, they'll just like vibe code a prototype of that in their standalone prototype. We'll play with it. If we like it, they'll vibe engineer that prototype into an actual PR to land. And then depending on their comfort with the code base — like Codex UI is in Rust, that's a little harder — maybe they'll land it themselves or they'll like get close and then an engineer can help them land the PR.
我们最近发布了 Sora Android App。这是最令人震撼的加速案例之一。OpenAI 内部使用 Codex 的频率本来就非常高,但这一年来一直在增长——现在基本上所有技术员工都在用——而且使用编码智能体的深度和技巧也大幅提升了。Sora Android App——一个全新的 App——我们用 18 天从零做到了员工内测版。然后又过了 10 天,也就是总共 28 天,我们就向公众正式发布了。这都是在 Codex 的帮助下完成的。
We recently shipped the Sora Android app. And that was one of the most sort of mind-blowing examples of acceleration actually, because the usage of Codex internally at OpenAI is obviously really really high, but it's been growing over the course of the year both in terms of — now it's basically like all technical staff use it — but even like the intensity and know-how of how to make the most of coding agents has gone up by a ton. And so the Sora Android app, right? Like a fully new app, we built it in 18 days. It went from like zero to launch to employees. And then 10 days later, so 28 days total, we went to just like GA to the public. And that was done just like with the help of Codex.
速度非常惊人。我不想说这是简单模式,但 Codex 有一个特别擅长的事情:如果你是一家在多个平台上构建软件的公司,你已经搞定了底层 API 和系统,让 Codex 帮你移植到另一个平台上效果特别好,因为它有东西可以参考。那个团队的工程师基本上就是让 Codex 看 iOS App 的代码,生成工作计划,然后执行实现。它同时看 iOS 和 Android 的代码。所以大致是 2 周内测,总共 4 周上线。
So pretty insane velocity. I would say it was a little bit — I don't want to say easy mode, but there is one thing that Codex is really good at if you're a company that's like building software on multiple platforms. So you've already figured out like some of the underlying APIs or systems. Asking Codex to port things over is really effective because it has like something it can go look at. And so the engineers on that team were basically having Codex go look at the iOS app, produce plans of work that needed to be done, and then go implement those. And it was kind of looking at iOS and Android at the same time. And so, you know, basically it was like 2 weeks to launch to employees, 4 weeks total.
**Lenny Rachitsky:** 快得不可思议。更不可思议的是它成了 App Store 排名第一的 App。这简直不可想象。好的——
**Lenny Rachitsky:** Insanely fast. What makes that even more insane is it became the number one app in the App Store. I don't — this just boggles the mind. Okay, so —
**Alexander Imbiricos:** 对,想想看,App Store 排名第一的 App,只有几个工程师。我记得大概两三个人,用了几周时间。
**Alexander Imbiricos:** Yeah, so imagine there's the number one app in the App Store with like a handful of engineers. I think it was like two or three, possibly, in a handful of weeks. Yeah.
**Lenny Rachitsky:** 太荒谬了。
**Lenny Rachitsky:** This is absurd.
**Alexander Imbiricos:** 所以这是一个很有趣的加速案例。然后 Atlas 是另一个例子——我记得 Atlas 的工程师 Ben 做过一期播客,分享了一些构建过程。Atlas 其实是一个浏览器,做浏览器非常难,我们不得不构建很多复杂的系统。那个团队现在有大量 Codex 的重度用户。我跟他们聊这件事,因为很多工程师是我之前创业时的同事。他们说以前同样的工作需要两到三个工程师花两到三周,现在是一个工程师一周搞定。
**Alexander Imbiricos:** So yeah, that's a really fun example of acceleration. And then Atlas is the other one. I think Ben, an engineer on Atlas, did a podcast sharing a little bit about how we built it. Atlas is actually a browser, right? And building a browser is really hard, so we had to build a lot of difficult systems in order to do that. And that team has a ton of power users of Codex right now. We were talking to them about it, because a lot of those engineers are people I used to work with at my startup. And they'd say that before, this would have taken two to three engineers two to three weeks. And now it's one engineer, one week.
巨大的加速。另一个很酷的事是,Atlas 先在 Mac 上发布,但现在我们在做 Windows 版。团队正在上手 Windows 开发,同时也在帮我们让 Codex 在 Windows 上做得更好。Windows 这边确实更早期。我们上周发布的模型是第一个原生理解 PowerShell 的 Codex 模型——PowerShell 是 Windows 的原生 shell 语言。
So massive acceleration there as well. And what's quite cool is that we shipped Atlas on Mac first, but now we're working on the Windows version. So the team now is like ramping up on Windows, and they're helping us make Codex better on Windows too. Which is admittedly earlier. Like, just the model we shipped last week is the first model that natively understands PowerShell — the native shell language on Windows.
整个公司被 Codex 加速了,真的很棒。最明显的当然是研究团队——加速模型训练的速度和质量。然后是设计,就像我们聊过的,还有营销。实际上现在我的产品营销人员经常直接在 Slack 里改文案字符串、更新文档。
So yeah, it's been really awesome to see like the whole company getting accelerated by Codex. And you know, most obviously also research and like improving how quickly we train models and how well we do it, and then even like design as we talked about and marketing. Like, actually we're at this point now where my product marketer is often also making string changes just directly from Slack. Or like updating docs directly from Slack.
**Lenny Rachitsky:** 这些例子太棒了。你们活在可能性的最前沿,其他公司未来也会这样工作。App Store 排名第一的 App,全世界都在用的 App——至少火了一整周——你说是 28 天做出来的,18 天就把核心搞定了。
**Lenny Rachitsky:** These are amazing examples. You guys are living at the bleeding edge of what is possible, and this is how other companies are going to work. Just shipping again, what became the number one app in the App Store and just beloved all over the — it just like took over the world for at least a week. Built, you said in 28 days, and like 10 days, 18 days just to get the core of it working.
**Alexander Imbiricos:** 对,18 天我们就有了一个员工在用的版本,10 天后就公开发布了。
**Alexander Imbiricos:** Yeah, so it's like 18 days we had a thing that employees were playing with, and then 10 days later we were out.
**Lenny Rachitsky:** 你说只有几个工程师。两三个。好的。然后 Atlas,你说一周就做出来了?
**Lenny Rachitsky:** And you said just a couple engineers. Yeah. Two or three. Okay. And then Atlas, you said it took a week to build.
**Alexander Imbiricos:** 不不不。Atlas 不是一周做完的,Atlas 是一个非常重量级的项目。我跟 Atlas 的一位工程师聊他们怎么用 Codex 的,他说基本上什么都用 Codex。我问那怎么衡量加速的幅度?他给我的回答是:以前需要两到三个工程师两到三周,现在是一个工程师一周。
**Alexander Imbiricos:** No, no, no. Atlas wasn't built in a week. Atlas was a really meaty project. I was talking to one of the engineers on Atlas about what they use Codex for, and it's basically, "We use Codex for absolutely everything." I was like, okay, well, how would you measure the acceleration? And the answer I got back was that a given piece of work previously would have taken two to three weeks for two to three engineers, and now it's one engineer, one week.
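As a rough sanity check on that estimate, the claim can be restated in engineer-weeks. The sketch below is illustrative only: treating effort as engineers multiplied by weeks is an assumption, and the function name `engineer_weeks` is hypothetical rather than anything from the conversation.

```python
def engineer_weeks(engineers: int, weeks: int) -> int:
    """Total effort, assuming effort scales as engineers x weeks."""
    return engineers * weeks


# "Two to three weeks for two to three engineers" gives a range of estimates.
before_low = engineer_weeks(2, 2)   # 4 engineer-weeks
before_high = engineer_weeks(3, 3)  # 9 engineer-weeks

# "One engineer, one week."
after = engineer_weeks(1, 1)

speedup_low = before_low / after
speedup_high = before_high / after
print(f"Implied speedup: {speedup_low:.0f}x to {speedup_high:.0f}x")
```

Under that assumption, the quoted numbers imply roughly a 4x to 9x reduction in effort for a comparable piece of work.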
**Lenny Rachitsky:** 你觉得这最终会扩展到非工程师吗?一定要是工程师来做这些事吗?PM 或设计师能不能做?
**Lenny Rachitsky:** Do you think this eventually moves to non-engineers doing this sort of thing? Like, does it have to be an engineer building this thing? Could it have been built by a PM or a designer?
**Alexander Imbiricos:** 我觉得我们肯定会到达角色边界变得模糊的阶段。你还是需要有人理解所构建东西的细节,但什么算"细节"会不断演变。就像现在你写 Swift 不需要会写汇编语言一样。世界上有一小撮人会写汇编,他们的存在很重要,但这是一个极专业化的能力,大多数公司不需要。
**Alexander Imbiricos:** I think we will very much get to the point where basically the boundaries are a little bit blurred. Right? Like I think you're going to want someone who understands the details of what they're building, but what details those are will evolve. Kind of like how now if you're writing Swift, you don't have to speak assembly. You know, there's a handful of people in the world, and it's really important that they exist and speak assembly. But that's like a specialized function that like most companies don't need to have.
所以我觉得我们会自然地看到抽象层级不断增加。有意思的是,我们现在进入了自然语言这个抽象层。自然语言本身非常灵活——你可以让工程师讨论计划,讨论规格,或者讨论一个产品或想法。所以我们也可以在这些抽象层级之间自如移动。
So I think we're just going to naturally see like an increase in layers of abstraction. And then the cool thing is now we're entering like the language layer of abstraction, like natural language. And then natural language itself is really flexible. Right? Like, you could have engineers talking about like a plan, and then you could have engineers talking about a spec, and then you could have engineers talking about just you know, a product or an idea. So I think we can also start moving up those layers of abstraction as well.
但这个过程会是渐进的。不会突然一夜之间大家都不写代码了、只写规格。更可能是这样的:我们把编码智能体配置好了,能很好地预览构建结果、跑测试——也许这是大多数人最先配置好的部分。然后是让它能执行构建并看到自己的改动效果。但我们可能还没有做好集成框架,比如在 Atlas 的场景中,让它加载几个示例页面来看效果。我觉得至少在一段时间内,还是需要人类来策划智能体需要对接的这些连接器、系统和组件。在更远的未来,会有一个更大的突破——Codex 告诉你怎么配置,甚至自己在代码仓库里完成配置。
But you know, I do think this is going to be gradual. I don't think it's going to go to, all of a sudden, nobody ever writes any code and it's just specs. I think it's going to be much more like, okay, we've set up our coding agent to be really good at previewing the build or at running tests. Maybe that's the first part that most people have set up. And then it's like, okay, now we've set it up so that it can execute the build and see the results of its own changes, but we haven't yet built a good integration harness so that, in the case of Atlas, it can load a few sample pages to see how well those work. And I think for some time at least we're going to have humans curating which of these connectors or systems or components the agent needs to be good at talking to. And then in the future there will be an even greater unlock where Codex tells you how to set it up, or maybe sets itself up in a repo.
**Lenny Rachitsky:** 活在这个时代真是太疯狂了。我很好奇这种事情的二阶效应——构建速度这么快意味着什么?分发(distribution)会变得更重要吗?创意本身会更值钱吗?很有趣。你怎么看?
**Lenny Rachitsky:** What a wild time to be alive. Wow. I'm curious just the second-order effects of this sort of thing, just how quickly it is to build stuff. What does that do? Does that mean distribution becomes much much more important? Does it mean ideas are just worth a lot more? It's interesting to think about how that changes. I'm curious what you think.
**Alexander Imbiricos:** 我仍然不认为创意像很多人以为的那么值钱。执行仍然很难。你可以快速构建一个东西,但你仍然需要把它执行好,让它整体上说得通、有连贯性。然后分发确实是巨大的。
**Alexander Imbiricos:** I still don't think ideas are worth as much as maybe a lot of people think. I still think execution is really hard, right? Like, you can build something fast, but you still need to execute well on it. Still needs to make sense and be a coherent thing overall. And yeah, distribution is massive.
**Lenny Rachitsky:** 对。感觉现在除了构建以外的一切都变得更重要了——想出创意、推向市场、分发——这些东西。
**Lenny Rachitsky:** Yeah. Just feels like everything else is now more important. Everything that isn't the building piece, which is coming up with the idea, getting to market, distribution — all that kind of stuff.
**Alexander Imbiricos:** 对。我觉得我们可能经历了一个奇怪的过渡阶段——有一段时间,做产品太难了,你基本上只需要擅长做产品就行,可能不太需要对特定客户有深入了解。
**Alexander Imbiricos:** Yeah. I think we might have been in this weird temporary phase where, you know, for a while like you could — it was so hard to build product that you mostly just had to be really good at building product, and it maybe didn't matter if you had an intimate understanding of a specific customer.
但现在我觉得我们到了这样一个阶段:如果我只能选一件事来了解,那就是对某一类客户的问题有真正深刻的理解。如果我只能带一项核心能力进场的话。所以我觉得这最终仍然是最重要的。如果你今天在创业,你对一群目前被 AI 工具服务不好的客户有深刻理解和人脉网络,我觉得你就赢了。反过来,如果你擅长做网站但没有特定的目标客户,你会面临更艰难的局面。
But now I think we're getting to this point where actually, if I could only choose like one thing to understand, it would be like really meaningful understanding of the problems that a certain customer has. Right? If I could only go in with one core competency. So I think that's ultimately still what's going to matter most, right? Like, if you're starting a new company today and you have like a really good understanding and network of customers that are currently underserved by AI tools, I think you're set. Right? Whereas if you're good at building like, you know, websites, but you don't have any specific customer to build for, I think you're in for a much harder time.
**Lenny Rachitsky:** 听起来你看好垂直 AI 创业公司。
**Lenny Rachitsky:** Bullish on vertical AI startups is what I'm hearing.
**Alexander Imbiricos:** 完全同意。有通用的东西能解决很多问题,也有那种"我们要把演示文稿这件事做到极致,我们对演示文稿的理解比任何人都深,我们要接入你的工作流程"以及所有那些对特定问题重要的细节。
**Alexander Imbiricos:** Yeah, I completely agree. There's like the general thing that can solve a lot of problems, and then there's like we're going to solve presentations incredibly well, and we're going to understand the presentation problem better than anyone, and we're going to plug into your workflows and all these other things that matter for a very specific problem.
**Lenny Rachitsky:** 太棒了。当你衡量 Codex 的进展时,我猜你们有大量的 eval,还有各种公开的 benchmark。你看什么指标来判断"我们进展得很好"?我猜不会只有一个,但你关注什么?你在推动什么?有哪一两个 KPI?
**Lenny Rachitsky:** Okay. Incredible. When you think about progress on Codex, I imagine you have a bunch of evals, and there's all these public benchmarks. What's something you look at to tell you, okay, we're making really good progress? I imagine it's not going to be the one thing, but what do you focus on? What's like something you're trying to push? What's like a KPI or two?
**Alexander Imbiricos:** 我不断提醒自己的一件事是,像 Codex 这样的工具天然会让你成为重度用户。所以我们可能会不小心花太多时间思考那些用户采纳旅程中非常深入的功能,然后过度优化那个方向。所以我觉得非常关键的是,去看你的 D7 留存率。重新注册一个账号从头试一遍产品。
**Alexander Imbiricos:** One of the things that I'm constantly reminding myself of is that a tool like Codex sort of naturally is a tool that you would become a power user of. Right? And so, we can accidentally spend a lot of our time thinking about features that are like very deep in the user adoption journey. And so, we can kind of end up over-solving for that. And so, I think it's like just critically important to like go look at your D7 retention. Right? Just go try the product. Like, sign up from scratch again.
我有好几个 ChatGPT Pro 账号——为了尽可能正确地 dogfooding——用 Gmail 注册的,每个月收费 200 美元。我得报销一下。但我觉得作为用户的真实感受和早期留存数据对我们来说仍然非常重要,因为虽然这个品类在爆发,但我们仍然处于非常早期的阶段。
I have a few too many ChatGPT Pro accounts that I've just — in order to maximally correctly dogfood — like signed up for on my Gmail, and they charge me like 200 bucks a month. I need to expense those. But you know, like I think just the feeling of being a user and the early retention stats are still like super important for us, because as much as this category is taking off, I think we're still in the very early days of like people using them.
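For readers unfamiliar with the term, D7 retention is commonly taken to mean the share of new users who come back seven days after signing up. The following is a minimal sketch of that definition, not a description of how OpenAI measures it: the function `d7_retention`, the `signups` and `activity` structures, and the choice to count only activity on exactly day 7 are all assumptions made for illustration.

```python
from datetime import date, timedelta


def d7_retention(signups: dict[str, date], activity: dict[str, set[date]]) -> float:
    """Fraction of signed-up users who were active exactly 7 days after signup."""
    if not signups:
        return 0.0
    retained = sum(
        1
        for user, signup_day in signups.items()
        if signup_day + timedelta(days=7) in activity.get(user, set())
    )
    return retained / len(signups)


# Hypothetical sample data: four signups, two of whom return on day 7.
signups = {
    "a": date(2025, 1, 1),
    "b": date(2025, 1, 1),
    "c": date(2025, 1, 2),
    "d": date(2025, 1, 3),
}
activity = {
    "a": {date(2025, 1, 8)},                     # active on day 7
    "b": {date(2025, 1, 3)},                     # active on day 2 only
    "c": {date(2025, 1, 9), date(2025, 1, 10)},  # active on day 7 and day 8
    # "d" never came back
}

print(d7_retention(signups, activity))  # 0.5
```

Some teams instead count anyone active on or after day 7, or within a window around it, so the exact number depends on which variant is in use.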
另一件我们做的事——我觉得我们可能是这个领域里最关注用户反馈和社交媒体的团队——团队里好几个人一直在刷 Reddit 和 Twitter。上面有好评也有大量吐槽,但我们对吐槽非常认真,会仔细看。因为编码智能体可以用于太多场景了,它在很多特定行为上经常有各种各样的问题。
Another thing that we do that — I think we might be the most like user feedback slash social media pilled team out there in this space — is like a few of us are like constantly on Reddit and Twitter. And you know, there's praise up there, and there's a lot of complaints, but we take the complaints like very seriously and look at them. And I think that again, because you can use a coding agent for so many different things, it often is like kind of broken in many sort of ways for like specific behaviors.
所以我们其实经常在社交媒体上监测舆情。Twitter/X 上更偏炒作一些。Reddit 上更负面,但更真实。所以我最近越来越关注人们在 Reddit 上讨论使用 Codex 的情况。
And so we actually monitor what the vibes are on social media pretty often. Twitter/X is a little bit more hypey, and Reddit is a little more negative, but real, actually. So I've started increasingly paying attention to how people are talking about using Codex on Reddit.
**Lenny Rachitsky:** 这对大家来说是很重要的信息。你最常看哪些 subreddit?有 r/Codex 吗?
**Lenny Rachitsky:** This is important for people to know. Which subreddits do you check most? Is there like an r/Codex?
**Alexander Imbiricos:** 算法推荐其实挺好的,但 r/Codex 确实存在。
**Alexander Imbiricos:** I mean, the algorithm's pretty good at surfacing stuff, but like r/Codex is there.
**Lenny Rachitsky:** 好的。很有趣。如果在 Twitter 上 @你,你也能看到,但可能没有 Reddit 上的信号那么有价值。
**Lenny Rachitsky:** Okay. I'll take that. Very interesting. And then if people tag you on Twitter, you still see that, but maybe not as powerful as seeing it on Reddit.
**Alexander Imbiricos:** Twitter 上更像一对一的交流,虽然是公开的。而 Reddit 有很好的投票机制,大部分用户可能还不是机器人——虽然不确定。但你能从中获得什么是重要的、其他人怎么看这些的好信号。
**Alexander Imbiricos:** Well, the thing with Twitter is it's a little bit more one-to-one even if it's like in public. Whereas like with Reddit, there's really good upvoting mechanics, and like maybe most people are still not bots — unclear. But you get like good signal on what matters and what other people think.
**Lenny Rachitsky:** 说到 Atlas,我简单说一下。你们发布了 Atlas。我其实发了条推说我试了 Atlas,但我不太喜欢纯 AI 搜索的体验,我有时候就想直接用 Google。等 AI 给我答案的时候我就觉得不太想等,而且当时没法切换,所以我发推说"我换回去了,体验不太好。"我感觉让 OpenAI 的一些 PM 不开心了。然后我看到有人发推说"好了,我们加了这个功能。"我猜这本来就在计划中。这大概就是"先发布、看用户怎么用、再调整"的一个例子。
**Lenny Rachitsky:** So, interestingly, Atlas. I want to talk about that briefly. You guys launched Atlas. I tweeted, actually, that I tried Atlas and I don't love the AI-only search experience. I just want to Google sometimes. Waiting for the AI to give me an answer, I'm like, I don't want to wait. And there was no way to switch, so I just tweeted, "Hey, I'm switching back. It's not great." And I feel like I made some PMs at OpenAI sad. And then I saw someone tweet, "Okay, we have this now," which I imagine was always part of the plan. It's probably an example of: we just ship stuff, see how people use it, and then we figure it out.
一个问题是,这里面有什么故事吗?另外我很好奇,你们为什么要做一个浏览器?
So I guess one is, I don't know, is there anything there? And two, I'm just curious, why are you guys building a web browser?
**Alexander Imbiricos:** 我之前参与过 Atlas 一段时间,现在不做了。讲讲我个人的故事——我之前做的创业公司是做屏幕共享和结对编程的,然后加入了 OpenAI。最初的想法其实是做一个上下文感知的桌面助手(contextual desktop assistant)。我觉得这很重要的原因是,把所有上下文都手动给助手、然后搞清楚它怎么帮你,这个过程太烦了。如果它能直接理解你正在做什么,就能最大程度地加速你。
**Alexander Imbiricos:** So I worked on Atlas for a bit. I don't work on it now. But a bit of the narrative here, to tell my story: I was working on this screen-sharing, pair-programming startup. Right? And then we joined OpenAI. And so the idea was really to build a contextual desktop assistant. And the reason I believe that's so important is because I think it's really annoying to have to give all your context to an assistant and then to figure out how it can help you. And so if it could just understand what you were trying to do, then it could maximally accelerate you.
所以我其实仍然把 Codex 看作一个上下文感知的助手,只是从编码任务这个不同的角度切入。但至少对我个人来说——我不能代表整个产品——一部分思考是:大量工作是在网页上完成的,如果我们能做一个浏览器,就能以更 first-class 的方式为你提供上下文感知。我们不需要去 hack 其他桌面软件——那些软件对无障碍树(accessibility tree)的支持参差不齐。我们也不需要依赖截图——截图慢而且不可靠。我们可以直接在渲染引擎里面,提取任何我们需要的东西来帮助你。
And so I would — I still think of Codex actually as like a contextual assistant, from a little bit of a different angle, like starting with coding tasks. But some of the thinking at least for me personally — I can't speak for the whole product — but it was that a lot of work is done in the web, and if we could build a browser, then we could be contextual for you, but in a much more first-class way. We weren't hacking like other desktop software which have like very varied support for what content they're rendering to the accessibility tree. We wouldn't be relying on screenshots, which are a little bit slower and unreliable. Instead, we could like be in the rendering engine, right? And like extract whatever we needed to help you.
另外我喜欢用一个电子游戏的类比——不知道你有没有玩过 Halo?你走近一个物体,按 X 键它就做了正确的事。我是那种每款游戏都会读说明书的人。我记得第一次读到"上下文动作"这个概念的时候,觉得这个主意太酷了。上下文动作的关键是我们需要知道你正在尝试做什么。我们有一点上下文,然后就能帮忙。
And also, I like to think of — you know, in video games, like I don't know if you've played Halo, right? Like, you walk up to an object. You press X and it just does the right thing, right? And I was one of those guys who always read the instruction manual for every video game that I bought. And I remember the first time I read about a contextual action and I just thought it was like this really cool idea. And the thing about a contextual action is we need to know what you are attempting to do. We have a little bit of context, and then we can help.
我觉得这至关重要。想象一下我们达到了那个世界——智能体每天帮你几千次。假设我们通知你"我帮你做了这件事"的唯一方式是推送通知,那你一天收到一千条推送通知说"嘿,我做了这个,你觉得怎么样?"——那太烦了。但回到软件工程的场景——假如我正在看一个仪表板,发现某个关键指标下降了。这时候 AI 可以去查一下原因,然后在我看仪表板的那个时刻就把它的分析和可能的修复方案呈现出来。这样我就能保持心流状态,智能体也能对更多事情采取行动。
And I think this is critically important because, you know, imagine this world that we reach, right? Where we have agents that are helping you thousands of times per day. Imagine if the only way we could tell you that we helped you is if we could like push notify you. So you get a thousand push notifications a day of an AI saying like, "Hey, I did this thing. Do you like it?" It gets super annoying, right? Whereas, imagine going back to software engineering — like, I was looking at a dashboard and I noticed some key metric had like gone down. And at that point in time, an AI could like maybe go take a look and then surface the fact that it has an opinion on why this metric went down and maybe a fix right there, right when I'm looking at the dashboard. Right? That would much more keep me in flow and enable the agents to take action on many more things.
所以在我看来,拥有一个浏览器让我们有了更多上下文来判断该帮什么忙。用户对我们该看什么有更多控制权——"嘿,如果你想让我们对某件事采取行动,就在 AI 浏览器里打开它。不想的话就用别的浏览器。"非常清晰的控制和边界。然后我们有能力构建混合主动性的 UX——在合适的时机呈现上下文化的操作建议,而不是随机地打扰你。
So in my mind, part of why I'm excited for us to have a browser is that I think we then have much more context around like what we should help with. Users have much more control over what they want us to look at. It's like, "Hey, if you want us to take action on something, you can open it in your AI browser. If you don't, then you can open it in your other browser." Right? So really clear control and boundaries. And then we have the ability to build UX that's like mixed initiative so that we can surface contextual actions to you like at the time that they're helpful, as opposed to just like randomly notifying you.
**Lenny Rachitsky:** 听了 Codex 的愿景——它要做超级助手——不只是帮你写代码,而是作为一个队友甚至超级队友让你在工作中变得更强。我理解了。
**Lenny Rachitsky:** Hearing the vision for Codex being the super assistant — it's not just there to code for you. It's trying to do a lot for you as a teammate and as this kind of super teammate that makes you awesome at work. So I get this.
说到这个,Codex 还有其他非工程类的常见用例吗?非工程师的使用方式——我们聊过了设计师做原型、构建东西。还有没有什么意想不到的、非工程师使用 Codex 的方式?
Speaking of that, are there other non-engineering common use cases for Codex? Just ways that non-engineers — we talked about it, you know, designers prototyping, building stuff. Are there any kind of other unexpected ways people are using Codex that aren't engineers?
**Alexander Imbiricos:** 意想不到的用法有很多,但真正有 traction 的使用场景目前还是集中在编码相邻的或者偏技术的领域,有成熟生态的地方,比如数据分析之类的。我个人预期未来会看到更多。但目前我们让团队非常聚焦在编码这件事上,因为还有太多工作要做。
**Alexander Imbiricos:** I mean, there's a load of unexpected ways, but I think most of where we're seeing real traction is still, for now, coding-adjacent or tech-oriented places where there's a mature ecosystem. Or maybe you're doing data analysis or something like that. I personally am expecting that we're going to see a lot more of that over time. But for now, we're keeping the team very focused on just coding because there's so much more work to do.
**Lenny Rachitsky:** 对于想要尝试 Codex 的人,它对所有类型的代码库都有效吗?支持什么代码?如果你是 SAP,能接入 Codex 开始构建吗?最佳使用场景是什么?它在哪些场景下还不太行?
**Lenny Rachitsky:** For people that are thinking about trying out Codex, does it work for all kinds of code bases? What code does it support? If you're like, I don't know, SAP, can you add Codex and start building things? What's kind of like the sweet spot? Where does it start to not be amazing yet?
**Alexander Imbiricos:** 我很高兴你问了这个问题,因为试用 Codex 的最好方式是给它最难的任务,这跟其他编码智能体有点不同。其他工具你可能会想"让我先从简单的开始,或者 vibe code 个什么东西试试看喜不喜欢这个工具。"但我们真正把 Codex 做成了一个专业工具,你可以把最难的问题交给它。它能在你那个庞大的、目前并不完美的代码库中写出高质量的代码。
**Alexander Imbiricos:** I'm really glad you asked this question actually, because the best way to try Codex is to give it your hardest tasks, which is a little different than some of the other coding agents. Like, you know, some tools you might think, "Okay, let me start easy or just like, you know, vibe code something random and decide if I like the tool." Whereas, like, we're really building Codex to be the professional tool that you can give your hardest problems to. And you know, that writes high-quality code in your enormous code base that is in fact not perfect right now.
所以如果你要试 Codex,应该拿一个你真实面对的任务来试,不需要把它简化成一个很简单的东西。一个好的例子是你有一个棘手的 bug,不知道什么原因,让 Codex 帮你排查或者修复。
So yeah, I think if you're going to try Codex, you want to try it on a real task that you have, and not necessarily like dumb that task down to something that's like trivial, but actually like — a good one would be you have a hard bug and you don't know what's causing that bug and you ask Codex to help figure that out. Or like to implement the fix.
**Lenny Rachitsky:** 我喜欢这个回答。把你最难的问题交给它。
**Lenny Rachitsky:** I love that answer. Just give it your hardest problem.
**Alexander Imbiricos:** 但我要补充一下,如果你说"我最难的问题是我需要创建一家独角兽企业"——那显然不行,至少现在还不行。所以是给它最难的问题,但还是限定在一个问题或一个任务的范围内。这是你测试的时候的建议,之后你可以逐渐学习怎么用它处理更大的事情。
**Alexander Imbiricos:** I will say, you know, if you're like, "Hey, okay, but the hardest problem I have is that I need to build like a new unicorn business." Like, obviously, it's not going to work. Not yet. So I think it's like give it the hardest problem, but something that is still like one question, right? Or one task, to start. That's if you're testing, and then over time you can learn how to use it for like bigger things.
**Lenny Rachitsky:** 好,它支持什么编程语言?
**Lenny Rachitsky:** Yeah, what languages does it support?
**Alexander Imbiricos:** 我们训练 Codex 时支持的语言分布基本上跟这些语言在现实世界中的使用频率对齐。所以除非你在写非常冷门的语言或者某种内部私有语言,它应该都能应付你的语言。
**Alexander Imbiricos:** Basically, the way we've trained Codex is like there's a distribution of languages that we support, and it's like fairly aligned with like the frequency of these languages in the world. So, unless you're writing some like very esoteric language or like some private language, it should do fine in your language.
**Lenny Rachitsky:** 如果有人刚开始用,你能分享一个小技巧帮他们用好吗?就是那种你会悄悄告诉第一次用 Codex 的人的小建议,帮他们有一个好的体验?
**Lenny Rachitsky:** If someone was just getting started, is there a tip you could share to help them be successful? Like, if you could just whisper a little tip into someone just setting up Codex for the first time to help them have a really good time. What's something you'd whisper?
**Alexander Imbiricos:** 我会建议并行尝试几件事。给它一个难任务。让它了解一下你的代码库。围绕你的一个想法跟它制定一个计划,然后从那里慢慢推进。这里的元理念是——你在跟一个新队友建立信任。你不会直接跟一个新队友说"做这个,零上下文"。你会先确保他们了解代码库,然后对齐计划和方法,再让他们一步步去做。
**Alexander Imbiricos:** I might say try a few things in parallel. Right? So, you could try giving it a hard task. Maybe ask it to understand the code base. Formulate a plan with it around an idea that you have, and kind of build your way up from there. And like, sort of the meta idea here is — it's again, it's like you're building trust with a new teammate. Right? And so, like, you wouldn't go to a new teammate and just give them like, "Hey, do this thing. Here's zero context." You would start by like first making sure they understand the code base, and then you would like maybe align on a plan and approach, and then you would have them go off and do it bit by bit.
如果你按这种方式使用 Codex,你会自然地理解各种 prompt 的技巧,因为它是一个非常强大的智能体和模型,但 prompt Codex 跟 prompt 其他模型有点不同。
Right? And I think if you use Codex in that way, you'll just sort of naturally start to understand like the different ways of prompting it because it is a super powerful agent and model, but it is a little bit different to prompt Codex than other models.
**Lenny Rachitsky:** 再问几个问题。我们稍微提过了——随着 AI 越来越多地写代码,总有人问要不要学编程、该把时间花在什么上面。对于正在思考职业发展的人,特别是对软件工程、计算机科学感兴趣的人,你觉得计算机科学中有哪些具体方面越来越重要,应该深入?有什么可以不用太担心的?随着编码智能体在工作中越来越普及,应该在哪些技能上加大投入?
**Lenny Rachitsky:** Just a couple more questions. One, we touched on this a little bit. As AI does more and more coding, there's always this question of should I learn to code? And what should I spend time doing this sort of thing? For people that are trying to figure out what to do with their career, especially if they're into software engineering, computer science, do you think there are specific elements of computer science that are more and more important to lean into? Maybe things they don't need to worry about. Like, what do you think people should be leaning into skill-wise as this becomes more and more of a thing in our workplace?
**Alexander Imbiricos:** 这可以从几个角度来看。最直接的一个是——做一个实干的人。随着编码智能体越来越强大,即使你还是在校生或应届生,你能做的事情比以前多太多了。所以你应该充分利用这一点。
**Alexander Imbiricos:** I think there's like a couple angles you could go at this from. Well, the easiest one to think of at least is just like be a doer of things. I think that, you know, with coding agents getting better and better over time, it's just what you can do as even like someone in college or a new grad is just so much more than what that was before. And so I think you just want to be taking advantage of that.
我在招聘早期职业的人才时确实会考虑——他们用最新工具的生产力怎么样?他们应该是非常高产的。从这个角度看,相比资深人士,他们的劣势其实在缩小,因为有了这些强大的编码智能体。所以第一条建议就是——学你想学的东西,但确保花时间去做事情,而不只是完成作业。
And definitely when I'm looking at like hiring folks who are earlier career, it's definitely something that I think about — how productive are they using the latest tools? Right? They should be like super productive. And if you think of it in that way, they actually have like less of a handicap than before versus a more senior career person, because you know, the divide is actually getting smaller because they've got these amazing coding agents now. So that's one thing, which is just — I guess the advice is just like learn about whatever you want, but just make sure you spend time doing things, not just like fulfilling homework assignments.
另一面是,深入理解什么构成一个好的整体软件系统仍然非常值得。强大的系统工程能力,或者高效的团队沟通和协作能力——这些技能仍然重要,在相当长一段时间内都会继续重要。AI 编码智能体不会突然就能在没有你帮助的情况下构建完美的系统。过程会更渐进——编码智能体能验证自己的工作了。
I think the other side of it, though, is that it's still deeply worth understanding like what makes a good overall software system. So I still think that like skills like really strong systems engineering skills, or even like really effective communication and collaboration with your team — skills like that are important. They're going to continue to matter for quite some time. Like, I don't think it's going to be like all of a sudden the AI coding agents are just able to build like perfect systems without your help. I think it's going to look much more gradual, where it's like — "Okay, we have these AI coding agents, they're able to validate their work."
但仍然很重要的是——拿 Atlas 的一个工程师来说,他配置 Codex 让它能自己验证工作,这有点不容易,因为 Atlas 产品的特殊性。他的做法是提示 Codex:"嘿,你为什么不能验证自己的工作?修好它。"然后循环执行这个过程。所以在各个阶段你仍然需要人类参与来配置编码智能体使其有效工作。你仍然需要具备这种推理能力。也许打字速度有多快、精确记住怎么写一个 for-each 循环不那么重要了——但你需要能够推理不同的系统以及什么让一个高效的软件工程团队高效。
It's still important. Thinking of an engineer who was working on Atlas, since we were talking about it: he set up Codex so that it can verify its own work, which is a little bit non-trivial because of the nature of the Atlas product. The way he did that was he actually prompted Codex, "Hey, why can't you verify your work? Fix it," and did that on a loop. So at various phases you're still going to want a human in the loop to help configure the coding agent to be effective, and you still want to be able to reason about that. So maybe it's less important that you can type really fast or that you know exactly how to write a for-each loop (not that anyone writes a for-each loop or something, right?). But you need to be able to reason about the different systems and about what makes a software engineering team effective.
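As a rough illustration of the loop described above, the Python sketch below keeps re-sending the same prompt until the agent reports that it can verify its own work. The names `make_self_verifying`, `can_verify`, and `prompt_agent` are hypothetical stand-ins, not part of Codex or any real API, and the conversation does not say how the Atlas engineer actually wired this up.

```python
from typing import Callable, Optional

# The prompt quoted in the conversation.
PROMPT = "Why can't you verify your work? Fix it."


def make_self_verifying(
    can_verify: Callable[[], bool],       # hypothetical: does verification work yet?
    prompt_agent: Callable[[str], None],  # hypothetical: send one prompt to the agent
    max_rounds: int = 5,
) -> Optional[int]:
    """Re-prompt the agent until it can verify its own work.

    Returns the number of prompts that were sent before verification
    succeeded, or None if max_rounds prompts were not enough.
    """
    for prompts_sent in range(max_rounds + 1):
        if can_verify():
            return prompts_sent
        # Only prompt again if there is budget left for another check.
        if prompts_sent < max_rounds:
            prompt_agent(PROMPT)
    return None
```

The cap on rounds is a design choice in this sketch: it is the point where a human would step back in, which matches the idea of keeping a person in the loop while the agent is being configured.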
最后一个角度——如果你在某个领域处于知识前沿,我觉得这仍然非常值得深入。一方面是因为智能体在这些前沿领域还没那么强;另一方面是因为当你尝试推进某个领域的前沿时,你实际上会被迫利用编码智能体来加速自己的工作流程。
And then like maybe the last angle that you could take is — I think if you're on the frontier of knowledge for a given thing, I still think that's like deeply interesting to go down. Partially because that knowledge is still going to be — you know, agents aren't going to be as good at that, but also partially because I think that by trying to advance the frontier of a specific thing, you'll actually like end up being forced to take advantage of coding agents and like using them to accelerate your own workflows as you go.
**Lenny Rachitsky:** 你说的在前沿领域能举个例子吗?
**Lenny Rachitsky:** What's an example when you talk about being at the frontier of something?
**Alexander Imbiricos:** Codex 写了大量管理自身训练运行的代码和关键基础设施。我们推进得很快,所以我们有一个 Codex 代码审查流程来捕捉大量错误。它确实捕捉到了一些很有意思的配置错误。我们开始看到未来的端倪——Codex 甚至开始为自己的训练过程值班(on call),这很有意思。
**Alexander Imbiricos:** Codex writes a lot of the code that helps manage its training runs and the key infrastructure. We move pretty fast, so we have a Codex code review that is catching a lot of mistakes. It's actually caught some pretty interesting configuration mistakes. And we're starting to see glimpses of the future, where we're actually starting to have Codex even be on call for its own training, which is pretty interesting.
**Lenny Rachitsky:** 等等,为自己的训练值班是什么意思?就是它在跑、在训练,然后"哦,出问题了,需要有人处理"——然后它自己报警还是自己修?
**Lenny Rachitsky:** Wait, what does that mean to be on call for its own training? So it's running, it's training, and it's like, "Oh, something broke. Someone needs —" and it does — does it alert people or it's like, "Here, I'm going to fix the problem and re-"
**Alexander Imbiricos:** 这还是一个早期想法,我们还在摸索。但基本理念是,在训练运行过程中,有一堆图表和指标,现在是人类在看,这很重要。我们管这叫"看孩子"(babysitting)。
**Alexander Imbiricos:** Yeah, this is an early idea that we're figuring out. But the basic idea is that during a training run, there are a bunch of graphs that humans are looking at today, and it's really important to look at those. We call this babysitting.
**Lenny Rachitsky:** 因为训练非常昂贵,而且需要快速推进。
**Lenny Rachitsky:** Because it's very expensive to train, I imagine, and very important to move fast.
**Alexander Imbiricos:** 没错。底层有很多系统在支撑训练运行。某个系统可能会挂掉,或者某个环节引入了错误。我们可能需要修复、暂停或者采取其他行动。所以让 Codex 在一个循环中持续评估这些图表随时间的变化——这就是我们关于如何大幅提升训练效率的一个想法。
**Alexander Imbiricos:** Exactly. And there are a lot of systems underlying the training run, so a system could go down, or an error could get introduced somewhere. We might need to fix it, or pause things, or, I don't know, there are lots of actions we might need to take. So basically having Codex run on a loop to evaluate how those charts are moving over time is the sort of idea we have for how to train way more efficiently.
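To make the "babysitting" idea concrete, here is a minimal Python sketch of one check such a loop might run: flag a training-loss value that jumps well above its recent trailing average, then map that to a suggested action. Everything here is an assumption for illustration (the function names, the window size, the threshold, and the `pause-and-alert` action string). The conversation does not describe which metrics or rules OpenAI actually uses.

```python
from statistics import mean, stdev


def find_anomalies(losses, window=5, threshold=3.0):
    """Return indices where the loss sits far above its trailing window.

    A point is flagged when it exceeds the trailing mean by more than
    `threshold` trailing standard deviations.
    """
    flagged = []
    for i in range(window, len(losses)):
        recent = losses[i - window:i]
        mu = mean(recent)
        # Guard against a perfectly flat window, where the deviation is 0.
        sigma = max(stdev(recent), 1e-8)
        if losses[i] > mu + threshold * sigma:
            flagged.append(i)
    return flagged


def babysit(losses, window=5, threshold=3.0):
    """Turn a metric history into a suggested action (hypothetical labels)."""
    anomalies = find_anomalies(losses, window, threshold)
    if not anomalies:
        return "ok"
    return f"pause-and-alert: loss spike at step {anomalies[0]}"
```

In a real setup this check would run on a schedule against live metrics, and the suggested action would go to whoever (or whatever agent) is on call, rather than being returned as a string.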
**Lenny Rachitsky:** 我喜欢这个。这完全符合——这就是智能体的未来。Codex 不只是用来写代码的,它远不止于此。
**Lenny Rachitsky:** I love that. This is very much along the lines of — this is the future of agents. Codex isn't just for building code, right? It's a lot more than that.
**Alexander Imbiricos:** 对。
**Alexander Imbiricos:** Yeah.
**Lenny Rachitsky:** 好的。最后一个问题。你在 OpenAI,我不能不问你的 AGI 时间线以及你觉得我们离 AGI 还有多远。我知道这不是你的工作内容,但各种观点和时间线很多。你觉得我们离人类水平的 AI 还有多远,不管那对你意味着什么?
**Lenny Rachitsky:** Okay. Last question. Being at OpenAI, I can't not ask about your AGI timeline and how far you think we are from AGI. I know this isn't what you work on, but there's a lot of opinions, a lot of timelines. How far do you think that we are from a human-level version of AI, whatever that means to you?
**Alexander Imbiricos:** 对我来说,关键是什么时候我们能看到加速曲线出现——什么时候能看到曲棍球棒形的增长?我觉得目前的限制因素——有很多——但一个被低估的限制因素是人类的打字速度,或者说人类在写 prompt 时的多任务处理速度。你也提到了——你可以让智能体监控你所有的工作,但如果智能体不能自己验证工作成果,你仍然被卡在"你能不能来审查所有那些代码"这个瓶颈上。
**Alexander Imbiricos:** For me, I think that it's a little bit about like when do we see the acceleration curves kind of go like — when do we see the hockey stick? And I think that the current limiting factor — I mean, there's many — but I think a current underappreciated limiting factor is like literally human typing speed or human multitasking speed on like writing prompts. Right? And like, you know, you were talking about it — it's like you can have an agent like watch all the work you're doing, but if you don't have the agent also validating its work, then you're still bottlenecked on like, can you go review all that code, right?
所以我的看法是,我们需要打通这些生产力循环——不能再让人类来 prompt 和手动评估所有工作。如果我们能重建系统让智能体默认就是有用的,就能开始解锁曲棍球棒式的增长。
So my view is that we need to unblock those productivity loops from like humans having to prompt and humans having to like manually evaluate all the work. And so if we can like rebuild systems to let the agent be default useful, we'll start unlocking hockey sticks.
但不幸的是,这不会是二元的。它会非常依赖于你在做什么。比如我能想象明年,如果你是一家创业公司在做全新的东西——某个新 App——你可以在一个智能体能基本自主运行的技术栈上搭建。但如果你在 SAP 工作,他们有很多复杂系统,不可能一夜之间就让智能体在这些系统中自主工作。他们需要慢慢替换或更新系统,让智能体能端到端地处理更多工作。
Unfortunately, I don't think that's going to be binary. I think it's going to be very dependent on what you're building, right? So like I would imagine that like next year, if you're a startup and you're building new pieces — like some new app or something — it'll be possible for you to set it up on a stack where agents are much more self-sufficient than not, right? But now, let's say — I don't know, you mentioned SAP, right? Let's say you work in SAP. They have many complex systems and they're not going to be able to just like get the agent to be self-sufficient overnight in those systems. So they're going to have to slowly like maybe replace systems or update systems to allow the agent to like handle more of the work end to end.
所以对你问题的(也许有点无聊的)长答案是:我觉得从明年开始,我们会看到早期采用者的生产力开始出现曲棍球棒式的增长。然后在接下来的几年里,越来越大的公司也会出现这种增长。在那个模糊的中间地带的某个时刻,这种加速会反馈到 AI 实验室自身,那就是我们基本达到 AGI 级别的时候。
And so basically my sort of long answer to your question — maybe boring answer — is that I think starting next year, we're going to see early adopters like starting to hockey stick their productivity. And then over the years that follow, we're going to see larger and larger companies hockey stick their productivity. And then somewhere in that fuzzy middle is like when that hockey sticking will be flowing back into the AI labs and that's when we'll basically be at the AGI tier.
**Lenny Rachitsky:** 我很喜欢这个回答。非常务实,而且这个话题在播客里经常出现——审查 AI 做的所有事情的时间成本很烦人,也是一个大瓶颈。我很高兴你在做这件事,因为让编码变高效是一回事,解决最后那一步——"这真的好吗?"——又是另一回事。你觉得那是限制因素,这很有意思。它呼应了你前面说的——即使 AI 不再进步,我们学会更有效地使用它就还有巨大的潜力可以释放。
**Lenny Rachitsky:** I love this answer. It's very practical and it's something that comes up a lot on this podcast, just like the time to review all the things AI is doing is really annoying and a big bottleneck. I love that you're working on this because it's one thing to just make coding much more efficient and do that for people, it's another to take care of that final step of — okay, is this actually great? And that's so interesting that your sense is that's the limiting factor. It comes back to your earlier point of even if AI did not advance anymore, we have so much more potential to unlock as we learn to use it more effectively.
这真的是一个独特的回答。我从没听过这个视角——人类的打字速度才是最大的瓶颈,本质上是审查 AI 为我们做的事情的速度。
So that is a really unique answer. I've never heard that perspective on what is the big unlock — human typing speed to review basically what AI is doing for us.
**Alexander Imbiricos:** 嗯。
**Alexander Imbiricos:** Mhm.
**Lenny Rachitsky:** 太好了。好的,Alexander,我们覆盖了很多内容。还有什么我们没聊到的吗?在进入精彩的闪电轮之前,有什么你想分享或者强调的吗?
**Lenny Rachitsky:** So good. Okay. Alexander, we covered a lot of ground. Is there anything that we haven't covered? Is there anything you wanted to share, maybe double down on before we get to our very exciting lightning round?
**Alexander Imbiricos:** 有一件事——Codex 团队在扩招。就像我刚说的,我们仍然在一定程度上受限于人类的思考速度和打字速度。我们在努力解决这个问题。所以如果你是工程师、销售人员,或者——我正在招一个产品人——请联系我们。我不确定最好的联系方式是什么,但你可以去招聘页面看看。或者听众可以联系你吗?
**Alexander Imbiricos:** I think one thing is that the Codex team is growing. And as I was just saying, we're still somewhat limited by human thinking speed and human typing speed. We're working on it. So if you're an engineer or a salesperson, or a product person (I'm hiring for one), please hit us up. I'm not sure of the best way to give contact info, but I guess you can go to a jobs page. Or, actually, do listeners have a way to contact you?
**Lenny Rachitsky:** 在大家给我发"嘿,我想申请 Codex"之前——我确实有一个联系表单在 lennyrachitsky.com 上。我有点怕所有优秀的人涌过来——好吧,可以联系我,我们试试看。
**Lenny Rachitsky:** Before people start sending me, "Hey, I want to apply to Codex": I do have a contact form at lennyrachitsky.com. I'm a little afraid of all the amazing people who'd come through it, but okay, ping me. We could try that. Let's see how that goes.
**Alexander Imbiricos:** 或者直接给我们发私信也行。我在 Twitter 上是 imbrico。如果你有兴趣加入团队,可以联系我。
**Alexander Imbiricos:** Yeah, or I would just say you can drop us a DM. For example, I'm at imbrico on Twitter. Hit me up if you're interested in joining the team.
**Lenny Rachitsky:** 对很多人来说这是梦寐以求的工作。有什么筛选标准吗,这样不至于把你的收件箱挤爆?
**Lenny Rachitsky:** What a dream job for so many people. What's a way to filter people a little bit so they're not flooding your inbox?
**Alexander Imbiricos:** 如果你想加入 Codex 团队,你需要是一个使用这些工具的技术人才。我建议你问自己一个问题——假如我加入 OpenAI 做 Codex,在接下来 6 个月里全力以赴做出成绩,那时候一个软件工程师的工作日常会是什么样的?如果你对这个问题有自己的看法,你就应该申请。如果你没有想法需要先想想——取决于你需要想多久——这大概就是筛选标准了。很多人都在思考这个领域,我们对那些已经在思考智能体未来应该是什么样子的人非常感兴趣。我们不一定要对方向达成共识,但我们想要对这个话题充满热情的人。
**Alexander Imbiricos:** So specifically if you want to join the Codex team, then you need to be a technical person who uses these tools. And I think I would just ask yourself the question — hey, let's say I were to join OpenAI and work on Codex over the next 6 months, you know, and crush it, what does the life of a software engineer look like then? And I think if you have an opinion on that, you should apply. And if you don't have an opinion on that and have to think about it first, you know, depending on how long you have to think about it, I guess that would be the filter, right? Like I think there's a lot of people thinking about the space and so we're very interested in folks who sort of have already been thinking about like what the future should look like with agents. And like we don't have to agree on where we're going. But I think we want people who are very passionate about the topic.
**Lenny Rachitsky:** 能参与一个影响力这么大、处于可能性最前沿的产品,非常难得。对合适的人来说是一个很酷的角色。希望我们能找到合适的人,那就太棒了。好了,我们进入精彩的闪电轮。我有五个问题问你,Alexander,准备好了吗?
**Lenny Rachitsky:** It's very rare to be working on a product that has this much impact and is at such a bleeding edge of what's possible. It's a — what a cool role for the right person. So I hope we find someone, that would be incredible. With that, we've reached our very exciting lightning round. I've got five questions for you, Alexander. Are you ready?
**Alexander Imbiricos:** 我不知道是什么问题,但我很期待。来吧。
**Alexander Imbiricos:** I don't know what these are, but I'm excited. Let's do it.
**Lenny Rachitsky:** 每位嘉宾都会被问到同样的问题,除了最后一个。第一个问题,你最常推荐给别人的两三本书是什么?
**Lenny Rachitsky:** They're the same questions I ask everyone, except for the last one. Okay, first question: what are a couple of books that you recommend most to other people? Two or three books that come to mind.
**Alexander Imbiricos:** 我最近在大量读科幻小说。我相信这本应该有人推荐过——Culture 系列,作者是 Iain Banks。我喜欢它的原因是,它是相对晚近的关于 AI 未来的写作,而且是一个乐观的未来。很多科幻都偏反乌托邦,但这个——Culture 子版块上的梗是说它是一个"太空共产主义乌托邦",或者说是"gay 太空共产主义乌托邦"。我觉得用 Culture 作为一个思考框架很有意思——我们可以迎来什么样的世界,我们今天可以做什么决定来推动那个世界的到来。
**Alexander Imbiricos:** I have been reading a lot of science fiction recently. And I'm sure this has been recommended before, but the Culture series. I think Iain Banks is the name of the author. Part of why I love it is that it's relatively recent writing about a future with AI, but it's an optimistic future with AI. A lot of sci-fi is fairly dystopian. But the joke, at least on the Culture subreddit, is that it's a space communist utopia. Or, I think, a gay space communist utopia. And I just think it's really fun to use the Culture as a way to think about what kind of world we can usher in, and what decisions we can make today to help usher in that world.
**Lenny Rachitsky:** 我不记得有人推荐过这个。我知道你在读——你录播客前提到的——《指环王》。如果你想再看一本 AI 相关的科幻,你读过《深渊上的火》(A Fire Upon the Deep)吗?
**Lenny Rachitsky:** I don't think anyone's recommended that. I know you're reading — you mentioned before we started recording — Lord of the Rings right now. If you want another AI-ish sci-fi book, have you read A Fire Upon the Deep?
**Alexander Imbiricos:** 没有。
**Alexander Imbiricos:** No, I haven't.
**Lenny Rachitsky:** 好的。非常好看。是一部科幻太空歌剧式的史诗故事,讲的是超级智能。大部分不太乐观,但有一些乐观的成分。好,下一个问题。你最近有喜欢的电影或电视剧吗?
**Lenny Rachitsky:** Okay. It's incredibly good. It's like a sci-fi space opera sort of epic tale with super intelligence. Cool. Yeah, somewhat mostly not optimistic, but somewhat optimistic. Okay, next question. Is there a favorite recent movie or TV show that you've really enjoyed?
**Alexander Imbiricos:** 有一部动漫叫《咒术回战》(Jujutsu Kaisen),我很喜欢。题材稍微有点暗,跟恶灵有关。但我喜欢的是主角非常善良。我觉得现在有一股新潮流,动漫和动画片里的主角都是非常友好、关心世界的人。而一些较早的动漫,比如《新世纪福音战士》(Evangelion)或者《阿基拉》(Akira),那些主角都有严重的缺陷,非常不快乐。它们并没有开创这个类型,但当时有一段时间的趋势是讽刺这些动画片里的主角年纪很小却被赋予拯救世界的荒谬重任,所以出现了一波在剧情中让角色经历严重心理问题的作品。
**Alexander Imbiricos:** Yeah, there's an anime called Jujutsu Kaisen, which I really like. Again, it's got a slightly dark topic, demons. But what I love about it is that the hero is really nice. I think there's this new wave of anime and cartoons where the protagonists are really friendly people who care about the world. If you look at some older anime, like Evangelion or Akira, those protagonists are deeply flawed and quite unhappy. They didn't start the genre, but it was a trend for a while to poke fun at the idea that in these cartoons, the protagonist was very young but was being given a ridiculous amount of responsibility to save the world. So there was a wave of content critiquing this by making the character basically go through serious mental issues in the middle of the show.
我不是说现在的一定更好,但至少看这些非常正面的主角试图帮助身边所有人,确实很开心。
And I'm not saying this is better, but at least it's quite fun to have like these really positive protagonists just trying to help everyone around them.
**Lenny Rachitsky:** 通过你的推荐能了解到好多你的性格。善良的主角、乐观的未来。
**Lenny Rachitsky:** I love how much we're learning about your personality hearing these recommendations. Nice protagonists, optimistic futures.
**Alexander Imbiricos:** 我觉得如果你不相信它,你就不可能让它成为现实。你需要一个平衡。这是你的训练数据。
**Alexander Imbiricos:** I think you know, if you don't believe it, you can't will it into existence. You need a balance. This is your training data.
**Lenny Rachitsky:** 你最近有没有发现什么特别喜欢的产品?可以是 App、衣服、厨房小工具、科技产品、帽子什么的。
**Lenny Rachitsky:** Is there a product you recently discovered that you really love? Could be an app, could be some clothing, could be some kitchen gadget, tech gadget, a hat.
**Alexander Imbiricos:** 我一直挺喜欢内燃机和汽车的。其实我最初来美国是想做美国飞机相关的工作,但后来做了软件。所以很长一段时间我只开比较老的跑车——老是因为便宜。最近我们买了一辆 Tesla。我必须说 Tesla 的软件让我觉得很有启发。
**Alexander Imbiricos:** Yeah, so I have been like quite into combustion engines and cars. Actually, the reason I came to America initially was because I wanted to work on like US aircraft. But you know, now I work in software. And so for the longest time I've basically only had like quite old sports cars. Old just because they were more affordable. And then recently — we got a Tesla instead. And I have to say that I find the Tesla software like quite inspiring.
特别是它的自动驾驶功能。我今天多次提到,我觉得思考怎么构建混合主动性软件——让你作为人类感觉最大程度被赋能、最大程度在掌控中,同时又获得大量帮助——非常有意思。我觉得他们在让汽车自己驾驶的同时,给你各种不需要关闭自动驾驶就能调整的方式,做得非常好。你可以踩油门它会听从,你可以转旋钮调速度,你可以轻微转向。我觉得这其实是构建一个让人类保持控制权的智能体的教科书级案例。
In particular, it has like the self-driving feature and you know, I've mentioned a few times today like I think it's really interesting to think about how to build like mixed initiative software that makes you feel maximally empowered as a human. Maximally in control, but yet you're getting a lot of help. And I think they did a really good job with enabling sort of the car to drive itself, but all these different ways that you can adjust what it's doing without turning off the self-driving. So like you can accelerate and it'll like listen to that. You can turn a knob to change its speed. You can steer slightly. I think it's actually a master class in like building an agent that still leaves the human in control.
**Lenny Rachitsky:** 这让我想到 Nick Turley 的口头禅就是"我们是不是被最大程度地加速了"。
**Lenny Rachitsky:** This reminds me: Nick Turley's whole mantra was, "Are we maximally accelerated?"
**Alexander Imbiricos:** 对,这个理念已经完全渗透到 OpenAI 的方方面面了,这很合理。
**Alexander Imbiricos:** Yeah. It's completely infiltrated everything at OpenAI, which makes sense. That tracks.
**Lenny Rachitsky:** 还有两个问题。你有没有一个人生格言或者信条,工作和生活中经常会想起的?
**Lenny Rachitsky:** Two more questions. Do you have a life motto that you often think about, come back to in work or in life that's been helpful?
**Alexander Imbiricos:** 我不确定有没有一个人生格言,但也许我可以说说我创业公司排第一的公司价值观。到现在仍然对我影响很深,就是"善良且坦诚"(be kind and candid)。
**Alexander Imbiricos:** I don't know if I have a life motto, but maybe I can tell you about the number one company value from my startup. Which is still something that sticks with me, which is to be kind and candid.
**Lenny Rachitsky:** 这很符合你的风格。善良且坦诚。
**Lenny Rachitsky:** That tracks. Kind and candid. Wow.
**Alexander Imbiricos:** 对,我们之所以把这两个放在一起,是因为我们作为创始人意识到,我们经常只是在做"好人",但那并不是正确的做法。我们会拖延那些困难的对话,不够坦诚。所以每次提醒自己这个信条后,我们就会变得更坦诚。然后 6 个月后又发现,6 个月前其实仍然不够坦诚,需要更坦诚。那问题变成了怎么坦诚?我们的答案是把坦诚视为一种善意——不仅要做到,要驱使自己去做,还要在表达方式上让对方感受到善意。
**Alexander Imbiricos:** Yeah, and we had to put them together because as founders we realized that we often would be nice, and it wasn't actually the right thing to do. We would like delay the difficult conversations and we were not candid. And so every time we would like remind ourselves of this motto and then we would become more candid and then 6 months later we would realize that we were in fact not candid 6 months ago. And we needed to be even more candid. So then the question is like okay, how should we be candid? It's like okay, well let's think of being candid as an act of kindness, but also think of that both in terms of doing it and willing ourselves to do it, but also in terms of how we frame it to people.
**Lenny Rachitsky:** 这真是总结领导力的一种优美方式。那本关于"直接挑战但发自内心地关心"的书是什么来着——《极度坦诚》(Radical Candor)。对,所以这是另一种方式来理解 radical candor。
**Lenny Rachitsky:** That is a beautiful way of summarizing how to lead well. What's the book about challenging directly but caring deeply? Radical Candor. Oh yeah, right. So it's like another way of thinking about radical candor.
好的,最后一个问题。我查了一下你的姓——想了解一下这个姓的故事。你姓 Imbiricos。我跟 ChatGPT 聊了一下,它告诉我这个姓氏最知名的人是有影响力的希腊诗人兼精神分析学家 Andreas Imbiricos,以及他的亲戚——富有的航运巨头和艺术收藏家 George Imbiricos。所以问题是:你更认同哪一位?希腊诗人兼精神分析学家,还是富有的航运巨头和艺术收藏家?
Okay, last question. I was looking up your last name, just like hey, what's the story here? So your last name is Imbiricos. And I was talking to ChatGPT. And it told me the most famous individuals with the surname are the influential Greek poet and psychoanalyst Andreas Imbiricos. And his relative the wealthy shipping magnate and art collector George Imbiricos. So the question is which of these two do you most identify with? The Greek poet and psychoanalyst or the wealthy shipping magnate and art collector?
**Alexander Imbiricos:** 应该是诗人那位,因为他热爱我们家族的起源之岛。
**Alexander Imbiricos:** I think it's going to have to be the poet, because he loved the island that our family's from.
**Lenny Rachitsky:** 等等,你了解这些人?好吧,这对你来说不是新闻。
**Lenny Rachitsky:** Wait, you know these people? Okay, this is not news to you. Okay.
**Alexander Imbiricos:** 嗯,这个家族很大,但毕竟是希腊人嘛。你知道的,大家族里每个人都是你叔叔。我妈妈是马来西亚人,在马来西亚也是一样,所有人都是我叔叔阿姨。
**Alexander Imbiricos:** Well, I mean, it's an enormous family, but it's Greek. So you know, with these big families, everyone's your uncle. You know what I mean? My mother's Malaysian, and everyone is my uncle or aunt in Malaysia too, if that makes sense.
对。但他热爱我们家族起源的那个岛。那个航运巨头在哪儿我不太确定,好像在纽约吧。总之我们都来自一个叫 Andros 的岛。那是一个非常美丽的地方,牲畜比人多,游客也不太多。但我觉得他很酷的一点是他发表了很多作品,其中很多都是关于那个岛的美丽,我觉得这特别棒。
Yeah, but he loved this island that the family originated from. I don't actually know where that shipping magnate was, I think it was New York or something, but anyway, we all came from this island called Andros, which is a really beautiful place. There's more livestock there than humans, and not too many tourists go there. Part of what I think is really cool is that he published a lot, and a lot of his writing is about the beauty of that island, which I think is super cool.
**Lenny Rachitsky:** 哇,这个回答太好了。再问两个。大家在哪里可以找到你,如果想在网上关注你或者联系你的话?听众怎样才能对你有帮助?
**Lenny Rachitsky:** Wow, that was an amazing answer. Two more questions. Where can folks find you if they want to follow you online and maybe reach out, and then how can listeners be useful to you?
**Alexander Imbiricos:** 我是那种只为工作需要才用社交媒体的人。我的手机晚上 9 点会变成黑白模式。总之,Twitter 或者 X 上搜 imbrico 就能找到我。然后如果你在 r/Codex 上发帖我大概率会看到。
**Alexander Imbiricos:** I'm one of those people who has social media only for work. My phone turns black and white at like 9:00 p.m. But yeah, I'm at imbrico on Twitter, or X. And if you post in r/Codex, I'll probably see it, so you can go there.
**Lenny Rachitsky:** 听众怎样能帮到你?
**Lenny Rachitsky:** How can listeners be useful?
**Alexander Imbiricos:** 请试用 Codex。请分享反馈。告诉我们该改进什么。我们非常认真地对待反馈。虽然增长非常棒,但仍然是非常早期的阶段,我们仍然非常关注反馈,希望永远如此。另外如果你对编码智能体和智能体的未来感兴趣,请到我们的招聘页面申请,或者在社交媒体上私信我。
**Alexander Imbiricos:** I would say please try Codex. Please share feedback. Let us know what to improve. We pay a ton of attention to feedback. Honestly, the growth has been amazing, but it's still very early times, so we still pay a lot of attention and hope to do so forever. And if you're interested in working on the future of coding agents, and of agents generally, then please apply on our jobs site, or message me in those social media places.
**Lenny Rachitsky:** Alexander,这次对话太棒了。每次见到做 AI 的人我都很开心,因为 AI 总给人一种冰冷、可怕、神秘的感觉。但当你见到做这些工具的人,他们总是特别好。你尤其如此——从你分享的例子中就能感受到你的乐观和善良。这就是我们希望看到的——这样的人在做将会塑造未来的工具。非常感谢你来做这期节目,很荣幸认识你。
**Lenny Rachitsky:** Alexander, this was awesome. I always love meeting people working on AI, because it always feels like this very, I don't know, sterile, scary, mysterious thing. And then you meet the people building these tools and they're always just so awesome. And you especially are just so nice. Like the examples you shared, the optimism and kindness: these are the kinds of people we want building the tools that are going to drive the future. So I'm really thankful that you did this. Grateful to have met you, and thank you so much for being here.
**Alexander Imbiricos:** 非常感谢邀请。今天很开心。
**Alexander Imbiricos:** Thank you so much for having me. I had a great time today.
这些例子太精彩了。你们是真的在最前沿探路,而这就是其他公司未来的工作方式。再说一次,你们做出了登上 App Store 第一名的应用,全世界都在用,至少火了一整周。你说 28 天就做出来了,其中核心功能只用了 18 天、10 天就跑通了。
对,大概 18 天的时候我们就有了一个内部员工在玩的版本,然后 10 天之后就正式上线了。
你说只用了几个工程师,两三个人。好的。然后 Atlas 你说一周就搭出来了。
不不不,Atlas 不是一周就做完的。Atlas 其实是个非常重的项目。我跟 Atlas 团队的一个工程师聊过,问他们怎么用 Codex。他说我们什么都用 Codex 做。我说好吧,那你觉得效率提升了多少?他给我的回答是:以前需要两到三个工程师花两到三周,现在是一个工程师一周搞定。
你觉得未来这种事会不会扩展到非工程师?是不是一定得工程师来做?PM 或者设计师能做吗?
我觉得我们一定会走到那个阶段,就是边界会变得模糊。你肯定还是需要有人理解他在做的东西的细节,但"细节"的定义会不断演变。就像现在写 Swift 的人不需要懂汇编语言。全世界会写汇编的人寥寥无几,他们的存在很重要,但那是一种高度专业化的能力,大多数公司其实用不到。
所以我觉得我们会自然地看到抽象层不断增加。而且现在很酷的一点是,我们正在进入"自然语言"这一层抽象。自然语言本身非常灵活。工程师可以讨论一个计划,可以讨论一份技术规格,也可以讨论一个产品想法。所以我们也可以在这些抽象层之间不断向上攀升。
但这个过程会是渐进的。不会突然之间所有人都不写代码了,全靠写需求文档。我觉得更可能的路径是这样的:先把你的 coding agent 调到能很好地预览构建结果、跑测试,这可能是大多数人最先搞定的部分。然后再让它能执行构建、看到自己改动的效果。但可能还没有搭好集成框架,让它像 Atlas 那样能加载几个示例页面来验证效果。我觉得至少在一段时间内,还是需要人来维护这些连接器、系统和组件,告诉 agent 应该跟哪些东西交互。未来会有更大的突破——Codex 自己告诉你怎么配置,甚至自己在仓库里完成配置。
活在这个时代真是太疯狂了。我很好奇这种事情的二阶效应——构建速度这么快,会带来什么变化?是不是意味着分发能力变得更重要了?是不是意味着创意本身更值钱了?想想这些变化挺有意思的,你怎么看?
我仍然不觉得创意像很多人以为的那么值钱。我还是认为执行力非常难。你可以很快把东西做出来,但你还得把它做好。它必须是一个整体上连贯、合理的产品。而且分发确实非常关键。
对,感觉现在除了"造东西"以外的所有环节都变得更重要了——想出创意、推向市场、分发渠道,这些全都更关键了。
对。我觉得我们可能经历了一个奇怪的过渡期。有一段时间,构建产品太难了,所以你只需要擅长构建就行,可能不需要对某个特定客户有深刻的理解。
但现在我觉得情况变了。如果我只能选择理解一件事,我会选择真正深入地理解某一类客户面临的问题。如果我只能带着一项核心能力入场,这就是我的选择。我认为这最终才是最重要的。如果你今天要创业,而你对一群目前被 AI 工具服务不足的客户有很深的理解和人脉,那你就稳了。反过来,如果你很擅长做网站,但没有特定的客户群体,那处境就会困难得多。
所以你看好垂直 AI 创业公司,我理解对了。
对,我完全同意。有通用的工具能解决很多问题,但也有那种专注做好某一件事的公司——比如我们要把演示文稿这件事做到极致,我们要比任何人都更理解"做演示"这个问题,我们要嵌入你的工作流。对于非常具体的问题来说,这些才是真正重要的。
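**Lenny Rachitsky:** OK, awesome. On Codex's progress, I imagine you have a lot of evals, and there are all kinds of public benchmarks. What do you look at to tell you that you're moving fast in the right direction? I assume it's more than one metric, but what do you pay the most attention to? Is there a key KPI?
**Alexander Imbiricos:** One thing I constantly remind myself of is that with a tool like Codex, users naturally become power users. So it's easy to end up, without noticing, spending a lot of effort on deep features late in the user journey and over-optimizing that part. That's why I think it's really critical to look at your D7 retention, and also to go back through the new-user flow yourself, signing up and trying the product from scratch.
**Alexander Imbiricos:** I have several ChatGPT Pro accounts so that I can experience the product as much like a real user as possible. They're registered under my Gmail and bill me $200 a month. I need to expense those. Anyway, what it actually feels like to be a user, together with early retention data, is still really important to us. The category is clearly taking off, but we're still at a very early stage of adoption.
**Alexander Imbiricos:** Another thing we do, and I think we may be the team most obsessed with user feedback and social media in this space, is that several of us are basically on Reddit and Twitter all the time. There's praise there and there are a lot of complaints, but we take the complaints very seriously and go through them one by one. Because a coding agent can be used in so many different ways, it really does have all kinds of problems with specific behaviors.
**Alexander Imbiricos:** So we actually monitor the vibes on social media quite a lot. Twitter/X tends to be more hypey, while Reddit is more negative but more real. So these days I pay more and more attention to the Reddit threads where people discuss using Codex.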
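**Lenny Rachitsky:** That's important for people to know. Which subreddits do you check most? Is there an r/Codex?
**Alexander Imbiricos:** The algorithm is actually pretty good at surfacing relevant posts on its own. But r/Codex does exist.
**Lenny Rachitsky:** OK, noted, very interesting. So if someone @-mentions you on Twitter, you'll still see it, but the signal may not be as valuable as what's on Reddit.
**Alexander Imbiricos:** The thing with Twitter is that it's more like a one-on-one conversation, even when it happens in public. Reddit has a good voting mechanism, and most of the users are probably still not bots, though I can't be sure. You get good signal there about which issues matter and what other people think.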
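**Lenny Rachitsky:** On that note, I want to talk about Atlas. You launched Atlas, and I tweeted at the time that I'd tried it but didn't love the AI-only search experience. Sometimes I just want Google and don't want to wait for an AI to give me an answer. There was no way to switch at the time, so I tweeted something like, "I'm switching back, it's not a great experience." I feel like I may have upset a few PMs at OpenAI. Then I saw someone tweet, "We have this feature now." I assume it was already on the roadmap. It's probably a classic case of shipping first and then seeing how people use it.
**Lenny Rachitsky:** So I have two questions. First, is there anything worth sharing there? Second, why did you build a browser?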
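**Alexander Imbiricos:** I worked on Atlas for a while; I don't anymore. But let me give some background. The startup I worked on before was in screen sharing and pair programming, and then we joined OpenAI. The original idea was actually to build a contextually aware desktop assistant. I think that matters because having to hand-feed all your context to an assistant every time, and then figure out how to get it to help you, is really annoying. If it could just understand what you're working on, it could help you as much as possible.
**Alexander Imbiricos:** So I've actually always thought of Codex as a contextually aware assistant too, just with a different entry point, starting from coding tasks. But at least in my own thinking, and I can't speak for the whole product, the logic goes like this: a lot of work gets done in the browser. If we build a browser, we can understand your context in a much more native way. We don't have to hack around the uneven accessibility tree support in other desktop software, and we don't have to rely on screenshots, which are slow and unreliable. Instead, we can sit right in the rendering engine and extract whatever we need to help you.
**Alexander Imbiricos:** I also like to use a video game analogy. I don't know if you've played Halo. You walk up to an object, press X, and it automatically does the right thing. As a kid I read the manual for every game I bought. I remember thinking it was so cool the first time I read about the idea of a "contextual action." A contextual action depends on the system knowing what you're doing, having some context, and then being able to help you.
**Alexander Imbiricos:** This is critical. Imagine the world we're heading toward, where agents do thousands of things for you every day. If the only way to tell you about it is a push notification, and you get a thousand pushes a day with the AI saying, "Hey, I did this, do you like it?", that would be incredibly annoying. But take a different scenario, back in software engineering: say I'm looking at a monitoring dashboard and notice that a key metric has dropped. At that point the AI can go investigate the cause and then, right in the moment I'm looking at the dashboard, show me its analysis of the drop and a possible fix. That way I stay in flow while the agent handles more and more.
**Alexander Imbiricos:** So for me, one of the reasons building a browser is exciting is that we get much more context to decide what to help the user with. Users also get more control over what we can see. For example: "If you want us to work on something, open it in the AI browser; if you don't, use a different browser." The control and the boundaries are very clear. And we can also build a "mixed-initiative" user experience, where we surface contextual actions at the right moment instead of interrupting you at random.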
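**Lenny Rachitsky:** So that's the vision of Codex as a "super assistant": it doesn't just help you write code, it wants to help you across everything as a teammate and make you more powerful at work. That makes sense to me.
**Lenny Rachitsky:** On that note, what other common use cases does Codex have among non-engineers? We talked earlier about designers building prototypes and products. Are there any unexpected ways people use it?
**Alexander Imbiricos:** There are definitely a lot of unexpected uses, but the ones with real usage at scale today are still concentrated in coding-related or technically oriented areas, the ones that already have a mature ecosystem, or things like data analysis. I personally expect to see more non-coding use cases over time. But for now we're keeping the team very focused on coding, because there's still so much work to do.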
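**Lenny Rachitsky:** For someone who wants to try Codex, does it work with every kind of codebase? Which languages does it support? Say you work on SAP: can you use Codex for that? Where's the sweet spot, and where is it not good enough yet?
**Alexander Imbiricos:** I'm really glad you asked, because the best way to try Codex is to give it your hardest task right away. That's a bit different from some other coding agents. With some tools you might think, "Let me start with something easy and just have it write something to see how it goes." But the way we've positioned Codex is as a professional tool you can hand your hardest problems to. It can write high-quality code in your large, imperfect codebase.
**Alexander Imbiricos:** So if you're going to try Codex, I'd suggest using a real task and not simplifying it on purpose. A good scenario is a hard bug whose cause you don't know: have Codex help you track it down, or have it implement the fix.
**Lenny Rachitsky:** I love that answer. Just give it your hardest problem.
**Alexander Imbiricos:** I should add, though, that if your "hardest problem" is "build me a unicorn company," that obviously won't work. Not yet, anyway. So give it your hardest problem, but make it a specific problem, a specific task. That's my advice for trying it out; after that you can gradually learn to use it for bigger things.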
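**Lenny Rachitsky:** Which programming languages does it support?
**Alexander Imbiricos:** Basically, Codex was trained on a distribution of languages that roughly matches how often those languages are used in the real world. So unless you're using a really obscure language or a proprietary one, it should handle it pretty well.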
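**Lenny Rachitsky:** If someone is just getting started, is there one tip you can share to help them succeed? Say you could whisper one thing to someone who has just set up Codex to make their experience better. What would it be?
**Alexander Imbiricos:** I'd say try a few things in parallel: give it a hard task, ask it to understand your codebase, work out a plan with it around an idea you have, and build up step by step from there. The core idea is that it's like building trust with a new teammate. You wouldn't walk up to a new teammate and throw a task at them with "do this" and no context. You'd first make sure they understand the codebase, then align on the plan and the approach, and then have them work through it one step at a time.
**Alexander Imbiricos:** If you use Codex this way, you'll naturally pick up different ways of prompting it. It really is a very capable agent and model, but prompting Codex is a bit different from prompting other models.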
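**Lenny Rachitsky:** A few more questions. There's a topic we touched on briefly earlier. As AI takes on more and more of the coding, people keep asking: should I still learn to code? What should I spend my time on? For people planning their careers, especially those interested in software engineering and computer science, which areas of computer science do you think are becoming more important, and which ones are less worth worrying about? As AI plays a bigger role at work, what skills should people focus on building?
**Alexander Imbiricos:** I think there are a few angles. The most obvious one is: be a doer. As coding agents get better, even college students and new grads can do far more than they used to be able to. So make the most of that advantage.
**Alexander Imbiricos:** When I'm hiring early-career people, one thing I do think about is how productive they are with the latest tools. They should be very productive. Seen that way, young people are at much less of a disadvantage than they used to be, because the gap between them and senior people is narrowing; after all, they have these powerful coding agents. So the first piece of advice is: study what interests you, but make sure you spend time actually building things rather than just finishing assignments.
**Alexander Imbiricos:** On the other hand, deeply understanding what makes a good software system is still well worth it. I still think strong systems engineering skills, and even the ability to communicate and collaborate effectively with a team, matter and will keep mattering for a long time. AI coding agents aren't suddenly going to build perfect systems without your help. It'll be more gradual, along the lines of, "OK, we have these AI coding agents, and they can verify their own work."
**Alexander Imbiricos:** But the key point, going back to that engineer on Atlas, is that he set Codex up so it could verify its own work. That wasn't trivial, because of the particular nature of the Atlas product. What he did was prompt Codex over and over: "Hey, why can't you verify your own work? Fix it." And then he ran that in a loop. So at many stages you still need a person to help configure the coding agent so that it becomes effective, and you still need to be able to think about these problems. What may matter less is how fast you type or whether you know exactly how to write every kind of loop, but you do need to be able to reason about different systems and about what makes a software engineering team effective.
**Alexander Imbiricos:** The last angle is this: if you're at the frontier of knowledge in some field, I think that's still very much worth going deep on. Partly that's because agents won't be as good at frontier knowledge, and partly it's because when you try to push the boundary of a field, you're effectively forced to use coding agents to speed up your own workflow.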
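**Lenny Rachitsky:** Can you give an example of what you mean by "at the frontier"?
**Alexander Imbiricos:** Codex itself writes a lot of the code that manages our training runs, including critical infrastructure. We iterate quickly, so we have a Codex code review step that catches a lot of mistakes. It has actually found some pretty interesting configuration errors. And we're starting to see glimpses of the future: we're working on having Codex be on call for its own training, which is really interesting.
**Lenny Rachitsky:** Wait, what does being on call for its own training mean? It's in the middle of training, something breaks, someone needs to deal with it, and it handles that itself? Does it alert a person, or does it say, "I'll fix this and keep going"?
**Alexander Imbiricos:** Yeah, this is still an early idea that we're exploring. The basic thought is that during a training run there are a lot of charts that, today, humans watch, and watching those charts is really important. Internally we call it "babysitting."
**Lenny Rachitsky:** Because training is expensive and has to move fast.
**Alexander Imbiricos:** Exactly. There are a lot of underlying systems behind a training run, and any one of them can fail, or a bug can get introduced somewhere. We might need to fix it, pause the run, or take any number of other actions. So the basic idea is to have Codex run in a loop, continuously evaluating how those charts are trending over time. That's how we imagine making training much more efficient.
**Lenny Rachitsky:** That's awesome. It fits exactly with what we discussed earlier: this is the future of agents. Codex isn't just for writing code; it can do far more than that.
**Alexander Imbiricos:** Yeah.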
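**Lenny Rachitsky:** OK, last question. You're at OpenAI, so I can't not ask about AGI timelines. How far do you think we are from AGI? I know this isn't what you work on directly, but there are all sorts of opinions and timelines out there. How far do you think we are from human-level AI, however you define it?
**Alexander Imbiricos:** For me, the question is when we see the inflection point in the acceleration curve, when we see the hockey stick. I think there are many limiting factors right now, but an underrated one is actually human typing speed, or how fast humans can multitask while writing prompts. You mentioned this earlier too: you can have an agent watching all of your work, but if the agent can't verify its own output, you're still bottlenecked on whether you can review all of that code.
**Alexander Imbiricos:** So my view is that we need to unblock these productivity loops so that humans don't have to personally prompt and manually evaluate all of the work. If we can rebuild our systems so that agents are useful by default, we'll start to see the hockey stick.
**Alexander Imbiricos:** Unfortunately, it won't be a binary change; it will depend heavily on what you're building. Next year, for instance, if you're a startup building something new, say a new app, you can absolutely start on a stack where agents can operate with a high degree of autonomy. But if you're a company like SAP, which you mentioned, with a lot of complex systems, you can't get agents operating autonomously inside those systems overnight. Companies like that will need to replace or upgrade systems gradually so that agents can handle more of the work end to end.
**Alexander Imbiricos:** So my detailed and possibly somewhat boring answer to your question is this: I think that starting next year, we'll see early adopters' productivity begin to hockey-stick. Over the years after that, larger and larger companies will start to hockey-stick as well. And at some point in that fuzzy transition period, that hockey stick in productivity will flow back into the AI labs themselves, and that's basically when we'll be at AGI.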
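**Lenny Rachitsky:** I love that answer. It's very pragmatic, and it's something that comes up a lot on this podcast: reviewing AI's work is really annoying and a huge bottleneck. I'm glad you're working on it, because making coding faster is one thing, but nailing that last step, whether the thing is actually good, is another. It's really interesting that you see that as the limiting factor. It also echoes what you said earlier: even if AI stopped improving, there's still a huge amount of potential to unlock just by learning to use the AI we already have more effectively.
**Lenny Rachitsky:** That's a really unique take. I've never heard anyone frame the biggest bottleneck this way before: human typing speed and the speed at which we can review AI's work.
**Alexander Imbiricos:** Mm-hmm.
**Lenny Rachitsky:** Great. OK, Alexander, we've covered a lot today. Is there anything we haven't touched on? Anything you want to add or emphasize before we get to our very exciting lightning round?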
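**Alexander Imbiricos:** One thing is that the Codex team is growing. As I just said, we're still somewhat limited by human thinking speed and typing speed, but we're working on it. So if you're an engineer, a salesperson, or a product manager (which I'm hiring for), please get in touch. I'm not sure what the best way to reach us is, but you can look at our careers page. Or do they have a way to reach you? How do your listeners contact you?
**Lenny Rachitsky:** Before people start messaging me with "I want to apply to Codex": I do have a contact form at lennyrachitsky.com. I'm a little afraid of how many great people are out there, but OK, sure, contact me and we'll see what happens.
**Alexander Imbiricos:** Yeah, or you can just DM us. For example, my Twitter handle is imbrico, so if you're interested in joining the team, reach out to me there.
**Lenny Rachitsky:** For a lot of people this is a dream job. How do you do an initial screen so your inbox doesn't get flooded?
**Alexander Imbiricos:** Specifically, if you want to join the Codex team, you need to be a technical person who uses these tools. I think you can ask yourself one question: suppose I join OpenAI to work on Codex and go all in for the next six months; what will a software engineer's day-to-day look like by then? If you have your own take on that question, you should apply. If you don't have a take yet and need to think about it first, well, it depends on how long you need to think. That's the filter. A lot of people are thinking about this space, so we're very interested in people who are already thinking about what the future of agents should look like. We don't need to agree completely on direction, but we want people who are passionate about the topic.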
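**Lenny Rachitsky:** It's very rare to get to work on a product that has this much impact and sits at the cutting edge of technology. For the right person, what a cool role. Hopefully we can help you find the right people; that would be great. OK, with that, we've reached our very exciting lightning round. Alexander, I have five questions for you. Are you ready?
**Alexander Imbiricos:** I don't know what the questions are, but I'm excited. Let's go.
**Lenny Rachitsky:** Apart from the last one, these are the same questions I ask every guest. First question: what are two or three books you find yourself recommending most to other people?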
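**Alexander Imbiricos:** I've been reading a lot of science fiction lately. Someone may have recommended this before, but I'd say the Culture series by Iain Banks. One reason I like it is that it's relatively recent, it's about a future with AI, and it's an optimistic AI future. A lot of sci-fi leans dystopian, but this doesn't. There's a joke on the Culture subreddit that it's a "space communist utopia," or, more precisely, a "gay space communist utopia." I think it's really interesting to use the Culture to think about what kind of world we could create and what decisions we can make today to help bring that world about.
**Lenny Rachitsky:** I don't think anyone has recommended that series yet. I know you said before we started recording that you're reading The Lord of the Rings. If you want another AI-related sci-fi novel, have you read A Fire Upon the Deep?
**Alexander Imbiricos:** No.
**Lenny Rachitsky:** OK, it's really good. It's a sci-fi space opera that involves superintelligence. Very cool. It's not optimistic overall, but it has some optimistic parts. OK, next question. Is there a movie or TV show you've really enjoyed recently?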
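**Alexander Imbiricos:** Yes, an anime called Jujutsu Kaisen that I really love. The subject matter is on the dark side, and it deals with evil spirits. But what I like is that the protagonist is genuinely kind. I think there's a new wave of anime and animation where the protagonists are really friendly people who care about the world, as opposed to some of the earlier classics like Neon Genesis Evangelion or Akira, where the protagonists are deeply flawed and very unhappy. I'm not saying those works created the genre, but for a while the trend was to satirize how, in this kind of animation, very young protagonists get handed the absurd responsibility of saving the world. So there was a wave of shows that reflected on that by putting their characters through serious psychological crises in the story.
**Alexander Imbiricos:** I'm not saying today's shows are better, but it's at least really interesting to see these very positive protagonists trying to help the people around them.
**Lenny Rachitsky:** Your recommendations really say a lot about your personality. Kind protagonists, optimistic futures.
**Alexander Imbiricos:** If you don't believe in that kind of future, you can't make it real. You need that balance. This is your training data.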
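**Lenny Rachitsky:** Is there a product you've discovered recently that you really love? It could be an app, clothing, a kitchen gadget, electronics, a hat, anything.
**Alexander Imbiricos:** I've always been really into combustion engines and cars. I actually first came to the U.S. because I wanted to work on American aircraft programs. But then I ended up in software. So for a long time I drove fairly old sports cars, old mainly because they were cheaper. Recently we bought a Tesla. And I have to say, Tesla's software has been quite inspiring to me.
**Alexander Imbiricos:** In particular the self-driving feature. I've said this several times today, but I think it's really interesting to think about how to build "mixed-initiative" software: software that makes you, the human, feel maximally empowered and in control while still giving you a lot of help. I think Tesla does this really well. The car can drive itself, but there are many ways to adjust its behavior without turning self-driving off. You can press the accelerator and it responds; you can turn a dial to adjust the speed; you can steer slightly. I think it's really a masterclass in how to build an agent that still keeps the human in control.
**Lenny Rachitsky:** That reminds me of Nick Turley's catchphrase: "Are we maximally accelerated?"
**Alexander Imbiricos:** Yeah, that idea has seeped into every part of OpenAI. It makes sense.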
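**Lenny Rachitsky:** Two more questions. Do you have a life motto that you often think about and keep coming back to, in work or in life?
**Alexander Imbiricos:** I'm not sure I have a life motto, but maybe I can share the first core value from my startup. It has stuck with me: "be kind and candid."
**Lenny Rachitsky:** That fits you. Kind and candid.
**Alexander Imbiricos:** Yeah, we had to put those two words together, because as founders we found that we were often just being "nice," and that wasn't the right thing to do. We would put off the hard conversations and not be candid enough. So each time we'd remind ourselves of the value and get more candid, and then six months later we'd look back and realize that six months earlier we still hadn't been candid enough and needed to go further. So the question became: what does it mean to be candid? The answer is to treat candor as an act of kindness: you need the courage to be candid, and you also need to deliver it in a way that lets the other person feel the kindness.
**Lenny Rachitsky:** That's a really sharp summary of how to lead well. What's that book called again, the one about "challenging directly but caring personally"? Radical Candor. Right, so this is really another way of putting Radical Candor.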
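**Lenny Rachitsky:** OK, last question. I looked up your last name because I wanted to know the story behind it. Your last name is Imbiricos. I asked ChatGPT about it, and it told me the most famous people with that name are the Greek poet and psychoanalyst Andreas Imbiricos and his relative George Imbiricos, a wealthy shipping magnate and art collector. So here's the question: which one do you identify with more, the Greek poet and psychoanalyst or the wealthy shipping magnate and art collector?
**Alexander Imbiricos:** I'd have to pick the poet, because he loved the island our family comes from.
**Lenny Rachitsky:** Wait, you know these people? OK, so this isn't news to you. OK.
**Alexander Imbiricos:** I mean, it's a big family, but it's Greek, after all. You know how it is: in a big family like that, everyone is your uncle. My mom is Malaysian, and it's the same in Malaysia; everyone is my uncle or auntie.
**Alexander Imbiricos:** Right. But he loved the island our family originally comes from. I'm actually not sure where the shipping magnate was based, New York I think, but never mind. Anyway, we all come from an island called Andros. It's a really beautiful place, with more livestock than people and not many tourists. What I think is especially cool about him is that he published a lot, and much of it was about the beauty of that island, which I think is wonderful.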
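**Lenny Rachitsky:** What a great answer. Two final questions. Where can people find you if they want to follow you or reach out? And how can listeners be useful to you?
**Alexander Imbiricos:** I'm one of those people who only uses social media for work. My phone automatically switches to black-and-white at 9 p.m. every night. But my handle on Twitter/X is imbrico. And if you post on r/Codex, I'll most likely see it. So you can go there.
**Lenny Rachitsky:** How can listeners be useful to you?
**Alexander Imbiricos:** Please try Codex, please share feedback, and let us know what needs to get better. We take feedback very, very seriously. Honestly, even though growth has been great, it's still very early days. So we still take every piece of feedback seriously, and I hope we can keep doing that. And if you're interested in coding agents and the future of agents, please apply through our careers page or reach out to me on those social channels.
**Lenny Rachitsky:** Alexander, this was a fantastic conversation. I always love talking to people who build AI, because AI so often comes across as sterile, scary, and mysterious. But when you actually get to know the people building these tools, they're always so nice. You especially: your optimism and kindness come through in the examples you share. That's who we want to be, and those are the kind of people we want building the tools that will drive the future. Thank you so much for doing this. It was great to meet you. Thank you.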
谢谢你的邀请,聊得很开心。
**Alexander Imbiricos:** Yeah, thanks so much for having me. This is fun.