**Kevin Weil:** 你现在用的 AI 模型,是你这辈子用过的最差的 AI 模型。当你真正把这件事想明白的时候,感觉挺疯狂的。我之前工作过的每个地方,你大概都知道自己在什么技术基础上做开发,但 AI 完全不是这样。每隔两个月,计算机就能做到之前从来做不到的事,你必须完全换一种思路来想你在做的东西。
**Kevin Weil:** The AI model that you're using today is the worst AI model you will ever use for the rest of your life, and when you actually get that in your head, it's kind of wild. Everywhere I've ever worked before this, you kind of know what technology you're building on, but that's not true at all with AI. Every two months, computers can do something they've never been able to do before, and you need to completely think differently about what you're doing.
**Lenny:** 你是一家可能是当今世界上最重要的公司的首席产品官。我想聊聊,身处这场风暴的中心是什么感觉。
**Lenny:** You're chief product officer of maybe the most important company in the world right now. I want to chat about what it's just like to be inside the center of the storm.
**Kevin Weil:** 我们的基本心态是:两个月后会有更好的模型,它会碾压现在这些局限性。我们也这样跟开发者说——如果你正在做的产品刚好踩在模型能力的边界上,继续做下去,因为你方向是对的。再给它几个月,模型就会变得很强,你那个勉强能用的产品到时候就真的会起飞了。
**Kevin Weil:** Our general mindset is in two months, there's going to be a better model and it's going to blow away whatever the current set of limitations are. And we say this to developers too. If you're building and the product that you're building is kind of right on the edge of the capabilities of the models, keep going because you're doing something right. Give it another couple months and the models are going to be great, and suddenly the product that you have that just barely worked is really going to sing.
**Lenny:** 众所周知,你在 Facebook 领导了一个叫 Libra 的项目。
**Lenny:** Famously, you led this project at Facebook called Libra.
**Kevin Weil:** Libra 可能是我职业生涯中最大的遗憾。这个东西今天不存在于世界上,让我发自内心地失望,因为如果我们能把这个产品发出去,世界会变得更好。我们试图发布一条新的区块链(blockchain)。最初是一篮子货币。它会整合进 WhatsApp 和 Messenger。我能在 WhatsApp 上免费给你转 50 美分。这个东西应该存在。说实话,现在的政府对加密货币很友好。Facebook 的声誉也今非昔比了。也许他们现在应该去做这件事。
**Kevin Weil:** Libra is probably the biggest disappointment of my career. It fundamentally disappoints me that this doesn't exist in the world today because the world would be a better place if we'd been able to ship that product. We tried to launch a new blockchain. It was a basket of currencies originally. It was integrated into WhatsApp and Messenger. I would be able to send you 50 cents in WhatsApp for free. It should exist. To be honest, the current administration is super friendly to crypto. Facebook's reputation is in a very different place. Maybe they should go build it now.
**Lenny:** 今天我的嘉宾是 Kevin Weil。Kevin 是 OpenAI 的首席产品官(Chief Product Officer),而 OpenAI 可能是当今世界上最重要、影响力最大的公司,站在 AI、AGI,也许未来某天是超级智能(super intelligence)的最前沿。他曾任 Instagram 和 Twitter 的产品负责人,是 Facebook Libra 加密货币的联合创造者——我们会聊到这件事。他还担任 Planet、Strava、Black Product Managers Network 和 Nature Conservancy 的董事。他人也特别好,有非常多智慧可以分享。我们会聊 OpenAI 是怎么运作的、AI 对我们工作和做产品的影响、AI 生态中哪些市场是 OpenAI 这样的公司不太可能去做的——因此适合创业者去占据。还有为什么学会写 eval 正在迅速成为产品构建者的核心技能、AI 时代什么能力最重要、他在教孩子关注什么,等等。这是一期很特别的节目,我非常兴奋能呈现给你们。如果你喜欢这个播客,别忘了在你最常用的播客 app 或 YouTube 上订阅和关注。如果你成为我 newsletter 的年度订阅者,你能免费获得一年的 Perplexity Pro、Linear、Notion、Superhuman 和 Granola。去 lennysnewsletter.com 看看,点 bundle。下面有请 Kevin Weil。
**Lenny:** Today my guest is Kevin Weil. Kevin is chief product officer at OpenAI, which is maybe the most important and most impactful company in the world right now, being at the forefront of AI and AGI and maybe someday superintelligence. He was previously head of product at Instagram and Twitter. He was co-creator of the Libra cryptocurrency at Facebook, which we chat about. He's also on the boards of Planet and Strava and the Black Product Managers Network and the Nature Conservancy. He's also just a really good guy, and he has so much wisdom to share. We chat about how OpenAI operates, the implications of AI for how we will all work and build product, which markets within the AI ecosystem companies like OpenAI won't likely go after and thus are good places for startups to own, why learning the craft of writing evals is quickly becoming a core skill for product builders, what skills will matter most in an AI era, what he's teaching his kids to focus on, and so much more. This is a very special episode and I am so excited to bring it to you. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube. If you become an annual subscriber of my newsletter, you get a year free of Perplexity Pro, Linear, Notion, Superhuman, and Granola. Check it out at lennysnewsletter.com and click bundle. With that, I bring you Kevin Weil.
**Lenny:** Kevin,非常感谢你来,欢迎来到播客。
**Lenny:** Kevin, thank you so much for being here and welcome to the podcast.
**Kevin Weil:** 非常感谢你邀请我。我们说了好久要做这件事,终于做成了。
**Kevin Weil:** Thank you so much for having me. We've been talking about doing this forever and we made it happen. We did it.
**Lenny:** 成了!我无法想象你的生活有多疯狂,所以我真的很感谢你抽出了时间。而且我们录制这期的这一周,你们刚好发布了新的图像模型,这是个开心的巧合。我整个社交信息流里全是大家把自己的生活照和家庭照做成吉卜力风格的图片,所以——干得好。
**Lenny:** I can't imagine how insane your life is, so I really appreciate that you made time for this and we're actually recording this the week that you guys launched your new image model, which is a happy coincidence. My entire social feed is filled with ghiblifications of everyone's life and family photos and everything, so good job.
**Kevin Weil:** 是的,我也是。我太太 Elizabeth 发了一张她自己的给我,所以我跟你感同身受。
**Kevin Weil:** Yep, mine too. My wife, Elizabeth, sent me one of hers, so I'm right there with you.
**Lenny:** 让我直接问——你们预料到会有这种反响吗?感觉这是自——我不知道——ChatGPT 发布以来,AI 领域最 viral 的事情了,这个标准可不低。你们是不是预期到它会这么成功?公司内部是什么感觉?
**Lenny:** Let me just ask, did you guys expect this kind of reaction? It feels like this is the most viral thing that's happened in AI, which is a high bar since, I don't know, ChatGPT launched. Just like, did you guys expect it to go this well? What does it feel like internally?
**Kevin Weil:** 在我的职业生涯中有几次这样的时刻,你在内部做一个产品,然后内部使用量就爆了。顺便说一下,我们在 Instagram 做 Stories 的时候就是这样。在我职业生涯中,没有哪次比这个更让我强烈地感觉到——它会成功。因为我们内部所有人都在用,周末去玩了回来,你就知道大家在干什么了,然后你会说:"哦嘿,我看到你去露营了,怎么样?" 你会想:"天啊,这玩意儿真的好使。" ImageGen 绝对是其中之一。我们玩了大概几个月,它第一次在公司内部上线的时候,有一个小画廊,你可以生成自己的,也能看到其他人在生成什么,然后就是不停地轰动。所以——是的,我们有预感这会很好玩,大家会喜欢。
**Kevin Weil:** There have been a handful of times in my career when you're working on a product internally and the internal usage just explodes. This was true, by the way, when we were building Stories at Instagram. More than anything else in my career, we could feel it was going to work because we were all using it internally. Before it launched, we'd go away for a weekend, come back, and know what was going on, and be like, "Oh, hey, I saw you were at that camping trip, how was that?" You were like, "Man, this thing really works." ImageGen was definitely one of those. We'd been playing with it for, I don't know, a couple months, and when it first went live internally to the company, there was kind of a little gallery where you could generate your own and also see what everyone else was generating, and it was just nonstop buzz. So yeah, we had a sense that this was going to be a lot of fun for people to play with.
**Lenny:** 太酷了。这应该可以作为一种衡量标准——对即将发布的东西有多大信心:如果内部所有人都为之疯狂。
**Lenny:** That's really cool. That should be a measure of confidence that something you're launching will go well: internally, everyone's going crazy for it.
**Kevin Weil:** 是的。尤其是社交类的东西,因为作为一个公司你在社交上是一个很紧密的网络,你们互相认识,而且你们理应是自己产品的专家。所以在某种意义上,如果你做的东西有社交属性但在内部都火不起来,你可能得质疑一下自己在做什么了。
**Kevin Weil:** Yeah. Especially social things because you have a very tight network as a company socially, so you know each other and you're experts in your product hopefully. And so there's some sense in which if you're doing something social and it's not taking off internally, you might question what you're doing.
**Lenny:** 对了,顺便问一下——吉卜力那个事儿,是你们刻意引导的吗?那是怎么开始的?是有意的示例吗?
**Lenny:** Yeah, and by the way, the Ghibli thing, is that something you guys seeded or how did that even start? Was that an intentional example?
**Kevin Weil:** 我觉得只是人们喜欢那种风格,而模型在模仿风格方面非常强,或者说它理解什么是…… 它在指令遵循(instruction following)方面非常好。这其实是我觉得人们正在慢慢发现的一件事——你可以用它做非常复杂的事情。你可以给它两张图片,一张是你的客厅,另一张是一堆照片或纪念品或你想摆的东西,然后你说:"告诉我你会怎么布置这些。" 或者你可以说:"我想让你展示一下,如果把这个放这里,那个放在这个右边,另一个放在那个左边、但在那个下面,会是什么样子。" 模型真的能理解所有这些并做到。非常强大。所以我只是对人们会想出的各种玩法感到兴奋。
**Kevin Weil:** I think it's just a style people love, and the model is really capable at emulating style. It's very good at instruction following. That's actually something I'm starting to see people discover with it: you can do very complex things. You can give it two images, one is your living room and the other is a whole bunch of photos or memorabilia or things you want, and you say, "Tell me how you would arrange these things." Or you can say, "I'd like you to show me what this will look like if you put this over here and this thing to the right of that and this one to the left of this, but under that one." And the model actually will understand all of that and do it. It's incredibly powerful. So I'm just excited about all the different things people are going to figure out.
**Lenny:** 好的。干得好。OpenAI 团队干得好。让我们认真聊聊,把镜头拉远一点。在我看来,你是一家可能是当今世界上最重要的公司的首席产品官。不是要把标准定太高啊,但你们正在开启 AI、AGI——在某个时间点——超级智能——在某个时间点。没什么大不了的。我有比对任何其他嘉宾都更多的问题想问你。实际上我在 Twitter 和 LinkedIn 以及我的社区里发了个征集帖:"你想问 Kevin 什么?" 我收到了超过 300 个结构良好的问题,我们要把每一个都过一遍。所以让我们开始吧。开玩笑的。
**Lenny:** Yeah. All right. Well, good job. Good job, team OpenAI. Let's get serious here and let's zoom out a little bit. The way I see it, you're chief product officer of maybe the most important company in the world right now. Just not to set the bar too high, but you guys are ushering in AI, AGI at some point, superintelligence at some point. No big deal. I have more questions for you than I've had for any other guest. I actually put out a call-out on Twitter and LinkedIn and my community, just like, what would you want to ask Kevin? And I had over 300 well-formed questions, and we're going to go through every single one. So let's just get started. I'm just joking.
**Kevin Weil:** 好。
**Kevin Weil:** Cool.
**Lenny:** 我挑了最好的,有很多东西我真的很好奇。
**Lenny:** I picked out the best and there's a lot of stuff I'm really curious about.
**Kevin Weil:** 现在这里是下午一点,天还要很久才会黑,所以来吧。
**Kevin Weil:** Well, it's 1 PM here. It doesn't get dark for a while, so let's do it.
**Lenny:** 好,走起。首先,我先做个笔记——AGI 什么时候发布?十二月的哪一天?
**Lenny:** Okay, here we go. Okay, so first of all, I'm just going to take notes here. When is AGI launching? When in December?
**Kevin Weil:** 我的意思是,我们刚发了一个很好的图像生成模型。那算吗?越来越接近了。越来越接近了。有一句话我很喜欢:"AI 就是那些还没被做到的事。" 因为一旦做到了、大概能用了,你就管它叫机器学习(machine learning);等它无处不在了,它就只是一个算法(algorithm)。所以我一直觉得很有意思,我们把那些还不太行的东西叫 AI,等到它变成一个给你推荐关注的 AI 算法——哦,那只是个算法而已。但新的那些东西,比如自动驾驶——那才是 AI。我觉得在某种程度上我们永远会处于这种状态:下一个东西永远是 AI,而我们每天在用的、已经是生活一部分的东西——那只是个算法。
**Kevin Weil:** I mean, we just launched a good ImageGen model. Does that count? It's getting there. It's getting there. There's this quote I love, which is "AI is whatever hasn't been done yet," because once it's been done, when it kind of works, then you call it machine learning, and once it's ubiquitous and it's everywhere, then it's just an algorithm. So I've always loved that we call things AI when they still don't quite work, and then by the time it's an AI algorithm that's recommending who you should follow, oh, that's just an algorithm. But this new thing, like self-driving cars, that's AI. I think to some degree we're always going to be there: the next thing is always going to be AI, and the current thing that we use every day and is just a part of our lives, that's an algorithm.
**Lenny:** 太有意思了。因为在湾区你看到自动驾驶的车到处跑,现在觉得太正常了。但四年前、三年前,你看到这个会说:"天哪,这是什么…… 我们活在未来。" 现在我们就这么习以为常了。
**Lenny:** It's so interesting because in the Bay Area you see self-driving cars driving around and it's so normal now, when four years ago, three years ago, you would've seen this and you'd be like, "Holy shit, what is... We're in the future." And now we just take it for granted.
**Kevin Weil:** 什么事都是这样。如果我给你看…… GPT-3 发布的时候,我当时还不在 OpenAI,只是个用户,但那是颠覆认知的。如果我现在给你 GPT-3,就直接接到 ChatGPT 里让你用,你会说:"这是什么鬼东西?" 完全不行。跟我第一次坐 Waymo 的体验一样——你的第一次乘坐,至少我的第一次乘坐,前十秒钟 Waymo 开始开了你会说:"哦天哪,小心那辆自行车!" 你抓住能抓的一切。然后五分钟之后你平静下来了,你意识到有人在没有司机的情况下载着你穿越城市,而且一切正常。你就想:"天哪,我正活在未来。" 再过十分钟,你无聊了,你在手机上处理邮件、回 Slack 消息,突然之间这个人类发明的奇迹就变成了你此后生活中理所当然的一部分。我们所有人适应 AI 的方式确实有点像这样。这些神迹般的事发生了,计算机能做到以前从没做到的事,我们集体震惊一个星期,然后就——哦,对。哦,现在它只是机器学习了,正在通往变成算法的路上。
**Kevin Weil:** I mean, there's something like that with everything. When GPT-3 launched, I wasn't at OpenAI then, I was just a user, but it was mind-blowing. And if I gave you GPT-3 now, just plugged it into ChatGPT for you and you started using it, you'd be like, "What is this thing?" It just doesn't work anymore. I had the same experience when I first got into a Waymo. Your very first ride, at least my very first ride, my first 10 seconds in a Waymo, it starts driving and you're like, "Oh my God, watch out for that bike." You're holding onto whatever you can. And then five minutes in, you've calmed down and you realize that you're getting driven around the city without a driver and it's working. You're just like, "Oh my God, I am living in the future right now." And then another 10 minutes and you're bored, you're doing email on your phone, answering Slack messages, and suddenly this miracle of human invention is just an expected part of your life from then on. And there is really something in the way that we all are adapting to AI that's kind of like that. These miraculous things happen, computers can do something they've never been able to do before, it blows our mind collectively for a week, and then we're like, oh, yeah. Oh, now it's just machine learning on its way to being an algorithm.
**Lenny:** 你刚才说的最疯狂的一点其实是——我不知道——ChatGPT,现在用起来感觉好差。3.5 是几年前的事了,想象一下几年后的生活会是什么样。我们会谈到事情的走向、你觉得下一个大的飞跃是什么。但我想从你在 OpenAI 旅程的起点开始。你在 Twitter 工作过,在 Facebook 工作过,在 Planet 工作过,Instagram。某个时间点你被招募去 OpenAI 工作。我很好奇那个故事是怎样的——加入 OpenAI 担任 CPO 的招聘过程。有什么有趣的故事吗?
**Lenny:** The craziest thing about what you just shared, actually, is, I don't know, ChatGPT, which now feels terrible. 3.5 was a couple years ago, and imagine what life will be like a couple years from now. We're going to get to that, where things are going, what you think is going to be the next big leap. But I want to start with the beginning of your journey at OpenAI. So you worked at Twitter, you worked at Facebook, you worked at Planet, Instagram. At some point you got recruited to go work at OpenAI. I'm curious what that story was like, the recruiting process of joining OpenAI as CPO. Are there any fun stories there?
**Kevin Weil:** 如果我没记错时间线的话,我们在 Planet 那边沟通了我要离开,我本来打算休息一段时间。我不会完全停止工作,但我也乐意享受一下夏天。大概是四月左右。我想着:好,我要跟孩子们过一个夏天。我们去 Tahoe 什么的,我真正能陪陪他们,而不是像往常那样来来回回忙。然后 Sam 和我互相认识了很多年,虽然不是非常熟,他总是参与那么多有趣的事——做核聚变的公司之类的。所以他一直是那种当我开始考虑下一步时会打电话聊聊的人,因为我喜欢做大的、技术驱动的、下一波浪潮的事情。所以我打了电话给他,我记得 Vinod 也帮忙重新牵了线。这次他不是说:"哦,你应该去跟那些做核聚变的人聊聊。" 他说:"其实我们在考虑一件事,你应该来跟我们谈谈。" 我说:"好啊,听起来太棒了。来吧。"
然后事情进展得非常快,非常非常快。我在几天内见了管理团队的大部分人,他们跟我说:"听着,我们基本上想多快就多快。如果你跟所有人聊了,大家都喜欢你,你就可以了。" Sam 来我家吃晚饭,我们一起度过了一个很好的晚上,聊 OpenAI、聊未来、更深入地了解彼此。最后我就在想——我本来第二天要去做更大一轮面试——然后 Sam 说:"嘿,进展得很好,我们很兴奋。" 我说:"好。那我怎么理解明天?" 他说:"哦,你会没事的。别担心。如果顺利的话,基本就成了。"
所以第二天我去了,见了一堆人,很开心。我真的喜欢遇到的每一个人。任何面试里你都会事后自我怀疑——哦我不该说那个、那个问题我回答得不好我想重来。但我出来的时候觉得——我想应该还不错。我预期那个周末基本就会有消息,因为他们大概这样设了预期:如果顺利就可以了。然后我什么都没收到。然后周一、周二、周三,还是什么都没有。我联系了 OpenAI 那边的人几次——还是没有。我就想:"天哪,我搞砸了。我不知道在哪搞砸的,但我完全搞砸了。不敢相信。"
我跑去跟 Elizabeth——我太太——说:"我做了什么?你觉得我哪里……" 完全抓狂了。然后还是什么都没有。最后大概九天之后,他们终于回复了我。原来是内部有一堆事情在发生,这个那个什么的,就是有一百万件事在忙。他们终于说:"哦对,那次挺好的。来吧。" 我说:"哦,好,行,来吧。"
但那是九天的煎熬。他们只是内部有事特别忙,而我每一天都在焦虑,反复回想面试过程中的每一句话。
**Kevin Weil:** If I'm remembering the timeline right, we communicated at Planet that I was leaving, and I was planning to just go take some time. I wasn't going to stop working, but I was also happy to take the summer. This was maybe April or something. I was like, cool, I'm going to have the summer with my kids. We're going to go to Tahoe or something and I'll actually get to hang out, rather than what I usually do, going up and down and all that. And then Sam and I had known each other lightly for a bunch of years, and he's always involved in so many interesting things, like companies building fusion and all these things. So he'd always been somebody that I would call occasionally if I was starting to think about my next thing, because I like working on big, tech-forward, sort of next-wave kind of things. And so I called him, and I think Vinod also helped to put us in touch again. And this time it wasn't like, "Oh, you should go talk to these guys working on fusion." He said, "Actually, we're thinking about something, you should come talk to us." I was like, "Okay, that sounds amazing. Let's do it." And it goes really fast, really, really fast. I met most of the management team in a brief period of time, a few days, and they were telling me, "Look, we're basically going to move as fast as we want to move. And if you talk to everyone and everyone likes you, you're ready to go." Sam came over for dinner and we had a great evening together, just talking about OpenAI and the future and getting to know each other better. And at the end, I was going to go in the next day for a bigger round of interviews, and Sam was saying, "Hey, it's going really well. We're really excited." And I said, "Cool. So how do I think about tomorrow?" And he said, "Oh, you'll be fine. Don't worry about it. If it goes well, we're basically there." And so I go in the next day, meet a bunch of people, have a great time. I really enjoyed everybody I met with.
In any interview, you can always second-guess yourself, like, oh, I shouldn't have said that thing, or that thing I gave a bad answer on I wish I could redo. But I came away feeling like, I think that went pretty well. And I was expecting to hear that weekend, basically, because they had sort of set the expectation that if this went well, we were ready to go. And I didn't hear anything. And then it was like Monday, Tuesday, Wednesday, I still didn't hear anything, and I reached out to folks on the OpenAI side a couple of times, still nothing. And I was like, "Oh my God, I screwed it up. I don't know where I screwed it up, but I totally screwed it up. I can't believe it." And I was going back to Elizabeth, my wife, and being like, "What did I do? Where do you think I..." Getting all crazy about it, and then it's still nothing. And finally, like nine days later, they got back to me, and it turned out there was a bunch of stuff happening internally, and this, that, and the other thing, and there were just a million things happening. And they finally were like, "Oh yeah, that went well. Let's do this." And I was like, "Oh, okay, cool, let's do it." But it was nine days of agony. They were just super busy on some internal stuff, and there I was, fretting every single day and re-going over every line of our interview process.
**Lenny:** 这让我想到你在跟一个人约会,你发了短信然后什么都没收到,你就假设出了什么问题。
**Lenny:** It makes me think about when you're dating someone and you've texted them and you're not hearing anything back, you assume something is wrong.
**Kevin Weil:** 对,完全是。他们可能只是忙。到现在想起来还是觉得不容易。
**Kevin Weil:** Yeah, totally. They might just be busy. I still have a hard time with it.
**Lenny:** 太离谱了。我很高兴最终成了。我想这里的教训是——别急着下结论。
**Lenny:** That's wild. I love that it worked out. And I guess the lesson there is don't jump to conclusions.
**Kevin Weil:** 对。淡定一点。
**Kevin Weil:** Yeah. Have a little bit of chill.
**Lenny:** 说到这个,我想聊聊身处这场风暴中心是什么感觉。你之前工作的都是——姑且说是传统公司,虽然它们也不那么传统——Twitter、Instagram、Facebook、Planet。现在你在 OpenAI。我很好奇,OpenAI 日常工作中最大的不同是什么?
**Lenny:** Speaking of that, I want to chat about what it's like to be inside the center of the storm. Again, you worked at a lot of, let's say, traditional companies, even though they're not that traditional: Twitter and Instagram and Facebook and Planet. And now you work at OpenAI. I'm curious, what is most different about how things work in your day-to-day life at OpenAI?
**Kevin Weil:** 我觉得大概是节奏吧。也许有两点。一是节奏。二是——我之前工作的每一个地方,你都大致知道你在什么技术上做开发。所以你花时间想的是:你在解决什么问题?你在为谁做?你怎么让他们的生活变好?这个问题够不够大,能不能改变习惯?人们在不在乎这个问题被解决?所有那些好的产品思考。但你做开发所依托的技术基本是固定的。你说的是数据库什么的,你今年用的数据库大概比两年前好 5%。但 AI 完全不是这样。每隔两个月计算机就能做到之前做不到的事,你必须完全换思路。这从根本上很有意思,让这里的工作很好玩。
还有一点——我们以后可能会聊 eval——但在这个世界里…… 我们过去对计算机的所有认知都是关于给计算机非常确定的输入。比如 Instagram,有一些按钮做特定的事,你知道它们会做什么。当你给计算机确定的输入,你得到确定的输出。你确信如果做同样的事三次,你会得到同样的输出三次。LLM 完全不一样。它们擅长模糊的、微妙的输入。人类语言和沟通的各种细微之处,它们都还不错。而且它们不会真正给你相同的答案。你大概会对同一个问题得到精神上相同的回答,但肯定不是每次都是同一组词。所以它是更模糊的输入和更模糊的输出。当你做产品的时候,你试图围绕某个使用场景去构建——如果模型 60% 的时候答对,你做的产品和模型 95% 答对时完全不同,又和 99.5% 答对时不同。所以你必须深入你的使用场景、eval 什么的,才能理解应该做什么样的产品。这从根本上是不同的。如果你的数据库能工作一次,它每次都能工作。但在这个世界里不是这样。
**Kevin Weil:** I think it's probably the pace. Maybe it's two things. One is the pace. The second is, everywhere I've ever worked before this, you kind of know what technology you're building on. So you spend your time thinking about: what problems are you solving? Who are you building for? How are you going to make their lives better? Is this a big enough problem that you're going to be able to change habits? Do people care about this problem being solved? All those good product things. But the stuff that you're building on is kind of fixed. You're talking about databases and things, and I bet the database you used this year is probably 5% better than the database you used two years ago. But that's not true at all with AI. Every two months computers can do something they've never been able to do before, and you need to completely think differently about what you're doing. There's something fundamentally interesting about that which makes life fun here. There's also something else, and maybe we'll talk about evals later. Everything we're used to with computers is about giving a computer very defined inputs. If you look at Instagram, for example, there are buttons that do specific things and you know what they do. And when you give a computer defined inputs, you get very defined outputs. You're confident that if you do the same thing three times, you're going to get the same output three times. LLMs are completely different than that. They're good at fuzzy, subtle inputs. All the nuances of human language and communication, they're pretty good at. And also they don't really give you the same answer. You probably get spiritually the same answer for the same question, but it's certainly not the same set of words every time. So it's fuzzier inputs and fuzzier outputs.
And when you're building products, there's some use case you're trying to build around, and it really matters: if the model gets it right 60% of the time, you build a very different product than if the model gets it right 95% of the time, versus if the model gets it right 99.5% of the time. So you have to get really into the weeds on your use case and the evals and things like that in order to understand the right kind of product to build. That is just fundamentally different. If your database works once, it works every time. And that's not true in this world.
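The thresholds Kevin mentions (60%, 95%, 99.5%) can be made concrete with a toy decision function. This is purely illustrative, not anything OpenAI ships: `product_mode` is a hypothetical mapping from a measured eval pass rate to how much human oversight the product design assumes.

```python
def product_mode(pass_rate: float) -> str:
    """Map a measured eval pass rate to a (hypothetical) product posture."""
    if pass_rate >= 0.995:
        return "fully automatic"               # output can ship unreviewed
    if pass_rate >= 0.95:
        return "automatic with spot checks"    # occasional human review
    if pass_rate >= 0.60:
        return "human-in-the-loop suggestions" # model drafts, human decides
    return "not ready to productize"

print(product_mode(0.60))   # → human-in-the-loop suggestions
print(product_mode(0.997))  # → fully automatic
```

The exact cutoffs are invented for the sketch; the point is that the same model capability, measured by evals, implies very different product shapes.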
**Lenny:** 让我们顺着 eval 这条线继续聊。我肯定想谈这个。我们在 Lenny & Friends 峰会上有那场传奇的 panel——你、Mike Krieger 和 Sarah Guo 做主持人。
**Lenny:** Let's actually follow this thread on evals. I definitely wanted to talk about this. We had this legendary panel at the Lenny & Friends Summit. It was you and Mike Krieger and Sarah Guo moderating.
**Kevin Weil:** 那很好玩。
**Kevin Weil:** That was fun.
**Lenny:** 太好玩了。那个 panel 中让人们印象最深的一个点,是你说的一句话:写 eval 将成为产品经理的核心技能。我觉得这可能不只适用于产品经理。很多人知道 eval 是什么。很多人完全不知道我在说什么。所以你能简单解释一下什么是 eval,然后说说为什么你觉得这对未来做产品的人如此重要?
**Lenny:** So fun. The thing that I heard that kind of stuck with people from that panel was a comment you made where you said that writing evals is going to become a core skill for product managers, and I feel like that probably applies further than just product managers. A lot of people know what evals are. A lot of people have no idea what I'm talking about. So could you just briefly explain what is an eval and then just why do you think this is going to be so important for people building products in the future?
**Kevin Weil:** 好,当然。我觉得最简单的理解方式是——它几乎就是给模型的一次测验,一个测试,看它对某组主题材料掌握得如何,或者它对某组问题的回答有多好。就像你上了微积分课然后参加微积分考试,看你有没有学会该学的东西。你有 eval 来测试模型在创意写作上有多好、在研究生级别科学上有多好、在竞赛编程上有多好。所以你有这些 eval 基本上作为模型有多聪明或多有能力的基准(benchmark)。
**Kevin Weil:** Yeah, sure. I think the easiest way to think about it is almost like a quiz for a model, a test to gauge how well it knows a certain set of subject material or how good it is at responding to a certain set of questions. So in the same way you take a calculus class and then you have calculus tests that see if you've learned what you're supposed to learn. You have evals that test how good is the model at creative writing? How good is the model at graduate level science? How good is the model at competitive coding? And so you have these set of evals that basically perform as benchmarks for how smart or capable the model is.
**Lenny:** 简单来说,可以理解为模型的单元测试(unit test)吗?
**Lenny:** Is a simple way to think about it something like unit tests for a model?
**Kevin Weil:** 对,单元测试,总体上就是模型的测试。完全可以这么理解。
**Kevin Weil:** Yeah, unit tests, tests in general for models. Totally.
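To make the quiz analogy concrete, here is a minimal sketch of an eval harness in Python. Everything in it is illustrative: `ask_model` is a hypothetical stand-in for a real model API call (here it just returns canned answers, one deliberately wrong), and the lenient substring grading is only one of many possible grading strategies.

```python
def ask_model(question: str) -> str:
    # Placeholder for a real model API call; canned answers for the demo.
    canned = {
        "What is the capital of France?": "Paris",
        "What is 12 * 12?": "144",
        "Who wrote Hamlet?": "Marlowe",  # deliberately wrong answer
    }
    return canned[question]

def run_eval(cases: list[tuple[str, str]]) -> float:
    """Grade the model on (question, expected_answer) pairs; return pass rate."""
    passed = sum(
        1 for question, expected in cases
        if expected.lower() in ask_model(question).lower()  # lenient grading
    )
    return passed / len(cases)

cases = [
    ("What is the capital of France?", "Paris"),
    ("What is 12 * 12?", "144"),
    ("Who wrote Hamlet?", "Shakespeare"),
]
print(f"pass rate: {run_eval(cases):.0%}")  # → pass rate: 67%
```

A real eval suite would query a live model and often use a grader model rather than string matching, but the structure, a fixed question set scored into a single pass rate, is the same.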
**Lenny:** 好的。那为什么这对那些不太明白 eval 到底是怎么回事的人来说如此关键?为什么它是构建 AI 产品的核心?
**Lenny:** Great, great. Okay. And then why is this so important for people that don't totally understand what the hell's going on here with evals? Why is this so key to building AI products?
**Kevin Weil:** 这就回到我刚才说的。你需要知道你的模型是否能…… 有些事情模型 99.95% 的时间会答对,你可以放心。有些它会 95% 正确,有些 60%。如果模型在某件事上只有 60% 正确,你做产品的方式就完全不同。而且这些东西也不是静态的。eval 的一个重要部分是——如果你知道你在为某个使用场景而构建。比如我们的 Deep Research 产品——这可能是我最喜欢的我们发布过的东西之一。这个想法是——对于没有用过 Deep Research 的人来说——你现在可以给 ChatGPT 一个任意复杂的查询。它不是从搜索查询中给你返回一个答案——那个我们也能做。而是——给它一个问题,如果你自己来回答,你得去网上读两个小时的资料,然后你可能还要读一些论文,然后你开始写你的想法,发现思考中有一些空白,于是你又去做更多研究。你可能需要一个星期才能写出一个 20 页的回答。你可以让 ChatGPT 帮你干 25、30 分钟。不是你习惯的即时回答,但它可能工作 25、30 分钟,做的事你要花一个星期。
所以在我们构建这个产品的时候,我们在设计 eval 的同时也在思考这个产品要怎么工作。我们在经历标杆使用案例(hero use cases):这是你想能问的问题,这是这个问题的绝佳回答。然后把这些变成 eval,然后在这些 eval 上爬坡(hill climbing)。所以不只是模型是静态的、我们希望它在某些事上表现还行——你可以教模型,你可以让这成为一个持续学习的过程。当我们在为 Deep Research 微调(fine-tuning)模型来回答这些问题时,我们能够测试——它在我们说的那些重要衡量指标上是否在变好?当你开始看到这个,看到 eval 表现在上升,你就会说:"好,我觉得我们有产品了。"
**Kevin Weil:** Well, it gets back to what I was saying. You need to know how your model is going to do. There are certain things that models will get right 99.95% of the time and you can just be confident. There are things that they're going to be 95% right on, and things they're going to be 60% right on. If the model's 60% right on something, you're going to need to build your product totally differently. And by the way, these things aren't static either. So a big part of evals is knowing the use case you're building for. Let's take our deep research product, which is one of my favorite things that we've released maybe ever. The idea with deep research, for people who haven't used it, is that you can now give ChatGPT an arbitrarily complex query. It's not about returning you an answer from a search query, which we can also do. It's: here's a thing where, if you were going to answer it yourself, you'd go off and do two hours of reading on the web, and then you might need to read some papers, and then you would come back and start writing up your thoughts and realize you had some gaps in your thinking, so you go out and do more research. It might take you a week to write some 20-page answer to this question. You can let ChatGPT just chug for you for 25, 30 minutes. It's not the immediate answers you're used to, but it might go work for 25, 30 minutes and do work that would've taken you a week. So as we were building that product, we were designing evals at the same time as we were thinking about how this product was going to work, and we were trying to go through hero use cases: here's a question you want to be able to ask, here's an amazing answer for that question. And then turning those into evals, and then hill climbing on those evals. So it's not just that the model is static and we hope it does okay on a certain set of things; you can teach the model. You can make this a continuous learning process.
And so as we were fine-tuning our model for deep research to be able to answer these things, we were able to test is it getting better on these evals that we said were important measures of how the product was working? And it's when you start seeing that and you start seeing performance on evals going up, you start saying, "Okay, I think we have a product here."
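The "hill climbing on evals" loop Kevin describes can be sketched as: hold the eval set fixed, score each fine-tuning iteration against it, and only promote a model that beats the current best. The model names and pass rates below are invented for illustration; in practice `score` would run the full eval suite against a live model rather than look up a table.

```python
def score(model_name: str, eval_cases: list) -> float:
    # Hypothetical recorded pass rates for each fine-tuning iteration.
    simulated = {"base": 0.60, "ft-v1": 0.72, "ft-v2": 0.70, "ft-v3": 0.81}
    return simulated[model_name]

def hill_climb(candidates: list[str], eval_cases: list) -> tuple[str, float]:
    """Walk through candidate models, keeping the best-scoring one so far."""
    best_name, best_score = "", float("-inf")
    for name in candidates:
        s = score(name, eval_cases)
        if s > best_score:  # promote only on improvement
            best_name, best_score = name, s
    return best_name, best_score

best, rate = hill_climb(["base", "ft-v1", "ft-v2", "ft-v3"], eval_cases=[])
print(best, rate)  # → ft-v3 0.81
```

Note that ft-v2 regresses relative to ft-v1 and is discarded, which mirrors why the eval set has to stay fixed: it's the yardstick that tells you whether a fine-tune actually moved the product forward.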
**Lenny:** 你说过一个类似的观点:AI 能有多厉害,几乎被我们写 eval 的能力所限制。这对你有共鸣吗?还有什么想法?
**Lenny:** You made a comment along these same lines around evals, that AI is almost capped in how amazing it can be by how good we are at evals. Does that resonate? Any more thoughts along those lines?
**Kevin Weil:** 我的意思是——这些模型是智能体,而智能是如此根本地多维度的。你可以说一个模型在竞赛编程上很厉害,但这不一定意味着它在前端编程上也很好——也不一定意味着它在后端编程上好,或者在把 COBOL 写的代码转成 Python 上好。而这还只是软件工程领域内的。所以我觉得你可以把这些模型看作极其聪明、事实知识很丰富的智能体,但世界上大部分数据、知识、流程不是公开的。它们在公司或政府或其他机构的围墙后面。就像你要加入一家公司,你会花头两周做入职培训。你会学习公司特有的流程,获取公司特有的数据。模型足够聪明,你可以教它们任何东西,但它们需要有原始数据来学习。
所以我觉得未来真的会是——极其聪明的、宽泛的基础模型,通过公司特定或使用场景特定的数据进行微调和定制,这样它们在公司特定或使用场景特定的事情上就能表现得非常好。而你会用自定义的 eval 来衡量这些。所以我说的意思就是——这些模型真的很聪明,你仍然需要教它们那些不在训练集里的东西,而有大量的使用场景不会在训练集里,因为它们只跟某一个行业或某一家公司相关。
**Kevin Weil:** I mean, these models are intelligences, and intelligence is so fundamentally multidimensional. So you can talk about a model being amazing at competitive coding, which may not be the same as that model being great at front-end coding or back-end coding, or at taking a whole bunch of code that's written in COBOL and turning it into Python. And that's just within the software engineering world. So I think there's a sense in which you can think of these models as incredibly smart, very factually aware intelligences, but still, most of the world's data, knowledge, and process is not public. It's behind the walls of companies or governments or other things. In the same way, if you were going to join a company, you would spend your first two weeks onboarding. You'd be learning the company-specific processes. You'd get access to company-specific data. The models are smart enough, you can teach them anything, but they need to have the raw data to learn from. So there's a sense in which I think the future is really going to be incredibly smart, broad-based models that are fine-tuned and tailored with company-specific or use case-specific data so that they perform really well on company-specific or use case-specific things. And you're going to measure that with custom evals. So what I was referring to is just that these models are really smart, but you still need to teach them things if the data's not in their training set, and there's a huge amount of use cases that are not going to be in their training set because they're relevant to one industry or one company.
**Lenny:** 我就顺着你引导的话题继续问,但我后面会回来的,因为关于这些东西我还有更多问题。你刚才说到了一个我觉得很多 AI 创业者都在想的问题:OpenAI 未来不会来碾压我的领域在哪里?或者其他基础模型公司。对很多人来说不清楚:"我该不该在这个领域创业?" 你有什么建议或指引吗——你觉得 OpenAI,或者说基础模型公司一般来说不太可能去做什么,哪里有创业机会?
**Lenny:** I'm just going to keep following the thread that you're leading us down, but I'm going to come back because I have more questions around some of these things. You touched on a question that I think a lot of AI founders are asking: where is OpenAI not going to come squash me in the future? Or one of the other foundational models. It's unclear to a lot of people, just like, "Should I build a startup in this space or not?" Is there any advice you have, or any guidance, for where you think OpenAI, or foundational models in general, likely won't go, and where you have an opportunity to build a company?
**Kevin Weil:** 这是 Ev Williams 在 Twitter 时常说的一句话,一直留在我脑子里。他说:"不管你的公司变得多大,不管你的人有多厉害,公司外面聪明人的数量永远远超公司里面的。" 这就是为什么我们如此专注于做一个很好的 API。我们有 300 万开发者在用我们的 API。无论我们有多雄心勃勃、长得多大——顺便说一句我们也不想长得特别大——世界上有太多的使用场景和地方,AI 可以从根本上让生活变好。我们不会有那么多人。我们不会有那个专业知识去做其中大部分事情。而且就像我说的,数据是行业特定的、使用场景特定的、在某些公司围墙后面。在世界上的每一个行业和每一个垂直领域,都有巨大的机会去构建基于 AI 的产品来提升现状。我们不可能自己做到这一切。我们也不想。如果我们想也做不了。我们非常兴奋能为 300 多万开发者——以及未来更多的开发者——提供动力。
**Kevin Weil:** So this is something that Ev Williams used to say back at Twitter that's always stuck with me, which is, "No matter how big your company gets, no matter how incredible the people are, there are way more smart people outside your walls than there are inside your walls." And that's why we are so focused on building a great API. We have 3 million developers using our API. No matter how ambitious we are, how big we grow (and by the way, we don't want to grow super big), there are so many use cases, so many places in the world where AI can fundamentally make our lives better. We're not going to have the people. We're not going to have the know-how to build most of these things. And like I was saying, the data is industry-specific, use-case-specific, behind certain company walls, things like that. There are immense opportunities in every industry and every vertical in the world to go build AI-based products that improve upon the state of the art. There's just no way we could ever do that ourselves. We don't want to, and we couldn't even if we did want to. And we're really excited to power that for 3 million-plus developers, and way more in the future.
**Lenny:** 回到你前面说的关于技术在不断变化、越来越快、你在发布的时候不完全知道模型会有多强——我很好奇,是什么让你们能够快速且持续地交付这么好的东西?听起来一个答案是自下而上赋能的团队,而不是一个很自上而下的、规划好一个季度的路线图。还有哪些因素让你们能这么快、这么频繁地出好东西?
**Lenny:** Coming back to your earlier point about the tech changing constantly and getting faster, not exactly knowing what you'll have by the time you launch something in terms of the power, the model. I'm curious what allows you to ship quickly and consistently in such great stuff? And it sounds like one answer is bottoms-up empowered teams versus a very top-down roadmap that's planned out for a quarter. What are some of those things that allow you to ship such great stuff so often, so quickly?
**Kevin Weil:** 嗯,我们试着对我们要去哪里有个大方向感,让自己指向某个方向,这样有个大致的对齐。从主题上来说——我一秒钟都不…… 我们确实做季度路线图。我们制定了一年的战略。我一秒钟都不相信我们写在这些文档里的东西就是我们三个月后真正会发布的东西,更别说六个月或九个月后。但没关系。我觉得这就像 Eisenhower 的那句话:"计划是无用的。做计划是有益的。" 我完全认同这个观点,尤其在这个世界。
这真的很有价值。比如季度路线图,有一个时刻让你停下来想:"好,我们做了什么?什么有用?什么进展好?什么不好?我们学到了什么?现在我们觉得接下来要做什么?" 顺便说,每个人都有一些依赖。你需要基础设施团队做某些事,需要跟研究那边合作。所以你需要有一刻来检查依赖关系、确保没问题,然后开始执行。我们尽量让这个过程非常轻量,因为它反正不会完全正确。我们大概会在做到一半的时候推翻它,因为我们会学到新东西。所以做规划那个时刻是有益的,即使它只部分正确。
我觉得就是要接受你会非常敏捷,写三个月路线图没什么意义,更别说一年的,因为技术在你脚下变化太快了。我们真的非常强调自下而上,前提是在我们整体方向对齐的框架下。我们有很优秀的人。我们有工程师、PM、设计师、研究员,他们对自己做的产品充满热情、有很强的观点,而且他们也是在做这些东西的人。所以他们对能力有真实的感知,这非常重要。所以我觉得你要在这种方式下更加自下而上。我们就这样运作。
我们乐于犯错。我们一直在犯错。这是我真的很欣赏 Sam 的一点。他非常使劲推动我们快速行动,但他也理解——快速行动就会带来,这个没做对、那个东西发了但没 work、我们会回滚。看我们的命名。我们的命名太烂了。
**Kevin Weil:** Yeah. I mean, we try and have a sense of where we're trying to go, point ourselves in a direction so that we have some rough sense of alignment thematically. And we do quarterly roadmapping; we laid out a year-long strategy. But I don't for a second believe that what we write down in these documents is what we're going to actually ship three months from now, let alone six or nine. And that's okay. I think it's like the Eisenhower quote, "Plans are useless. Planning is helpful," which I totally subscribe to, especially in this world.

It's really valuable. If you think about quarterly roadmapping, for example, it's really valuable to have a moment where you stop and go, "Okay. What did we do? What worked? What went well? What didn't go well? What did we learn, and now what do we think we're going to do next?" And by the way, everybody has some dependencies. You need the infrastructure team to do the following things, a partnership with research here. So you want to have a second to check your dependencies, make sure you're good to go, and then start executing. We try and keep that really lightweight, because it's not going to be right anyway. We're going to throw it out halfway through, because we will have learned new things. So the moment of planning is helpful even if the plan is only partially right.

So I think it's about expecting that you're going to be super agile, and that there's no sense writing a three-month roadmap, let alone a year-long roadmap, because the technology's changing underneath you so quickly. We really do try and go very strongly bottoms-up, subject to our overall directional alignment. We have great people. We have engineers and PMs and designers and researchers who are passionate about the products they're building, who have strong opinions about them, and who are also the ones building them. So they have a real sense of what the capabilities are too, which is super important. So I think you want to be more bottoms-up in this way.
So we operate that way. We are happy making mistakes; we make mistakes all the time. It's one of the things I really appreciate about Sam. He pushes us really hard to move fast, but he also understands that with moving fast comes the occasional "we didn't quite get this right," or "we launched this thing and it didn't work, so we'll roll it back." Look at our naming. Our naming is horrible.
**Lenny:** 是的,有很多人问到这个问题。
**Lenny:** Yes, that was something a lot of people had questions about for you.
**Kevin Weil:** 模型名字,对。确实烂透了,我们知道,我们总会在某个时候修好它的,但它不是最重要的事,所以我们不花太多时间在上面。但这也说明了——其实不重要。ChatGPT 是历史上最流行、增长最快的产品,是第一的 AI、API 和模型。所以显然命名没那么要紧。而且我们管东西叫 o3 mini high 之类的。
**Kevin Weil:** Model names, yeah. It's absolutely atrocious and we know it, and we will get around to fixing it at some point, but it's not the most important thing, so we don't spend a lot of time on it. But it also shows you that it doesn't really matter. ChatGPT is the most popular, fastest-growing product in history; it's the number one AI, API, and model. So clearly naming doesn't matter that much. And we name things like o3 mini high.
**Lenny:** 天啊,我爱死了。好的,你谈了路线图、自下而上。我真的很好奇——有没有一种节奏或者仪式来跟你或 Sam 对齐?或者你会审核所有要发出去的东西?每周或每月有一个会议来看发生了什么吗?
**Lenny:** Man, I love it. Okay. So you talked about roadmapping and bottoms-up, and I'm really curious: is there a cadence or a ritual of aligning with you or Sam, or do you review everything that's going out? Is there a meeting every week or every month where you see what's happening?
**Kevin Weil:** 关键项目是有的。我们做产品评审(product review)之类的,跟你预想的差不多。没有固定仪式,因为没有…… 我绝对不想让我们发布什么东西的时候被堵住——就因为等着跟我或 Sam 的评审排不上。如果我在出差或者 Sam 在忙什么的——那不该成为我们不发布的理由。所以对于最大、最高优先级的东西,我们有一个比较紧密的节奏,但我们真的尽量不——坦白说。我们想赋能团队快速行动,我觉得发布并迭代更重要。
所以我们有一个理念叫迭代部署(iterative deployment)。想法是——我们都在一起学习这些模型。所以在某种意义上,发布一些东西——即使你不知道它的全部能力——然后在公开场景下一起迭代,是好得多的做法。我们跟社会其他部分一起共同进化(co-evolve),一起了解这些东西在哪些方面不同、哪些好、哪些坏、哪些奇怪。我很喜欢这个理念。
我觉得另一个成为我们产品理念一部分的东西是"模型最大主义"(model maximalism)这个概念。模型不完美。它们会犯错。你可以花大量时间围绕它们构建各种脚手架(scaffolding)。顺便说,有时候我们确实会这么做,因为有些类型的错误你就是不想犯。但我们不会花太多时间去给那些不属于这类情况的部分做脚手架,因为我们的基本心态是:两个月后会有更好的模型,它会碾压现在的这些局限性。所以如果你在做开发——我们也这样跟开发者说——如果你做的产品刚好踩在模型能力的边界上,继续做,因为你方向是对的。再过几个月模型就会变得很好,你那个勉强能用的产品就真的会起飞了。这就是你确保自己真的在推动前沿、在做新东西的方式。
**Kevin Weil:** On key projects, yes. We do product reviews and things like that, like you would expect. But there isn't a fixed ritual, because I would never want us to be blocked on launching something while waiting for a review with me or Sam that we can't get to. If I'm traveling or Sam's busy or whatever, that's a bad reason for us not to ship. So for the biggest, highest-priority stuff, we have a pretty close beat on it, but we really try not to, frankly. We want to empower teams to move quickly, and I think it's more important to ship and iterate.

So we have this philosophy we call iterative deployment, and the idea is that we're all learning about these models together. So there's a real sense in which it's way better to ship something, even when you don't know the full set of capabilities, and iterate together in public. We co-evolve together with the rest of society as we learn about these things: where they're different, where they're good and bad and weird. I really like that philosophy.

I think the other thing that ends up being part of our product philosophy is this sense of model maximalism. The models are not perfect. They're going to make mistakes. You could spend a lot of time building all kinds of different scaffolding around them. And by the way, sometimes we do, because there are kinds of errors that you just don't want to make. But we don't spend that much time building scaffolding around the parts that don't fall into that category, because our general mindset is that in two months there's going to be a better model, and it's going to blow away whatever the current set of limitations are.
So if you're building, and we say this to developers too, if the product that you're building is right on the edge of the capabilities of the models, keep going, because you're doing something right. Give it another couple of months and the models are going to be great, and suddenly the product that you have that just barely worked is really going to sing. That's how you make sure that you're really pushing the envelope and building new things.
**Lenny:** 我之前在播客上采访了 Bolt 的创始人——公司名字叫 StackBlitz——他分享了这个故事:他们在幕后做了七年,一直失败,什么都没发生。然后突然之间——抱歉提到竞争对手——Claude 出来了,或者说 Sonnet 3.5 出来了,突然之间一切都 work 了。他们一直在做,终于成了。我在 YC 也经常听到这种故事——以前从来不可能的东西,现在每隔几个月随着模型更新就变得可能了。
**Lenny:** I had the founder of Bolt on the podcast (StackBlitz is the company name), and he shared this story that they'd been working on their product for seven years behind the scenes and it was failing. Nothing was happening. And then all of a sudden, sorry to mention a competitor, but Claude came out, or Sonnet 3.5 came out, and suddenly everything worked. They'd been building all this time, and finally it worked. And I hear that a lot from YC companies: things that were never possible before are now becoming possible every few months with the updates to the models.
**Kevin Weil:** 是的,绝对是。
**Kevin Weil:** Yeah, absolutely.
**Lenny:** 让我问一下——我本来没打算问这个——但我很好奇你有没有什么快速想法:为什么 Sonnet 在编程方面这么好?以及你们的东西在实际编程上追上和超过它的想法?
**Lenny:** Let me actually ask this. I wasn't planning to ask it, but I'm curious if you have any quick thoughts on why Sonnet is so good at coding, and on your models getting as good or better at actual coding?
**Kevin Weil:** 嗯,向 Anthropic 致敬。他们做了非常好的编程模型。毫无疑问。我们认为我们也能做到。也许这个播客发出来的时候我们会有更多东西可说,但不管怎样,完全的 credit 给他们。我觉得智能真的是多维度的,所以我认为模型提供方们…… 以前 OpenAI 有巨大的模型领先优势,比所有人领先 12 个月什么的。现在不是了。我愿意认为我们仍然有领先。我会说我们确实有,但肯定不是很大的领先了。这意味着会有不同的地方——Google 的模型很好,或者 Anthropic 的模型很好,或者我们很好,然后竞争对手就想:"我们得在这方面赶上。"
而且实际上,一旦有人证明某件事是可能的,在那个方向上变好比在丛林中开辟一条全新的路要容易得多。就像一个例子——以前没人能跑进 4 分钟一英里,然后终于有人做到了,第二年又有 12 个人做到了。我觉得这种事到处都在发生,这意味着竞争非常激烈,消费者、开发者和企业都会从中大大受益。这也是这个行业发展这么快的原因之一。但完全尊重其他大模型提供商。模型正在变得非常好。我们会尽可能快地前进,我觉得我们有一些好东西在路上。
**Kevin Weil:** Yeah. I mean, kudos to Anthropic. They've built very good coding models, no doubt. We think that we can do the same. Maybe by the time this podcast ships, we'll have more to say, but either way, all credit to them. I think intelligence is really multidimensional. It used to be that OpenAI had this massive model lead, 12 months or something ahead of everybody else. That's not true anymore. I like to think we still have a lead; I'd argue that we do, but it's certainly not a massive one. And that means there are going to be different places where the Google models are really good, or where the Anthropic models are really good, or where we're really good, and the competitors go, "We've got to get better at that."

And it actually is easier to get better at a certain thing once someone's proved it possible than it is to forge a path through the jungle and do something brand new. As an example, nobody could break four minutes in the mile, and then finally somebody did, and the next year 12 more people did it. I think that's happening all over the place, and it just means that competition is really intense, and consumers and developers and businesses are going to win in a big way from it. It's part of why the industry moves so fast. But all respect to the other big model providers. Models are getting really good. We're going to move as fast as we can, and I think we've got some good stuff coming.
**Lenny:** 令人兴奋。这也让我想到——在很多方面其他模型在某些事上更好,但 ChatGPT 不知怎么就是…… 如果你看所有的知名度和使用数据,不管你们在排名中处于什么位置,人们似乎就是把 AI 和 ChatGPT 当成同义词。你觉得你们做对了什么,至少到目前为止在消费者心智和全球知名度上赢了?
**Lenny:** Exciting. This makes me also think about how, in many ways, other models are better at certain things, but somehow ChatGPT is the one. If you look at all the awareness numbers and usage numbers, no matter where you are in the rankings, people seem to think of AI and ChatGPT as almost the same thing. What do you think you did right to win, at least at this point, in consumer mindshare and awareness in the world?
**Kevin Weil:** 我觉得先发有帮助——这也是我们这么专注于快速行动的原因之一。我们喜欢第一个发布新能力。像 Deep Research 这样的东西。我们的模型能做很多事。它们可以接收实时视频输入,有语音到语音(speech to speech),可以语音转文字和文字转语音。它们能做 Deep Research。它们能在 Canvas 上操作,能写代码。所以 ChatGPT 能成为一个一站式平台,所有你想做的事都可以。随着我们往前走,我们有更多代理式(agentic)的工具——比如 Operator,它在替你浏览网页、在网上替你做事——越来越多地,你可以来 ChatGPT 这一个地方,给它指令,让它在现实世界中为你完成实际的事情。这有根本性的价值。所以我们在这方面想得很多。我们尽量快速行动,让我们始终是人们最有用的首选。
**Kevin Weil:** I think being first helps, which is one of the reasons why we're so focused on moving quickly. We like being the first to launch new capabilities, things like deep research. Our models can do a lot of things. They can take real-time video input; they do speech-to-speech, speech-to-text, and text-to-speech. They can do deep research. They can operate on a canvas; they can write code. So ChatGPT can be this one-stop shop where all the things that you want to do are possible. And as we go forward, we have more agentic tools, like Operator, which browses the web and does things for you there. More and more, you're going to be able to come to this one place, ChatGPT, give it instructions, and have it accomplish real things for you in the world. There's something fundamentally valuable in that. So we think a lot about that, and we try to move really fast so that we are always the most useful place for people to come to.
**Lenny:** 你在构建 AI 产品或在 OpenAI 工作之后,学到的最反直觉的东西是什么?就是——"我没料到这个。"
**Lenny:** What would you say is the most counterintuitive thing that you've learned after building AI products or working at OpenAI, something that's just like, "I did not expect that?"
**Kevin Weil:** 我不知道,也许我本该预见到。但让我觉得有趣的一件事是:当你试图弄清楚一个产品应该如何跟 AI 配合工作,或者甚至为什么某个 AI 的行为是那样的,你往往可以用你理解人类的方式来推理,而且管用。
举几个例子吧。当我们第一次发布推理模型(reasoning model)的时候——我们是第一个做出能推理的模型的。它不是对你问的每个问题立刻给一个系统一式的快速回答——"神圣罗马帝国第三任皇帝是谁,这是答案。" 你可以问它难的问题,它会推理。就像如果我让你做一个填字游戏,你不能瞬间填完。你会想:"好,这个横排,我觉得可能是这两个中的一个,但那意味着这里有个 A,所以那个必须是这个——等等,回溯,一步一步往上建。" 你回答任何困难的逻辑问题或科学问题都是这样的。
所以这个推理突破很大。但它也是第一次模型需要坐下来思考。这对消费产品来说是个奇怪的范式。你通常不会有一个东西,你问完问题后需要等 25 秒。所以我们试图弄清楚——这个的 UI 该是什么样的?对于 Deep Research,模型要想 25 分钟——其实没那么难,因为你不会坐在那看 25 分钟。你会去做别的事,去另一个标签页或者去吃午饭什么的,回来就做好了。但 20、25 秒——或者 10 秒——等起来很长,但又不够长到去做别的事。
所以你可以想想,如果你问了我一个需要我思考 20 秒才能回答的问题,我会怎样?我不会沉默不语、关机 20 秒然后回来。所以我们也不该那样做——不该就摆一个进度条在那。那很烦人。但我也不会开始疯狂地说出每一个想法。所以我们大概也不该把整个思维链(chain of thought)在模型思考时全部暴露出来。但我可能会说:"好问题。嗯,让我想想。" 可能那样来,然后思考。你可能会给一些小的更新。这实际上就是我们最终发布的版本。
还有类似的情况——你会发现有时候让一组模型都去尝试同一个问题,然后有一个模型看所有的输出、整合、给你最终一个答案——这样你能得到更好的思考。我的意思是——这听起来有点像头脑风暴。当我跟别人一起在房间里头脑风暴的时候,我确实有更好的想法,因为他们的思维方式跟我不同。
所以就是有各种各样的情况,你可以像理解一群人或一个人那样来推理它——而且管用。我不知道,也许我不该惊讶,但我确实惊讶了。
**Kevin Weil:** I don't know, maybe I should have expected this, but one of the things that's been funny for me is the extent to which, when you're trying to figure out how some product should work with AI, or even why some AI thing happens to be true, you can often reason about it the way you would reason about another human, and it works.

So maybe a couple of examples. When we first launched our reasoning model, we were the first to build a model that could reason. Instead of giving you a quick, system-one answer right away to every question you asked ("Who was the third Emperor of the Holy Roman Empire?" Here's the answer), you could ask it hard questions and it would reason. The same way that if I asked you to do a crossword puzzle, you couldn't just snap your fingers and fill in everything. You'd go, "Well, okay, on this one across, I think it could be one of these two, but that means there's an A here, so that one has to be this. Wait, backtrack," and you build up step by step from where you are. It's the same way you answer any difficult logical problem, any scientific problem.

So this reasoning breakthrough was big, but it was also the first time that a model needed to sit and think. And that's a weird paradigm for a consumer product. You don't normally have something where you might need to hang out for 25 seconds after you ask a question. So we were trying to figure out: what's the UI for this? With deep research, where the model's going to go and think for 25 minutes sometimes, it's actually not that hard, because you're not going to sit and watch it for 25 minutes. You're going to go do something else, go to another tab or get lunch or whatever, and then you'll come back and it's done. But when it's 20, 25 seconds, or even 10 seconds, that's a long time to wait, but it's not long enough to go do something else.

So you can think: if you asked me something that I needed 20 seconds of thought to answer, what would I do? I wouldn't just go mute, not say anything, shut down for 20 seconds, and then come back. So we shouldn't do that; we shouldn't just have a spinner sitting there. That's annoying. But I also wouldn't just start babbling every single thought I had. So we probably shouldn't expose the whole chain of thought as the model's thinking. But I might go, "That's a good question. All right, let me think." I might approach it like that and then think, maybe giving little updates along the way. And that's actually what we ended up shipping.

You have similar things where you can find situations in which you sometimes get better thinking out of a group of models that all attack the same problem, and then a model that looks at all their outputs, integrates them, and gives you a single answer at the end. That sounds a little bit like brainstorming. I certainly have better ideas when I get in a room and brainstorm with other people, because they think differently than me.

So anyway, there are all these situations where you can reason about it like a group of humans or an individual human, and it works, which, I don't know, maybe I shouldn't have been surprised by, but I was.
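The group-of-models pattern Kevin describes (several models attack the same problem, one model integrates their outputs) can be sketched in a few lines. This is a hedged illustration, not OpenAI's implementation: the lambda "models" and the majority-vote aggregator are stand-ins for real model calls and a real integrator model.

```python
from collections import Counter

def ensemble_answer(models, prompt, aggregate):
    """Fan the same prompt out to several models, then integrate."""
    drafts = [model(prompt) for model in models]
    return aggregate(drafts)

def majority_vote(drafts):
    # Simplest possible integrator: take the most common draft.
    return Counter(drafts).most_common(1)[0][0]

# Stub "models" standing in for real model calls; two of the three
# agree on the answer, so the ensemble recovers it.
models = [
    lambda prompt: "42",
    lambda prompt: "42",
    lambda prompt: "41",
]

final = ensemble_answer(models, "What is 6 * 7?", majority_vote)  # "42"
```

A real system would typically use another model as the integrator, prompting it with all the drafts, rather than simple voting; the structure (fan out, then integrate) is the same.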
**Lenny:** 这太有意思了。因为当我看到这些模型工作时,我从来没想过你们是在设计那个体验。对我来说感觉就是——这是 LLM 做的事,它就在那告诉我它在想什么。我很喜欢你说的这个点——让它感觉像一个人在工作。那人是怎么工作的呢?嗯,他们大声想。他们思考,"这是我该探索的方向。" 而 Deep Research 是这个的极端版本——它们就是告诉你"这是我正在做和想的所有事情。" 而人们其实也喜欢这样。这让你惊讶吗——"也许这也行得通,人们似乎喜欢看到所有东西"?
**Lenny:** That is so interesting, because when I see these models operate, I never even thought about you designing that experience. To me it just feels like this is what the LLM does: it sits there and tells me what it's thinking. And I love this point you're making of "let's make it feel like a human operating." Well, how does a human operate? They think aloud: "Here's the thing I should explore." And deep research takes that to the extreme, where it's just, "Here's everything I'm doing and thinking." And people actually like that too, I guess. Was that surprising to you, that that could work too, that people seem to like seeing everything?
**Kevin Weil:** 是的。实际上我们从中学到了东西。因为我们第一次发布的时候,只给你模型在想什么的小标题,不太多。然后 DeepSeek 发布了——信息量非常大。我们就想,我不确定每个人都想要那个。看模型在想什么有一定新鲜感。我们内部看的时候也有这种感觉——看模型的思维链挺有意思的。但在 4 亿人的规模上——我不觉得你想看模型絮絮叨叨说一堆东西。
所以我们最终做的是用有意思的方式总结它。不只是给小标题,而是给一两句话关于它怎么思考的,你可以从中学习。所以我们试图找一个中间地带,我们觉得这个体验对大多数人是有意义的——但给每个人三段话大概不是正确答案。
**Kevin Weil:** Yeah, and we actually learned from that, because when we first launched it, we gave you the subheadings of what the model was thinking about, but not much more. And then DeepSeek launched, and it showed a lot, and we went, "I don't know if everyone wants that." There's some novelty effect to seeing what the model's really thinking about. We felt that too when we were looking at it internally; it's interesting to see the model's chain of thought. But at the scale of 400 million people, I don't think you want to watch the model babble a bunch of things.

So what we ended up doing was summarizing it in interesting ways. Instead of just getting the subheadings, you get one or two sentences about how it's thinking about the problem, and you can learn from that. So we tried to find a middle ground, an experience we thought would be meaningful for most people, because showing everybody three paragraphs is probably not the right answer.
**Lenny:** 这让我想起你在峰会上说的另一件事,一直留在我脑子里。就是关于 chat——人们总是嘲笑说 chat 不是我们跟 AI 交互的未来界面。但你提了一个很有意思的反面观点:作为人类,我们通过对话来交流,一个人的智商可以从很低到很高,但说话都管用。Chat 也是一样的,它能在各种智能水平上工作。也许我刚才已经说了,但关于为什么 chat 最终成为 LLM 如此有趣的界面——还有什么想补充的?
**Lenny:** This reminds me of something else you said at the summit that has really stuck with me. People always make fun of chat, saying it's not the future interface for how we interact with AI, but you made this really interesting point that argues the other side, which is that as humans we interface by talking, and the IQ of a human can span from really low to really high and talking works across all of it, and chat is the same thing: it can work at all kinds of intelligence levels. Maybe I just shared it, but is there anything there about why chat actually ends up being such an interesting interface for LLMs?
**Kevin Weil:** 是的。我不知道,也许这是我相信但大多数人不信的事之一。我确实认为 chat 是一个很棒的界面,因为它如此通用。人们倾向于说:"Chat。嗯,我们会想出更好的东西。" 我觉得它非常普适,因为这就是我们说话的方式。我能像现在这样口头跟你交谈。我们能看到彼此并互动。我们可以在 WhatsApp 上发消息。但所有这些都是这种非结构化的沟通方式——这就是我们运作的方式。如果我跟你交谈时有某种更刚性的界面限制我能用什么,我能跟你聊的东西就少得多,它实际上会阻碍我们达到最大的沟通带宽。
所以这有某种魔力。顺便说,过去这从来不好使,因为没有一个模型能够理解人类语言的所有复杂性和细微之处——这就是 LLM 的魔法。所以对我来说,它是一种精确匹配这些模型能力的界面。这并不意味着我总是想打字——但你确实想要那种非常开放的、灵活的沟通媒介。也许是我们在说话、模型在说回来。但你仍然想要那种最低公分母的、无限制的交互方式。
**Kevin Weil:** Yeah. I don't know, maybe this is one of those things I believe that most people don't, but I actually think chat is an amazing interface because it's so versatile. People tend to go, "Chat. Yeah. We'll figure out something better." And I think it's incredibly universal, because it is the way we talk. I can talk to you verbally, like we're talking now. We can see each other and interact. We can be texting each other on WhatsApp. But all of these things are this unstructured method of communication, and that's how we operate. If I had some more rigid interface that limited what I could use when we spoke, I would be able to speak to you about far fewer things, and it would actually get in the way of us having maximum communication bandwidth.

So there's something magical about it. And by the way, in the past it never worked, because there wasn't a model that was good at understanding all of the complexity and nuance of human speech; that's the magic of LLMs. So to me, it's an interface that's exactly fit to the power of these models. That doesn't mean I necessarily always want to type, but you do want that very open-ended, flexible communication medium. It may be that I'm speaking and the model's speaking back to me, but you still want that lowest-common-denominator, no-restrictions way of interacting.
**Lenny:** 这太有意思了。你那个观点真的改变了我思考这些东西的方式——chat 就是这么适合与超级智能对话这个非常具体的问题。
**Lenny:** That is so interesting. That's really changed the way I think about this stuff is that point that chat is just so good for this very specific problem of talking to superintelligence basically.
**Kevin Weil:** 顺便说一下,我觉得也不是只有 chat。如果你有高频使用场景,它们更规定性(prescribed),你实际上不需要那么全面的通用性——有很多使用场景里,有一个不那么灵活、更规定性、更快到达特定任务的东西更好。那些也很棒,你可以做各种各样的。但你仍然想要 chat 作为基准(baseline),来承接任何超出你所构建的垂直场景的东西。它是一个万能容器——承接你可能想对模型表达的任何东西。
[广告段落]
[Ad segment]
**Kevin Weil:** By the way, I think it's not only chat, either. If you have high-volume use cases that are more prescribed, where you don't actually need the full generality, there are many use cases where it's better to have something that's less flexible, more prescribed, and faster for a specific task. Those are great too, and you can build all sorts of them. But you still want chat as the baseline for anything that falls outside whatever vertical you happen to be building for. It's a catch-all for every possible thing you'd ever want to express to a model.
**Lenny:** 我想回到你之前提到的研究人员和产品团队的关系。我想象中很多创新来自研究人员的某种直觉,然后做出惊人的东西再发布出去,也有一些想法来自 PM 和工程师。这些团队是怎么协作的?每个团队都有 PM 吗?是研究驱动的东西居多吗?给我们讲讲,想法和产品主要是从哪里来的。
**Lenny:** I want to come back to what you said about researchers and their relationship with product teams. I imagine a lot of innovation comes from researchers just having an inkling, building something amazing, and then releasing it, and some ideas come from PMs and engineers. How do those teams collaborate? Does every team have a PM? Is it mostly research-led? Give us a sense of where ideas and products mostly come from.
**Kevin Weil:** 这是我们在大力改进的领域。坦白说,我对此非常兴奋。如果你回到几年前,ChatGPT 刚起步的时候(显然那时我还不在 OpenAI),我们当时更像是一家纯研究公司。ChatGPT,你还记得吧,当时只是一个低调的研究预览。
**Kevin Weil:** It's an area where we're evolving a lot. I'm really excited about it, frankly. If you go back a couple of years, when ChatGPT was just getting started (obviously I wasn't at OpenAI yet), we were more of a pure research company at the time. ChatGPT, if you remember, was a low-key research preview.
**Lenny:** 对,很多年了。
**Lenny:** For many years.
**Kevin Weil:** 它不是团队推出时就觉得会成为这种大规模产品的东西。哦,ChatGPT,对。它只是我们让人们试用和迭代模型的一种方式。所以我们主要是一家研究公司,一家世界级的研究公司。随着 ChatGPT 的增长,随着我们建立了 B2B 产品、API 和其他东西,现在我们比以前更像一家产品公司了。但我仍然认为——OpenAI 不应该永远是一家纯产品公司。我们需要同时是世界级的研究公司和世界级的产品公司,两者需要真正协同合作,而这是我觉得我们过去六个月做得越来越好的事情。如果你把这两件事分开——研究人员去做惊人的东西、构建模型,然后模型到了某种状态,产品和工程团队再去拿来做点什么——那我们实际上就只是自己模型的 API 消费者。但最好的产品,就像我之前讲 Deep Research 时说的,是大量的迭代反馈。是理解你要卖的产品或要解决的问题,为它们构建 eval,用这些 eval 去收集数据、微调(fine-tune)模型,让模型在你要解决的这些用例上变得更好。要做好这件事需要大量的来回协作。我认为最好的产品是工程、产品、设计和研究作为一个团队来共同构建新颖的东西。所以这实际上是我们试图用基本上所有产品来做的方式。这对我们来说是新的肌肉,因为我们作为产品公司还比较新,但人们对此非常兴奋,因为我们发现每次这样做,都能做出很棒的东西,所以现在每个产品都是这样起步的。
**Kevin Weil:** Yeah. It wasn't a thing that the team launched thinking it was going to be this massive product. It was just a way we were going to let people play with and iterate on the models. So we were primarily a research company, a world-class research company. As ChatGPT has grown, and as we've built our B2B products and our APIs and other things, we're now more of a product company than we were. But I still think OpenAI should never be a pure product company. We need to be both a world-class research company and a world-class product company, and the two need to really work together, and that's something I think we've been getting much better at over the last six months.

If you treat those things separately, where the researchers go do amazing things and build models, the models get to some state, and then the product and engineering teams take them and do something with them, we're effectively just an API consumer of our own models. The best products, though, come from what I was describing with deep research: a lot of iterative feedback. It's understanding the products you're trying to sell or the problems you're trying to solve, building evals for them, and using those evals to gather data and fine-tune models so they get better at the use cases you're looking to solve. It takes a huge amount of back and forth to do it well. And I think the best products are going to be engineering, product, design, and research working together as a single team to build novel things.

So that's how we're trying to operate with basically everything we build. It's a new muscle for us, because we're relatively new as a product company, but it's one that people are really excited about, because every time we do it, we build something awesome. So now every product starts like that.
**Lenny:** OpenAI 有多少 PM?不知道你能不能分享这个数字,如果可以的话。
**Lenny:** How many product managers do you have at OpenAI? I don't know if you share that number, but if you do.
**Kevin Weil:** 其实不多。我不确定,大概 25 个?也许再多一点。我个人的信念是,作为一个组织,你应该保持相对少的 PM。我带着爱说这话,因为我自己就是 PM,但太多 PM 会造成问题。我们会用方案和想法填满世界,而不是执行。所以我觉得当一个 PM 服务的工程师稍微多了一点时,这是好事——因为这意味着他们不会去微观管理,你会把很多影响力和责任留给工程师去做决策。这意味着你需要非常有产品意识的工程师,我们很幸运拥有这样的团队。我们有一个非常有产品感、高 agency 的工程团队。当你有了这样的配置,你有一个感觉超级被赋能的团队,有一个 PM 在真正试图理解问题、温和地引导团队,但同时事情太多以至于无法深入太多细节,最终你就能走得非常快。所以这就是我们的理念。我们要产品工程负责人和贯穿始终的产品型工程师。我们不要太多 PM,但要非常优秀、高质量的 PM,到目前为止这似乎运转得很好。
**Kevin Weil:** Not that many, actually. I don't know, 25? Maybe a little more than that. My personal belief is that you want to be pretty PM-light as an organization in general. I say this with love, because I am a PM, but too many PMs causes problems: you fill the world with decks and ideas instead of execution. So I think it's a good thing when a PM is working with maybe slightly too many engineers, because it means they're not going to get in and micromanage; you're going to leave a lot of influence and responsibility with the engineers to make decisions. It means you want really product-focused engineers, which we're fortunate to have. We have an amazingly product-focused, high-agency engineering team. When you have something like that, you have a team that feels super empowered, and you have a PM who's trying to really understand the problems and gently guide the team, but who has too much going on to get too far into the details, and you end up being able to move really fast. So that's the philosophy we take. We want product engineering leads and product-minded engineers all the way through. We want not too many PMs, but really awesome, high-quality ones, and so far that seems to be working pretty well.
**Lenny:** 我想象在 OpenAI 做 PM 对很多人来说是梦想成真。同时,我觉得它不适合很多人。有研究人员参与,有很有产品思维的工程师。你在招 PM 时看重什么?对于那些想着"也许我不应该去那儿,我甚至不该想这事"的人来说。
**Lenny:** I imagine being a PM at OpenAI is a dream come true for a lot of people. At the same time, I imagine it's not a fit for a lot of people; there are researchers involved, very product-minded engineers. What do you look for in the PMs you hire, for folks wondering, "Maybe I shouldn't go work there; maybe I shouldn't even think about it"?
**Kevin Weil:** 我觉得——我说过好几次了——高 agency 是我们非常看重的东西。就是那种不会进来等着所有人允许他做某事的人,他们会看到一个问题然后直接去做。这就是我们工作方式的核心部分。我觉得还需要能接受模糊性(ambiguity)的人,因为这里有大量的模糊性。这不是那种——我们有时候跟比较初级的 PM 会有麻烦,就是因为这个——这里不是那种有人会过来说"好,这是全局,这是你的领域,我要你去做这件事"的地方。那是你作为早期职业 PM 想要的东西。我的意思是,这里没人有时间,问题也太不成形了,我们都是边做边想明白的。所以高 agency,非常适应模糊性,准备好来了就帮忙执行、快速推进。这就是我们的配方。我觉得还要乐于通过影响力来领导,因为……作为 PM 这很正常嘛,人们不直接汇报给你,你的团队不直接汇报给你,等等。但你还有研究职能的复杂性,这个更加自主导向,所以和研究团队建立好的关系非常重要。我觉得情商那一面对我们来说也超级重要。
**Kevin Weil:** I've said this a few times, but high agency is something we really look for: people who are not going to come in and wait for everyone else to allow them to do something, who are just going to see a problem and go solve it. It's a core part of how we work. I also think you need people who are happy with ambiguity, because there is a massive amount of ambiguity here. This is not the kind of place where someone is going to come in and say, "Okay, here's the landscape, here's your area, I want you to go do this thing." We have trouble sometimes with more junior PMs because of this, because that's exactly what you want as an early-career PM. No one here has time, the problems are too ill-formed, and we're figuring them all out as we go. So: high agency, very comfortable with ambiguity, ready to come in, help execute, and move really quickly. That's our recipe. And I think you also have to be happy leading through influence, because, as usual for a PM, people don't report to you, your team doesn't report to you, et cetera. But you also have the complexity of a research function, which is even more self-directed, so it's really important to build a good rapport with the research team. The EQ side of things is also super important for us.
**Lenny:** 我知道在大多数公司,PM 进来的时候大家就会说"我们为什么需要你?"作为 PM 你必须赢得信任,让人们看到价值。我感觉在 OpenAI 这可能是极端版本,他们会说"我们为什么需要这个人?我们有研究人员,有工程师,你来这里要做什么?"
**Lenny:** I know at most companies, a PM comes in and they're just like, "Why do we need you?" And as a PM you have to earn trust and help people see the value, and I feel like at OpenAI it's probably a very extreme version of that where they're like, "Why do we need this person? We have researchers, engineers, what are you going to do here?"
**Kevin Weil:** 对,我觉得人们如果做对了会很感激,但你需要带着人们一起走。我觉得一个 PM 能做好的最重要的事之一是果断。这里有一条很细的线。你不想——我是说,我不是一直都喜欢"PM 是产品的 CEO"这种说法,但就像 Sam 在他的角色中——如果他在参加的每个会议中都做每一个决定,他会犯错。如果他在任何会议中都不做决定,他也会犯错,对吧?关键是理解什么时候该听团队的、让人们去创新,什么时候有一个决定需要做出——要么人们不舒服做,要么不觉得被授权做,或者一个决定有太多分散在大群人中的不同利弊,需要有人果断做出选择——这是 CEO 非常重要的特质。Sam 在这方面做得很好,这也是 PM 在更微观层面非常重要的特质。所以因为有这么多模糊性,在很多情况下答案并不明显,所以有一个 PM 能够进来——顺便说,这不一定非得是 PM,如果是其他人我也完全高兴——但我一般看 PM 的标准是,如果有模糊性而且没人在做决定,你最好确保我们做出决定并且继续往前走。
**Kevin Weil:** Yeah, I think people appreciate it done right, but you bring people along. I think one of the most important things a PM can do well is be decisive. So there's a real fine line. You don't want to be making... I mean, it's kind of like, I don't love the "PM as the CEO of the product" analogy all the time, but just like Sam in his role would be making mistakes if he made every single decision in every meeting that he was in. And he would also be making mistakes if he made no decisions in any meetings that he was in, right? It's understanding when to defer to your team and to let people innovate. And when there is a decision to be made that people either don't feel comfortable with or don't feel empowered to make, or a decision that has too many different disparate pros and cons that are spread out across a big group and someone needs to be decisive and make a call, it's a really important trait of a CEO. It's something Sam does well, and it's also a really important trait of a PM kind of at a more microscopic level. So because there's so much ambiguity, it's not obvious what the answer is in a lot of cases, and so having a PM that can come in and... And by the way, this doesn't need to be a PM, I'm perfectly happy if it's anybody else, but I kind of look to the PM to say, if there's ambiguity and no one's making a call, you better make sure that we get a call made and we move forward.
**Lenny:** 这涉及到我做过的几篇文章——AI 会在哪些方面取代我们的工作 vs 在哪些方面辅助我们。让我从另一个方向来问这个问题——AI 如何影响产品团队和招聘之类的事情。首先,关于 LLM 帮我们写代码有很多讨论。Anthropic 的 Dario 说一年内 90% 的代码会由 AI 编写。同时你们在疯狂招工程师、疯狂招 PM。每个职能都"死了",但你们每个职能都在招。我想先问一下,你和团队——比如工程师、PM——在工作中怎么使用 AI?有没有什么特别有意思的,或者你觉得人们在日常使用 AI 方面忽略了什么?
**Lenny:** This touches on a few posts I've done of just, where is AI going to take over work that we do versus help us with various work? So let me come at this question from a different direction of just how AI impacts product teams and hiring, things like that. So first of all, there's all this talk of LLMs doing our coding for us, and 90% of code is going to be written by AI in a year. Dario at Anthropic said that. At the same time, you guys are all hiring engineers like crazy, PMs like crazy. Every function is dead, but you're still hiring every single one. I guess just, first of all, let me just ask this, how do you and the team, say engineers, PMs, use AI in your work? Is there anything that's really interesting or things that you think people are sleeping on in how you use AI in your day-to-day work?
**Kevin Weil:** 我们用得很多。我们每个人都一直在用 ChatGPT 做总结文档、用它帮忙写文档、用写产品规格的 GPT 之类的,所有你能想到的。说到写 eval,你实际上可以用模型帮你写 eval,而且它们做得相当好。话虽如此,我对我们——我真的是说对我自己——还是挺失望的。如果我把五年前在其他公司做产品领导时的我瞬移到现在的日常工作中,我还是能认出来。而我觉得我们应该处于——肯定一年后,甚至可能现在——我几乎认不出来的状态,因为工作流程完全不同了,我在大量使用 AI,但今天我还是能认出来。所以从某种意义上说,我做得还不够好。举个例子,我们为什么不到处 vibe coding 演示呢?与其在 Figma 里展示东西,我们应该展示人们在 30 分钟内 vibe coding 出来的原型,来说明概念验证和探索想法。这在今天完全可以做到,但我们做得不够多。实际上,我们的首席人力官 Julia 前几天跟我说,她 vibe coded 了一个她在上一家公司有的内部工具——她很想在 OpenAI 也有这个工具——她打开了 Windsurf 还是什么,就 vibe coded 出来了。这有多酷?如果我们的首席人力官都在这么做,我们没有借口不多做。
**Kevin Weil:** We use it a lot. I mean, every one of us is in ChatGPT all the time summarizing docs, using it to help write docs, with GPTs that write product specs and things like that, all the stuff that you would imagine. I mean, talk about writing evals, you can actually use models to help you write evals and they're pretty good at it. That all said, I'm still sort of disappointed by us, and I really mean me: if I were to just teleport myself from five years ago, leading product at some other company, into my day job, I would recognize it still. And I think we should be in a world, certainly a year from now, maybe even now, where I almost wouldn't recognize it because the workflows are so different and I'm using AI so heavily, and I'd still recognize it today. So I think in some sense, I'm not doing a good enough job of that. Just to give an example, why shouldn't we be vibe coding demos right, left and center? Instead of showing stuff in Figma, we should be showing prototypes that people are vibe coding over the course of 30 minutes to illustrate proofs of concept and to explore ideas. That's totally possible today, and we're not doing it enough. Actually, our chief people officer, Julia, was telling me the other day, she vibe coded an internal tool that she had at a previous job that she really wanted to have here at OpenAI and she opened, I don't know, Windsurf or something, and vibe coded it. How cool is that? And if our chief people officer is doing it, we have no excuse to not be doing it more.
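Kevin's point about using models to help write evals boils down to a small harness: a set of prompts with known good answers, and a score. This is a minimal sketch, where `CASES`, `system_under_test`, and the exact-match grading are all illustrative stand-ins; in practice you would call a model API, and could also use a model to generate the cases and grade free-form answers.

```python
# Minimal eval harness: run a system under test against graded cases.
# CASES and system_under_test are stand-ins; a real setup would call a
# model API, and could also use a model to generate and grade cases.

CASES = [
    {"prompt": "2+2", "expected": "4"},
    {"prompt": "capital of France", "expected": "Paris"},
]

def system_under_test(prompt: str) -> str:
    # Canned answers standing in for a real model call.
    canned = {"2+2": "4", "capital of France": "Paris"}
    return canned.get(prompt, "")

def run_eval(cases, answer_fn):
    """Fraction of cases where the answer matches exactly."""
    passed = sum(1 for c in cases if answer_fn(c["prompt"]) == c["expected"])
    return passed / len(cases)

print(run_eval(CASES, system_under_test))  # prints 1.0 for the canned stub
```

The useful property is that the same harness keeps scoring each new model or prompt revision against the same cases, which is what makes "the model got better" a measurable claim rather than a vibe.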
**Lenny:** 这是一个很棒的故事。有些人可能没听过 vibe coding 这个词。你能描述一下它是什么意思吗?
**Lenny:** That's an awesome story. And some people may not have heard this term vibe coding. Can you describe what that means?
**Kevin Weil:** 对,我觉得这是 Andrej 的词。
**Kevin Weil:** Yeah, I think this was Andrej's term.
**Lenny:** Karpathy。
**Lenny:** Karpathy.
**Kevin Weil:** 对,Andrej Karpathy。你有 Cursor 和 Windsurf 和 GitHub Copilot 这些工具,它们非常擅长建议你可能想写的代码。你可以给它们一个 prompt,它们会写代码,然后当你去编辑时,它在建议你可能想做什么。所有人一开始用这些东西的方式是——给它一个 prompt,让它做东西,你去编辑,给它一个 prompt,你基本上一直在和模型来回交互。随着模型越来越好,随着人们越来越习惯,你可以有点放开方向盘。当模型在建议东西时,就是一直按 tab、tab、tab、tab、tab。继续。是、是、是、是、是。当然模型会犯错,或者它做了什么东西编译不过,但当它编译不过时,你把错误粘贴进去说,继续、继续、继续、继续、继续。然后你测试一下它有一个地方做的不是你想要的,所以你输入一个指令说,继续、继续、继续、继续、继续,你就让模型做它的事。这不是说你今天会用这种方式来写需要非常严谨的生产代码,但对于很多事情——你想做概念验证,你想做演示——你真的可以把手从方向盘上拿开,模型会做得很好,这就是 vibe coding。
**Kevin Weil:** Yeah. Andrej Karpathy. Yeah. So you have these tools like Cursor and Windsurf and GitHub Copilot that are very good at suggesting what code you might want to write. So you can give them a prompt and they'll write code and then as you go to edit it, it's suggesting what you might want to do. And the way that everyone started using that stuff was, give it a prompt, have it do stuff, you go edit it, give it a prompt, and you're kind of really going back and forth with the model the whole time. As the models are getting better and as people are getting more used to it, you can kind of just let go of the wheel a little bit. And when the model's suggesting stuff, it's just like, tab, tab, tab, tab, tab. Keep going. Yes, yes, yes, yes, yes. And of course the model makes mistakes or it does something that doesn't compile, but when it doesn't compile, you paste the error in and you say, go, go, go, go, go. And then you test it out and it does one thing that you don't want it to do, so you enter in an instruction and say, go, go, go, go, go, and you just let the model do its thing. And it's not that you would do that for production code that needed to be super tight today yet, but for so many things, you're trying to get to a proof of concept, you're getting to a demo, and you can really take your hands off the wheel and the model will do an amazing job, and that's vibe coding.
**Lenny:** 这是一个很棒的解释。我觉得它的进阶版本,我想 Andrej 描述的时候也提到你是在跟模型说话——有个像 Whisper 或 Super Whisper 之类的步骤,你是在和模型对话,甚至不是打字。
**Lenny:** That's an awesome explanation. I think the pro version of that, which is, I think, the way Andrej even described it: as you talk, there's a tool like Whisper or Superwhisper or something like that where you're talking to the model, not even typing.
**Kevin Weil:** 对,完全是。天哪。
**Kevin Weil:** Yeah, totally. Oh man.
**Lenny:** 让我问一下,当你看未来的产品团队——你说过你们应该更多地这样做——用原型代替设计——你觉得产品团队的结构或组建方式最大的变化会是什么?你觉得未来几年事情会往哪个方向走?
**Lenny:** So let me just ask, I guess, when you look at product teams in the future, you talked about how you guys should be doing this more, instead of designs, having prototypes, what do you think might be the biggest changes in how product teams are structured or built? Where do you think things are going in the next few years?
**Kevin Weil:** 我觉得你肯定会生活在一个每个产品团队都内嵌研究人员的世界。而且我不仅仅是指基础模型公司(foundation model companies),因为我觉得未来——坦白说,有一件事让我对我们行业整体感到惊讶,就是微调模型(fine-tuned models)的使用没有更广泛。很多人——这些模型非常好,所以我们的 API 能做很多事情做得很好,但当你有特定用例时,你总是可以通过微调让模型在特定用例上表现更好。这可能只是时间问题。大家还没有完全习惯在每种情况下都这样做。但对我来说,毫无疑问这就是未来。模型将无处不在,就像晶体管(transistors)无处不在一样,AI 将成为我们所做一切的结构性组成部分。但我觉得会有很多微调模型,因为你为什么不想针对特定用例更具体地定制模型呢?所以我觉得你会想要半研究员半机器学习工程师类型的人作为几乎每个团队的一部分,因为微调模型将成为构建大多数产品的核心工作流的一部分。这是一个变化,你可能已经在基础模型公司开始看到了,随着时间会传播到更多团队。
**Kevin Weil:** I think you're definitely going to live in a world where you have researchers built into every product team. And I don't even mean just at foundation model companies, because I think the future... Actually, frankly, one thing that I'm sort of surprised about in our industry in general is that there's not a greater use of fine-tuned models. A lot of people... These models are very good, so our API does a lot of things really well, but when you have particular use cases, you can always make the model perform better on a particular use case by fine-tuning it. It's probably just a matter of time. Folks aren't quite comfortable yet with doing that in every case. But to me, there's no question that that's the future. Models are going to be everywhere just like transistors are everywhere, AI is going to be just a part of the fabric of everything we do, but I think there are going to be a lot of fine-tuned models, because why would you not want to more specifically customize a model against a particular use case? And so I think you're going to want sort of quasi researcher machine learning engineer types as part of pretty much every team, because fine-tuning a model is just going to be part of the core workflow for building most products. So that's one change that maybe you're starting to see at foundation model companies that will propagate out to more teams over time.
**Lenny:** 我好奇有没有一个具体的例子能让这变得更实际。我说一个你讲话时我想到的——当你看 Cursor 和 Windsurf,我从那些创始人那里学到的是,他们用 Sonnet,但他们也有一堆自定义模型在边缘帮忙,让那些不仅仅是生成代码的特定体验变得更好——比如自动补全和预判你要往哪走。这算一个吗?或者有其他例子说明——什么是微调模型?你觉得团队有了研究人员会用这些做什么?
**Lenny:** I'm curious if there's a concrete example that makes that real, and I'll share one that comes to mind as you talk, which is, when you look at Cursor and Windsurf, something I learned from those founders is that they use Sonnet, but then they also have a bunch of custom models that help along the edges that make the specific experience that's not just generating code even better, like auto-complete and looking ahead to where things are going. So is that one, or any other examples? What is a fine-tuned model? What do you think teams will be building with these researchers on their teams?
**Kevin Weil:** 对。当你微调一个模型时,你基本上是给模型一大堆你想让它变得更擅长的东西的示例。就是"这是一个问题,这是一个好的回答。这是一个问题,这是一个好的回答。"或者"这是一个问题,这是一个好的回答,重复一千次或一万次。"然后突然之间你就在教模型在那个特定事物上比它开箱即用时好得多。我们内部到处都在用。我们在内部使用模型集成(ensemble)的程度比人们想象的要多得多。所以不是"我有 10 个不同的问题,我就拿基线 GPT-4o 来问这些问题。"如果我们有 10 个不同的问题,我们可能用 20 个不同的模型调用来解决它们,其中一些用的是专门的微调模型,用的是不同大小的模型——因为你对不同问题可能有不同的延迟要求或成本要求。它们可能对每个问题都用了定制的 prompt。基本上你想教模型在——你想把问题分解成更具体的任务,而不是某些更宽泛的高层级任务。然后你可以非常有针对性地使用模型,让每个单独的任务都做得非常好。然后你有一个集成来处理整个事情。我觉得很多好公司今天都在这样做。我还是看到很多公司给模型单一的、通用的、宽泛的问题,而不是把问题分解。我觉得会有更多的问题分解,用特定模型做特定事情,包括微调。
**Kevin Weil:** Yeah. I mean, so when you fine-tune a model, you're basically giving the model a bunch of examples of the kinds of things you want it to be better at. So it's, "Here's a problem, here's a good answer. Here's a problem, here's a good answer," or, "Here's a question, here's a good answer," times a thousand or 10,000. And suddenly you're teaching the model to be much better than it was out of the gate at that particular thing. We use it everywhere internally. We use ensembles of models much more internally than people might think. So it's not, "I have 10 different problems. I'll just ask baseline GPT-4o about a bunch of these things." If we have 10 different problems, we might solve them using 20 different model calls, some of which are using specialized fine-tuned models, they're using models of different sizes, because maybe you have different latency requirements or cost requirements for different questions. They are probably using custom prompts for each one. Basically you want to teach the model to be really good at... You want to break the problem down into more specific tasks versus some broader set of high-level tasks. And then you can use models very specifically to get very good at each individual thing. And then you have an ensemble that tackles the whole thing. I think a lot of good companies are doing that today. I still see a lot of companies giving the model single, generic, broad problems versus breaking the problem down, and I think there will be more breaking the problem down, using specific models for specific things, including fine-tuning.
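The "here's a problem, here's a good answer, times a thousand" framing maps directly onto fine-tuning training files. This is a minimal sketch of preparing that data, using a chat-style JSONL shape similar to what fine-tuning APIs accept; the exact schema varies by provider, and the example pairs are made up.

```python
import json

# Fine-tuning data is just (problem, good answer) pairs, repeated at scale.
# The chat-style JSONL below is similar in shape to what fine-tuning APIs
# accept, but the exact schema varies by provider; the pairs are made up.

pairs = [
    ("How do I reset my password?",
     "Go to Settings > Security and choose 'Reset password'."),
    ("Can I get a refund?",
     "Yes, within 30 days of purchase. I can start that for you now."),
]

def to_jsonl(pairs):
    """One JSON record per line: a user turn and the good assistant turn."""
    lines = []
    for problem, good_answer in pairs:
        record = {"messages": [
            {"role": "user", "content": problem},
            {"role": "assistant", "content": good_answer},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

print(to_jsonl(pairs))
```

Scaling this from two pairs to ten thousand is mostly a data-collection problem, which is why teams that already log good human answers (support tickets, review queues) have a head start on fine-tuning.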
**Lenny:** 所以在你们的情况下——因为这真的很有趣——你们在用不同层次的 ChatGPT,比如 o1、o3 和更早的东西因为更便宜。
**Lenny:** And so in your case, because this is really interesting, is that you're using different levels of ChatGPT, like o1, o3, and stuff that's earlier because it's cheaper.
**Kevin Weil:** 我们的内部栈会有这样的部分。我给你举个例子。客服——有 4 亿多周活用户,我们收到大量的工单。我不知道我们有多少客服人员,但不多,30、40 个,我不确定,比任何同等规模的公司都要少得多。这是因为我们自动化了很多流程。大多数问题我们都用内部资源、知识库、我们回答问题的指南、什么样的个性等等来处理。你可以教模型这些东西,然后让它自动回答大量问题。或者在它没有完全信心回答某个特定问题时,它仍然可以建议一个答案,请求人类审查,然后那个人类的回答实际上就是对模型的一种微调数据。你在告诉它某个特定情况下的正确答案。我们在各种地方都在用——有些地方你想要多一些推理(reasoning),延迟不是特别敏感,所以你想要多一些推理,我们会用 O 系列模型。其他地方你想快速检查一下什么东西,所以用 4o mini 就行,超快超便宜。总的来说就是——特定模型做特定目的,然后你把它们集成在一起解决问题。顺便说,这和我们人类解决问题的方式也很像。一家公司其实就是一个模型的集成,所有模型都基于我们在大学学了什么、我们在职业生涯中学到了什么进行了微调。我们都被微调出了不同的技能集,你把它们以不同的配置组合在一起,集成的输出比任何单个个体的输出都好得多。
**Kevin Weil:** There'll be parts of our internal stack. I'll give you an example. Customer support: with 400-plus million weekly active users, we get a lot of inbound tickets. I don't know how many customer support folks we have, but it's not very many, 30, 40, I'm not sure, way smaller than you would have at any comparable company, and it's because we've automated a lot of our flows. We handle most questions using our internal resources, knowledge base, guidelines for how we answer questions, what kind of personality, et cetera. You can teach the model those things and then have it do a lot of its answers automatically, or where it doesn't have the full confidence to answer a particular question, it can still suggest an answer, request a human to look at it, and then that human's answer actually is its own sort of fine-tuning data for the model. You're telling it the right answer in a particular case. We're using it in various places. Some of these places, you want a little bit more reasoning and it's not super latency sensitive, so you want a little more reasoning, and we'll use one of our o-series models. In other places, you want a quick check on something and so you're fine to use GPT-4o mini, which is super fast and super cheap. In general, it's like specific models for specific purposes and then you ensemble them together to solve problems. By the way, again, not unlike how we as humans solve problems, a company is arguably an ensemble of models that have all been fine-tuned based on what we studied in college and what we have learned over the course of our careers. We've all been fine-tuned to have different sets of skills and you group them together in different configurations and the output of the ensemble is much better than the output of any one individual.
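The support flow Kevin describes can be sketched as a confidence gate: auto-answer when the model is sure, otherwise route the draft to a human, and keep the human's final answer as future fine-tuning data. Everything here, the helper names, the 0.9 threshold, and the canned answers, is illustrative, not OpenAI's actual system.

```python
# Confidence-gated support flow: auto-answer when the model is sure,
# otherwise have a human review the draft and keep the human's final
# answer as future fine-tuning data. All names, answers, and the 0.9
# threshold are illustrative, not OpenAI's actual system.

FINE_TUNE_QUEUE = []  # (question, correct answer) pairs to train on later

def model_answer(question):
    """Stand-in for a real model call returning (answer, confidence)."""
    known = {"reset password": ("Use Settings > Security.", 0.97)}
    return known.get(question, ("I think you should contact billing.", 0.40))

def handle_ticket(question, human_review):
    answer, confidence = model_answer(question)
    if confidence >= 0.9:
        return answer  # confident: ship the answer automatically
    # Not confident: a human edits the draft, and their final answer
    # becomes a new training example for the next fine-tune.
    final = human_review(question, draft=answer)
    FINE_TUNE_QUEUE.append((question, final))
    return final
```

Calling `handle_ticket` on an unfamiliar question falls through to the human path and records the reviewer's answer in `FINE_TUNE_QUEUE`, which is exactly the "human's answer is its own fine-tuning data" loop described above.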
**Lenny:** Kevin,你把我震住了。这听起来完全正确。而且不同的人——你付他们更少的钱,他们交流成本更低,有些人回答问题需要很长时间,有些人会产生幻觉(hallucinating)。这——
**Lenny:** Kevin, you're blowing my mind. That sounds exactly correct. And also, different people, you pay them less, they cost less to talk to, some people take a long time to answer, some people hallucinate. This is...
**Kevin Weil:** 我跟你说吧。
**Kevin Weil:** I'm telling you.
**Lenny:** 这是一个心智模型(mental model),但在思考上真的管用——哦对。是的。这太好了。有些人是视觉型的,他们想画出自己的思考,有些人想用语言——文字细胞(word cell)。哇,这是一个非常好的隐喻。所以再回到你的建议——因为我很喜欢我们绕回来了——你找到了一种非常好的方式来思考如何设计好的 AI 体验和 LLM——我猜具体来说就是思考一个人会怎么做这件事。
**Lenny:** This is a mental model but really does work in thinking... Oh, right. Yeah. This is great. Some people are visual, they want to draw out their thinking, some people want to talk, wordcels. Wow, this is a really good metaphor. So again, coming back to your advice here, because I love that we circled back to it, you're finding a really good way to think about how to design great AI experiences and LLMs, I guess, specifically is think about how a person would do this.
**Kevin Weil:** 嗯,答案可能不总是思考人会怎么做,但有时候为了获得如何解决问题的直觉,你会思考一个等价的人在那些情况下会怎么做,用这个来至少获得对问题的不同视角。
**Kevin Weil:** Well, it's maybe not always the answer is to think about how a person would do it, but sometimes to gain intuition for how you might solve a problem, you think about what an equivalent human would do in those situations and use that to at least gain a different perspective on the problem.
**Lenny:** 哇,这太好了。因为这里面很多确实是在和模型对话。有很多先前的经验可以参考,因为我们每天都在和其他人说话,在各种各样的情况下遇到他们,所以有很多可以从中学习的。好,说到人,我想聊聊未来。你有三个孩子,一个社区成员问了我一个很搞笑的问题,我觉得这是很多人在想的。Patrick [听不清 01:04:47]——我在 Airbnb 跟他共事过。他说,问他在鼓励孩子们学什么来为未来做准备。他说我担心我 6 岁的孩子到 2036 年会面临很大竞争,试图进入顶级的屋顶维修或管道工程项目,需要一个备选方案。
**Lenny:** Wow, this is great. Because some of this really is talking to a model. There's a lot of prior art because we talk to other humans all the time and encounter them in all sorts of different situations, and so there's a lot to learn from that. Okay, so speaking of humans, I want to chat about the future a little bit. So you have three kids, and a community member asked me this hilarious question that I think it's something a lot of people are thinking about. So this is Patrick [inaudible 01:04:47]. I worked with him at Airbnb. He says ask what he's encouraging his kids to learn to prepare for the future. I'm worried my 6-year-old by the year 2036 will face a lot of competition trying to get into the top roofing or plumbing programs and need a backup plan.
**Kevin Weil:** 这很搞笑。我们的孩子——我们有一个 10 岁的和 8 岁的双胞胎——所以他们还挺小的。他们是 AI 原住民这件事很惊人。对他们来说自动驾驶汽车完全正常。他们可以整天和 AI 对话。他们跟 ChatGPT 和 Alexa 还有其他一切东西有完整的对话。我不知道,谁知道未来会怎样?我觉得像编程技能这样的东西会在很长时间内相关,谁知道呢?但我觉得如果你教你的孩子保持好奇心、独立、有自信,你教他们如何思考——我不知道未来会怎样——但我觉得这些在未来的任何一种配置中都会是重要的技能。所以不是说我们有所有答案,但这就是 Elizabeth 和我思考我们孩子的方式。
**Kevin Weil:** That's funny. So our kids, we have a 10-year-old and 8-year-old twins, so they're still pretty young. It's amazing how AI native they are. It's completely normal to them that there are self-driving cars, that they can talk to AI all day long. They have full conversations with ChatGPT and Alexa and everything else. I don't know, who knows what the future holds? I think things like coding skills are going to be relevant for a long time, who knows? But I think if you teach your kids to be curious, to be independent, to be self-confident, you teach them how to think... I don't know what the future holds, but I think that those are going to be skills that are going to be important in any configuration of the future. And so it's not like we have all the answers, but that's how Elizabeth and I think about our kids.
**Lenny:** 你有没有发现 AI——有很多关于 AI 辅导的讨论。这是你们在做的事情吗?我知道他们在用 ChatGPT,我很喜欢你发的那些他们在玩 prompt 的照片什么的。有没有什么你在尝试的,或者你觉得会变得非常重要的?
**Lenny:** And do you find that AI... There's a lot of talk about AI tutoring. Is that something you guys are doing? I know they're using ChatGPT, I love all the photos you post where they're playing with prompts and stuff, but I guess is there anything there you're experimenting with or you think is going to become really important?
**Kevin Weil:** 这是一件——这可能是 AI 能做的最重要的事。也许这个说法太宏大了。AI 能做很多重要的事情,包括加速基础科学研究和发现的步伐,这可能实际上才是 AI 能做的最重要的事。但最重要的事情之一会是个性化辅导(personalized tutoring)。而且让我觉得不可思议的是,现在仍然没有——我知道有一些好产品。Khan Academy 做的很好。他们是我们很好的合作伙伴。Vinod Khosla 有一个非营利组织在这个领域做一些非常有意思的事情,产生了影响。但我有点惊讶没有一个覆盖 20 亿孩子的 AI 个性化辅导产品,因为模型现在已经够好了,而每一项相关研究似乎都表明——教育仍然重要,但当你把教育和个性化辅导结合起来,你会得到多个标准差的学习速度提升。所以这是没有争议的,对孩子好,而且是免费的。ChatGPT 是免费的,你不需要付费,而且模型够好了。我还是觉得不可思议——现在没有一个很棒的东西让我们的孩子在用、你未来的孩子在用,以及世界各地那些没有我们的孩子那么幸运、无法拥有这种内置的扎实教育的人们在用。再说一次,ChatGPT 是免费的。人们到处都有 Android 设备。我真的觉得这可以改变世界,而我很惊讶它还不存在,我希望它存在。
**Kevin Weil:** This is something that... It's maybe the most important thing that AI could do. Maybe that's a grand statement. There are lots of important things that AI can do, including speeding up the pace of fundamental science research and discovery, which maybe is actually the most important thing AI can do. But one of the most important things would be personalized tutoring. And it kind of blows my mind that there is still... I know there are a bunch of good products out there. Khan Academy does great things. They're a wonderful partner of ours. Vinod Khosla has a non-profit that's doing some really interesting stuff in this space and is making an impact. But I'm kind of surprised that there isn't a 2-billion-kid AI personalized tutoring thing, because the models are good enough to do it now, and every study out there that's ever been done seems to show that education is still important, but when you combine it with personalized tutoring, you get multiple-standard-deviation improvements in learning speed. And so it's uncontroversial, it's good for kids, and it's free. ChatGPT is free, you don't need to pay, and the models are good enough. It still just kind of blows my mind that there isn't something amazing out there that our kids are using and your future kids are using, and people in all sorts of places around the world that aren't as lucky as our kids to be able to have this sort of built-in, solid education. Again, ChatGPT is free. People have Android devices everywhere. I really just think this could change the world and I'm surprised it doesn't exist and I want it to exist.
**Lenny:** 这有点触及我想花点时间聊的东西——很多人也很担心 AI,它要去哪里,他们担心它会抢走的工作,他们担心超级智能在未来消灭人类。你对此有什么看法?还有那个乐观的理由——我觉得人们需要听到的。
**Lenny:** This kind of touches on something I want to spend a little time on, which is a lot of people also worry a lot about AI, where it's going, they worry about jobs it's going to take, they worry about the super intelligence squashing humanity in the future. What's your perspective on that and just the optimistic case that I think people need to hear?
**Kevin Weil:** 我的意思是,我是一个大的技术乐观主义者。我觉得如果你看过去 200 年,也许更久,技术推动了很多让我们成为今天这个世界和社会的进步。它推动经济进步,推动地缘政治进步,生活质量,寿命的延长。我的意思是,技术几乎是一切的根源,所以我觉得很少有例子表明从长远来看这不是一件好事。……从长远来看是好事。这不意味着没有暂时的错位或没有受影响的个体,这也很重要。所以不能只是平均值好就行。你也得想怎么尽最大可能照顾到每一个人。这是我们非常认真思考的事情,当我们和政府合作,当我们和政策方合作,我们尽我们所能提供帮助。我们在教育方面做了很多。这里的一个好处是,ChatGPT 可能也是你能想要的最好的技能再培训(reskilling)应用。它知道很多东西。如果你有兴趣学新东西,它能教你很多。这些都是非常现实的问题。我对长远非常乐观,作为一个社会,我们需要尽一切所能确保我们让这个转变尽可能优雅、尽可能有充分支持。
**Kevin Weil:** I mean, I'm a big technology optimist. I think if you look over the last 200 years, maybe more, technology has driven a lot of the advancements that have made us the world and the society that we are today. It drives economic advancements, it drives geopolitical advancements, quality of life, longevity advancements. I mean, technology's at the root of just about everything, so I think there are very few examples where this is anything but a great thing over the longer term. That doesn't mean that there aren't temporary dislocations, or that there aren't individuals that are impacted, and that matters too. So it can't just be that the average is good. You've got to also think about how you take care of each individual person as best you can. It is something that we think a lot about, and as we work with the administration, as we work with policy, we try and help wherever we can. We do a lot with education. One of the benefits here is that ChatGPT is also perhaps the best reskilling app you could possibly want. It knows a lot of things. It can teach you a lot of things if you're interested in learning new things. These are very real issues. I'm super optimistic about the long run, and we're going to need to do everything we can as a society to ensure that we make this transition as graceful and as well-supported as we can.
**Lenny:** 给人们一个关于事情可能往哪走的感觉——这是很多人心中的大问题。有人问了一个我很喜欢的问题——"AI 已经在以很多不同方式改变创意工作——写作、设计、编程——你觉得下一个大飞跃是什么?我们应该觉得 AI 辅助创造力的下一个大飞跃是什么?然后更广泛地说,你觉得未来几年事情会往哪走?"
**Lenny:** To give people a sense of where things might be going. That's a big question in a lot of people's minds. So someone asked this question that I love, which is, "AI is already changing, creative work in a lot of different ways, writing and design and coding, what do you think is the next big leap? What should we be thinking is the next big leap in AI-assisted creativity specifically, and then just broadly, where do you think things are going to be going in the next few years?"
**Kevin Weil:** 对。这也是一个我很乐观的领域。如果你看 Sora,比如说——我们之前聊了 ImageGen 和人们在 Twitter、Instagram 和其他地方展现的绝对的创意井喷。我是世界上最差的艺术家——最差的。也许我唯一比艺术更差的是唱歌。给我一支铅笔和一张纸,我画得不比我们 8 岁的孩子好。但给我 ImageGen,我能想一些有创意的想法,输入模型,突然就有了我自己不可能做出来的输出。这很酷。即使你看那些真正有才华的人——我最近跟一位导演聊 Sora,一个导演过我们都知道的电影的人。他说,对于他正在拍的一部电影——就比方说某种科幻风格的,想想星球大战——有一个场景是一架飞机冲向某个像死星那样的东西。你有飞机看着整个星球,然后你想切到一个飞机在地面层的场景,突然你看到城市和其他一切。怎么处理那个过渡镜头?那个转场?他说:"在两年前的世界里,我会付一家 3D 特效公司 10 万美元,他们会花一个月时间,给我做出两个版本的过渡镜头。我会评估它们。我们会选一个,因为你能怎么办?再花 5 万美元再等一个月吗?我们就用那个了。它会是可以的。电影很棒。我喜欢。显然我们用已有的技术能做出很棒的东西。但你现在看看用 Sora 能做什么。"他的重点是:"现在我能用 Sora——我们的视频模型——光是我往 prompt 里随便想、模型跟我一起想,就能得到 50 个不同版本的过渡镜头。然后当然我可以在那些基础上迭代和细化、采用不同的想法。现在我还是会去那家 3D 特效工作室做最终版,但我去的时候已经做了头脑风暴,有了一种更有创意的方式,产出更好的结果。而我是在 AI 辅助下做到的。"我个人对创造力整体的看法是——没有人会……你不会输入 Sora "给我做一部好电影。"它需要创造力和匠心以及所有这些东西,但它能帮你探索更多。它能帮你到达一个更好的最终结果。所以我在大多数事情上倾向于乐观,但实际上我觉得这里有一个非常好的故事。
**Kevin Weil:** Yeah. This is also an area where I'm a big optimist. If you look at Sora, for example. I mean, we talked about ImageGen earlier and the absolute fount of creativity that people are putting across Twitter and Instagram and other places. I am the world's worst artist, like, the worst. Maybe the only thing I'm worse at than art is singing. Give me a pencil and a pad of paper and I can't draw better than our eight-year-old. But give me ImageGen and I can think some creative thoughts and put something into the model and suddenly have output that I couldn't have possibly done myself. That's pretty cool. Even if you look at folks that are really talented. I was talking to a director recently about Sora, someone who's directed films that we would all know, and he was saying, for a film that he's doing, take the example of some sort of sci-fi-ish, think of Star Wars, and you've got some scene where there's a plane zooming into some Death Star-like thing. And so you've got the plane looking at the whole planet, and then you want to cut to a scene where the plane's kind of at the ground level, and all of a sudden you see the city and everything else. How are we going to manage that cut scene? And that transition? And he was saying, "In the world of two years ago, I would have paid a 3D effects company a hundred grand and they would've taken a month, and they would've produced two versions of this cut scene for me. And I would've evaluated them. We would've chosen one, because what are you going to do? Pay another 50 grand and wait another month? And we would've just gone with it. And it would be fine. Movies are great. I love them. And there've been..." Obviously, we can do great things with the technology that we've had, but you now look at what you can do with Sora. And his point was, "Now, I can use Sora, our video model, and I can get 50 different variations of this cut scene just me brainstorming into a prompt and the model brainstorming a little bit with me. I've got 50 different versions. And then of course, I can iterate off of those and refine them and take different ideas. And now I'm still going to go to that 3D effects studio to produce the final one, but I'm going to go having brainstormed and had a much more creative approach with an outcome that's much better. And I did that assisted by AI." My personal view on creativity in general is that no one's going to... You don't type into Sora, "Make me a great movie." It requires creativity and ingenuity, and all these things, but it can help you explore more. It can help you get to a better final result. So, again, I tend to be an optimist in most things, but actually, I think there's a very good story here.
**Lenny:** 我知道 Sam Altman——我想是他最近发推说了你们在做的创意写作的东西——他非常不擅长写创意内容,他分享了一个例子,实际上真的很好。我想象那也是一个投入的领域。
**Lenny:** I know Sam Altman, I think it was him who tweeted recently, the creative writing piece that you guys are working on where it's... He is very bad at writing creative stuff, and he shared an example where it's actually really good. I imagine that's another area of investment.
**Kevin Weil:** 对,内部有一些关于新研究技术的令人兴奋的东西在发生。我们在某个时候会有更多可以说的。但是对,Sam 有时候喜欢展示一些即将到来的东西,这很聪明。顺便说,这非常符合我们迭代部署(iterative deployment)的理念。我们不会有了什么突破就永远自己留着,然后某天赐给世界。我们基本上就谈论我们在做的事情,能分享的时候就分享,早发布勤发布,然后公开迭代。
**Kevin Weil:** Yeah, there's some exciting stuff happening internally with some new research techniques. We'll have more to say about that at some point. But yeah, Sam sometimes likes to show off some of the stuff that's coming, which is smart. By the way, it's very indicative of this iterative deployment philosophy. We don't have some breakthrough and keep it to ourselves forever, and then bestow it upon the world someday. We kind of just talk about the things we're working on and share when we can and launch early and often, and then iterate in public.
**Lenny:** 我真的很喜欢这个理念。我喜欢这些暗示——有些东西要来了。我知道你不能说太多。你说过编码方面可能很快会有一个飞跃——也许这一集播出的时候就有了。还有什么人们应该想着可能快来了的?有什么你能透露的有趣的、令人兴奋的东西?
**Lenny:** I really like that philosophy. I love all these hints that a few things coming. I know you can't say too much. You talked about how there might be a coding leap coming in the near future maybe by the time this comes out. Is there anything else people should be thinking about, might be coming in the near future? Any things you can tease that are interesting? Exciting?
**Kevin Weil:** 老兄,这些还不够你的?一切每天都在变好。
**Kevin Weil:** Man, this hasn't been enough for you? It's just that everything is getting better every day.
**Lenny:** 对。我就是——老兄,我希望我们在这一集发布之前能发布一些东西,这样——
**Lenny:** Yeah. I'm like, man, I hope we get some of this stuff out before the episode launches so-
**Kevin Weil:** 这是你的新时限。
**Kevin Weil:** This is your new timebox.
**Lenny:** ——我不会惹恼人们。对我来说惊人的是——我们之前聊到模型在短短几年内走了多远。如果你回到 GPT-3,你会对它有多差感到恶心,即使两年前的 Lenny 当时被它有多好给震惊了。很长时间我们每六到九个月迭代一个新的 GPT 模型。是 GPT-3、GPT-3.5、4。现在有了这个 O 系列推理模型,我们走得更快了。大概每三个月,也许四个月,就有一个新的 O 系列模型,每一个在能力上都是一个台阶。所以这些模型的能力在以巨大的速度增长。它们在规模化的同时也在变便宜。你看看哪怕几年前的情况——我想最早的,不知道是什么,GPT-3.5 还是什么——在 API 中的成本是今天 GPT-4o mini 的 100 倍。几年时间,成本下降了两个数量级,智能程度还大幅提高。所以我不知道世界上还有哪里有这样的趋势组合。模型在变得更聪明、更快、更便宜,而且也更安全了。每次迭代幻觉都更少。
**Lenny:** ... I don't piss people off. The amazing thing to me is we were talking earlier about how far models have come in just a couple of years. If you went back to GPT-3, you'd be disgusted by how bad it was, even though Lenny of two years ago was mind-blown by how good these were. And for a long time, we were iterating every six to nine months on a new GPT model. It was like GPT-3, GPT-3.5, 4, and now with this o-series of reasoning models, we're moving even faster. Every roughly three months, maybe four months, there's a new o-series model, and each of them is a step up in capability. And so the capabilities of these models are increasing at a massive pace. They're also getting cheaper as they scale. You look at where we were even a couple of years ago. I think the original, I don't know, what was it, GPT-3.5 or something was like 100 x the cost of GPT-4o mini today in the API. A couple of years, you've gone down two orders of magnitude in cost for much more intelligence. And so I don't know where there's another series of trends like that in the world. Models are getting smarter, they're getting faster, they're getting cheaper, and they're getting safer too. They hallucinate less every iteration.
**Lenny:** 摩尔定律(Moore's Law)和晶体管变得无处不在——那是关于每 18 个月芯片上晶体管数量翻倍的定律。如果你说的是每年 10 倍的东西,那是一条陡峭得多的指数曲线。
**Lenny:** And so there's Moore's Law and transistors becoming ubiquitous. That was a law around doubling the number of transistors on a chip every 18 months. If you're talking about something where you're getting 10x every year, that's a massively steeper exponential.
**Kevin Weil:** 它告诉我们未来会和今天非常不同。我试着提醒自己的一件事是:你今天用的 AI 模型是你余生中会用到的最差的 AI 模型。当你真正把这个想法装进脑子里的时候,还是挺疯狂的。
**Kevin Weil:** And it tells us that the future is going to be very different than today. The thing I try and remind myself is, the AI models that you're using today is the worst AI model you will ever use for the rest of your life. And when you actually get that in your head, it's kind of wild.
**Lenny:** 我本来也想说同样的话,这是我看这些东西时一直跟着我的一点。你在说 Sora,我想象很多人听到会说"不不不,它还没准备好。还不够好。不会像我在电影院看到的电影那么好。"但重点就是你刚才说的——这是它会达到的最差水平。它只会变得更好。
**Lenny:** I was going to actually say the same thing, and that's the thing that always sticks with me when I watch this thing. You're talking about Sora, and I imagine many people hearing that are like, "No, no. It's not actually ready. It's not good enough. It's not going to be as good as a movie I see in the theater." But the point is what you just made that this is the worst it's going to be. It will only get better.
**Kevin Weil:** 对,模型最大化主义(model maximalism)。就是持续为即将到来的能力构建,模型会追上来并且变得惊人。滑向球将要到达的地方。
**Kevin Weil:** Yeah, model maximalism. Just keep building for the capabilities that are almost there, and the model's going to catch up and be amazing. Skate to where the puck is going to be.
**Lenny:** 对。这让我想起——我前几天在用……我在把所有东西都 Ghibli 化——我就想着"怎么这么久。"
**Lenny:** Yeah. This reminds me, I was just using... I was Ghibli-fying everything the other day and I was just like, "What is taking so long."
**Kevin Weil:** 很正常嘛。就像切——
**Kevin Weil:** As one does. Just like cut—
**Lenny:** 什么?
**Lenny:** What was that?
**Kevin Weil:** 我说,很正常嘛,现在这年头。
**Kevin Weil:** I said, as one does, these days.
**Lenny:** 我就想"要一分钟才能以这种惊人的方式生成我家人的图片"——拜托,怎么这么久。你就是会习惯魔法在你面前发生。
**Lenny:** I was just like, "It's taking a minute to generate this image of my family in this amazing way." Come on, what's taking so long. You just get so used to magic happening in front of you.
**Kevin Weil:** 对,完全是。
**Kevin Weil:** Yeah, totally.
**Lenny:** 好,最后一个问题。这个方向会完全不同。很多人问到了这个。有一个著名的项目——你在 Facebook 领导的叫 Libra 的项目,后来改名叫 Novi。很多人一直在想"那是怎么回事?那是一个很酷的想法。"我知道有些人了解有监管挑战之类的。我不知道你有没有多谈过这个。所以我想——你能给大家一个简短的总结吗?什么是 Libra?你做的这个项目,然后发生了什么,你怎么看待它?
**Lenny:** Okay, final question. This is going to go in a completely different direction. A lot of people asked about this. So famously, you led this project at Facebook called Libra, which was later called Novi. A lot of people always wondered, "What happened there? That was a really cool idea." I know some people have a sense there were regulatory challenges, things like that. I don't know if you've talked about this much. So I guess, could you just give people a brief summary of just what is Libra? This project you were working on, and just what happened, and how you feel about it?
**Kevin Weil:** 对。David Marcus 领导的这个项目,我很乐意为他工作、和他共事。我觉得他是一个有远见的人,也是导师和朋友。坦白说,Libra 可能是我职业生涯中最大的遗憾。当我想到我们在解决的问题——那些是非常真实的问题。如果你看比如说汇款(remittance)领域——人们给在其他国家的家人汇钱——这可能是……我的意思是这非常累退(regressive),对吧?没有钱花的人不得不付 20% 的费用把钱寄回家。所以费用高得离谱,要好几天,你还得去……取现金之类的。一切都很糟。而这里有 30 亿人在全世界使用 WhatsApp,每天互相交流,尤其是朋友和家人,而且恰好是那种会互相转账的人。为什么不能像发短信一样即时、便宜、简单地发钱呢?这是那种你退后想想就觉得应该存在的东西。那就是我们试图做的。现在,我不认为我们打好了所有的牌。如果我能回去重来,有很多事我会做不同。我们试图一次搞定所有事情。我们试图推出一个新的区块链(blockchain)。最初是一揽子货币。是集成到 WhatsApp 和 Messenger 的。我觉得全世界就反应"天哪,一次这么多变化。"而且恰好当时 Facebook 的声誉处于绝对的最低点。所以这也没帮上忙。而且也不是人们想要的承担这种变化的信使。我们进去之前就知道这些,但我们还是冲了。我觉得有很多方式可以更温和地引入这些变化,也许还是能到达同样的结果,但一次少一些新东西,一次一个地引入新东西。谁知道呢?那些是我们一起做的决定。所以我们都要负责。我当然也要负责。但从根本上让我失望的是这个东西今天还不存在,因为如果我们能发布那个产品,世界会更好。我应该能在 WhatsApp 里免费给你发 50 美分。即时结算。每个人的 WhatsApp 账户里都会有余额。我们会在——我的意思是,它应该存在。我不知道。坦白说,现在的政府对加密货币超级友好。Facebook 的声誉、Meta 的声誉现在在一个非常不同的位置。也许他们现在应该去做这件事。
**Kevin Weil:** Yeah. I mean, David Marcus led it, and I happily worked for him and with him. I think he's a visionary and also a mentor and a friend. Honestly, Libra is probably the biggest disappointment of my career. When I think about the problems we were solving, which are very real problems. If you look at, for example, the remittance space, people sending money to family members in other countries, it is maybe... I mean it's incredibly regressive, right? People that don't have the money to spend are having to pay 20% to send money home to their family. So outrageous fees, it takes multiple days, you have to go then pick up cash from... It's all bad. And here we are with 3 billion people using WhatsApp all over the world, talking to each other every day, especially friends and family, and exactly the kind of people who'd send money to each other. Why can't you send money as immediately, as cheaply, as simply as you send a text message? It is one of those things when you sit back and think about it, that should just exist. And that was what we set out to try and do. Now, I don't think we played all of our cards perfectly. If I could go back and do things, there are a bunch of things I would do differently. We tried to do it all at once. We tried to launch a new blockchain. It was a basket of currencies originally. It was integration into WhatsApp and Messenger, and I think the whole world kind of went like, "Oh my God, that's a lot of change at once." And it happened also to be at the time that Facebook was at the absolute nadir of its reputation. And so that didn't help. It was also not the messenger that people wanted for this kind of change. We knew all that going in, but we went for it. I think there are a bunch of ways that we could have done that that would've introduced the change a little bit more gently, maybe still gotten to that same outcome, but fewer new things at once and introduced the new things one at a time. Who knows?
Those were decisions we made together. So we all own them. Certainly, I own them. But it fundamentally disappoints me that this doesn't exist in the world today because the world would be a better place if we'd been able to ship that product. I would be able to send you 50 cents in WhatsApp for free. It would settle instantly. Everybody would have a balance in their WhatsApp account. We'd be transact... I mean, it should exist. I don't know. To be honest, the current administration is super friendly to crypto. Facebook's reputation, Meta's reputation is in a very different place. Maybe they should go build it now.
**Lenny:** 我查了它的历史,显然他们把技术卖给了一家私募公司,2 亿美元。
**Lenny:** I was looking at the history of it, and apparently, they sold the tech to some private equity company for 200 million bucks.
**Kevin Weil:** 对对——
**Kevin Weil:** Yeah, yeah, and-
**Lenny:** 他们得买回来。
**Lenny:** They had to buy it back.
**Kevin Weil:** 有几个现在的区块链是建立在那个技术上的,因为技术从一开始就是开源的。Aptos 和 Mysten 是两家基于这个技术建立的公司。所以至少我们做的所有工作没有死掉,活在了这两家公司里,而且它们都做得很好。但话说回来,我们应该能在 WhatsApp 里互相转账,而今天我们做不到。
**Kevin Weil:** There are a couple of current blockchains that are built on the tech because the tech was open-sourced from the beginning. Aptos and Mysten are two companies that are built off of this tech. So at least all of the work that we did, did not die and lives on in these two companies, and they're both doing really well. But still, we should be able to send each other money in WhatsApp, and we can't today.
**Lenny:** 深有同感。好吧,谢谢你分享这个故事,Kevin。在我们进入非常令人兴奋的闪电轮之前,还有什么你想分享的吗?也许最后一条建议或洞察?
**Lenny:** Hear, hear. Well, thanks for sharing that story, Kevin. Is there anything else you want to share, or maybe a last piece of advice or insight, before we get to our very exciting lightning round?
**Kevin Weil:** 哦,闪电轮。我们直接开始吧。
**Kevin Weil:** Ooh, the lightning round. Let's just go do that.
**Lenny:** 我们来吧。好了,Kevin,我们到了非常令人兴奋的闪电轮。你准备好了吗?
**Lenny:** Let's do it. With that, Kevin, we've reached our very exciting lightning round. Are you ready?
**Kevin Weil:** 好。我们来吧。
**Kevin Weil:** Yeah. Let's do it.
**Lenny:** 好。你最常推荐给别人的两三本书是什么?
**Lenny:** Okay. What are two or three books that you find yourself recommending most to other people?
**Kevin Weil:** Ethan Mollick 的《Co-Intelligence》,一本关于 AI 以及如何在日常生活中使用它的很好的书——作为学生、作为老师。他非常有深度。顺便说他在 Twitter 上也是一个很值得关注的人。Peter Zeihan 的《The Accidental Superpower》。如果你对地缘政治以及塑造当前动态的力量感兴趣,这本书很好。然后我真的很喜欢《Cable Cowboy》——我不知道作者是谁——John Malone 的传记。如果你喜欢商业,尤其是如果你想深入了解——这个人是令人难以置信的交易撮合者,塑造了很多现代有线电视产业。所以那是一本好传记。
**Kevin Weil:** Co-Intelligence by Ethan Mollick, a really good book about AI and how to use it in your daily life as a student, as a teacher. He's super thoughtful. Also, by the way, a very good follow on Twitter. The Accidental Superpower by Peter Zeihan. Very good if you're interested in geopolitics and the forces that sort of shape the dynamics happening. And then I really enjoyed Cable Cowboy, I don't know who the author is, but the biography of John Malone. Just fascinating. If you like business, especially if you want to get into... I mean the man was an incredible dealmaker and shaped a lot of the modern cable industry. So that was a good biography.
**Lenny:** 这些都是首次被提到的书,这总是很棒的。
**Lenny:** These are all first-time mentions, which is always great.
**Kevin Weil:** 哦,好。
**Kevin Weil:** Oh, good.
**Lenny:** 下一个问题。你有没有最近喜欢的电影或电视剧?
**Lenny:** Next question. Do you have a favorite recent movie or TV show that you really enjoyed?
**Kevin Weil:** 我希望我有时间看电视剧,所以我——
**Kevin Weil:** I wish I had time to watch a TV show, so I'm-
**Lenny:** 就看 Sora 视频。
**Lenny:** Just Sora videos.
**Kevin Weil:** 对吧。我不知道。我小时候读了《时光之轮》(Wheel of Time)系列,现在 Amazon 拍了剧,到第三季了,所以我想看。还没看。《壮志凌云 2》(Top Gun 2)是一部很棒的电影。我觉得它已经不算新了。这说明你上一次看电影是什么时候了。但我喜欢这个理念。我想要更多美国精神。我想要更多为强大而骄傲。我觉得《壮志凌云 2》在这方面做得很好。自豪感和爱国心——我觉得美国可以多一些。
**Kevin Weil:** Yeah, right. I don't know. When I was a kid, I read the Wheel of Time series and now Amazon has it as they're in the third season of it, so I want to watch that. I haven't yet. Top Gun 2 was an awesome movie. I think that's no longer new. That shows when the last time you watched a movie was. But I like the idea. I want more Americana. I want more being proud of being strong. And I thought Top Gun 2 did a really good job of that. Pride and patriotism, I think the US could use more of that.
**Lenny:** 有没有你最近发现的、真的很喜欢的产品?除了你们内部的超级智能工具——我开玩笑的。
**Lenny:** Is there a favorite product that you've recently discovered that you really love, other than your super intelligence internal tool that you all have access to? I'm just joking.
**Kevin Weil:** 没错,内部 AGI。对对。嗯,我觉得用 Windsurf 这样的产品做 vibe coding 真的超好玩。我玩得很开心。我还是很爱我们首席人力官 vibe coded 了一些工具这件事。也许另一个是 Waymo。每次有机会我都会坐 Waymo。这就是更好的出行方式,而且仍然有一种未来感。他们做得太好了。
**Kevin Weil:** That's right. Internal AGI. Yeah, that's right. Well, I think vibe coding with products like Windsurf is just super fun. I'm having a great time doing that. I still just love that our chief people officer vibe coded some tools. Maybe the other one is Waymo. Every chance I get, I'll take a Waymo. It's just a better way of riding, and it still feels like the future. So they've done an amazing job.
**Lenny:** 很棒。顺便说,我让 Windsurf 的创始人上了播客。可能在这一集之前或之后发布。Cursor 的 CEO 也要上播客——也是在这一集之前或之后。
**Lenny:** That's awesome. By the way, I had the founder of Windsurf on the podcast. It might come out before this or after this. And also Cursor's CEO is coming on the podcast either before or after this.
**Kevin Weil:** 哦,酷。我非常尊重那些人在做的事情。那些都是很棒的产品。就是在改变每个人构建产品的方式。没什么大不了的。
**Kevin Weil:** Oh, cool. I have a ton of respect for what those guys are doing. Those are awesome products. Just changing the way everyone builds product. No big deal.
**Lenny:** 对。再问几个问题。你有没有一个喜欢的人生格言,你经常重复、在工作或生活中觉得真的很有用的?
**Lenny:** Yeah. A couple more questions. Do you have a favorite life motto that you often repeat yourself, find really useful in work or in life?
**Kevin Weil:** 有。实际上有意思的是,这更多是一种理念,但后来我觉得 Zuck 有一次在 Facebook 的财报电话会上把它总结了。我实际上把这个做成了一张海报。它放在我房间里。但有人在问 Mark——这是真的在财报电话会上,是一个分析师在问他。那是 Facebook 增长很多的某个季度。这应该是在 2010 年代的某个时候。他问:"那你做了什么?你推出了什么?是什么一件事驱动了你所有这些增长?"他说了大意是:"有时候不是某一件事,而只是长期持续做好工作。"("Sometimes it's not any one thing, it's just good work consistently over a long period of time.")这一直留在我心里。我觉得确实是这样。我跑超级马拉松。就是关于磨。我觉得人们太经常找银弹了,而很多人生和很多卓越其实是每天都出现、做好工作、每天进步一点点,你可能一周甚至一个月都注意不到。很多人就会灰心、放弃。但实际上你坚持下去,收益持续复利。一年、两年、五年下来,它积累得疯狂。所以——长期持续做好工作。
**Kevin Weil:** Yeah. So actually, this is interestingly enough, it is more of a philosophy, but then I thought Zuck encapsulated it one time on a Facebook earnings call. So I actually had this made into a poster. It sits in my room. But somebody was asking Mark. This is literally on an earnings call, so it's like an analyst on an earnings call asking him. It was some quarter when Facebook had grown a lot. This was back in the 20 teens sometime, I think. But he's like, "So what did you do? What was it that you launched? What was the one thing that drove all this growth for you?" And he said something to the effect of, "Sometimes it's not any one thing, it's just good work consistently over a long period of time." And that's always stuck with me. And I think it is. I mean I run ultra marathons. It's like it's just about grinding. I think people too often look for the silver bullet when a lot of life and a lot of excellence is actually showing up day in and day out, doing good work, getting a little bit better every single day, and you may not notice it over a week or even a month. And a lot of people then kind of get dismayed and stop. But actually, you keep doing it. The gains keep compounding. And over the course of a year, two years, five years, it adds up like crazy. So good work consistently over a long period of time.
**Lenny:** 我太喜欢了。我也得做一张海报。这——
**Lenny:** I love that. I got to make a poster of this now. That is-
**Kevin Weil:** 我给你弄一张。
**Kevin Weil:** We'll get you one.
**Lenny:** 我太有共鸣了。好,我接受。这太好了。好,最后一个问题。我要问你有没有什么 prompting 技巧。我先铺垫一下——想想你有没有能推荐给人们的让 LLM prompting 更好的技巧。我有一个嘉宾 Alex Komoroske 上过播客——他来自 Stripe,写每周对世界正在发生的事情的反思,很多和 AI 相关。他曾经把 LLM 描述为所有人类知识的压缩文件(zip file)。所有答案都在里面,你只需要找到正确的问题去问,就能得到基本上每个问题的答案。这提醒我 prompt 工程(prompt engineering)有多重要,知道怎么好好 prompt 有多重要。你一直在 prompt ChatGPT。你发现的一个有用的技巧或窍门是什么?
**Lenny:** I so resonate with that. Okay, I'll take it. That is so good. Okay, final question. I'm going to ask if you have any prompting tricks, and I'm going to set it up first. But think about if you have a trick that you could recommend to people for prompting LLMs better. I had a guest, Alex Komoroske, come on the podcast. He's from Stripe and writes his weekly reflections on what's happening in the world. A lot of them are AI-related. And he once described an LLM as a zip file of all human knowledge. All the answers are in there, and you just need to figure out the right question to ask to get the answer to every problem basically. And so it just reminded me how important prompt engineering is and knowing how to prompt well. You're constantly prompting ChatGPT. What's one tip, one trick that you found to be helpful in helping you get what you want?
**Kevin Weil:** 嗯,首先我想说,我想打消"你必须是一个好的 prompt engineer"这个概念。我觉得如果我们做好我们的工作,这就不再是真的了。这只是模型的那些棱角之一,专家可以学会。但随着时间推移,你不应该需要知道那些。就像你以前需要深入了解"你 MySQL 的存储引擎是什么?你用的是 InnoDB 4.1 吗?"如果你在 MySQL 性能的深度边缘,这仍然有用例。但大多数人不需要关心。如果 AI 真的要被广泛采用,你不应该需要关心 prompting 的细微细节。但今天我们还没有完全到那里。顺便说,我觉得我们在取得进展。我觉得比以前需要的 prompt 工程更少了。但与我之前说的微调的重要性和给例子的重要性一致——你可以做"穷人版微调"(poor man's fine-tuning),就是在你的 prompt 里包含你可能想要的东西的示例和好的回答。就像"这是一个例子,这是一个好的回答。这是一个例子,这是一个好的回答。现在帮我解决这个问题。"模型真的会听从并从中学习。不如做完整微调效果那么好,但比你不提供任何示例要好得多。我觉得人们做得不够多。
**Kevin Weil:** Well, I'll say, first of all, I want to kill the idea that you have to be a good prompt engineer. I think if we do our jobs, that stops being true. It's just one of those sharp edges of models that experts can learn. But then, just over time, you shouldn't need to know all that. The same way you used to have to get deep into, "What's your storage engine in MySQL? Are you using InnoDB 4.1?" There's still use cases for that if you're at the deep edge of MySQL performance. But most people don't need to care. And you shouldn't need to care about minute details of prompting if AI is really going to become broadly adopted. But today, we're not totally there. I think by the way, we are making progress there. I think there is less prompt engineering than there had to be before. But in line with some of the fine-tuning stuff I was talking about and the importance of giving examples, you can do effectively poor man's fine-tuning by including examples in your prompt of the kinds of things that you might want and a good answer. So like, "Here's an example and here's a good answer. Here's an example, and here's a good answer. Now, go solve this problem for me." And the model really will listen and learn from that. Not as well as if you do a full fine-tune, but much more than if you don't provide any examples. And I think people don't do that often enough.
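The "poor man's fine-tuning" Kevin describes can be sketched as a few-shot prompt: worked example pairs inlined before the real task. This is an illustrative sketch only; the helper name and example strings are made up, and the role/content message shape follows the common chat-completions convention rather than any specific OpenAI API call.

```python
# Sketch of few-shot prompting ("poor man's fine-tuning"): include
# example inputs with good answers in the prompt, then the real task.
# The model imitates the pattern set by the examples.

def build_few_shot_prompt(instruction, examples, question):
    """Build a chat-style message list with example Q/A pairs inlined."""
    messages = [{"role": "system", "content": instruction}]
    for user_text, good_answer in examples:
        # Each pair shows the model one input and one answer of the
        # quality and style we want it to imitate.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": good_answer})
    # The real task comes last, after the examples set the pattern.
    messages.append({"role": "user", "content": question})
    return messages

examples = [
    ("Summarize: The meeting moved to 3pm.", "Meeting rescheduled to 3pm."),
    ("Summarize: Shipping is delayed by two days.", "Shipping delayed 2 days."),
]
messages = build_few_shot_prompt(
    "You are a terse summarizer.", examples, "Summarize: The launch went well."
)
print(len(messages))  # 1 system + 2 example pairs + 1 final user = 6 messages
```

As Kevin notes, this typically works worse than a full fine-tune but much better than providing no examples at all.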
**Lenny:** 太棒了。我听到的一个技巧,我好奇这管不管用——你告诉它"这对我的职业生涯非常非常重要。"让它真正理解"如果你不正确回答我,会有人死。"这管用吗?
**Lenny:** That's awesome. One tip that I heard, I'm curious if this works is you tell it, "This is very, very important to my career." Make it really understand like, "Someone will die if you don't answer me correctly." Does that work?
**Kevin Weil:** 这真的很奇怪。可能有一个很好的解释。但你也可以说——所以对,我觉得这有一定的有效性。你还可以说一些像"我要你做 Einstein。现在回答这个物理问题"或者"你是世界上最伟大的营销人员,世界上最伟大的品牌营销人员。现在这里有一个命名问题。"确实有某种东西让模型转移到某种心态,这实际上可以是非常正面的。
**Kevin Weil:** It's really weird. There's probably a good explanation for this. But you can also say things. So, yes, I think there is some validity to that. You can also say things like, "I want you to be Einstein. Now, answer this physics problem for me," or, "You are the world's greatest marketer, the world's greatest brand marketer. Now here's a naming question." And there is something where it sort of shifts the model into a certain mindset that can actually be really positive.
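The mindset-shifting trick Kevin mentions is often called persona prompting: framing the same question under a different system message. A minimal sketch, with made-up persona strings and the same generic chat-message shape as above (not a specific vendor API):

```python
# Sketch of persona prompting: the system message shifts the model into
# a mindset before the actual question is asked.

def with_persona(persona, question):
    """Frame a question under a persona via the system message."""
    return [
        {"role": "system", "content": f"You are {persona}."},
        {"role": "user", "content": question},
    ]

# The same technique, two different mindsets.
marketer = with_persona("the world's greatest brand marketer",
                        "Suggest a name for a budgeting app.")
physicist = with_persona("Einstein",
                         "Explain time dilation in one sentence.")
```

The persona changes nothing about the question itself; it only changes the framing, which is exactly the human analogy Kevin draws next.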
**Lenny:** 我其实一直在用这个技巧。当我为采访想问题的时候——我偶尔用它来想出我没想到的东西——我实际上会打"你是世界上最好的播客采访者。"对。我有 Kevin Weil 要来上节目……
**Lenny:** I use that tip all the time actually. I always... When I'm coming up with questions for interviews and I use it occasionally to come up with things I haven't thought of, I actually type, "You're the world's best podcast interviewer." Right. I have Kevin Weil coming on the pod...
**Kevin Weil:** 对,这确实管用。顺便说,回到我们之前几次提到的那个点。你有时候确实对人也这样做。你给他们定框架、把他们带进某种心态,然后回答完全不同。所以我觉得这个也有人类的类比,又一次。
**Kevin Weil:** Yeah, it actually works. By the way, back to our other point that we made a few times. You do do that sometimes with people. You sort of put them... You frame things, you get them into a certain mindset, and the answer is completely different. So I think there are human analogs of this one more time.
**Lenny:** Kevin,这太精彩了。我刚在想怎么结束。我的感觉是——不仅你站在未来的最前沿,你和团队实际上就是正在创造未来的那个刃。所以让你上这里聊天、听你觉得事情会往哪走、我们需要想些什么,这真的是一种荣幸。谢谢你来,Kevin。
**Lenny:** Kevin, this was incredible. I was just thinking about a way to end this. The way I feel like... I feel like not only are you at the cutting edge of the future. You and the team are kind of actually the edge that is creating the future. And so it's a real honor to have you on here and to talk to you and to hear where you think things are going and what we need to be thinking about, so thank you for being here, Kevin.
**Kevin Weil:** 哦,太感谢你邀请我了。我有幸和世界上最好的团队合作,所有功劳归于他们。但真的很感谢你邀请我。超级开心。
**Kevin Weil:** Oh, thank you so much for having me. I get to work with the world's best team, and all credit to them, but really appreciate you having me on. It's been super fun.
**Lenny:** 我忘了问最后两个问题。大家在哪能找到你?如果他们想联系你的话。还有听众怎么能对你有帮助?
**Lenny:** I forgot to ask you the two final questions. Where can folks find you if they want to reach out, and how can listeners be useful to you?
**Kevin Weil:** 我是 @kevinweil,K-E-V-I-N-W-E-I-L,在几乎所有平台上。这么多年了我还是一个 Twitter DAU。我猜现在是 X DAU 了。LinkedIn,哪里都行。我觉得我想从人们那里得到的是——给我反馈。人们在用 ChatGPT。告诉我哪里对你来说真的很好用,你想让我们在哪里加倍投入。告诉我哪里不行。我在 Twitter 上非常活跃和投入。我喜欢听人们说什么好用什么不好用,所以不要害羞。
**Kevin Weil:** I am @kevinweil, K-E-V-I-N-W-E-I-L on pretty much every platform. I'm still a Twitter DAU after all these years. I guess an X DAU, LinkedIn, wherever. And I think the thing I would love from people, give me feedback. People are using ChatGPT. Tell me where it's working really well for you and where you want us to double down. Tell me where it's failing. I'm very active and engaged on Twitter. I love hearing from people, what's working and what's not, so don't be shy.
**Lenny:** 我发现关注你有助于了解你们在发布什么。你每天、每周、每月都在分享那些推出的东西,所以这也是一个好处。顺便说——4 亿周活用户都给你发反馈邮件了。
**Lenny:** And I've learned that following you helps me keep up with all the stuff that you're launching. You share all the things that are going out every day, or week, or month, so that's also a benefit. And by the way, 400 million weekly active users all emailing you feedback.
**Kevin Weil:** 来吧。
**Kevin Weil:** Here we go. Yes, let's do it.
**Lenny:** 是。我们来吧。会很好的。好。谢谢你,Kevin。感谢你来这里。
**Lenny:** It's going to work out great. Okay. Well, thank you, Kevin. Thanks for being here.
**Kevin Weil:** 好的老兄,太感谢了。回头见。
**Kevin Weil:** All right, man, thanks so much. See you soon.
**Lenny:** 再见各位。非常感谢你们收听。如果你觉得这有价值,你可以在 Apple Podcasts、Spotify 或你最喜欢的播客应用上订阅本节目。也请考虑给我们评分或留下评论,这真的能帮助其他听众发现这个播客。你可以在 lennyspodcast.com 找到所有往期节目,或了解更多关于节目的信息。下期再见。
**Lenny:** Bye, everyone. Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.