**Andrej Karpathy:** Code's not even the right verb anymore, right? [laughter] But I have to express my will to my agents for 16 hours a day. Manifest. [music] How can I have not just a single session of Claude Code or Codex or some of these agent harnesses? How can I have more of them? How can I do that appropriately? The agent part is now taken for granted. Now the claw-like entities are taken for granted, and now you can have multiple of them, and now you can have instructions to them, and now you can have optimization over the instructions.
**Host:** Sarah Guo & Elad Gil (No Priors podcast)
---
**Host:** [laughter] I mean, this is why it gets to the psychosis: it's like it's infinite, and everything is a skill issue.
**Host:** Hi listeners, welcome back to No Priors. Today I'm here with Andrej Karpathy, and we have a wide-ranging conversation for you about code agents, the future of engineering and AI research, how more people can contribute to research, what's happening in robotics, his prediction for how agents will reach out into the real world, and education in this next age. Welcome, Andrej. Thanks for doing this.
**Andrej Karpathy:** Yeah, thank you for having me.
**Host:** So it's been a very exciting couple of months in AI.
**Andrej Karpathy:** Yeah, you could say that.
**Host:** I remember walking into the office at some point; you were really locked in, and I asked what you were up to, and you said, "I just have to code for 16 hours a day." Or code's not even the right verb anymore, right? "I have to express my will to my agents for 16 hours a day. Manifest." Because there's been a jump in capability. What's happening? Tell me about your experience.
**Andrej Karpathy:** Yeah, I kind of feel like I was just in, and still often am in, this perpetual state of AI psychosis, because there was a huge unlock in what you can achieve as a person, as an individual, right? You were bottlenecked by your typing speed and so on. But now, with these agents, I would say December is when something really flipped, where I went from about 80/20 to 20/80 of writing code by myself versus delegating to agents. And I don't even think it's 20/80 by now; I think it's a lot more than that. I don't think I've typed a line of code since December, basically. [laughter]
**Andrej Karpathy:** Which is an extremely large change. I was talking about it to my parents, for example, and I don't think a normal person actually realizes that this happened, or how dramatic it was. If you find a random software engineer at their desk, their default workflow for building software is completely different as of basically December. So I'm in this state of psychosis of trying to figure out what's possible and trying to push it to the limit. How can I have not just a single session of Claude Code or Codex or some of these agent harnesses? How can I have more of them? How can I do that appropriately? And then, how can I use these claws? What are these claws? There are a lot of new things, and I want to be at the forefront of it. I'm very antsy when I'm not at the forefront, and I see lots of people on Twitter doing all kinds of things that all sound like really good ideas, and I need to be at the forefront or I feel extremely nervous. So I guess I'm just in this psychosis of what's possible, because it's fundamentally unexplored.
**Host:** Well, if you're nervous, the rest of us are nervous. We work with a team at Conviction whose setup is that none of the engineers write code by hand; they're all microphoned and they just whisper to their agents all the time. It's the strangest work setting ever. I thought they were crazy, and now I fully accept it: oh, this was the way. You were just ahead of it. How do you think about your own capacity now to explore or to do projects? What is it limited by?
**Andrej Karpathy:** Yeah, what is it limited by? I think everything. So many things, even when they don't work, feel to a large extent like a skill issue. It's not that the capability isn't there; it's that you haven't found a way to string together what's available. I didn't give good enough instructions in the agents file, or whatever it may be; I don't have a nice enough memory tool plugged in, or something like that. So when it doesn't work, it all feels like a skill issue to some extent. You want to see how you can parallelize them, and so on; you want to be Peter Steinberger, basically. Peter is famous for this. He has a funny photo where he's in front of a monitor tiled with Codex agents. If you prompt them correctly and use high effort, they each take about 20 minutes, and he has ten repos checked out, so he's just going between them and giving them work. You can move in much larger macro actions. It's not just "here's a line of code, here's a new function." It's "here's a new piece of functionality, delegate it to agent one; here's another piece of functionality that won't interfere with the first, give it to agent two." And then you try to review their work as best as you can,
**Host:** [laughter]
**Andrej Karpathy:** depending on how much you care about that code. What are these macro actions by which I can manipulate my software repository? One agent is doing some research, another agent is writing code, another is coming up with a plan for some new implementation. Everything happens in these macro actions over your repository, and you're trying to become really good at it and develop a muscle memory for it. It's very rewarding, number one, because it actually works, but it's also the new thing to learn. Hence the psychosis.
**Host:** Yeah, I do feel like my instinct is, whenever I'm waiting for an agent to complete something, the obvious thing to do is: well, I can do more work, right? If I have access to more tokens, then I should just parallelize tasks. And that's very stressful, because if you don't feel bounded by your ability to spend on tokens, then you are the bottleneck in a system that's otherwise at max capability.
**Andrej Karpathy:** Yeah, if you're not maximizing your subscription, at least, and ideally across multiple agents: if you run out of quota on Codex, you should switch to Claude, or whatnot. That's what I've been trying to do a little bit, and I feel nervous when I have subscription left over; it just means I haven't maximized my token throughput. I actually experienced something like this as a PhD student: you would feel nervous when your GPUs weren't running. You have GPU capacity and you're not maximizing the flops available to you. But now it's not about flops, it's about tokens. What is your token throughput, and what token throughput do you command?
**Host:** I would actually argue it's very interesting that we had at least ten years where, in many engineering tasks, people didn't feel compute-bound, and now the entire industry feels that; they feel resource-bound. And now that you have this big capability jump, it's: oh, actually, it's not my ability to access the compute anymore. I'm the binding constraint.
**Andrej Karpathy:** Yeah, it's a skill issue. Which is very empowering, because you could be getting better. That's why I think it's very addictive: there are unlocks when you get better.
**Host:** Where do you think it goes? If you just think about it, Andrej and everybody else are iterating for 16 hours a day, getting better at using coding agents. What does it look like in a year, once you've reached mastery?
**Host:** [laughter]
**Andrej Karpathy:** Yeah, what does mastery look like, right? At the end of the year, or in two, three, five, ten years. Well, I think everyone is basically interested in going up the stack. So it's not about a single session with your agent; it's multiple agents, how they collaborate, teams, and so on. Everyone's trying to figure out what that looks like. And then I would say the claw is also an interesting direction, because, when I say a claw, I mean this layer that takes persistence to a whole new level. It's something that keeps looping. It's not something you are interactively in the middle of; it has its own little sandbox, and it does stuff on your behalf even when you're not looking. It also has maybe more sophisticated memory systems that are not yet implemented in agents. OpenClaw has a lot more sophisticated memory, I would say, than what you get by default, which is just a memory compaction when your context runs out, right?
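The default behavior he contrasts this with, compacting memory once the context fills up, can be sketched roughly as follows. This is a minimal illustration, not any particular harness's implementation; `summarize` and `count_tokens` are hypothetical stand-ins supplied by the caller.

```python
def compact(messages, max_tokens, summarize, count_tokens):
    """Default-style memory compaction: when the running token count
    exceeds the budget, fold the oldest messages into one summary."""
    total = sum(count_tokens(m) for m in messages)
    if total <= max_tokens:
        return messages  # still fits; nothing to do
    # Keep the most recent messages that fit in half the budget...
    kept, budget = [], max_tokens // 2
    for m in reversed(messages):
        c = count_tokens(m)
        if budget - c < 0:
            break
        kept.append(m)
        budget -= c
    kept.reverse()
    older = messages[: len(messages) - len(kept)]
    # ...and replace everything older with a single summary message.
    return [summarize(older)] + kept
```

A more sophisticated memory system, of the kind he attributes to OpenClaw, would persist and retrieve memories rather than lossily summarizing on overflow.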
**Host:** You think that's the piece that resonated with more users, versus, say, broader tool access? For OpenClaw?
**Andrej Karpathy:** Yeah. I think there are at least five things in here that are really good ideas. Good job, Peter; he has done a really amazing job. I saw him recently and talked to him about it, and he's very humble about it, but I think he innovated simultaneously in about five different ways and put it all together. For example, the soul document: he really crafted a personality that is compelling and interesting, and I feel like a lot of the current agents don't get this right. I actually think Claude has a pretty good personality; it feels like a teammate, and it's excited with you. Codex, for example, is a lot more dry, [laughter] which is kind of interesting, because it's true. The other thing I would say is that with Claude, I think they dialed the sycophancy fairly well. When Claude gives me praise, I do feel like I slightly deserve it, because sometimes I give it not very well-formed thoughts, an idea I don't think is fully baked, and it doesn't react very strongly; it's like, oh yeah, we can implement that. But when it's a really good idea by my own account, it does seem to reward it a bit more. So I kind of feel like I'm trying to earn its praise, which is really weird. I do think the personality matters a lot, and a lot of the other tools maybe don't appreciate that as much. Peter really cares about this, and that was correct. And then the memory system, and then, you know, he's just having fun with this, and then the single WhatsApp portal to all of the automation.
**Host:** Yeah. Is there something that you have done personally with your claws beyond software engineering that you think is fun or interesting?
**Andrej Karpathy:** Yeah. In January I went through a period of claw psychosis. I built a claw that takes care of my home, and I call him Dobby, the elf claw. Basically, I used the agents to find all of the smart-home subsystems of my home on the local area network, and I was kind of surprised that it worked out of the box. I just told it, I think I have Sonos at home; can you try to find it? And it did an IP scan of all the computers on the local area network and found the Sonos system, and it turned out there's no password protection or anything like that. It just logged in and said, "Oh yeah, you have these Sonos systems installed. Let me try to reverse-engineer how they work." It does some web searches, finds the API endpoints, and then asks, "Do you want to try it?" And I'm like, "Whoa, you just did that. Yeah, can you try to play something in the study?" It does, and music comes out, and I can't believe it. That's crazy. That's like three prompts.
**Andrej Karpathy:** I can't believe I just typed, "Can you find my Sonos?" and suddenly it's playing music. It did the same for the lights. It kind of hacked in, figured out the whole thing, created APIs, created a dashboard so I could see the command center for all of my lights in the home. Then it was switching lights on and off, so I can say, "Dobby, it's sleepy time," and sleepy time just means all the lights go off, and so on. It controls all of my lights, my HVAC, my shades, the pool and the spa, and also my security system. I have a camera pointed outside the house, and any time someone rolls in, I have a Qwen model that looks at the video. So first of all there's change detection, right?
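The LAN sweep he describes maps well onto how Sonos players actually work: they serve a UPnP device description over HTTP on port 1400. The sketch below is illustrative, not the agent's actual code; the helper names are invented, and the description path is the commonly documented one.

```python
import ipaddress
import urllib.request

SONOS_PORT = 1400  # Sonos players serve their UPnP device description here

def candidate_hosts(cidr):
    """All usable host addresses on the local subnet, as strings."""
    return [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]

def looks_like_sonos(description_xml):
    """Crude check of a UPnP device description for a Sonos player."""
    return "sonos" in description_xml.lower()

def find_sonos(cidr, timeout=0.3):
    """Scan the subnet; return IPs whose device description mentions Sonos."""
    found = []
    for host in candidate_hosts(cidr):
        url = f"http://{host}:{SONOS_PORT}/xml/device_description.xml"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if looks_like_sonos(resp.read().decode("utf-8", "replace")):
                    found.append(host)
        except OSError:
            continue  # host down, port closed, or not a Sonos player
    return found
```

In practice a claw would follow this up by calling the player's control endpoints (the "reverse-engineer the API" step in the story).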
**Andrej Karpathy:** And then, based on the change detection, it goes to Qwen, and then it sends me a text on WhatsApp: it shows an image from outside and says, "Hey, a FedEx truck just pulled up; you might want to check it," or "you got new mail," or something like that. Dobby just texts me this. It's really incredible. So Dobby is in charge of the house. I text with it through WhatsApp, and it's been really fun to have these macro actions that maintain my house. I haven't pushed it way beyond that, and I think people are doing a lot crazier things with it, but for me, even just the home-automation setup: I used to use six completely different apps, and I don't have to use those apps anymore. Dobby controls everything in natural language. It's amazing. I haven't even pushed the paradigm fully, but already it's so helpful and so inspiring, I would say.
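The two-stage pipeline he describes, a cheap change detector gating an expensive vision model, could be sketched like this. Frames are shown as flat lists of grayscale pixel values, and the threshold is arbitrary; both are illustrative assumptions.

```python
def mean_abs_diff(prev, curr):
    """Mean absolute per-pixel difference between two grayscale frames,
    given as flat lists of 0-255 values."""
    assert len(prev) == len(curr)
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)

def should_invoke_vlm(prev, curr, threshold=12.0):
    """Cheap gate: only wake the expensive vision model when the scene
    actually changed by more than the threshold."""
    return mean_abs_diff(prev, curr) > threshold
```

Only when the gate fires would the frame be sent to the vision model (a Qwen model, in his setup) and the WhatsApp notification composed; the gate keeps the model from watching an unchanging driveway all day.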
**Host:** Do you think that's indicative of what people want from software, from a user-experience perspective? Because I think it's pretty much ignored that it takes humans real effort to learn new software, new UIs.
**Andrej Karpathy:** Yeah, I think to some extent that's right. It's working backwards from how people think an AI should be, because what people have in their mind of what an AI is, is not actually what an LLM is in the raw sense. An LLM is a token generator; more tokens come out. But what they think of is this persona, an identity they can tell stuff to, and it remembers it. It's an entity behind WhatsApp; that's a lot more understandable. So to some extent it's matching the expectations humans already have for how an AI should behave, but under the hood a lot of technical details go into that. LLMs are too raw a primitive to actually type-check as "AI" for most people, if that makes sense.
**Host:** Yeah. I think that's how we understand what the AI is, and the description of it as Dobby, or some persona, obviously resonates with people. I also think the unification you did across your six different software systems for home automation speaks to a different question: do people really want all of the software that we have today? Right? Because I would argue, well, you have the hardware, but you've now thrown away the software, or the UX layer of it. Do you think that's what people want?
**Andrej Karpathy:** Yeah, I think there's this sense that these apps on the app store for using these smart-home devices shouldn't even exist, in a certain sense. Shouldn't it just be APIs, and shouldn't agents be using them directly? I can do all kinds of home-automation stuff that no individual app would be able to do, right? An LLM can actually drive the tools, call all the right tools, and do pretty complicated things. So in a certain sense it does point to this: maybe there's an overproduction of lots of custom, bespoke apps that shouldn't exist, because agents kind of crumple them up, and everything should be a lot more like exposed API endpoints, with agents as the glue of intelligence that tool-calls all the parts. Another example is my treadmill. There's an app for my treadmill, and I wanted to keep track of how often I do my cardio, but I don't want to log into a web UI and go through a flow. All of this should just be available as APIs, and this is going towards the agentic web, or agent-first tools, and all that kind of stuff. So I think the industry has to reconfigure in so many ways: the customer is not the human anymore, it's agents acting on behalf of humans, and this refactoring will probably be substantial. One way people sometimes push back on this is: do we expect normal people to write code for some of these tools, to do the kind of stuff I described?
But I think to some extent this is just technology as it exists today. Right now there is some coding involved, and I'm actually watching it and working with the system, but I feel like the kind of stuff I just talked about should be free in a year or two or three. There's no coding involved; this is trivial, this is table stakes. Any AI, even the open-source models, can do this. You should be able to translate a less technical human's intent very easily into this outcome.
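His "apps should just be exposed endpoints, with the agent as the glue" point is essentially the tool-calling pattern. A stripped-down sketch; the tool names and the functions behind them are invented for illustration, with plain functions standing in for real device endpoints.

```python
def dispatch(tools, call):
    """Agents as the glue: route a structured tool call to an exposed
    endpoint (a plain function here) instead of driving a per-device app UI."""
    name, args = call["name"], call.get("args", {})
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return tools[name](**args)

# Hypothetical endpoints the home's subsystems might expose.
tools = {
    "sonos.play": lambda room, track: f"playing {track} in {room}",
    "treadmill.log": lambda minutes: f"logged {minutes} min of cardio",
}
```

The LLM's job is then only to map "Dobby, play something in the study" onto a `{"name": ..., "args": ...}` call; no bespoke app sits in between.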
**Host:** Yeah. Today it's coding, and it's involved, and not many people are going to do it, but
**Andrej Karpathy:** And you still have to make some design decisions, right? We were talking about how we take frames, for example. But I feel like the barrier will just come down, and it's just ephemeral software on your behalf. Some kind of claw is handling all the details for you, but you're not involved. The claw has a machine and it will figure it out; it's just presenting you UIs, and you're saying stuff, you know?
**Host:** Why haven't you pushed the boundaries of what you can do personally with claws? Is it that you're focusing on more important projects, auto research and so on, or are you climbing the hill to mastery, or is it something else?
**Andrej Karpathy:** Yeah, I just feel like I'm so distracted by everything. I [laughter] spent like a week on the claw stuff, and I almost have more to do. But I will say that,
**Host:** It's like Jensen told us we're all just busier, unfortunately.
**Andrej Karpathy:** I didn't really take advantage of email and calendar and all this other stuff, and I didn't really give it access, because I'm still a little bit suspicious, and it's still very new and rough around the edges. So I didn't want to give it full access to my digital life yet, partly for security and privacy, and just being very cautious in that realm. Some of it is held back by that; maybe that's the dominant factor. But some of it is also just that I feel so distracted, because I had a week of claw and then other stuff is happening.
**Host:** You've talked about being able to train, or at least optimize, a model as a task you've wanted to see agents do for a long time. What was the motivation behind auto research?
**Andrej Karpathy:** Auto research, yeah. So I had a tweet earlier where I said something along the lines of: to get the most out of the tools that have become available now, you have to remove yourself as the bottleneck. You can't be there to prompt the next thing; you need to take yourself out of the loop. You have to arrange things so that they're completely autonomous. How can you maximize your token throughput and not be in the loop? That is the goal. So the name of the game now is to increase your leverage: I put in very few tokens, just once in a while, and a huge amount of stuff happens on my behalf. I tweeted that, and I think people liked it, but they maybe haven't worked through the implications of it, and for me, auto research is an example of an implication. I don't want to be the researcher in the loop, looking at results and so on; I'm holding the system back. So the question is, how do I refactor all the abstractions so that I'm not? I have to arrange it once and hit go. The name of the game is: how can you get more agents running for longer periods of time, without your involvement, doing stuff on your behalf? And auto research is just: here's an objective, here's a metric, here are the boundaries of what you can and cannot do. Go. And, yeah, it worked.
**Host:** Are you surprised at its effectiveness?
**Andrej Karpathy:** Yeah, I didn't expect it to work. So I have the nanochat project, and fundamentally, I think a lot of people are very confused by my obsession with training GPT-2 models and so on. But for me, training GPT models is just a little harness, a little playground for training LLMs. Fundamentally, what I'm more interested in is this idea of recursive self-improvement, and to what extent you can actually have LLMs improving LLMs, because for all the frontier labs this is the thing, for obvious reasons; they're all trying to recursively self-improve, roughly speaking. So for me this is a little playpen for that. And I had already tuned nanochat quite a bit by hand, in the good old-fashioned way I'm used to. I'm a researcher; I've done this for two decades. I have some amount of, what is the opposite of hubris?
**Host:** Earned confidence?
**Andrej Karpathy:** Okay. I have two decades of "I've trained this model thousands of times." I've done a bunch of experiments, I've done hyperparameter tuning, I've done all the things I'm very used to and have done for two decades. And I got to a point where I thought it was fairly well tuned, and then I let auto research go overnight, and it came back with tunings that I didn't see.
**Host:** Mhm.
**Andrej Karpathy:** And yeah, I did forget the weight decay on the value embeddings, and my Adam betas were not sufficiently tuned, and these things jointly interact: once you tune one thing, the other things potentially have to change too. I shouldn't be the bottleneck. I shouldn't be running these hyperparameter optimizations; I shouldn't be looking at the results. There are objective criteria in this case, so you just have to arrange it so that it can go forever. That's a single version of auto research: a single loop trying to improve. And I was surprised that it found these things; the repo was already fairly well tuned and it still found something. And that's just a single loop. The frontier labs have GPU clusters of tens of thousands, so it's very easy to imagine how you would get a lot of this automation on smaller models. Fundamentally, everything around frontier-level intelligence is about extrapolation and scaling laws: you do a ton of the exploration on the smaller models and then you try to extrapolate out.
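The overnight run he describes is, at its core, a search loop under declared boundaries: an objective, a metric, and limits on what can be touched. A toy version, with random search standing in for whatever the real system does:

```python
import random

def auto_tune(objective, space, budget, seed=0):
    """Single-loop 'auto research': sample configs inside declared
    boundaries, score each one with the objective, keep the best."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):
        # Each parameter stays inside its (lo, hi) boundary.
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        # Stand-in for "train a small model with this config, read the metric".
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

The point is not the (deliberately naive) search strategy but the shape of the loop: no researcher looks at intermediate results, and because the objective is computed automatically, the loop can run overnight or forever.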
**Host:** So you're saying our research efforts are going to get more efficient. Like we're going to have better direction for when we scale as well if we can do this experimentation better.
**Andrej Karpathy:** Yeah, I would say that the most interesting project, and probably what the frontier labs are working on, is —
**Host:** Mhm.
**Andrej Karpathy:** Yeah. You know, you experiment on the smaller models. You try to make it as autonomous as possible. Remove researchers —
**Host:** [laughter]
**Andrej Karpathy:** — from the loop. They have way too much... what is the opposite of too much confidence? Yeah, they don't know. They shouldn't be touching any of this, really. And so you have to rewrite the whole thing, because right now, I mean, certainly they can contribute ideas, but they shouldn't actually be enacting these ideas. There is a queue of ideas, and there's maybe an automated scientist that comes up with ideas based on all the arXiv papers and GitHub repos and funnels ideas in, or researchers can contribute ideas, but it's a single queue, and there are workers that pull items and try them out. And whatever works gets put on the feature branch, and maybe some people monitor the feature branch and merge to the main branch sometimes. So yeah, just removing humans from all the processes, automating as much as possible, and getting high token-per-second throughput. And it does require rethinking all the abstractions; everything has to be reshuffled. So yeah, I think it's very exciting.
**Host:** If we take one more recursive step here, when is the model going to write a better program MD than you?
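The queue-and-workers org described here can be sketched in a few lines. This is a hypothetical toy, not a real system: ideas come from any source into one shared queue, workers try them, and only ideas that improve the metric land on the feature branch. The experiment function is a deterministic stand-in for a training run.

```python
import queue
from dataclasses import dataclass

@dataclass
class Idea:
    description: str
    source: str  # "auto-scientist" or "human researcher"

def run_experiment(idea, baseline):
    # Stand-in for training: pretend only optimizer ideas help here.
    return baseline * 0.95 if "optimizer" in idea.description else baseline * 1.01

def worker_loop(ideas, baseline):
    feature_branch = []  # (description, loss) pairs that beat the baseline
    best = baseline
    while not ideas.empty():
        idea = ideas.get()             # workers pull from the single queue
        loss = run_experiment(idea, best)
        if loss < best:                # only wins land on the feature branch
            feature_branch.append((idea.description, loss))
            best = loss
    return feature_branch, best

q = queue.Queue()
for desc, src in [("tune optimizer betas", "human researcher"),
                  ("add more dropout", "auto-scientist"),
                  ("try a second optimizer schedule", "auto-scientist")]:
    q.put(Idea(desc, src))
branch, best = worker_loop(q, baseline=3.0)
```

In a real version the workers would run in parallel and a human (or another agent) would review the feature branch before merging to main, as described above.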
**Host:** Yeah. Also program MD is like —
**Andrej Karpathy:** loop.
**Host:** Yeah, exactly.
**Andrej Karpathy:** Yeah. So program MD is my crappy attempt at describing like how the auto researcher should work. Like oh, do this then do that and that and then try these kinds of ideas and then here's maybe some ideas like look at architecture, look at optimizer, etc. But I just came up with this in markdown, right?
**Host:** Mhm.
**Andrej Karpathy:** And so yeah, exactly. You want some kind of an auto research loop, maybe, that looks for — you can imagine that different program MDs would give you different progress. So basically, every research organization is described by a program MD. A research organization is a set of markdown files that describe all the roles and how the whole thing connects. And you can imagine having a better research organization. So maybe they do fewer stand-ups in the morning, because they're useless. And this is all just code, right? So one organization can have fewer stand-ups, one organization can have more. One organization can be very risk-taking, one can be less. You can definitely imagine that you have multiple research orgs, and they all have code. And once you have code, then you can imagine tuning the code. So there's 100% a meta layer there.
**Host:** Did you see my text about my contest idea? My contest idea was to let people write different program MDs, right? And so for the same hardware, where do you get the most improvement?
**Andrej Karpathy:** Oh, I see.
**Host:** And then you can take all that data and then give it to the model and say write a better program MD.
**Andrej Karpathy:** Yes, yes. Yeah, exactly.
**Host:** We're going to get something better. Like there's no way we don't, right?
**Andrej Karpathy:** 100%. Look at where the improvements came from, and, like, can I change the program MD such that more of these kinds of things would be done, or fewer of the things that didn't work — you can 100% imagine doing that. So I think this is a great idea, but, you know, I think you can go one step at a time, where you have one process and then a second process and then the next process, and these are all layers of an onion. Like, the LLM part is now taken for granted. The agent part is now taken for granted. Now the claw-like entities are taken for granted, and now you can have multiple of them, and now you can have instructions to them, and now you can have optimization over the instructions, and it's just a little too much, you know. But I mean, this is why it gets to the psychosis: this is like infinite, and everything is a skill issue. And that's why I feel like — yeah, that's just coming back to — this is why it's so insane.
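The contest floated a few turns back (different program MDs, same hardware, ranked by improvement) could be scored with a harness like this. Everything is illustrative: the entry names and the run functions stand in for real auto-research orgs executing under a fixed compute budget.

```python
# Each entrant submits a program MD; each run gets the same step budget;
# entries are ranked by improvement over a common baseline loss.

def score_entry(run_fn, budget_steps, baseline_loss):
    # Improvement achieved by this org description under a fixed budget.
    return baseline_loss - run_fn(budget_steps)

entries = {
    "aggressive-org.md": lambda steps: 3.0 - 0.001 * steps,   # stand-ins
    "cautious-org.md": lambda steps: 3.0 - 0.0005 * steps,
}
ranking = sorted(
    ((score_entry(fn, budget_steps=500, baseline_loss=3.0), name)
     for name, fn in entries.items()),
    reverse=True,
)
# ranking[0] is the program MD that squeezed out the most improvement
```

The meta-step discussed in the conversation is then to feed the scored entries back to a model and ask it to write a better program MD.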
**Host:** Okay, well — [laughter] — we're just trying to diagnose the current moment and what is a relevant skill right now. What do you think is the implication, if this is the loop we should be trying to achieve in different areas, and then it works, right? Like, you know, remove — create the metric, or create the ability for agents to continue working on it without you. Do we still have performance engineering? Like what —
**Andrej Karpathy:** Yeah, I mean, there are a few caveats that I would put on top of the LLM psychosis. Number one, this is extremely well suited to anything that has objective metrics that are easy to evaluate. So for example, writing more efficient CUDA kernels, code for various parts of the model, etc., is a perfect fit, because you have inefficient code and you want efficient code that has the exact same behavior but is much faster. Perfect fit. So a lot of things are a perfect fit for auto research, but many things will not be. If you can't evaluate it, you can't auto research it, right? So that's caveat number one. And then caveat number two, I would say, is — you know, we're kind of talking about the next steps, and we kind of see what the next steps are, but fundamentally the whole thing is still bursting at the seams a little bit; there are cracks, it doesn't fully work, and if you try to go too far ahead, the whole thing is actually net not useful, if that makes sense. Because these models — you know, they've improved a lot, but they're still rough around the edges, is maybe the way I would describe it. I simultaneously feel like I'm talking to an extremely brilliant PhD student who's been a systems programmer for their entire life, and to a 10-year-old. And it's so weird, because in humans I feel like these abilities are a lot more coupled — you know —
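The CUDA-kernel case called a "perfect fit" here can be made concrete with a tiny gate: a candidate is accepted only if it reproduces a trusted reference exactly and runs faster. In this hedged sketch a plain-Python softmax stands in for the kernel, and the buggy candidate shows the behavior gate rejecting a wrong submission.

```python
import math
import time

def reference_softmax(xs):
    # Slow but trusted reference implementation.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def behavior_matches(candidate, reference, test_inputs, tol=1e-9):
    # Gate 1: exact same behavior as the reference, within tolerance.
    for xs in test_inputs:
        want, got = reference(xs), candidate(xs)
        if len(want) != len(got):
            return False
        if any(abs(a - b) > tol for a, b in zip(want, got)):
            return False
    return True

def speedup(candidate, reference, test_inputs, reps=200):
    # Gate 2: how much faster the candidate is (> 1.0 means faster).
    t0 = time.perf_counter()
    for _ in range(reps):
        for xs in test_inputs:
            reference(xs)
    t_ref = time.perf_counter() - t0
    t0 = time.perf_counter()
    for _ in range(reps):
        for xs in test_inputs:
            candidate(xs)
    return t_ref / (time.perf_counter() - t0)

inputs = [[0.0, 1.0, 2.0], [-3.0, 0.5, 4.0, 4.0]]
buggy = lambda xs: [1.0 / len(xs)] * len(xs)  # wrong behavior: rejected
```

Because both gates are fully objective, a loop like this can run unattended, which is exactly the property that makes kernel optimization auto-researchable.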
**Host:** Yes, you wouldn't — you wouldn't encounter that combination.
**Andrej Karpathy:** This jaggedness is really strange and humans have a lot less of that kind of jaggedness, although they definitely have some.
**Host:** [laughter]
**Andrej Karpathy:** But humans have a lot more jaggedness — uh, sorry, the agents have a lot more jaggedness, where sometimes, you know, I ask for functionality and it comes back with something that's just totally wrong, and then we get into loops that are totally wrong, and I get so frustrated with the agents all the time still, because you feel the power of it, but it still does these maddening things once in a while for me as well.
**Host:** I get very annoyed [clears throat] when I feel like the agent wasted a lot of compute on something it should have recognized was an obvious problem.
**Andrej Karpathy:** Yeah. I think one of the bigger things — maybe what's underneath it, if I could hypothesize — is that fundamentally these models are trained via reinforcement learning. So they're actually struggling with the exact same thing we just talked about, which is that the labs can improve the models in anything that is verifiable or that [clears throat] has rewards. Did you write the program correctly? Do the unit tests check out? Yes or no. But some of the things where they're struggling are, for example — I think they have a tough time with the nuance of what I had in mind or what I intended, and with when to ask clarifying questions. Anything that feels softer is worse. So you're either on rails and you're part of the superintelligence circuits, or you're not on rails, you're outside of the verifiable domains, and suddenly everything just kind of meanders. Maybe another way to put it: today, if you go to a state-of-the-art model, ChatGPT, and you ask it to tell you a joke, do you know what joke you're going to get?
**Host:** There's the joke. The joke? I can't tell you the standard form of it, but I do feel like ChatGPT has like three jokes.
**Andrej Karpathy:** Yeah, yeah. So the joke that apparently all the LLMs love the most is: why do scientists not trust atoms? Because they make everything up.
**Host:** Okay. They make everything up. So this is still what emerges?
**Andrej Karpathy:** So this is the joke you would get three or four years ago, and this is the joke you still get today. So even though the models have improved tremendously — if you give them an agentic task, they will go for hours and move mountains for you — you ask for a joke and you get a stupid joke. It's a crappy joke from five years ago, and it's because it's outside of the RL. It's outside of the reinforcement learning; it's outside of what's being improved. And it's part of the jaggedness: shouldn't you expect models, as they get better, to also have better jokes, or more diversity of them? It's just not being optimized, and it's stuck.
**Host:** Do you think that implies that we are not seeing generalization, in the sense of broader intelligence — of joke smartness being attached to code smartness?
**Andrej Karpathy:** Yeah, I think there's some decoupling, where some things are verifiable and some things are not, and some things are optimized for arbitrarily by the labs, depending on what data went in, and some things are not.
**Host:** But I mean, there's a premise from some research groups that if you're smarter at code generation, or in these verifiable fields, you should be better at everything. And the joke situation suggests that that's not happening at all.
**Andrej Karpathy:** Yeah, I don't think that's happening. I think maybe we're seeing a little bit of that, but not a satisfying amount.
**Host:** Yeah, that jaggedness exists in humans. You [laughter] can be very, very good at math and still tell really bad jokes.
**Andrej Karpathy:** Yeah, that's true.
But it still means that we're not getting — the story is that we get a lot of the intelligence and capabilities across all the domains of society for free as we get better and better models, and that's not exactly, fundamentally, what's going on. There are some blind spots, some things are not being optimized for, and this is all clustered up in these opaque neural net models, right? So you're either on the rails of what it was trained for, and you're going at the speed of light, or you're not. So it's the jaggedness. And that's why I think that even though the progression is obvious — what should happen — you can't let it fully go there yet, because it doesn't fully work, or it's a skill issue and we just haven't figured out how to use it. So, you know, it's hard to tell.
**Host:** Can I ask a somewhat blasphemous question? If this jaggedness is persisting, and it's all rolled up in an at least monolithic interface — a single model — does that make sense, or should it be unbundled into things that can be optimized and improved against different domains of intelligence? Like unbundling the models into multiple experts in different areas, more directly. Instead of just MoE that we have no exposure to, because from the outside, as a user, it can be confusing: why is it so good at this, but not at this other thing?
**Andrej Karpathy:** Yeah, I think currently my impression is that the labs are trying to have a single sort of monoculture of a model that is arbitrarily intelligent in all these different domains, and they just stuff it into the parameters. I do think we should expect more speciation in the intelligences.
Like, you know, the animal kingdom is extremely diverse in the brains that exist; there are lots of different niches in nature, and some animals have an overdeveloped visual cortex or other overdeveloped parts. I think we should be able to see more speciation, and you don't need this oracle that knows everything. You can speciate it and then put it on a specific task, and we should be seeing some of that, because you should be able to have much smaller models that still have the cognitive core — they're still competent — but then they specialize, and then they can become more efficient in terms of latency or throughput on the specific tasks that you really care about. Like if you're a mathematician working in Lean — I saw, for example, there have been a few releases that really target that as a domain. So there are probably going to be a few examples like that where the unbundling kind of makes sense.
**Host:** One question I have is whether or not the capacity constraint on available compute infrastructure —
**Andrej Karpathy:** Mhm.
**Host:** — drives more of this, because efficiency actually matters more. Financing aside — though financing's involved in all of this — if you have access to full compute for anything you do, even one single model, fine. But if you actually feel pressure, where you're like, I can't serve a model of massive size for every use case — do you think that leads to any speciation? Does that question make sense to you?
**Andrej Karpathy:** The question makes sense, and I guess what I'm struggling with is that I don't think we've seen too much speciation just yet, right? We're seeing a monoculture of models. And there's clearly pressure to, like, make a good code model, put it back in the main, merge again, even though there already is pressure on the models. I feel like there's a lot of very short-term supply crunch, and maybe that causes more speciation now. But I think fundamentally the labs are serving a model and they don't really know what the end user is going to be asking about. So maybe that's some part of it, because they kind of have to multitask over all the possible things they could be asked. But if you're coming to a business, and maybe partnering on some specific problems you care about, then maybe you would see it there. Or there would be some very high-value applications that are more niche. But I think right now they're going after the totality of what's available. And partly, I don't think the science of manipulating the brains is fully developed yet.
**Host:** What do you mean, manipulating?
**Andrej Karpathy:** So, like, fine-tuning without losing capabilities, as an example. We don't have the primitives for actually working with the intelligences in ways other than just context windows. Our context windows kind of just work, they're very cheap to manipulate, etc., and this is how we're getting some of the customization. But I think it's a bit more of a developing science: how you more deeply adjust the models, how you have continual learning, maybe, or how you fine-tune in a certain area, how you get better in a certain area — how you actually touch the weights, not just the context windows. It's a lot trickier, I would say, to touch the weights than the context windows, because you're fundamentally changing the full model and potentially its intelligence. So maybe it's just not a fully developed science, if that makes sense, of speciation.
**Host:** And it also has to be cheap enough for that speciation to be worthwhile in these given contexts. Can I ask about an extension to auto research that you described, in terms of open ground? You say, okay, we have this thing; we need more collaboration surface around it, essentially, for people to contribute to research overall. Can you talk about that?
**Andrej Karpathy:** Yeah, so we talked about auto research as a single thread that tries stuff in a loop, but fundamentally the parallelization of this is the interesting component. And I was trying to play around with a few ideas, but I don't have anything that clicks — I don't have something I'm super happy with just yet, but it's something I'm working on on the side when I'm not working on my claw.
So I think one issue is, if you have a bunch of nodes of parallelization available to you, then it's very easy to just have multiple auto researchers talking through a common system or something like that. What I was more interested in is how you can have an untrusted pool of workers out there on the internet. So for example, in auto research you're just trying to find the piece of code that trains a model to a very low validation loss. If anyone gives you a candidate commit, it's very easy to verify that that commit is good. Someone on the internet could claim that this piece of code will optimize much better and give you much better performance, and you could just check. Probably a lot of work goes into that checking, but fundamentally they could lie, etc. So you're dealing with a similar kind of thing — my designs that incorporate an untrusted pool of workers actually look a little bit like a blockchain, because instead of blocks you have commits, and these commits can build on each other, and they contain changes to the code as you're improving it. And the proof of work is basically doing tons of experimentation to find the commits that work. That's hard, and the reward right now is just being on the leaderboard; there's no monetary reward whatsoever. I don't want to push the analogy too far, but it fundamentally has this property where a huge amount of search goes into it, but it's very cheap to verify that a candidate solution is indeed good, because you only have to train it once. Someone had to try 10,000 ideas, but you just have to check that the thing they produced actually works, because the other 9,999 didn't, you know?
And so basically, long story short, you have to come up with a system where an untrusted pool of workers can collaborate with a trusted pool of workers that do the verification. The whole thing is kind of asynchronous, and it works, and so on, and it's safe from a security perspective — because if anyone can send you arbitrary code and you're going to run it, that is very sketchy and dodgy. But fundamentally it should be totally possible. So, you're familiar with projects like SETI@home and Folding@home. All of these problems have a similar kind of setup. In Folding@home you're folding a protein, and it's very hard to find a configuration that is low energy. But if someone finds a configuration that they claim to be low energy, that's perfect: you can just use it, because you can easily verify it. So a lot of things have this property of being very expensive to come up with but very cheap to verify, and in all those cases, things like Folding@home or SETI@home or auto-research-at-home will be good fits. So, long story short, a swarm of agents on the internet could collaborate to improve LLMs, and could potentially even run circles around frontier labs. Who knows, you know? Maybe that's even possible. Frontier labs have a huge amount of trusted compute, but the earth is much bigger and has a huge amount of untrusted compute. And if you put systems in place that deal with this, then maybe it is possible that the swarm out there could come up with better solutions, and people contribute cycles to a thing that they care about. And so the last thought is: lots of companies could maybe have their own things that they care about, and if you have compute capacity, you could contribute to different kinds of auto research tracks.
Like maybe you care about, say, cancer of a certain type. You don't have to just donate money to an institution; you could actually purchase compute and then join the auto research swarm for that project, you know? So if everything is rebundled into auto researchers, then compute becomes the thing that you're contributing to the pool.
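The trusted/untrusted split described over the last few turns boils down to one rule: never trust the claim, replay the commit on trusted compute before it touches the leaderboard. A hedged sketch, with all names and numbers illustrative:

```python
# Untrusted workers spend enormous search effort; the trusted side only
# has to re-run each claimed commit once (search is expensive, verify is cheap).

def submit(leaderboard, worker_id, claimed_loss, rerun_fn, best_so_far):
    # rerun_fn replays the candidate commit on trusted hardware and
    # returns the actual validation loss it achieves.
    actual = rerun_fn()
    if actual > claimed_loss + 1e-9:
        return best_so_far                       # claim doesn't reproduce: discard
    if actual < best_so_far:
        leaderboard.append((worker_id, actual))  # the reward: a leaderboard spot
        return actual
    return best_so_far

leaderboard = []
best = 3.0
# An honest worker whose commit reproduces, and a liar whose commit doesn't.
best = submit(leaderboard, "honest-worker", 2.5, lambda: 2.5, best)
best = submit(leaderboard, "lying-worker", 1.0, lambda: 2.9, best)
```

The real system would additionally sandbox the re-run, since executing arbitrary code from the internet is exactly the "sketchy and dodgy" part flagged above.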
**Host:** Yeah. That's very inspiring, and it's also interesting. I don't know how far this goes, but it is interesting that at least some audience of people, here in Silicon Valley or lining up at retail stores in China, have discovered that having access to personal compute is interesting again, right? So maybe they're really motivated to do that for their claws, and then they can contribute to auto research. Almost like: dollars are the thing everyone cares about now, but is the flop the thing that everyone actually cares about in the future? Like, is there going to be a flippening, almost, of what's the thing that you care about?
**Andrej Karpathy:** Like right now, for example, it's really hard to get compute even if you have money. So actually it almost seems like the flop is dominant, [laughter] in a certain sense.
**Host:** Yeah, so maybe it's kind of like that.
**Andrej Karpathy:** Kind of like that. How many flops do you control, instead of what wealth you control? I don't actually think that's true, but it's kind of interesting to think about.
**Host:** The last thing you released was a little bit of jobs data analysis, is that right? It might have touched a nerve, even though you're just visualizing some public data. What were you curious about?
**Andrej Karpathy:** Yeah, I guess I was curious — I mean, everyone is really thinking about the impacts of AI on the job market and what it's going to look like. So I was just interested to take a look: what does the job market look like? Where are the different roles, and how many people are in different professions? And I was really just interested to look through the individual cases and try to think myself about, with these AIs and how they're likely to evolve: are these going to be tools that people are using? Are these going to be displacing tools for these professions?
And what are the current professions, and how are they going to change? Are they going to grow, or adjust to a large extent, or what could be new professions? So it's really just a way to fuel my own chain of thought about the industry, I suppose. And the jobs data is basically just from the Bureau of Labor Statistics. They actually have a percent outlook for each profession: how much it's expected to grow over the next, I think, almost a decade. Yeah, I think it's a decade, but it was made in 2024.
**Host:** Mhm. We need a lot of health care workers.
**Andrej Karpathy:** Yeah. So they've already made those projections, and I'm actually not 100% sure what methodology they put into them. I guess I was interested to color things by this: what's primarily being developed now is this kind of more digital AI — almost like these ghosts or spirit entities that can interact in the digital world and manipulate a lot of digital information — and they currently don't really have a physical embodiment or presence. And the physical stuff is probably going to go slightly slower, because you're manipulating atoms. Flipping bits, and the ability to copy-paste digital information, makes everything a million times faster than accelerating matter, you know? So energetically, I just think we're going to see a huge amount of activity in the digital space: a huge amount of rewriting, a boiling soup of activity. I think we're going to see something in the digital space that goes at the speed of light compared to what's going to happen in the physical world, to some extent, if that's the extrapolation. And so I think there's currently [clears throat] kind of an overhang, where there can potentially be a lot of unbundling of a lot of the digital information processing that used to be done by computers and people. And now, with AIs, there's a third kind of manipulator of digital information. There's going to be a lot of refactoring in those disciplines. But the physical world is going to be behind that by some amount of time. And so — that's why I was highlighting the professions that fundamentally manipulate digital information, work you could do from your home, etc. Because I feel like those are where things will change.
And it doesn't mean that there are going to be fewer of those jobs or more of those jobs, because that has to do with demand elasticity and many other factors. But things will change in these professions, because of these new tools and because of this upgrade to the nervous system of the human superorganism, [laughter] if you want to think about it that way.
**Host:** Given the look you had at the data, do you have any observations or guidance for people facing the job market, or thinking about what to study now or what skills to develop? I mean, I'm very thankful that I have to meet people for my job right now. [laughter]
**Andrej Karpathy:** Yeah, more physical. Could you do your work from home, though?
**Host:** I could. I think there are relationship parts of it that are hard, but most of it I could.
**Andrej Karpathy:** Yeah. I think it's really hard to tell, because again, the job market is extremely diverse, and the answers will probably vary. But to a large extent, these tools are extremely new and extremely powerful, so just trying to keep up with them is the first thing. Because I think a lot of people kind of dismiss it —
**Host:** Or they're afraid of it.
**Andrej Karpathy:** Or they're afraid of it, etc., which is totally understandable, of course. I think it's fundamentally an empowering tool at the moment. These jobs are bundles of tasks, and some of those tasks can go a lot faster. So people should think of it primarily as the tool that it is right now. The long-term future of that is uncertain; it's really hard to forecast, to be honest. And I'm not professionally doing that; I think it's the job of economists to do properly.
**Host:** You are an engineer, though. And one thing I thought was interesting is that the demand for engineering jobs is continuing to increase.
**Andrej Karpathy:** Yeah. I can't tell if that's a temporary phenomenon. I'm not sure how I feel about it.
**Host:** Yeah, do you know?
**Andrej Karpathy:** Yeah, that's demand elasticity, almost. Like, software was scarce, right? And so the reason we don't have more demand for software is just its scarcity: it's too expensive.
**Host:** So if the barrier comes down, then actually you have the Jevons paradox, which is, you know, the demand for software actually goes up. It's cheaper, and there's more —
**Andrej Karpathy:** More powerful, yeah. The classical example of this is always the ATMs and the bank tellers, because there was a lot of fear that ATMs and computers would displace tellers. But what happened is that they made the cost of operating a bank branch much cheaper; so there are more bank branches, and so there are more tellers. It's the canonical example people cite, but basically it's just the Jevons paradox: something becomes cheaper, so there's a lot of unlocked demand for it. So I do have a cautiously optimistic view of this in software engineering, where it seems to me that the demand for software will be extremely large; it has just become a lot cheaper. It's very hard to forecast, but it does seem to me that, at least locally for now, there's going to be more demand for software. Because software is amazing: it's digital information processing. You're not forced to use arbitrary tools that were given to you, imperfect in various ways; you're not forced to subscribe to what already exists. Code is now ephemeral, and it can change and be modified. So I think there's going to be a lot of activity in the digital space to rewire everything, in a certain sense, and that's going to create a lot of demand for this kind of stuff. Long term, obviously, even with auto research, OpenAI or Anthropic or these other labs are employing what, a thousand-something researchers, right?
**Host:** One question is whether the capacity constraints on available compute infrastructure...
**Host:** Mhm.
**Andrej Karpathy:** Mhm.
**Andrej Karpathy:** These researchers are basically glorified auto... you know.
**Host:** ...will push more of this speciation, because efficiency genuinely matters more. Setting fundraising aside, though fundraising is tied up with all of this: if you could get abundant compute for anything you do, even a monolithic model would be fine. But if you're actually under pressure, if you can't serve a giant model for every use case, do you think that leads to speciation?
**Host:** [laughter]
**Andrej Karpathy:** It's a fair question. My confusion is that we haven't seen much speciation yet, right? What we see is a monoculture of models. There is pressure to "make a good code model and merge it back into mainline," that kind of thing. Maybe there's a very short-term supply crunch right now, and that could drive more speciation. But fundamentally, the models the labs serve are general-purpose: they don't know what end users will ask. So maybe the need to multitask over everything that might be asked is part of it. But if you're working with one enterprise, collaborating on some specific problem, then you might see speciation. Or some high-value niche applications. For now, though, I think the labs are still going for full coverage. And I don't think the science of manipulating these brains is fully mature yet...
**Andrej Karpathy:** They're actively automating themselves away, and this is the thing they're all trying to do.
**Host:** What do you mean by "manipulating"?
**Host:** Yeah. Some of those researchers also feel that fear, that psychosis, right? Because it's working. And so they're like, it's over for me, too.
**Andrej Karpathy:** Fine-tuning without losing capability, for example. We don't yet have the primitives to really manipulate the intelligence, beyond the context window. The context window does work, and it's cheap to operate. But adjusting the model at a deeper level is a developing science: how to do continual learning, how to fine-tune and get better in some domain, how to actually touch the weights rather than just the context window. Touching the weights is much trickier than touching the context window, because you're fundamentally changing the whole model and its intelligence. So maybe it's just a science that hasn't fully matured. And the cost of that speciation also has to come down far enough to be worth doing for a given use case.
**Andrej Karpathy:** I did spend a bunch of time going around OpenAI saying, you guys realize that if we're successful, we're all out of a job? We're just building automation for Sam, or the board, or something like that; I'm not sure exactly who, but they're building all this automation for the board or the CEO. And we're all out of our jobs, maybe contributing on the side. So yeah, it's kind of unnerving from that perspective.
**Host:** Can I ask a question extending auto research? You mentioned this idea of opening it up: we have this thing, but we need more collaborative interfaces for people to participate in the research. Can you expand on that?
**Host:** Is it okay if I ask you Noam's question? Mhm. You know, you could be doing that, right? Auto researching with a lot of compute scale and a bunch of colleagues at one of the frontier
**Andrej Karpathy:** Sure. We talked about auto research being single-threaded, iterating in one loop. But what's really interesting is parallelization. I've been playing with ideas, but I don't yet have a scheme that feels clean and satisfying enough; it's what I work on on the side when I'm not doing claw stuff. If you have a bunch of parallel nodes, it's easy to have multiple auto researchers communicate through some shared system. What interests me more is how you leverage an untrusted pool of workers on the internet. For example, in auto research you're trying to find a piece of code that trains the model to a very low validation loss. If anyone hands you a candidate commit, it's very easy to verify whether it's good. Someone on the internet can claim this code will optimize better and give better performance, and you can just check.
**Andrej Karpathy:** [clears throat]
**Host:** But the checking itself might take real work.
**Host:** labs. Like why not?
**Andrej Karpathy:** Fundamentally, they can lie, and so on. So you're facing a similar problem: the scheme I've sketched with an untrusted worker pool looks a bit like a blockchain, except the blocks become commits, and these commits can build on each other, containing the code changes along the path of improvement. The proof of work is basically running lots of experiments to find commits that work. That's hard, and then the reward is getting on a leaderboard; right now there's no monetary reward at all. I don't want to push the analogy too far, but the essence is this: it takes a lot of search to find an answer, but verifying whether a candidate is good is very cheap. Someone might have tried ten thousand ideas, but you only need to check whether the thing they finally hand over works, because the other nine thousand nine hundred or so didn't.

Long story short: you need to design a system where an untrusted pool of workers can collaborate with a trusted pool that does the verification. The whole system is asynchronous, functions properly, and is safe from a security standpoint, because someone sending you arbitrary code that you then execute is extremely suspect. But fundamentally this should be entirely doable. You're familiar with SETI@home and Folding@home, right? All of these problems have a similar structure. Folding@home folds proteins: finding a low-energy configuration is very hard, but if someone finds a configuration they've evaluated as low-energy, great, you can just take it; it's easy to verify. A lot of things have this property: producing an answer is very expensive, but verifying it is very cheap. In those cases Folding@home, SETI@home, or "auto research at home" are great paradigms.

So, long story short: a swarm of agents on the internet could collaborate to improve an LLM, and conceivably even outrun the frontier labs. Who knows? Maybe it's possible. The frontier labs have a huge amount of trusted compute, but the Earth is much bigger and has a huge amount of untrusted compute. If you build the right checking mechanisms, maybe the swarm out there really can find better solutions. People could contribute compute cycles to the things they care about.

So a lot of companies and so on probably have directions they care about, and if you have spare compute, you could contribute it to different auto research tracks. Maybe you care about a particular kind of cancer: instead of donating money to an institution, you could buy compute and join the auto research swarm for that project. If everything gets re-integrated into the auto researcher, then compute becomes the thing you contribute to the pool.
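The "expensive to search, cheap to verify" structure described above can be sketched in a few lines. This is a toy illustration, not the protocol itself: the "commit" here is just a candidate parameter pair for a toy curve-fitting problem, and `val_loss`, `untrusted_worker`, and `Leaderboard` are names invented for the sketch. A real system would be verifying submitted *code* inside a sandbox.

```python
# Toy sketch of the "expensive to find, cheap to verify" loop.
# Hypothetical names; a real system would sandbox and verify submitted code.
import random

random.seed(0)

# Held-out data for the trusted check: the target function is y = 2x + 1.
DATA = [(x, 2 * x + 1) for x in range(-10, 11)]

def val_loss(params):
    """Cheap, trusted verification: one pass over held-out data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in DATA) / len(DATA)

def untrusted_worker(trials=10_000):
    """Expensive, untrusted search: burns thousands of guesses, submits one."""
    best, best_loss = None, float("inf")
    for _ in range(trials):
        cand = (random.uniform(-5, 5), random.uniform(-5, 5))
        loss = val_loss(cand)
        if loss < best_loss:
            best, best_loss = cand, loss
    return best  # only the single best candidate is submitted

class Leaderboard:
    """Trusted side: re-verifies every submission before accepting it."""
    def __init__(self):
        self.best_params, self.best_loss = None, float("inf")

    def submit(self, params):
        loss = val_loss(params)    # never trust the worker's claimed score
        if loss < self.best_loss:  # accept only verified improvements
            self.best_params, self.best_loss = params, loss
            return True
        return False

board = Leaderboard()
accepted = board.submit(untrusted_worker())
```

The asymmetry is the point: the untrusted side burned 10,000 evaluations, while the trusted side ran exactly one cheap check before updating the leaderboard.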
**Andrej Karpathy:** Or big projects like moving Linux from C to Rust. It's going to be bigger projects, bigger-scope things, and maybe that's where a lot of the frontier closed intelligence is going to be interacting. And open-source is going to eat through a lot of the more basic use cases, or something like that. At some point, what is frontier today, in terms of what I'm using right now from the closed labs, might be open-source, probably later this year, and that's going to be doing a lot of work. So I expect this dynamic will basically continue: we'll have frontier labs with closed AIs that are kind of like these oracles, and then we'll have open-source some number of months behind. I expect that to continue, and I actually think that's a pretty good setup overall. Because I'm a little bit hesitant... I think there's some systemic risk attached to having only intelligences that are closed, and that's it.
**Host:** That's really encouraging. It's also interesting that at least some people, whether in Silicon Valley or queuing at retail stores in China, are finding it interesting to own personal compute again.
**Host:** Mhm.
**Andrej Karpathy:** Right.
**Andrej Karpathy:** And I think that, you know, centralization has a very poor track record in my view, in the past, and has...
**Host:** Maybe they're motivated to do it for their own claw, and then they can also contribute to auto research.
**Host:** You mean in political or economic systems in general. [laughter]
**Host:** Like the dollar is the thing everyone cares about, but are flops the thing everyone will actually care about in the future? Will there be some kind of "flippening" where what people care about shifts from wealth to compute? Right now, even if you have money, it's hard to get compute.
**Host:** Exactly.
**Andrej Karpathy:** So in practice, flops actually seem dominant in a certain sense. [laughter] Yeah, maybe it's a bit like that: how many flops you command rather than how much wealth you command. I don't actually think that's entirely true, but it's fun to think about.
**Andrej Karpathy:** I think there are a lot of pretty
**Host:** The thing you published recently was a little bit of analysis of employment data, right? It seemed to touch a nerve with some people, even though you were just visualizing some public data.
**Host:** an Eastern European.
**Andrej Karpathy:** Right.
**Andrej Karpathy:** A lot of pretty bad precedents. So I want there to be a thing that is maybe not at the edge of capability, because that's new and unexplored, etc. But I want there to be a thing that's behind, and that is kind of a common working space for intelligences that the entire industry has access to. That seems to me like a pretty decent power balance for the industry.
**Host:** What were you curious about?
**Host:** Yeah. I also think there are just many problems to solve, right? If you keep advancing intelligence from the frontier, we can do new things, and there are a lot of very big problems for humanity. So it seems that will continue to be a very expensive game, and I want to root for the labs doing that, because there are problems we cannot solve without continuing to advance the models in a very expensive way. And yet, as you point out, if what we have today as frontier is open, that's a lot of capability, right? And so the power of that, or the democratization of that, seems like
**Andrej Karpathy:** I just wanted to look. Everyone is thinking seriously about AI's impact on the job market and where it's heading, so I wanted to see what the job market actually looks like: what the different roles are, how many people are in each occupation. I was genuinely interested in going through them one by one and thinking for myself: given these AIs, and where they might evolve, will they be tools people use, or will they replace these occupations? How will today's occupations change, growing or adjusting dramatically? Will new occupations appear? It was really about feeding my own chain of thought about the industry. The employment data comes from the Bureau of Labor Statistics, and they actually publish a growth outlook for every occupation about ten years out; ten years, I believe, but it was done in 2024.
**Andrej Karpathy:** Yeah.
**Host:** We're going to need a lot of healthcare workers.
**Host:** very useful and also healthy.
**Andrej Karpathy:** Right. They've already made those projections, and I'm not sure exactly what methodology they used. What interested me more was layering some color on top of the data. If you think about what's primarily being developed right now, this digital AI, it's like a kind of ghost or spirit entity that can interact in the digital world and manipulate huge amounts of digital information, but for now it has no physical embodiment or presence. Progress on the physical side will presumably be slower, because you're manipulating atoms. The ability to flip bits and copy-paste digital information is something like a million times faster than accelerating matter. So, energetically speaking, I think we'll see an enormous amount of activity in digital space: lots of rewriting, lots of activity, like a boiling soup. Digital space will move at something like the speed of light, while the physical world lags behind to some extent.

So I think there's an overhang at the moment: a huge amount can be unleashed in digital information processing. It used to be done by computers and by people; now there's AI, a third kind of digital information manipulator. Those areas will see a huge amount of restructuring, but the physical world will lag for a while. So I paid particular attention to the occupations that fundamentally manipulate digital information, the jobs you could do from home. I think those occupations will change. That doesn't mean there will be fewer or more of those jobs, because that depends on demand elasticity and many other factors. But because of these new tools, because of this nervous-system upgrade for the human superorganism...
**Andrej Karpathy:** Yeah. I think basically, by accident, we're actually in an okay spot.
**Host:** [laughter]
**Host:** An optimal. Yeah. [laughter]
**Andrej Karpathy:** If you want to think of it that way, those occupations are going to change.
**Andrej Karpathy:** Yeah. By accident, we happen to be in a good spot in a certain sense. Mhm. And to some degree, the longer this dynamic endures, the healthier a spot the ecosystem might be in, right? Because you have more and more area under the curve.
**Host:** Based on the data you've seen, do you have any observations or advice for people facing the job market who are deciding what to learn or what skills to build?
**Host:** Mhm. And I will say that even on the closed side, I almost feel like it's been centralizing even further recently, because I think a lot of the frontrunners are not necessarily in the top tier anymore. So in that sense it's not super ideal. I would love for there to be more frontier labs, because I'm by default very suspicious of... I want there to be more people in the room. In machine learning, ensembles always outperform any individual model, so I want there to be ensembles of people thinking about all the hardest problems, and ensembles of people in the room, all well informed, when they make those decisions. I don't want it to be behind closed doors with two or three people; I feel like that's not a good future. I almost wish there were more labs, as long as they're lean. And I do think open-source has a place to play. I hope it sticks around. It's currently slightly behind, and that's actually kind of a good thing.
**Host:** I'm glad my job requires meeting people in person these days.
**Host:** Okay. You worked on the precursor to generalized robotics autonomy, in cars, right? A lot has happened in the last couple of months with robotics companies as well: really impressive, accelerating generalization across environments and tasks, increasingly long-horizon tasks, lots of money going into the space. Is it going to happen? Has anything changed recently, in your view?
**Andrej Karpathy:** [laughter] Right, more physical.
**Andrej Karpathy:** So my view is informed by what I saw in self-driving, and I do feel like self-driving is the first robotics application. What I saw at the time, like 10 years ago, is that there were a large number of startups, and most of them basically didn't make it long term. A lot of capital expenditure had to go in, and a lot of time. I think robotics, because it's so difficult, so messy, and requires a huge amount of capital investment and a lot of conviction... it's a big problem, and I think atoms are really hard. So I feel it will lag behind what's going to happen in digital space. And in digital space there's going to be a huge amount of unhobbling: things that weren't super efficient becoming a lot more efficient, by a factor of a hundred.
**Host:** Can your job be done remotely?
**Andrej Karpathy:** Mhm. Because bits are so much easier. So in terms of what's going to change, and where the activity is, I feel like digital space is going to change a huge amount, and the physical space will lag behind. And what I find very interesting is the interface between them. Because if we do have more agents acting on behalf of humans, more agents talking to each other, doing tasks, participating in an economy of agents, and so on, you're going to run out of things you can do purely in the digital space. At some point you have to go to the universe and ask it questions; you have to run an experiment and see what the universe tells you back, to learn something. We currently have a huge amount of digital work because there's an overhang in how much we've collectively thought about what is already digital: we just didn't have enough thinking cycles among humans to think about all the information that is already digital and already uploaded. So we're going to start running out of stuff that is already uploaded. At some point you'll have read all the papers and processed them and had some ideas about what to try, but I don't actually know how much intelligence you can get that's fully closed off, working only from the information that's available. So I think what's going to happen is, first, a huge amount of unhobbling, and there's a huge amount of work there. Then it's going to move to the interfaces between physical and digital: sensors for seeing the world, and actuators for doing something to the world.
**Host:** It can. The relationship-maintenance parts are harder, but most of it can.
**Host:** Mhm.
**Andrej Karpathy:** I think it's genuinely hard to say, because the job market is so diverse. The answer probably varies case by case. But to a large extent, these tools are very new and very powerful, so just keeping up with them is the first priority. A lot of people either dismiss it...
**Andrej Karpathy:** So I think a lot of interesting companies will actually come from that interface: can we feed the superintelligence data, and can we take data out and manipulate the physical world per its bidding, if you want to anthropomorphize the whole thing? And for the physical world, I almost feel like the total addressable market, in terms of the amount of work, is massive, possibly even much larger than what can happen in digital space. So I actually think it's a much bigger opportunity as well. But it's a huge amount of work, and in my mind the atoms are a million times harder. So it will lag behind, but it's also, I think, a bit of a bigger market. So the opportunity kind of follows that trajectory: right now, digital is my main interest; then interfaces will come after that; and then maybe some of the physical things. Their time will come, and they'll be huge when it does.
**Host:** Or are afraid of it.
**Host:** Well, it's an interesting framework for it, too, because certain things, not the things I'm working on right now, but certain things are much easier even in the world of atoms.
**Andrej Karpathy:** Mhm.
**Host:** Right? If you just think about read and write to the physical world: read, like sensors and cameras. There's a lot of existing hardware, and you can imagine enriching agent capabilities or capturing a lot of new data if you're just clever about it, and you don't necessarily have to invest a lot to get something valuable.
**Andrej Karpathy:** Yeah.
**Host:** Right. Yeah.
**Andrej Karpathy:** So, examples of this that I saw: a friend of mine, Liam, is the CEO of Periodic. I visited them last week, so it was top of mind. They're trying to do auto research for materials science, and in that case the sensors for the intelligence are actually pretty expensive lab equipment. The same is true in biology: a lot of people are very interested in engineering biology, and the sensors will be more than just video cameras. Does that make sense? And the other thing I saw, for example, is companies where you basically pay people for training data.
**Host:** To feed the... yeah.
**Andrej Karpathy:** programmatically.
**Host:** Yeah, to feed the Borg. So these are all examples of sensors in a certain sense; they take many diverse shapes and forms, if that makes sense.
**Andrej Karpathy:** Mhm.
**Host:** Yeah. So I'm looking forward to the point where I can ask for a task in the physical world, put a price on it, and just tell the agent: you figure out how to do it. Go get the data.
**Andrej Karpathy:** I'm actually kind of surprised we don't have more information markets. If, for example, Polymarket or other betting markets, or even stocks, etc., have so much autonomous activity, and a rising amount of activity,
**Host:** Mhm.
**Andrej Karpathy:** Well, I spent a stretch of time inside, and I have gone back. So to some extent I agree. But you can come at this question from many angles; it's a bit complicated. I'd say I very much believe in the impact a person can have outside the frontier labs, not just in the industry but also in more ecosystem-level roles. Your role, for instance, is more ecosystem-level, and so is mine at the moment. I think people can have a very large impact in roles like that.

Conversely, in my view there are real problems with being fully bound to a frontier lab. Fundamentally, you have enormous economic incentives at these frontier labs. And by your own admission, AI is going to change humanity and society in very drastic ways. Yet there you are, building the technology, profiting from it, deeply tied to it through economic means. That is exactly the tension OpenAI was originally founded to resolve. And that tension...
**Andrej Karpathy:** then why, for example... if the Iran situation were happening right now, how come there isn't a process where taking a photo or video from somewhere in Tehran costs, like, 10 bucks? Someone should be able to pay for that, you know. And that's an example of feeding the intelligence. There's not going to be a human looking at it; it's going to be agents trying to guess the betting games and stock markets and so on.
**Host:** Still isn't fully resolved.
**Host:** Mhm.
**Andrej Karpathy:** Right, not fully resolved. That's the first point. You're not a fully free agent; you can't participate in those conversations in a fully autonomous, free way. Inside a frontier lab, there are things you can't say, and conversely things the organization wants you to say. They won't force you, but you can feel the pressure about what you're supposed to say; otherwise it's a very awkward conversation, strange sideways looks, "what are you doing," that kind of thing. So you can't really be an independent agent. I feel that outside the frontier labs I'm in some sense more aligned with humanity, because I'm not under those pressures and I can say whatever I want.

Of course you can have impact inside a frontier lab too. There are a lot of researchers in there; maybe you're one of them, and maybe your ideas really are good. Maybe there are a lot of decisions to make, and you want to be in the room when those critical conversations happen. I think the stakes overall aren't that high yet, so everything is still fairly harmonious. But ultimately, when the stakes really are high, if you're an employee of an organization, I'm actually not sure how much influence you have over what that organization is going to do. Fundamentally, you're present and you contribute ideas, but you don't really control the entity you belong to. Those are sources of misalignment, to some degree.

On the other hand, I'm also quite sympathetic to the opposite view: whatever else is true, the labs are opaque, and a huge amount of the work happens there. They're at the frontier of capability and possibility, researching what's coming. If you're outside a frontier lab, your judgment inevitably drifts, because you don't know what's coming. I think my judgment inevitably drifts too. I won't really understand how these systems work under the hood; it's an opaque system. I won't have a good sense of how it's going to evolve. So in that sense I agree, and it's something I worry about as well. I think there's value in staying connected to what's actually happening, value in actually being inside a frontier lab. If some frontier labs would have me come for a stretch, do some genuinely valuable work, and come and go...
**Andrej Karpathy:** So I feel like the agentic web is still fairly new, and there are no mechanisms for this yet, but this is an example of what I think might happen. There's a good book that may be inspiring, called Daemon.
**Host:** You're looking for a job! That's amazing. [laughter]
**Host:** Mhm.
**Andrej Karpathy:** I think that might be a decent arrangement: I stay connected to what's actually happening, without feeling entirely controlled by those entities. Honestly, I think Noam could do very good work at OAI, but his most impactful work could very well also be outside OpenAI.
**Andrej Karpathy:** You may have read it. In Daemon, the intelligence ends up almost puppeteering humanity, in a certain sense. Humans are kind of its actuators, but humans are also its sensors. So I think society will collectively reshape in a certain way to serve that; it will end up happening collectively across the industry. There's just a lot more automation, it has certain needs, and humans will be serving the needs of that machine, not necessarily each other's.
**Host:** Noam, this is a call for you to become an independent researcher with auto research. [laughter]
**Host:** Well, we were on this very specific point of missing pieces of training data. We needed something like auto research, right? We need the training cycle, or the SFT piece, to be far more mechanized.
**Andrej Karpathy:** Right. There's a lot that can be done out there, and maybe the ideal arrangement is to switch back and forth. You can have astonishing impact in both places. So it's complicated; I don't know. It's a pretty heavy question. But I did join a frontier lab, and I did leave. Maybe someday I'll join one again. That's roughly how I see it.
**Andrej Karpathy:** Mhm.
**Host:** A related question: how much visibility does the world, or the AI ecosystem, have into the frontier? That is, how close is open source to the frontier, and is that sustainable?
**Host:** For which part?
**Andrej Karpathy:** I think this whole sequence of events has been genuinely surprising. A wave of models from China and around the world, and I think more will keep being released in the near term, much closer in capability than the industry expected.
**Andrej Karpathy:** In order to take the human out of the loop, to ask for a task that is just "improve my model quality with new data," right? Does that make sense? If you can't have the model do the training runs by itself, then your ability to do this as a closed-loop task, by pricing data, is more challenged.
**Host:** Did it surprise you? You're a long-time contributor to open source; what was your prediction here?
**Host:** Yes, yes, 100%. Yeah. But now you do.
**Andrej Karpathy:** Roughly speaking, the closed models lead, but people have been tracking how many months open source is behind. From nothing at all, to 18 months...
**Andrej Karpathy:** The thing is, LLM training actually fits the paradigm really easily.
**Host:** And then it started converging.
**Host:** Mhm.
**Andrej Karpathy:** Right, and then maybe 8 months behind, 6 months, 8 months, something like that. I'm a huge supporter of open source. In operating systems, for example, you have closed Windows and macOS, big software projects, a bit like what LLMs are going to become, and then you have Linux. And Linux has actually been extremely successful; it runs on the vast majority of computers. Last time I looked, something like 60% came from Linux. Because the industry genuinely needs a common open platform that everyone feels safe using. I think it's the same now. Enterprises really do have that need.

The biggest difference is that there's an enormous amount of capex involved, so that's where it gets somewhat hard to compete. I do think the current open models are already quite good. The other thing I find interesting is that for the vast majority of consumer use cases, even the current open models are actually pretty good. Looking out a few years, I think a huge number of simple use cases will be covered well, even runnable locally. But there will always be demand for frontier intelligence, and that slice might be very large. It's possible, though, that the demand for frontier intelligence concentrates on Nobel-prize-tier work...
**Andrej Karpathy:** So you'd actually expect
**Host:** Mhm.
**Host:** metric.
**Andrej Karpathy:** Yeah, LLM training fits the paradigm really well, really easily: all the optimization of the code so it runs faster, and you also have metrics you can optimize against. I do think that if you had an autonomous loop over those metrics, there's going to be a lot of Goodharting going on, where the system will overfit to those metrics. But then you can use the system to devise more metrics, and you just get really good coverage. So it's hard to tell, but in a certain sense it's a pretty good fit.
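The Goodharting worry has a one-screen illustration. Everything here is invented for the sketch: `proxy_metric` stands in for whatever an autonomous loop can actually measure, `true_quality` for what was really wanted, and the numbers are arbitrary.

```python
# Toy Goodhart's-law demo: hard optimization of a proxy metric diverges
# from the true objective it was supposed to track. All numbers invented.

def proxy_metric(verbosity):
    """What the loop can measure and optimize: longer always scores higher."""
    return verbosity

def true_quality(verbosity):
    """What we actually care about: detail helps up to a point, padding hurts."""
    return verbosity - 0.02 * verbosity ** 2  # peaks at verbosity = 25

candidates = range(101)  # candidate answers, parameterized by verbosity

best_by_proxy = max(candidates, key=proxy_metric)  # the loop's pick
best_by_truth = max(candidates, key=true_quality)  # the real optimum

# The proxy-optimal answer is actively bad under the true objective.
print(best_by_proxy, true_quality(best_by_proxy))  # 100 -100.0
print(best_by_truth, true_quality(best_by_truth))  # 25 12.5
```

The hedge in the conversation, using the system to devise more metrics for better coverage, corresponds to making `proxy_metric` harder to game.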
**Host:** I want to talk about a tiny little side project you have before we end. Tell me about micro GPT.
**Andrej Karpathy:** Oh, yeah. Okay, so micro GPT. I have this running obsession, of maybe a decade or two, of simplifying and boiling down LLMs to their bare essence, and I've had a number of projects along these lines: nanoGPT, and makemore, and micrograd, etc. I feel like micro GPT is now the state of the art of me trying to boil it down to just the essence. The thing is, training neural nets, and LLMs specifically, is a huge amount of code, but all of that code is actually complexity that comes from efficiency; it's just because you need it to go fast. If you don't need it to go fast and you only care about the algorithm, then that algorithm is 200 lines of Python, very simple to read, and that includes comments and everything. You have your dataset, which is text; you need your neural network architecture, which is about 50 lines; you do your forward pass, and then the backward pass to calculate the gradients, so an autograd engine, about 100 lines; and then you need an optimizer, and Adam, for example, which is a very state-of-the-art optimizer, is again about 10 lines, really. Putting everything together in the training loop is about 200 lines. And what's interesting to me: normally, maybe a year ago or more, if I had come up with micro GPT, I would have been tempted to explain it to people, with a video stepping through it or something like that. I actually tried to make that video a little bit, and I tried to make a little guide to it. But I realized this isn't really adding much, because it's already so simple, at 200 lines, that anyone could ask their agent to explain it in various ways. I'm not explaining to people anymore.

I'm explaining it to agents. If you can explain it to agents, then agents can be the router: they can target it to the human, in the human's language, with infinite patience, at the human's capability, and so on.
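The ingredient list above (a text dataset aside) can be sketched end to end: a scalar autograd engine, a tiny network, an Adam step of roughly ten lines, and a training loop. To be clear, this is not Karpathy's microGPT; it's a micrograd-flavored toy in the same spirit, fitting XOR instead of language, with every name (`Value`, `forward`, `adam_step`) invented here.

```python
# Micrograd-flavored toy: autograd engine + tiny net + Adam + training loop.
# Not microGPT itself; an illustration of the same ingredient list.
import math
import random

class Value:
    """Scalar autograd node: just enough ops (+, *, tanh) for a toy net."""
    def __init__(self, data, children=()):
        self.data, self.grad = data, 0.0
        self._backward, self._prev = lambda: None, set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

random.seed(42)
H = 8  # hidden units; the "architecture" is a 2-H-1 tanh MLP
params = [Value(random.uniform(-1, 1)) for _ in range(3 * H + H + 1)]

def forward(x1, x2):
    hidden = [(params[3*i] * x1 + params[3*i+1] * x2 + params[3*i+2]).tanh()
              for i in range(H)]
    out = Value(0.0)
    for i in range(H):
        out = out + params[3*H + i] * hidden[i]
    return (out + params[4*H]).tanh()

# Adam: the "10 lines, really" part, with bias-corrected moments.
m, v = [0.0] * len(params), [0.0] * len(params)
def adam_step(t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    for i, p in enumerate(params):
        m[i] = b1 * m[i] + (1 - b1) * p.grad
        v[i] = b2 * v[i] + (1 - b2) * p.grad ** 2
        p.data -= lr * (m[i] / (1 - b1**t)) / (math.sqrt(v[i] / (1 - b2**t)) + eps)
        p.grad = 0.0

# Training loop on XOR (standing in for the text dataset).
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
losses = []
for step in range(1, 301):
    loss = Value(0.0)
    for (x1, x2), y in XOR:
        diff = forward(x1, x2) + Value(-y)
        loss = loss + diff * diff
    loss.backward()
    adam_step(step)
    losses.append(loss.data)
```

The proportions roughly match the description: the autograd engine dominates the line count, the optimizer is a handful of lines, and the loop that ties it together is short.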
**Host:** Right. If I don't understand this particular function, I can ask the agent to explain it to me three different ways, and I'm not going to get that from you.
**Andrej Karpathy:** Exactly. So I kind of feel like: what is education? It used to be guides, it used to be lectures. But now I feel I'm more and more explaining things to agents, and maybe coming up with skills, where a skill is basically a way to instruct the agent in how to teach the thing. So maybe I could have a skill for micro GPT describing the progression I imagine the agent should take you through if you're interested in understanding the codebase: just hints to the model to start with this first, and then that. I could script the curriculum a little bit as a skill. So I feel like there's going to be less explaining things directly to people, and more of just: does the agent get it? And if the agent gets it, the agents will do the explanation. We're not fully there yet, because I still think I can probably explain things a little better than the agents, but the models are improving so rapidly that it feels like a losing battle to some extent. So I think education is going to be reshuffled by this quite substantially. It's a little bit the end of teaching each other things. If I have a library of code, for example: it used to be that you wrote documentation for the other people who were going to use your library, but you shouldn't do that anymore. Instead of HTML documents for humans, you should have markdown documents for agents. Because if the agents get it, they can explain all the different parts of it. So it's this redirection through agents, and I think we're going to see a lot more of that playing out.
**Host:** Well, we'll see if the great teachers develop intuition for how to explain things to agents differently.
**Andrej Karpathy:** ultimately, so for example, micro GPT, like I asked I tried to get an agent to write micro GPT. So, I told it like try to boil down the simplest things. Like try to boil down my um neural network training to the simplest thing and it can't do it. Like micro GPT is like my is it's like my end of my obsession. It's the 200 lines. I thought about this for a long time. I was obsessed about this for a long time. This is this is the solution. Trust me, it can't get simpler. And this is this is my value add. Everything else like agent gets it. It just can't come up with it, but it totally gets it and understands why it's done in a certain way etc. Uh so, like my contribution is kind of like these few bits, but everything else in terms of like the education that goes on after that is like not my domain anymore. So, maybe yeah, it's like education kind of changes in those ways where you kind of have to infuse the few bits that you feel strongly about the curriculum or the the best the better way of explaining it or something like that. The things that agents can't do is your job now. The things that agents can do, they can probably do better than you or like very soon. And so, you should um be strategic about what you're actually spending time on.
**Host:** Well, we appreciate the few bits. Thank you, Andrej. Okay. Find us on Twitter at No Priors Pod.
**Andrej Karpathy:** [music] Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. [music] That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.